FlyBase:ID Validator

Revision as of 11:33, 15 April 2020

Overview

You can use the ID Validator tool to:

(i) validate a set of symbols/IDs, which updates any old symbols/IDs to their current equivalents (where possible);

(ii) validate and convert a set of symbols/IDs, which additionally converts the submitted list from one data class to another (where feasible), for example converting a list of allele or transcript IDs to their corresponding gene IDs, or converting certain external IDs (GenBank, UniProt, PubMed) into FlyBase IDs; or

(iii) simply upload ID lists into a HitList for further analysis/processing within FlyBase.


Usage

1. Either type/paste a set of IDs/symbols into the 'Enter IDs or Symbols:' box, or upload a file of IDs by clicking the Browse button. Separate the IDs/symbols with spaces or returns (not commas or other text separators). The supported input types are:

  • FlyBase IDs (for most data classes)
  • FlyBase symbols (for most data classes)
  • FlyBase gene annotation symbols (CG#)
  • clone symbols
  • PubMed IDs
  • GenBank nucleotide/protein accessions
  • UniProt (Swiss-Prot/TrEMBL) accessions
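To make the input rules in step 1 concrete, here is a minimal local sketch that splits pasted text on whitespace and roughly guesses what kind of identifier each token is. The function names and regular expressions are hypothetical illustrations based on the ID shapes used on this page (e.g. FBgn0053520, CG33520); they are not FlyBase's own parsing logic, and real symbols and external accessions need far more than a shape check.

```python
import re

# Hypothetical shape checks for three of the input types listed above.
# These are illustrative heuristics, not FlyBase's own parsing rules.
FLYBASE_ID = re.compile(r"^FB[a-z]{2}\d{7}$")  # e.g. FBgn0053520
CG_SYMBOL = re.compile(r"^CG\d+$")             # e.g. CG33520
PUBMED_ID = re.compile(r"^\d+$")               # PMIDs are plain integers


def tokenize(text: str) -> list[str]:
    """Split pasted input on spaces/returns; reject comma-separated lists."""
    if "," in text:
        raise ValueError("separate IDs with spaces or returns, not commas")
    # str.split() with no argument splits on any run of whitespace
    return text.split()


def guess_type(token: str) -> str:
    """Roughly classify a single token by its shape."""
    if FLYBASE_ID.match(token):
        return "FlyBase ID"
    if CG_SYMBOL.match(token):
        return "annotation symbol"
    if PUBMED_ID.match(token):
        return "PubMed ID"
    return "symbol or external accession"
```

For example, tokenize("FBgn0053520 CG10602\nQ8IN81") yields three tokens, classified as a FlyBase ID, an annotation symbol, and "symbol or external accession" respectively.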


2. Choose 'Validate Only' or 'Validate and Convert'; for the latter, select the desired conversion data class from the drop-down menu. IDs/symbols from different data classes (e.g. genes and alleles) may be submitted together with 'Validate Only', but will cause conversion errors with 'Validate and Convert'. The available 'convert to' options are:

  • Genes
  • Alleles
  • Aberrations
  • Balancers
  • Transgenic constructs
  • Natural transposons
  • Insertions
  • Transcripts
  • Polypeptides
  • Clones
  • References
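Because 'Validate and Convert' expects a single input data class, it can help to check a list of FlyBase IDs before submitting it. The sketch below assumes FlyBase's usual four-character ID prefixes (FBgn for genes, FBal for alleles, and so on) and only inspects FlyBase IDs; symbols and external accessions are reported as "Unrecognized". The helper names are hypothetical and this is a local pre-check, not part of the tool.

```python
# FlyBase ID prefixes for the 'convert to' classes listed above
# (assumed from FlyBase naming conventions; symbols are not covered).
PREFIX_TO_CLASS = {
    "FBgn": "Genes",
    "FBal": "Alleles",
    "FBab": "Aberrations",
    "FBba": "Balancers",
    "FBtp": "Transgenic constructs",
    "FBte": "Natural transposons",
    "FBti": "Insertions",
    "FBtr": "Transcripts",
    "FBpp": "Polypeptides",
    "FBcl": "Clones",
    "FBrf": "References",
}


def classes_in(ids: list[str]) -> set[str]:
    """Return the data classes present in a list of FlyBase IDs."""
    return {PREFIX_TO_CLASS.get(i[:4], "Unrecognized") for i in ids}


def check_for_convert(ids: list[str]) -> None:
    """Hypothetical pre-check: raise if the list mixes data classes."""
    found = classes_in(ids)
    if len(found) > 1:
        raise ValueError(f"mixed data classes: {sorted(found)}")
```

A list containing both an FBgn and an FBal ID would raise ValueError here. The same list is fine for 'Validate Only'.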

The table below shows common/useful conversion types:

Input (example) | Output (example) | Logic
Genes (FBgn ID or FlyBase symbol) | Polypeptides (FBpp ID and symbol) | List all polypeptides corresponding to each gene
Genes (FBgn ID or FlyBase symbol) | Alleles (FBal ID and symbol) | List all alleles corresponding to each gene
Alleles (FBal ID or FlyBase symbol) | Genes (FBgn ID and FlyBase symbol) | List all genes corresponding to each allele
Clones (FBcl ID or FlyBase symbol) | Genes (FBgn ID and FlyBase symbol) | List all genes associated with each clone
Genes (FBgn ID or FlyBase symbol) | References (FBrf ID) | List all references associated with each gene
References (FBrf ID or PMID) | Genes (FBgn ID and FlyBase symbol) | List all genes associated with each reference
References (PMID) | References (FBrf ID) | Convert PubMed IDs (PMIDs) to FlyBase reference (FBrf) IDs
Genes (GenBank nucleotide accession) | Genes (FBgn ID and FlyBase symbol) | Convert GenBank nucleotide accessions to FlyBase gene IDs
Proteins (UniProt or GenBank protein accession) | Genes (FBgn ID and FlyBase symbol) | Convert external protein accessions to FlyBase gene IDs
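Most of the conversions above are one-to-many ("list all alleles corresponding to each gene"), so one submitted ID can produce several output rows. The sketch below shows that expansion against a toy lookup table; the mapping and every ID in it ("gene-A", "allele-A1", ...) are hypothetical placeholders, not FlyBase data or FlyBase's implementation.

```python
from typing import Optional

# Toy stand-in for FlyBase's gene -> allele associations.
# "gene-A", "allele-A1", etc. are placeholders, not real FlyBase records.
GENE_TO_ALLELES = {
    "gene-A": ["allele-A1", "allele-A2"],
    "gene-B": ["allele-B1"],
}


def convert(submitted: list[str],
            mapping: dict[str, list[str]]) -> list[tuple[str, Optional[str]]]:
    """Emit one row per (input, converted ID) pair, mirroring 'List all ...'."""
    rows: list[tuple[str, Optional[str]]] = []
    for item in submitted:
        targets = mapping.get(item, [])
        if not targets:
            rows.append((item, None))  # unknown input or nothing to convert to
        else:
            rows.extend((item, target) for target in targets)
    return rows
```

With this table, convert(["gene-A", "gene-B", "gene-X"], GENE_TO_ALLELES) gives two rows for gene-A, one for gene-B, and a single row with no converted ID for the unknown gene-X.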


3. Click on the 'Submit Query' button.


4. The results page has three sections:

i) A header line listing the number of:

  • Submitted IDs
  • Validated/Updated IDs
  • Unknown IDs
  • Unique converted IDs
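As a rough illustration of how the four header counts relate to the conversion rows, the sketch below derives them from a list of rows. It assumes (not confirmed by this page) that inputs are counted once each even when they expand into several rows, and that "unique converted IDs" means distinct non-empty converted IDs. The Row type and function name are hypothetical.

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    submitted: str
    current_id: Optional[str]    # None if the symbol/ID was not recognized
    converted_id: Optional[str]  # None if no conversion was made


def header_counts(rows: list[Row]) -> dict[str, int]:
    """Derive the header-line counts, counting each submitted input once."""
    submitted = {r.submitted for r in rows}
    recognized = {r.submitted for r in rows if r.current_id is not None}
    converted = {r.converted_id for r in rows if r.converted_id is not None}
    return {
        "Submitted IDs": len(submitted),
        "Validated/Updated IDs": len(recognized),
        "Unknown IDs": len(submitted - recognized),
        "Unique converted IDs": len(converted),
    }
```

Counting distinct inputs rather than rows keeps an input that maps to two genes (see Caveats/Disclaimers) from being counted twice.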

ii) Buttons to export/download the final list of converted IDs to:

  • a FlyBase HitList
  • a local file of unique FB IDs only
  • a local file of the conversion table in TSV format

iii) The conversion table, comprising four columns:

  • Submitted symbol/ID
  • Current FlyBase ID
  • Converted FlyBase ID
  • FlyBase symbol, hyperlinked to the relevant FlyBase record
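If you want to reproduce the two local-file exports from conversion results you already hold in memory, a minimal sketch with Python's standard csv module could look like this. The column header text is an assumption based on the four columns listed above; FlyBase's actual download may label or order its columns differently. The function names are hypothetical.

```python
import csv
import io
from typing import Optional

# Assumed header text, following the four columns described above.
COLUMNS = ["Submitted ID", "Current ID", "Converted ID", "Symbol"]

TableRow = tuple[str, Optional[str], Optional[str], Optional[str]]


def to_tsv(rows: list[TableRow]) -> str:
    """Render the conversion table as tab-separated text."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    for row in rows:
        # Write missing values as empty fields
        writer.writerow(["" if v is None else v for v in row])
    return buf.getvalue()


def unique_ids(rows: list[TableRow]) -> list[str]:
    """Distinct converted IDs in first-seen order (the 'unique FB IDs only' list)."""
    return list(dict.fromkeys(r[2] for r in rows if r[2] is not None))
```

Using dict.fromkeys keeps the first-seen order while dropping duplicates, which matters when several inputs convert to the same ID.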


The table is color-coded as follows:

  • converted FlyBase IDs are colored green
  • other recognized (i.e. updateable/convertible) IDs/symbols are colored yellow
  • unknown (i.e. unconvertible) IDs/symbols are colored red
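One way to read the color key is as a per-row rule. The sketch below encodes that reading. It is an interpretation for illustration only (the tool may color individual cells rather than whole rows), and the function name is hypothetical.

```python
from typing import Optional


def row_color(current_id: Optional[str], converted_id: Optional[str]) -> str:
    """One reading of the color key above; not FlyBase's own code."""
    if converted_id is not None:
        return "green"   # a converted FlyBase ID was produced
    if current_id is not None:
        return "yellow"  # recognized (updateable/convertible) but not converted
    return "red"         # unknown / unconvertible
```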

Caveats/Disclaimers

  • Only a subset of all possible conversions makes sense; attempting a nonsensical conversion (e.g. 'transcripts' converted to 'alleles') simply returns a blank output table.
  • Entering an FBgn ID or CG annotation symbol that has become a secondary ID for two current genes (e.g. FBgn0053520 or CG33520), or entering a CG annotation symbol that is current for one gene but a synonym of another (e.g. CG10602), generates two separate rows in the output table - one for each matching gene.
  • Secondary IDs from third-party sources (UniProt, GenBank, PubMed) do not work (e.g. UniProt Q9VE67 does not work, but Q8IN81 does); such IDs need to be updated at the third-party site before using the FlyBase ID Validator.
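Because an input such as FBgn0053520 or CG10602 can expand into one row per matching gene, it may be worth flagging such inputs after downloading the conversion table. The sketch below groups (submitted, current ID) pairs and keeps only inputs that matched more than one current ID. The function name and the placeholder current IDs ("gene-A", "gene-B") are hypothetical; the real matching genes are not given here.

```python
from collections import defaultdict


def ambiguous_inputs(rows: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Map each submitted ID to its distinct current IDs; keep those with more than one."""
    matches: dict[str, list[str]] = defaultdict(list)
    for submitted, current in rows:
        if current not in matches[submitted]:
            matches[submitted].append(current)
    return {s: ids for s, ids in matches.items() if len(ids) > 1}
```

For rows [("FBgn0053520", "gene-A"), ("FBgn0053520", "gene-B"), ("other", "gene-C")], only FBgn0053520 is returned, with both placeholder genes.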