Difference between revisions of "FlyBase:ID Validator"

From FlyBase Wiki
(22 intermediate revisions by 3 users not shown)

Latest revision as of 12:44, 15 April 2020

Overview

This tool accepts a list of FlyBase symbols/IDs (for any data type) and, where necessary and possible, updates them to their current versions. It also converts certain external IDs (GenBank nucleotide/protein accessions, UniProt accessions, PubMed IDs) into their equivalent FlyBase IDs. The output is provided as a validation table that can either be downloaded as a file or exported to a FlyBase HitList for further processing (including conversion between data types).

Usage

1. Either type or paste a set of IDs/symbols into the 'Enter IDs or Symbols' box, or choose 'Upload File of Identifiers' by clicking the Browse button. Separate the IDs/symbols with spaces or returns (not commas or other text separators). The supported input types include:

  • FlyBase IDs (for most data classes; e.g. FBgn (gene), FBal (allele), FBrf (reference) IDs)
  • FlyBase symbols (for most data classes)
  • FlyBase gene annotation symbols (e.g. CG# or CR# for D. melanogaster)
  • PubMed IDs (with or without a 'PMID' prefix)
  • GenBank nucleotide/protein accessions
  • UniProt (Swiss-Prot/TrEMBL) accessions
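Before pasting a list into the input box, it can help to clean it up locally. The following is a minimal sketch, not part of FlyBase: it replaces commas with whitespace separators, drops duplicates, and makes a rough guess at each token's type. The function names and the regular expressions are assumptions for illustration (in particular, the 'PMID' prefix is assumed to look like `PMID12345` or `PMID:12345`, and GenBank accessions are not recognised separately); the tool's own matching rules may differ.

```python
import re

# Illustrative patterns only (assumptions, not FlyBase's own matching rules).
PATTERNS = [
    # FlyBase IDs: 'FB', a two-letter class code, seven digits (e.g. FBgn0053520)
    ("flybase_id", re.compile(r"^FB[a-z]{2}\d{7}$")),
    # D. melanogaster annotation symbols: CG# or CR#
    ("annotation_symbol", re.compile(r"^C[GR]\d+$")),
    # PubMed IDs, with or without an assumed 'PMID' / 'PMID:' prefix
    ("pubmed_id", re.compile(r"^(PMID:?)?\d+$")),
    # UniProt accession format as published by UniProt
    ("uniprot_accession", re.compile(
        r"^([OPQ][0-9][A-Z0-9]{3}[0-9]"
        r"|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2})$")),
]


def tokenize(raw):
    """Split on whitespace and commas; drop duplicates but keep input order."""
    seen = {}
    for token in re.split(r"[\s,]+", raw.strip()):
        if token:
            seen.setdefault(token, None)
    return list(seen)


def guess_type(token):
    """Return a rough label for a token; anything unmatched may be a symbol."""
    for name, pattern in PATTERNS:
        if pattern.match(token):
            return name
    return "symbol_or_other"


def prepare_query(raw):
    """Return newline-separated text suitable for pasting into the input box."""
    return "\n".join(tokenize(raw))
```

For example, `prepare_query("FBgn0053520, CG33520 Q8IN81")` yields the three identifiers on separate lines with the comma removed.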


2. Choose whether to 'Return non-melanogaster matches' (i.e. FlyBase entries matching the query ID/symbol from a species other than D. melanogaster) and whether to 'Match synonyms' (i.e. include ID/symbol synonyms when matching the submitted and FlyBase entries). The default setting is to 'Match synonyms' but not to 'Return non-melanogaster matches'.


3. Click on the 'Submit Query' button.


4. The resulting table has four sections:

i) A header line listing the number of:

  • Submitted IDs
  • Unique Validated/Updated IDs
  • Unknown IDs

ii) Buttons to export the list of validated IDs to:

  • a FlyBase HitList (for further processing, including conversion between data types)
  • the FlyBase Batch Download tool (to obtain and download additional data associated with each entry - NOTE: this option is enabled only for output lists comprising a single data class, such as 'genes')

iii) Buttons to download/save a file of:

  • all unique validated IDs
  • all unknown (unvalidated) IDs
  • a TSV file of the entire validation report
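The counts shown in the header line can be recomputed from a downloaded report. The sketch below assumes, without confirmation from FlyBase, that the TSV has three tab-separated columns (submitted symbol/ID, current FlyBase ID, current symbol), that unknown entries have an empty ID column, and that comment lines start with `#`. It also counts each distinct submitted entry once, which may not match how the tool counts. The function name `summarize_report` is hypothetical.

```python
import csv
import io


def summarize_report(tsv_text):
    """Tally a validation report (column layout is an assumption, see above)."""
    submitted = set()
    validated = set()
    unknown = set()
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        # Skip blank lines and assumed '#' comment/header lines
        if not row or row[0].startswith("#"):
            continue
        row = row + [""] * (3 - len(row))  # pad short rows
        entry, current_id = row[0].strip(), row[1].strip()
        submitted.add(entry)
        if current_id:
            validated.add(current_id)
        else:
            unknown.add(entry)
    return {
        "submitted": len(submitted),
        "unique_validated": len(validated),
        "unknown": len(unknown),
    }
```

Reading the file with `open(path, newline="")` and passing its contents to `summarize_report` gives a quick sanity check against the numbers displayed on the results page.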

iv) The validation table, comprising 4 columns showing:

  • a checkbox indicating whether that row should be included in any 'export' request (see (ii) above)
  • the submitted symbol/ID
  • the validated (current) FlyBase ID
  • the current FlyBase symbol, hyperlinked to the relevant FlyBase record

If one or more entered symbols/IDs mapped to multiple current FlyBase entries, then a WARNING message is displayed above the validation table, and the affected entries are marked with an exclamation mark (!).
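The same check can be made offline on a saved report. As a rough sketch, assuming the rows have already been read into (submitted entry, current FlyBase ID) pairs, the hypothetical helper below lists every submitted entry that maps to more than one current ID:

```python
from collections import defaultdict


def ambiguous_inputs(rows):
    """Map each submitted entry with more than one current ID to those IDs.

    rows: iterable of (submitted, current_id) pairs; an empty current_id
    means the entry was not validated and is ignored here.
    """
    hits = defaultdict(set)
    for submitted, current_id in rows:
        if current_id:
            hits[submitted].add(current_id)
    return {entry: sorted(ids) for entry, ids in hits.items() if len(ids) > 1}
```

Entries returned by this function correspond to the rows that the tool marks with an exclamation mark and should be resolved by hand before further analysis.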

The rows of the validation table are color-coded as follows:

  • entered symbols/IDs that match current FlyBase symbols/IDs are colored green
  • entered symbols/IDs that match non-current FlyBase symbol/ID synonyms and were successfully updated are colored yellow
  • entered symbols/IDs that were unknown/unvalidated are colored red
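The color rules can be expressed as a small decision function. This is only a sketch of the rules as described above, not the tool's implementation. The function name is hypothetical, matching is assumed to be exact and case-sensitive, and it covers FlyBase symbols/IDs only, since this page does not say how converted external IDs (e.g. PubMed IDs) are colored.

```python
def row_status(submitted, current_id, current_symbol):
    """Return the row color implied by the rules described above."""
    if not current_id:
        # No current FlyBase entry was found for the submitted value
        return "red"
    if submitted in (current_id, current_symbol):
        # The submitted value is already the current ID or symbol
        return "green"
    # Matched via a non-current synonym and updated
    return "yellow"
```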

Caveats/Disclaimers

  • If the 'Match synonyms' box is checked, then entering an FBgn ID or CG number that has become a secondary ID for two current genes (e.g. FBgn0053520 or CG33520), or entering a CG number that is current for one gene but a synonym of another (e.g. CG10602), will generate two separate rows in the output table - one for each matching gene. A warning will appear above the validation table and the affected rows will be marked with an exclamation mark (!).
  • Secondary IDs from third-party sources (UniProt, GenBank, PubMed) do not work (e.g. UniProt Q9VE67 does not work, but Q8IN81 does) - such IDs need to be updated at the third-party site before using the FlyBase ID Validator.