FlyBase:ID Validator

From FlyBase Wiki

Latest revision as of 12:44, 15 April 2020

Overview

This tool will accept a list of FlyBase symbols/IDs (for any data type) and, where necessary and possible, update them to their current versions. It will also convert certain external IDs (GenBank nucleotide/protein accessions, UniProt accessions, PubMed IDs) into their equivalent FlyBase IDs. The output is a validation table that can be downloaded as a file or exported to a FlyBase HitList for further processing (including conversion between data types).

Usage

1. Either type or paste a set of IDs/symbols into the 'Enter IDs or Symbols' box, or click the Browse button to 'Upload File of Identifiers'. Separate the IDs/symbols with spaces or line breaks (not commas or other separators). The supported input types include:

  • FlyBase IDs (for most data classes; e.g. FBgn (gene), FBal (allele), FBrf (reference) IDs)
  • FlyBase symbols (for most data classes)
  • FlyBase gene annotation symbols (e.g. CG# or CR# for D. melanogaster)
  • PubMed IDs (with or without a 'PMID' prefix)
  • GenBank nucleotide/protein accessions
  • UniProt (Swiss-Prot/TrEMBL) accessions
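
The input rules above can be checked before pasting a list into the form. The sketch below is a hypothetical helper, not part of FlyBase: it treats commas and semicolons as separators, drops duplicates, and makes a rough guess at each entry's type. The ID patterns it uses (FB plus a two-letter class code and seven digits, CG/CR plus digits, an optional 'PMID' prefix) are assumptions inferred from the examples on this page and will not cover every case.

```python
import re

# Assumed, simplified patterns based on the examples on this page.
FLYBASE_ID = re.compile(r"^FB[a-z]{2}\d{7}$")           # e.g. FBgn0053520
ANNOTATION_ID = re.compile(r"^C[GR]\d+$")                # e.g. CG33520
PUBMED_ID = re.compile(r"^(?:PMID:?)?\d+$", re.IGNORECASE)


def prepare_input(raw: str) -> str:
    """Return one identifier per line, ready to paste into the form.

    Commas and semicolons are treated as separators (the form expects only
    spaces or line breaks); empty tokens and repeats are dropped, keeping the
    first occurrence of each identifier.
    """
    tokens = re.split(r"[\s,;]+", raw)
    return "\n".join(dict.fromkeys(t for t in tokens if t))


def rough_type(identifier: str) -> str:
    """Guess an identifier's input type; a sanity check, not validation."""
    if FLYBASE_ID.match(identifier):
        return "FlyBase ID"
    if ANNOTATION_ID.match(identifier):
        return "annotation symbol"
    if PUBMED_ID.match(identifier):
        return "PubMed ID"
    return "symbol or accession"
```

Anything reported as "symbol or accession" is passed through unchanged; the ID Validator itself decides what it can match.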


2. Choose whether to 'Return non-melanogaster matches' (i.e. FlyBase entries matching the query ID/symbol from a species other than D. melanogaster) and whether to 'Match synonyms' (i.e. include ID/symbol synonyms when matching the submitted entries against FlyBase entries). By default, 'Match synonyms' is on and 'Return non-melanogaster matches' is off.
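
As a rough mental model of what these two options control, the toy matcher below treats each FlyBase entry as an ID, a current symbol, a species and a set of synonyms. The Entry structure, the "Dmel" species code and the matching order are assumptions made for illustration; this is not how FlyBase implements the search.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Hypothetical, simplified FlyBase record (illustration only)."""
    fb_id: str
    symbol: str
    species: str = "Dmel"  # assumed species code for D. melanogaster
    synonyms: set[str] = field(default_factory=set)  # non-current IDs/symbols


def find_matches(query, entries, match_synonyms=True, non_melanogaster=False):
    """Return (fb_id, symbol, how) for each entry that the query matches.

    The defaults mirror the form: 'Match synonyms' on and
    'Return non-melanogaster matches' off. 'how' is 'current' for a hit on
    the current ID or symbol and 'synonym' for a hit on a synonym.
    """
    hits = []
    for entry in entries:
        if entry.species != "Dmel" and not non_melanogaster:
            continue  # skip other species unless explicitly requested
        if query in (entry.fb_id, entry.symbol):
            hits.append((entry.fb_id, entry.symbol, "current"))
        elif match_synonyms and query in entry.synonyms:
            hits.append((entry.fb_id, entry.symbol, "synonym"))
    return hits
```

In this model, with the defaults, a synonym shared by a D. melanogaster entry and an entry from another species returns only the D. melanogaster entry; turning on non-melanogaster matches returns both, and turning off synonym matching returns neither.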


3. Click on the 'Submit Query' button.


4. The results page has four sections:

i) A header line listing the number of:

  • Submitted IDs
  • Unique Validated/Updated IDs
  • Unknown IDs

ii) Buttons to export the list of validated IDs to:

  • a FlyBase HitList (for further processing, including conversion between data types)
  • the FlyBase Batch Download tool (to obtain and download additional data associated with each entry; NOTE: this option is enabled only for output lists comprising a single data class, such as 'genes')

iii) Buttons to download/save a file of:

  • all unique validated IDs
  • all unknown (unvalidated) IDs
  • a TSV file of the entire validation report

iv) The validation table, comprising four columns showing:

  • a checkbox indicating whether that row should be included in any 'export' request (see (ii) above)
  • the submitted symbol/ID
  • the validated (current) FlyBase ID
  • the current FlyBase symbol, hyperlinked to the relevant FlyBase record
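
If you download the TSV report, the header-line counts can be recomputed from it offline. The sketch below assumes that each data line holds the submitted symbol/ID, the validated FlyBase ID and the current symbol in that order, that unknown entries have an empty ID column, and that any header line starts with '#'. It counts distinct submissions, which may differ from the tool's count if the same entry was submitted twice. Check these assumptions against an actual downloaded file before relying on the numbers.

```python
from typing import NamedTuple


class Row(NamedTuple):
    submitted: str
    fb_id: str   # empty if the submission could not be validated (assumed)
    symbol: str


def read_report(text: str) -> list[Row]:
    """Parse a tab-separated validation report (column order assumed)."""
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and '#' header/comment lines
        fields = (line.split("\t") + ["", ""])[:3]  # pad short rows
        rows.append(Row(*(f.strip() for f in fields)))
    return rows


def summarize(rows: list[Row]) -> dict[str, int]:
    """Recompute the counts shown in the header line of the results page."""
    submitted = {r.submitted for r in rows}
    validated = {r.fb_id for r in rows if r.fb_id}
    resolved = {r.submitted for r in rows if r.fb_id}
    return {
        "submitted": len(submitted),
        "unique_validated": len(validated),
        "unknown": len(submitted - resolved),
    }
```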

If one or more submitted symbols/IDs map to multiple current FlyBase entries, a WARNING message is displayed above the validation table and the affected entries are marked with an exclamation mark (!).
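
The same one-to-many cases can be picked out of a downloaded report. This hypothetical function takes (submitted, validated ID) pairs, however they were read from the file (the file layout is an assumption here), and keeps only the submissions that resolved to more than one current entry.

```python
from collections import defaultdict


def multi_mapped(pairs):
    """Return {submitted: [validated IDs]} for submissions that resolved to
    more than one current FlyBase entry (the rows marked '!')."""
    targets = defaultdict(list)
    for submitted, fb_id in pairs:
        if fb_id and fb_id not in targets[submitted]:
            targets[submitted].append(fb_id)  # keep first-seen order
    return {s: ids for s, ids in targets.items() if len(ids) > 1}
```

Unknown entries (empty validated ID) are ignored, so the result contains only the ambiguous submissions that need a manual choice.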

The rows of the validation table are color-coded as follows:

  • entered symbols/IDs that match current FlyBase symbols/IDs are colored green
  • entered symbols/IDs that match non-current FlyBase symbol/ID synonyms and were successfully updated are colored yellow
  • entered symbols/IDs that were unknown/unvalidated are colored red

Caveats/Disclaimers

  • If the 'Match synonyms' box is checked, then entering an FBgn ID or CG number that has become a secondary ID for two current genes (e.g. FBgn0053520 or CG33520), or entering a CG number that is current for one gene but a synonym of another (e.g. CG10602), will generate two separate rows in the output table, one for each matching gene. A warning will appear above the validation table and the affected rows will be marked with an exclamation mark (!).
  • Secondary IDs from third-party sources (UniProt, GenBank, PubMed) are not recognized (e.g. UniProt Q9VE67 does not work, but Q8IN81 does). Such IDs need to be updated to their current equivalents at the third-party site before being submitted to the FlyBase ID Validator.
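
Because secondary third-party accessions end up among the unknown IDs, it can help to pull out the unknown entries that look like UniProt accessions and look them up at UniProt. The sketch below uses the accession format described in UniProt's documentation; a match only means the string has the shape of an accession, not that it is secondary, and this check is not something the FlyBase tool performs.

```python
import re

# UniProt accession format (6- or 10-character accessions) as described in
# UniProt's documentation; matching says nothing about primary vs secondary.
UNIPROT_ACC = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)


def uniprot_to_recheck(unknown_ids):
    """Return unknown IDs that resemble UniProt accessions, in input order.

    These are candidates for secondary accessions to look up at UniProt and
    replace with their primary accession before resubmitting.
    """
    cleaned = (i.strip() for i in unknown_ids)
    return [i for i in cleaned if UNIPROT_ACC.match(i)]
```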