Difference between revisions of "FlyBase:ID Validator"
Revision as of 12:16, 15 April 2020
Overview
This tool will accept a list of FlyBase symbols/IDs (for any data type) and, where necessary/possible, update them to their current versions. It will also convert certain external IDs (GenBank nucleotide/protein accessions, UniProt accessions, PubMed IDs) into their equivalent FlyBase IDs. The output is provided as a validation table that can either be downloaded as a file or exported to a FlyBase HitList for further processing (including conversion between data types).
Usage
1. Either type or paste a set of IDs/symbols into the 'Enter IDs or Symbols:' box, or choose to 'Upload File of Identifiers' by clicking the Browse button. Separate the IDs/symbols with spaces or returns (no commas or other text separators). The supported input types include:
- FlyBase IDs (for most data classes; e.g. FBgn (gene), FBal (allele), FBrf (reference) IDs)
- FlyBase symbols (for most data classes)
- FlyBase gene annotation symbols (e.g. CG# or CR# for D. melanogaster)
- PubMed IDs
- GenBank nucleotide/protein accessions
- UniProt (Swiss-Prot/TrEMBL) accessions
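Because the input box expects spaces or returns rather than commas, a list copied from a spreadsheet or a manuscript may need cleaning before it is pasted in. The following is a minimal sketch of that clean-up, not part of the tool itself. The function names `normalize_identifiers` and `to_validator_input` are hypothetical, and the sketch assumes that none of the submitted IDs/symbols contains a comma or internal whitespace.

```python
import re


def normalize_identifiers(raw: str) -> list[str]:
    """Split raw text on commas and whitespace, dropping empty tokens
    and repeated entries while keeping the original order.

    Assumption: no identifier itself contains a comma or whitespace.
    """
    tokens = [tok for tok in re.split(r"[,\s]+", raw) if tok]
    seen = set()
    unique = []
    for tok in tokens:
        if tok not in seen:
            seen.add(tok)
            unique.append(tok)
    return unique


def to_validator_input(raw: str) -> str:
    """Return one identifier per line, ready to paste or save to a file."""
    return "\n".join(normalize_identifiers(raw))


if __name__ == "__main__":
    # IDs taken from the examples on this page, separated inconsistently.
    pasted = "FBgn0053520, CG33520,CG10602\nQ8IN81  FBgn0053520"
    print(to_validator_input(pasted))
```

The result can be pasted into the box directly or written to a text file for the upload option. Removing repeats is optional here, since the header line of the results already reports unique validated IDs.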
2. Choose whether to 'Return non-melanogaster matches' (i.e. FlyBase entries matching the query ID/symbol from a species other than D. melanogaster) and whether to 'Match synonyms' (i.e. also compare the submitted entry against ID/symbol synonyms of database entries, not only their current IDs/symbols). The default setting is to 'Match synonyms' but not 'Return non-melanogaster matches'.
3. Click on the 'Submit Query' button.
4. The resulting table has four sections:
i) A header line listing the number of:
- Submitted IDs
- Unique Validated/Updated IDs
- Unknown IDs
ii) Buttons to export the list of validated IDs to:
- a FlyBase HitList
- the FlyBase BatchDownload tool
iii) Buttons to download/save a file of:
- all unique validated IDs
- all unknown (unvalidated) IDs
- a TSV file of the entire validation report
iv) The validation table, comprising 4 columns showing:
- a checkbox indicating whether that row should be included in any 'export' request (see (ii) above)
- the submitted symbol/ID
- the validated (current) FlyBase ID
- the current FlyBase symbol, hyperlinked to the relevant FlyBase record
If one or more entered symbols/IDs mapped to multiple current FlyBase entries, then a WARNING message is displayed above the validation table, and the affected entries are marked with an exclamation mark (!).
The rows of the validation table are color-coded as follows:
- entered symbols/IDs that match current FlyBase symbols/IDs are colored green
- entered symbols/IDs that match non-current FlyBase symbol/ID synonyms and were successfully updated are colored yellow
- entered symbols/IDs that were unknown/unvalidated are colored red
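The color coding above amounts to a three-way classification of each row. The sketch below reproduces that logic for rows that have already been read into memory, for example from the downloaded report. It is an illustration only: the function name `row_status` is hypothetical, matching is assumed to be exact and case-sensitive, and the sketch labels a converted external ID (such as a PubMed ID) as 'yellow', which may not be how the tool itself colors such rows.

```python
from typing import Optional


def row_status(submitted: str,
               current_id: Optional[str],
               current_symbol: Optional[str]) -> str:
    """Classify one validation row using the page's color scheme.

    red    - no current FlyBase ID was found (unknown/unvalidated)
    green  - the submitted entry already equals the current ID or symbol
    yellow - the submitted entry was mapped to a different current ID/symbol
    """
    if not current_id:
        return "red"
    if submitted in (current_id, current_symbol):
        return "green"
    return "yellow"


if __name__ == "__main__":
    # Placeholder values, not real FlyBase records.
    print(row_status("FBgn_CURRENT", "FBgn_CURRENT", "geneA"))
    print(row_status("old_name", "FBgn_CURRENT", "geneA"))
    print(row_status("not_in_flybase", None, None))
```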
Caveats/Disclaimers
- Only a subset of all possible conversions makes sense; attempting a nonsensical conversion (e.g. 'transcripts' converted to 'alleles') results only in a blank output table.
- Entering an FBgn ID or CG annotation symbol that has become a secondary ID for two current genes (e.g. FBgn0053520 or CG33520), or entering a CG annotation symbol that is current for one gene but a synonym of another (e.g. CG10602), generates two separate rows in the output table - one for each matching gene.
- Secondary IDs from 3rd party sources (UniProt, GenBank, PubMed) do not work (e.g. UniProt Q9VE67 does not work, but Q8IN81 does) - such IDs need to be updated at the 3rd party site before using the FlyBase ID converter.
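Since a single submitted ID can produce two rows (the second caveat above), it can be useful to list such entries before relying on a one-to-one mapping. The following is a minimal sketch that assumes the validation rows are already available as (submitted, current FlyBase ID) pairs. The function name `find_ambiguous` and the placeholder IDs in the example are hypothetical and do not reflect real FlyBase mappings.

```python
from collections import defaultdict
from typing import Iterable, Optional


def find_ambiguous(
    rows: Iterable[tuple[str, Optional[str]]]
) -> dict[str, list[str]]:
    """Return submitted entries that map to more than one current ID.

    rows: (submitted, current_id) pairs; current_id is None or empty
    for unknown entries, which are ignored here.
    """
    hits: dict[str, set[str]] = defaultdict(set)
    for submitted, current_id in rows:
        if current_id:
            hits[submitted].add(current_id)
    return {sub: sorted(ids) for sub, ids in hits.items() if len(ids) > 1}


if __name__ == "__main__":
    # Placeholder rows: the current IDs are invented for illustration.
    rows = [
        ("FBgn0053520", "FBgn_A"),
        ("FBgn0053520", "FBgn_B"),
        ("CG10602", "FBgn_C"),
        ("unknown_entry", None),
    ]
    print(find_ambiguous(rows))
```

Entries returned by this check correspond to the rows the tool marks with an exclamation mark (!), and they need a manual decision about which current gene was intended.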