FlyBase:QuickSearch

From FlyBase Wiki
Revision as of 07:33, 5 August 2015 by Steven marygold (talk | contribs) (Orthology tab)


General help

Overview

The QuickSearch tool on the FlyBase home page allows searching across all FlyBase reports. Forms for searching specific types of data have been separated into ‘tabs’, arrayed at the top of the QuickSearch window. Several of the tabs contain entirely new search tools, such as the ‘Simple’ search form, an easy-to-use tool with access to all the data types in FlyBase. More information on how to use QuickSearch can be found below.

Links to specific help for each tab:

Simple tab | Data Type tab | Expression tab | Phenotype tab | References tab | GO tab | Protein Domain tab | Gene Groups tab | Human Disease tab | Orthology tab

Species Searched

Several tabs search data that may be species-specific. In these tabs, a Species checkbox appears giving you the option to ‘include non-Dmel species’ in your search results. The default behavior is to return only Drosophila melanogaster (Dmel) data.

In the Data Type tab, an override behavior is available. To search for data in a non-Dmel species you can add a 4-letter species prefix to the symbol you are using to search, separated by a backslash (‘\’). For example, if you type Dvir\dpp, the search results for the gene symbol dpp will be filtered for those associated with D. virilis only.
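The prefix convention can be illustrated with a short Python sketch. This is only an illustration of how a query such as Dvir\dpp breaks down into a species prefix and a symbol; the function name and the fallback to Dmel for unprefixed symbols are assumptions made for the example, not a description of how FlyBase processes the query internally.

```python
def split_species_prefix(query, default_species="Dmel"):
    # Hypothetical helper: separate a 4-letter species prefix from a symbol.
    # The prefix and the symbol are separated by a single backslash.
    prefix, sep, symbol = query.partition("\\")
    if sep and len(prefix) == 4:
        return prefix, symbol
    # No recognizable prefix: assume the default species (D. melanogaster).
    return default_species, query


print(split_species_prefix(r"Dvir\dpp"))  # ('Dvir', 'dpp')
print(split_species_prefix("dpp"))        # ('Dmel', 'dpp')
```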

Controlled Vocabularies

Several QuickSearch tabs search FlyBase data by making use of controlled vocabulary (CV) terms. These tabs provide intuitive domain-specific searches of FlyBase reports based on the Gene Ontology (GO) controlled vocabulary, on anatomical, developmental-stage-specific or phenotypic class terms used to annotate phenotypes, and on anatomical and/or developmental-stage-specific terms used to annotate gene expression. Combinations of CV terms can be searched using the forms in these tabs. An auto-completion feature is active wherever a search term should come from a CV, to assist you in choosing terms that will match records in FlyBase.

Auto-completion

The QuickSearch auto-completion feature is active in tabs that search FlyBase using controlled vocabulary (CV) terms. Since only terms that are in the controlled vocabulary will match records in FlyBase, the auto-completion feature suggests CV terms that are compatible with what you have typed. Selecting a term from the suggestion list reduces the chance that a search returns nothing because the search term is not one used by FlyBase curators.

Some tabs for non-CV-based searches also use the auto-complete feature. Several of the searchable fields available in the References tab are enhanced with auto-completion, which helps prevent searches that fail due to mis-spelled names or mis-remembered journal titles. Most of the data classes searchable under the Data Type tab have auto-completion associated with them as well.

The QuickSearch auto-completion feature overrides your browser’s auto-completion function.

Coordinated Auto-completion

The coordinated auto-completion feature is active for tabs in which several search terms may be used simultaneously for a search. When a term has been entered in one of these fields, the coordinated auto-completion for the other fields is aware of the term already typed, and suggests only terms that actually occur in combination with the first term in FlyBase reports. Here is an example of how it works in the Expression tab:

When the expression pattern (lit. curated) data class is selected, text box fields for Stage, Tissue, and Cell Loc. (cell location) are displayed. The auto-completion for these three fields is coordinated in the following sense: suppose you enter "fertilized egg stage" in the Stage text box. When you move your focus to the Tissue text box, auto-complete there will show only four options: "egg", "female pronucleus", "fertilized egg", and "male pronucleus". This is because, out of the multitude of CV terms available for the Tissue field, only these four terms have actually been used in combination with "fertilized egg stage" by curators in an annotation captured in the FlyBase database. If you enter any other term in the Tissue text box, even though it may be a valid CV term for that field, your search will return zero hits, because there are no FlyBase reports containing that combination of CV terms.

Using the terms suggested by the auto-completion feature ensures that you do not enter terms that would be mutually exclusive (or are simply not used by curators) in FlyBase reports. Terms suggested by the auto-completion should always return results. If the coordinated auto-completion does not offer a term you wish to enter in a field, it is because this term does not appear in combination with some other term you have entered elsewhere on the form. In this case you should try another combination.
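The idea behind coordinated auto-completion can be sketched in a few lines of Python. This is a minimal illustration, not FlyBase's implementation: the ANNOTATIONS set and the suggest_tissues function are hypothetical, and the "embryonic stage 5" / "pole cell" pair is invented purely to show a term that gets filtered out. The four tissue terms for "fertilized egg stage" are those given in the example above.

```python
# Hypothetical (stage, tissue) pairs standing in for curated annotations.
ANNOTATIONS = {
    ("fertilized egg stage", "egg"),
    ("fertilized egg stage", "female pronucleus"),
    ("fertilized egg stage", "fertilized egg"),
    ("fertilized egg stage", "male pronucleus"),
    ("embryonic stage 5", "pole cell"),  # invented pair, for contrast only
}


def suggest_tissues(stage, typed=""):
    """Suggest tissue terms that co-occur with `stage` and contain the typed text."""
    typed = typed.lower()
    return sorted(
        tissue
        for annotated_stage, tissue in ANNOTATIONS
        if annotated_stage == stage and typed in tissue.lower()
    )


print(suggest_tissues("fertilized egg stage"))
# ['egg', 'female pronucleus', 'fertilized egg', 'male pronucleus']
print(suggest_tissues("fertilized egg stage", "pro"))
# ['female pronucleus', 'male pronucleus']
```

Because suggestions are drawn only from combinations that already exist in the annotation set, any suggested term is guaranteed to match at least one record.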

Wild Cards

When you use QuickSearch you can add the asterisk character ( * ) to the beginning or the end of a search term. This is recognized as a “wild card” and will find all terms that contain your search term at the end or beginning of a phrase, respectively. You can also flank your search term with wild card characters to find all phrases containing your search term. For example, you can find the genes that start with 'ft' by entering 'ft*'. (Search the Genes data class either under the Simple tab by selecting the 'Genes' data class from the result summary table, or under the Data Type tab by selecting 'genes' from the Data Class drop-down menu.) The result of this search lists fat (ft) and fushi tarazu (ftz), as you would expect, and also fruitless (fru), because it has the synonym 'fty'.
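The 'ft*' example can be reproduced with a small Python sketch that uses the standard library's fnmatch module. The GENES table is a toy stand-in for the Genes data class (the dpp entry is added only as a non-matching control), and case-sensitive matching is an assumption of this sketch rather than documented QuickSearch behavior. Note also that fnmatch gives special meaning to '?' and '[...]', which QuickSearch is not described as supporting.

```python
from fnmatch import fnmatchcase

# Toy stand-in for the Genes data class: symbol -> full name and synonyms.
GENES = {
    "ft": ["fat"],
    "ftz": ["fushi tarazu"],
    "fru": ["fruitless", "fty"],
    "dpp": ["decapentaplegic"],
}


def wildcard_search(pattern):
    """Return symbols of genes whose symbol, name, or synonym matches the pattern."""
    hits = []
    for symbol, other_terms in GENES.items():
        if any(fnmatchcase(term, pattern) for term in [symbol, *other_terms]):
            hits.append(symbol)
    return hits


print(wildcard_search("ft*"))  # ['ft', 'ftz', 'fru']
```

Here fru is returned only because its synonym 'fty' matches, mirroring the behavior described above.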

Tab Descriptions

Simple tab

This tab performs a comprehensive search of text-searchable FlyBase data. This includes most fields from sixteen data classes of reports. The search returns a result page summarizing the matching records by data type. Clicking on one of these data types takes you to a secondary result page containing a table of individual matches within that data type. QuickSearch also places your query text in a resubmission form on the result summary page, where you can edit or refine the phrase directly and search again, without having to start over.

The QuickSearch auto-completion feature is not active in this tab.

Data Class tab

This tab contains a subset of the previous version of QuickSearch, and is laid out with very little change from that version. The Data Class drop-down menu restricts searches to only the single data type chosen, as before.

The QuickSearch auto-completion feature is active for most of the data classes in this tab.

Expression tab

Search for genes according to expression patterns:

At the top of this tab is a link to the RNA-Seq Profile Search tool. This tool can be used to search for genes by specifying a pattern of expression, as evidenced by high-throughput RNA-seq experiments.

Use the form below this link to search curated statements that describe published accounts of transcript and polypeptide expression. The input form has input boxes for developmental Stage, body part or Tissue, and subcellular localization (Cell Loc.). The coordinated auto-completion feature will assist you in finding the appropriate controlled vocabulary (CV) terms that have been used during the curation of each descriptor.

You can refine this search further by choosing to add qualifier terms. The coordinated auto-completion feature will provide you with a list of CV terms that have been used by curators to modify or limit the associated main term. The auto-completion for the qualifier terms is fully coordinated across all of these fields, in the sense that choosing a term for (e.g.) the Stage input will affect which qualifier terms are suggested for the Tissue or Cell Loc qualifier fields.

Phenotype tab

Search for alleles that have particular phenotypes. The form is divided into two portions, which may be used independently or in combination. The coordinated auto-completion feature will assist you in finding the appropriate controlled vocabulary (CV) terms that have been used during the curation of each phenotype.

The top section searches for alleles with a particular phenotypic class, e.g. "lethal" or "behavior defective". You can refine this search further using the refinement boxes, searching for a phenotype that occurs at a particular developmental stage (e.g. an embryonic stage) and/or under particular conditions (e.g. "recessive" or "heat sensitive").

The bottom section searches for alleles that show a phenotype in a particular tissue or cell type, e.g. "wing" or "RP2 neuron". In this case, terms from the controlled vocabulary or cellular component terms from the Gene Ontology controlled vocabulary are used. Again, you can refine this search further using the refinement boxes.

Please note that the coordinated auto-completion works within each of the two sections, but not between them. This means that it is possible, even when using auto-completion suggestions, to enter a combination of terms across both sections of this form that will return zero hits.

References tab

This tab searches the extensive FlyBase bibliography. Searches can be filtered by title/abstract text, journal name, publication type, and reference IDs (PubMed or FlyBase), in addition to the author and date filters. Appropriate fields also allow the use of Boolean operators, so you can search for papers authored by e.g. “Smith NOT Johnson”. In addition to Boolean operators, the year field supports mathematical comparison symbols (>, >=, <, <=) and range indicators (-, --, ..). For example:

  • >2003
  • <=1945
  • 1999-2003
  • 1970-1990 NOT 1976
  • 1992 OR 1995 OR 1998

The QuickSearch auto-completion feature is active for the fields in this tab where it will be helpful, such as the journal name field. These fields are indicated with a superscript.

GO tab

Search the Gene Ontology (GO) controlled vocabulary directly. Results are a CV term report, or list of reports. Once you are looking at the term report, you can then get a list of genes that are annotated with that GO term (look in the Records annotated with this exact term section), among other things. Please see the CV term report help page for more information.

The QuickSearch auto-completion feature is active in this tab.

Protein Domains tab

Search using InterPro IDs or signatures, including protein domains, families, repeats, and sites.

Either start typing and select a term from the drop-down menu, or enter your own search term using wildcard(s) (*) if desired. Resulting hits will be genes whose protein products are annotated with an InterPro signature that wholly or partially matches your term. N.B. Search with an InterPro ID (e.g. 'IPR019956') if you wish to retrieve hits annotated with a specific InterPro signature.

See the InterPro [FAQ] page for an explanation of these different signature types.

The QuickSearch auto-completion feature is active in this tab.

Gene Groups tab

Search the FlyBase-curated Gene Group data class using a gene or Gene Group symbol, name, synonym or ID.

The QuickSearch auto-completion feature is active in this tab.

Click the 'Browse' button to see a full list of Gene Groups.

Human Disease tab

Search the Disease Ontology (DO) controlled vocabulary to find alleles that have been used as disease models. Enter the name of a disease into the search box. The results are a list of Disease Ontology CV reports that match the entered term. From the CV report you can get a list of all genes or alleles that have been used to model that disease in flies, or that interact with a model of it. Please see the CV term report help page for more information.

The QuickSearch auto-completion feature is active in this tab.

Click the 'Browse' button to see a full list of Human Disease Model Reports.

Orthology tab

This tab can be used to quickly search for orthologs of D. melanogaster, human or other model organism genes, as provided by the DRSC Integrative Ortholog Prediction Tool (DIOPT) or OrthoDB. The DIOPT dataset integrates ortholog predictions for 8 model organisms from multiple tools and algorithms. (Further documentation is here.) The OrthoDB dataset (as implemented in FlyBase) comprises ~40 species, biased towards those that are closely related to D. melanogaster, and arranged into 5 ‘orthology groups’: Drosophila species, non-Drosophila Dipterans, non-Dipteran Insects, non-Insect Arthropods, non-Arthropod Metazoa.

To use, first select the input species by clicking on the Species drop-down menu. Next, enter one or more gene symbols/IDs in the adjacent Gene(s) box - multiple entries are accepted and need to be separated by spaces. (Response time will be proportional to the number of entries.) Then, select one or more output species using the check-boxes - where the input species is D. melanogaster, there is a choice between searching the DIOPT or OrthoDB datasets. Finally, click the green Search button or press ‘enter’.

The symbols/IDs that may be entered in the Gene(s) box depend on the 'input species', as follows:

Input species: allowable symbols/IDs (with examples)

  • H. sapiens: HGNC gene symbol (e.g. CDK1) or gene ID (e.g. HGNC:1722); OMIM ID (e.g. OMIM_GENE:116940); NCBI Gene ID (e.g. 983); Ensembl ID (e.g. ENSG00000170312)
  • M. musculus: MGI gene symbol (e.g. Cdk1) or gene ID (e.g. MGI:88351); NCBI Gene ID (e.g. 12534)
  • X. tropicalis: XenBase gene symbol (e.g. cdk1) or gene ID (e.g. XB-GENE-482750); NCBI Gene ID (e.g. 394503)
  • D. rerio: ZFIN gene symbol (e.g. cdk1) or gene ID (e.g. ZDB-GENE-010320-1); NCBI Gene ID (e.g. 80973)
  • D. melanogaster: FlyBase gene symbol (e.g. Cdk1), annotation symbol (e.g. CG5363), or gene ID (e.g. FBgn0004106); NCBI Gene ID (e.g. 34411)
  • C. elegans: WormBase gene symbol (e.g. cdk-1) or gene ID (e.g. WBGene00000405); NCBI Gene ID (e.g. 176374)
  • S. cerevisiae: SGD gene symbol (e.g. CDC28) or gene ID (e.g. S000000364); NCBI Gene ID (e.g. 852457)
  • S. pombe: PomBase gene symbol (e.g. cdc2); NCBI Gene ID (e.g. 2539869)

Note that symbol-based searches are case-sensitive. To ensure validity, select a gene symbol from the auto-suggest list that appears when typing. (Auto-suggest works only for the first entered symbol.) Also note that this tool does not support searching by full gene name.
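When preparing a space-separated list for the Gene(s) box, it can help to check what kind of identifier each entry looks like. The Python sketch below does this for D. melanogaster entries only. The regular expressions are inferred solely from the examples listed above (Cdk1, CG5363, FBgn0004106, 34411) and are assumptions for illustration; they are not FlyBase's own validation rules, and the function name is hypothetical.

```python
import re

# Patterns inferred only from the D. melanogaster examples listed above;
# illustrative assumptions, not FlyBase's validation rules.
DMEL_PATTERNS = [
    ("FlyBase gene ID", re.compile(r"FBgn\d{7}")),
    ("annotation symbol", re.compile(r"CG\d+")),
    ("NCBI Gene ID", re.compile(r"\d+")),
]


def classify_dmel_entries(text):
    """Split a space-separated Gene(s) entry and label each token."""
    labelled = []
    for token in text.split():
        for label, pattern in DMEL_PATTERNS:
            if pattern.fullmatch(token):
                labelled.append((token, label))
                break
        else:
            # Anything else is treated as a (case-sensitive) gene symbol.
            labelled.append((token, "gene symbol"))
    return labelled


for token, label in classify_dmel_entries("Cdk1 CG5363 FBgn0004106 34411"):
    print(token, "->", label)
```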


On the results page, the top row shows the search term, species, the matched gene symbol, and any relevant links to Gene Reports. Below this are the column headers, followed by the list of ortholog predictions arranged by species. For DIOPT-based searches, the columns are:

  • Ortholog Gene: official gene symbol, as used in the relevant model organism database
  • Ortholog Gene Reports: links to report pages at model organism databases, NCBI, Ensembl and/or OMIM
  • Score: a simple score indicating the number of tools that support a given orthologous gene-pair relationship
  • Best Score: either ‘yes’ or ‘no’ to indicate whether the given ortholog has the highest score for the query gene
  • Best Rev Score: either ‘yes’ or ‘no’ to indicate whether the query gene has the highest score for the given ortholog in the reciprocal search; also includes a link to show the full results of performing the reciprocal search (among those species selected in the original query)
  • Source: list of individual ortholog prediction tools that support a given orthologous gene-pair relationship
  • Align: link to an alignment between the given orthologous gene-pairs on the DIOPT site
  • Transgene in Fly: link to a FlyBase Gene Report for a non-Drosophila gene, indicating that that gene has been expressed transgenically in Drosophila
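The relationship between the Score, Best Score and Best Rev Score columns can be illustrated with a short Python sketch. It follows only the column descriptions above: the gene names, tool names and SUPPORT table are invented placeholders, and details such as how DIOPT treats ties or restricts comparisons to a species are not covered by this page, so the sketch simply treats every top-scoring pair as 'yes'.

```python
# Hypothetical data: (query_gene, ortholog_gene) -> prediction tools supporting the pair.
SUPPORT = {
    ("geneA", "HUMAN1"): {"tool1", "tool2", "tool3"},
    ("geneA", "HUMAN2"): {"tool1"},
    ("geneB", "HUMAN1"): {"tool2"},
}


def summarize(query):
    """Build result rows for one query gene from the SUPPORT table."""
    # Score = number of tools supporting each (query, ortholog) pair.
    forward = {pair: len(tools) for pair, tools in SUPPORT.items() if pair[0] == query}
    if not forward:
        return []
    best_forward = max(forward.values())
    rows = []
    for (_, ortholog), score in forward.items():
        # Reciprocal search: best score among all genes paired with this ortholog.
        best_reverse = max(
            len(tools) for (_, other), tools in SUPPORT.items() if other == ortholog
        )
        rows.append({
            "ortholog": ortholog,
            "score": score,
            "best_score": "yes" if score == best_forward else "no",
            "best_rev_score": "yes" if score == best_reverse else "no",
        })
    return rows


for row in summarize("geneA") + summarize("geneB"):
    print(row)
```

In this toy data, HUMAN1 is the best-scoring ortholog for both geneA and geneB, but only geneA is the best-scoring hit in the reciprocal direction, so the geneB row has Best Score 'yes' and Best Rev Score 'no'.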

The results page for OrthoDB-based searches is similar, except that the DIOPT-specific columns are absent and the ‘Source’ column lists only ‘OrthoDB’. The 'orthology group' to which the species belongs is shown on the right side of each species line.

Where a single search term produces multiple hits (as may happen when a numerical ID is entered), all hits are shown in the results table together with their predicted orthologs.

Clicking on the Save results as tsv file text at the top of the results page will download all the results shown in that page to a file in tab separated value format, with one orthologous gene-pair per line. It has the following columns:

  • query_context: the entered search term
  • query_species: the selected input species
  • query_gene: the matched input gene symbol
  • target_species: the selected output species
  • ortholog_gene: official gene symbol, as used in the relevant model organism database
  • ortholog_gene_reports: gene IDs at model organism databases, NCBI, Ensembl and/or OMIM
  • source: list of individual ortholog prediction tools that support a given orthologous gene-pair relationship
  • score: a simple score indicating the number of tools that support a given orthologous gene-pair relationship
  • best_score: either ‘yes’ or ‘no’ to indicate whether the given ortholog has the highest score for the query gene
  • best_reverse_score: either ‘yes’ or ‘no’ to indicate whether the query gene has the highest score for the given ortholog in the reciprocal search
  • transgene_in_fly: Where applicable, the FlyBase gene ID and symbol for a non-Drosophila gene where that gene has been expressed transgenically in Drosophila

The columns for OrthoDB-based searches are similar, except that the DIOPT-specific columns are absent and the source column lists only ‘OrthoDB’.
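A downloaded DIOPT results file can be processed with any tool that reads tab-separated values. The Python sketch below keeps only the rows where best_score is 'yes'. It assumes that the file begins with a header row carrying the column names listed above, which this page does not state explicitly. The sample rows are trimmed to a few columns, and their values (including the scores and the way the species is written) are invented for illustration.

```python
import csv
import io


def best_orthologs(tsv_text):
    """Return (query_gene, target_species, ortholog_gene) for best-scoring rows."""
    # Assumes a header row with the column names listed above.
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        (row["query_gene"], row["target_species"], row["ortholog_gene"])
        for row in reader
        if row["best_score"] == "yes"
    ]


# Invented, trimmed sample; a real download contains more columns.
SAMPLE = (
    "query_gene\ttarget_species\tortholog_gene\tscore\tbest_score\n"
    "Cdk1\tH. sapiens\tCDK1\t10\tyes\n"
    "Cdk1\tH. sapiens\tCDK2\t3\tno\n"
)

print(best_orthologs(SAMPLE))  # [('Cdk1', 'H. sapiens', 'CDK1')]
```

To read a saved file instead of a string, open it with open(path, newline="") and pass the file object to csv.DictReader in place of io.StringIO.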