FlyBase:Tools Overview


General Search Help and Tips

Last Updated: 29 January 2010

FlyBase can be searched for genes, alleles, aberrations and other genetic objects, phenotypes, sequences, stocks, images and movies, controlled terms, and Drosophila researchers using the tools available from the 'Tools' drop-down menu in the Navigation bar. In addition to the Navigation bar, which can be accessed from any FlyBase page, the homepage also has direct links to the most commonly used tools.

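Below are summaries of each of the tools, which have been split into five main sections: Overview of Search Strategies, Main Query Tools, Query Results Analysis Tools, Genomic Search Tools and Browsers, and Other Tools.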

Overview of Search Strategies

Searching 12 species

Individual gene reports for genes from the 12 originally sequenced Drosophila genomes are now available in FlyBase. There are four main ways in which this data can be browsed and queried in FlyBase:

  • Gene Report Pages

For those interested in genome-wide analyses, bioinformatics and comparative genomics, there is a selection of pre-computed files available for download from our precomputed files page (in the Genomes:Annotation and Sequence section, for example), found in the 'Files' menu.

For those with an interest in a specific gene/protein/region across the different species, there are a number of ways to query the data. Our BLAST server allows querying of numerous sequenced insect genomes, either individually, as a subset, or all together. Each BLAST hit can then be localised and shown on the genome through GBrowse. Orthoview in GBrowse allows movement through the different genomes, illustrating the same region (where possible) in different genomes. In the near future, multiple alignments will be available, enabling direct sequence comparison between the different genomes.

Aberrations - deficiencies, inversions, translocations

One of the problems in a field of the size and complexity of Drosophila genetics is the use of nomenclature. This can lead to a number of names being given to the same object, and to the valid FlyBase name or symbol of an object being quite confusing or indeed not in common lab parlance. Aberration naming is no exception. The simplest ways to search for an aberration are either using CytoSearch, when you want to find an aberration that removes a particular gene or uncovers a cytological band, or using QuickSearch (selecting aberrations as the data class). Remember to use wildcards (i.e. *) to allow for slight differences in naming. FlyBase records all mentions of an aberration, so if an aberration is given a particular symbol in a paper, this name will be recorded as a synonym of the FlyBase 'valid' symbol (see the nomenclature document for more details). Alternatively, you can browse the molecularly localized aberrations for each chromosome or arm from the Aberration Maps page or by scanning GBrowse after selecting all "Aberrations" tracks.
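As a rough illustration of how a * wildcard widens a symbol match, the sketch below uses Python's standard fnmatch module on a small, made-up list of aberration symbols. The symbols and the helper name wildcard_match are assumptions for illustration only, and this is not a description of how FlyBase implements its search.

```python
from fnmatch import fnmatchcase

# Made-up aberration symbols, for illustration only.
SYMBOLS = ["Df(2L)ED123", "Df(2L)ED1234", "Df(3R)Exel6000", "In(2L)t"]

def wildcard_match(pattern, symbols):
    """Return the symbols matching a pattern in which * stands for any run of characters."""
    return [s for s in symbols if fnmatchcase(s, pattern)]

# Without a wildcard only the exact symbol is found.
exact = wildcard_match("Df(2L)ED123", SYMBOLS)

# A trailing wildcard also picks up slightly different names.
loose = wildcard_match("Df(2L)*", SYMBOLS)

print(exact)
print(loose)
```

Note that fnmatch also gives special meaning to ? and [ ], so symbols containing square brackets would need escaping in this sketch.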

Cytologically mapped features

When looking for cytology, you have a choice of a number of tools on FlyBase, including QueryBuilder. The easiest tools to use, however, are CytoSearch and GBrowse. GBrowse is especially useful when looking for molecularly mapped sequences, insertions, or Affymetrix probes. CytoSearch comes into its own when searching for cytologically defined features, such as cytologically-mapped genes or deficiencies, that haven't been molecularly mapped to the sequence. Of course, as with many aspects of research, complementary methods should be used. Therefore, we recommend you use both GBrowse and CytoSearch to analyse cytology.

Expression Data

Browsing Expression Data

Expression patterns are captured by FlyBase curators for transcripts, proteins, and "reporters" (i.e. enhancer trap insertions and reporter constructs). Information about transcript and protein expression patterns can be found on gene reports (e.g. the cnn gene), data for reporter constructs can be found on recombinant construct reports (e.g. P{dpp-lacZ.B}), and data for enhancer traps can be found on insertion reports (e.g. P{GawB}elav[C155]). In all cases, expression data will be found in the "Expression Data" section of the report. Please note that, in the case of transcript and protein expression, text descriptions of expression patterns are available on the Gene Expression Report (e.g. cnn), which is linked to at the top of the Expression Data section.

We also cooperate with several other databases of expression data and display a portion of their data within FlyBase (e.g. FlyExpress), link to their database (e.g. FlyAtlas), or both. These types of data can be found in the "External Data & Images" subsection of the "Expression Data" section.

High throughput expression data can also be viewed using the GBrowse tool. By selecting the tracks found under the "High Throughput data (Arrays)" section, the available data will be shown on GBrowse.

Searching for Expression Patterns

Expression data can be searched most easily and accurately by using QueryBuilder or Vocabularies, depending on how you'd like to search. You'll want to use QueryBuilder if you're interested in a multipart query (e.g. generate a list of genes which have the GO term "transcription factor activity" and whose protein products are expressed in the central nervous system). However, if you're interested in all genes expressed in a body part, tissue, or developmental stage, you can find that using Vocabularies. For example, by entering the term "adult mushroom body" into Vocabularies, you can obtain a list of genes expressed in that tissue.

Searching for High-Throughput Expression Patterns

RNA-Seq expression data can be searched to identify genes with specific expression characteristics using the FlyBase RNA-Seq_Profile_Search tool.

Mutant phenotype data

Mutant phenotype data is associated with alleles in FlyBase, so you need to search allele data if you are interested in mutant phenotype. In addition to free text describing the phenotype, the alleles are indexed with controlled vocabulary (CV) terms, which makes it easier for you to search for a particular phenotype, e.g. searching for mutants that affect the wing. You can search with these CV terms using either Vocabularies or QueryBuilder.

You can find mutant alleles affecting the wing from all species using Vocabularies. If you enter the term "wing" into the Vocabularies search page and then click on the "Alleles" button in the report page, you will obtain a list of mutant alleles that affect the wing. However, to search in a specific species, or to search for mutant phenotypes as part of a multipart query, QueryBuilder must be used. In this case, you should pick the "CV Hierarchy (GO/etc.)" dataset and then use the term picker to choose the body part, e.g. wing. In both cases, the default is to search both for alleles specifically labelled with the CV term, e.g. wing, and for alleles labelled with child CV terms that are a subset of the term chosen, e.g. wing vein. If you want to restrict your search to just the precise term chosen, use QueryBuilder and select 'Retrieve records annotated with "This CV term only"' before you run the query.
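The difference between the default search (term plus child terms) and the "This CV term only" option can be sketched as follows. This is a minimal illustration, not FlyBase code: the hierarchy fragment, the allele names and both function names are invented for the example.

```python
# Invented fragment of an anatomy hierarchy: parent term -> direct child terms.
CHILDREN = {
    "wing": ["wing vein", "wing margin"],
    "wing margin": ["anterior wing margin"],
}

# Invented allele annotations: allele -> CV terms it is labelled with.
ANNOTATIONS = {
    "allele-A": {"wing"},
    "allele-B": {"wing vein"},
    "allele-C": {"anterior wing margin"},
    "allele-D": {"eye"},
}

def descendants(term):
    """Return the term itself plus every term below it in the hierarchy."""
    found = {term}
    stack = [term]
    while stack:
        for child in CHILDREN.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found

def alleles_for(term, this_term_only=False):
    """Alleles labelled with the term; child terms are included unless this_term_only is set."""
    wanted = {term} if this_term_only else descendants(term)
    return sorted(a for a, terms in ANNOTATIONS.items() if terms & wanted)

print(alleles_for("wing"))                       # default: term and child terms
print(alleles_for("wing", this_term_only=True))  # precise term only
```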

References

FlyBase is an excellent source of Drosophila references. References can be searched in a number of ways. The easiest way is through QuickSearch, on our homepage. Changing the 'Data Class' to 'references' alters the layout of the QuickSearch box, providing text boxes for Author, Year(s), and All text.

More refined reference searches can be performed using QueryBuilder (QB). Click on the box titled 'Query is empty.. Click here to start building' on the QB start page to begin the search. At this stage the window will be displaying all the fields available to search for the 'Genes' dataset. Change the dataset to 'References'. Now the fields found in the reference reports are displayed. From here, you can search all the data found in the reference report, including PubMed ID, author, and type (e.g. review).

A popular way to search for references is to search for a (list of) objects (e.g. genes, GO terms) and then to use the 'Show related' toggle on the hits page to change the hit list to the related references. The 'Results Analysis/Refinement' button, found on the hit list page, can be used to analyse the distribution of the references over year, journal, author, and type of publication (e.g. review, paper, abstract).

Stocks

One of the easiest ways to search for a stock in FlyBase is to use QuickSearch. Simply change the data class to 'stocks', type in the feature of interest (e.g. a gene symbol, allele symbol), and search. A further way to identify stocks is through the hit list produced after a search. At the top of the hit list there is a toggle allowing you to 'Show related' stocks. Stocks can also be found for individual alleles by clicking on the Stocks matryoshka on the allele report page.

Main Query Tools

Jump to Gene

Jump to Gene (J2G) is found in the top-right of the blue navigation bar on every page in FlyBase. It is a NAVIGATION tool, not a search tool, and thus should be used when you know the symbol or ID for your gene, and you simply want to go to the report page.

You can type a gene symbol or synonym, valid gene name, or FBid into the J2G box. (Hint: FBids, including those for non-gene entities, will also work, for example FBgn0086782, FBal0090485 or FBab0002363.)

J2G searches with your query in the following order:

  1. primary FlyBase ID (FBgn) any hits? return hit(s), end search
  2. symbol (case-sensitive) any hits? return hit(s), end search
  3. symbol (case-insensitive) any hits? return hit(s), end search
  4. synonym (case-sensitive) any hits? return hit(s), end search
  5. synonym (case-insensitive) any hits? return hit(s), end search
  6. name (case-sensitive) any hits? return hit(s), end search
  7. secondary FlyBase ID any hits? return hit(s), end search

If nothing found, return error page

Note: J2G searches D. melanogaster genes by default. If you would like to search for a non-melanogaster gene, you need to use the unique, 4-letter species abbreviation, followed by a backslash, and then the gene symbol (e.g. Dpse\dpp), or use the FBgn.

Note: J2G does NOT search name synonyms!

Note: Wildcards (*) are allowed in J2G entries, but non-unique results may be returned.
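The J2G lookup order listed above amounts to a "first step that matches wins" cascade. The sketch below illustrates that idea only, assuming a tiny in-memory list of made-up gene records; the IDs, symbols and the function name jump_to_gene are hypothetical, and species prefixes and wildcards are left out.

```python
# Made-up gene records; IDs and symbols are placeholders, not real FlyBase entries.
GENES = [
    {"id": "FBgn9990001", "symbol": "abc", "synonyms": ["CG99901"],
     "name": "alpha beta", "secondary_ids": ["FBgn9990099"]},
    {"id": "FBgn9990002", "symbol": "Abc", "synonyms": ["abc-like"],
     "name": "Alpha Beta", "secondary_ids": []},
]

def jump_to_gene(query, genes=GENES):
    """Try each lookup step in order and stop at the first step that yields hits."""
    q = query.lower()
    steps = [
        lambda g: g["id"] == query,                          # 1. primary ID
        lambda g: g["symbol"] == query,                      # 2. symbol, case-sensitive
        lambda g: g["symbol"].lower() == q,                  # 3. symbol, case-insensitive
        lambda g: query in g["synonyms"],                    # 4. synonym, case-sensitive
        lambda g: q in [s.lower() for s in g["synonyms"]],   # 5. synonym, case-insensitive
        lambda g: g["name"] == query,                        # 6. name, case-sensitive
        lambda g: query in g["secondary_ids"],               # 7. secondary ID
    ]
    for matches in steps:
        hits = [g["id"] for g in genes if matches(g)]
        if hits:
            return hits  # end search at the first step with any hit
    return []  # nothing found: the caller would show the error page

print(jump_to_gene("Abc"))  # exact symbol match ends the search at step 2
print(jump_to_gene("ABC"))  # falls through to the case-insensitive symbol step
```

Because the search stops at the first successful step, an exact symbol match hides looser matches that a later step would have found.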

QuickSearch

QuickSearch is the search tool included on the FlyBase homepage. This is the quickest way to search FlyBase for an object when you may not know the correct FlyBase symbol, or when you want to generate a list of objects with a shared feature (e.g. all the alleles of a particular gene).

QuickSearch is useful when you want to quickly look something up, and perhaps you know a little bit about it. For example, if you want to quickly check if there are any stocks for "dpp", you can select "stocks" from the Data Class button, enter "dpp", click Search, and see how many stocks are available.

QuickSearch provides access to the FlyBase report pages. Searching can be performed in either D. melanogaster only, or in 'All species'. Data other than genes can be queried by selecting one of the options from the 'Data Class' drop-down menu.

A search using the default "ID/Symbol/Name" option is case-insensitive and restricted to FlyBase IDs, valid symbols and synonyms, such as CG numbers, and names. If the "All text" option is selected the search will be more comprehensive but slightly slower. A unique match for the search string produces the relevant report page, whereas more than one match will generate a list of results linked to the report pages. An error only occurs if nothing matches the input string.

QuickSearch searches with your query in the following order:

  1. primary FlyBase ID (FBgn) any hits? add hit(s) to list, keep searching
  2. symbol (case-sensitive) any hits? add hit(s) to list, keep searching
  3. symbol (case-insensitive) any hits? add hit(s) to list, keep searching
  4. synonym (case-sensitive) any hits? add hit(s) to list, keep searching
  5. synonym (case-insensitive) any hits? add hit(s) to list, keep searching
  6. name (case-sensitive) any hits? add hit(s) to list, keep searching
  7. name (case-insensitive) any hits? add hit(s) to list, keep searching
  8. secondary FlyBase ID any hits? add hit(s) to list, end search

Sum up all hits and return hits list. If nothing found, return error page

QuickSearch searches D. melanogaster data by default. If you would like to search the data for all species, you can select that option. To search data on a specific species, you will need to use QueryBuilder. However, like J2G, you can use the unique, 4-letter species abbreviation, followed by a backslash, and then the gene symbol (e.g. Dpse\dpp).
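Unlike a lookup that stops at the first match, the QuickSearch order listed above keeps going and collects hits from every step. The following sketch illustrates that accumulation on made-up records; the IDs, symbols and the function name quick_search are hypothetical, and dropping duplicate hits is an assumption of this example rather than documented behaviour.

```python
# Made-up gene records; IDs and symbols are placeholders, not real FlyBase entries.
GENES = [
    {"id": "FBgn9990001", "symbol": "abc", "synonyms": [], "name": "alpha", "secondary_ids": []},
    {"id": "FBgn9990002", "symbol": "Abc", "synonyms": [], "name": "beta", "secondary_ids": []},
    {"id": "FBgn9990003", "symbol": "xyz", "synonyms": ["ABC"], "name": "gamma", "secondary_ids": []},
]

def quick_search(query, genes=GENES):
    """Run every lookup step and return the combined hit list in the order found."""
    q = query.lower()
    steps = [
        lambda g: g["id"] == query,                          # 1. primary ID
        lambda g: g["symbol"] == query,                      # 2. symbol, case-sensitive
        lambda g: g["symbol"].lower() == q,                  # 3. symbol, case-insensitive
        lambda g: query in g["synonyms"],                    # 4. synonym, case-sensitive
        lambda g: q in [s.lower() for s in g["synonyms"]],   # 5. synonym, case-insensitive
        lambda g: g["name"] == query,                        # 6. name, case-sensitive
        lambda g: g["name"].lower() == q,                    # 7. name, case-insensitive
        lambda g: query in g["secondary_ids"],               # 8. secondary ID
    ]
    hits = []
    for matches in steps:
        for g in genes:
            # Keep searching after a hit; skip records already collected (assumption).
            if matches(g) and g["id"] not in hits:
                hits.append(g["id"])
    return hits  # an empty list corresponds to the error page

print(quick_search("abc"))
```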

For more information, including specific search examples, please go to the QuickSearch Help Page.

QueryBuilder

QueryBuilder (QB) provides the most powerful way to search FlyBase on a field-by-field level. QB presents a simple user interface that supports powerful searches by offering access to every DataSet|Field pair (for example, Genes|CV:GO:Molecular Function) in FlyBase along with the ability to include any combination of datasets in the same search. QB automatically creates sets of records that are cross-referenced to the records that match your query, providing links to all related records in FlyBase from a single page. Both simple and complex queries can be built in a few steps. A search can be focused on a particular piece of data within a report page, such as the 'mapped features and mutations' associated with a gene, and Boolean operators can be used to combine two or more searches. QB supports much more sophisticated searches than QuickSearch or the other search tools on FlyBase, taking full advantage of how the data is stored in FlyBase. A useful feature of QB is that a list of FlyBase identifiers or valid symbols can be imported from an external file to use as a query segment. In addition, a set of results can be exported to QB, as described in the 'Hit list refinement' section, and then modified to refine the search by adding additional query segments. Thus, QB is a very powerful tool that can be used in many different ways to explore the data in FlyBase.

The 'Query Builder Help' section on the 'QB homepage' outlines the basic search strategy. There are three options on the QB start page: select a pre-constructed query, import a previously saved query, and build a new query. Help for all of these options is available further down the page as well as a description of how to carry out an expression data search.

Vocabularies (previously known as TermLink)

Vocabularies provides easy access to data annotated with a particular controlled term or one of its synonyms. For example, you can use Vocabularies to retrieve a list of all the genes annotated with a particular GO term, or all the transcripts expressed in a particular body part. You do not need to know the precise term that FlyBase uses to store the data; the search box on the Vocabularies page retrieves controlled vocabulary terms that contain your query or terms that list a synonym containing the search term. For example, if you enter wing you will obtain a list that includes the controlled terms wing, anterior wing margin, and dorsal mesothoracic disc, which has the synonym wing disc. The controlled terms in the list are hyperlinks to TermReport pages that describe a single term in detail. Alternatively, you can also browse various controlled vocabulary hierarchies, by using the trees displayed on the main Vocabularies page.

Vocabularies is the only search tool in FlyBase that allows users to search directly for controlled vocabulary (CV) term reports from any of the controlled vocabularies (CVs) used by FlyBase. This includes the GO and anatomy hierarchies, among others. Wildcards are automatically added to the beginning and the end of a search term. For each search performed, Vocabularies returns a hit list of CV term reports that match the search term. These are listed according to CV type, in the following order: anatomy term reports, FlyBase controlled vocabulary term reports, development term reports, GO term reports and SO term reports. Each term report allows the user to retrieve gene, allele, transcript, polypeptide or image reports associated with the term.
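The matching behaviour described above (wildcards added to both ends of the query, synonyms searched as well as term names, and results grouped by CV type) can be sketched roughly as follows. The term list is a small invented sample, case-insensitive matching is an assumption, and search_terms is a hypothetical helper, not part of FlyBase.

```python
# Display order of CV types, as listed in the text above.
CV_ORDER = ["anatomy", "FlyBase controlled vocabulary", "development", "GO", "SO"]

# Small invented sample of controlled vocabulary terms.
TERMS = [
    {"name": "wing disc development", "cv": "GO", "synonyms": []},
    {"name": "wing", "cv": "anatomy", "synonyms": []},
    {"name": "anterior wing margin", "cv": "anatomy", "synonyms": []},
    {"name": "dorsal mesothoracic disc", "cv": "anatomy", "synonyms": ["wing disc"]},
    {"name": "eye", "cv": "anatomy", "synonyms": []},
]

def search_terms(query, terms=TERMS):
    """Return terms whose name or any synonym contains the query, grouped by CV type."""
    q = query.lower()  # assumption: matching ignores case
    hits = [
        t for t in terms
        if q in t["name"].lower() or any(q in s.lower() for s in t["synonyms"])
    ]
    # Python's sort is stable, so terms keep their original order within each CV type.
    return sorted(hits, key=lambda t: CV_ORDER.index(t["cv"]))

names = [t["name"] for t in search_terms("wing")]
print(names)
```

A wildcard at both ends is equivalent to a plain substring test, which is why the sketch uses `in` rather than a pattern library.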

Please see the tool help on the bottom of the Vocabularies entry page for more information.

Query Results Analysis Tools

HitList Refinement

When you perform any search that returns multiple hits, you are presented with a hit list that can be modified or refined. By default all records are selected for inclusion in subsequent manipulations, but the checkboxes allow user-defined subsets to be created. The first data column links directly to the report for each record that matched your search. Other columns link to GBrowse or to searches that return hits directly related to that record. In addition to these links, the hit list provides a set of powerful tools for query refinement or batch processing.

The 'Show related' drop down menu enables you to see all objects of a particular class that are related to the hits selected in your list. For example, selecting 'clones' from the 'Show related' menu of a gene search will return a list of clones that are related to the selected genes.

The 'Results Analysis/Refinement' button allows you to see the frequency of values within your selected hits for a predefined list of fields. Selecting 'Biological process', for example, from the Results Analysis/Refinement tool for a list of genes involved in the Notch signalling pathway will result in a page listing the distribution of the different biological process controlled vocabulary terms associated with the list. Clicking on the number in the 'Related records' column will return the genes from your hitlist that are annotated to be involved in that GO term.

Lastly, the 'HitList Conversion Tools' button allows you to send the selected hits to our Batch Download tool for use offline, to a new QueryBuilder session for further querying, or to link-out HTML tables of various third party data sources with data linked to the hits in your result list.

Batch Download

The Batch Download tool provides bulk access to a variety of data and data formats, such as FASTA sequence data and XML files, for a specified list of unique IDs (please note: secondary IDs, synonyms, or full names are not allowed because they are not unique).

IDs can be sent from a FlyBase hit list, uploaded from a local file, or entered manually.

The Field Data output format provides access to two types of data: data from our set of precomputed flat files and data from the HTML reports. Any line from a precomputed file that matches the lists of IDs supplied can be downloaded using the precomputed file option.

The HTML table option allows you to create a custom report with only the fields you want while preserving hyperlinks for direct navigation to other FlyBase data. Recently the HTML table option has been improved by listing all fields as they appear on the report pages, and making them easier to identify by categorising them as CV (controlled vocabulary), Symbol, Date, or Text.

Genomic Search Tools and Browsers

BLAST

BLAST (Basic Local Alignment Search Tool) provides a method for rapid searching of nucleotide and protein databases. FlyBase BLAST lets you query the 12 completed Drosophila genomes, along with related insect species for which full genomes have been sequenced. BLAST provides access to the FASTA sequences of all sequenced Drosophila genomes, as well as providing links to GenBank. In addition, you can BLAST an unknown sequence and identify its position on GBrowse.

The BLAST homepage is split into three sections: the first allows the user to input the query sequence and set up the standard BLAST parameters (e.g. Expectation value, database to be searched); the second allows the species to be selected; and the third allows the user to specify advanced BLAST options.

Clicking on the hyperlinks provides hints and tips for the BLAST search.

GBrowse

FlyBase GBrowse provides a graphical or tabular representation of the 12 sequenced Drosophila genomes. Genes, insertions, deficiencies, orthologous regions in other Drosophila genomes, and a wide array of other mapped features can be selected and viewed along a genome coordinate scale. You can navigate to a specific location by entering a precise sequence range, any valid FlyBase identifier for a gene, gene product, or insertion, or a cytological band in the 'Landmark or Region' box. Additionally, FlyBase BLAST output includes GBrowse links that display each BLAST alignment as a highlighted feature in the context of neighbouring gene models and other features of the region.

By default FlyBase presents a view of D. melanogaster that displays gene models, transcripts, natural transposon insertion sites, repeat regions, estimated cytological bands, cDNAs, transgenic insertion sites, Gnomon gene predictions, regions with orthologs in other Drosophila species and the modENCODE Developmental stage RNA-seq track. Tracks can be easily reordered by clicking on the track name and dragging to a new location on the viewer. Additional tracks can be selected from the 'Select Tracks' link at the top of the Browser. Descriptions of individual tracks can be accessed from the "?" icon next to each track on the 'Select Tracks' page or at the GBrowse tracks document at the FlyBase wiki. A set of icons next to the track names in the GBrowse viewer provides options for managing tracks, including the option to show, hide, turn off, get information about and configure each track. The configure tracks options for the RNA-seq expression tracks are particularly helpful, including the ability to choose a log2 or linear view and track spacing; see description at Custom configurations of RNA-Seq profiles in GBrowse. Together, these options allow a highly customized view of the data.

See the FlyBase GBrowse Help wiki page for FlyBase-specific tips (a link is also at the top right of the GBrowse window). A more generic GBrowse help manual, 'Help with this browser', provides additional details on other very useful features of the GBrowse viewer, and can be accessed from the Help menu in the upper right corner of the GBrowse page. Additional representations of the genome data including a tabular view of mapped features or decorated FASTA can be selected and configured from the drop down menu in the upper right corner of the viewer.

Finding orthologs using GBrowse

By adding 'Similarity' tracks to the D. melanogaster genome view you can use the resulting ortholog links to navigate to orthologs in the other species. You can also find an ortholog by selecting the species from the 'Data Source' menu and entering the D. melanogaster gene symbol or FBgn ID in the 'Landmark or Region' box. In addition, as described above, FlyBase BLAST output provides links to GBrowse. This is an extremely useful entry path into the sequence data of species other than D. melanogaster, which in some cases consists of a large number of relatively short unlinked scaffolds.

Aberration Maps

The Aberration Maps show molecularly localized genes and aberrations aligned with the sequence scaffolds for the Drosophila melanogaster arms and chromosomes. The links take you to the left end of an arm or chromosome, and you can browse the molecularly localized genes and aberrations by scrolling to the right. Hovering over a gene will produce a pop-up containing a short automatically generated summary of the information known about the gene. Placing your mouse over an aberration will produce a pop-up listing all the genes predicted to be deleted or truncated by the aberration (in alphabetical order). If you click on a gene or an aberration you will move to a detailed report page with more information on the gene or aberration. Clicking on a cytological band will produce a list of all the insertions, genes, and aberrations predicted to affect the band, including ones that are not molecularly localized, ordered by position along the chromosome. A copy of the images of the arms and chromosomes can be downloaded from the aberrations section of the precomputed files page.

Chromosome Maps

The chromosome maps show sequence scaffolds aligned to polytene chromosome maps for the Muller elements of the sequenced Drosophila species. For more information on the syntenic relationships among the 12 sequenced genomes, their standard chromosomal numbering and corresponding Muller element please see the Muller Element Arm Synteny Table. The aligned sequence scaffolds, shown in blue on the maps, provide access to the sequence data and gene models. When you move your cursor over one of the blue scaffolds a yellow box appears that corresponds to a GBrowse window, and clicking on the box will take you to the corresponding location in GBrowse.

CytoSearch

CytoSearch lists are regional maps of the Drosophila melanogaster genome incorporating both sequence-based and cytology-based map data. Sequence-based data trumps cytology when both are available, cytology trumps meiotic data when both are available, and estimated cytology is used when only meiotic data are available. The FlyBase correspondence tables for cytological and sequence level maps are used to estimate cytology from sequence range and sequence range from cytology, for both the underlying data and the query input.
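The precedence rule described above (sequence over cytology, cytology over meiotic data, and estimated cytology as the fallback) can be written down as a short decision function. This is a sketch under stated assumptions: the record layout, the placeholder correspondence table and the function name best_location are all invented, and real correspondence tables are far more detailed.

```python
# Placeholder correspondence table: meiotic position -> estimated cytological band.
# Keys and values are invented for illustration.
MEIOTIC_TO_CYTOLOGY = {"meiotic-pos-1": "band-X"}

def best_location(record):
    """Choose the map position for a record using the stated order of precedence."""
    if record.get("sequence"):
        # Sequence-based data trumps cytology.
        return ("sequence", record["sequence"])
    if record.get("cytology"):
        # Cytology trumps meiotic data.
        return ("cytology", record["cytology"])
    if record.get("meiotic"):
        # Only meiotic data available: fall back to an estimated cytology.
        estimate = MEIOTIC_TO_CYTOLOGY.get(record["meiotic"])
        if estimate is not None:
            return ("estimated cytology", estimate)
    return None  # no usable map data

print(best_location({"sequence": "2L:100..200", "cytology": "band-A"}))
print(best_location({"meiotic": "meiotic-pos-1"}))
```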

CytoSearch is useful for searching for genetic objects mapped to a particular genomic region (but not necessarily mapped to the sequence).

Coordinate Converter

The Coordinate Converter allows you to convert genomic coordinates between different genome releases. Just select the input and output assembly, enter your list of coordinates (or load them from a file), and away you go! It's that simple.

RNA-Seq Expression Profile Search

RNA-Seq Profile Search is a fine-grained query tool, powered by modENCODE high-throughput RNA-Seq expression data, that allows you to find genes with specific patterns of expression across several variables. Interested in development of the central nervous system? Search for genes that are expressed in these tissues during a specific developmental stage. Curious how toxins affect the fly reproductive system? Search for genes expressed in fly gonads that are activated (or suppressed) by exposure to Paraquat or Rotenone.

Choose datasets for expression by stage, tissue, treatments, or cell lines, or use several datasets in conjunction. Each dataset is presented in a form that allows you to select either narrow slices of the data, or larger sections for more coverage. You also have control over the levels of expression used in the search, allowing you to define distinct thresholds for the ON and OFF states. Keep in mind that extremely narrow search conditions may produce sparse or empty result sets. Feel free to experiment; the tool will remember your settings so that you can adjust, instead of needing to re-enter them. Search results can be exported, as usual, for further analysis or download.

NB: The group check box selectors are interpreted differently depending on whether you are making selections from the 'Expression ON' or 'Expression OFF' sections. 'Expression ON' selectors: Selecting multiple stages using one of the grouping check boxes acts as an 'OR'. This means that if a gene is expressed at or above the chosen expression level in any one or more of the selected stages it will be returned in the result list. To get 'AND' behavior (i.e., return only those genes which are expressed at the chosen level in each one of the selected stages) you must select each of the stages individually. 'Expression OFF' selectors: Selecting multiple stages using one of the grouping check boxes acts as an 'AND'. This means that for a gene to be returned in the result list, the observed level of expression must be at or below the selected level in all of the selected group stages. Therefore, for the 'expression OFF' selectors, checking a group check box is functionally identical to selecting each individual sub-category.
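The OR/AND behaviour of the selectors described above is easy to misread, so here is a small sketch of the three cases. The expression values, gene names, stage names and function names are invented for illustration; only the any/all logic follows the description.

```python
# Invented expression bins per gene and stage (higher number = higher expression).
EXPRESSION = {
    "gene-1": {"embryo": 5, "larva": 1, "pupa": 1},
    "gene-2": {"embryo": 5, "larva": 5, "pupa": 1},
    "gene-3": {"embryo": 1, "larva": 1, "pupa": 1},
}

def on_group(gene, stages, level):
    """'Expression ON' group check box: at or above level in ANY selected stage (OR)."""
    return any(EXPRESSION[gene][s] >= level for s in stages)

def on_individual(gene, stages, level):
    """'Expression ON' stages ticked one by one: at or above level in EVERY stage (AND)."""
    return all(EXPRESSION[gene][s] >= level for s in stages)

def off(gene, stages, level):
    """'Expression OFF', group or individual: at or below level in ALL selected stages (AND)."""
    return all(EXPRESSION[gene][s] <= level for s in stages)

def select(test, stages, level):
    """Genes passing the given selector test."""
    return sorted(g for g in EXPRESSION if test(g, stages, level))

print(select(on_group, ["embryo", "larva"], 4))       # OR across stages
print(select(on_individual, ["embryo", "larva"], 4))  # AND across stages
print(select(off, ["larva", "pupa"], 1))              # AND across stages
```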

The modENCODE high-throughput RNA-seq data were originally published in Graveley et al., 2011 and Brown et al., 2014, comprising 30 developmental stage expression profiles, 29 tissue expression profiles, 25 treatment/condition expression profiles and 24 cell line expression profiles. RNA-Seq reads were mapped to the Release 6 genome assembly as described in Brown et al., 2014. For each RNA-Seq sample, gene expression level was calculated as RPKM, and assigned to one of eight bins (from low to extremely high), as described in Gelbart and Emmert, 2013.

Other Tools

Google™ FlyBase

We have a Google search box (found in both the Tools and Help menus) that can be used to search the entire FlyBase site in a Google-style manner. Google FlyBase is best used to search documentation, but not necessarily to search data about a gene, as it does not restrict its search to specific data fields, and results depend upon Google indexing which cannot be controlled by FlyBase (i.e. you may not find results specific to the newest release).

Interactions Browser

The Interactions Browser, found under the 'Tools' menu, provides a graphical way of exploring the genetic interactions reported in the allele reports. The browser works in two modes: You can either search for the interactions of an allele, or the interactions of a gene. The latter will show the interactions of all alleles of the gene. Each node of an interaction diagram is a hyperlink, which enables you to navigate and browse the complex web of known genetic interactions. Placing your cursor over the center of a node activates a pop-up window that in the case of a network of gene interactions contains a summary of the function of that particular gene, while in the case of interactions between alleles shows the context in which the interactions of that allele have been reported. For more information, go to the Interactions Browser help documentation.

ImageBrowse

ImageBrowse allows the user to browse through image reports by organ system, life-cycle, tagma, or germ layer, as well as browsing images of different Drosophilids. Miscellaneous images and QuickTime films are also accessible from this section. Controlled vocabulary terms are used to annotate and label the images. To search images, and to link relevant gene, allele, transcript and protein records to stages of development, a region of the body or to a specific body part, go to Vocabularies.

Find a Person

FlyBase compiles Drosophila Researcher information to aid networking and communication in the community. To add yourself to the database, use the Add a new Person tool, found in the Tools menu.

Find a Person allows you to select which field of the personal data you want to search. For example, to search for all the registered Drosophila Researchers in a particular city, you can select the city field and search for the city of interest.

Simple combinatorial searches are also possible, for example you can search for 'Smith' in 'Texas', if you so desire.

The search can also be restricted to Principal Investigators (PIs) by ticking the 'Search for PIs only' box.

Update an Address

To update an existing address, the name of the person concerned should be typed into the text box. If the name is ambiguous, e.g. Smith, then a list of full names containing the name is provided.

From here, clicking on the name to change allows the details to be altered. A confirmation e-mail will be sent to the given e-mail address to confirm the changes.