FlyBase:Tools Overview

From FlyBase Wiki
Revision as of 15:53, 10 September 2014 by Laura Ponting (talk | contribs)

General Search Help and Tips

FlyBase can be searched for genes, alleles, aberrations and other genetic objects as well as phenotypes, sequences, stocks, images and movies, controlled terms, and Drosophila researchers. Search tools are available in the 'Tools' drop-down menu in the Navigation bar, which can be accessed from any FlyBase page. In addition, the homepage has direct links to the most commonly used tools. The FlyBase Site Map gives a comprehensive listing of the searches and resources available on FlyBase.

Below are summaries of each of the tools, which have been split into five main sections:

  • Overview of Search Strategies (for example, how to search for expression data)
  • Main Query Tools (Jump to Gene, QueryBuilder, etc.)
  • Query Results Analysis Tools (Hit list refinement, Batch Download)
  • Genomic Search Tools and Browsers (GBrowse, BLAST etc.)
  • Other Tools (Interactions Browser, Find a Person etc.)

Overview of Search Strategies

Individual gene reports for genes from all 12 sequenced Drosophila genomes are now available in FlyBase. There are four main ways in which this data can be browsed and queried in FlyBase:

Precomputed files

BLAST

GBrowse

Gene Report Pages


For those interested in genome-wide analyses, bioinformatics and comparative genomics, there is a selection of precomputed files available for download from our precomputed files page (in the Genomes:Annotation and Sequence section, for example), found in the 'Files' menu.

For those with an interest in a specific gene/protein/region across the different species, there are a number of ways to query the data. Our BLAST server allows querying of all 12 genomes, either individually, as a subset, or all together. Each BLAST hit can then be localised and shown on the genome through GBrowse. Orthoview in GBrowse allows movement through the different genomes, illustrating the same region (where possible) in different genomes. In the near future, multiple alignments will be available, enabling direct sequence comparison between the different genomes.

Main Query Tools

Jump to Gene

Jump to Gene (J2G) is found in the top-right of the blue navigation bar on every page in FlyBase. It is a NAVIGATION tool, not a search tool, and thus should be used when you know the symbol or ID for your gene, and you simply want to go to the report page.

You can type a gene symbol or synonym, valid gene name, or FBid into the J2G box. (Hint: any FBid will work, including those for non-gene entities, for example FBgn0086782 (a gene), FBal0090485 (an allele) or FBab0002363 (an aberration).)

J2G searches with your query in the following order:

  1. primary FlyBase ID (FBgn) any hits? return hit(s), end search
  2. symbol (case-sensitive) any hits? return hit(s), end search
  3. symbol (case-insensitive) any hits? return hit(s), end search
  4. synonym (case-sensitive) any hits? return hit(s), end search
  5. synonym (case-insensitive) any hits? return hit(s), end search
  6. name (case-sensitive) any hits? return hit(s), end search
  7. secondary FlyBase ID any hits? return hit(s), end search

If nothing found, return error page
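The order listed above can be sketched in Python as a series of match rules tried one after another, stopping at the first rule that produces any hit. This is only an illustration of the listed order, not FlyBase's implementation: the record layout, the function name `jump_to_gene`, and all IDs and symbols are invented for the example.

```python
# Toy records: IDs, symbols and field names are invented for illustration.
RECORDS = [
    {"id": "FBgn9000001", "symbol": "Abc", "synonyms": ["xyz"],
     "name": "alpha", "secondary_ids": ["FBgn9000009"]},
    {"id": "FBgn9000002", "symbol": "abc", "synonyms": [],
     "name": "beta", "secondary_ids": []},
    {"id": "FBgn9000003", "symbol": "def", "synonyms": ["ABC"],
     "name": "gamma", "secondary_ids": []},
]


def jump_to_gene(query, records=RECORDS):
    """Return the IDs matched by the first rule that hits, or [] if none."""
    q = query.lower()
    rules = [
        lambda r: r["id"] == query,                             # 1. primary ID
        lambda r: r["symbol"] == query,                         # 2. symbol, case-sensitive
        lambda r: r["symbol"].lower() == q,                     # 3. symbol, case-insensitive
        lambda r: query in r["synonyms"],                       # 4. synonym, case-sensitive
        lambda r: q in [s.lower() for s in r["synonyms"]],      # 5. synonym, case-insensitive
        lambda r: r["name"] == query,                           # 6. name, case-sensitive
        lambda r: query in r["secondary_ids"],                  # 7. secondary ID
    ]
    for rule in rules:
        hits = [r["id"] for r in records if rule(r)]
        if hits:
            return hits  # end the search at the first rule with hits
    return []            # the caller would show the error page
```

With these toy records, `jump_to_gene("ABC")` stops at the case-insensitive symbol rule, so it never reaches the record whose synonym is "ABC".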

Note: J2G searches D. melanogaster genes by default. If you would like to search for a non-melanogaster gene, you need to use the unique, 4-letter species abbreviation, followed by a backslash, and then the gene symbol (e.g. Dpse\dpp), or use the FBgn.

Note: J2G does NOT search name synonyms!

Note: Wildcards (*) are allowed in J2G entries, but non-unique results may be returned.
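As a rough sketch of the species-prefix and wildcard conventions in the notes above, the snippet below splits a query such as `Dpse\dpp` into species and symbol, and expands `*` into a pattern. The function names, the default abbreviation `Dmel`, and the choice of case-sensitive wildcard matching are assumptions made for illustration; the symbol list in the usage note is a toy list.

```python
import re


def split_species_prefix(query, default_species="Dmel"):
    """Split 'Dpse\\dpp' into ('Dpse', 'dpp'); default to D. melanogaster.

    The default abbreviation 'Dmel' is an assumption of this sketch.
    """
    prefix, sep, symbol = query.partition("\\")
    if sep and len(prefix) == 4 and prefix.isalpha():
        return prefix, symbol
    return default_species, query


def wildcard_hits(pattern, symbols):
    """Return the symbols matching a pattern in which '*' matches any text.

    Everything except '*' is matched literally (and case-sensitively here).
    """
    regex = re.compile(".*".join(re.escape(part) for part in pattern.split("*")))
    return [s for s in symbols if regex.fullmatch(s)]
```

For example, `wildcard_hits("dp*", ["dpp", "dpx", "Dp", "sdp"])` keeps only the entries that begin with "dp", which shows how a wildcard query can return more than one result.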

QuickSearch

QuickSearch is the search tool included on the FlyBase homepage. This is the quickest way to search FlyBase for an object, where you may not know the correct FlyBase symbol or you may want to generate a list of objects with a shared feature (e.g. all the alleles of a particular gene).

QuickSearch is useful when you want to quickly look something up, and perhaps you know a little bit about it. For example, if you want to quickly check if there are any stocks for "dpp", you can select "stocks" from the Data Class button, enter "dpp", click Search, and see how many stocks are available.

QuickSearch provides access to the FlyBase report pages. Searching can be performed in either D. melanogaster only, or in 'All species'. Data other than genes can be queried by selecting one of the options from the 'Data Class' drop-down menu.

A search using the default "ID/Symbol/Name" option is case-insensitive and restricted to FlyBase IDs, valid symbols and synonyms, such as CG numbers, and names. If the "All text" option is selected the search will be more comprehensive but slightly slower. A unique match for the search string produces the relevant report page, whereas more than one match will generate a list of results linked to the report pages. An error only occurs if nothing matches the input string.

QuickSearch searches with your query in the following order:

  1. primary FlyBase ID (FBgn) any hits? add hit(s) to list, keep searching
  2. symbol (case-sensitive) any hits? add hit(s) to list, keep searching
  3. symbol (case-insensitive) any hits? add hit(s) to list, keep searching
  4. synonym (case-sensitive) any hits? add hit(s) to list, keep searching
  5. synonym (case-insensitive) any hits? add hit(s) to list, keep searching
  6. name (case-sensitive) any hits? add hit(s) to list, keep searching
  7. name (case-insensitive) any hits? add hit(s) to list, keep searching
  8. secondary FlyBase ID any hits? add hit(s) to list, end search

Sum up all hits and return hits list. If nothing found, return error page.

QuickSearch searches D. melanogaster data by default. If you would like to search the data for all species, you can select that option. To search data on a specific species, you will need to use QueryBuilder. However, like J2G, you can use the unique, 4-letter species abbreviation, followed by a backslash, and then the gene symbol (e.g. Dpse\dpp).
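The QuickSearch order listed above differs from J2G in that every rule is tried and the hits are pooled. A minimal Python sketch of that behaviour follows; it is not FlyBase's implementation. The records, field names and the function name `quick_search` are invented, and removing duplicate hits is an assumption about what "sum up all hits" means.

```python
# Toy records: IDs, symbols and field names are invented for illustration.
RECORDS = [
    {"id": "FBgn9000001", "symbol": "Abc", "synonyms": ["xyz"],
     "name": "alpha", "secondary_ids": ["FBgn9000009"]},
    {"id": "FBgn9000002", "symbol": "abc", "synonyms": [],
     "name": "beta", "secondary_ids": []},
    {"id": "FBgn9000003", "symbol": "def", "synonyms": ["ABC"],
     "name": "gamma", "secondary_ids": []},
]


def quick_search(query, records=RECORDS):
    """Try every rule in order and return the pooled, de-duplicated hit list."""
    q = query.lower()
    rules = [
        lambda r: r["id"] == query,                             # 1. primary ID
        lambda r: r["symbol"] == query,                         # 2. symbol, case-sensitive
        lambda r: r["symbol"].lower() == q,                     # 3. symbol, case-insensitive
        lambda r: query in r["synonyms"],                       # 4. synonym, case-sensitive
        lambda r: q in [s.lower() for s in r["synonyms"]],      # 5. synonym, case-insensitive
        lambda r: r["name"] == query,                           # 6. name, case-sensitive
        lambda r: r["name"].lower() == q,                       # 7. name, case-insensitive
        lambda r: query in r["secondary_ids"],                  # 8. secondary ID
    ]
    hits = []
    for rule in rules:               # keep searching after a rule hits
        for r in records:
            if rule(r) and r["id"] not in hits:
                hits.append(r["id"])
    return hits                      # an empty list would mean the error page
```

With these toy records, `quick_search("ABC")` returns all three records, because the synonym rules are still consulted after the symbol rules have already produced hits.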

For more information, including specific search examples, please go to the QuickSearch Help Page.

QueryBuilder

QueryBuilder (QB) provides the most powerful way to search FlyBase on a field-by-field level. QB presents a simple user interface that supports powerful searches by offering access to every DataSet|Field pair (for example, Genes|CV:GO:Molecular Function) in FlyBase along with the ability to include any combination of datasets in the same search. QB automatically creates sets of records that are cross-referenced to the records that match your query, providing links to all related records in FlyBase from a single page.

Both simple and complex queries can be built in a few steps. A search can be focused to a particular piece of data within a report page, such as the 'mapped features and mutations' associated with a gene, and Boolean operators can be used to combine two or more searches. QB allows a user to perform much more sophisticated searches than QuickSearch or the other search tools on FlyBase, taking full advantage of how the data is stored in FlyBase.

A useful feature of QB is that a list of FlyBase identifiers or valid symbols can be imported from an external file to use as a query segment. In addition, a set of results can be exported to QB, as described in the 'Hit list refinement' section, and then modified to refine the search by adding additional query segments. Thus, QB is a very powerful tool that can be used in many different ways to explore the data in FlyBase.

The 'Getting Started' section on the 'QB homepage' outlines the basic search strategy.
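Conceptually, combining query segments with Boolean operators amounts to set operations on the lists of matching records. The short Python sketch below illustrates that idea only; the operator labels, the function name `combine` and the IDs are hypothetical and do not describe QB's actual interface.

```python
def combine(left, operator, right):
    """Combine two result sets of record IDs with a Boolean operator.

    The operator labels used here are assumptions, not QB's own wording.
    """
    left, right = set(left), set(right)
    if operator == "AND":
        return left & right   # records matched by both segments
    if operator == "OR":
        return left | right   # records matched by either segment
    if operator == "NOT":
        return left - right   # records in the first segment but not the second
    raise ValueError(f"unknown operator: {operator}")


# Hypothetical result sets from two query segments.
segment_a = {"ID1", "ID2", "ID3"}
segment_b = {"ID3", "ID4"}
both = combine(segment_a, "AND", segment_b)
```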

Vocabularies

Vocabularies provides easy access to data annotated with a particular controlled term or one of its synonyms. For example, you can use Vocabularies to retrieve a list of all the genes annotated with a particular GO term, or all the transcripts expressed in a particular body part. You do not need to know the precise term that FlyBase uses to store the data; the search box on the Vocabularies page retrieves controlled vocabulary terms that contain your query or terms that list a synonym containing the search term. For example, if you enter wing you will obtain a list that includes the controlled terms wing, anterior wing margin, and dorsal mesothoracic disc, which has the synonym wing disc. The controlled terms in the list are hyperlinks to TermReport pages that describe a single term in detail. Alternatively, you can also browse various controlled vocabulary hierarchies, by using the trees displayed on the main Vocabularies page.

Vocabularies is the only search tool in FlyBase that allows users to search directly for controlled vocabulary (CV) term reports from any of the controlled vocabularies (CVs) used by FlyBase. This includes the GO and anatomy hierarchies, among others. Wildcards are automatically added to the beginning and the end of a search term. For each search performed, Vocabularies returns a hit list of CV term reports that match the search term. These are listed according to CV type, in the following order: anatomy term reports, FlyBase controlled vocabulary term reports, development term reports, GO term reports and SO term reports. Each term report allows the user to retrieve gene, allele, transcript, polypeptide or image reports associated with the term.
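The matching behaviour described above (wildcards added to both ends of the query, with synonyms searched as well as term names) is equivalent to a substring test. A minimal Python sketch follows, using the 'wing' example from the text; the data structure, the function name `search_terms`, the extra 'leg' entry and the case-insensitive comparison are assumptions made for illustration.

```python
# Terms from the 'wing' example in the text, plus one extra entry ('leg')
# added only so the sketch has a non-matching term.
TERMS = [
    {"term": "wing", "synonyms": []},
    {"term": "anterior wing margin", "synonyms": []},
    {"term": "dorsal mesothoracic disc", "synonyms": ["wing disc"]},
    {"term": "leg", "synonyms": []},
]


def search_terms(query, terms=TERMS):
    """Return terms whose name or any synonym contains the query."""
    q = query.lower()  # case-insensitive matching is an assumption
    return [
        t["term"]
        for t in terms
        if q in t["term"].lower() or any(q in s.lower() for s in t["synonyms"])
    ]
```

Here `search_terms("wing")` returns the three terms named in the example, including 'dorsal mesothoracic disc', which matches only through its synonym.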

Please see the tool help on the bottom of the Vocabularies entry page for more information.

RNA-Seq Search

Interactions Browser

The Interactions Browser, found under the 'Tools' menu, provides a graphical way of exploring the genetic interactions reported in the allele reports. The browser works in two modes: you can search either for the interactions of an allele or for the interactions of a gene. The latter will show the interactions of all alleles of the gene. Each node of an interaction diagram is a hyperlink, which enables you to navigate and browse the complex web of known genetic interactions. Placing your cursor over the center of a node activates a pop-up window. In a network of gene interactions, the pop-up contains a summary of the function of that particular gene; in a network of allele interactions, it shows the context in which the interactions of that allele have been reported. For more information, go to the Interactions Browser help documentation.

Query Results Analysis Tools

HitList Refinement

When you perform any search that returns multiple hits, you are presented with a hit list that can be modified or refined. By default all records are selected for inclusion in subsequent manipulations, but the checkboxes allow user-defined subsets to be created. The first data column links directly to the report for each record that matched your search. Other columns link to GBrowse or to searches that return hits directly related to that record. In addition to these links, the hit list provides a set of powerful tools for query refinement or batch processing.

The 'Show related' drop down menu enables you to see all objects of a particular class that are related to the hits selected in your list. For example, selecting 'clones' from the 'Show related' menu of a gene search will return a list of clones that are related to the selected genes.

The 'Results Analysis/Refinement' button allows you to see the frequency of values within your selected hits for a predefined list of fields. Selecting 'Biological process', for example, from the Results Analysis/Refinement tool for a list of genes involved in the Notch signalling pathway will result in a page listing the distribution of the different biological process controlled vocabulary terms associated with the list. Clicking on the number in the 'Related records' column will return the genes from your hitlist that are annotated to be involved in that GO term.

Lastly, the 'HitList Conversion Tools' button allows you to send the selected hits to our Batch Download tool for use offline, to a new QueryBuilder session for further querying, or to link-out HTML tables of various third party data sources with data linked to the hits in your result list.

Batch Download

The Batch Download tool provides bulk access to a variety of data and data formats, such as FASTA sequence data and XML files, for a specified list of unique IDs (please note: secondary IDs, synonyms, or full names are not allowed because they are not unique).

IDs can be sent from a FlyBase hit list, uploaded from a local file, or entered manually.

The Field Data output format provides access to two types of data: data from our set of precomputed flat files and data from the HTML reports. Any line from a precomputed file that matches the lists of IDs supplied can be downloaded using the precomputed file option.
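Users who have already downloaded a precomputed file can apply the same idea locally: keep only the lines that mention one of the IDs of interest. The sketch below assumes a tab-separated file with `#` comment lines; that layout, the function name `matching_lines` and the sample IDs are illustrative assumptions, not a description of any particular FlyBase file.

```python
import io


def matching_lines(lines, wanted_ids):
    """Yield data lines in which any tab-separated field is a wanted ID."""
    wanted = set(wanted_ids)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        if wanted.intersection(line.split("\t")):
            yield line


# Invented sample content standing in for a downloaded file.
sample = io.StringIO(
    "# id\tsymbol\n"
    "FBgn9000001\tAbc\n"
    "FBgn9000002\tabc\n"
    "FBgn9000003\tdef\n"
)
selected = list(matching_lines(sample, ["FBgn9000001", "FBgn9000003"]))
```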

The HTML table option allows you to create a custom report with only the fields you want while preserving hyperlinks for direct navigation to other FlyBase data. Recently the HTML table option has been improved by listing all fields as they appear on the report pages, and making them easier to identify by categorising them as CV (controlled vocabulary), Symbol, Date, or Text.

Genomic Search Tools and Browsers

BLAST

BLAST (Basic Local Alignment Search Tool) provides a method for rapid searching of nucleotide and protein databases. FlyBase BLAST lets you query the 12 completed Drosophila genomes, along with related insect species for which full genomes have been sequenced. BLAST provides access to the FASTA sequences of all the sequenced Drosophila species, as well as providing links to GenBank. In addition, you can BLAST an unknown sequence and identify its position on GBrowse.

The BLAST homepage is split into three sections: the first allows the user to input the query sequence and set up the standard BLAST parameters (e.g. Expectation value, database to be searched); the second allows the species to be selected; and the third allows the user to specify advanced BLAST options.

Clicking on the hyperlinks provides hints and tips for the BLAST search.

GBrowse

FlyBase GBrowse provides a graphical or tabular representation of the 12 sequenced Drosophila genomes. Genes, insertions, and deficiencies, along with other mapped objects, are illustrated along with orthologous regions in other Drosophila genomes, and Affymetrix probes. There is a separate help manual for GBrowse that can be accessed from the GBrowse pages, along with information about the different evidence tiers available.

By default FlyBase presents a view of D. melanogaster that displays gene models, transcript and polypeptide data, natural transposon insertion sites, and cDNAs. These and many additional tracks are easily configured to create a customised view of the data. You can navigate to a specific location by entering a precise sequence range, or any valid FlyBase identifier for a gene, gene product, or insertion in the 'Landmark or Region' box. 'Advanced Search' enables you to move to a particular cytological location. Additionally, FlyBase BLAST output includes GBrowse links that display each BLAST alignment as a highlighted feature in the context of neighbouring gene models and other features of the region. This is an extremely useful entry path into the sequence data of species other than D. melanogaster, which in some cases consists of a large number of relatively short unlinked scaffolds.

Aberration Maps

The Aberration Maps show molecularly localized genes and aberrations aligned with the sequence scaffolds for the Drosophila melanogaster arms and chromosomes. The links take you to the left end of an arm or chromosome, and you can browse the molecularly localized genes and aberrations by scrolling to the right. Hovering over a gene will produce a pop-up containing a short automatically generated summary of the information known about the gene. Placing your mouse over an aberration will produce a pop-up listing all the genes predicted to be deleted or truncated by the aberration (in alphabetical order). If you click on a gene or an aberration you will move to a detailed report page with more information on the gene or aberration. Clicking on a cytological band will produce a list of all the insertions, genes, and aberrations predicted to affect the band, including ones that are not molecularly localized, ordered by position along the chromosome. A copy of the images of the arms and chromosomes can be downloaded from the aberrations section of the precomputed files page.

Chromosome Maps

The chromosome maps show sequence scaffolds aligned to polytene chromosome maps for the Muller elements of the sequenced Drosophila species. For more information on the syntenic relationships among the 12 sequenced genomes, their standard chromosomal numbering and corresponding Muller elements, please see the Muller Element Arm Synteny Table. The aligned sequence scaffolds, shown in blue on the maps, provide access to the sequence data and gene models. When you move your cursor over one of the blue scaffolds, a yellow box appears that corresponds to a GBrowse window, and clicking on the box will take you to the corresponding location in GBrowse.

CytoSearch

CytoSearch lists are regional maps of the Drosophila melanogaster genome incorporating both sequence-based and cytology-based map data. Sequence-based data trumps cytology when both are available, cytology trumps meiotic data when both are available, and estimated cytology is used when only meiotic data are available. The FlyBase correspondence tables for cytological and sequence level maps are used to estimate cytology from sequence range and sequence range from cytology, for both the underlying data and the query input.

CytoSearch is useful for searching for genetic objects mapped to a particular genomic region (but not necessarily mapped to the sequence).
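The precedence described above (sequence over cytology, cytology over meiotic data, with cytology estimated when only meiotic data exist) can be sketched as a simple fallback chain. The record fields, the function name `best_location` and the `estimate_cytology` callback are hypothetical; the real estimation relies on the FlyBase correspondence tables, which this sketch does not reproduce.

```python
def best_location(record, estimate_cytology):
    """Pick the map data to use for a record, following the stated precedence.

    Returns a (source, value) pair, or None if the record has no map data.
    """
    if record.get("sequence"):
        return "sequence", record["sequence"]      # sequence trumps cytology
    if record.get("cytology"):
        return "cytology", record["cytology"]      # cytology trumps meiotic data
    if record.get("meiotic"):
        # Only meiotic data available: fall back to an estimated cytology.
        return "estimated cytology", estimate_cytology(record["meiotic"])
    return None


# Placeholder estimator standing in for the correspondence tables.
def fake_estimate(meiotic_position):
    return f"estimated from {meiotic_position}"


example = best_location({"cytology": "band-1", "meiotic": "pos-1"}, fake_estimate)
```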

Other Tools

Coordinate Converter

The Coordinate Converter allows you to convert genomic coordinates between different genome releases. Select the input and output assembly, then enter your list of coordinates (or load them from a file).

ID Converter

ImageBrowse

ImageBrowse allows the user to browse through image reports by organ system, life-cycle, tagma, or germ layer, as well as browsing images of different Drosophilids. Miscellaneous images and QuickTime films are also accessible from this section. Controlled vocabulary terms are used to annotate and label the images. To search images, and to link relevant gene, allele, transcript and protein records to stages of development, a region of the body or to a specific body part, go to Vocabularies.

Google™ FlyBase

We have a Google search box (found in both the Tools and Help menus) that can be used to search the entire FlyBase site in a Google-style manner. Google FlyBase is best used to search documentation, but not necessarily to search data about a gene, as it does not restrict its search to specific data fields, and results depend upon Google indexing, which cannot be controlled by FlyBase (i.e. you may not find results specific to the newest release).

Find a Person/Add yourself to the database

FlyBase compiles Drosophila Researcher information to aid networking and communication in the community. To add yourself to the database, use the Add a new Person tool, found in the Tools menu.

Find a Person allows you to select which field of the personal data you want to search. For example, to search for all the registered Drosophila Researchers in a particular city, you can select the city field and search for the city of interest.

Simple combinatorial searches are also possible, for example you can search for 'Smith' in 'Texas', if you so desire.

The search can also be restricted to Principal Investigators (PIs) by ticking the 'Search for PIs only' box.

To update an existing address, the name of the person concerned should be typed into the text box. If the name is ambiguous, e.g. Smith, then a list of full names containing the name is provided.

From here, clicking on the name you wish to change allows the details to be altered. A confirmation e-mail will be sent to the given e-mail address to confirm the changes.