FlyBase:Tools Overview

From FlyBase Wiki

Revision as of 19:57, 12 December 2017

General Search Help and Tips

Last Updated: 29 January 2010

FlyBase can be searched for genes, alleles, aberrations and other genetic objects, phenotypes, sequences, stocks, images and movies, controlled terms, and Drosophila researchers using the tools available from the 'Tools' drop-down menu in the Navigation bar. In addition to the Navigation bar, which can be accessed from any FlyBase page, the homepage also has direct links to the most commonly used tools.

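Below are summaries of each of the tools, grouped into five main sections: Overview of Search Strategies, Main Query Tools, Query Results Analysis Tools, Genomic Search Tools and Browsers, and RNA-Seq Tools and Browsers.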

Overview of Search Strategies

Searching 12 species

Individual gene reports for genes from the 12 originally sequenced Drosophila genomes are now available in FlyBase. There are four main ways in which this data can be browsed and queried in FlyBase:

  • Gene Report Pages

For those interested in genome-wide analyses, bioinformatics and comparative genomics, a selection of precomputed files (for example, in the Genomes:Annotation and Sequence section) is available for download from our precomputed files page, found in the 'Files' menu.

For those with an interest in a specific gene/protein/region across the different species, there are a number of ways to query the data. Our BLAST server allows querying of numerous sequenced insect genomes, either individually, as a subset, or all together. Each BLAST hit can then be localised and shown on the genome through GBrowse. Orthoview in GBrowse allows movement through the different genomes, illustrating the same region (where possible) in different genomes. In the near future, multiple alignments will be available, enabling direct sequence comparison between the different genomes.

Aberrations - deficiencies, duplications, inversions, translocations

One of the problems in a field of the size and complexity of Drosophila genetics is the use of nomenclature. This can lead to a number of names being given to the same object, and to the valid FlyBase name or symbol of an object being quite confusing or indeed not in common lab parlance. Aberration naming is no exception. The simplest ways to search for an aberration are either using CytoSearch, when you want to find an aberration that removes a particular gene or uncovers a cytological band, or using QuickSearch (selecting aberrations as the data class). Remember to use wildcards (i.e. *) to allow for slight differences in naming. FlyBase records all mentions of an aberration, so if an aberration is given a particular symbol in a paper, this name will be recorded as a synonym of the FlyBase 'valid' symbol (see the nomenclature document for more details). Alternatively, you can browse the molecularly localized aberrations for each chromosome by scanning GBrowse after selecting all "Aberrations" tracks.

Cytologically Mapped Features

When looking for cytology, you have a choice of a number of tools on FlyBase, including QueryBuilder. The easiest tools to use, however, are CytoSearch and GBrowse. GBrowse is especially useful when looking for molecularly mapped sequences, insertions, or Affymetrix probes. CytoSearch comes into its own when searching for cytologically defined features, such as cytologically mapped genes or deficiencies, that haven't been molecularly mapped to the sequence. Of course, as with many aspects of research, complementary methods should be used. Therefore, we recommend you use both GBrowse and CytoSearch to analyse cytology.

Expression Data

Browsing Expression Data

Expression patterns are captured by FlyBase curators for transcripts, proteins, and "reporters" (i.e. enhancer trap insertions and reporter constructs). Information about transcript and protein expression patterns can be found on gene reports (e.g. the elav gene), data for reporter constructs can be found on recombinant construct reports (e.g. P{elav-lacZ.H}) and associated allele reports (e.g. Ecol\lacZ[elav.PH]), and data for enhancer or protein traps can be found on insertion reports (e.g. P{GawB}elav[C155]) and associated allele reports (e.g. Scer\GAL4[elav-C155]). In all cases, expression data will be found in the "Expression Data" section of the report. For those constructs or insertions that reflect expression of a particular gene, data are also promoted to the corresponding gene report, in a subsection of "Expression Data" labeled "Expression Deduced from Reporters" (e.g. expression data for both P{elav-lacZ.H} and P{GawB}elav[C155] are displayed on the elav gene report). Subcellular localization of proteins is populated from Gene Ontology (GO) Cellular Component curation of genes.

We cooperate with several other databases of expression data and either display a portion of their data within FlyBase (e.g. FlyExpress) and/or link to their database (e.g. Fly-FISH). These types of data can be found in the "External Data & Images" subsection of the "Expression Data" section. Additionally, we maintain a set of links to Image Based Resources, including image databases, tools for image analysis, and tools for image visualization and annotation.

High-throughput expression data from FlyAtlas and modENCODE can be found on Gene Reports in a subsection of "Expression Data" labeled "High-Throughput Expression Data". These data are visualized in a further subsection labeled "Reference". The modENCODE data can be visualized as a linear or log graph, or as a heatmap. The FlyAtlas section also includes a 'back-to-back' option, in which gene expression levels in larval tissues are juxtaposed with gene expression levels in the corresponding adult tissues. The graph displays can be scaled to the gene's maximum expression value, or to the maximum of the low, moderate, or high expression bin.

High throughput expression data can also be viewed using the GBrowse2 tool. By selecting the tracks found under the "Microarray Features" and/or "Expression Levels" sections, the available data will be shown on GBrowse2.

Searching for Expression Patterns

Expression data curated from the literature can be searched most easily and accurately by using the QuickSearch Expression or GAL4 etc tabs. The Expression tab allows searches for genes by temporal-spatial expression pattern, while the GAL4 etc tab allows searches for GAL4 and other binary drivers, and non-binary reporters. Another expression pattern search option is QueryBuilder, which supports multipart queries (e.g. generate a list of genes which have the GO term "transcription factor activity" and whose protein products are expressed in the central nervous system). However, if you're interested in all genes expressed in a body part, tissue, or developmental stage, you can find them using Vocabularies. For example, by entering the term "adult mushroom body" into Vocabularies, you can obtain a list of genes expressed in that tissue.

Searching for High-Throughput Expression Patterns

RNA-Seq expression data can be searched to identify genes with specific expression characteristics using the RNA-Seq Profile Search tool. Genes that have expression patterns similar to a given gene can be found using the RNA-Seq Similarity Search.

Mutant Phenotype Data

Mutant phenotype data are associated with alleles in FlyBase, so you need to search allele data if you are interested in mutant phenotypes. In addition to free text describing the phenotype, the alleles are indexed with controlled vocabulary (CV) terms, which makes it easier to search for a particular phenotype, e.g. mutants that affect the wing. You can search with these CV terms using either Vocabularies or QueryBuilder.

You can find mutant alleles affecting the wing from all species using Vocabularies. If you enter the term "wing" into the Vocabularies search page and then click on the "Alleles" button in the report page, you will obtain a list of mutant alleles that affect the wing. However, to search in a specific species, or to search for mutant phenotypes as part of a multipart query, QueryBuilder must be used. In this case, you should pick the "CV Hierarchy (GO/etc.)" dataset and then use the term picker to choose the body part, e.g. wing. In both cases, the default is to retrieve alleles annotated with the chosen CV term itself (e.g. wing) and also alleles annotated with its child terms, which are subsets of the chosen term (e.g. wing vein). If you want to restrict your search to just the precise term chosen, use QueryBuilder and select 'Retrieve records annotated with "This CV term only"' before you run the query.
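The two search modes differ only in whether child terms are included. The sketch below illustrates this using a tiny, hypothetical parent-to-children mapping and made-up allele annotations ('allele-A' etc.). `CHILDREN`, `ANNOTATIONS`, `term_and_descendants` and `alleles_for` are illustrative names and are not part of any FlyBase interface. The real anatomy ontology is also a graph with several relationship types, not a simple tree.

```python
from collections import deque

# Hypothetical, heavily simplified fragment of an anatomy hierarchy:
# each term maps to its direct child terms.
CHILDREN = {
    "wing": ["wing vein", "anterior wing margin"],
    "wing vein": ["wing vein L2"],
    "anterior wing margin": [],
    "wing vein L2": [],
}

# Hypothetical allele annotations: allele -> set of CV terms.
ANNOTATIONS = {
    "allele-A": {"wing"},
    "allele-B": {"wing vein"},
    "allele-C": {"wing vein L2"},
    "allele-D": {"eye"},
}


def term_and_descendants(term):
    """Return the term plus every term below it (breadth-first walk)."""
    found = {term}
    queue = deque([term])
    while queue:
        for child in CHILDREN.get(queue.popleft(), []):
            if child not in found:
                found.add(child)
                queue.append(child)
    return found


def alleles_for(term, this_term_only=False):
    """Default: term plus children. this_term_only=True: the exact term only."""
    wanted = {term} if this_term_only else term_and_descendants(term)
    return sorted(a for a, terms in ANNOTATIONS.items() if terms & wanted)


print(alleles_for("wing"))                       # ['allele-A', 'allele-B', 'allele-C']
print(alleles_for("wing", this_term_only=True))  # ['allele-A']
```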

References

FlyBase is an excellent source of Drosophila references. References can be searched in a number of ways. The easiest way is through QuickSearch, on our homepage. Choose the 'References' tab and fill in one or more of the search boxes. The field identity of each search box can be modified using the dropdown menus at the left. For more information, please go to the QuickSearch Help Page.

More refined reference searches can be performed using QueryBuilder (QB). Click on the box titled 'Query is empty.. Click here to start building' on the QB start page to begin the search. At this stage the window will display all the fields available to search for the 'Genes' dataset. Change the dataset to 'References', and the fields found in the reference reports are displayed. From here, you can search all the data found in the reference report, including PubMed ID, author, and type (e.g. review).

A popular way to search for references is to search for a (list of) objects (e.g. genes, GO terms) and then to use the 'Show related' toggle on the hits page to change the hit list to the related references. The 'Results Analysis/Refinement' button, found on the hit list page, can be used to analyse the distribution of the references over year, journal, author, and type of publication (e.g. review, paper, abstract).

Stocks

One of the easiest ways to search for a stock in FlyBase is to use QuickSearch. Simply change the data class to 'stocks', type in the feature of interest (e.g. a gene symbol, allele symbol), and search. A further way to identify stocks is through the hit list produced after a search. At the top of the hit list there is a toggle allowing you to 'Show related' stocks. Stocks can also be found for individual alleles by clicking on the Stocks matryoshka on the allele report page.

Main Query Tools

Jump to Gene / Search FlyBase

Jump to Gene (J2G) and Search FlyBase are alternative query tools found in the top-right of the blue navigation bar on every page in FlyBase; they allow for targeted and wide searches of FlyBase, respectively.

Jump to Gene

The J2G mode is a NAVIGATION tool, not a search tool, and thus should be used when you know the name, symbol or ID for your gene/allele, and you simply want to go to the corresponding report page. You can type a gene symbol or synonym, valid gene name, or FBid into the J2G box (e.g. amn, amnesiac or FBgn0086782). This tool also works with FBids for non-gene entities (e.g. FBal0090485 or FBab0002363) and with allele symbols (e.g. amn[X8]).

J2G searches with your query in the following order:

  1. primary FlyBase ID (FBgn) any hits? return hit(s), end search
  2. symbol (case-sensitive) any hits? return hit(s), end search
  3. symbol (case-insensitive) any hits? return hit(s), end search
  4. synonym (case-sensitive) any hits? return hit(s), end search
  5. synonym (case-insensitive) any hits? return hit(s), end search
  6. name (case-sensitive) any hits? return hit(s), end search
  7. secondary FlyBase ID any hits? return hit(s), end search

If nothing is found at any step, an error page is returned.
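The lookup order above works as a cascade that stops at the first step producing any hit. The minimal Python sketch below assumes records are held in plain dictionaries. Apart from amn, amnesiac and FBgn0086782, which are taken from the text, the records are hypothetical. This includes the placeholder IDs `FBgn_DEMO` and `FBgn_DEMO_OLD` and the synonym `DMO1`, and synonym lists are left empty where none are needed. `jump_to_gene` is likewise a hypothetical name.

```python
# Placeholder records for illustration only; FBgn_DEMO / FBgn_DEMO_OLD are not
# real FlyBase IDs, and real FlyBase data are not stored this way.
RECORDS = [
    {"id": "FBgn0086782", "symbol": "amn", "name": "amnesiac",
     "synonyms": [], "secondary_ids": []},
    {"id": "FBgn_DEMO", "symbol": "Demo", "name": "demo gene",
     "synonyms": ["DMO1"], "secondary_ids": ["FBgn_DEMO_OLD"]},
]


def jump_to_gene(query, records=RECORDS):
    """Try each lookup step in order and stop at the first step that finds hits.

    Returns a list of matching records, or None when nothing matches
    (the case where J2G shows an error page).
    """
    q = query.lower()
    steps = [
        lambda r: r["id"] == query,                         # 1. primary ID
        lambda r: r["symbol"] == query,                     # 2. symbol, case-sensitive
        lambda r: r["symbol"].lower() == q,                 # 3. symbol, case-insensitive
        lambda r: query in r["synonyms"],                   # 4. synonym, case-sensitive
        lambda r: q in [s.lower() for s in r["synonyms"]],  # 5. synonym, case-insensitive
        lambda r: r["name"] == query,                       # 6. full name, case-sensitive
        lambda r: query in r["secondary_ids"],              # 7. secondary ID
    ]
    for step in steps:
        hits = [r for r in records if step(r)]
        if hits:
            return hits
    return None


print([r["symbol"] for r in jump_to_gene("AMN")])       # ['amn'] (step 3)
print([r["symbol"] for r in jump_to_gene("amnesiac")])  # ['amn'] (step 6)
print(jump_to_gene("no-such-gene"))                     # None
```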

J2G searches D. melanogaster genes by default. If you would like to search for a non-melanogaster gene, you need to use the unique, 4-letter species abbreviation, followed by a backslash, and then the gene symbol (e.g. Dpse\dpp), or use the respective FBgn identifier. J2G does NOT search name synonyms! J2G entries allow wildcards (*), but non-unique results may be returned.
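Species prefixes and wildcards can be handled before the lookup itself. The sketch below makes three assumptions:

- The query is either a bare symbol, treated as D. melanogaster (abbreviated here as `Dmel`), or a four-letter abbreviation, a backslash and a symbol.
- Wildcard matching is case-sensitive.
- Only '*' is special. Everything else is matched literally, so the brackets in allele symbols such as amn[X8] are not read as pattern syntax.

The function names are hypothetical. FlyBase also uses the same backslash form for symbols of foreign genes carried in D. melanogaster constructs (e.g. Scer\GAL4), and this simple split does not distinguish that case.

```python
import re

# A four-letter species abbreviation (e.g. Dpse), a backslash, then the symbol.
SPECIES_PREFIX = re.compile(r"^([A-Z][a-z]{3})\\(.+)$")


def split_species(query, default_species="Dmel"):
    """Return (species abbreviation, symbol); unprefixed queries use the default."""
    m = SPECIES_PREFIX.match(query)
    if m:
        return m.group(1), m.group(2)
    return default_species, query


def wildcard_pattern(query):
    """Turn each '*' into 'any run of characters' and escape everything else."""
    return re.compile("^" + ".*".join(re.escape(p) for p in query.split("*")) + "$")


def wildcard_matches(query, symbols):
    """Return the symbols matched by a wildcard query (possibly more than one)."""
    pattern = wildcard_pattern(query)
    return [s for s in symbols if pattern.match(s)]


print(split_species(r"Dpse\dpp"))  # ('Dpse', 'dpp')
print(split_species("dpp"))        # ('Dmel', 'dpp')
print(wildcard_matches("amn*", ["amn", "amn[X8]", "Amn", "camn"]))  # ['amn', 'amn[X8]']
```

The last call shows why a wildcard query can return non-unique results.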

Search FlyBase

The Search FlyBase mode is the same as the QuickSearch - Search FlyBase tab found on the home page. It performs a comprehensive search of text-searchable FlyBase data across all classes of reports, and the results are displayed in the form of a hit list summarizing the matching records by data type. For example, a search for 'amn' retrieves the matching reports for Aberration, Allele, Anatomy Ontology, Clone, Dataset, Gene, etc.

Clicking on one or more of the data types in the hit list restricts the display to the individual matches within those data types. Click on any of the individual hits to view the corresponding report page.

'Search FlyBase' entries allow wildcards (*) to broaden the query. 'Search FlyBase' entries also allow multiple terms: a Boolean 'AND' is used by default (e.g. 'cnn cbs' is equivalent to 'cnn AND cbs'). Adding 'OR' between terms will find records that contain any one of the listed terms (e.g. 'cnn OR cbs'). To exclude certain terms from the results, use the '-' character as a prefix (e.g. 'Parkinson -CG5680'). Finally, results can be restricted to those containing an exact phrase by surrounding the search term with double quotes (e.g. "SH3 domain").
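One way to model these rules is to treat each 'OR' as separating alternative groups of AND-ed terms, with '-' terms excluding a record outright. This is a hedged sketch, not FlyBase's parser. Several details are assumptions:

- OR takes lower precedence than the implicit AND.
- Matching is case-insensitive and on whole words.
- '*' matches any run of word characters.

`parse_query`, `contains` and `matches` are hypothetical names.

```python
import re

# A double-quoted phrase, or a run of non-space characters.
TOKEN = re.compile(r'"([^"]+)"|(\S+)')


def parse_query(query):
    """Split a query into OR-separated groups of AND-ed terms plus excluded terms."""
    groups, current, excluded = [], [], []
    for phrase, word in TOKEN.findall(query):
        if phrase:                       # "SH3 domain" -> one exact phrase
            current.append(phrase.lower())
        elif word == "OR":               # start a new alternative group
            groups.append(current)
            current = []
        elif word.startswith("-") and len(word) > 1:
            excluded.append(word[1:].lower())
        else:                            # plain terms are AND-ed by default
            current.append(word.lower())
    groups.append(current)
    return [g for g in groups if g], excluded


def contains(text, term):
    """Whole-word (or whole-phrase) match; '*' matches any run of word characters."""
    body = re.escape(term).replace(r"\*", r"\w*")
    return re.search(r"(?<!\w)" + body + r"(?!\w)", text) is not None


def matches(text, query):
    """True if the text satisfies the query under the assumptions above."""
    groups, excluded = parse_query(query)
    text = text.lower()
    if any(contains(text, t) for t in excluded):
        return False
    return any(all(contains(text, t) for t in group) for group in groups)


print(matches("cnn only", "cnn cbs"))                  # False
print(matches("cnn only", "cnn OR cbs"))               # True
print(matches("SH3 binding domain", '"SH3 domain"'))   # False
```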

QuickSearch

The QuickSearch tool on the FlyBase home page allows searching across all FlyBase reports. Forms for searching across all data types or for searching specific types of data have been separated into 'tabs', arrayed at the top of the QuickSearch window. Use the "Simple" tab to search all FlyBase reports. Results are in the form of a hit list summarizing the matching records by data type. More limited searches are available in the remaining tabs. You can search for particular curated data classes, e.g. genes, alleles, aberrations, etc. (Data Class tab), Human Disease models (Human Disease tab), Orthologs (Orthologs tab), GAL4 and other drivers and reporters (GAL4 etc tab), Protein Domains (Protein Domains tab), Gene Expression (Expression tab), Gene Groups (Gene Groups tab), Phenotype (Phenotype tab), Gene Ontology (GO tab), and References (References tab).

QuickSearch searches D. melanogaster data by default. The "Simple", "Expression", and "Data Class" tabs offer the option to search all species. If you want to search for a gene in a particular species, you can use the unique, 4-letter species abbreviation, followed by a backslash, and then the gene symbol (e.g. Dpse\dpp).

For a full description of the QuickSearch tabs, see QuickSearch Help Page.

QueryBuilder

QueryBuilder (QB) provides the most powerful way to search FlyBase on a field-by-field level. QB presents a simple user interface that supports powerful searches by offering access to DataSet|Field pairs (for example, Genes|CV:GO:Molecular Function) in FlyBase, along with the ability to include any combination of datasets in the same search (note that Human Disease, Cell Line, Gene Group, and Strain reports are not accessible with QueryBuilder). QB automatically creates sets of records that are cross-referenced to the records that match your query, providing links to all related records in FlyBase from a single page. Both simple and complex queries can be built in a few steps. A search can be focused on a particular piece of data within a report page, such as the 'mapped features and mutations' associated with a gene, and Boolean operators (and, or, but not) can be used to combine two or more searches. QB therefore supports much more sophisticated searches than QuickSearch or the other FlyBase search tools, taking full advantage of how the data are stored in FlyBase. A useful feature of QueryBuilder is that sets of results can be exported to QB from hit lists, as described in the 'HitList Refinement' section, and then refined by adding further query segments. QB can thus be used in many different ways to explore the data in FlyBase.

The 'Query Builder Help' section on the 'QueryBuilder Home Page' outlines the basic search strategy. There are three options on the QB start page: select a pre-constructed query, import a previously saved query, and build a new query. Help for all of these options is available further down the page as well as a description of how to carry out an expression data search.

Vocabularies (previously known as TermLink)

The Vocabularies Search Page provides easy access to data annotated with a particular controlled term or one of its synonyms. For example, you can use Vocabularies to retrieve a list of all the genes annotated with a particular GO term, or all the transcripts expressed in a particular body part. You do not need to know the precise term that FlyBase uses to store the data; the search box on the Vocabularies page retrieves controlled vocabulary terms that contain your query or terms that list a synonym containing the search term. For example, if you enter wing you will obtain a list that includes the controlled terms wing, anterior wing margin, and dorsal mesothoracic disc, which has the synonym wing disc. The controlled terms in the list are hyperlinked to TermReport pages that describe a single term in detail. Alternatively, you can also browse various controlled vocabulary hierarchies, by using the trees displayed on the main Vocabularies page.

The Vocabularies Search is the only search tool in FlyBase that allows users to search directly for controlled vocabulary (CV) term reports from any of the controlled vocabularies (CVs) used by FlyBase. This includes the GO and anatomy hierarchies, among others. Wildcards are automatically added to the beginning and the end of a search term. For each search performed, Vocabularies returns a hit list of CV term reports that match the search term. These are listed according to CV type, in the following order: anatomy term reports, FlyBase controlled vocabulary term reports, development term reports, GO term reports and SO term reports. Each term report allows the user to retrieve gene, allele, transcript, polypeptide or image reports associated with the term.
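Because wildcards are added at both ends, a Vocabularies query behaves like a substring match against each term name and its synonyms. This is why "wing" finds dorsal mesothoracic disc through its synonym wing disc. The minimal sketch below assumes case-insensitive matching and uses a tiny hypothetical term list. `TERMS`, `CV_ORDER` and `vocabulary_search` are illustrative names. Results are ordered by CV type as described above.

```python
# Order in which hits are listed by CV type.
CV_ORDER = ["anatomy", "FlyBase CV", "development", "GO", "SO"]

# A tiny, hypothetical term list; the real vocabularies are far larger.
TERMS = [
    {"name": "eye", "cv": "anatomy", "synonyms": []},
    {"name": "wing disc development", "cv": "GO", "synonyms": []},
    {"name": "wing", "cv": "anatomy", "synonyms": []},
    {"name": "anterior wing margin", "cv": "anatomy", "synonyms": []},
    {"name": "dorsal mesothoracic disc", "cv": "anatomy", "synonyms": ["wing disc"]},
]


def vocabulary_search(query, terms=TERMS):
    """Leading and trailing wildcards amount to a substring match on name or synonym."""
    q = query.lower()
    hits = [
        t for t in terms
        if q in t["name"].lower() or any(q in s.lower() for s in t["synonyms"])
    ]
    # sorted() is stable, so terms keep their input order within each CV type.
    return [t["name"] for t in sorted(hits, key=lambda t: CV_ORDER.index(t["cv"]))]


print(vocabulary_search("wing"))
# ['wing', 'anterior wing margin', 'dorsal mesothoracic disc', 'wing disc development']
```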

Please go to Vocabularies Help for more information. This page can also be accessed from the bottom of the Vocabularies Search Page.

There is a video tutorial on YouTube.

Query Results Analysis Tools

HitList Refinement

When you perform any search that returns multiple hits, you are presented with a hit list that can be modified or refined. By default all records are selected for inclusion in subsequent manipulations, but the checkboxes allow user-defined subsets to be created. The first data column links directly to the report for each record that matched your search. Other columns link to GBrowse or to searches that return hits directly related to that record. In addition to these links, the hit list provides a set of powerful tools for query refinement or batch processing.

The 'Show related' drop down menu enables you to see all objects of a particular class that are related to the hits selected in your list. For example, selecting 'clones' from the 'Show related' menu of a gene search will return a list of clones that are related to the selected genes.

The 'Results Analysis/Refinement' button allows you to see the frequency of values within your selected hits for a predefined list of fields. Selecting 'Biological process', for example, from the Results Analysis/Refinement tool for a list of genes involved in the Notch signalling pathway will result in a page listing the distribution of the different biological process controlled vocabulary terms associated with the list. Clicking on the number in the 'Related records' column will return the genes from your hit list that are annotated with that GO term.
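At its core, the Results Analysis/Refinement view is a frequency count over the selected hits. The sketch below assumes a hit list held as a mapping from each record to its annotated terms. The genes (`geneA` etc.) and their annotations are placeholders. `term_distribution` and `related_records` are hypothetical helpers, not FlyBase functions.

```python
from collections import Counter

# Hypothetical hit list: gene -> GO biological process terms.
HITS = {
    "geneA": ["Notch signaling pathway", "wing disc development"],
    "geneB": ["Notch signaling pathway"],
    "geneC": ["Notch signaling pathway", "neurogenesis", "neurogenesis"],
}


def term_distribution(hits):
    """Count, for each term, how many selected records carry it."""
    # set() so a record annotated twice with the same term is counted once.
    return Counter(term for terms in hits.values() for term in set(terms))


def related_records(hits, term):
    """What clicking a 'Related records' number returns: the records with that term."""
    return sorted(record for record, terms in hits.items() if term in terms)


print(term_distribution(HITS).most_common(1))  # [('Notch signaling pathway', 3)]
print(related_records(HITS, "neurogenesis"))   # ['geneC']
```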

Lastly, the 'HitList Conversion Tools' button allows you to send the selected hits to our Batch Download tool for use offline, to a new QueryBuilder session for further querying, or to HTML link-out tables for various third-party data sources that have data linked to the hits in your result list.

Batch Download

The Batch Download tool provides bulk access to a variety of data and data formats, such as FASTA sequence data and XML files, for a specified list of unique IDs (please note: secondary IDs, synonyms, or full names are not allowed because they are not unique).
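Before uploading a local file, it can help to separate entries that look like FlyBase IDs from symbols or names. This minimal sketch assumes the common form of 'FB' plus a two-letter class code plus seven digits, as seen in IDs such as FBgn0086782 and FBal0090485. `split_batch_input` is a hypothetical helper. A format check cannot tell a primary ID from a secondary one, so it only catches the obvious cases.

```python
import re

# Assumed ID shape: "FB", a two-letter class code, seven digits.
FB_ID = re.compile(r"^FB[a-z]{2}\d{7}$")


def split_batch_input(lines):
    """Separate ID-shaped entries from symbols or names that would be rejected."""
    accepted, rejected = [], []
    for line in lines:
        entry = line.strip()
        if not entry:
            continue  # skip blank lines in an uploaded file
        (accepted if FB_ID.match(entry) else rejected).append(entry)
    return accepted, rejected


ids, not_ids = split_batch_input(["FBgn0086782\n", "amn", "", "FBal0090485", "amnesiac"])
print(ids)      # ['FBgn0086782', 'FBal0090485']
print(not_ids)  # ['amn', 'amnesiac']
```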

IDs can be sent from a FlyBase hit list, uploaded from a local file, or entered manually.

The Field Data output format provides access to two types of data: data from our set of precomputed flat files and data from the HTML reports. Any line from a precomputed file that matches the lists of IDs supplied can be downloaded using the precomputed file option.

The HTML table option allows you to create a custom report with only the fields you want while preserving hyperlinks for direct navigation to other FlyBase data. Recently the HTML table option has been improved by listing all fields as they appear on the report pages, and making them easier to identify by categorising them as CV (controlled vocabulary), Symbol, Date, or Text.

Genomic Search Tools and Browsers

BLAST

BLAST (Basic Local Alignment Search Tool) provides a method for rapid searching of nucleotide and protein databases. FlyBase BLAST lets you query the 12 completed Drosophila genomes, along with related insect species whose full genomes have been sequenced. BLAST provides access to the FASTA sequences of all sequenced Drosophila genomes, as well as links to GenBank. In addition, you can BLAST an unknown sequence and identify its position in GBrowse.

The BLAST homepage is split into three sections: the first allows the user to input the query sequence and set up the standard BLAST parameters (e.g. Expectation value, database to be searched); the second allows the species to be selected; and the third allows the user to specify advanced BLAST options.

Clicking on the hyperlinks provides hints and tips for the BLAST search.

GBrowse

FlyBase GBrowse provides a graphical or tabular representation of the 12 sequenced Drosophila genomes. Genes, insertions, deficiencies, mapped mutations, RNA-seq data, orthologous regions in other Drosophila genomes, and a wide array of other mapped features can be selected and viewed along a genome coordinate scale. You can navigate to a specific location by entering a precise sequence range, any valid FlyBase identifier for a gene, gene product, or insertion, or a cytological band in the 'Landmark or Region' box. Additionally, FlyBase BLAST output includes GBrowse links that display each BLAST alignment as a highlighted feature in the context of neighbouring gene models and other features of the region.

By default, FlyBase presents a view of D.melanogaster that displays gene models, transcripts, natural transposon insertion sites, repeat regions, estimated cytological bands, cDNAs, transgenic insertion sites, Gnomon gene predictions, regions with orthologs in other Drosophila species and the modENCODE Developmental stage RNA-seq track. Tracks can be easily reordered by clicking on the track name and dragging to a new location on the viewer. Additional tracks can be selected from the 'Select Tracks' link at the top of the Browser. Descriptions of individual tracks can be accessed from the "?" icon next to each track on the 'Select Tracks' page or at the GBrowse tracks document at the FlyBase wiki. A set of icons next to the track names in the GBrowse viewer provides options for managing tracks, including the option to show, hide, turn off, get information about and configure each track. The configuration options for the RNA-seq expression tracks are particularly helpful, including the ability to choose a log2 or linear view and track spacing; see the description at Custom configurations of RNA-Seq profiles in GBrowse. Together, these options allow a highly customized view of the data.

See the FlyBase GBrowse Help wiki page for FlyBase-specific tips (there is also a link at the top right of the GBrowse window). A more generic GBrowse help manual, 'Help with this browser', provides additional details on other very useful features of the GBrowse viewer, and can be accessed from the Help menu in the upper right corner of the GBrowse page. Additional representations of the genome data including a tabular view of mapped features or decorated FASTA can be selected and configured from the drop down menu in the upper right corner of the viewer.

Finding orthologs using GBrowse

By adding 'Similarity' tracks to the D.melanogaster genome view, you can use the resulting ortholog links to navigate to orthologs in the other species. You can also find an ortholog by selecting the species from the 'Data Source' menu and entering the D.melanogaster gene symbol or FBgn ID in the 'Landmark or Region' box. In addition, as described above, FlyBase BLAST output provides links to GBrowse. This is an extremely useful entry path into the sequence data of species other than D.melanogaster, which in some cases consist of a large number of relatively short unlinked scaffolds.

JBrowse

FlyBase JBrowse provides a graphical representation of the Drosophila melanogaster genome. JBrowse was developed by the Generic Model Organism Database (GMOD) consortium to be the eventual successor to GBrowse. It is offered in FlyBase in parallel to GBrowse2 until all the features now available in GBrowse2 can be made available in JBrowse.

Genes, cDNAs, insertions, deficiencies, mapped mutations, regulatory features, RNAi reagents, RNA-seq data, and a wide array of other mapped features can be selected and viewed along a genome coordinate scale. You can navigate to a specific location by entering a precise sequence range, any valid FlyBase identifier for a gene, gene product, or insertion, or a cytological band in the 'Landmark or Region' box. Then move laterally along the genome by using the arrows at the top of the browser or by clicking in an open area of the viewer and dragging side to side. You can zoom in and out by clicking the plus and minus icons in the navigation bar or zoom in by selecting a region of the lower coordinate scale. You can move to a different region of the chromosome arm by clicking on a spot on the chromosome scale at the top of the viewer and switch to a different chromosome by using the chromosome selector at the top.

By default, FlyBase presents a view of D.melanogaster that displays gene models and the modENCODE Developmental stage RNA-seq track. Additional tracks can be selected from the 'Available Tracks' menu at the left side of the Browser. Tracks can be easily reordered by clicking on the track name and dragging to a new location on the viewer. Descriptions of individual tracks can be found in the FlyBase JBrowse Tracks document at the FlyBase wiki.

See the FlyBase JBrowse Help wiki page for FlyBase-specific tips. More generic JBrowse help can be accessed from the Help menu in the upper row of the JBrowse page.

Chromosome Maps

The chromosome maps show sequence scaffolds aligned to polytene chromosome maps for the Muller elements of the sequenced Drosophila species. For more information on the syntenic relationships among the 12 sequenced genomes, their standard chromosomal numbering and corresponding Muller element please see the Muller Element Arm Synteny Table. The aligned sequence scaffolds, shown in blue on the maps, provide access to the sequence data and gene models. When you move your cursor over one of the blue scaffolds a yellow box appears that corresponds to a GBrowse window, and clicking on the box will take you to the corresponding location in GBrowse.

CytoSearch

CytoSearch lists are regional maps of the Drosophila melanogaster genome incorporating both sequence-based and cytology-based map data. Sequence-based data trumps cytology when both are available, cytology trumps meiotic data when both are available, and estimated cytology is used when only meiotic data are available. The FlyBase correspondence table for cytological and sequence-level maps is used to estimate cytology from sequence range and sequence range from cytology, for both the underlying data and the query input.
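The precedence rule can be written as a short decision function. This sketch makes two assumptions. First, a feature is a dictionary with optional 'sequence', 'cytology' and 'meiotic' entries. Second, the correspondence table is replaced by a caller-supplied `estimate_cytology` function, because its real contents are not reproduced here. All names are hypothetical.

```python
from typing import Callable, Optional, Tuple


def best_location(
    feature: dict, estimate_cytology: Callable[[float], str]
) -> Optional[Tuple[str, str]]:
    """Pick the map data CytoSearch-style:
    sequence location > cytology > cytology estimated from a meiotic position."""
    if feature.get("sequence"):
        return "sequence", feature["sequence"]
    if feature.get("cytology"):
        return "cytology", feature["cytology"]
    if feature.get("meiotic") is not None:  # a position of 0.0 still counts
        return "estimated cytology", estimate_cytology(feature["meiotic"])
    return None  # no usable map data


# Stand-in for the real correspondence table (hypothetical).
def fake_estimate(cm):
    return f"estimated-from-{cm}"


print(best_location({"sequence": "2L:100..200", "cytology": "21A1"}, fake_estimate))
# ('sequence', '2L:100..200')
```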

CytoSearch is useful for searching for genetic objects mapped to a particular genomic region (but not necessarily mapped to the sequence).

Coordinate Converter

The Coordinates Converter allows you to convert genomic coordinates between different genome releases. Just select the input and output assembly, enter your list of coordinates (or load them from a file), and away you go! It's that simple.

Feature Mapper

The Feature Mapper allows you to do a search with one or many genes, sequence-based features or genomic regions and returns a wide variety of sequence-based genomic features that overlap or map within the associated genomic region(s). The reported features include gene structure features (including genes, exons, 5’ UTRs, CDSs), aligned evidence (including cDNAs, exon junctions, and peptides), noncoding features (including regulatory regions, insulators, transcription factor binding sites, and RNA editing sites), mapped mutations (including transgenic insertion sites, point mutations, and indels), and RNAi reagents. The search returns lists of features that map to the region(s) of interest in a report that contains the feature type, the feature sequence coordinates, and the feature symbol. Where applicable, the symbol links to the FlyBase report page for that feature. Enter the symbols or IDs for genomic features or a sequence region, check the features that you wish to have returned, and submit your query. Search results can be saved as a GFF file or exported to a hitlist.
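The core of such a search is an interval-overlap test followed by export. This minimal sketch assumes 1-based inclusive coordinates on a named chromosome arm and uses a small list of made-up features. `Feature`, `map_region` and `to_gff3` are hypothetical names. The output is a bare-bones GFF3 file, not the exact file FlyBase produces.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    seqid: str       # chromosome arm, e.g. "2L"
    start: int       # 1-based, inclusive, as in GFF
    end: int
    ftype: str       # feature type, e.g. "exon"
    symbol: str
    strand: str = "."


def overlaps(f: Feature, seqid: str, start: int, end: int) -> bool:
    """Closed intervals overlap when each one starts before the other ends."""
    return f.seqid == seqid and f.start <= end and start <= f.end


def map_region(features, seqid, start, end, wanted_types):
    """Features of the requested types that overlap the query region."""
    return [f for f in features
            if f.ftype in wanted_types and overlaps(f, seqid, start, end)]


def gff_escape(value: str) -> str:
    # Percent-encode '%' (first) and the column-9 reserved characters ; = & ,
    for ch in "%;=&,":
        value = value.replace(ch, "%{:02X}".format(ord(ch)))
    return value


def to_gff3(features, source="sketch"):
    lines = ["##gff-version 3"]
    for f in features:
        cols = [f.seqid, source, f.ftype, str(f.start), str(f.end),
                ".", f.strand, ".", "Name=" + gff_escape(f.symbol)]
        lines.append("\t".join(cols))
    return "\n".join(lines)


# Made-up features for illustration.
FEATURES = [
    Feature("2L", 100, 200, "exon", "demo-exon", "+"),
    Feature("2L", 250, 300, "insertion_site", "demo;ins"),
    Feature("3R", 100, 200, "exon", "elsewhere"),
]

print(to_gff3(map_region(FEATURES, "2L", 150, 260, {"exon", "insertion_site"})))
```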

RNA-Seq Tools and Browsers

The primary RNA-Seq data in FlyBase are the modENCODE data originally published in Graveley et al., 2011 and Brown et al., 2014, comprising 30 developmental stage expression profiles, 29 tissue expression profiles, 25 treatment/condition expression profiles and 24 cell line expression profiles. RNA-Seq reads were mapped to the Release 6 genome assembly as described in Brown et al., 2014. In GBrowse/JBrowse genomic views, several other RNA-Seq datasets are also presented. The RNA-Seq query tools are restricted to the modENCODE datasets. For each modENCODE RNA-Seq sample, gene expression level was calculated as RPKM within the exonic extent of the gene, as described in Gelbart and Emmert, 2013. For purposes of presentation and queries, values were assigned to one of eight bins, from very low to extremely high.
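The RPKM calculation can be written out directly. Take the reads falling on the gene's exons and multiply by 10^9 (10^3 to normalize per kilobase, times 10^6 to normalize per million mapped reads). Then divide by exonic length times total mapped reads. The sketch assumes 'exonic extent' means the length of the union of the gene's exons. The bin boundaries are illustrative placeholders, not the cut-offs FlyBase uses, and all function names are hypothetical.

```python
import bisect


def exonic_length(exons):
    """Length in bp of the union of (start, end) exon intervals, 1-based inclusive."""
    total, cur_start, cur_end = 0, None, None
    for start, end in sorted(exons):
        if cur_end is None or start > cur_end + 1:   # a gap: close the current block
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:                                        # overlapping or adjacent: extend
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def rpkm(exon_reads, exonic_length_bp, total_mapped_reads):
    """Reads Per Kilobase of exon per Million mapped reads."""
    return exon_reads * 1e9 / (exonic_length_bp * total_mapped_reads)


# Inclusive upper bounds for bins 1-7; anything higher falls in bin 8.
# ILLUSTRATIVE VALUES ONLY, not FlyBase's published boundaries.
BIN_UPPER_BOUNDS = [0, 3, 10, 25, 50, 100, 1000]


def expression_bin(value):
    """Map an RPKM value to bin 1 (lowest) .. 8 (highest)."""
    return bisect.bisect_left(BIN_UPPER_BOUNDS, value) + 1


value = rpkm(exon_reads=500, exonic_length_bp=2000, total_mapped_reads=10_000_000)
print(value, expression_bin(value))  # 25.0 4
```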

There is a direct link from the FlyBase home page (top array of icons) to an RNA-Seq overview page that includes the tools and options described below.

GBrowse/JBrowse

FlyBase GBrowse has several tracks that display RNA-Seq expression profiles, which give coverage values base-by-base across the genome. Choose datasets for expression by stage, tissue, treatments, or cell lines. By default, the many tracks are displayed in the layered FlyBase “TopoView” format, and data are shown on a log2 scale, since they range over many orders of magnitude. There are many customization options, including using a linear scale, viewing a subset of tracks within an RNA-Seq dataset, and removing the view offset; details are provided in the FlyBase TopoView configuration article. Currently, the RNA-Seq data presentation in FlyBase JBrowse is more limited (restricted to the modENCODE developmental stages) and cannot be customized.
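A log2 scale works well for data spanning many orders of magnitude because each doubling of the signal adds one unit. The short Python sketch below applies this to some made-up per-base coverage values. Adding a pseudocount of 1, so that zero coverage maps to 0, is an assumption of this sketch and not necessarily what TopoView does.

```python
import math
from typing import List


def log2_scale(coverage: List[float], pseudocount: float = 1.0) -> List[float]:
    """Map per-base coverage to log2(coverage + pseudocount)."""
    return [math.log2(c + pseudocount) for c in coverage]


linear = [0, 1, 3, 1023, 65535]  # made-up coverage spanning ~5 orders of magnitude
scaled = log2_scale(linear)
print(scaled)                    # [0.0, 1.0, 2.0, 10.0, 16.0]
```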

RNA-Seq Profile

RNA-Seq Profile is a fine-grained query tool, powered by modENCODE high-throughput RNA-Seq expression data, that allows you to find genes with specific patterns of expression across several variables. Interested in development of the central nervous system? Search for genes that are expressed in the central nervous system during a specific developmental stage. Curious how toxins affect the fly reproductive system? Search for genes expressed in fly gonads that are activated (or suppressed) by exposure to Paraquat or Rotenone.

Choose datasets for expression by stage, tissue, treatments, or cell lines, or use several datasets in conjunction. Each dataset is presented in a form that allows you to select either narrow slices of the data, or larger sections for more coverage. You also have control over the levels of expression used in the search, allowing you to define distinct thresholds for the ON and OFF states. Keep in mind that extremely narrow search conditions may produce sparse or empty result sets. Feel free to experiment; the tool will remember your settings so that you can adjust, instead of needing to re-enter them. Search results can be exported, as usual, for further analysis or download.

NB: The group check box selectors are interpreted differently depending on whether you are making selections from the 'Expression ON' or 'Expression OFF' sections. 'Expression ON' selectors: Selecting multiple stages using one of the grouping check boxes acts as an 'OR'. This means that if a gene is expressed at or above the chosen expression level in any one or more of the selected stages it will be returned in the result list. To get 'AND' behavior (i.e., return only those genes which are expressed at the chosen level in each one of the selected stages) you must select each of the stages individually. 'Expression OFF' selectors: Selecting multiple stages using one of the grouping check boxes acts as an 'AND'. This means that for a gene to be returned in the result list, the observed level of expression must be at or below the selected level in all of the selected group stages. Therefore, for the 'expression OFF' selectors, checking a group check box is functionally identical to selecting each individual sub-category.
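The Python sketch below summarizes how the group check boxes are treated. It is hypothetical: a gene's expression is represented as a dictionary mapping stage name to bin number, and the stage names, bin values, and function names are illustrative only. It does not describe how RNA-Seq Profile is actually implemented.

```python
from typing import Dict, List

Levels = Dict[str, int]  # hypothetical: stage name -> expression bin


def on_group(levels: Levels, stages: List[str], min_bin: int) -> bool:
    """'Expression ON' group box: OR, so any one stage at or above min_bin passes."""
    return any(levels[s] >= min_bin for s in stages)


def on_each(levels: Levels, stages: List[str], min_bin: int) -> bool:
    """'Expression ON' with every stage ticked individually: AND over the stages."""
    return all(levels[s] >= min_bin for s in stages)


def off_group(levels: Levels, stages: List[str], max_bin: int) -> bool:
    """'Expression OFF' group box: AND, identical to ticking each stage."""
    return all(levels[s] <= max_bin for s in stages)


gene = {"L1": 2, "L2": 6, "L3": 7}  # made-up bins for one gene
larval = ["L1", "L2", "L3"]
print(on_group(gene, larval, 5))    # True: L2 and L3 reach bin 5
print(on_each(gene, larval, 5))     # False: L1 is only at bin 2
print(off_group(gene, larval, 3))   # False: L2 and L3 exceed bin 3
```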

RNA-Seq Similarity

RNA-Seq Similarity finds genes with expression patterns that are similar to that of a given gene. 'Similar to' in this case means that the pattern of higher and lower expression¹ across the categories of the RNA-Seq expression data you choose is close to that of your chosen gene, as measured by the correlation coefficient² between the data for your given gene and each of the search result genes. Enter your query gene symbol in the box, and choose to search for genes with similar expression by developmental stage, tissue, treatments, or cell lines³. You can also specify a subset of experimental samples within a set of RNA-Seq expression data to use when making comparisons. The resulting genes can be exported to a FlyBase hit list.

¹ Note that two expression patterns will be flagged as similar if their profiles of peaks and troughs have a similar shape, even though one expression pattern may have much higher or lower values overall.

² FlyBase uses a generalized Spearman rank correlation for this statistic.

³ The similarity tool currently works only with the modENCODE developmental RNA-Seq data; other RNA-Seq datasets will be included soon.
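The Python sketch below ranks candidate genes by their Spearman rank correlation with a query gene, giving tied values their average rank. It uses the standard Spearman coefficient. Because the details of FlyBase's generalized variant are not described here, treat this as an approximation. Gene names and expression values are made up, and returning 0.0 for a completely flat profile is a choice made for this sketch.

```python
import math
from typing import Dict, List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of the two rank vectors (0.0 if either profile is flat)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        return 0.0
    return cov / math.sqrt(sxx * syy)


def similar_genes(query: str, profiles: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Score every other gene against the query and sort, most similar first."""
    scores = [(g, spearman(profiles[query], p)) for g, p in profiles.items() if g != query]
    return sorted(scores, key=lambda gs: gs[1], reverse=True)


profiles = {                          # made-up expression across four samples
    "geneQ": [1, 5, 20, 40],
    "geneA": [100, 500, 2000, 4000],  # same shape, 100x higher
    "geneB": [40, 20, 5, 1],          # mirror image
    "geneC": [2, 1, 30, 35],
}
print(similar_genes("geneQ", profiles))
# [('geneA', 1.0), ('geneC', 0.8), ('geneB', -1.0)]
```

Only ranks enter the calculation, so geneA scores 1.0 even though its values are 100 times higher. This is the behavior described in the first note above.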

RNA-Seq By Region

RNA-Seq By Region can be used to compare the RNA-Seq signal for a given region across samples, or to compare signal between two regions within a single sample.

Supply the symbol or FBgn ID for one gene of interest, and choose to query either the developmental or tissue RNA-Seq profiles. The tool will retrieve the locations of all exons for the gene specified, and report the average RNA-Seq signal (per base) for each region. Values are normalized for read depth across a given set, and reported as log2 values. Alternatively, input one or more genomic regions using standard GBrowse coordinate nomenclature (e.g., 3L: 20000..21000) and the tool will return the average RNA-Seq signal for each submitted genomic span; for multiple regions, enter one region per line in the input box.
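The region mode can be sketched in Python as three steps: parse each input line, average the per-base signal over the span, and report a log2 value. This sketch assumes per-base coverage is available as an in-memory list for each chromosome arm. The `depth_factor` read-depth scaling and the pseudocount of 1 are hypothetical stand-ins for FlyBase's normalization, which is not specified here.

```python
import math
import re
from typing import Dict, List, Tuple

# Accepts "3L: 20000..21000" or "3L:20000..21000" (1-based, inclusive).
REGION_RE = re.compile(r"^\s*(\w+)\s*:\s*(\d+)\s*\.\.\s*(\d+)\s*$")


def parse_region(text: str) -> Tuple[str, int, int]:
    m = REGION_RE.match(text)
    if not m:
        raise ValueError(f"not a region: {text!r}")
    arm, start, end = m.group(1), int(m.group(2)), int(m.group(3))
    return arm, min(start, end), max(start, end)


def region_signal(coverage: Dict[str, List[float]], text: str,
                  depth_factor: float = 1.0, pseudocount: float = 1.0) -> float:
    """log2 of the mean per-base signal over the region, after depth scaling."""
    arm, start, end = parse_region(text)
    bases = coverage[arm][start - 1:end]  # 1-based inclusive -> Python slice
    if len(bases) != end - start + 1:
        raise ValueError(f"region {text!r} falls outside {arm}")
    mean = sum(bases) / len(bases) * depth_factor
    return math.log2(mean + pseudocount)


def signals_for_input(coverage: Dict[str, List[float]], box: str) -> Dict[str, float]:
    """One region per non-blank line, as in the tool's input box."""
    return {line.strip(): region_signal(coverage, line)
            for line in box.splitlines() if line.strip()}


coverage = {"3L": [0.0] * 9 + [7.0] * 4 + [0.0] * 7}  # made-up 20 bp arm
print(signals_for_input(coverage, "3L: 10..13\n3L:1..4"))
# {'3L: 10..13': 3.0, '3L:1..4': 0.0}
```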

For fast visual inspection of the potentially large expression tables, table cell backgrounds are colored with the same heatmap scheme used for the expression histograms in FlyBase gene reports. These tables can be copied or downloaded for further analysis.

Note that the signal reported for a given region may arise from the expression of multiple transcripts from one or more genes; the tool simply reports the total signal for that region and does not attempt to assign the expression in that region to any specific transcript.

Other Tools

Google™ FlyBase

We have a Google search box (found in both the Tools and Help menus) that can be used to search the entire FlyBase site in a Google-style manner. Google FlyBase is best used to search documentation rather than data about a gene, as it does not restrict its search to specific data fields, and its results depend on Google indexing, which FlyBase cannot control (e.g., you may not find results specific to the newest release).

Interactions Browser

The Interactions Browser, accessible under the 'Tools' menu or at the top of the Interaction section of allele reports, provides a graphical way of exploring the genetic interaction data (enhancer data only, suppressor data only, or both). The browser works in two modes: you can search for the interactions of either an allele or a gene. The latter shows the interactions of all alleles of the gene. Each node of an interaction diagram is a hyperlink, which enables you to navigate and browse the complex web of known genetic interactions. Placing your cursor over the center of a node opens a pop-up window. In a network of gene interactions, the pop-up contains a summary of the function of that gene; in a network of allele interactions, it shows the context in which the interactions of that allele have been reported. For more information, go to the Interactions Browser help documentation.

ImageBrowse

ImageBrowse allows the user to browse image reports by organ system, life cycle, tagma, or germ layer, as well as to browse images of different Drosophilids. This section also gives access to posters of common visible markers in D. melanogaster, as well as miscellaneous images and QuickTime films. Controlled vocabulary terms are used to annotate and label the images. To search images, and to link relevant gene, allele, transcript and protein records to developmental stages, body regions, or specific body parts, go to Vocabularies.

Find a Person

FlyBase compiles information about Drosophila researchers to aid networking and communication in the community. To add yourself to the database, use the Add a new Person tool, found in the Tools menu.

Find a Person allows you to select which field of the personal data you want to search. For example, to find all registered Drosophila researchers in a particular city, select the city field and search for the city of interest.

Simple combinatorial searches are also possible; for example, you can search for 'Smith' in 'Texas', if you so desire.

The search can also be restricted to Principal Investigators (PIs) by ticking the 'Search for PIs only' box.

Update an Address

To update an existing address, the name of the person concerned should be typed into the text box. If the name is ambiguous, e.g. Smith, then a list of full names containing the name is provided.

From this list, click the name whose details you want to change, and edit them. An e-mail will then be sent to the given address to confirm the changes.