Difference between revisions of "FlyBase:Overview"
Revision as of 13:01, 30 June 2017
Introduction
The Homepage
The Gene Report
Alleles and Phenotypes
Expression Data
Interactions
Genomic Data
Reagents
There are several ways to find reagents associated with a specific gene or genomic region. The ‘Stocks and Reagents’ section of the Gene Report is a good place to start. Here, subsections list publicly available fly stocks, genomic and cDNA clones, cell-based RNAi reagents and antibodies described in the published literature (Fig. 2). Other reagents are best found by searching a genomic region of interest using GBrowse, FeatureMapper or CytoSearch. For example, the GMR (45) and VDRC (44) putative enhancer collections are not associated with specific genes, while some classes of transgenic insertions are not listed in the Gene Report. Moreover, a visual representation of the location of a sequence-based reagent relative to the gene of interest is often informative when planning experiments.
Stocks
Stock Reports display the stock list genotype and the source collection, together with the stock number hyperlinked to the specific record at the appropriate stock center to facilitate ordering. There are links to Stock Reports from other appropriate reports (primarily alleles, aberrations, transgenic constructs and insertions) throughout FlyBase. The Bloomington Drosophila Stock Center is the most widely represented source, though many others are included; a complete list can be found in the ‘Links’ menu on the NavBar.
Querying stocks: Stocks can be searched specifically by selecting ‘stocks’ in the ‘Data Class’ tab of QuickSearch.
Strains
FlyBase Strain Reports contain data about wild-type strains such as ‘Oregon-R’, significant mutant strains such as ‘iso-1’ (the D. melanogaster strain sequenced by the BDGP (33)), as well as the 200 or so inbred lines generated by the Drosophila Genetics Reference Panel (46). The reports include information on the origin and history of the strain alongside any known genetic or phenotypic components (e.g., the ‘iso-1’ strain harbors several mutations). Where relevant, links are also provided to Large Dataset Metadata Reports (section 9.1) that describe strain collections, and to Stock Reports to facilitate ordering. (Note that stocks are, in theory, instances of strains, but they are effectively distinct in time and place and may have characteristics that differ from the strains from which they descended.)
Querying strains: Strains can be searched using the QuickSearch ‘Simple’ tab.
Cell lines
Cell Line Reports display data obtained from the Drosophila Genomics Resource Center (DGRC) on cell lines, such as ‘Kc167’ or ‘S2R+’. The reports include the source and developmental stage of each line, its sex and karyotype (where known), and any parental or descendant lines. A link back to the DGRC is also provided for additional data and ordering information.
Querying cell lines: Cell lines can be searched specifically by selecting ‘cell lines’ in the ‘Data Class’ tab of QuickSearch.
cDNAs
cDNAs are shown in GBrowse and appear in the ‘Stocks and Reagents’ section of the Gene Report of the aligned gene(s). Links from GBrowse go to the GenBank report; links from the Gene Report go to the FlyBase Clone Report. The Clone Report includes the sequence, links to GenBank, and fields for ‘Known Problems’ and ‘FlyBase assessment’. Examples of known problems are clones that are chimeric or that contain genomic DNA or transposon sequences. The FlyBase assessment field displays a note if the clone has been replaced, for example “Caution: This cDNA clone replaced by FI01005”. Where a clone is available from the DGRC, a link to that resource is also provided.
Querying cDNAs: cDNA clones can be searched specifically by selecting ‘clones’ in the ‘Data Class’ tab of QuickSearch. FeatureMapper should be used to find cDNAs associated with a specific gene or genomic region.
Integrated Reports
As the amount of Drosophila data and resources in FlyBase increases, it has become both necessary and useful to organize and integrate related data into discrete sets or collections. This has multiple benefits, including the ability to associate metadata across a range of related entities, and to present related data to users in new ways that aid comprehension. To date, FlyBase has developed three types of such integrated reports.
Large dataset metadata
Large Dataset Metadata Reports, previously named Library/Collection Reports, provide information on large datasets and reagent collections that apply to the set as a whole. Examples of datasets are the protein interaction network defined by the Drosophila Protein interaction Mapping (DPiM) project (29), the set of RAMPAGE transcription start sites (36), and datasets generated by the modENCODE project (22). Examples of collections are the set of dsRNA amplicons used for RNAi-knockdown assays in cell culture by the Drosophila RNAi Screening Center (49), the set of defined X-chromosome duplications made by the Bloomington Stock Center (50), and several large construct and insertion collections. Metadata describing cDNA libraries are also captured in this format. The Large Dataset Metadata Report includes the type of dataset or collection, a brief description of the set, a summary of the experimental details, and a link to download all the associated features. Links to external data repositories and reagent sources are provided where relevant. The ‘Description’ field of the dataset report is propagated to each member report; reciprocal links are provided.
Querying large dataset metadata: The ‘Simple’ or the ‘Data Class (large dataset metadata)’ tabs of QuickSearch can be used to find datasets and collections of interest.
Gene groups
Gene Group Reports have been introduced to allow easy access to, and analysis of, related sets of D. melanogaster genes and their associated data (47). Examples of gene groups include members of a gene family (Actins, Wnts…), subunits of a protein complex (proteasome, ribosome…), or other functional groupings (protein kinases, Ubiquitin E3 ligases…). All gene groups in FlyBase are based on published literature and the basis for the membership of each group is clearly attributed. The main feature of these reports is a ‘Members’ table that lists the genes comprising the group, arranged into a series of subgroups where appropriate. Buttons are provided to facilitate the downloading of associated data (phenotypes, expression data, protein interactions etc.) using Batch Download (section 10.2), or to further refine or analyze the gene set by exporting it to a standard hit list. Also shown are links to equivalent gene groups for other organisms, including nematodes (WormBase (48)) and humans (HGNC (8)). To aid navigation, the ‘Families, Domains and Molecular Function’ section of the Gene Report contains a link to any associated gene group(s) (Fig. 2).
Querying gene groups: Gene groups can be retrieved by entering the symbol/name of a group or any member gene in the ‘Gene Groups’ tab of QuickSearch. This tab also includes a link to a browsable list of all current gene groups in FlyBase.
Human disease models
Human Disease Model Reports provide a less specialized entry point into FlyBase for researchers interested in Drosophila models of human disease (17). Data from numerous outside sources, including OMIM, and from recent reviews are presented in a general ‘Disease Summary’ section, followed by information on orthology between a human gene implicated in the disease and the related Drosophila gene(s). For many diseases, multiple causative genes have been implicated; OMIM describes these as different disease subtypes and groups them into ‘phenotypic series’. In the Human Disease Model Report, such a phenotypic series of subtypes is presented in a table titled ‘Related Diseases’, which includes links to other relevant Human Disease Model Reports and provides a quick view of which disease subtypes have been modeled in flies.
The major portion of the disease report is devoted to ‘Experimental Findings’ in Drosophila, focusing on disease-related implications and results. Descriptions of specific experiments are meant to be generally accessible, with links to Allele Reports that give more detailed information. Results may include data obtained using both fly genes and human genes introduced into flies. The ‘Experimental Findings’ section begins with a FlyBase-authored summary that presents a concise review, including phenotypes, interactions, and suitability of the model for drug assays; in addition, new findings and emerging mechanistic themes are highlighted. At the end of this section, a link to the FlyBase Disease Wiki is provided; comments and contributions from users are encouraged, especially from those with expertise in the specific disease model. The last sections of the report draw relevant data from other sections of FlyBase, including physical interaction data for the orthologous Drosophila gene(s), a table of genetic reagents and stocks useful for investigations of human disease, and a table of Disease Ontology-based annotations of alleles used for that disease model (see section 4.3).
There are links to relevant Human Disease Model Reports in the ‘Human Disease Model Data’ section of Gene Reports (Fig. 2). Note that many such links are found in FlyBase Gene Reports for human genes (e.g., Hsap\SNCA and Hsap\TARDBP).
Querying Human Disease Model Reports: These reports can be found by using the ‘Human Disease’ or ‘Simple’ tabs of QuickSearch, or by searching the Disease Ontology within the Vocabularies tool.
Bulk Data Analysis and Downloads
Users increasingly want to be able to process data in bulk. They may have generated a hit list of genes (or any other data class) within FlyBase, or have a list of IDs from elsewhere to upload, and wish to analyze/refine this list or obtain associated data. Alternatively, users may wish to directly obtain bulk data files corresponding to a particular data type for processing off-line.
Uploading and analyzing data
A list of IDs (e.g., gene symbols or CG numbers, allele or insertion symbols, FlyBase identifiers) can be pasted or uploaded into the Upload/Convert IDs tool (Fig. 6; accessed via the Tools menu on the NavBar). This tool will then validate the list, updating any obsolete IDs to the current version where possible, and generate a ‘Conversion report’ clearly indicating if any of the submitted IDs failed verification. The user can choose to correct these cases, or ignore them before proceeding to convert the list into a standard FlyBase hit list (see section 2.2). This list can then be further analyzed/refined before being exported or downloaded as required.
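The validate-then-convert flow described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the Upload/Convert IDs tool itself: the identifier patterns cover FlyBase IDs and CG numbers (symbols would need a database lookup and are not handled), the `obsolete_to_current` mapping is a hypothetical stand-in for FlyBase's own ID history, and the example IDs are made up.

```python
import re

# Patterns for a few identifier classes; symbols are deliberately not covered.
ID_PATTERNS = {
    "FlyBase gene ID": re.compile(r"^FBgn\d{7}$"),
    "FlyBase allele ID": re.compile(r"^FBal\d{7}$"),
    "FlyBase insertion ID": re.compile(r"^FBti\d{7}$"),
    "annotation (CG) number": re.compile(r"^CG\d+$"),
}


def classify_id(identifier):
    """Return the name of the first matching ID class, or None."""
    for label, pattern in ID_PATTERNS.items():
        if pattern.match(identifier):
            return label
    return None


def convert_ids(raw_ids, obsolete_to_current):
    """Split a submitted list into validated and failed entries.

    obsolete_to_current is a hypothetical mapping from retired IDs
    to their current replacements.
    """
    validated, failed = [], []
    for raw in raw_ids:
        identifier = raw.strip()
        if not identifier:
            continue  # ignore blank lines in the pasted list
        # Update an obsolete ID to its current version where possible.
        identifier = obsolete_to_current.get(identifier, identifier)
        if classify_id(identifier) is None:
            failed.append(identifier)
        else:
            validated.append(identifier)
    return validated, failed


# Made-up IDs, used purely for illustration.
validated, failed = convert_ids(
    ["FBgn9999998", " CG12345 ", "not-an-id", ""],
    {"FBgn9999998": "FBgn9999999"},
)
```

The `failed` list plays the role of the entries flagged in the ‘Conversion report’, which the user can correct or ignore before continuing.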
Downloading data
Batch Download is a powerful tool for generating customized output files in various formats for most data types in FlyBase (11). Users may arrive at Batch Download via a hit list (as described above), by navigating to it from the Tools menu of the NavBar, or by clicking on its pictograph on the homepage. When arriving from a hit list, the input list is pre-filled (Fig. 6); otherwise the user can paste in or upload a list of symbols or IDs directly. Depending on the nature of the input and the desired outcome, the output format can then be specified as ‘FASTA Sequence’ (with the option to further specify introns, UTRs, CDS etc.), ‘Database Format’ (XML), or ‘Field Data’ (with output options of an HTML table, a tab-separated value (tsv) file, or the same format as the precomputed files described below). If the ‘Field Data’ option is selected, the user can then specify any combination of data fields (appropriate to the given data class) from a page styled in the same format as a standard FlyBase report page.
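Once a ‘Field Data’ export has been saved as a tsv file, it can be processed off-line with standard tools. The following is a minimal sketch using Python's `csv` module. The column names (`SYMBOL`, `NAME`) and the sample rows are hypothetical, and skipping lines that start with `#` is an assumption about the export, not a documented guarantee.

```python
import csv
import io


def read_field_data(handle):
    """Parse tab-separated rows into dicts keyed by the header row.

    Blank lines and lines starting with '#' are skipped; the first
    remaining line is treated as the column header.
    """
    data_lines = (
        line for line in handle
        if line.strip() and not line.startswith("#")
    )
    return list(csv.DictReader(data_lines, delimiter="\t"))


# Hypothetical export content, held in memory for illustration.
sample = (
    "# hypothetical export\n"
    "SYMBOL\tNAME\n"
    "geneA\tplaceholder gene A\n"
    "geneB\tplaceholder gene B\n"
)
rows = read_field_data(io.StringIO(sample))
symbols = [row["SYMBOL"] for row in rows]
```

For a real file, the same function can be called on an open file handle, for example `with open(path, newline="") as handle: rows = read_field_data(handle)`.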
Bulk files of FlyBase data can be downloaded using our FTP site (ftp://ftp.flybase.org/releases/) or the ‘Downloads’ menu of the NavBar on the website (see the ‘Overview’ page under the Downloads menu for more details). ‘Precomputed files’ contain particular slices of FlyBase data that users or collaborators have requested over the years or are otherwise difficult to obtain in bulk (Table 3). Notable recent additions include D. melanogaster unique protein isoforms, RPKM gene expression values, gene groups, and physical interactions. Also included are several useful correspondence tables and the ontology files used in FlyBase (Table 3). In addition, Chado XML (database format) files are provided for all FlyBase data classes and comprehensive sets of FASTA, GFF and GTF files are available for the twelve originally sequenced and annotated Drosophila species (see section 7.1). The FASTA files comprise many different cuts of genomic data, including annotation categories such as small RNA classes and pseudogenes, components of gene model annotations such as exons, introns, UTRs and predicted translations, as well as other genome features such as transposons and intergenic sequences. As described above, Batch Download can also be used to obtain specified subsets of data in precomputed file, Chado XML or FASTA format by selecting the appropriate output options.
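FASTA files obtained this way are plain text and straightforward to read without specialized software. The sketch below is a generic FASTA reader, not FlyBase-specific code; the record names and sequences in the example are invented, and the layout of real FlyBase header lines is not assumed.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from FASTA-formatted text.

    The header is returned without the leading '>'; sequence lines
    belonging to one record are joined into a single string.
    """
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Invented records, held in memory for illustration.
example = ">rec1 type=exon\nACGT\nAC\n>rec2\nGG\n"
records = list(read_fasta(io.StringIO(example)))
```

Because the function is a generator, it can be run over large genome-scale files without loading every sequence into memory at once.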
Most bulk files are regenerated for every release of FlyBase. Those corresponding to the current or previous (archived) versions of FlyBase are found under the appropriate submenus/subfolders on the web/FTP site. The release version used for a particular file is indicated in the file name and in the header lines of the file itself.
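Since the release version is carried in the file name, scripts can check which release a downloaded file belongs to. The sketch below assumes the release appears as a tag of the form `fb_YYYY_NN` inside the name; that pattern and the file name `example_table_fb_2017_03.tsv.gz` are assumptions made for illustration and should be checked against the actual files.

```python
import re

# Assumed naming pattern: a release tag such as "fb_2017_03" inside the name.
RELEASE_TAG = re.compile(r"fb_(\d{4})_(\d{2})")


def release_from_filename(filename):
    """Return a release label such as 'FB2017_03', or None if no tag is found."""
    match = RELEASE_TAG.search(filename)
    if match is None:
        return None
    year, number = match.groups()
    return f"FB{year}_{number}"


# Hypothetical file name, used purely for illustration.
label = release_from_filename("example_table_fb_2017_03.tsv.gz")
```

Comparing this label with the version given in the file's header lines is a simple consistency check before mixing files from different releases.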
The FlyBase Community
FlyBase engages with our user community through multiple approaches. The primary method for users to get in touch with FlyBase about any matter remains our ‘Contact FlyBase’ page, accessible via the ‘Help’ menu on the NavBar or the link in the footer of any FlyBase page. All other community resources are grouped under the ‘Community’ menu of the NavBar and/or are found on the homepage.
If a user wants to specifically alert us to a Drosophila publication or data therein to be added to FlyBase, then the ‘Fast Track Your Paper’ (FTYP) tool should be used (51). This tool allows the user to indicate the key genes studied and flag data types present in a paper. The resulting gene-to-publication links are submitted directly to the FlyBase database while the data type information is used to prioritize the paper for more detailed curation. We actively solicit FTYP submissions using our ‘EmailAuthor’ pipeline, whereby the corresponding author of a Drosophila publication is automatically sent an email that includes a link to a personalized FTYP form (51). Approximately 50% of authors respond to this request, thereby reducing by half the amount of manual triaging to be done by FlyBase curators.
Our recently launched ‘FlyBase Community Advisory Group’ (FCAG) is a worldwide group of over 500 volunteers (lab heads, postdocs, students, technicians) who use FlyBase for a range of purposes. We contact this group up to six times per year with a survey on a variety of subjects to get feedback about how data collection, presentation and searching on FlyBase can be improved. By consulting this relatively large, diverse group of researchers, we hope to implement changes to FlyBase that are helpful for the greatest number of people.
Users may also help improve FlyBase by contributing to the Human Disease Wiki (described in section 9.3) or the FlyGene Wiki. There is a link to the latter at the top of each Gene Report and within its ‘Summaries’ section. Each FlyGene Wiki page is pre-seeded with the automatically generated FlyBase summary, and users are encouraged to modify or add to this text to build up a more complete and readable summary of each gene’s main features and functions.
The FlyBase Forum is a Google™ Group that provides an alternative, more open platform for users to interact both with FlyBase and with each other. The forum has two areas: one for general questions and discussions about FlyBase and Drosophila protocols etc., and the other for relevant job postings.
Users are made aware of new or changed features in FlyBase through any of several means. First, there are the ‘News’ and ‘Commentary’ sections of the FlyBase homepage (Fig. 1). Second, users can sign up to receive an occasional Newsletter via email by clicking the link on the homepage. The Newsletter contains release announcements, significant website updates, and other important Drosophila community news. Third, to obtain more frequent updates, users can follow FlyBase on Twitter™ by clicking on the icon in the footer of any FlyBase page. Fourth, users can choose to subscribe to any FlyBase record (a specific gene, transgenic construct, reference etc.) and receive automatic updates through a feed reader by clicking the icon in the ‘Recent Updates’ section of any report page. Finally, users have the opportunity to see and hear about FlyBase updates in person at the Annual Drosophila Research Conference in the USA and the biennial European Drosophila Research Conference, where FlyBase representatives give presentations and are available to answer questions. Previous conference presentations and pamphlets can be obtained via the ‘FlyBase Guides’ link under the ‘Help’ menu in the NavBar.