Difference between revisions of "FlyBase:Overview"

From FlyBase Wiki
(→‎The FlyBase Community: Added PMC5107610 text)
(→‎Bulk Data Analysis and Downloads: added PMC5107610 text)

Revision as of 12:53, 30 June 2017

Summary

Introduction

The Homepage

The Gene Report

Alleles and Phenotypes

Expression Data

Interactions

Genomic Data

Reagents

Integrated Reports

Bulk Data Analysis and Downloads

Users increasingly want to process data in bulk. They may have generated a hit list of genes (or of any other data class) within FlyBase, or have a list of IDs from elsewhere to upload, and wish to analyze or refine this list or obtain associated data. Alternatively, users may wish to obtain bulk data files for a particular data type directly, for offline processing.

Uploading and analyzing data

A list of IDs (e.g., gene symbols or CG numbers, allele or insertion symbols, FlyBase identifiers) can be pasted or uploaded into the Upload/Convert IDs tool (Fig. 6; accessed via the Tools menu on the NavBar). The tool validates the list, updating any obsolete IDs to their current versions where possible, and generates a ‘Conversion report’ that clearly indicates whether any of the submitted IDs failed verification. The user can correct these cases or ignore them before converting the list into a standard FlyBase hit list (see section 2.2). This hit list can then be further analyzed or refined before being exported or downloaded as required.
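The validate-and-update step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not FlyBase's implementation: the lookup tables `CURRENT_IDS` and `OBSOLETE_TO_CURRENT`, the `ConversionReport` class and the `convert_ids` function are all hypothetical, and the identifiers are placeholders rather than real FlyBase mappings. The real tool also accepts symbols and CG numbers, which this sketch ignores.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical stand-ins for the FlyBase database; the IDs are placeholders,
# not real FlyBase mappings.
CURRENT_IDS = {"FBgn0000001", "FBgn0000002"}
OBSOLETE_TO_CURRENT = {"FBgn0099999": "FBgn0000002"}


@dataclass
class ConversionReport:
    converted: list[str] = field(default_factory=list)     # IDs ready for a hit list
    updated: dict[str, str] = field(default_factory=dict)  # obsolete ID -> current ID
    failed: list[str] = field(default_factory=list)        # IDs that failed verification


def convert_ids(submitted: list[str]) -> ConversionReport:
    """Validate pasted IDs, updating obsolete ones where a mapping exists."""
    report = ConversionReport()
    for raw in submitted:
        fb_id = raw.strip()
        if not fb_id:
            continue  # ignore blank lines in pasted input
        if fb_id in CURRENT_IDS:
            report.converted.append(fb_id)
        elif fb_id in OBSOLETE_TO_CURRENT:
            current = OBSOLETE_TO_CURRENT[fb_id]
            report.updated[fb_id] = current
            report.converted.append(current)
        else:
            report.failed.append(fb_id)  # left for the user to correct or ignore
    return report
```

For example, `convert_ids(["FBgn0000001", "FBgn0099999", "bogus"])` keeps the first ID, replaces the obsolete second ID with `FBgn0000002`, and lists `bogus` under `failed`, which is the kind of information a conversion report summarizes.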

Downloading data

Batch Download is a powerful tool for generating customized output files in various formats for most data types in FlyBase (11). Users can reach Batch Download from a hit list (as described above), from the Tools menu of the NavBar, or by clicking its pictograph on the homepage. When it is opened from a hit list, the input list is pre-filled (Fig. 6); otherwise the user can paste in or upload a list of symbols or IDs directly. Depending on the input and the desired outcome, the output format can be specified as ‘FASTA Sequence’ (with the option to further specify introns, UTRs, CDS, etc.), ‘Database Format’ (XML), or ‘Field Data’ (output as an HTML table, a tab-separated values (TSV) file, or in the same format as the precomputed files described below). If ‘Field Data’ is selected, the user can then choose any combination of data fields appropriate to the given data class from a page laid out like a standard FlyBase report page.
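Once a ‘Field Data’ TSV file has been downloaded, it can be loaded for offline processing. The sketch below assumes a layout that may not match the actual export exactly: lines beginning with `##` are treated as comments, and the first remaining line is treated as the column header (with any leading `#` stripped). The function name `read_field_data` is hypothetical.

```python
from __future__ import annotations

import csv
import io


def read_field_data(text: str) -> list[dict[str, str]]:
    """Parse tab-separated 'Field Data' text into one dict per record.

    Assumed layout (hypothetical): '##' lines are comments, the first other
    line is the column header (a leading '#' is stripped), later lines are records.
    """
    lines = [ln for ln in text.splitlines() if ln.strip() and not ln.startswith("##")]
    if not lines:
        return []
    header = lines[0].lstrip("#").split("\t")
    reader = csv.DictReader(
        io.StringIO("\n".join(lines[1:])),
        fieldnames=header,
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # TSV fields are not quoted; keep '"' literally
    )
    return [dict(row) for row in reader]
```

Setting `quoting=csv.QUOTE_NONE` stops the `csv` module from treating double quotes inside field values as quoting characters, which suits plain tab-separated text.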

Bulk files of FlyBase data can be downloaded from our FTP site (ftp://ftp.flybase.org/releases/) or via the ‘Downloads’ menu of the NavBar on the website (see the ‘Overview’ page under the Downloads menu for more details). ‘Precomputed files’ contain particular slices of FlyBase data that users or collaborators have requested over the years or that are otherwise difficult to obtain in bulk (Table 3). Notable recent additions include D. melanogaster unique protein isoforms, RPKM gene expression values, gene groups, and physical interactions. Also included are several useful correspondence tables and the ontology files used in FlyBase (Table 3). In addition, Chado XML (database format) files are provided for all FlyBase data classes, and comprehensive sets of FASTA, GFF and GTF files are available for the twelve originally sequenced and annotated Drosophila species (see section 7.1). The FASTA files comprise many different cuts of genomic data, including annotation categories such as small RNA classes and pseudogenes; components of gene model annotations such as exons, introns, UTRs and predicted translations; and other genome features such as transposons and intergenic sequences. As described above, Batch Download can also be used to obtain specified subsets of data in precomputed-file, Chado XML or FASTA format by selecting the appropriate output options.
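The FASTA files described above can be read with a few lines of standard-library Python. This minimal parser assumes only the generic FASTA convention (a `>` header line followed by one or more sequence lines) and keys each record by the first whitespace-separated token of its header; any metadata after that token is discarded, and a repeated ID would overwrite the earlier record. The function name `parse_fasta` is illustrative. For large genome files, a streaming reader or an established library such as Biopython would be a better choice.

```python
from __future__ import annotations


def parse_fasta(text: str) -> dict[str, str]:
    """Return {record_id: sequence} from FASTA-formatted text."""
    records: dict[str, str] = {}
    record_id: str | None = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if record_id is not None:
                records[record_id] = "".join(chunks)  # close the previous record
            tokens = line[1:].split()
            record_id = tokens[0] if tokens else ""
            chunks = []
        elif record_id is not None:
            chunks.append(line)  # sequence may be wrapped over several lines
    if record_id is not None:
        records[record_id] = "".join(chunks)  # close the final record
    return records
```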

Most bulk files are regenerated for every release of FlyBase. Those corresponding to the current or previous (archived) versions of FlyBase are found under the appropriate submenus/subfolders on the web/FTP site. The release version used for a particular file is indicated in the file name and in the header lines of the file itself.
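Because the release version appears both in the file name and in the header lines, a script can check that the two agree before processing a file. The sketch assumes release tags written like `fb_2017_03` or `FB2017_03` and header lines that start with `#`; both patterns, and the helper names `release_from_text`, `release_from_header` and `check_release`, are assumptions to adjust against the actual files.

```python
from __future__ import annotations

import re
from typing import Iterable

# Assumed release-tag spellings, e.g. "fb_2017_03" in file names or
# "FB2017_03" in header text; adjust if the real files differ.
RELEASE_RE = re.compile(r"fb_?(\d{4})_(\d{2})", re.IGNORECASE)


def release_from_text(text: str) -> str | None:
    """Return a normalized tag like 'FB2017_03', or None if absent."""
    m = RELEASE_RE.search(text)
    return f"FB{m.group(1)}_{m.group(2)}" if m else None


def release_from_header(lines: Iterable[str]) -> str | None:
    """Scan the leading '#' lines only; stop at the first data line."""
    for line in lines:
        if not line.startswith("#"):
            break
        tag = release_from_text(line)
        if tag:
            return tag
    return None


def check_release(filename: str, lines: Iterable[str]) -> str:
    """Return the release tag if file name and header agree, else raise."""
    from_name = release_from_text(filename)
    from_header = release_from_header(lines)
    if from_name is None or from_name != from_header:
        raise ValueError(f"release mismatch: name={from_name!r}, header={from_header!r}")
    return from_name
```

Normalizing both spellings to a single form lets the comparison ignore differences in case and underscores between the file name and the header.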

The FlyBase Community

FlyBase engages with our user community through multiple approaches. The primary method for users to get in touch with FlyBase about any matter remains our ‘Contact FlyBase’ page, accessible via the ‘Help’ menu on the NavBar or the link in the footer of any FlyBase page. All other community resources are grouped under the ‘Community’ menu of the NavBar and/or are found on the homepage.

If a user wants to specifically alert us to a Drosophila publication or data therein to be added to FlyBase, then the ‘Fast Track Your Paper’ (FTYP) tool should be used (51). This tool allows the user to indicate the key genes studied and flag data types present in a paper. The resulting gene-to-publication links are submitted directly to the FlyBase database while the data type information is used to prioritize the paper for more detailed curation. We actively solicit FTYP submissions using our ‘EmailAuthor’ pipeline, whereby the corresponding author of a Drosophila publication is automatically sent an email that includes a link to a personalized FTYP form (51). Approximately 50% of authors respond to this request, thereby reducing by half the amount of manual triaging to be done by FlyBase curators.

Our recently launched ‘FlyBase Community Advisory Group’ (FCAG) is a worldwide group of over 500 volunteers (lab heads, postdocs, students, technicians) who use FlyBase for a range of purposes. We contact this group up to six times per year with a survey on a variety of subjects to get feedback about how data collection, presentation and searching on FlyBase can be improved. By consulting this relatively large, diverse group of researchers, we hope to implement changes to FlyBase that are helpful for the greatest number of people.

Users may also help improve FlyBase by contributing to the Human Disease Wiki (described in section 9.3) or the FlyGene Wiki. A link to the FlyGene Wiki appears at the top of each Gene Report and within its ‘Summaries’ section. Each FlyGene Wiki page is pre-seeded with the automatically generated FlyBase summary, and users are encouraged to modify or add to this text to build up a more complete and readable summary of each gene’s main features and functions.

The FlyBase Forum is a Google™ Group that provides an alternative, more open platform for users to interact both with FlyBase and with each other. The forum has two areas: one for general questions and discussions about FlyBase and Drosophila protocols etc., and the other for relevant job postings.

Users are made aware of new or changed features in FlyBase through any of several means. First, there are the ‘News’ and ‘Commentary’ sections of the FlyBase homepage (Fig. 1). Second, users can sign up to receive an occasional Newsletter via email by clicking the link on the homepage. The Newsletter contains release announcements, significant website updates, and other important Drosophila community news. Third, to obtain more frequent updates, users can follow FlyBase on Twitter™ by clicking on the icon in the footer of any FlyBase page. Fourth, users can choose to subscribe to any FlyBase record (a specific gene, transgenic construct, reference etc.) and receive automatic updates through a feed reader by clicking the icon in the ‘Recent Updates’ section of any report page. Finally, users have the opportunity to see and hear about FlyBase updates in person at the Annual Drosophila Research Conference in the USA and the biennial European Drosophila Research Conference, where FlyBase representatives give presentations and are available to answer questions. Previous conference presentations and pamphlets can be obtained via the ‘FlyBase Guides’ link under the ‘Help’ menu in the NavBar.