FlyBase:ModENCODE data at FlyBase

From FlyBase Wiki

Revision as of 23:38, 3 March 2023

WARNING

This page is under construction. This temporary page is linked from https://wiki.flybase.org/wiki/FlyBase:FlyBase_Help_Index#Tools_and_Downloads_Documentation and, once finalized, is intended to replace the content at https://wiki.flybase.org/wiki/FlyBase:ModENCODE_RNA-Seq_Overview (once the replacement is complete, delete this page and all links to it).

Overview

FlyBase offers a subset of modENCODE datasets that characterize gene expression and transcriptional regulation in D. melanogaster. The modENCODE datasets incorporated by FlyBase often represent high-level distillations of data, combining or synthesizing data from multiple modENCODE experiments, rather than the raw data from individual experiments. All modENCODE datasets at FlyBase are available through JBrowse; for RNA-Seq datasets, FlyBase provides additional query/analysis tools, as well as download files and data displays on gene reports. FlyBase dataset reports describe how the data were generated and analyzed, and link to the original raw data at data repositories.

Please note: While FlyBase hosts a selection of tools for browsing and using modENCODE data, it is not an exhaustive resource for all data generated from the modENCODE project.

RNA-Seq Query Tools and Browsers

The primary RNA-Seq data in FlyBase are the modENCODE data originally published in Graveley et al., 2011 and Brown et al., 2014, comprising 30 developmental stage expression profiles, 29 tissue expression profiles, 25 treatment/condition expression profiles and 24 cell line expression profiles. RNA-Seq reads were mapped to the Release 6 genome assembly as described in Brown et al., 2014; note that data for replicates of a given biological condition were combined. In JBrowse genomic views, several other RNA-Seq datasets are also presented, but the RNA-Seq query tools are restricted to the modENCODE datasets.

A series of video tutorials describing different RNA-Seq tools is available.

JBrowse

FlyBase JBrowse has several tracks that display RNA-Seq expression profiles, which give coverage values base-by-base across the genome. Choose datasets for expression by stage, tissue, treatments, or cell lines. By default, the many tracks are displayed in the layered FlyBase “TopoView” format, and data are shown on a log2 scale, since they range over many orders of magnitude. Customization options are offered to help drill down into the data, accessed by clicking on the down arrow in the track title bar.

Alternate RNA-Seq Views
  • Space the data out. Increase the “vertical spacing between samples” to prevent strong signal from one sample from obscuring the profile behind it.
  • Align the profiles. Change the “Samples presentation style” from “Tilted” (default) to “Vertical” to remove the horizontal offset between adjacent RNA-Seq profiles so that they align horizontally to the same genome position.
  • Choose the appropriate scaling method. Log2 scaling provides the best dynamic range for viewing both low and high signal together. Linear scaling is preferable in regions with high baseline signal, and provides a more intuitive view of the relative change in signal.

Additional JBrowse tracks display discrete transcription start sites, exon junctions and RNA A-to-I editing sites identified by analysis of modENCODE RNA-Seq data. The exon junction and editing site tracks also include additional data from non-modENCODE experiments.

JBrowse Track Listing

JBrowse tracks sourced from modENCODE data can be enabled via the “Available Tracks” menu on the left pane of the JBrowse viewer. These tracks include:

  • Expression > RNA-Seq > modENCODE transcriptomes > Tissues > Digestive system
  • Expression > RNA-Seq > modENCODE transcriptomes > Tissues > Fat body and salivary glands
  • Expression > RNA-Seq > modENCODE transcriptomes > Tissues > Imaginal disc and other carcass
  • Expression > RNA-Seq > modENCODE transcriptomes > Tissues > CNS and adult head
  • Expression > RNA-Seq > modENCODE transcriptomes > Tissues > Gonads and male accessory glands

Track section | Track name | FlyBase Dataset(s) | JBrowse Track Description
Transcript Level Features > Transcription Start Sites (TSS) | TSS (modENCODE, embryo) | mE_Transcription_Start_Sites | Transcription Start Site Tracks
Transcript Level Features | RNA-Seq exon junctions | modENCODE_mRNA-Seq_U_junctions | RNA-Seq Exon Junction Tracks
Transcript Level Features | RNA Editing Sites | mE_A-to-I_RNA_Editing_Sites | RNA Editing Site Tracks
Expression > RNA-Seq > modENCODE Transcriptomes | Developmental stages | modENCODE_mRNA-Seq_development | RNA-Seq Tracks
Expression > RNA-Seq > modENCODE Transcriptomes | Cell lines | modENCODE_mRNA-Seq_cell.B | RNA-Seq Tracks
Expression > RNA-Seq > modENCODE Transcriptomes | Treatments/Conditions | modENCODE_mRNA-Seq_treatments | RNA-Seq Tracks

The FlyBase JBrowse wig files are available for download on the FlyBase FTP site. Note that FlyBase wig files are slightly different from the standard wig format, in that the optional "span" parameter is implied (not explicitly declared): for example, for a run of 5 nucleotides on chr 2L having the same value at each position, only the value for the first position is declared, without the preceding "variableStep chrom=chr2L span=5" information.
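
As a rough illustration of the implied-span convention described above, the following Python sketch expands compact data lines into explicit runs. It rests on assumptions that are not confirmed by the text: that each data line is a whitespace-separated "position value" pair, and that a declared value holds up to the base just before the next declared position. The helper name expand_runs is hypothetical, and the layout should be checked against an actual downloaded file before relying on it.

```python
from typing import Iterable, Iterator, Tuple


def expand_runs(lines: Iterable[str]) -> Iterator[Tuple[int, int, float]]:
    """Yield (start, end, value) runs, 1-based inclusive, from compact wig data lines.

    ASSUMPTION: each data line is "position value" and declares the first base
    of a run; the run ends just before the next declared position. The length of
    the last run in a block cannot be inferred, so it is reported as one base.
    """
    pending = None  # (position, value) of the run whose end is not yet known
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # A track or declaration line, if present, starts a new block:
        # close the pending run so that runs never span two blocks.
        if line.startswith(("track", "variableStep", "fixedStep")):
            if pending is not None:
                yield pending[0], pending[0], pending[1]
                pending = None
            continue
        fields = line.split()
        pos, val = int(fields[0]), float(fields[1])
        if pending is not None:
            yield pending[0], pos - 1, pending[1]
        pending = (pos, val)
    if pending is not None:
        yield pending[0], pending[0], pending[1]


if __name__ == "__main__":
    # Invented example data: a run of 5 bases with value 3 starting at position 100.
    for run in expand_runs(["100 3", "105 7", "106 0"]):
        print(run)
```

Reporting runs as intervals rather than per-base values keeps memory use low for whole-chromosome files; converting a run back to standard wig would mean writing an explicit span equal to end - start + 1.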

RNA-Seq RPKM Data

For most modENCODE RNA-Seq samples, FlyBase calculated the "RPKM" gene expression level within the exonic extent of the gene, as described in Gelbart and Emmert, 2013. These RPKM values are recalculated with each FlyBase release to account for changes in gene transcript structure. For the purposes of presentation and queries, values were assigned to one of eight bins, from very low to extremely high. These RPKM values are displayed on gene reports (see the "Expression Data > High-Throughput Expression Data" section) and can be downloaded directly from the gene report. RPKM data for all genes can be downloaded from the "Genes" section of the Downloads page.
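
For readers unfamiliar with the unit, the sketch below computes RPKM using its standard definition (reads per kilobase of exon per million mapped reads). It is not FlyBase's pipeline, whose details are in Gelbart and Emmert, 2013; the function name and the example numbers are illustrative assumptions. It does show why values shift between releases: the exonic length in the denominator changes whenever a gene model is revised.

```python
def rpkm(read_count: int, exonic_length_bp: int, total_mapped_reads: int) -> float:
    """Standard RPKM: reads per kilobase of exon per million mapped reads.

    ASSUMPTION: this is the textbook definition, not necessarily the exact
    computation used for the values shown on FlyBase gene reports.
    """
    if exonic_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("exonic length and total mapped reads must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (exonic_length_bp * total_mapped_reads)


if __name__ == "__main__":
    # Invented numbers: 500 reads over 2,000 exonic bases in a 10-million-read sample.
    print(rpkm(500, 2000, 10_000_000))
    # Same read count after a gene model revision lengthens the exonic extent.
    print(rpkm(500, 2500, 10_000_000))
```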

RNA-Seq Profile

Go to RNA-Seq Part II: Using RNA-Seq Profile Search to see the associated video tutorial.

RNA-Seq Profile is a fine-grained query tool, powered by modENCODE high-throughput RNA-Seq expression data (using FlyBase-computed RPKM expression values), that allows you to find genes with specific patterns of expression across several variables. Interested in development of the central nervous system? Search for genes that are expressed in these tissues during a specific developmental stage. Curious how toxins affect the fly reproductive system? Search for genes expressed in fly gonads that are activated (or suppressed) by exposure to Paraquat or Rotenone.

Choose datasets for expression by stage, tissue, treatments, or cell lines, or use several datasets in conjunction. Each dataset is presented in a form that allows you to select either narrow slices of the data, or larger sections for more coverage. You also have control over the levels of expression used in the search, allowing you to define distinct thresholds for the ON and OFF states. Keep in mind that extremely narrow search conditions may produce sparse or empty result sets. Feel free to experiment; the tool will remember your settings so that you can adjust, instead of needing to re-enter them. Search results can be exported, as usual, for further analysis or download.

NB: The group check box selectors are interpreted differently depending on whether you are making selections from the 'Expression ON' or 'Expression OFF' sections.
  • 'Expression ON' selectors: Selecting multiple stages using one of the grouping check boxes acts as an 'OR'. This means that if a gene is expressed at or above the chosen expression level in any one or more of the selected stages it will be returned in the result list. To get 'AND' behavior (i.e., return only those genes which are expressed at the chosen level in each one of the selected stages) you must select each of the stages individually.
  • 'Expression OFF' selectors: Selecting multiple stages using one of the grouping check boxes acts as an 'AND'. This means that for a gene to be returned in the result list, the observed level of expression must be at or below the selected level in all of the selected group stages. Therefore, for the 'Expression OFF' selectors, checking a group check box is functionally identical to selecting each individual sub-category.
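
The ON/OFF semantics above can be sketched in a few lines of Python. This is only a model of the described behavior, not the tool's code: the function name, the stage labels, and the use of plain numeric thresholds (the tool works with expression-level bins) are assumptions. Treating a group check box and individually selected stages as combined with 'AND' is also an assumption.

```python
from typing import Dict, List, Sequence


def gene_matches(
    expr: Dict[str, float],
    on_selections: List[Sequence[str]],
    on_level: float,
    off_selections: List[Sequence[str]],
    off_level: float,
) -> bool:
    """Return True if a gene's expression satisfies the ON and OFF selections.

    Each selection is a list of stage names: a group check box contributes one
    multi-stage list, an individually checked stage contributes a one-item list.
    """
    # ON: 'OR' within a group, 'AND' across separate selections.
    on_ok = all(
        any(expr[stage] >= on_level for stage in group) for group in on_selections
    )
    # OFF: every selected stage must be at or below the OFF level, so a group
    # behaves exactly like its stages selected one by one.
    off_ok = all(
        all(expr[stage] <= off_level for stage in group) for group in off_selections
    )
    return on_ok and off_ok


if __name__ == "__main__":
    # Hypothetical stage labels and values for a single gene.
    gene = {"embryo_0-2h": 12.0, "embryo_2-4h": 0.5, "larva_L3": 30.0}
    # Group check box (OR): one stage above threshold is enough.
    print(gene_matches(gene, [["embryo_0-2h", "embryo_2-4h"]], 10, [], 0))
    # Stages selected individually (AND): both must pass.
    print(gene_matches(gene, [["embryo_0-2h"], ["embryo_2-4h"]], 10, [], 0))
```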

RNA-Seq Similarity

Go to RNA Seq Part III: Searching for Similarly Expressed Genes to see the associated video tutorial.

RNA-Seq Similarity finds genes with expression patterns that are similar to that of a given gene; this search option can also be launched from the relevant gene page. 'Similar to' in this case means that the pattern of higher and lower expression [1] in the categories for the RNA-Seq expression experiment data you choose is close to that of your chosen gene, as measured by the correlation coefficient [2] between the data for your given gene and each of the search result genes. Enter your query gene symbol in the box, and choose to search for genes with similar expression by developmental stage, tissue, treatments, or cell lines. You can also specify a subset of experimental samples within a set of RNA-Seq expression data to use when making comparisons. The resulting genes can be exported to a FlyBase hit list.

[1] Note that two expression patterns will be flagged as similar if their profiles of peaks and troughs of expression have a similar shape, even though one expression pattern may have much higher or lower values overall.

[2] FlyBase uses a generalized Spearman rank correlation for this statistic.
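
To make the shape-versus-magnitude point concrete, here is a minimal sketch of the ordinary Spearman rank correlation in plain Python. The "generalized" variant FlyBase uses is not specified here, so this is a stand-in rather than the tool's statistic; the function names and the example profiles are invented for illustration.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are tied; ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Ordinary Spearman correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2 samples)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    query = [1.0, 5.0, 2.0, 8.0]        # invented expression profile
    same_shape = [10.0, 50.0, 20.0, 80.0]  # ten-fold higher, same peaks and troughs
    print(spearman(query, same_shape))
```

Because only ranks enter the calculation, a profile that is uniformly higher or lower than the query scores the same as an identical one, which matches the behavior described in note 1.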

RNA-Seq By Region

RNA-Seq By Region can be used to compare the RNA-Seq signal for a given region across samples, or to compare signal between two regions within a single sample.

Supply the symbol or FBgn ID for one gene of interest, and choose to query either the developmental or tissue RNA-Seq profiles. The tool will retrieve the locations of all exons for the gene specified, and report an average RNA-Seq signal for each region. Values are normalized for read depth across a given set, and reported as values from 1 to 50; very high read values are truncated at a value of 50. Alternatively, input one or more genomic regions using standard GBrowse coordinate nomenclature (e.g., X:350000..351000) and the tool will return the average RNA-Seq signal for each submitted genomic span; for multiple regions, enter one region per line in the input box.
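
The region input and the capped average can be sketched as follows. This is a hypothetical reconstruction: the function names are invented, the read-depth normalization is omitted because its method is not described here, and applying the cap of 50 to the region average (rather than to each base) is an assumption.

```python
import re
from typing import List, Sequence, Tuple

# chromosome:start..end, e.g. X:350000..351000
REGION_RE = re.compile(r"^([^:\s]+):(\d+)\.\.(\d+)$")


def parse_region(text: str) -> Tuple[str, int, int]:
    """Parse one 'chrom:start..end' string into (chrom, start, end)."""
    match = REGION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a region in chrom:start..end form: {text!r}")
    chrom, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    if start > end:
        raise ValueError(f"start is greater than end: {text!r}")
    return chrom, start, end


def parse_regions(box_text: str) -> List[Tuple[str, int, int]]:
    """Parse an input box holding one region per line, skipping blank lines."""
    return [parse_region(line) for line in box_text.splitlines() if line.strip()]


def mean_signal(per_base: Sequence[float], cap: float = 50.0) -> float:
    """Average per-base signal over a region, truncated at `cap`.

    ASSUMPTION: the cap is applied to the average; the tool may instead
    truncate individual values before averaging.
    """
    if not per_base:
        raise ValueError("region has no signal values")
    return min(sum(per_base) / len(per_base), cap)


if __name__ == "__main__":
    print(parse_regions("X:350000..351000\n2L:100..200\n"))
    print(mean_signal([10.0, 20.0, 30.0]))
```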

For fast visual inspection of the potentially large expression tables, the background of table cells is colored in the same way as in the heatmap coloring schema of expression histograms in FlyBase gene reports. These tables can be copied and downloaded and used for further analysis.

Note that the signal reported for a given region may arise from the expression of multiple transcripts from one or more genes; the tool simply reports the total signal for that region and does not attempt to assign the expression in that region to any specific transcript.

Download RNA-Seq Data

Each FlyBase analysis tool described above offers a way to download the results.

RPKM data for a specific gene can be downloaded directly from the gene report "Expression Data" section. RPKM data for all genes can be downloaded from the "Genes" section of the Downloads page.

FlyBase JBrowse wig files are available for download on the FlyBase FTP site. Note that the format is more compact: for a run of nucleotides having the same coverage value, only the first nucleotide in the run is declared in the file.

Transcriptional Regulation Datasets

JBrowse

FlyBase JBrowse has several tracks that display regions of distinct chromatin configurations, origins of replication, insulators, and transcription factor binding sites (TFBS), determined by ChIP-chip and ChIP-Seq experiments.

JBrowse Track Listing

JBrowse tracks describing aspects of chromatin and transcriptional regulation that were sourced from modENCODE data can be enabled via the “Available Tracks” menu on the left pane of the JBrowse viewer. These tracks are all located in the Genome Level Features track section and are listed in the table below.

Track section | Track name | FlyBase Dataset(s) | JBrowse Track Description
Genome Level Features > Chromatin Features | Chromatin Domains (9-state model, S2 cells) | Chromatin_types_mE1.S2 | Chromatin Feature Tracks
Genome Level Features > Chromatin Features | Chromatin Domains (9-state model, BG3 cells) | Chromatin_types_mE1.BG3 | Chromatin Feature Tracks
Genome Level Features > Transcriptional Regulatory Elements | Insulators (modENCODE, class I) | Insulator_Class_I.mE01 | Transcriptional Regulatory Elements Tracks
Genome Level Features > Transcriptional Regulatory Elements | Insulators (modENCODE, class II) | Insulator_Class_II.mE01 | Transcriptional Regulatory Elements Tracks
Genome Level Features > Transcriptional Regulatory Elements | Putative PREs (modENCODE) | mE1_HDAC_PRE | Transcriptional Regulatory Elements Tracks
Genome Level Features > Transcription Factor Binding Sites (TFBS) > TFBS (modENCODE, ChIP-chip, whole embryo) | whole embryo, TFBS HOT spot analysis | mE1_TFBS_HSA | TFBS Tracks
Genome Level Features > Transcription Factor Binding Sites (TFBS) > TFBS (modENCODE, ChIP-chip, whole embryo) | whole embryo, ZINC Finger TFBS | mE1_TFBS_disco, mE1_TFBS_ftz-f1, mE1_TFBS_GATAe, BDTNP1_TFBS_hb, mE1_TFBS_hkb, BDTNP1_TFBS_kni, mE1_TFBS_Kr, mE1_TFBS_sbb, mE1_TFBS_sens, BDTNP1_TFBS_shn, BDTNP1_TFBS_sna, BDTNP1_TFBS_tll, mE1_TFBS_zfh1 | TFBS Tracks
Genome Level Features > Transcription Factor Binding Sites (TFBS) > TFBS (modENCODE, ChIP-chip, whole embryo) | whole embryo, Homeodomain TFBS | BDTNP1_TFBS_bcd, mE1_TFBS_cad, mE1_TFBS_Dll, mE1_TFBS_en, mE1_TFBS_eve, BDTNP1_TFBS_ftz, mE1_TFBS_inv, BDTNP1_TFBS_prd, mE1_TFBS_Ubx, BDTNP1_TFBS_z | TFBS Tracks
Genome Level Features > Transcription Factor Binding Sites (TFBS) > TFBS (modENCODE, ChIP-chip, whole embryo) | whole embryo, Helix-loop-helix TFBS | BDTNP1_TFBS_da, mE1_TFBS_h, mE1_TFBS_kn, BDTNP1_TFBS_twi | TFBS Tracks
Genome Level Features > Transcription Factor Binding Sites (TFBS) > TFBS (modENCODE, ChIP-chip, whole embryo) | whole embryo, BTB/POS ChIP TFBS | mE1_TFBS_bab1, mE1_TFBS_chinmo, mE1_TFBS_Trl, mE1_TFBS_ttk | TFBS Tracks
Genome Level Features > Transcription Factor Binding Sites (TFBS) > TFBS (modENCODE, ChIP-chip, whole embryo) | whole embryo, Other classes TFBS | mE1_TFBS_cnc, mE1_TFBS_D, BDTNP1_TFBS_dl, BDTNP1_TFBS_gt, mE1_TFBS_jumu, BDTNP1_TFBS_Mad, BDTNP1_TFBS_Med, mE1_TFBS_run, BDTNP1_TFBS_slp1, mE1_TFBS_Stat92E | TFBS Tracks
Genome Level Features > Other Sequence Elements | Origins of replication (modENCODE, Kc, S2, BG3 cells) | mE_Early_Replication_Origins_cells | Other Sequence Element Tracks

Download Data

For JBrowse tracks representing discrete regions (such as ChIP binding regions and chromatin domains), the locations of those features can be downloaded using the JBrowse track menu (at the top left of the track), for either the region in view or the entire chromosome scaffold being viewed, in GFF3, BED or Sequin formats. Download of genome-wide data for a given track is not supported within JBrowse itself. Currently, the only way to obtain genome-wide data for a given dataset is to parse it from the single large FlyBase GFF file that powers FlyBase JBrowse, using the dataset name and/or FlyBase "FBlc" identifier in column 9 of the GFF file: e.g., _Library=mE1_TFBS_chinmo:FBlc0000289.
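
One way to do this parsing is sketched below in Python. It assumes only what is stated above (tab-delimited GFF with the dataset name and FBlc identifier appearing as a "name:FBlc" value in column 9); the function names are hypothetical, and the example record, including its feature type and coordinates, is invented. Matching whole tokens rather than substrings keeps a search for mE1_TFBS_h from also returning mE1_TFBS_hkb.

```python
from typing import Iterable, Iterator, List


def mentions_dataset(attributes: str, token: str) -> bool:
    """True if any column-9 attribute value contains `token` as a whole
    colon-separated part, e.g. 'mE1_TFBS_chinmo' or 'FBlc0000289'."""
    for attribute in attributes.split(";"):
        _, _, value = attribute.partition("=")
        for item in value.split(","):
            if token in item.split(":"):
                return True
    return False


def features_for_dataset(gff_lines: Iterable[str], token: str) -> Iterator[List[str]]:
    """Yield the nine columns of each GFF feature line that mentions `token`."""
    for raw in gff_lines:
        if raw.startswith("##FASTA"):
            break  # sequence section, if present, follows the features
        if not raw.strip() or raw.startswith("#"):
            continue
        columns = raw.rstrip("\n").split("\t")
        if len(columns) == 9 and mentions_dataset(columns[8], token):
            yield columns


if __name__ == "__main__":
    # Invented record for illustration; real lines come from the FlyBase GFF file.
    example = [
        "##gff-version 3\n",
        "2L\tFlyBase\tTF_binding_site\t100\t200\t.\t.\t.\t"
        "ID=example1;_Library=mE1_TFBS_chinmo:FBlc0000289\n",
    ]
    for feature in features_for_dataset(example, "FBlc0000289"):
        print(feature[0], feature[3], feature[4])
```

Because the function accepts any iterable of lines, it can be fed an open file handle directly, so the large GFF file is streamed line by line rather than loaded into memory.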

Finding Your modENCODE Dataset of Interest

Below is a list of all FlyBase datasets representing modENCODE data, along with modENCODE, NCBI GEO and SRA identifiers. FlyBase datasets offer succinct descriptions of the sample prep and data analysis methods. For raw data, please follow links to NCBI GEO or SRA. If you can't find your modENCODE dataset of interest, we recommend searching for the dataset at NCBI GEO or dataMED. If searching using the modENCODE identifier, it may be helpful to add modencode_ or modencode_submission_ as a prefix to the identifier.