FlyBase:JBrowse Tracks

From FlyBase Wiki

Revision as of 18:53, 19 February 2019

Reference Genome

Gene Shows all annotated genes (including non-coding genes), with the direction of transcription indicated by a small arrow at the downstream edge. For each annotated gene, all transcripts are shown with their complete exon/intron structure (exons as wider bars, introns as black lines). Coding regions are shown in blue and untranslated regions are uncolored. The gene region is hyperlinked to a pop-up window containing an automated gene summary and links to the FlyBase Gene Report and the NCBI gene report; labels show FlyBase transcript symbols.

RNA SO:0000673 Shows the exon (wider bars) and intron (black line) structure of each annotated coding transcript, with direction of transcription indicated with a small arrow at the downstream edge. Coding regions are shown in tan and untranslated regions are uncolored. Each transcript is hyperlinked to a FlyBase Transcript Report; label shows FlyBase transcript symbol.

Natural TE SO:0000101 Shows the extent of a natural transposable element in the sequenced strain (at the time it was sequenced). Hyperlinked to the Natural Transposon Report; label shows the FlyBase natural transposon symbol.

Repeat region Regions of genomic repeats and low-complexity DNA sequences (in pink), as computed using RepeatMasker and RepeatRunner (Smith et al., 2007).

Estimated Cytological band Approximate extent of the classical cytological chromosome bands described by Bridges. See Computed cytological data in FlyBase for a detailed description of how this computed cytological location is calculated. See FlyBase Maps for a collection of polytene chromosome images.

Nucleotide view When zoomed in, the DNA sequence of the Release 6 (R6) genome (Hoskins et al., 2015) is shown. The bases are color-coded: G yellow, A green, T red, C blue. At 150 bp or less, the bases are colored and labeled (GATC); between 150 and 750 bp, they are colored but not labeled. Above and below the base representation, the three-frame translations of the plus and minus strands are shown using the single-letter amino acid code. Start codons (M and L) are indicated in green and stop codons in red.
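To illustrate what the translation rows show, here is a minimal Python sketch of a three-frame translation of each strand using the standard genetic code, with start (M) and stop (*) positions flagged and the zoom thresholds above expressed as a function. All function names are hypothetical; this is not JBrowse's rendering code, it flags only M (not L) as a start, and treating windows above 750 bp as "hidden" is an assumption.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate_frame(seq: str, frame: int) -> str:
    """Translate seq from offset `frame` (0, 1 or 2); codons containing N become 'X'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(frame, len(seq) - 2, 3))


def three_frame_translations(seq: str) -> dict:
    """Plus-strand frames '+0'..'+2' and minus-strand frames '-0'..'-2'."""
    rc = reverse_complement(seq)
    frames = {f"+{f}": translate_frame(seq, f) for f in range(3)}
    frames.update({f"-{f}": translate_frame(rc, f) for f in range(3)})
    return frames


def mark_codons(protein: str) -> list:
    """(index, kind) for each start (M) or stop (*); this sketch does not flag L."""
    return [(i, "start" if aa == "M" else "stop")
            for i, aa in enumerate(protein) if aa in "M*"]


def display_mode(window_bp: int) -> str:
    """Zoom thresholds from the track description; 'hidden' above 750 bp is assumed."""
    if window_bp <= 150:
        return "colored and labeled"
    if window_bp <= 750:
        return "colored only"
    return "hidden"
```

For example, `three_frame_translations("ATGAAATAG")["+0"]` returns `"MK*"`, and `mark_codons("MK*")` marks a start at index 0 and a stop at index 2.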

Aligned Evidence

EST ("expressed sequence tag") Partial sequence of a cDNA, indicated in medium green (darker than cDNAs). Shows the exon (wider bars) and intron (narrow bars) structure and the direction of transcription. When zoomed out to more than 100 kb, a density plot is shown. Links to the GenBank entry. Sequences are aligned to Release 6 by NCBI and submitted to FlyBase.
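The switch to a density plot can be pictured as binning the aligned features across the viewed region. The sketch below is a hypothetical illustration, not JBrowse's implementation: it assumes half-open (start, end) coordinates and equal-width bins, and counts how many features overlap each bin.

```python
def feature_density(features, region_start, region_end, n_bins):
    """Count features overlapping each equal-width bin of [region_start, region_end).

    features: iterable of (start, end) pairs in half-open coordinates (an assumption)."""
    width = (region_end - region_start) / n_bins
    counts = [0] * n_bins
    for start, end in features:
        if end <= region_start or start >= region_end:
            continue  # feature lies outside the viewed region
        first = int((max(start, region_start) - region_start) // width)
        last = int((min(end, region_end) - 1 - region_start) // width)
        for b in range(first, min(last, n_bins - 1) + 1):
            counts[b] += 1
    return counts


def render_mode(view_bp, threshold_bp=100_000):
    """Draw individual alignments up to the threshold, a density plot above it."""
    return "density" if view_bp > threshold_bp else "features"
```

A feature that straddles a bin boundary is counted in every bin it touches, so the plot reflects coverage by features rather than feature start positions.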

cDNA D. melanogaster cDNA sequences from large-scale submissions to the sequence databases, indicated in light green. Shows the exon (wider bars) and intron (black line) structure and the direction of transcription. Links to the GenBank entry. cDNA sequences are aligned to Release 6 by NCBI and submitted to FlyBase. Some genomic DNA submissions, including third-party submissions, are included in this tier.

RNA-Seq based exon junctions Indicated in blue. Orientation of the junction is indicated with an arrow. Links go to FlyBase exon junction reports containing read counts from modENCODE or Baylor. See the BCM_1_RNAseq_junctions and modENCODE_mRNA-Seq_U_junctions Dataset Reports.

other aligned sequences D. melanogaster nucleotide sequences submitted to the sequence databases. Links to a report that includes the ID, position, length, DNA sequence, and more. Sequences are aligned to Release 6 by NCBI and submitted to FlyBase. Some genomic DNA submissions, including third-party submissions, are included in this tier.

PeptideAtlas peptides Indicated in red. Alignment of peptide sequences determined by mass spectrometry, derived from polypeptides isolated from the sequenced strain at various developmental stages. Contributed by the Center for Model Organism Proteomes, SystemsX and Research Priority Project of the University of Zurich, Switzerland. For more information, see Peptide Atlas.

Proteomic peptides (uniquely mapping) Alignment of peptide sequences identified in the developmental proteome of Casas-Vila et al., 2017 (http://flybase.org/reports/FBrf0235991) to the reference genome assembly. Glyphs are hyperlinked to sequence feature reports, with additional information on proteins that match a given peptide. Only peptides that map to a unique locus within the portion of the genome encoding annotated proteins are shown.
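The "uniquely mapping" rule can be sketched as a simple filter: collect every locus each peptide aligns to and keep only peptides with exactly one. This is a hypothetical illustration; it assumes the input hits are already alignments restricted to the protein-coding portion of the genome, and it is not the pipeline used to build the track.

```python
from collections import defaultdict


def uniquely_mapping(hits):
    """hits: iterable of (peptide, chrom, start, end) alignments to coding regions.

    Returns {peptide: (chrom, start, end)} for peptides with exactly one distinct locus."""
    loci = defaultdict(set)
    for peptide, chrom, start, end in hits:
        loci[peptide].add((chrom, start, end))  # duplicates of the same locus collapse
    return {pep: next(iter(locs)) for pep, locs in loci.items() if len(locs) == 1}
```

Using a set per peptide means the same locus reported twice still counts as one, while a peptide hitting two paralogs is dropped.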

Protein domains (PFAM) Pfam protein domains identified in annotated proteins, as obtained from InterPro, are mapped to the genome. The feature glyph links to a Pfam report for the domain.

Protein domains (SMART) SMART protein domains identified in annotated proteins are mapped to the genome. The feature glyph links to an InterPro report for the domain.

Transcription Start Sites (modENCODE), embryo Transcription start site (TSS) regions identified by integrative analysis of ESTs, CAGE or RLM-RACE are depicted as green glyphs indicating the range over which 90 percent of the TSS signal is located. The arrow points in the direction of transcription. Clicking on the TSS feature links to the relevant Sequence Feature report. Note: data for embryonic stages only. See the mE_Transcription_Start_Sites Dataset Report.
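The description says each glyph covers the range holding 90 percent of the TSS signal, but not how that range is chosen. As an illustration only, the sketch below takes one plausible reading: the narrowest contiguous window of positions whose read counts sum to at least 90% of the total. The function name and inputs are hypothetical, and this is not the modENCODE procedure.

```python
def core_tss_range(positions, counts, fraction=0.9):
    """Narrowest (start, end) window of sorted positions holding >= fraction of the signal.

    counts are non-negative TSS read counts at the corresponding positions."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no TSS signal")
    target = fraction * total
    best = None
    window = 0
    left = 0
    for right in range(len(positions)):
        window += counts[right]
        # Shrink from the left while the window still meets the target.
        while left <= right and window - counts[left] >= target:
            window -= counts[left]
            left += 1
        if window >= target:
            span = positions[right] - positions[left]
            if best is None or span < best[1] - best[0]:
                best = (positions[left], positions[right])
    return best
```

Because counts are non-negative, the left edge only ever moves rightward, so the scan runs in linear time.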

Transcription Start Sites (RAMPAGE), peak calls Transcription start site regions (peak calls) identified by RAMPAGE-Seq across 36 stages of development, depicted as green glyphs. The arrow points in the direction of transcription. The feature glyph links to the relevant Sequence Feature report.

Mapped Mutations

Transgenic insertion site SO:0000368 Indicated by vertical bars with an arrow indicating orientation if known. An insertion indicated with an arrow pointing to the right is oriented with its conventional 5' terminus to the left (assuming view is in conventional orientation of the Drosophila chromosome); this is described as being in the "plus" orientation. An insertion indicated with an arrow pointing to the left is oriented with its conventional 5' terminus to the right (assuming view is in conventional orientation of the Drosophila chromosome); this is described as being in the "minus" orientation. An insertion with no arrow has an unknown orientation. For all insertions, if the estimated insertion site is larger than 10 nucleotides, the insertion is shown as a blue box rather than a vertical line. In those cases, see the Insertion Report for more information about localization. Insertions are linked to their respective Insertion Report.
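The drawing rules above reduce to a small decision: orientation picks the arrow, and the size of the estimated insertion site picks the shape. A minimal sketch with a hypothetical function name, assuming inclusive start/end coordinates:

```python
from typing import Optional


def insertion_glyph(start: int, end: int, strand: Optional[str]) -> dict:
    """Choose a glyph for an insertion site; strand is '+', '-', or None if unknown."""
    site_len = end - start + 1  # inclusive coordinates (an assumption)
    shape = "blue box" if site_len > 10 else "vertical line"
    arrow = {"+": "right", "-": "left"}.get(strand)  # None means no arrow is drawn
    return {"shape": shape, "arrow": arrow}
```

A "plus" insertion localized to one base gives a vertical line with a right-pointing arrow; an insertion of unknown orientation localized only to a 21 bp window gives a blue box with no arrow.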

Point mutation SO:1000008 A single nucleotide has been changed into another nucleotide. Location of mutation is indicated with a vertical bar and labeled with the FlyBase allele symbol. The feature glyph is linked to the related Allele Report.

Sequence variant SO:0000109 A region of sequence where variation has been observed. Often these refer to natural variants of a protein that lead to two different functions. The location of the mutation is indicated with a red box and labeled with the FlyBase Allele symbol. The feature glyph is linked to the related Allele Report.

Uncharacterized change in nucleotide sequence SO:1000007 The nature of the nucleotide substitution is either uncharacterized or only partially characterized. The location of the mutation is indicated with a red box and labeled with the FlyBase allele symbol. The feature glyph is linked to the related Allele Report.

Aberration junction SO:0000687 Location of an aberration breakpoint reported in the literature. Labeled with the FlyBase aberration symbol and the numerical designation of the mapped breakpoint (where known). Often the exact breakpoint location is unknown, and the feature indicates the range within which the breakpoint has been mapped. Genetic data are available in the Aberration Report, which can be accessed directly by clicking on the feature.

Complex substitution SO:1000005 The mutation does not fall simply into any of the other categories and is often a combination of events, e.g. deletion of 20 bases and insertion of 11 unrelated bases. Location of mutation is indicated with a red bar and labeled with the FlyBase allele symbol. Feature is linked to the related Allele Report.

Indel SO:1000032 The junction where an insertion or deletion of one or more nucleotides occurred. Location of mutation is indicated and labeled with the FlyBase allele symbol. In the case of deletions, the extent of the deletion is indicated by a red bar. In the case of insertions, the location of the inserted nucleotide(s) is indicated by a vertical line. Features are linked to the related Allele Report.

Rescue fragment SO:0000411 Locations of transgenic rescue fragments reported in the literature. Labeled with the FlyBase allele symbol. Features are linked to the related Allele Reports, which contain genetic data.

Noncoding Features

Insulators class I Insulator_Class_I.mE01 Dataset Report

Insulators class II Insulator_Class_II.mE01 Dataset Report

Protein binding site. Indicated by a grey bar. Locations of protein binding sites reported in the literature, as compiled by FlyBase and/or REDfly. Reference and supporting information is available on the related Sequence Feature reports (click the feature glyph to access the report). See also the REDfly TFBSs dataset report.

Enhancers mE1_CBP_Enhancers Dataset Report. Genomic sequences identified as putative embryo-only enhancers by virtue of embryo-specific CBP-binding in ChIP assays.

Silencers mE1_HDAC_PRE Dataset Report. Genomic sequences identified as putative polycomb response elements (silencers) in embryos.

Regulatory region. Indicated by a grey bar. Locations of regulatory regions reported in the literature, as compiled by FlyBase and/or REDfly. Reference and supporting information is available on the related Sequence Feature reports (click the feature glyph to access the report). See also the REDfly CRMs dataset report.

TFBS – HOT spot analysis mE1_TFBS_HSA Dataset Report. Genomic sequences identified as unique regions of transcription factor (TF) binding using HOT spot analysis (HSA); one or many TFs may bind in a given region. A synthesis of ChIP data sets for 41 different transcription factors. TF binding profiles used in this analysis were assayed at early embryo stages. Mousing over the feature pops up a box that lists the transcription factor genes that bind within the region. Clicking on the feature links to the related Sequence Feature report.
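The idea of collapsing many TF binding profiles into unique regions, each listing the factors that bind there, can be approximated by merging overlapping peaks. The sketch below is a simplified stand-in for illustration, not the HOT spot analysis method itself; it assumes half-open peak coordinates and hypothetical input tuples.

```python
def merge_binding_regions(peaks):
    """peaks: iterable of (chrom, start, end, tf) with half-open [start, end).

    Returns merged regions as (chrom, start, end, sorted list of TF names)."""
    merged = []
    for chrom, start, end, tf in sorted(peaks):  # sort by chrom, then start
        if merged and merged[-1][0] == chrom and start < merged[-1][2]:
            last = merged[-1]
            last[2] = max(last[2], end)  # extend the current region
            last[3].add(tf)
        else:
            merged.append([chrom, start, end, {tf}])
    return [(c, s, e, sorted(tfs)) for c, s, e, tfs in merged]
```

The list of factors attached to each merged region plays the role of the mouse-over box that names the transcription factor genes binding within it.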

TFBS – zinc finger domain Binding sites for transcription factors that contain one or more zinc finger domains. Clicking on the feature links to the related Sequence Feature report. The following Dataset Reports comprise the data found in this track.

mE1_TFBS_disco
mE1_TFBS_ftz-f1
mE1_TFBS_GATAe
BDTNP1_TFBS_hb
mE1_TFBS_hkb
BDTNP1_TFBS_kni
mE1_TFBS_Kr
mE1_TFBS_sbb
mE1_TFBS_sens
BDTNP1_TFBS_shn
BDTNP1_TFBS_sna
BDTNP1_TFBS_tll
mE1_TFBS_zfh1

TFBS – homeodomain Binding sites for transcription factors that contain one or more homeodomains. The following Dataset Reports comprise the data found in this track.

BDTNP1_TFBS_bcd
mE1_TFBS_cad
mE1_TFBS_Dll
mE1_TFBS_en
mE1_TFBS_eve
BDTNP1_TFBS_ftz
mE1_TFBS_inv
BDTNP1_TFBS_prd
mE1_TFBS_Ubx
BDTNP1_TFBS_z

TFBS – helix-loop-helix domain Binding sites for transcription factors that contain one or more helix-loop-helix domains. The following Dataset Reports comprise the data found in this track.

BDTNP1_TFBS_da
mE1_TFBS_h
mE1_TFBS_kn
BDTNP1_TFBS_twi

TFBS – BTB/POZ domain Binding sites for transcription factors that contain one or more BTB/POZ domains. The following Dataset Reports comprise the data found in this track.

mE1_TFBS_bab1
mE1_TFBS_chinmo
mE1_TFBS_Trl
mE1_TFBS_ttk

TFBS – other Binding sites for transcription factors that do not fall into one of the other categories. The following Dataset Reports comprise the data found in this track.

mE1_TFBS_cnc
mE1_TFBS_D
BDTNP1_TFBS_dl
BDTNP1_TFBS_gt
mE1_TFBS_jumu
BDTNP1_TFBS_Mad
BDTNP1_TFBS_Med
mE1_TFBS_run
BDTNP1_TFBS_slp1
mE1_TFBS_Stat92E

Origins of replication mE_Early_Replication_Origins_cells Dataset Report. Genome-wide profile of early-activating origins of replication, from BrdU labeling in Kc, BG3 and S2 cell lines. Links go to the relevant Sequence Feature report.

Putative brain enhancers (Pfeiffer et al.) GMR_Brain_exp_1 Dataset Report. Grey glyphs represent putative enhancers used to generate fly stocks carrying GAL4 transgenic constructs designed to be expressed in the adult brain. Stocks are available from the Bloomington Stock Center. Clicking the glyph brings up the associated Sequence Feature report (e.g. http://flybase.org/reports/FBsf0000162377.html). On the Sequence Feature report, a construct is listed under "associated information". Clicking this construct link brings up the associated Construct Report (e.g. http://flybase.org/reports/FBtp0058072.html), on which you can find a link to the Stock Report (e.g. http://flybase.org/reports/FBst0045107.html). Sorry it's so roundabout!

RNA Editing Sites A-to-I RNA editing sites.
mE_A-to-I_RNA_Editing_Sites Dataset Report.
Rosbash_Adult_Head_A-to-I_Editing_Sites Dataset Report.

Similarity

Orthologs. Indicated by a pale yellow bar covering the extent of the orthologous region. Clicking on a bar opens a pop-up box containing links for orthologs of the gene. The box lists the drosophilid orthologs on the left (linked to a GBrowse view of the orthologous region) and other orthologs on the right (linked to the GenBank report).

sgRNA Reagents

TRiP-OE sgRNAs (overexpression)
Genomic sequences used as short guide RNAs in UAS constructs designed to target genes for CRISPR/Cas9-VPR-based overexpression. Extents of sgRNAs are indicated by red glyphs (arrows indicate orientation). Two nearby sgRNAs are used in a single construct to target the upstream region of a given gene.
Example region: 2L:7305255..7308450
More information: TRiP-OE report

TRiP-KO sgRNAs (knockout)
Genomic sequences used as short guide RNAs in UAS constructs designed to target genes for CRISPR/Cas9-based mutation. Extents of sgRNAs are indicated by red glyphs (arrows indicate orientation).
Example region: 2R:16654718..16655037
More information: TRiP-KO report
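The example regions in these entries (e.g. 2L:7305255..7308450) use an arm:start..end notation. A small parser helps when copying them into scripts; the sketch below is hypothetical, strips thousands separators, and assumes 1-based inclusive coordinates when computing length.

```python
import re

REGION_RE = re.compile(r"^(?P<arm>[^:\s]+):(?P<start>\d+)\.\.(?P<end>\d+)$")


def parse_region(text: str):
    """Parse 'arm:start..end' into (arm, start, end)."""
    m = REGION_RE.match(text.strip().replace(",", ""))
    if m is None:
        raise ValueError(f"not a region: {text!r}")
    start, end = int(m["start"]), int(m["end"])
    if start > end:
        raise ValueError("start must not exceed end")
    return m["arm"], start, end


def region_length(text: str) -> int:
    """Length in bp, assuming 1-based inclusive coordinates."""
    _arm, start, end = parse_region(text)
    return end - start + 1
```

For instance, the TRiP-KO example region 2R:16654718..16655037 spans 320 bp under this convention.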

Predicted sgRNAs
Sequences predicted to be suitable as sgRNAs for genome engineering. These comprise all possible 23-mers in the D. melanogaster Release 6 genome assembly that have 1) a 3' PAM sequence (NGG) and 2) a 15 bp sequence that is unique in the genome (including the PAM sequence). The seed score for each sgRNA, which ranges from 12 to 15 bases, indicates the uniqueness of the base-pairing end of the sgRNA (excluding the PAM sequence). The frameshift score is the percentage of frameshift changes predicted by microhomology around the target site; the higher the score, the better suited the sgRNA is for knockout. Because base-pair mismatches can be tolerated outside the seed region, predicted sgRNAs were evaluated for potential off-target sites allowing for 3, 4 or 5 mismatches; potential sgRNA sequences are sorted into five browser tracks based on their predicted specificity at different stringencies.
Example region: 2L:7314441..7314673
More information: DRSC_sgRNA_designs report
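The first two criteria for predicted sgRNAs (a 23-mer ending in an NGG PAM, and a unique 15 bp 3' end including the PAM) can be sketched as a scan over both strands. This is an illustration with hypothetical names, not the DRSC design pipeline: uniqueness is checked only within the input sequence rather than the whole genome, and seed, frameshift and off-target scoring are not modeled.

```python
from collections import Counter

COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMP)[::-1]


def candidate_sgrnas(seq: str):
    """(strand, plus-strand 0-based start, 23-mer) for every 23-mer ending in NGG."""
    seq = seq.upper()
    n = len(seq)
    hits = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for i in range(n - 22):
            kmer = s[i:i + 23]
            if kmer[21:] == "GG":  # the "GG" of the NGG PAM at the 3' end
                plus_start = i if strand == "+" else n - i - 23
                hits.append((strand, plus_start, kmer))
    return hits


def unique_candidates(seq: str, k: int = 15):
    """Keep candidates whose 3'-terminal k-mer occurs once across both strands."""
    seq = seq.upper()
    counts = Counter()
    for s in (seq, revcomp(seq)):
        counts.update(s[i:i + k] for i in range(len(s) - k + 1))
    return [h for h in candidate_sgrnas(seq) if counts[h[2][-k:]] == 1]
```

Duplicating a target sequence elsewhere in the input makes its 15 bp 3' end occur twice, so both copies are filtered out, which mirrors the intent of the uniqueness requirement.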

RNAi Reagents and Data

DRSC RNAi amplicons
Extents of the amplicons are indicated with an orange bar.
DRSC dsRNA amplicon platform Dataset Report. DNA fragments amplified from D. melanogaster genomic DNA (OregonR) by the Drosophila Genomics Resource Center (DGRC), using gene-specific primers made by Incyte and designed to target transcribed regions with minimal sequence similarity to other genes. Used for the DGRC-D.melanogaster-DGRC1-15552-v5 amplicon microarray, release date June 2, 2006 (original release of v1, May 2004). For further information see the DGRC-1 Dataset report.

VDRC RNAi amplicons
Extents of the amplicons are indicated with an orange bar.
Segment used to create an inverted repeat in RNAi constructs from the Vienna Drosophila RNAi Center (Dickson B. et al. 2007.7.18).

VDRC-1 Dataset Report
VDRC-2 Dataset Report.

TRiP RNAi amplicons
Extents of the amplicons are indicated with an orange bar.
FBlc0000048 TRiP-1 Dataset Report
FBlc0000153 TRiP-2 Dataset Report
FBlc0000185 TRiP-3 Dataset Report
FBlc0000186 TRiP-4 Dataset Report
FBlc0000416 TRiP-5 Dataset Report

BKNA RNAi amplicons
Extents of the amplicons are indicated with an orange bar.
HD2 Dataset Report (dsRNA amplicon platform) RNAi amplicons from the GenomeRNAi database.

HFA RNAi amplicons
Extents of the amplicons are indicated with an orange bar.
HFA Dataset Report.

NIG-Fly RNAi amplicons
Extents of the amplicons are indicated with an orange bar.
NIG_RNAi_Fly-1 Dataset Report.

Aberrations

Deleted segment. Indicated in red. Dashed lines indicate uncertainty in breakpoint location and demarcate the region to which the breakpoint has been mapped. When one or more aberrations overlap the region being viewed, a darker red bar labeled "Spanning aberration(s)" is shown; mousing over it brings up a pop-up box listing all the aberrations that span the region being viewed. Click one of the aberration symbols to go to its Aberration Report. Mousing over a lighter red bar labeled with a deficiency symbol pops up a list of the genes within the aberration extents; clicking a gene symbol in the pop-up links to the Gene Report. Clicking on the bar itself links to the Aberration Report.
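One way to read the display rule is that aberrations covering the entire viewed region are collapsed into the single "Spanning aberration(s)" bar, while those that only partly overlap are drawn as individual lighter bars. The sketch below encodes that interpretation for illustration; the function name, input tuples and inclusive coordinates are assumptions, not FlyBase's code.

```python
def split_aberrations(aberrations, view_start, view_end):
    """aberrations: list of (symbol, start, end) with inclusive coordinates.

    Returns (spanning, drawn): symbols covering the whole view, and symbols that
    only partly overlap it. Aberrations outside the view are ignored."""
    spanning, drawn = [], []
    for symbol, start, end in aberrations:
        if end < view_start or start > view_end:
            continue  # no overlap with the viewed region
        if start <= view_start and end >= view_end:
            spanning.append(symbol)  # listed in the "Spanning aberration(s)" pop-up
        else:
            drawn.append(symbol)  # drawn as its own bar labeled with the symbol
    return spanning, drawn
```

Zooming out shrinks the set of spanning aberrations and grows the set drawn individually, which is why the darker bar tends to appear only at high zoom.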

Expression Levels: RNA-Seq

These tracks contain RNA-Seq expression data for several different stages of development, specific tissues, types of tissue culture cells, or treatment conditions. Different samples are presented in different colors. The tracks are labeled with the sample identity. Some labels are obscured by the RNA-Seq signal. Moving laterally along the chromosome should take you to a visible label.

Developmental stage subsets, unique reads (modENCODE) modENCODE_mRNA-Seq_U Dataset Report. These tracks contain RNA-Seq expression data for several different stages of development.

Digestive system
mE_mRNA_L3_Wand_dig_sys
mE_mRNA_A_1d_dig_sys
mE_mRNA_A_4d_dig_sys
mE_mRNA_A_20d_dig_sys

Fat body and salivary glands
mE_mRNA_L3_Wand_fat
mE_mRNA_WPP_fat
mE_mRNA_P8_fat
mE_mRNA_L3_Wand_saliv
mE_mRNA_WPP_saliv

Imaginal disc and other carcass
mE_mRNA_L3_Wand_imag_disc
mE_mRNA_L3_Wand_carcass
mE_mRNA_A_1d_carcass
mE_mRNA_A_4d_carcass
mE_mRNA_A_20d_carcass

CNS and adult head
mE_mRNA_L3_CNS
mE_mRNA_P8_CNS
mE_mRNA_A_MateM_1d_head
mE_mRNA_A_MateM_4d_head
mE_mRNA_A_MateM_20d_head
mE_mRNA_A_VirF_1d_head
mE_mRNA_A_VirF_4d_head
mE_mRNA_A_VirF_20d_head
mE_mRNA_A_MateF_1d_head
mE_mRNA_A_MateF_4d_head
mE_mRNA_A_MateF_20d_head

Gonads and male accessory glands
mE_mRNA_A_MateM_4d_testis
mE_mRNA_A_MateM_4d_acc_gland
mE_mRNA_A_VirF_4d_ovary
mE_mRNA_A_MateF_4d_ovary

Tissue culture cells, by strand (modENCODE Transcription Group) modENCODE_mRNA-Seq_cell.B Dataset report.

Treatments/Conditions modENCODE_mRNA-Seq_treatments Dataset report.

L3 CNS neuron
Knoblich_mRNA_L3_CNS_neuron

L3 CNS neuroblast
Knoblich_mRNA_L3_CNS_neuroblast

Expression Levels: Small RNA-Seq

These tracks contain RNA-Seq expression data for small RNA species (<30 nt) that have been consolidated from various independent studies by sample type (developmental stage, tissue or cell line). Different samples are presented in different colors. The tracks are labeled with the sample identity. Some labels are obscured by the RNA-Seq signal. Moving laterally along the chromosome should take you to a visible label.

Tissues, stranded small RNA-Seq (Lai lab) See Lai_shortRNA-Seq_profiles_tissues Dataset Report.

Developmental stages, stranded small RNA-Seq (Lai lab) See Lai_shortRNA-Seq_profiles_development Dataset Report.

Tissue culture cells (Schneider + embryonic), stranded small RNA-Seq (Lai lab) See Lai_shortRNA-Seq_profiles_cells Dataset Report.

Tissue culture cells (imaginal disc), stranded small RNA-Seq (Lai lab) See Lai_shortRNA-Seq_profiles_cells Dataset Report.

Tissue culture cells (CNS, ovary, blood), stranded small RNA-Seq (Lai lab) See Lai_shortRNA-Seq_profiles_cells Dataset Report.

Genome Variation

These tracks report single nucleotide polymorphisms (SNPs) and indels observed in various strains of Drosophila melanogaster, relative to the "Release 6" reference genome assembly derived from the iso-1 sequenced strain.

DGRP variants These are SNPs and indels observed in the set of DGRP_wild_type_strains. SNPs are shown in blue, deletions in red and insertions in green. Click on a variant's glyph to obtain additional information on the sequence change, the variant's frequency across the DGRP strains, and a list of the specific strains carrying the variant. Data were mapped to "Release 6" coordinates and provided in VCF format by Wen Huang (Michigan State University) and Trudy MacKay (Clemson University). Note that variant data are reported only on the major chromosome arms (X, 2L, 2R, 3L, 3R and 4), and that no variant data are reported in the large 3-4 Mbp regions near the centromeres of chromosome arms X, 2L, 2R, 3L and 3R.
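Because the data arrive as VCF, the SNP/deletion/insertion coloring can be derived from the REF and ALT columns. The sketch below is a hypothetical illustration: it assumes biallelic records, chromosome names written without a "chr" prefix, and classifies by allele length only; multi-base substitutions fall through to "other".

```python
MAJOR_ARMS = {"X", "2L", "2R", "3L", "3R", "4"}


def classify_variant(ref: str, alt: str) -> tuple:
    """Return (variant type, display color) for a biallelic REF/ALT pair."""
    if len(ref) == len(alt) == 1:
        return ("SNP", "blue")
    if len(ref) > len(alt):
        return ("deletion", "red")
    if len(ref) < len(alt):
        return ("insertion", "green")
    return ("other", None)  # e.g. multi-base substitutions


def parse_vcf_line(line: str):
    """(chrom, pos, ref, alt, type, color) for a data line on a major arm, else None."""
    if line.startswith("#"):
        return None  # header or column-name line
    chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
    if chrom not in MAJOR_ARMS:
        return None
    kind, color = classify_variant(ref, alt)
    return chrom, int(pos), ref, alt, kind, color
```

In VCF, indels are written with a shared leading base, so a deletion of one T after an A appears as REF "AT", ALT "A", which the length comparison classifies correctly.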