Difference between revisions of "FlyBase:JBrowse Tracks"
=== EST (Expressed Sequence Tag) ===
Partial sequence of a cDNA. Shows the exon (wider bars) and intron (black lines) structure, and direction of transcription. When zoomed out to greater than 100kb, a density plot is shown. Each cDNA glyph is hyperlinked to its GenBank nucleotide entry. Sequences are aligned to release 6 by NCBI and submitted to FlyBase.

=== RNA-Seq long reads ===
Long-read RNA-Seq sequences from [http://{{flybaseorg}}/reports/FBrf0256598.htm Gonzalez et al., 2023], as reported in the GTF file available on the [https://hilgerslab.shinyapps.io/ciaTranscriptome/ Hilgers Lab] website. This study identified approximately 60,000 distinct transcript sequences from combined adult ovary, adult head and whole embryonic tissues (whole cell extracts were used). FlyBase analyzed these long-read sequences for coding potential by comparing them to FlyBase polypeptide coding sequences (CDS), and sorted them into four tracks:
:'''RNA-Seq long reads encoding known CDS''' Sequences that encode a known (FlyBase-annotated) CDS.
:'''RNA-Seq long reads encoding novel CDS''' Sequences that encode a novel CDS (starting and ending with known/annotated start and stop codons).
:'''RNA-Seq long reads encoding partial CDS''' Sequences that encode a partial CDS (starting with an annotated start codon or ending with an annotated stop codon).
:'''RNA-Seq long reads encoding no known CDS''' Sequences that do not encode any CDS with an annotated start or annotated stop codon.

For all four tracks above, the sequence designation is displayed just below the left end of the sequence glyph: UTRs are shown in blue, CDSs are shown in yellow, and non-coding transcript exons are shown in red.
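The four-way sorting described above can be illustrated with a minimal sketch. It is not FlyBase's actual pipeline: the function name and its three boolean inputs are hypothetical, and it assumes that a separate comparison against annotated CDSs has already determined whether a read encodes a known CDS and whether its open reading frame begins or ends at an annotated start or stop codon.

```python
def classify_long_read(encodes_known_cds, has_annotated_start, has_annotated_stop):
    """Assign a long-read sequence to one of the four tracks.

    All three arguments are hypothetical booleans assumed to come from a
    prior comparison against FlyBase-annotated CDSs.
    """
    if encodes_known_cds:
        return "RNA-Seq long reads encoding known CDS"
    if has_annotated_start and has_annotated_stop:
        # Annotated start and stop, but not an annotated CDS as a whole.
        return "RNA-Seq long reads encoding novel CDS"
    if has_annotated_start or has_annotated_stop:
        # Only one end of the CDS matches an annotated codon.
        return "RNA-Seq long reads encoding partial CDS"
    return "RNA-Seq long reads encoding no known CDS"
```

Checking for a known CDS first reflects the order in which the tracks are listed; the source does not state the order in which FlyBase applies the tests.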
=== Transcription Start Sites ===
Latest revision as of 14:09, 18 December 2024
Reference Genome
Nucleotide View When zoomed in, the DNA sequence of the Release 6 (R6) (Hoskins et al., 2015) genome is shown as color-coded blocks: G in orange, A in green, T in red and C in blue. When zoomed in further, the bases are also labeled (GATC). Above and below the base representation, the three-frame translation for the plus and minus strands is shown using the single-letter amino acid code. Start codons (M and L) are indicated in green and stop codons are marked in red.
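As an illustration of the three-frame translation shown for each strand, the following minimal Python sketch translates a short DNA string in all six frames using the standard genetic code. The function names are illustrative only and are not part of JBrowse or FlyBase; how the browser lays out the minus-strand frames on screen is not modeled here.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the minus-strand sequence, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq, frame):
    """Translate seq starting at offset frame (0, 1 or 2); '*' marks a stop."""
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    # Codons containing ambiguous bases (e.g. N) are reported as 'X'.
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translation(seq):
    """Return the three plus-strand and three minus-strand translations."""
    minus = reverse_complement(seq)
    return {
        "plus": [translate(seq, f) for f in range(3)],
        "minus": [translate(minus, f) for f in range(3)],
    }
```

For example, `six_frame_translation("ATGGCCTAA")["plus"][0]` gives `"MA*"`: a methionine, an alanine and a stop codon.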
Gene span Shows the extent of all annotated gene models with direction of transcription indicated by a small arrow at the downstream edge. Protein-coding genes are shown in blue with the different shades further indicating direction of transcription. Non-protein-coding genes are shown in red, tRNA genes in purple and pseudogenes in pink. Each gene glyph displays a pop-up window containing an automated gene summary when moused over and links to the FlyBase Gene Report when clicked.
Gene: transcript view Shows a condensed transcript summary of all annotated gene models with direction of transcription indicated by a small arrow at the downstream edge. For each annotated gene, all transcripts are shown with the complete intron/exon structure (exons indicated by wider bars, introns by black lines). For protein-coding genes, coding regions are shown in orange and untranslated regions in gray. Non-protein-coding genes are shown in red, tRNA genes in purple and pseudogenes in pink. Each transcript summary glyph is hyperlinked to a pop-up window containing an automated gene summary; labels show FlyBase transcript symbols. For links to individual FlyBase transcript reports, please see the RNA track.
RNA Like the Gene:transcript view track, this track shows the complete intron/exon structure (exons indicated by wider bars, introns by black lines) of each annotated transcript with direction of transcription indicated by a small arrow at the downstream edge. For protein-coding genes, coding regions are shown in orange and untranslated regions in gray. Non-protein-coding genes are shown in red, tRNA genes in purple and pseudogenes in pink. Each transcript glyph is hyperlinked to its individual FlyBase Transcript Report; labels show FlyBase transcript symbols.
CDS Shows extent of sequence encoding each specific polypeptide, with direction of transcription indicated by a small arrow; introns indicated as narrow lines. Each CDS glyph is hyperlinked to a FlyBase Polypeptide Report; labels show FlyBase polypeptide symbols.
Orthologs Pale yellow bar indicates the extent of the orthologous region. Each glyph is hyperlinked to a pop-up window containing links for orthologs of the gene. The ortholog box will contain the drosophilid orthologs on the left (which link to a FlyBase Gene Report) and other orthologs on the right (which link to a GenBank report).
Natural TE Shows the extent of a natural transposable element in the sequenced strain (at the time it was sequenced). Directionality is indicated by a small arrow at the downstream edge. Each TE glyph is hyperlinked to a FlyBase Natural Transposon Report; labels show FlyBase natural transposon symbols.
Repeat region Regions of genomic repeats and low complexity DNA sequences, as computed using RepeatMasker and RepeatRunner (Smith et al., 2007).
Estimated Cytological band Approximate extent of the classical cytological chromosome bands described by Bridges. See Computed cytological data in FlyBase for a detailed description of how this computed cytological location is calculated. See FlyBase Maps for a collection of polytene chromosome images.
Aberrations
Aberration junctions Location of aberration breakpoint reported in the literature. Often the exact breakpoint location is unknown and the feature indicates a range within which the breakpoint has been mapped.
Deficiencies (formerly Deleted segment) Glyphs correspond to the extent of deleted sequences. For convenience, labels for deficiencies without stocks available at Drosophila stock centers are red. Hover over each glyph to view a pop-up window with stock center availability. Click on each glyph to expand the pop-up window with a list of affected genes and a link to the relevant FlyBase Aberration Report with further information including links to the relevant Stock Report with stock ordering information. For tracks displaying only deficiencies with stocks available, please see the 'Deficiency Stocks' and 'Deficiency Kit' tracks in this section.
Duplications Glyphs correspond to the extent of duplicated sequences. For convenience, labels for duplications without stocks available at Drosophila stock centers are red. Hover over each glyph to view a pop-up window with stock center availability. Click on each glyph to expand the pop-up window with a list of affected genes and a link to the relevant FlyBase Aberration Report with further information including links to the relevant Stock Report with stock ordering information. Many of these duplications were generated by transgenic methods and belong to several curated collections. See the Dps(X)DC_set1, Dps_BAC_Retrofit, Dps_GenetiVision, and Dps(X)BSC_set1 FlyBase Dataset Reports for more details on these collections. For tracks displaying only duplications with stocks available, please see the 'Duplication Stocks' track in this section.
Deficiency Stocks in Bloomington, Kyoto (formerly Stock Center Aberrations: deleted segment under Stocks) This track contains only deficiencies with stocks available from the Bloomington and Kyoto Stock Centers. Click on each glyph to view a pop-up window with a list of affected genes and a link to the relevant FlyBase Aberration Report with further information including links to the relevant Stock Report with stock ordering information.
The Bloomington Deficiency Kit The Bloomington Deficiency Kit is a set of stocks defined by the Bloomington Drosophila Stock Center (BDSC) to provide maximal coverage of the genome with the minimal number of deficiencies. The BDSC Deficiency Kit includes deficiencies with molecularly mapped breakpoints as well as deficiencies with breakpoints that have not been mapped molecularly, primarily to provide coverage of gaps between the molecularly defined deficiencies. These two classes of deficiencies are separated into the following two tracks.
- BDSC Deficiency Kit: deleted segment Molecularly defined deficiencies. Click on each glyph to view a pop-up window with a list of affected genes and a link to the relevant FlyBase Aberration Report with further information including links to the relevant Stock Report with stock ordering information.
- BDSC Deficiency Kit: gap filling or haploinsufficiency flanking segment. Segments of cytologically defined deficiencies that fill gaps between molecularly defined deficiencies or flank haploinsufficient loci. Since the ends of cytologically characterized deficiencies cannot be placed on the genome map with certainty, the BDSC has defined segments of these deficiencies that fill gaps in molecularly defined coverage for JBrowse display. The endpoints of gap filling segments are derived primarily from overlapping deficiency endpoints and complementation with annotated genes. Each glyph is hyperlinked to a FlyBase Aberration Report with further information including links to the relevant Stock Report with stock ordering information.
Duplication Stocks in Bloomington, Kyoto (formerly Stock Center Aberrations: duplicated segment) This track contains only duplications available from the Bloomington or Kyoto Stock Centers. Click on each glyph to view a pop-up window with a list of affected genes and a link to the relevant FlyBase Aberration Report with further information including links to the relevant Stock Report with stock ordering information.
Transgenic Insertions
Transgenic Insertions (formerly Transgenic insertion site) Insertions are indicated by blue diamonds with an arrow indicating orientation if known. An arrow pointing to the right indicates a transposable element with its conventional 5' terminus to the left (assuming view is in conventional orientation of the Drosophila chromosome); this is described as being in the "plus" orientation. An arrow pointing to the left indicates a transposable element oriented with its conventional 5' terminus to the right (assuming view is in conventional orientation of the Drosophila chromosome); this is described as being in the "minus" orientation. An insertion with no arrow has an unknown orientation. For all insertions, if the estimated insertion site is larger than 10 nucleotides, a dotted line represents the region of uncertainty. Hover over each insertion glyph to view a pop-up window with basic information and click to go to its FlyBase Insertion Report with further information including links to the relevant Stock Report with stock ordering information.
Transgenic Insertion Stocks in Bloomington, Kyoto This track contains only insertions available from the Bloomington and Kyoto Stock Centers. Insertions are indicated by blue diamonds with an arrow indicating orientation if known. An arrow pointing to the right indicates a transposable element with its conventional 5' terminus to the left (assuming view is in conventional orientation of the Drosophila chromosome); this is described as being in the "plus" orientation. An arrow pointing to the left indicates a transposable element oriented with its conventional 5' terminus to the right (assuming view is in conventional orientation of the Drosophila chromosome); this is described as being in the "minus" orientation. An insertion with no arrow has an unknown orientation. For all insertions, if the estimated insertion site is larger than 10 nucleotides, a dotted line represents the region of uncertainty. Hover over each insertion glyph to view a pop-up window with basic information and click to go to its FlyBase Insertion Report with further information including links to the relevant Stock Report with stock ordering information.
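The glyph rules shared by both insertion tracks can be summarized in a short sketch. The function name, the `"+"`/`"-"` strand encoding and the 1-based inclusive coordinates are assumptions made for illustration; they do not describe how JBrowse or FlyBase store these data.

```python
def insertion_glyph(start, end, strand=None):
    """Describe how an insertion would be drawn under the rules above.

    start and end are assumed to be 1-based inclusive coordinates of the
    estimated insertion site; strand is "+", "-" or None when unknown.
    """
    if strand == "+":
        arrow, orientation = "right", "plus"   # 5' terminus to the left
    elif strand == "-":
        arrow, orientation = "left", "minus"   # 5' terminus to the right
    else:
        arrow, orientation = None, "unknown"   # drawn without an arrow

    site_length = end - start + 1
    return {
        "shape": "blue diamond",
        "arrow": arrow,
        "orientation": orientation,
        # A dotted line marks the region of uncertainty when the estimated
        # site is larger than 10 nucleotides.
        "dotted_line": site_length > 10,
    }
```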
Mutations & Sequence Variants
Mutations & Experimentally Induced Variants
- Substitutions Substitutions include point mutations in which a single nucleotide has been changed into another nucleotide as well as multiple nucleotide variants (MNV) in which the inserted sequence is the same length as the replaced sequence. The location of the mutation is indicated with a vertical bar and labeled with the FlyBase allele symbol. Point mutations are labeled in blue while MNVs are labeled in red to allow the user to easily distinguish substitution types. Each feature glyph is hyperlinked to the related FlyBase Allele Report.
- Insertions, Deletions This track shows insertions or deletions of one or more nucleotides as well as delins, alterations which include both an insertion and a deletion of 2 or more nucleotides. In the case of deletions and delins, the extent of the lesion is indicated by a yellow bar. In the case of nucleotide insertions, the location of the insertion is indicated by a vertical line. Mutations are labeled with the FlyBase allele symbol, insertions in green, deletions in red and delins in black. Each feature glyph is hyperlinked to the related FlyBase Allele Report.
- Complex Substitutions The mutation does not fall simply into any of the other categories and is often a combination of events, e.g. deletion of 20 bases and insertion of 11 unrelated bases. Location of mutation is indicated with a red bar and labeled with the FlyBase allele symbol. Each feature glyph is hyperlinked to the related FlyBase Allele Report.
- Roughly Mapped Mutations (formerly Uncharacterized change in nucleotide sequence) The mutation has only been roughly mapped molecularly, e.g. to a restriction fragment. The location of the mutation is indicated with a red bar and labeled with the FlyBase allele symbol. Each feature glyph is hyperlinked to the related FlyBase Allele Report.
- FlyBase Mutations/Variants on Alliance These represent the portion of FlyBase curated mutation data shared with the Alliance of Genome Resources. The same data are contained in the other tracks in this section, as well as in the Transgenic Insertions track, and can be viewed separately by mutation type therein. Here, the differently shaped glyphs represent mutation type: deletions are rectangles, insertions are inverted triangles and point mutations are diamonds. Hovering over each glyph brings up a window with further information.
Polymorphisms & Natural Population Variants
These tracks report single nucleotide polymorphisms (SNPs) and indels observed in various strains of Drosophila melanogaster, relative to the "Release 6" reference genome assembly derived from the iso-1 sequenced strain.
- FlyBase Annotated Variants (formerly Sequence variant) A region of sequence where variation has been observed. Often these refer to natural variants of a protein that lead to two different functions. The location of the mutation is indicated with a red box and labeled with the FlyBase Allele symbol. The feature glyph is linked to the related Allele Report.
- DGRP Variants These are SNPs and indels observed in the set of DGRP_wild_type_strains. SNPs are shown in blue, deletions in red and insertions in green. Click on a variant's glyph to obtain additional information on the sequence change, the variant's frequency across the DGRP strains, and a list of the specific strains carrying the variant. Data were mapped to the "Release 6" coordinates and provided in VCF file format by Wen Huang (Michigan State University) and Trudy Mackay (Clemson University). Note that variant data are reported only on the major chromosome arms (X, 2L, 2R, 3L, 3R and 4), and that no variant data are reported in the large 3-4 Mbp regions near the centromeres of chromosome arms X, 2L, 2R, 3L and 3R.
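Because the DGRP data are supplied as VCF, the SNP/deletion/insertion color scheme can be illustrated by comparing the REF and ALT alleles of each record. The sketch below is a simplification and not the code used by FlyBase: the function names are hypothetical, only the first five standard VCF columns are read, and any allele pair that is not a single-base change or a simple length change is labeled "other".

```python
# Track colors as described for the DGRP Variants track.
TRACK_COLORS = {"SNP": "blue", "deletion": "red", "insertion": "green"}


def classify_variant(ref, alt):
    """Classify one REF/ALT allele pair by comparing allele lengths."""
    if len(ref) == 1 and len(alt) == 1 and ref != alt:
        return "SNP"
    if len(alt) < len(ref):
        return "deletion"
    if len(alt) > len(ref):
        return "insertion"
    return "other"  # e.g. multi-base substitutions; not colored here


def parse_vcf_line(line):
    """Return (chrom, pos, ref, alt, type, color) for each ALT allele."""
    chrom, pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
    records = []
    for alt in alts.split(","):  # multi-allelic sites list ALTs with commas
        kind = classify_variant(ref, alt)
        records.append((chrom, int(pos), ref, alt, kind, TRACK_COLORS.get(kind)))
    return records
```

A hypothetical record such as `"2L\t5000\t.\tA\tG\t.\t.\t."` would be reported as a SNP and drawn in blue.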
Misc. Reagents
Putative Enhancer Lines
- Putative Brain Enhancers (Janelia GAL4 lines) Putative enhancers used to generate fly stocks carrying GAL4 transgenic constructs designed to be expressed in the adult brain. Each feature glyph is hyperlinked to a FlyBase Sequence Feature Report with supporting data and links to the relevant Stock Report with stock ordering information. See the GMR_Brain_exp_1 FlyBase Dataset Report and references therein for more information on this collection.
- Putative Enhancers (Vienna Tile GAL4 lines) Putative enhancers used to generate fly stocks carrying GAL4 transgenic constructs. Stocks are available from the Vienna Drosophila Resource Center. Each feature glyph is hyperlinked to a FlyBase Sequence Feature report with supporting data and links to the related FlyBase Stock Report. See the VDRC-VT FlyBase Dataset Report and references therein for more information on this collection.
sgRNA Reagents
- TRiP-OE-VPR sgRNAs (overexpression) Genomic sequences used as short guide RNAs in constructs designed to target genes for CRISPR/Cas9-VPR-based overexpression. Two nearby sgRNAs are used in a single construct to target the upstream region of a given gene. Each glyph indicating the extent of sgRNAs (arrows indicate orientation) is hyperlinked to a FlyBase Sequence Feature Report containing associated data. See the TRiP-OE-VPR FlyBase Dataset Report and references therein for more information.
- TRiP-OE-flySAM sgRNAs (overexpression) Genomic sequences used as short guide RNAs in constructs designed to target genes for overexpression by a "synergistic activation mediator" (flySAM). These sgRNAs have been fused to MS2 loops for recruitment of MCP-tagged transcriptional activator, in addition to the sgRNA-recruited nuclease-dead Cas9 fused to a transcriptional activator. Two nearby sgRNAs are used in a single construct to target the upstream region of a given gene. Each glyph indicating the extent of sgRNAs (arrows indicate orientation) is hyperlinked to a FlyBase Sequence Feature Report containing associated data. See the TRiP-OE-flySAM and TRiP-OE-flySAM.dCas9 FlyBase Dataset Reports and references therein for more information.
- TRiP-KO sgRNAs (knockout) Genomic sequences used as short guide RNAs in constructs designed to target genes for CRISPR/Cas9-based mutation. Each glyph indicating the extent of sgRNAs (arrows indicate orientation) is hyperlinked to a FlyBase Sequence Feature Report containing associated data. See the TRiP-KO FlyBase Dataset Report and references therein for more information.
- Weizmann KO sgRNAs (knockout) Genomic sequences used as short guide RNAs in constructs designed to target genes for CRISPR/Cas9-based mutation. Only some of these sgRNAs have been incorporated into flies, while the rest are available as plasmid stocks. Each glyph indicating the extent of sgRNAs (arrows indicate orientation) is hyperlinked to a FlyBase Sequence Feature Report containing associated data. See the Weizmann_CRISPR_reagents FlyBase Dataset Report and references therein for more information.
- Heidelberg CFD KO sgRNAs (conditional knockout) Genomic sequences used as short guide RNAs in UAS constructs designed to target genes for CRISPR/Cas9-based mutation. Each glyph indicating the extent of sgRNAs (arrows indicate orientation) is hyperlinked to a FlyBase Sequence Feature Report containing associated data. See the HD_CFD FlyBase Dataset Report and references therein for more information.
- Predicted sgRNAs Sequences predicted to be suitable as sgRNAs for genome engineering. These comprise all possible 23-mers in the D. melanogaster Release 6 genome assembly that have 1) a 3-prime PAM sequence (NGG) and 2) a 15 bp sequence that is unique to the genome (including the PAM sequence). The seed score for each sgRNA, which ranges from 12 to 15 bases, indicates the uniqueness of the base pairing end of the sgRNA (excluding the PAM sequence). The frameshift score is the percent of frameshift changes predicted by micro-homology around the target site; the higher the score, the better suited the sgRNA for knockout. Because base pair mismatches can be tolerated outside the seed region, predicted sgRNAs were evaluated for potential off-target sites allowing for 3, 4 or 5 mismatches; potential sgRNA sequences are sorted into five browser tracks based on their predicted specificity at different stringencies. Each glyph indicating the extent of sgRNAs (arrows indicate orientation) is hyperlinked to a pop-up window containing associated data. See the DRSC_sgRNA_designs FlyBase Dataset Report and references therein for more information.
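The first criterion for predicted sgRNAs (a 23-mer ending in an NGG PAM) can be illustrated with a short scan over both strands of a sequence. This is a sketch only and not the DRSC design pipeline: the function names are hypothetical, coordinates are 0-based and half-open by assumption, and the uniqueness, seed score, frameshift score and off-target evaluations described above are not implemented.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# 20-nt protospacer followed by a 3-nt NGG PAM (23 nt in total). The
# lookahead lets overlapping candidates be reported.
SGRNA_PATTERN = re.compile(r"(?=([ACGT]{21}GG))")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_candidate_sgrnas(seq):
    """Return (start, end, strand, 23-mer) tuples for both strands.

    start and end are 0-based, half-open coordinates on the plus strand.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for m in SGRNA_PATTERN.finditer(seq):
        hits.append((m.start(), m.start() + 23, "+", m.group(1)))
    for m in SGRNA_PATTERN.finditer(reverse_complement(seq)):
        # Convert the minus-strand offset back to plus-strand coordinates.
        start = n - (m.start() + 23)
        hits.append((start, start + 23, "-", m.group(1)))
    return hits
```

Candidates found this way would still need to pass the genome-wide uniqueness and off-target checks before they could be assigned to one of the five specificity tracks.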
RNAi Reagents
- DRSC RNAi amplicons DNA fragments amplified from D. melanogaster genomic DNA (OregonR) by the Drosophila Genomics Resource Center (DGRC), using gene-specific primers made by Incyte and designed to target transcribed regions with minimal sequence similarity to other genes. Used for the DGRC genome-wide RNAi library. Each glyph indicating the extent of the amplicon (arrows indicate orientation) is hyperlinked to a pop-up window containing associated data. See the DRSC_dsRNA FlyBase Dataset Report and references therein for more information.
- VDRC RNAi amplicons DNA segments used to create RNAi libraries from the Vienna Drosophila RNAi Center. This track is a composite of three VDRC RNAi library collections: GD, KK and shRNA. Each glyph indicating the extent of the amplicon (arrows indicate orientation) is hyperlinked to a pop-up window containing associated data. See the VDRC-GD, VDRC-KK and VDRC-SH FlyBase Dataset Reports and references therein for more information.
- TRiP RNAi amplicons Extents of the amplicons are indicated with an orange bar. See the TRiP-1, TRiP-2, TRiP-3, TRiP-4 and TRiP-5 FlyBase Dataset Reports and references therein for more information.
- BKNA RNAi amplicons RNAi amplicons from the GenomeRNAi database. Extents of the amplicons are indicated with an orange bar. See the HD2 FlyBase Dataset Report and references therein for more information.
- HFA RNAi amplicons Extents of the amplicons are indicated with an orange bar. See the HFA FlyBase Dataset Report and references therein for more information.
- NIG-Fly RNAi amplicons Extents of the amplicons are indicated with an orange bar. See the NIG_RNAi_Fly-1 FlyBase Dataset Report and references therein for more information.
Rescue fragments
Locations of transgenic rescue fragments reported in the literature. Labeled with FlyBase allele symbol designation. Each feature glyph is linked to the related FlyBase Allele Report which contains genetic data.
Genomic Libraries
- Pacman Chori-321_BAC A BAC library containing genomic DNA fragments (average size of 83 kb) in the attB-P[acman]-CmR-BW vector. The glyphs indicate the extent of the BAC, inferred from the mapping of the sequenced BAC insert ends. Each feature glyph is hyperlinked to a FlyBase Clone Report with supporting data. See the CHORI-321_BAC FlyBase Dataset Report and references therein for more information.
- Pacman Chori-322_BAC A BAC library containing genomic DNA fragments (average size of 21 kb) in the attB-P[acman]-CmR-BW vector. The glyphs indicate the extent of the BAC, inferred from the mapping of the sequenced BAC insert ends. Each feature glyph is hyperlinked to a FlyBase Clone Report with supporting data. See the CHORI-322_BAC FlyBase Dataset Report and references therein for more information.
Microarrays
- Affymetrix v1 Oligonucleotides (25-mers) designed by Affymetrix to correspond to annotated transcripts in D. melanogaster. Used for the Affymetrix GeneChip Drosophila Genome Array DrosGenome1 microarray, release date February 19, 2002. Each glyph is hyperlinked to a pop-up window containing sequence data. See the Affymetrix_GeneChip_v1 FlyBase Dataset Report for more information.
- Affymetrix v2 Oligonucleotides (25-mers) designed by Affymetrix to correspond to annotated transcripts in D. melanogaster. Used for the Affymetrix GeneChip Drosophila Genome 2.0 Array, release date July 1, 2004. Each glyph is hyperlinked to a pop-up window containing sequence data. See the Affymetrix_GeneChip_v2 FlyBase Dataset Report for more information.
- DGRC-1 amplicons DNA fragments amplified from D. melanogaster genomic DNA (OregonR) by the Drosophila Genomics Resource Center (DGRC), using gene-specific primers made by Incyte and designed to target transcribed regions with minimal sequence similarity to other genes. Used for the DGRC-D.melanogaster-DGRC1-15552-v5 amplicon microarray, release date June 2, 2006 (original release of v1, May 2004). Each glyph is hyperlinked to a pop-up window containing sequence data. See the DGRC-1 FlyBase Dataset Report for more information.
- DGRC-2 oligos "Long oligos" (65-69mers) designed to correspond to annotated transcripts in D. melanogaster (r4.3); synthesized by FlyChip in collaboration with the International Drosophila Array Consortium (INDAC) and the Drosophila Genomics Resource Center (DGRC). Used for the DGRC-D.melanogaster-DGRC2-17328-v1 oligonucleotide microarray, release date June 20, 2006. Each glyph is hyperlinked to a pop-up window containing sequence data. See the DGRC-2 FlyBase Dataset Report for more information.
Genome Level Features
Chromatin Features
- Chromatin Domains (5-state model, Kc cells) Whole-genome DamID binding profiles of 53 chromatin proteins in Drosophila Kc167 cells were generated and/or analyzed. In the same array platform, ChIP-chip profiles of histone H3, H1, H3K9me2, H3K27me3, H3K4me2, and H3K79me3 were obtained. These were correlated with gene expression, which was measured by RNA-tag profiling. See the Chromatin_types_NKI.Kc167 FlyBase Dataset Report and references therein for more information. The 5-state model track legend is as follows.
- Chromatin Domains (9-state model, S2 cells) Demarcation of chromatin domains of nine major types based on analysis of 18 histone modification profiles. See the Chromatin_types_mE1.S2 FlyBase Dataset Report and references therein for more information.
- Chromatin Domains (9-state model, BG3 cells) Demarcation of chromatin domains of nine major types based on analysis of 18 histone modification profiles. See the Chromatin_types_mE2.BG3 FlyBase Dataset Report and references therein for more information. The 9-state model track legend is as follows.
- His3 modifications (ChIP-seq, embryonic mesoderm) Peak calls for ChIP-seq of histone modifications obtained from purified embryonic mesodermal nuclei. Data are available for H3K4me1, H3K4me3, H3K27Ac, H3K27me3, H3K36me3, H3K79me3 and RNA Pol II and are displayed as one compiled track. Data were kindly provided by Eileen Furlong's lab (EMBL), as published in Bonn et al., 2012. Data analysis methods by Matthias Monfort (of the Furlong group) are described here. Each feature glyph is hyperlinked to a FlyBase Sequence Feature Report with supporting data. See the below FlyBase Dataset Reports and references therein for more information.
Transcriptional Regulatory Elements
- Insulators (modENCODE, class I) Insulators identified by ChIP-chip of Cp190, BEAF-32 and CTCF in embryos. Class I insulators are defined as having at least two binding sites among Cp190, BEAF-32 and CTCF with peak overlap of less than 250bp. See the Insulator_Class_I.mE01 FlyBase Dataset Report and references therein for more information. Each diamond-shaped feature glyph displays a pop-up window with basic information when hovered over and links to a FlyBase Sequence Feature Report with further supporting data.
- Insulators (modENCODE, class II) Insulators identified by ChIP-chip of su(Hw) in embryos. Class II insulators are defined as having only su(Hw) binding sites. See the Insulator_Class_II.mE01 FlyBase Dataset Report and references therein for more information. Each diamond-shaped feature glyph displays a pop-up window with basic information when hovered over and links to a FlyBase Sequence Feature Report with further supporting data.
- Putative PREs (modENCODE) Genomic sequences identified by ChIP-chip of HDAC and histone modifications as putative polycomb response elements (silencers) in embryos. See the mE1_HDAC_PRE FlyBase Dataset Report and references therein for more information. Each feature glyph is hyperlinked to a FlyBase Sequence Feature Report with supporting data.
- Transcriptional Regulatory Regions (REDfly) Locations of regulatory regions reported in the literature, as compiled by FlyBase and/or REDfly. See the REDfly CRMs FlyBase Dataset Report and references therein for more information. Each feature glyph is hyperlinked to a FlyBase Sequence Feature Report with supporting data.
Transcription Factor Binding Sites
- Transcription Factor Binding Sites (REDfly) Locations of protein binding sites reported in the literature, as compiled by FlyBase and/or REDfly. See the REDfly TFBSs FlyBase Dataset Report and references therein for more information. Each feature glyph is hyperlinked to a FlyBase Sequence Feature Report with supporting data.
- Transcription Factor Binding Sites (Furlong lab compiled, ChIP-chip, mesodermal TFs) ChIP-chip binding peak calls for 13 TFs that control mesodermal development at various points of embryogenesis. There are 28 samples in all, compiled into five tracks by stage of embryogenesis. These data were kindly provided by Eileen Furlong's lab (EMBL), comprising several studies: Zinzen et al., 2009, Bonn et al., 2012, Junion et al., 2012, Rembold et al., 2014 and Ciglar et al., 2014. These 28 ChIP-chip datasets were processed in parallel by Matthias Monfort of the Furlong group, as described here. Each feature glyph is hyperlinked to a FlyBase Sequence Feature Report with supporting data. See the below FlyBase Dataset Reports and references therein for more information.
- ChIP (embryo, 2-4h) - Mef2 sna tin twi
- ChIP (embryo, 4-6h) - Doc2 pMad Mef2 pan pnr slp1 tin twi
- ChIP (embryo, 6-8h) - bap bin Doc2 lmd pMad Mef2 pan pnr slp1 tin ttk twi
- ChIP-chip_bap_E6-8h_organism
- ChIP-chip_bin_E6-8h_organism
- ChIP-chip_Doc2_E6-8h_organism
- ChIP-chip_lmd_E6-8h_organism
- ChIP-chip_pMad_E6-8h_organism
- ChIP-chip_Mef2_E6-8h_organism
- ChIP-chip_pan_E6-8h_organism
- ChIP-chip_pnr_E6-8h_organism
- ChIP-chip_slp1_E6-8h_organism
- ChIP-chip_tin_E6-8h_organism
- ChIP-chip_ttk_E6-8h_organism
- ChIP-chip_twi_E6-8h_organism
- ChIP (embryo, 8-10h) - bin Mef2
- ChIP (embryo, 10-12h) - bin Mef2
- Transcription Factor HOT spot analysis (modENCODE compiled, ChIP-chip, whole embryo) Genomic sequences identified as unique regions of transcription factor (TF) binding using HOT spot analysis (HSA); one or many TFs may bind in a given region. A synthesis of ChIP-chip data sets for 41 different transcription factors. TF binding profiles used in this analysis were assayed at early embryo stages. Each glyph is hyperlinked to a FlyBase Sequence Feature Report with supporting data. See the mE1_TFBS_HSA FlyBase Dataset Report and references therein for more information.
- Zinc Finger TF Binding Sites (modENCODE compiled, ChIP-chip, whole embryo) Binding sites for transcription factors that contain one or more zinc finger domains. Each feature glyph is hyperlinked to a FlyBase Sequence Feature Report with supporting data. See the below FlyBase Dataset Reports and references therein for more information.
- Homeodomain TF Binding Sites (modENCODE compiled, ChIP-chip, whole embryo) Binding sites for transcription factors that contain one or more homeodomains. Each feature glyph is hyperlinked to a FlyBase Sequence Feature Report with supporting data. See the below FlyBase Dataset Reports and references therein for more information.
- Helix-loop-helix TF Binding Sites (modENCODE compiled, ChIP-chip, whole embryo) Binding sites for transcription factors that contain one or more helix-loop-helix domains. Each feature glyph is hyperlinked to a FlyBase Sequence Feature Report with supporting data. See the below FlyBase Dataset Reports and references therein for more information.
- BTB/POZ ChIP TF Binding Sites (modENCODE compiled, ChIP-chip, whole embryo) Binding sites for transcription factors that contain one or more BTB/POZ domains. Each feature glyph is hyperlinked to a FlyBase Sequence Feature Report with supporting data. See the below FlyBase Dataset Reports and references therein for more information.
- Other classes TF Binding Sites (modENCODE compiled, ChIP-chip, whole embryo) Binding sites for transcription factors that do not fall into one of the other categories. Each feature glyph is hyperlinked to a FlyBase Sequence Feature Report with supporting data. See the below FlyBase Dataset Reports and references therein for more information.
Other Sequence Elements
- Origins of replication Genome profile of early activating origins of replication in Kc, BG3 and S2 cell lines identified by BrdU label/RepliSeq. See the mE_Early_Replication_Origins_cells FlyBase Dataset Report and references therein for more information. Each feature glyph is hyperlinked to a FlyBase Sequence Feature Report with supporting data.
Transcript Level Features
cDNA
- Sequenced cDNAs Mostly D. melanogaster cDNA sequences from large-scale submissions to sequence databases. Shows the exon (wider bars) and intron (black lines) structure, and direction of transcription. Each cDNA glyph is hyperlinked to its GenBank nucleotide entry. cDNA sequences are aligned to release 6 by NCBI and submitted to FlyBase. A few genomic DNA submissions, including third party submissions, are included in this tier.
- Inferred transcript sequences Mostly D. melanogaster sequences designated as Third Party Annotation:Inferential. Each glyph is hyperlinked to a pop-up window containing the ID, position, length, DNA sequence, and more. Sequences are aligned to release 6 by NCBI and submitted to FlyBase. Some genomic DNA submissions are included in this tier.
EST (Expressed Sequence Tag)
Partial sequence of a cDNA. Shows the exon (wider bars) and intron (black lines) structure, and direction of transcription. When zoomed out to greater than 100kb, a density plot is shown. Each cDNA glyph is hyperlinked to its GenBank nucleotide entry. Sequences are aligned to release 6 by NCBI and submitted to FlyBase.
RNA-Seq long reads
Long read RNA-Seq sequences from Gonzalez et al., 2023, as reported in the GTF file available on the Hilgers Lab website. This study identified approximately 60,000 distinct transcript sequences from combined adult ovary, adult head and whole embryonic tissues (whole cell extracts were used). FlyBase analyzed these long-read sequences for coding potential by comparing them to FlyBase polypeptide coding sequences (CDS), and sorted them into four tracks:
- RNA-Seq long reads encoding known CDS Sequences that encode a known (FlyBase-annotated) CDS.
- RNA-Seq long reads encoding novel CDS Sequences that encode a novel CDS (starting and ending with known/annotated start and stop codons).
- RNA-Seq long reads encoding partial CDS Sequences that encode a partial CDS (starting with an annotated start codon or ending with an annotated stop codon).
- RNA-Seq long reads encoding no known CDS Sequences that do not encode any CDS with an annotated start or annotated stop codon.
For all four tracks above, the sequence designation is displayed just below the left end of the sequence glyph: UTRs are shown in blue, CDSs are shown in yellow, and non-coding transcript exons are shown in red.
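The four-way sort described above can be read as a small decision procedure. The following is a minimal Python sketch of that logic only, not FlyBase's actual pipeline: the `CandidateCDS` type, the `classify` function, and the representation of annotations as sets of coordinates are all hypothetical and introduced purely for illustration.

```python
from typing import NamedTuple, Optional, Set, Tuple

Exons = Tuple[Tuple[int, int], ...]


class CandidateCDS(NamedTuple):
    """Hypothetical summary of the coding region found in one long read."""
    start: Optional[int]  # position of the start codon, or None if absent
    stop: Optional[int]   # position of the stop codon, or None if absent
    exons: Exons          # coding exon intervals, e.g. ((100, 200), (300, 380))


def classify(cds: CandidateCDS,
             annotated_cds: Set[Exons],
             annotated_starts: Set[int],
             annotated_stops: Set[int]) -> str:
    """Assign a long read to one of the four tracks described above."""
    # Known CDS: the whole coding structure matches an annotated CDS.
    if cds.exons in annotated_cds:
        return "known CDS"
    has_start = cds.start in annotated_starts
    has_stop = cds.stop in annotated_stops
    # Novel CDS: annotated start AND annotated stop, but a new structure.
    if has_start and has_stop:
        return "novel CDS"
    # Partial CDS: only one annotated end is present.
    if has_start or has_stop:
        return "partial CDS"
    # No annotated start or stop codon at all.
    return "no known CDS"


# Toy annotation set (made-up coordinates).
ANNOTATED_CDS = {((100, 200), (300, 380))}
ANNOTATED_STARTS = {100}
ANNOTATED_STOPS = {380}

example = CandidateCDS(start=100, stop=380, exons=((100, 250), (300, 380)))
label = classify(example, ANNOTATED_CDS, ANNOTATED_STARTS, ANNOTATED_STOPS)
```

In this toy case the read shares the annotated start and stop codons but has a different exon structure, so it would land in the "novel CDS" track.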
Transcription Start Sites
- Transcription start sites (modENCODE), embryo Transcription start site (TSS) regions identified by integrative analysis of ESTs, CAGE or RLM-RACE; glyphs are dotted lines with TSS peak profile overlaid. The arrow points in the direction of transcription. Hover over glyph for a pop-up window with basic information, click to go to the relevant Sequence Feature Report. Note: data for embryonic stages only. See the mE_Transcription_Start_Sites FlyBase Dataset Report and references therein for more information.
- Transcription start sites (RAMPAGE), peak calls Transcription start site regions (peak calls) identified by RAMPAGE-Seq across 36 stages of development. The arrow points in the direction of transcription. Hyperlinked to the relevant FlyBase Sequence Feature Report. See the TSS_RAMPAGE FlyBase Dataset Report and references therein for more information.
- Transcription start sites (RAMPAGE), early embryo 0-12hr, stranded RNA-Seq Profile of capped transcript 5' ends observed by RAMPAGE-Seq for 12 stages of early embryogenesis. See the TSS_RAMPAGE FlyBase Dataset Report and references therein for more information.
- Transcription start sites (RAMPAGE), late embryo 13-24hr, stranded RNA-Seq Profile of capped transcript 5' ends observed by RAMPAGE-Seq for 12 stages of late embryogenesis. See the TSS_RAMPAGE FlyBase Dataset Report and references therein for more information.
- Transcription start sites (RAMPAGE), larva/pupa/adult, stranded RNA-Seq Profile of capped transcript 5' ends observed by RAMPAGE-Seq for 12 larval, pupal or adult stages. See the TSS_RAMPAGE FlyBase Dataset Report and references therein for more information.
- Transcription start sites (MachiBase), stranded RNA-Seq Profile of capped transcript 5' ends observed by MachiBase CAGE-Seq for embryonic, larval, adult male and female (young or old) and S2 cell line samples. See the TSS_MachiBase FlyBase Dataset Report and references therein for more information.
Please note that the signal shown for the Transcription Start Site data (RAMPAGE and MachiBase) is one base off: i.e., the signal is shown one base to the left of where it should be. In the example to the right, the signal should map to the first base (3R:6864324, A) of the ftz transcript, but is instead shown one base to the left (3R:6864323, C). This is a known bug and we're working to fix this.
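While this bug is present, a position read off a RAMPAGE or MachiBase profile can be corrected by shifting it one base to the right. A minimal Python sketch, using only the ftz example given above (the function name is hypothetical):

```python
def corrected_tss_position(displayed_pos: int) -> int:
    """Return the true position for a TSS signal displayed one base to the left."""
    return displayed_pos + 1


# ftz example from the note above: signal displayed at 3R:6864323
# should map to the first base of the transcript, 3R:6864324.
ftz_true_pos = corrected_tss_position(6864323)
```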
RNA-seq Exon Junctions
Orientation of the junction is indicated with an arrow. The portions of exons confirmed by RNA-Seq reads are represented by blue bars, while introns are represented as lines. Hover over each exon junction glyph to view a pop-up window containing read counts from modENCODE and/or Baylor datasets, and click to go to a FlyBase exon junction Sequence Feature Report. See the BCM_1_RNAseq_junctions and modENCODE_mRNA-Seq_U_junctions FlyBase Dataset Reports and references therein for more information.
Polyadenylation Sites
D. melanogaster polyadenylation sites identified in various tissues. Arrows indicate orientation (i.e., pointing toward the side to which the polyA extension is added). Each glyph is hyperlinked to a FlyBase Sequence Feature Report with supporting data. See pA_sites_Lai and pA_sites_Moreira FlyBase Dataset Reports and references therein for more information.
RNA Editing Sites
A-to-I RNA editing sites. Each glyph is hyperlinked to a FlyBase Sequence Feature Report containing read counts by developmental stage. See mE_A-to-I_RNA_Editing_Sites and Rosbash_Adult_Head_A-to-I_Editing_Sites FlyBase Dataset Reports and references therein for more information.
Protein Level Features
Protein Domains
- Protein domains (Pfam) Pfam protein domains identified in annotated proteins, as obtained from InterPro, are mapped to the genome. Hyperlinked to a Pfam report for the domain.
- Protein domains (SMART) SMART protein domains identified in annotated proteins are mapped to the genome. Hyperlinked to an InterPro report for the domain.
Peptide Sequences
- Developmental Proteomes Alignment of peptide sequences identified in various developmental proteomes to the reference genome. Each peptide glyph is hyperlinked to a FlyBase Sequence Feature Report with the peptide sequence. Data obtained from: Casas-Vila et al., 2017, Cao et al., 2020.
- Peptide Atlas Alignment of peptide sequences determined by mass spectrometry, derived from polypeptides isolated from the sequenced strain at various developmental stages and from various tissues. Contributed by the Center for Model Organism Proteomes, SystemsX and Research Priority Project of the University of Zurich, Switzerland. For more information, see Peptide Atlas.
Expression
RNA-Seq
These tracks contain RNA-Seq expression data for several different stages of development, specific tissues, types of tissue culture cells, or treatment conditions. Different samples are presented in different colors. The tracks are labeled with the sample identity. Some labels are obscured by the RNA-Seq signal. Moving laterally along the chromosome should take you to a visible label.
- modENCODE transcriptomes
- Developmental stages, unique reads modENCODE_mRNA-Seq_U Dataset report. These tracks contain RNA-Seq expression data for several different stages of development.
- Tissues
- Fat body and salivary glands
- Imaginal disc and other carcass
- Gonads and male accessory glands
- Tissue culture cells, by strand (modENCODE Transcription Group) modENCODE_mRNA-Seq_cell.B Dataset report.
- Treatments/Conditions modENCODE_mRNA-Seq_treatments Dataset report.
- Knoblich lab- L3 CNS transcriptomes
- L3 CNS neuron
- L3 CNS neuroblast
- Oliver lab- SRA Aggregated RNA-Seq
- Stranded RNA-Seq coverage data from Justin Fear and Brian Oliver that combines data from thousands of high quality SRA RNA-Seq accessions. These data provide an "average" view of the transcriptome. The exceptional read depth provides insight into regions of low transcription. Tracks are offered with signal cut-off set to 100 (high sensitivity), 1,000 (medium sensitivity) or 10,000 (low sensitivity) for viewing regions of low, medium or high signal, respectively. Signal mapping to the genomic plus strand is shown on top, with signal mapping to the genomic minus strand shown below. See the Oliver_aggregated_RNA-Seq_profile Dataset report for more details.
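To illustrate why the three cut-offs suit different signal ranges, here is a minimal Python sketch. It assumes that the cut-off simply caps the displayed signal at the stated value; that interpretation, the `CUTOFFS` table and the `clip_signal` function are illustrative assumptions, not a description of how the tracks are actually generated.

```python
# Cut-off values as listed for the three sensitivity settings.
CUTOFFS = {"high": 100, "medium": 1_000, "low": 10_000}


def clip_signal(values, sensitivity):
    """Cap each coverage value at the cut-off for the chosen sensitivity."""
    cap = CUTOFFS[sensitivity]
    return [min(v, cap) for v in values]


# Made-up coverage values spanning low, medium and high signal.
coverage = [5, 250, 40_000]
high_view = clip_signal(coverage, "high")  # low-signal regions stay resolved
low_view = clip_signal(coverage, "low")    # high-signal regions stay resolved
```

Under this assumption, the high-sensitivity view keeps differences among weakly transcribed regions visible but saturates everywhere else, while the low-sensitivity view preserves differences among strongly transcribed regions.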
- FlyAtlas2 transcriptomes RNA-seq from larval and adult male or female tissues grouped into the four subsets below. See the FlyAtlas2 FlyBase Dataset report and references therein for further information.
- Nervous system
- RNA-Seq_Profile_FlyAtlas2_L3_CNS
- RNA-Seq_Profile_FlyAtlas2_Adult_Female_Brain
- RNA-Seq_Profile_FlyAtlas2_Adult_Male_Brain
- RNA-Seq_Profile_FlyAtlas2_Adult_Female_Head
- RNA-Seq_Profile_FlyAtlas2_Adult_Male_Head
- RNA-Seq_Profile_FlyAtlas2_Adult_Female_Eye
- RNA-Seq_Profile_FlyAtlas2_Adult_Male_Eye
- RNA-Seq_Profile_FlyAtlas2_Adult_Female_ThoracicoAbdominalGanglion
- RNA-Seq_Profile_FlyAtlas2_Adult_Male_ThoracicoAbdominalGanglion
- Digestive system
- RNA-Seq_Profile_FlyAtlas2_Adult_Female_Crop
- RNA-Seq_Profile_FlyAtlas2_Adult_Male_Crop
- RNA-Seq_Profile_FlyAtlas2_L3_Midgut
- RNA-Seq_Profile_FlyAtlas2_Adult_Female_Midgut
- RNA-Seq_Profile_FlyAtlas2_Adult_Male_Midgut
- RNA-Seq_Profile_FlyAtlas2_L3_Hindgut
- RNA-Seq_Profile_FlyAtlas2_Adult_Female_Hindgut
- RNA-Seq_Profile_FlyAtlas2_Adult_Male_Hindgut
- RNA-Seq_Profile_FlyAtlas2_Adult_Female_RectalPad
- RNA-Seq_Profile_FlyAtlas2_Adult_Male_RectalPad
- RNA-Seq_Profile_FlyAtlas2_L3_SalivaryGland
- RNA-Seq_Profile_FlyAtlas2_Adult_Female_SalivaryGland
- RNA-Seq_Profile_FlyAtlas2_Adult_Male_SalivaryGland
- Other systems and whole organism
- RNA-Seq_Profile_FlyAtlas2_L3_Trachea
- RNA-Seq_Profile_FlyAtlas2_L3_FatBody
- RNA-Seq_Profile_FlyAtlas2_Adult_Female_FatBody
- RNA-Seq_Profile_FlyAtlas2_Adult_Male_FatBody
- RNA-Seq_Profile_FlyAtlas2_L3_MalpighianTubule
- RNA-Seq_Profile_FlyAtlas2_Adult_Female_MalpighianTubule
- RNA-Seq_Profile_FlyAtlas2_Adult_Male_MalpighianTubule
- RNA-Seq_Profile_FlyAtlas2_L3_Carcass
- RNA-Seq_Profile_FlyAtlas2_Adult_Female_Carcass
- RNA-Seq_Profile_FlyAtlas2_Adult_Male_Carcass
- RNA-Seq_Profile_FlyAtlas2_L3_Whole
- RNA-Seq_Profile_FlyAtlas2_Adult_Female_Whole
- RNA-Seq_Profile_FlyAtlas2_Adult_Male_Whole
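The FlyAtlas2 track labels listed above follow a regular pattern: a fixed prefix, a stage (L3 or Adult), a sex for adult samples, and a tissue name. The small-RNA FlyAtlas2 labels in the Small RNA-Seq section use the same pattern with a microRNA-Seq prefix. A minimal Python sketch for splitting such labels into their parts is given below; the regular expression is inferred from the labels on this page and the `parse_label` function is hypothetical, so treat it as an illustration rather than an official naming specification.

```python
import re

# Pattern inferred from the labels listed on this page (an assumption).
LABEL_PATTERN = re.compile(
    r"^(?P<assay>RNA-Seq|microRNA-Seq)_Profile_FlyAtlas2_"
    r"(?P<stage>L3|Adult)_"
    r"(?:(?P<sex>Female|Male)_)?"  # sex appears only in adult labels
    r"(?P<tissue>\w+)$"
)


def parse_label(label):
    """Split a FlyAtlas2 track label into assay, stage, sex and tissue."""
    match = LABEL_PATTERN.match(label)
    if match is None:
        raise ValueError(f"Unrecognized FlyAtlas2 track label: {label}")
    return match.groupdict()


adult = parse_label("RNA-Seq_Profile_FlyAtlas2_Adult_Female_Brain")
larval = parse_label("RNA-Seq_Profile_FlyAtlas2_L3_CNS")
```

For larval labels no sex is given, so the `sex` entry of the result is `None`.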
Small RNA-Seq
These tracks contain RNA-Seq expression data for small RNA species (<30nt) that have been consolidated from various independent studies by sample type (developmental stage, tissue or cell line). Different samples are presented in different colors. The tracks are labeled with the sample identity. Some labels are obscured by the RNA-Seq signal. Moving laterally along the chromosome should take you to a visible label.
- Lai lab transcriptomes
- Developmental stages, stranded small RNA-Seq (Lai lab) See Lai_shortRNA-Seq_profiles_development Dataset Report.
- Tissues, stranded small RNA-Seq (Lai lab) See Lai_shortRNA-Seq_profiles_tissues Dataset Report.
- Cell lines
- Schneider and embryonic-derived See Lai_shortRNA-Seq_profiles_cells Dataset Report.
- Imaginal disc-derived See Lai_shortRNA-Seq_profiles_cells Dataset Report.
- CNS-, ovary- and blood-derived See Lai_shortRNA-Seq_profiles_cells Dataset Report.
- FlyAtlas2 transcriptomes Small RNA-seq from larval and adult male or female tissues grouped into the four subsets below. See the FlyAtlas2 FlyBase Dataset report and references therein for further information.
- Nervous system
- microRNA-Seq_Profile_FlyAtlas2_L3_CNS
- microRNA-Seq_Profile_FlyAtlas2_Adult_Female_Brain
- microRNA-Seq_Profile_FlyAtlas2_Adult_Male_Brain
- microRNA-Seq_Profile_FlyAtlas2_Adult_Female_Head
- microRNA-Seq_Profile_FlyAtlas2_Adult_Male_Head
- microRNA-Seq_Profile_FlyAtlas2_Adult_Female_Eye
- microRNA-Seq_Profile_FlyAtlas2_Adult_Male_Eye
- microRNA-Seq_Profile_FlyAtlas2_Adult_Female_ThoracicoAbdominalGanglion
- microRNA-Seq_Profile_FlyAtlas2_Adult_Male_ThoracicoAbdominalGanglion
- Digestive system
- microRNA-Seq_Profile_FlyAtlas2_Adult_Female_Crop
- microRNA-Seq_Profile_FlyAtlas2_Adult_Male_Crop
- microRNA-Seq_Profile_FlyAtlas2_L3_Midgut
- microRNA-Seq_Profile_FlyAtlas2_Adult_Female_Midgut
- microRNA-Seq_Profile_FlyAtlas2_Adult_Male_Midgut
- microRNA-Seq_Profile_FlyAtlas2_L3_Hindgut
- microRNA-Seq_Profile_FlyAtlas2_Adult_Female_Hindgut
- microRNA-Seq_Profile_FlyAtlas2_Adult_Male_Hindgut
- microRNA-Seq_Profile_FlyAtlas2_Adult_Female_RectalPad
- microRNA-Seq_Profile_FlyAtlas2_Adult_Male_RectalPad
- microRNA-Seq_Profile_FlyAtlas2_L3_SalivaryGland
- microRNA-Seq_Profile_FlyAtlas2_Adult_Female_SalivaryGland
- microRNA-Seq_Profile_FlyAtlas2_Adult_Male_SalivaryGland
- Reproductive system
- Other systems and whole organism
- microRNA-Seq_Profile_FlyAtlas2_L3_Trachea
- microRNA-Seq_Profile_FlyAtlas2_L3_FatBody
- microRNA-Seq_Profile_FlyAtlas2_Adult_Female_FatBody
- microRNA-Seq_Profile_FlyAtlas2_Adult_Male_FatBody
- microRNA-Seq_Profile_FlyAtlas2_L3_MalpighianTubule
- microRNA-Seq_Profile_FlyAtlas2_Adult_Female_MalpighianTubule
- microRNA-Seq_Profile_FlyAtlas2_Adult_Male_MalpighianTubule
- microRNA-Seq_Profile_FlyAtlas2_L3_Carcass
- microRNA-Seq_Profile_FlyAtlas2_Adult_Female_Carcass
- microRNA-Seq_Profile_FlyAtlas2_Adult_Male_Carcass
- microRNA-Seq_Profile_FlyAtlas2_L3_Whole
- microRNA-Seq_Profile_FlyAtlas2_Adult_Female_Whole
- microRNA-Seq_Profile_FlyAtlas2_Adult_Male_Whole
Gene Predictions
NCBI Gnomon, 2006 Predicted coding regions generated via a hidden Markov model using transcript alignment constraints and protein hit information, if available; allows prediction of alternatively spliced isoforms; submitted by J. Ostell. Each glyph is hyperlinked to a pop-up window containing supporting data. See the NCBI Gnomon Description Page for more information.
PhyloCSF (CONGO) Exon prediction; region of sequence conservation across multiple Drosophila species, with a pattern of conservation indicative of protein-coding and termini consistent with exon structure (start, splice or stop); submitted by M. Lin and M. Kellis. Each glyph is hyperlinked to a pop-up window containing sequence data. See related publications: Lin MF, Carlson, JW et al. (2007), Lin MF, Jungreis I, and Kellis M. (2011) for more information.