Difference between revisions of "FlyBase:Polypeptide Report"

Revision as of 22:55, 14 December 2017

Last Updated: 21 May 2015

The Polypeptide Report provides information on individual annotated polypeptides. Annotated polypeptides are derived from annotated transcripts by calculating the open reading frame defined by the annotated translation start and stop sites. This generally represents the largest possible open reading frame, assuming this is consistent with conservation among the Drosophila species, but there are exceptions. In over 150 cases in D. melanogaster, a downstream ATG is annotated based on the PhyloCSF exon prediction algorithm, and a small number of genes have been annotated with a non-canonical translation start. These exceptions are noted in comments attached to the relevant transcripts. Since an annotated polypeptide is created for every annotated coding transcript, multiple annotated polypeptides for a given gene may be identical in amino acid sequence. Annotated polypeptides may or may not correspond exactly to polypeptides described in the literature. For more information about curated polypeptides for a given gene, go to subsection Polypeptide Data, field Reported protein sizes in the Gene Report for that gene.
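As an illustration of the derivation described above, the following is a minimal sketch of translating a transcript between an annotated start and stop site. It is not FlyBase's pipeline: the function name `translate_annotated_orf`, the 0-based coordinate convention and the example sequence are assumptions made for this sketch, and it uses the standard genetic code without any special handling of non-canonical starts.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_annotated_orf(transcript: str, start: int, stop_end: int) -> str:
    """Translate transcript[start:stop_end] into an amino acid sequence.

    Assumed convention (hypothetical, for this sketch only): `start` is the
    0-based index of the first base of the annotated start codon, and
    `stop_end` is the index one past the last base of the stop codon.
    """
    cds = transcript[start:stop_end].upper().replace("U", "T")
    if len(cds) % 3 != 0:
        raise ValueError("annotated CDS length is not a multiple of three")
    residues = []
    for i in range(0, len(cds), 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":  # stop codon ends the polypeptide
            break
        residues.append(aa)
    return "".join(residues)


# Made-up transcript: 2 nt of 5' UTR, ATG GCT AAA TGA, 2 nt of 3' UTR.
print(translate_annotated_orf("GGATGGCTAAATGACC", 2, 14))
```

Because the translation depends only on the coding bases, two transcripts of a gene that differ only in their UTRs give the same string here, which is why separate annotated polypeptides can be identical in amino acid sequence.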

This is a field-by-field guide to the information provided in the Polypeptide Report.

General Information

Symbol	The valid symbol that is used in FlyBase for the polypeptide. The first part of the symbol (before the '\') is the standard prefix for the species (from the Species Abbreviations list). For species other than D. melanogaster, the species prefix is displayed wherever the polypeptide symbol is used throughout FlyBase. For D. melanogaster polypeptides, the species prefix is only displayed in the GENERAL INFORMATION section at the top of a Report.
Annotation Symbol	The current symbol for the annotation that represents the polypeptide.
Associated gene	The gene that encodes the polypeptide. Clicking on the gene symbol will take you to the relevant Gene Report.
Species	The organism that the polypeptide originates from, with the initial letter of the genus and the full species name listed.
FlyBase ID	The FlyBase identifier number of the polypeptide, used to uniquely identify the polypeptide in the database.
Length (aa)	The length in amino acid residues of the polypeptide.
Theoretical pI	The theoretical pI was calculated from the predicted amino acid sequence of the annotated protein using the BioPerl pICalculator module and the EMBOSS set of pK values for individual amino acids.
Predicted MW (kD)	The Predicted MW was calculated from the predicted amino acid sequence of the annotated protein using the BioPerl SeqStats module.
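The two calculated fields can be approximated with a short sketch. This is not the BioPerl code that FlyBase uses: the function names are hypothetical, the residue masses are standard average values, and the pK values are assumed to correspond to the EMBOSS set, so check them against the EMBOSS data files before relying on the output. Results may differ slightly from the values shown in a report.

```python
# Average residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594,
    "N": 114.1038, "D": 115.0886, "Q": 128.1307, "K": 128.1741,
    "E": 129.1155, "M": 131.1926, "H": 137.1411, "F": 147.1766,
    "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water is added back for the free termini

# pK values: ASSUMED to match the EMBOSS set named in the report.
PK_N_TERMINUS = 8.6
PK_C_TERMINUS = 3.6
PK_POSITIVE = {"K": 10.8, "R": 12.5, "H": 6.5}
PK_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def predicted_mw_kd(seq: str) -> float:
    """Average molecular weight of the polypeptide in kilodaltons."""
    return (sum(RESIDUE_MASS[aa] for aa in seq.upper()) + WATER) / 1000.0


def net_charge(seq: str, ph: float) -> float:
    """Net charge at a given pH from the Henderson-Hasselbalch relation."""
    charge = 1.0 / (1.0 + 10 ** (ph - PK_N_TERMINUS))
    charge -= 1.0 / (1.0 + 10 ** (PK_C_TERMINUS - ph))
    for aa in seq.upper():
        if aa in PK_POSITIVE:
            charge += 1.0 / (1.0 + 10 ** (ph - PK_POSITIVE[aa]))
        elif aa in PK_NEGATIVE:
            charge -= 1.0 / (1.0 + 10 ** (PK_NEGATIVE[aa] - ph))
    return charge


def theoretical_pi(seq: str, tolerance: float = 1e-4) -> float:
    """Find the pH where the net charge is zero by bisection on [0, 14]."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2.0
        if net_charge(seq, mid) > 0:
            low = mid   # still positively charged: pI lies at higher pH
        else:
            high = mid
    return (low + high) / 2.0


example = "MAKDEHRY"  # made-up sequence, for illustration only
print(f"MW: {predicted_mw_kd(example):.3f} kD, pI: {theoretical_pi(example):.2f}")
```

Bisection works here because the net charge decreases monotonically as the pH rises, so there is a single crossing point between pH 0 and pH 14.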

Genomic Location

Genomic Maps - Links in the left-hand panel take you to the corresponding region in the GBrowse or JBrowse genome viewer. On the right is a GBrowse snapshot showing the gene region plus 2 kb on either side of the gene. The snapshot includes polypeptides of the gene of interest, plus polypeptides of neighboring genes. Clicking on the genes or polypeptides in the snapshot takes you to the associated gene or polypeptide report.

Protein Domains

Protein domains called by Pfam and SMART are presented. In each section a graphic is shown with the protein domains called by Pfam or SMART appearing as colored bubbles along the polypeptide. Mousing over a domain causes a popup to appear that contains a description and the called amino acid start and end coordinates for that domain.

Underneath the graphic is a table that lists the InterPro name, classification, and called start and end amino acid positions for each domain. Each InterPro name is linked to the InterPro report for that domain.

Sequence

The amino acid sequence of the polypeptide.

Sequence Downloader

Sequence Downloader - Click on the link to the left to open the Sequence Downloader tool with the polypeptide ID entered in the ID box at the top and the 'Translations' mode selected. The amino acid sequence of the polypeptide appears below. A ‘Type’ menu allows you to toggle between ‘Translations’ and ‘CDS’. To change the mode, choose a sequence type and click on “View Sequence”. (Note: additional sequence region options are available if you access the Sequence Downloader tool from the gene page or from the “Tools” dropdown menu.)

Find the symbol, ID, genome coordinates, length and entity coordinates for your selected sequence region below.

Options

To retrieve the FASTA sequence, click on the icon to the right of the ID.
To find the coordinates of a particular span of the polypeptide (or CDS), mouse over a region of the sequence. The amino acid (or nucleotide) coordinates for the selected region will be displayed in the “Selected region” field.
Search for a sequence of interest in the Search box. For an explanation of the regular expressions that can be used to search, click on the icon to the right of the “Search in sequence” box.
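For readers who prefer to search a downloaded sequence offline, the same kind of query can be sketched in Python. The regular-expression syntax accepted by the Sequence Downloader may differ from Python's `re` module, and the function name `find_motif`, the example sequence and the motif are assumptions made for this illustration.

```python
import re


def find_motif(sequence: str, pattern: str) -> list[tuple[int, int, str]]:
    """Return (start, end, match) for each non-overlapping match.

    Coordinates are 1-based and inclusive, the usual convention for
    amino acid positions.
    """
    hits = []
    for match in re.finditer(pattern, sequence.upper()):
        hits.append((match.start() + 1, match.end(), match.group()))
    return hits


# Made-up sequence. The pattern reads "N, then any residue except P,
# then S or T", a commonly cited N-glycosylation sequon.
for start, end, text in find_motif("MNGSANPT", r"N[^P][ST]"):
    print(f"{text} at residues {start}-{end}")
```

Note that `re.finditer` reports non-overlapping matches only; overlapping occurrences of a motif would need a lookahead pattern such as `(?=(...))`.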

Other Products of this Gene

Transcripts

A table of transcripts encoded by the same gene, which lists each transcript symbol, its primary FlyBase identifier number and its length in nucleotides.

The table is subdivided into the transcript that encodes this polypeptide and transcripts that encode other polypeptides encoded by the same gene.

Clicking on a transcript symbol will take you to the relevant Transcript Report.

Other Polypeptides

A table of other polypeptides encoded by the same gene, which lists each polypeptide symbol, its primary FlyBase identifier number and its length in amino acid residues.

Clicking on a polypeptide symbol will take you to the relevant Polypeptide Report.

External Crossreferences

A listing of RefSeq and DDBJ/EMBL/GenBank sequence accession numbers corresponding to the polypeptide.

Clicking on an accession number will take you to the appropriate entry in the GenBank database.

Synonyms

A list of symbols that have been used in the literature, or by FlyBase, to describe the polypeptide.

References

A list of publications that discuss the polypeptide, subdivided into fields by type of publication. Publications which discuss the associated gene but not this particular polypeptide can be found on the Gene Report. Only those fields containing data are displayed in an individual Polypeptide Report.

@@ Line 1: / Line 1: @@
 Last Updated: 21 May 2015
-The '''Polypeptide Report''' provides information on individual annotated polypeptides. Annotated polypeptides are derived from Annotated Transcripts by calculating the open reading frame defined by the annotated translation start and stop sites. This generally represents the largest possible open reading frame, assuming this is consistent with conservation among the Drosophila species, but there are exceptions. In over 150 cases in ''D. melanogaster'', a downstream ATG is annotated based on the PhyloCSF exon prediction algorithm and a small number of genes have been annotated with a non-canonical translation start.  These exceptions are noted in comments attached to the relevant transcripts. Since an annotated polypeptide is created for every annotated coding transcript, multiple annotated polypeptides for a given gene may be identical in amino acid sequence. Annotated polypeptides may or may not correspond exactly to polypeptides described in the literature. For more information about curated polypeptides, see the '''Gene Report''', subsection '''Polypeptide Data''', field '''Reported protein sizes'''.
+The '''Polypeptide Report''' provides information on individual annotated polypeptides. Annotated polypeptides are derived from annotated Transcripts by calculating the open reading frame defined by the annotated translation start and stop sites. This generally represents the largest possible open reading frame, assuming this is consistent with conservation among the Drosophila species, but there are exceptions. In over 150 cases in ''D. melanogaster'', a downstream ATG is annotated based on the PhyloCSF exon prediction algorithm and a small number of genes have been annotated with a non-canonical translation start.  These exceptions are noted in comments attached to the relevant transcripts. Since an annotated polypeptide is created for every annotated coding transcript, multiple annotated polypeptides for a given gene may be identical in amino acid sequence. Annotated polypeptides may or may not correspond exactly to polypeptides described in the literature. For more information about curated polypeptides for a given gene, go to subsection '''Polypeptide Data''', field '''Reported protein sizes''' in the '''Gene report''' for that gene.
 This is a field-by-field guide to the information provided in the '''Polypeptide Report'''.
@@ Line 26: / Line 26: @@
 |-
 |'''Predicted MW (kD)''' || The Predicted MW was calculated from the predicted amino acid sequence of the annotated protein using the BioPerl SeqStats module.
-|-
-|'''Map'''	 || GBrowse shapshot showing gene region plus 2kb on either side of the gene. Snapshot includes polypeptides of the gene of interest, plus polypeptides of neighboring genes.
 |}
+==Genomic Location==
+Genomic Maps - Links in the left hand panel take you to the corresponding region in the GBrowse or JBrowse genome viewer.
+On the right is a GBrowse snapshot showing the gene region plus 2kb on either side of the gene. The snapshot includes polypeptides of the gene of interest, plus polypeptides of neighboring genes. Clicking on the genes or polypeptides in the snapshot takes you to the associated gene or polypeptide report.
+==Protein Domains==
+Protein domains called by Pfam and SMART are presented. In each section a graphic is shown with the protein domains called by Pfam or SMART appearing as colored bubbles along the polypeptide. Mousing over a domain causes a popup to appear that contains a description and the called amino acid start and end coordinates for that domain.
+Underneath the graphic is a table that lists the InterPro name, classification and called start and end amino acid position for each domain. The InterPro names are linked to the InterPro report for that domain.
 ==Sequence==
 The amino acid sequence of the polypeptide.
+==Sequence Downloader==
+Sequence Downloader - Click on the link to the left to go to the Sequence Downloader tool. This opens the Sequence Downloader tool with the polypeptide ID entered in the ID box at the top and the 'Translations' mode selected. The amino acid sequence of the polypeptide appears below.
+A ‘Type’ menu allows you to toggle between ‘Translations’ and ‘CDS’. To change the mode, choose a sequence type and click on “View Sequence”. (Note: additional sequence region options are available if you access the Sequence Downloader tool from the gene page or from the “Tools” dropdown menu).
+Find the symbol, ID, genome coordinates, length and entity coordinates for your selected sequence region below.
+Options
+*To retrieve the FASTA sequence, click on the icon to the right of the ID.
+*To find the coordinates of a particular span of the polypeptide (or CDS), mouse over a region of the sequence. The amino acid (or nucleotide) coordinates for the selected region will be displayed in the “Selected region” field.
+*Search for a sequence of interest in the Search box. For an explanation of the regular expressions that can be used to search, click on the icon to the right of the “Search in sequence” box.
 ==Other Products of this Gene==
@@ Line 52: / Line 73: @@
 ==External Crossreferences==
-A table of DDBJ/EMBL/Genbank sequence accession numbers corresponding to the polypeptide.
+A listing of RefSeq and DDBJ/EMBL/GenBank sequence accession numbers corresponding to the polypeptide.
-Clicking on the accession number will take you to the appropriate entry in the GenBank database.
+Clicking on an accession number will take you to the appropriate entry in the GenBank database.
 ==Synonyms==