Difference between revisions of "FlyBase:Nomenclature"

From FlyBase Wiki
(Section 4)
Line 266: Line 266:
  
 
'''3.2.8.''' An allele known to be mutant but whose specific identity is unknown is given an asterisk as an allele designation, e.g., ''w<sup>*</sup>''.
 
 +
 +
==Transposons and Transgene Constructs==
 +
 +
Transposons or transgene constructs integrated into the Drosophila genome, if they cause a mutant phenotype, are both alleles and aberrations (similar to other classes of aberrations that are associated with mutant phenotypes). Where such insertions produce no mutant phenotype, they are named purely according to aberration conventions. Where transposon/transgene insertions produce a mutant phenotype by disrupting an endogenous gene, they are given names both as an allele of the mutated endogenous gene and as an aberration. The name of the allele follows the conventions outlined in section 3. Rules for naming natural transposons and transgene constructs and their insertion into the genome follow.
 +
 +
Generic naturally occurring transposons are symbolized as ''ends{}'', where ''ends'' stands for the symbol of a given transposon, such as ''P'' for ''P-element''. ''Doc{}'', ''copia{}'' and ''P{}'' are examples. A defined natural variant of the transposon family can be named by including a symbol for that variant inside the braces. A specific insertion of a given transposon is described by including an additional unique symbol following the braces.
 +
 +
Insertions of natural transposons annotated as genome sequence features also have synonyms of the form TEnnnnn, for example, [http://flybase.org/reports/FBti0020021.html copia{}910] has the synonym TE20021.
 +
 +
Symbols for constructed transposons, or transgene constructs, must always include a construct symbol, which defines a particular construct. A '''full transgene construct genotype''' consists of the source of transposon ends, included genes, construct symbol, and insertion identifier, in the form ''ends{genes=construct-symbol}''. Once defined, ''ends{construct-symbol}'' (or less formally, ''construct-symbol'' alone) can be used in most circumstances to refer to a specific transgene construct. The symbol for a '''specific insertion''' of a given transgene construct has the form ''ends{construct-symbol}insertion-identifier''. Further details are given in the sections that follow.
 +
 +
Some examples:
 +
 +
[http://flybase.org/reports/FBtp0000359.html ''P{w+mC ovoD1-18=ovoD1-18}'']
 +
    the full genotype of the P-element transgene construct [http://flybase.org/reports/FBtp0000359.html ''P{ovoD1-18}'']
 +
 +
[http://flybase.org/reports/FBti0002104.html ''P{ovoD1-18}13X6'']
 +
    a viable insertion of the construct [http://flybase.org/reports/FBtp0000359.html ''P{ovoD1-18}'']
 +
 +
[http://flybase.org/reports/FBtp0000352.html ''P{Scer\GAL4wB w+mW.hs Ecol\ampR Ecol\ori=GawB}'']
 +
    the full genotype of the transgene construct [http://flybase.org/reports/FBtp0000352.html ''P{GawB}'']
 +
 +
[http://flybase.org/reports/FBti0002095.html ''P{GawB}h1J3'']
 +
    an insertion of the construct [http://flybase.org/reports/FBti0002095.html ''P{GawB}''] that disrupts the [http://flybase.org/reports/FBgn0001168.html ''h''] gene
 +
 +
[http://flybase.org/reports/FBtp0000910.html ''H{w+mC Ecol\ori Tn\kanR Ecol\lacZHZ50a=Lw2}'']
 +
    the full genotype of the hobo transgene construct [http://flybase.org/reports/FBtp0000910.html ''H{Lw2}'']
 +
 +
[http://flybase.org/reports/FBti0002564.html ''H{Lw2}dpp151H'']
 +
    an insertion of the transgene construct [http://flybase.org/reports/FBtp0000910.html ''H{Lw2}''] that disrupts the [http://flybase.org/reports/FBgn0000490.html ''dpp''] gene
 +
 +
This nomenclature is formally similar to that used for aberrations, where the ends{symbol} prefix is similar to the Df(n), Dp(n;m), etc., prefixes of aberrations, and the identifier suffix is similar to the gene-allele suffix of aberrations with associated alleles, or the alphanumeric string suffix of other aberrations. Specific rules for assembling the components of a transgene construct genotype follow.
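
The symbol forms above can be taken apart mechanically. The following Python sketch is illustrative only and is not part of FlyBase's tooling; the names <code>ConstructSymbol</code> and <code>parse_construct</code> are hypothetical. It assumes a single, non-nested pair of braces and treats the text after the last "=" as the construct symbol.

```python
import re
from typing import List, NamedTuple, Optional


class ConstructSymbol(NamedTuple):
    ends: str                 # source of transposon ends, e.g. "P" or "H"
    genes: List[str]          # included genes (empty in the short form)
    construct: str            # construct symbol, e.g. "GawB" ("" for natural transposons)
    insertion: Optional[str]  # insertion identifier, if present


# ends{body}insertion-identifier, with no nested braces (an assumption of this sketch)
_SYMBOL = re.compile(r"(?P<ends>[^{}\s]+)\{(?P<body>[^{}]*)\}(?P<insertion>[^{}\s]*)")


def parse_construct(symbol: str) -> ConstructSymbol:
    m = _SYMBOL.fullmatch(symbol)
    if m is None:
        raise ValueError(f"not of the form ends{{...}}identifier: {symbol!r}")
    body = m.group("body")
    if "=" in body:
        # Full genotype: space-separated genes, then "=" and the construct symbol.
        genes_part, construct = body.rsplit("=", 1)
        genes = genes_part.split()
    else:
        # Short form: only the construct symbol is given inside the braces.
        genes, construct = [], body
    return ConstructSymbol(m.group("ends"), genes, construct.strip(),
                           m.group("insertion") or None)


full = parse_construct(r"P{Scer\GAL4wB w+mW.hs Ecol\ampR Ecol\ori=GawB}")
insertion = parse_construct("P{GawB}h1J3")
```

Here <code>full.genes</code> holds the four included genes and <code>full.construct</code> is <code>GawB</code>, while <code>insertion.insertion</code> is <code>h1J3</code>.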
 +
 +
===Transposon ends.===
 +
Pairs of terminal repeats which together form a transposon are symbolized by opposing braces, {}. The source of the transposon ends is indicated outside the braces, at the left end of the string by a symbol derived from the name of the transposon family:
 +
 +
{| class="wikitable" border="1"
 +
|-
 +
! Transposon ends
 +
!
 +
! Transposon family
 +
|-
 +
| ''P''
 +
| =
 +
| P-element
 +
|-
 +
| ''H''
 +
| =
 +
| H-element (hobo)
 +
|-
 +
| ''I''
 +
| =
 +
| I-element
 +
|-
 +
| ''M''
 +
| =
 +
| mariner-element
 +
|-
 +
| ''Mi''
 +
| =
 +
| Minos-element 
 +
|}
 +
 +
'''4.1.1.''' Isolated terminal repeats are indicated with the family symbol followed by 3' or 5', e.g., P5' represents the isolated 5' end of a ''P{}'' transposon.
 +
 +
'''4.1.2.''' Multiple sets of matched transposon ends are indicated by nesting ''ends{}'' symbols, e.g., [http://flybase.org/reports/FBtp0005038.html ''P{I{neo[RT<nowiki>]</nowiki>W[+<nowiki>]</nowiki>}}'']. A ''P'' transgene construct containing [http://flybase.org/reports/FBal0039845.html ''ry<sup>+t7.2</sup>''] and an isolated ''hobo'' terminal repeat from the 5' end of a ''hobo'' element would be described as ''P{ry+t7.2 H5'}''.
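
As an illustration of the nesting rule in 4.1.2, the Python sketch below lists the ''ends'' symbols in a nested description from outermost to innermost and rejects unbalanced braces. The function name <code>nested_ends</code> is hypothetical, and the sketch assumes that the ''ends'' symbol is the run of non-space characters immediately before each opening brace.

```python
from typing import List


def nested_ends(symbol: str) -> List[str]:
    """Return the ends symbols of a (possibly nested) ends{} description,
    in order of appearance, raising ValueError if the braces do not pair up."""
    ends: List[str] = []
    depth = 0
    token = ""  # characters seen since the last brace or space
    for ch in symbol:
        if ch == "{":
            ends.append(token)  # the token just before "{" names the ends
            depth += 1
            token = ""
        elif ch == "}":
            depth -= 1
            if depth < 0:
                raise ValueError("closing brace without matching opening brace")
            token = ""
        elif ch.isspace():
            token = ""
        else:
            token += ch
    if depth != 0:
        raise ValueError("opening brace without matching closing brace")
    return ends


outer_inner = nested_ends("P{I{neo[RT]W[+]}}")
single = nested_ends("P{ry+t7.2 H5'}")
```

With the two examples from 4.1.2, <code>outer_inner</code> is <code>["P", "I"]</code> and <code>single</code> is <code>["P"]</code>: the isolated <code>H5'</code> repeat is not followed by braces, so it is not counted as a set of matched ends.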
 +
 +
Formally, this system can be extended to any insertion of mobile DNA, for example, the ''copia'', ''gypsy'' and ''FB'' elements. Thus, the [http://flybase.org/reports/FBal0002042.html ''ct<sup>MR2</sup>''] mutation, caused by the insertion of a gypsy element, is called ''gypsy{}ct<sup>MR2</sup>''. When a mobile element inserts into a mutant gene already carrying a mobile element, it is the new insertion that is named. For example, a jockey insertion into [http://flybase.org/reports/FBal0002042.html ''ct<sup>MR2</sup>''] generates [http://flybase.org/reports/FBal0030987.html ''ct<sup>MRpD</sup>'']; this is called ''jockey{}ct<sup>MRpD</sup>''. The name describes the new insertion which has caused the new phenotype. A full genotype description, including all sets of transposable element ends, is only provided when the progenitor allele is also fully described.
 +
 +
FlyBase uses this nomenclature not only because of its rigor, but also because its more general use may be needed if such elements are engineered.
 +
 +
===Included genes.===
 +
A full transgene construct description lists within the braces all functional genes, including non-Drosophila genes such as antibiotic resistance genes, bacterial and phage origins of replication, and the ''FLP1'' recombination target (''FRT''), separated by spaces. The left-right order of these elements reflects their 5' to 3' order (with respect to the transposon ends) within the construct. If the order of a gene is unknown, it is placed at one end of the list, followed or preceded by a comma.
 +
 +
'''4.2.1. Drosophila melanogaster genes.''' Valid gene symbols are used to name ''D. melanogaster'' genes. Wild-type alleles of intact genes are indicated by a superscripted '+t' followed by an identifier, e.g., [http://flybase.org/reports/FBal0039845.html ''ry<sup>+t7.2</sup>''] or [http://flybase.org/reports/FBal0038104.html ''Adh<sup>+t3.2</sup>'']. A convenient identifier (used in these examples) is the size of the genomic fragment carrying the wild-type gene. Transgene-construct-borne genes that do not confer wild-type function are given unique allele designations without the preceding '+t', e.g., [http://flybase.org/reports/FBal0034664.html ''ftz<sup>B</sup>''] or [http://flybase.org/reports/FBal0036042.html ''y<sup>D225</sup>'']. Replacement of promoter or other control sequences can be indicated in the allele designation, e.g., [http://flybase.org/reports/FBal0049436.html ''dpp<sup>hs.PP</sup>''] for a [http://flybase.org/reports/FBgn0000490.html ''dpp''] gene controlled by a heat shock promoter.
 +
 +
'''4.2.2. Species of origin.''' Species of origin is indicated for non-''melanogaster'' Drosophila genes present in transgene constructs. A species code composed of the first letter of the genus (capitalized) and a three-letter code, usually the first three letters of the species (lower case), is added to the gene symbol with a separating backslash, e.g., ''Dvir\Dfd<sup>+t7.6</sup>'' for the wild-type Deformed gene from Drosophila ''virilis'' (see paragraph 2.2.7.).
 +
 +
For genes from species other than those of Drosophila, the valid gene symbols are used following a four-letter symbol, as above, indicating the species of origin, e.g., ''Hsap'' for humans, ''Gdom'' for chicken, ''Hsim'' for Herpes simplex, ''Ecol'' for E. coli, etc. For viruses, the name or abbreviation, e.g., ''Abelson'', ''Adeno5'', ''Cmeg'', or symbolic name, e.g., ''T4'', ''M13'', the Greek symbol lambda, is sometimes used instead of a genus-species-derived four-letter symbol. In all cases, these symbols are separated from the gene symbol by a backslash \. A file of these [http://flybase.org/static_pages/docs/abbreviations.html species abbreviations] is available on FlyBase.
 +
 +
FlyBase considers transposable elements, the mitochondrial DNA and other similar entities to be species (this is because each can contain several different genes). It is for this reason that, for example, the ''P-element Transposase'' has the symbol P\T in constructs.
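
Because the species (or species-like) prefix is always separated from the gene symbol by a backslash, it can be split off with a single string operation. The Python sketch below is illustrative; <code>split_species</code> is a hypothetical name, and returning <code>Dmel</code> for unprefixed symbols is an assumption made here for convenience, since ''D. melanogaster'' symbols carry no prefix.

```python
from typing import Tuple


def split_species(symbol: str, default: str = "Dmel") -> Tuple[str, str]:
    """Split 'Nnnn\\gene' into (species code, gene symbol).

    Only the first backslash is treated as the species separator, so a
    backslash inside an allele designation is left in the gene part.
    """
    prefix, sep, gene = symbol.partition("\\")
    if not sep:
        # No backslash: by convention the symbol is a D. melanogaster gene.
        return default, symbol
    return prefix, gene


virilis = split_species(r"Dvir\Dfd")
transposase = split_species(r"P\T")
unprefixed = split_species("ry")
```

This gives <code>("Dvir", "Dfd")</code>, <code>("P", "T")</code> and <code>("Dmel", "ry")</code> respectively; the second case shows a transposable element being handled as a "species".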
 +
 +
'''4.2.3. Fusion genes.''' Fusion genes are defined (by FlyBase) as the fusion of protein coding regions of distinct genes constructed by ''in vitro'' mutagenesis. They are named using the gene symbols of their component parts, separated by a double colon, e.g., ''[http://flybase.org/reports/FBgn0012038.html Antp::Scr]'' or [http://flybase.org/reports/FBgn0026604.html ''Act88F::Scer\act1''].
 +
 +
The order of gene symbols stated in the fusion gene will be alphabetical. The complexity of these constructs is such that were each to be named according to its molecular composition, for example in the 5' to 3' direction, the number of named fusion genes would rapidly become impractical.
 +
 +
An exception to the 'alphabetical order' rule will be made for cases where the fusion is between a ''D. melanogaster'' and a non- ''melanogaster'' gene. In such cases the ''melanogaster'' gene symbol will be stated first, e.g., [http://flybase.org/reports/FBgn0025729.html ''tra2::Hsap\SFRS2''].
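
The ordering rules for two-component fusion gene symbols can be sketched as follows. This Python example is illustrative only; <code>fusion_symbol</code> is a hypothetical helper, and two details are assumptions not stated in the text: a symbol is taken to be a ''D. melanogaster'' gene when it has no species prefix, and alphabetical comparison ignores case.

```python
def fusion_symbol(first: str, second: str) -> str:
    """Join two component gene symbols with '::' in the conventional order."""

    def is_melanogaster(symbol: str) -> bool:
        # Assumption: D. melanogaster symbols carry no 'Nnnn\' species prefix.
        return "\\" not in symbol

    parts = [first, second]
    if is_melanogaster(first) != is_melanogaster(second):
        # Exception: the melanogaster gene is stated first.
        parts.sort(key=lambda s: not is_melanogaster(s))
    else:
        # Default rule: alphabetical order (case-insensitive here).
        parts.sort(key=str.lower)
    return "::".join(parts)


same_species = fusion_symbol("Scr", "Antp")
mixed = fusion_symbol(r"Hsap\SFRS2", "tra2")
```

The results, <code>Antp::Scr</code> and <code>tra2::Hsap\SFRS2</code>, match the examples given above.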
 +
 +
For historic reasons, some promoter fusions involving reporter genes such as [http://flybase.org/reports/FBgn0014447.html ''Ecol\lacZ''], though technically protein fusions, are simply treated as alleles of [http://flybase.org/reports/FBgn0014447.html ''Ecol\lacZ'']. The symbol for the additional gene(s) contributing to the fusion is indicated as part of a superscript, e.g., [http://flybase.org/reports/FBal0043886.html ''Ecol\lacZP\T.A92'']. In these special cases there is no distinction made between promoter fusions and protein fusions in the gene name.
 +
 +
'''4.2.4. Modified genes.''' Modified genes, cDNAs and ''in vitro'' mutagenized sequences are treated as alleles, and will be curated by FlyBase as such. They should be named, therefore, by the same conventions used to name classical alleles. The following allele symbols have been assigned by FlyBase to the commonly used modified genes of ''D. melanogaster'':
 +
 +
 +
[http://flybase.org/reports/FBal0028610.html ''w<sup>+mC</sup>'']
 +
 +
The mini-white gene constructed by Pirrotta (1988) by deleting the ''Hin''dIII-''Xba''I fragment from the long 5'-intron of the ''w<sup>+</sup>'' gene. Carried by Casper plasmids and their derivatives.
 +
 +
[http://flybase.org/reports/FBal0028611.html ''w<sup>+mW.hs</sup>'']
 +
 +
The mini-white gene constructed by Klemenz et al. (1987). Carried by the W6, W8 family of plasmids and their derivatives.
 +
 +
Genes modified by the addition of a tag allowing the product to be identified, marked or purified represent a special class of modified genes. Tags are used to mark a transcript, e.g., with a piece of M13 DNA allowing the transcript to be identified by ''in situ'' hybridization. Tags are also used to mark a protein, for purposes of purification (e.g., (His)<sub>6</sub>), for purposes of identification (epitope tags) or for purposes of targeting to a cellular compartment (nls tags). FlyBase considers as tags constructs designed for these purposes and curates these modified genes as alleles of the tagged gene. Tagged genes have symbols with the format '''T:y''' where ''T'' stands for Tag and ''y'' is the species\gene symbol of the tag, e.g., [http://flybase.org/reports/FBgn0015310.html ''T:Hsap\Myc''], [http://flybase.org/reports/FBgn0020413.html ''T:Ivir\HA1''], [http://flybase.org/reports/FBgn0015313.html ''T:Hsap\p53''], [http://flybase.org/reports/FBgn0015307.html ''T:Zzzz\His6''] (the Zzzz 'species' prefix is used when the tag is artificial).
 +
 +
A complete list of tagged gene symbols and their definitions is available from FlyBase through [http://flybase.org/ QuickSearch]. Change the 'Species' option from the default 'Dmel' to 'All species'. Ensure the 'Search' option is set as 'ID/Symbol/Name' and 'genes' is selected as the 'Data Class'. Type 'T:*' (don't use the quotation marks) in the 'Enter text' field and submit the query.
  
 
[[Category:Help]]
 

Revision as of 16:22, 17 June 2013


Preamble

The nomenclature guidelines below explain how FlyBase assigns canonical symbols and names to its genetic objects (genes, alleles, transposons, insertions, aberrations and balancers). We encourage the community and journals to adhere to FlyBase-approved symbols/names for consistency in published datasets. While these guidelines cover most circumstances, there may be exceptional cases not clearly covered here. Please contact FlyBase to discuss such cases or any other aspect of the nomenclature.

Policy for establishing FlyBase-approved gene symbols and names

Justification for unique approved symbols/names.

It is of great value to the research community that there is a single officially sanctioned (approved) symbol and name for each gene in FlyBase. Use of unique symbols/names, together with corresponding unique identifiers (e.g., FBgn numbers) minimizes ambiguity in referring to these genes in the scientific literature.

Assigning approved symbols/names.

It is inevitable that multiple synonyms for a gene arise in the literature, typically as a result of publications on the same gene by multiple laboratories or the realization that genes previously thought to be independent are actually part of the same genetic unit. In such cases, FlyBase adheres to the following rules for establishing or changing the approved gene symbol/name.

1.2.1. Chronological precedence. Approved gene symbols/names are normally established by the earliest date of publication of the proposed symbol/name in a peer-reviewed primary research paper. (No other form of publication is relevant to chronological precedence.)

1.2.2. Selection of lower or upper case of initial letter. Gene symbols/names begin with a lowercase letter if the gene is FIRST named for the phenotype of a recessive mutant allele, and begin with an uppercase letter if they are FIRST named for the phenotype of a dominant mutant allele. Gene symbols/names also begin with an uppercase letter if they are FIRST named for an aspect of the wild-type molecular function or activity of the gene product, which includes genes named after an ortholog or paralog.

1.2.3. Community usage. The chronological precedence and capitalization rules can be overridden in favor of an alternative gene symbol/name that is clearly favored by the research community. This can be on a gene-by-gene basis or to rationalize the nomenclature for an entire gene family or other functional grouping.

1.2.4. Placeholders. Certain classes of generic gene symbols/names are placeholders (see sections 2.3.1 and 2.4) and are subject to replacement by a more meaningful symbol/name according to the rules of 1.2.1, 1.2.2 and 1.2.5. However, generic symbols/names based on a phenotype shall be retained by FlyBase if they are re-used by the first peer-reviewed research paper to characterize that gene and/or are clearly favored by the research community.

1.2.5. Validity criteria. Authors' preferred symbols/names will be used as the FlyBase-approved gene symbols/names whenever possible. However, the validity criteria set out in section 2.2 must be adhered to, and FlyBase will modify authors' preferred gene symbols/names where necessary.

Gene symbols and names

Symbols versus names.

The gene symbol is typically an abbreviation of the full gene name and as such, should ordinarily consist of a minimal number of characters. The gene symbol and name should use comparable capitalization and character sets.

Requirements of FlyBase-approved Drosophila gene symbols and names.

2.2.1. Uniqueness. Each approved gene symbol and name must be unique amongst all FlyBase-approved symbols and names.

2.2.2. Relevance. The name should allude to the gene's function, mutant phenotype or other relevant characteristic.

2.2.3. Restricted and non-permissible characters. There are several characters which have specific meanings in a genotype string. Use of these characters in a gene symbol would complicate interpretation of genotypes. Therefore, approved gene symbols shall adhere to the following rules:

2.2.3.1. Approved symbols shall not contain the following characters: /, \, {, }, <, >, [, ], ;, *.

2.2.3.2. Approved symbols shall not contain spaces. Where a separator is needed to keep characters from losing meaning by running together, a hyphen "-" should be used.

2.2.3.3. Approved symbols shall not contain letters from any character sets other than English or Greek.

2.2.3.4. Colons ":" shall only be used in the approved symbols of certain classes of non-protein-coding genes, genes encoded in the mitochondrial genome, and synthetic fusion genes, as described in section 2.6.

2.2.3.5. Round brackets "( )" shall only be used in certain classes of approved gene symbols as separators to designate a chromosome or an allele whose phenotype is modified by the gene in question.
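
As a rough illustration of rules 2.2.3.1 to 2.2.3.3, the Python sketch below reports which of those character rules a proposed symbol breaks. It is not a FlyBase tool; the name symbol_problems is hypothetical, the Greek check uses the Unicode "Greek and Coptic" block as an approximation, and the class-dependent rules for colons and round brackets (2.2.3.4 and 2.2.3.5) are deliberately not checked.

```python
from typing import List

# Characters excluded by rule 2.2.3.1.
FORBIDDEN = set("/\\{}<>[];*")


def symbol_problems(symbol: str) -> List[str]:
    """Return a list of rule violations; an empty list means none were found."""
    problems: List[str] = []
    bad = sorted(set(symbol) & FORBIDDEN)
    if bad:
        problems.append("forbidden characters: " + " ".join(bad))
    if any(ch.isspace() for ch in symbol):
        problems.append("contains whitespace")
    for ch in symbol:
        # Letters must be English (ASCII) or Greek (approximated by U+0370..U+03FF).
        if ch.isalpha() and not (ch.isascii() or "\u0370" <= ch <= "\u03ff"):
            problems.append("letter outside English/Greek: " + ch)
            break
    return problems


ok = symbol_problems("Actin-5C")
starred = symbol_problems("w*")
spaced = symbol_problems("my gene")
```

Here ok is an empty list, starred reports the asterisk, and spaced reports the whitespace; a hyphenated form such as my-gene would pass.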

2.2.4. Capitalization. The rules governing the capitalization of the initial letter of gene symbols/names are described in sections 1.2.2 and 1.2.3. Subsequent letters are normally lowercase.

2.2.5. Superscripts and subscripts. Gene symbols and names should not normally contain superscripts or subscripts. The only exception is when an allele name is an integral part of a gene symbol or name, e.g., su(w[a]).

2.2.6. Italicization. All gene symbols and names should be italicized.

2.2.7. Genus/species prefixes. Genes from all species, except D. melanogaster, automatically get a unique species abbreviation prefix appended to their FlyBase-approved symbol (see section 2.5.1). Any different/additional indication of a gene's origin (e.g. D, Dro or Dm) is redundant and/or ambiguous and will not form part of the FlyBase-approved gene symbol/name.

2.2.8. Symbols and names must be inoffensive.

Common prefixes

2.3.1. Prefixes based on phenotype, EST or STS. Several generic gene symbol/name prefixes have been used for genes sharing a common mutant phenotype or originally identified by virtue of an EST or STS. A non-exhaustive list is shown below:

Class                                                        Prefix used in gene symbol *
anonymous gene                                               anon-
Berkeley Drosophila Genome Project EST cluster-based gene    BEST:
enhancer                                                     e(a)m, E(a)m
European Drosophila Genome Project STS-based gene            ESTS:
female sterile                                               fs(n)m, Fs(n)m
lethal                                                       l(n)m
male sterile                                                 ms(n)m, Ms(n)m
male & female sterile                                        mfs(n)m, Mfs(n)m
maternal                                                     mat(n)m, Mat(n)m
meiotic mutant                                               mei-
Minute                                                       M(n)m
mitotic mutant                                               mit(n)m, Mit(n)m
mutagen sensitive                                            mus
NIDDK EST Project-based gene                                 NEST:
resistance                                                   rst(n)m, Rst(n)m
suppressor                                                   su(a)m, Su(a)m
'tumor'                                                      tu(n)m, Tu(n)m

* n designates the chromosome, m a distinguishing symbol, and a a gene whose phenotype is modified by an enhancer or suppressor

Gene symbols/names using these generic prefixes are placeholders and are subject to replacement by a more meaningful symbol/name according to the rules set out in sections 1.2.1 and 1.2.4.

2.3.2. Prefixes based on common molecular function. Genes encoding products of similar molecular function may be given symbols/names with identical prefixes and unique suffixes. This is to be encouraged and FlyBase will rationalize the nomenclature for an entire gene family or other functional grouping if favored by the research community. Historically, the unique suffix may refer to a gene's cytological location (e.g. Actin-5C, Actin-42A, Actin-57B etc). More recently, the unique suffix may simply be an incremental numerical value (e.g. Sdic1, Sdic2, Sdic3 etc.), or reflect some other distinguishing feature, such as orthology with a reference data set (e.g. RpL3, RpL4, RpL5 etc.). Also see section 2.6.

Annotation IDs.

Gene annotation IDs, which are distinct from gene symbols, exist for all molecularly defined gene models in the 12 sequenced species of Drosophila.

2.4.1. Format. Annotation IDs are represented in a common way: a species-specific two-letter prefix followed by a four- or five-digit integer. For historical reasons, there are two two-letter prefixes for D. melanogaster: CG for protein-coding genes and CR for non-protein-coding genes. For all other species, there is a single two-letter code to be used for gene models, regardless of which class of gene they identify.

Prefix Species
CG, CR Drosophila melanogaster
GA Drosophila pseudoobscura pseudoobscura
GD Drosophila simulans
GE Drosophila yakuba
GF Drosophila ananassae
GG Drosophila erecta
GH Drosophila grimshawi
GI Drosophila mojavensis
GJ Drosophila virilis
GK Drosophila willistoni
GL Drosophila persimilis
GM Drosophila sechellia
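
The annotation ID format and the prefix table above translate directly into a lookup. The Python sketch below is illustrative only; annotation_species is a hypothetical name, and the pattern follows the "two-letter prefix plus four- or five-digit integer" wording of 2.4.1 literally, so it would need loosening if IDs of other lengths are in use.

```python
import re
from typing import Optional

# Prefix table from section 2.4.1.
PREFIX_SPECIES = {
    "CG": "Drosophila melanogaster",  # protein-coding genes
    "CR": "Drosophila melanogaster",  # non-protein-coding genes
    "GA": "Drosophila pseudoobscura pseudoobscura",
    "GD": "Drosophila simulans",
    "GE": "Drosophila yakuba",
    "GF": "Drosophila ananassae",
    "GG": "Drosophila erecta",
    "GH": "Drosophila grimshawi",
    "GI": "Drosophila mojavensis",
    "GJ": "Drosophila virilis",
    "GK": "Drosophila willistoni",
    "GL": "Drosophila persimilis",
    "GM": "Drosophila sechellia",
}

_ANNOTATION_ID = re.compile(r"([A-Z]{2})(\d{4,5})")


def annotation_species(annotation_id: str) -> Optional[str]:
    """Return the species for an annotation ID, or None if it is not recognized."""
    m = _ANNOTATION_ID.fullmatch(annotation_id)
    if m is None:
        return None
    return PREFIX_SPECIES.get(m.group(1))


mel = annotation_species("CG1234")
vir = annotation_species("GJ12345")
unknown = annotation_species("XX1234")
```

The IDs used here are made-up examples of the format, not references to particular gene models.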

2.4.2. Use as approved gene symbols. In the absence of other information, the annotation ID is used as a placeholder for the gene symbol (while the gene name field is left blank) and is subject to replacement by a more meaningful symbol/name according to the rules set out in sections 1.2.1, 1.2.2 and 1.2.4.

Approved gene symbols/names for non-D. melanogaster genes.

FlyBase includes genes from all species of Drosophilidae plus genes from other species that have been introduced into Drosophila.

2.5.1. Species abbreviation prefixes. For species other than Drosophila melanogaster, the FlyBase-approved gene symbol follows a species abbreviation indicating the species of origin. The prefix has the form 'Nnnn\', where N is the initial letter of the genus and nnn is a unique code for a given species of that genus, usually the first three letters of the species name. (For example, Dsim is the species abbreviation for Drosophila simulans.) A complete list of valid abbreviations is available on the species abbreviations page. By convention, a 'Dmel' prefix is not used for D. melanogaster gene symbols in FlyBase (unless this is important in context). Gene names are not prefixed with species information.

2.5.2. Approved gene symbols/names. The FlyBase-approved gene symbols/names may correspond to the meaningful symbol/name of the D. melanogaster orthologs, distinguished by the relevant species prefix (as described in 2.5.1). (It should be noted that the assignment of orthology can be problematic in the absence of whole genome sequence information.) D. melanogaster gene symbols/names that are defined as placeholders (see sections 2.3.1 and 2.4) or contain D. melanogaster-specific cytological information should not be used as the symbols/names of orthologs in other species.

Special cases.

2.6.1. rRNA genes. Genes encoding ribosomal RNAs have symbols of the format 'nSrRNA', where n denotes the respective rRNA's sedimentation rate in Svedberg units, e.g., 28SrRNA. By historical convention, the locus containing the genes encoding the 5.8SrRNA, 18SrRNA and 28SrRNA is called bobbed (bb).

2.6.2. tRNA genes. Genes encoding transfer RNAs have symbols of the format 'tRNA:Xn:ma', where X is the 1-letter amino-acid code (in upper-case); n is a number signifying the particular isoform; m is the cytogenetic map position of the gene; and a (if used) is a lower-case letter to distinguish between functionally similar tRNA genes mapping to the same location, e.g., tRNA:S7:23Ea.

2.6.3. snRNA genes. Genes encoding small nuclear RNAs have symbols of the format 'snRNA:XX:ma', where XX is the type of snRNA; m is the cytogenetic map position of the gene; and a (if used) is a lower-case letter to distinguish functionally similar snRNA genes mapping to the same location, e.g., snRNA:U6:96Aa.
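
The tRNA and snRNA symbols can be decomposed with regular expressions, as in the illustrative Python sketch below. The name parse_rna_symbol is hypothetical, and the sketch assumes that a cytogenetic map position is written as digits followed by one upper-case letter (as in 23E and 96A), with an optional lower-case distinguishing letter directly after it.

```python
import re
from typing import Dict, Optional

# tRNA: amino-acid letter, isoform number, map position, optional suffix.
_TRNA = re.compile(
    r"tRNA:(?P<amino_acid>[A-Z])(?P<isoform>\d+):(?P<position>\d+[A-Z])(?P<suffix>[a-z]?)"
)
# snRNA: snRNA type, map position, optional suffix.
_SNRNA = re.compile(
    r"snRNA:(?P<type>[^:]+):(?P<position>\d+[A-Z])(?P<suffix>[a-z]?)"
)


def parse_rna_symbol(symbol: str) -> Optional[Dict[str, str]]:
    """Return the named parts of a tRNA or snRNA gene symbol, or None."""
    for gene_class, pattern in (("tRNA", _TRNA), ("snRNA", _SNRNA)):
        m = pattern.fullmatch(symbol)
        if m is not None:
            return {"class": gene_class, **m.groupdict()}
    return None


trna = parse_rna_symbol("tRNA:S7:23Ea")
snrna = parse_rna_symbol("snRNA:U6:96Aa")
```

For the two examples given in 2.6.2 and 2.6.3, the position fields come out as 23E and 96A, each with the suffix a.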

2.6.4. snoRNA genes. Genes encoding small nucleolar RNAs have symbols of the format 'snoRNA:X'. X usually represents the type of modification catalyzed and/or the substrate, e.g. snoRNA:MeU2-C28, which encodes a snoRNA that guides methylation of nucleotide C28 of the U2 snRNA; or snoRNA:Ψ28S-612, which encodes a snoRNA that guides pseudouridylation of nucleotide 612 of the 28S rRNA. If the substrate is unknown, then 'Or' is used in the symbol to indicate that it encodes an 'Orphan' snoRNA. A suffix is used where necessary to distinguish functionally similar snoRNA genes, e.g., snoRNA:Me18S-G1358b, or snoRNA:U3:9B (where the suffix is based on cytogenetic position).

2.6.5. miRNA genes. Genes encoding microRNAs have symbols of the format 'mir-N', where N is simply a sequential number according to the conventions outlined in Ambros, Bartel, et al. (2003), e.g., mir-125.

2.6.6. Pseudogenes. Pseudogenes have symbols of the format symbol_of_parental_gene-psX, where X (if used) is a number to distinguish between multiple pseudogene copies of a particular parental gene. If only one pseudogene copy of a particular gene has been found, it should be given the suffix -ps1.

2.6.7. Mitochondrial genes. Genes encoded by the mitochondrial genome have symbols prefixed with 'mt:', e.g., mt:ND4.

2.6.8. Ribosomal protein genes. Genes encoding ribosomal proteins are named based on orthology to their mammalian counterparts. Genes encoding cytoplasmic ribosomal proteins have symbols of the format 'RpSn' or 'RpLn', where S denotes a gene encoding a protein of the small subunit and L a gene encoding a protein of the large subunit, and n is a number reflecting orthology, e.g., RpL3, RpS6. Genes encoding mitochondrial ribosomal proteins have symbols of a similar format and are prefixed with 'm', e.g., mRpL1, mRpS2. Some ribosomal proteins are encoded by duplicate genes; these are distinguished by using an a or b suffix, e.g., RpS14a and RpS14b. Some ribosomal protein genes were originally named after a mutant phenotype, e.g. sop or tko; these have been retained as the approved gene symbols/names in FlyBase.

Allele symbols and names

Superscripts.

Alleles at a particular gene are designated by the same name and symbol and are differentiated by distinguishing superscripts. In written text the allele designation may be separated from that of the gene by a hyphen, e.g., white-apricot.

Symbols.

Allele symbols should be short, preferably no more than three characters long, and cannot contain spaces, superscripts, or subscripts. Whenever possible superscript characters should be limited to the following set:

a-z A-Z 0-9 - + : .

The + symbol is reserved for the wild-type allele. Consecutive allele numbers should be used wherever possible.

Greek characters may be used but are discouraged.

The character \ is reserved in all gene symbol contexts for species identification.

The character / is reserved as a homologue separator in genotypes and cannot be used in allele symbols.

In text in which superscripting is not possible, such as ASCII files, superscripted text should be enclosed between the characters [ and ].
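
Converting between the bracketed plain-text form and superscript markup is a simple substitution. The Python sketch below is illustrative; the function names are hypothetical, HTML-style <sup> tags are assumed as the target markup, and nested brackets are not handled.

```python
import re


def brackets_to_sup(text: str) -> str:
    """Rewrite allele designations written as gene[allele] with <sup> markup."""
    return re.sub(r"\[([^\[\]]+)\]", r"<sup>\1</sup>", text)


def sup_to_brackets(text: str) -> str:
    """Rewrite <sup>...</sup> markup as the bracketed plain-text form."""
    return re.sub(r"<sup>(.*?)</sup>", r"[\1]", text)


marked_up = brackets_to_sup("w[*]; ry[+t7.2]")
plain = sup_to_brackets("Adh<sup>+t3.2</sup>")
```

With these inputs, marked_up is w<sup>*</sup>; ry<sup>+t7.2</sup> and plain is Adh[+t3.2].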

FlyBase makes exceptions to the brevity rule when recording in vitro mutagenesis constructs that are represented with alleles. Where these are not otherwise named FlyBase confers symbols according to a system including the initial of the last name of the first author of the first paper in which the allele was initially reported ('I' in the following examples). The most frequently used classes include:

Symbol Meaning
cIa for 'construct a of Author-lastname'
Scer\UAS.cIa for 'S. cerevisiae UAS construct a of Author-lastname'
tIa for 'transgene a of Author-lastname'
mIa for 'minigene a of Author-lastname'
hs.PI for 'heat shock construct of Author-lastname'
gene_symbol.PI for 'gene promoter fusion of Author-lastname'

In addition, exceptions have been required for some large series of alleles and collections of mutations. Nevertheless, brevity of allele symbols is very much to be encouraged.

3.2.1. It is unacceptable to use, as a superscripted allele symbol, elements of the genotype in which the allele arose, since such a designation implies something more than a trivial connection between allele and element. Alleles that are revertants of a pre-existing allele are an exception to this rule.

3.2.2. While historically, the numeral 1 has been the implied superscript of nonsuperscripted symbols, this practice has created considerable ambiguity and is now discouraged. As with all other alleles, the numeral 1 should be explicitly designated (e.g., sc[1], not sc).

3.2.3. For a recessive allele of a gene named as a dominant, or a dominant allele of a gene named as a recessive, the superscripts r and D, respectively, may be used; e.g., Hn[r], Hn[r2], and ci[D].

3.2.4. For a wild-type allele, a superscripted plus character may be used; e.g., b[+] or B[+]. The plus symbol alone implies the normal (wild-type) allele or alleles in any context, such as y[1]/+.

It may be necessary to distinguish among more than one 'wild-type' allele. In such cases the different wild-type alleles should be given a distinguishing number, which would follow the + character in the superscript, e.g., ry[+3].

3.2.5. Absence of a particular locus may informally be noted by use of a superscript minus character with the symbol; e.g., bb[-]. This is not acceptable as a designation of a particular allele.

3.2.6. Revertants or partial revertants of mutant alleles are designated by the superscript rv followed by a distinguishing number; these are placed after the allele designator, e.g., D[4rv32], the 32nd revertant of D[4]. Revertants of dominant mutations that are deficiencies are treated not as alleles but as deficiencies and are accordingly not superscripted but listed with the distinguishing number, e.g., Df(2L)Scorv4.

3.2.7. Alleles specifying the absence of a particular enzyme or other protein are designated by the superscript n (null) followed by a distinguishing number or letter, e.g., Adh[n1], or, where lack of function is inviable, by l (lethal), followed by a distinguishing number, e.g., Nrg[l2].

3.2.8. An allele known to be mutant but whose specific identity is unknown is given an asterisk as an allele designation, e.g., w[*].

Transposons and Transgene Constructs

Transposons or transgene constructs integrated into the Drosophila genome, if they cause a mutant phenotype, are both alleles and aberrations (similar to other classes of aberrations that are associated with mutant phenotypes). Where such insertions produce no mutant phenotype, they are named purely according to aberration conventions. Where transposon/transgene insertions produce a mutant phenotype by disrupting an endogenous gene, they are given names both as an allele of the mutated endogenous gene and as an aberration. The name of the allele follows the conventions outlined in section 3. Rules for naming natural transposons and transgene constructs and their insertion into the genome follow.

Generic naturally occurring transposons are symbolized as ends{}, where ends stands for the symbol of a given transposon, such as P for P-element. Doc{}, copia{} and P{} are examples. A defined natural variant of the transposon family can be named by including a symbol for that variant inside the braces. A specific insertion of a given transposon is described by including an additional unique symbol following the braces.

Insertions of natural transposons annotated as genome sequence features also have synonyms of the form TEnnnnn, for example, copia{}910 has the synonym TE20021.

Symbols for constructed transposons, or transgene constructs, must always include a construct symbol, which defines a particular construct. A full transgene construct genotype consists of the source of transposon ends, included genes, construct symbol, and insertion identifier, in the form ends{genes=construct-symbol}. Once defined, ends{construct-symbol} (or less formally, construct-symbol alone) can be used in most circumstances to refer to a specific transgene construct. The symbol for a specific insertion of a given transgene construct has the form ends{construct-symbol}insertion-identifier. Further details are given in the sections that follow.

Some examples:

P{w+mC ovoD1-18=ovoD1-18}

   the full genotype of the P-element transgene construct P{ovoD1-18}

P{ovoD1-18}13X6

   a viable insertion of the construct P{ovoD1-18}

P{Scer\GAL4wB w+mW.hs Ecol\ampR Ecol\ori=GawB}

   the full genotype of the transgene construct P{GawB}

P{GawB}h1J3

   an insertion of the construct P{GawB} that disrupts the h gene

H{w+mC Ecol\ori Tn\kanR Ecol\lacZHZ50a=Lw2}

   the full genotype of the hobo transgene construct H{Lw2}

H{Lw2}dpp151H

   an insertion of the transgene construct H{Lw2} that disrupts the dpp gene
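
The examples above can be split into their components mechanically. The following is a minimal sketch, not FlyBase code: the names `TransgeneSymbol` and `parse_transgene_symbol` are hypothetical, and the sketch assumes a single, non-nested pair of braces with at most one `=` separating the included genes from the construct symbol.

```python
import re
from typing import NamedTuple, Optional, Tuple


class TransgeneSymbol(NamedTuple):
    ends: str                  # source of transposon ends, e.g. "P" or "H"
    genes: Tuple[str, ...]     # included genes (empty in the short form)
    construct: str             # construct symbol, e.g. "GawB"
    insertion: Optional[str]   # insertion identifier, or None if absent


# Assumption: exactly one pair of braces, with no nesting inside it.
_SYMBOL = re.compile(r"^([^{}\s]+)\{([^{}]*)\}(.*)$")


def parse_transgene_symbol(symbol: str) -> TransgeneSymbol:
    """Split ends{genes=construct-symbol}insertion-identifier into parts."""
    match = _SYMBOL.match(symbol.strip())
    if match is None:
        raise ValueError(f"not of the form ends{{...}}identifier: {symbol!r}")
    ends, inside, suffix = match.groups()
    if "=" in inside:
        # Full genotype: space-separated genes, then '=', then the symbol.
        gene_list, construct = inside.rsplit("=", 1)
        genes = tuple(gene_list.split())
    else:
        # Short form: only the construct symbol (or nothing) in the braces.
        genes, construct = (), inside
    return TransgeneSymbol(ends, genes, construct.strip(), suffix or None)
```

Under these assumptions, `parse_transgene_symbol("P{GawB}h1J3")` yields ends `P`, construct `GawB` and insertion identifier `h1J3`, while a full genotype such as `P{w+mC ovoD1-18=ovoD1-18}` additionally yields the list of included genes. A natural transposon insertion such as `copia{}910` parses with an empty construct symbol.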

This nomenclature is formally similar to that used for aberrations, where the ends{symbol} prefix is similar to the Df(n), Dp(n;m), etc., prefixes of aberrations, and the identifier suffix is similar to the gene-allele suffix of aberrations with associated alleles, or the alphanumeric string suffix of other aberrations. Specific rules for assembling the components of a transgene construct genotype follow.

Transposon ends.

Pairs of terminal repeats which together form a transposon are symbolized by opposing braces, {}. The source of the transposon ends is indicated outside the braces, at the left end of the string by a symbol derived from the name of the transposon family:

Transposon ends = Transposon family
P = P-element
H = H-element (hobo)
I = I-element
M = mariner-element
Mi = Minos-element

4.1.1. Isolated terminal repeats are indicated with the family symbol followed by 3' or 5', e.g., P5' represents the isolated 5' end of a P{} transposon.

4.1.2. Multiple sets of matched transposon ends are indicated by nesting ends{} symbols, e.g., P{I{neo[RT]W[+]}}. A P transgene construct containing ry+t7.2 and an isolated hobo terminal repeat from the 5' end of a hobo element would be described as P{ry+t7.2 H5'}.
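
Nested ends{} symbols and isolated terminal repeats can be recognised with a simple scan over the string. This is only an illustrative sketch: `nested_ends` and `isolated_end` are hypothetical helper names, and the sketch assumes that components inside the braces are separated by spaces and that an isolated repeat is written as a family symbol immediately followed by 3' or 5'.

```python
import re
from typing import List, Optional, Tuple


def nested_ends(symbol: str) -> List[str]:
    """List the ends symbol in front of every '{', in order of appearance."""
    found = []
    depth = 0
    token_start = 0  # index where the current space-delimited token began
    for i, ch in enumerate(symbol):
        if ch == "{":
            # Whatever precedes the brace in this token names the ends.
            found.append(symbol[token_start:i])
            depth += 1
            token_start = i + 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                raise ValueError(f"unmatched '}}' in {symbol!r}")
            token_start = i + 1
        elif ch.isspace():
            token_start = i + 1
    if depth != 0:
        raise ValueError(f"unmatched '{{' in {symbol!r}")
    return found


# Assumption: an isolated repeat is a family symbol followed by 3' or 5'.
_ISOLATED_END = re.compile(r"^(\S+?)([35]')$")


def isolated_end(token: str) -> Optional[Tuple[str, str]]:
    """Return (family symbol, end) for tokens such as H5', else None."""
    match = _ISOLATED_END.match(token)
    return match.groups() if match else None
```

With these helpers, `P{I{neo[RT]W[+]}}` reports the ends `P` and `I`, whereas `P{ry+t7.2 H5'}` reports only `P`, because `H5'` is an isolated repeat rather than a matched pair of ends.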

Formally, this system can be extended to any insertion of mobile DNA, for example, the copia, gypsy and FB elements. Thus, the ctMR2 mutation, caused by the insertion of a gypsy element, is called gypsy{}ctMR2. When a mobile element inserts into a mutant gene already carrying a mobile element, it is the new insertion that is named. For example, a jockey insertion into ctMR2 generates ctMRpD, which is called jockey{}ctMRpD. The name describes the new insertion which has caused the new phenotype. A full genotype description, including all sets of transposable element ends, is only provided when the progenitor allele is also fully described.

FlyBase uses this nomenclature not only because of its rigor, but also because its more general use may be needed if such elements are engineered.

Included genes.

A full transgene construct description lists within the braces all functional genes, including non-Drosophila genes such as antibiotic resistance genes, bacterial and phage origins of replication, and the FLP1 recombination target (FRT), separated by spaces. The left-right order of these elements reflects their 5' to 3' order (with respect to the transposon ends) within the construct. If the order of a gene is unknown, it is placed at one end of the list, followed or preceded by a comma.

4.2.1. Drosophila melanogaster genes. Valid gene symbols are used to name D. melanogaster genes. Wild-type alleles of intact genes are indicated by a superscripted '+t' followed by an identifier, e.g., ry+t7.2 or Adh+t3.2. A convenient identifier (used in these examples) is the size of the genomic fragment carrying the wild-type gene. Transgene-construct-borne genes that do not confer wild-type function are given unique allele designations without the preceding '+t', e.g., ftzB or yD225. Replacement of promoter or other control sequences can be indicated in the allele designation, e.g., dpphs.PP for a dpp gene controlled by a heat shock promoter.

4.2.2. Species of origin. Species of origin is indicated for non-melanogaster Drosophila genes present in transgene constructs. A species code composed of the first letter of the genus (capitalized) and a three-letter code, usually the first three letters of the species (lower case), is added to the gene symbol with a separating backslash, e.g., Dvir\Dfd+t7.6 for the wild-type Deformed gene from Drosophila virilis (see paragraph 2.2.7.).

For genes from species other than those of Drosophila, the valid gene symbols are used following a four-letter symbol, as above, indicating the species of origin, e.g., Hsap for humans, Gdom for chicken, Hsim for Herpes simplex, Ecol for E. coli, etc. For viruses, the name or abbreviation, e.g., Abelson, Adeno5, Cmeg, or symbolic name, e.g., T4, M13, the Greek symbol lambda, is sometimes used instead of a genus-species-derived four-letter symbol. In all cases, these symbols are separated from the gene symbol by a backslash \. A file of these species abbreviations is available on FlyBase.

FlyBase considers transposable elements, the mitochondrial DNA and other similar entities to be species (this is because each can contain several different genes). It is for this reason that, for example, the P-element Transposase has the symbol P\T in constructs.
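
The species prefix convention lends itself to a one-line split on the backslash. The sketch below is an assumption-laden illustration, not a FlyBase API: `split_species` is a hypothetical name, and treating an unprefixed symbol as belonging to `Dmel` is an assumption of this example. It does not handle tag symbols of the form T:y.

```python
from typing import Tuple


def split_species(symbol: str, default_species: str = "Dmel") -> Tuple[str, str]:
    """Split species\\gene into (species code, gene symbol).

    Symbols without a backslash are assumed to be D. melanogaster genes
    and are returned with the default species code.
    """
    species, sep, gene = symbol.partition("\\")
    if not sep:
        return default_species, symbol
    return species, gene
```

Applied to the examples in the text, `Dvir\Dfd+t7.6` splits into the species code `Dvir` and the gene symbol `Dfd+t7.6`, and `P\T` splits into the 'species' `P` and the gene `T`.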

4.2.3. Fusion genes. Fusion genes are defined (by FlyBase) as the fusion of protein coding regions of distinct genes constructed by in vitro mutagenesis. They are named using the gene symbols of their component parts, separated by a double colon, e.g., Antp::Scr or Act88F::Scer\act1.

The order of gene symbols stated in the fusion gene will be alphabetical. The complexity of these constructs is such that were each to be named according to its molecular composition, for example in the 5' to 3' direction, the number of named fusion genes would rapidly become impractical.

An exception to the 'alphabetical order' rule will be made for cases where the fusion is between a D. melanogaster and a non-melanogaster gene. In such cases the melanogaster gene symbol will be stated first, e.g., tra2::Hsap\SFRS2.
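
The ordering rule for fusion gene symbols can be expressed as a sort key. This is a hedged sketch only: `fusion_symbol` is a hypothetical helper, a symbol is taken to be a D. melanogaster gene when it carries no species prefix, and the case-insensitive comparison (as well as sorting two non-melanogaster genes by their full prefixed symbols) is an assumption not stated in the rules. Historic special cases are not covered.

```python
def fusion_symbol(gene_a: str, gene_b: str) -> str:
    """Join two component gene symbols with '::' in the conventional order."""

    def sort_key(symbol: str):
        # Assumption: no backslash means a D. melanogaster gene.
        is_foreign = "\\" in symbol
        # melanogaster genes sort first; ties are broken alphabetically.
        return (is_foreign, symbol.lower())

    return "::".join(sorted([gene_a, gene_b], key=sort_key))
```

For example, the components `Scr` and `Antp` give `Antp::Scr`, and the components `Hsap\SFRS2` and `tra2` give `tra2::Hsap\SFRS2`, with the melanogaster gene stated first despite the alphabetical order.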

For historic reasons, some promoter fusions involving reporter genes such as Ecol\lacZ, though technically protein fusions, are simply treated as alleles of Ecol\lacZ. The symbol for the additional gene(s) contributing to the fusion is indicated as part of a superscript, e.g., Ecol\lacZP\T.A92. In these special cases there is no distinction made between promoter fusions and protein fusions in the gene name.

4.2.4. Modified genes. Modified genes, cDNAs and in vitro mutagenized sequences are treated as alleles, and will be curated by FlyBase as such. They should be named, therefore, by the same conventions used to name classical alleles. The following allele symbols have been assigned by FlyBase to the commonly used modified genes of D. melanogaster:


w+mC

The mini-white gene constructed by Pirrotta (1988) by deleting the HindIII-XbaI fragment from the long 5'-intron of the w+ gene. Carried by Casper plasmids and their derivatives.

w+mW.hs

The mini-white gene constructed by Klemenz et al. (1987). Carried by the W6, W8 family of plasmids and their derivatives.

Genes modified by the addition of a tag allowing the product to be identified, marked or purified represent a special class of modified genes. Tags are used to mark a transcript, e.g., with a piece of M13 DNA allowing the transcript to be identified by in situ hybridization. Tags are also used to mark a protein, for purposes of purification (e.g., (His)6), for purposes of identification (epitope tags) or for purposes of targeting to a cellular compartment (nls tags). FlyBase considers as tags constructs designed for these purposes and curates these modified genes as alleles of the tagged gene. Tagged genes have symbols with the format T:y where T stands for Tag and y is the species\gene symbol of the tag, e.g., T:Hsap\Myc, T:Ivir\HA1, T:Hsap\p53, T:Zzzz\His6 (the Zzzz 'species' prefix is used when the tag is artificial).
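
The T:y format can be checked and decomposed in a few lines. The following is a sketch under stated assumptions rather than FlyBase's own validation: `parse_tag_symbol` is a hypothetical name, and the sketch assumes every tag symbol carries a species prefix followed by a backslash.

```python
from typing import Dict, Union


def parse_tag_symbol(symbol: str) -> Dict[str, Union[str, bool]]:
    """Decompose a tag symbol of the form T:species\\gene."""
    if not symbol.startswith("T:"):
        raise ValueError(f"tag symbols start with 'T:': {symbol!r}")
    species, sep, tag = symbol[2:].partition("\\")
    if not sep or not species or not tag:
        raise ValueError(f"expected T:species\\gene, got {symbol!r}")
    # The Zzzz 'species' prefix marks an artificial tag.
    return {"species": species, "tag": tag, "artificial": species == "Zzzz"}
```

Under these assumptions, `T:Zzzz\His6` is reported as an artificial tag named `His6`, while `T:Hsap\Myc` is reported as the `Myc` tag with species code `Hsap`.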

A complete list of tagged gene symbols and their definitions is available from FlyBase through QuickSearch. Change the 'Species' option from the default 'Dmel' to 'All species'. Ensure the 'Search' option is set as 'ID/Symbol/Name' and 'genes' is selected as the 'Data Class'. Type 'T:*' (don't use the quotation marks) in the 'Enter text' field and submit the query.