FlyBase:Nomenclature
Revision as of 17:11, 13 February 2014
Preamble
The nomenclature guidelines below explain how FlyBase assigns canonical symbols and names to its genetic objects (genes, alleles, transposons, insertions, aberrations and balancers). We encourage the community and journals to adhere to FlyBase-approved symbols/names for consistency in published datasets. While these guidelines cover most circumstances, there may be exceptional cases not clearly covered here. Please contact FlyBase to discuss such cases or any other aspect of the nomenclature.
Policy for establishing FlyBase-approved gene symbols and names
Justification for unique approved symbols/names.
It is of great value to the research community that there is a single officially sanctioned (approved) symbol and name for each gene in FlyBase. Use of unique symbols/names, together with corresponding unique identifiers (e.g., FBgn numbers), minimizes ambiguity in referring to these genes in the scientific literature.
Assigning approved symbols/names.
It is inevitable that multiple synonyms for a gene arise in the literature, typically as a result of publications on the same gene by multiple laboratories or the realization that genes previously thought to be independent are actually part of the same genetic unit. In such cases, FlyBase adheres to the following rules for establishing or changing the approved gene symbol/name.
1.2.1. Chronological precedence. Approved gene symbols/names are normally established by the earliest date of publication of the proposed symbol/name in a peer-reviewed primary research paper. (No other form of publication is relevant to chronological precedence.)
1.2.2. Selection of lower or upper case of initial letter. Gene symbols/names begin with a lowercase letter if the gene is FIRST named for the phenotype of a recessive mutant allele, and begin with an uppercase letter if they are FIRST named for the phenotype of a dominant mutant allele. Gene symbols/names also begin with an uppercase letter if they are FIRST named for an aspect of the wild-type molecular function or activity of the gene product, which includes genes named after an ortholog or paralog.
1.2.3. Community usage. The chronological precedence and capitalization rules can be overridden in favor of an alternative gene symbol/name that is clearly favored by the research community. This can be on a gene-by-gene basis or to rationalize the nomenclature for an entire gene family or other functional grouping.
1.2.4. Placeholders. Certain classes of generic gene symbols/names are placeholders (see sections 2.3.1 and 2.4) and are subject to replacement by a more meaningful symbol/name according to the rules of 1.2.1, 1.2.2 and 1.2.5. However, generic symbols/names based on a phenotype shall be retained by FlyBase if they are re-used by the first peer-reviewed research paper to characterize that gene and/or are clearly favored by the research community.
1.2.5. Validity criteria. Authors' preferred symbols/names will be used as the FlyBase-approved gene symbols/names whenever possible. However, the validity criteria set out in section 2.2 must be adhered to, and FlyBase will modify authors' preferred gene symbols/names where necessary.
Gene symbols and names
Symbols versus names.
The gene symbol is typically an abbreviation of the full gene name and, as such, should ordinarily consist of a minimal number of characters. The gene symbol and name should use comparable capitalization and character sets.
Requirements of FlyBase-approved Drosophila gene symbols and names.
2.2.1. Uniqueness. Each approved gene symbol and name must be unique amongst all FlyBase-approved symbols and names.
2.2.2. Relevance. The name should allude to the gene's function, mutant phenotype or other relevant characteristic.
2.2.3. Restricted and non-permissible characters. Several characters have specific meanings in a genotype string, and their use in a gene symbol would complicate the interpretation of genotypes. Therefore, approved gene symbols shall adhere to the following rules:
2.2.3.1. Approved symbols shall not contain the following characters: /, \, {, }, <, >, [, ], ;, *.
2.2.3.2. Approved symbols shall not contain spaces. Where a separator is needed to keep characters from losing meaning by running together, a hyphen "-" should be used.
2.2.3.3. Approved symbols shall not contain letters from any character sets other than English or Greek.
2.2.3.4. Colons ":" shall only be used in the approved symbols of certain classes of non-protein-coding genes, genes encoded in the mitochondrial genome, and synthetic fusion genes, as described in section 2.6.
2.2.3.5. Round brackets "( )" shall only be used in certain classes of approved gene symbols as separators to designate a chromosome or an allele whose phenotype is modified by the gene in question.
2.2.4. Capitalization. The rules governing the capitalization of the initial letter of gene symbols/names are described in sections 1.2.2 and 1.2.3. Subsequent letters are normally lowercase.
2.2.5. Superscripts and subscripts. Gene symbols and names should not normally contain superscripts or subscripts. The only exception is when an allele name is an integral part of a gene symbol or name, e.g., su(w[a]), suppressor of white-apricot (the superscript is shown here in square brackets).
2.2.6. Italicization. All gene symbols and names should be italicized.
2.2.7. Genus/species prefixes. Genes from all species except D. melanogaster automatically receive a unique species abbreviation as a prefix to their FlyBase-approved symbol (see section 2.5.1). Any different or additional indication of a gene's origin (e.g., D, Dro or Dm) is redundant and/or ambiguous and will not form part of the FlyBase-approved gene symbol/name.
2.2.8. Symbols and names must be inoffensive.
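The character rules in 2.2.3 lend themselves to an automated pre-check before a symbol is proposed. The Python sketch below is illustrative only: symbol_problems is a hypothetical helper, not a FlyBase tool. It checks rules 2.2.3.1 to 2.2.3.3 and only flags colons and round brackets for manual review, because 2.2.3.4 and 2.2.3.5 allow them for specific gene classes.

```python
import string
import unicodedata

# Characters barred from approved gene symbols (rule 2.2.3.1).
FORBIDDEN_CHARS = set("/\\{}<>[];*")


def symbol_problems(symbol):
    """Return human-readable problems found in a proposed gene symbol.

    Hypothetical helper: covers rules 2.2.3.1-2.2.3.3 and only flags
    ':' and '()' (rules 2.2.3.4-2.2.3.5) for manual review.
    """
    problems = []
    for ch in sorted(set(symbol) & FORBIDDEN_CHARS):
        problems.append(f"forbidden character {ch!r}")
    if any(ch.isspace() for ch in symbol):
        problems.append("contains a space; use '-' as a separator")
    # dict.fromkeys keeps first-seen order while dropping repeated letters.
    for ch in dict.fromkeys(symbol):
        if ch.isalpha() and ch not in string.ascii_letters:
            if not unicodedata.name(ch, "").startswith("GREEK"):
                problems.append(f"letter {ch!r} is neither English nor Greek")
    if any(ch in symbol for ch in ":()"):
        problems.append("':' or '()' is allowed only for certain gene classes")
    return problems


print(symbol_problems("Act5C"))      # []
print(symbol_problems("my gene/2"))  # a forbidden '/' and a space
```

A Greek letter such as the psi in snoRNA:Ψ28S-612 passes the letter check; that symbol is flagged only for its colon, which 2.2.3.4 permits for snoRNA genes.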
Common prefixes
2.3.1. Prefixes based on phenotype, EST or STS. Several generic gene symbol/name prefixes have been used for genes sharing a common mutant phenotype or originally identified by virtue of an EST or STS. A non-exhaustive list is shown below:
Class | Prefix used in gene symbol * |
---|---|
anonymous gene | anon- |
Berkeley Drosophila Genome Project EST cluster-based gene | BEST: |
enhancer | e(a)m, E(a)m |
European Drosophila Genome Project STS-based gene | ESTS: |
female sterile | fs(n)m, Fs(n)m |
lethal | l(n)m |
male sterile | ms(n)m, Ms(n)m |
male & female sterile | mfs(n)m, Mfs(n)m |
maternal | mat(n)m, Mat(n)m |
meiotic mutant | mei- |
Minute | M(n)m |
mitotic mutant | mit(n)m, Mit(n)m |
mutagen sensitive | mus |
NIDDK EST Project-based gene | NEST: |
resistance | rst(n)m, Rst(n)m |
suppressor | su(a)m, Su(a)m |
'tumor' | tu(n)m, Tu(n)m |
* n designates the chromosome, m a distinguishing symbol, and a the gene whose phenotype is modified by the enhancer or suppressor.
Gene symbols/names using these generic prefixes are placeholders and are subject to replacement by a more meaningful symbol/name according to the rules set out in sections 1.2.1 and 1.2.4.
2.3.2. Prefixes based on common molecular function. Genes encoding products of similar molecular function may be given symbols/names with identical prefixes and unique suffixes. This practice is encouraged, and FlyBase will rationalize the nomenclature for an entire gene family or other functional grouping if favored by the research community. Historically, the unique suffix often referred to a gene's cytological location (e.g., Actin-5C, Actin-42A, Actin-57B). More recently, the unique suffix may simply be an incremental number (e.g., Sdic1, Sdic2, Sdic3), or may reflect some other distinguishing feature, such as orthology with a reference data set (e.g., RpL3, RpL4, RpL5). Also see section 2.6.
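Because symbols built on the generic prefixes in 2.3.1 are placeholders, a curation script might want to recognize them. The sketch below is a hypothetical illustration. The regular expressions are our own reading of the prefix table, which is itself non-exhaustive. A prefix match also says nothing about whether community usage has since made the symbol permanent (section 1.2.4).

```python
import re

# Our own regex reading of the generic-prefix table in 2.3.1 (illustrative,
# non-exhaustive). re.match anchors each pattern at the start of the symbol;
# "(n)" and "(a)" become a parenthesized group without nested brackets.
PLACEHOLDER_PATTERNS = [
    ("anonymous gene", r"anon-"),
    ("BDGP EST cluster-based gene", r"BEST:"),
    ("EDGP STS-based gene", r"ESTS:"),
    ("NIDDK EST-based gene", r"NEST:"),
    ("meiotic mutant", r"mei-"),
    ("mutagen sensitive", r"mus(?=\d)"),
    ("lethal", r"l\([^()]+\)"),
    ("female sterile", r"[fF]s\([^()]+\)"),
    ("male sterile", r"[mM]s\([^()]+\)"),
    ("male & female sterile", r"[mM]fs\([^()]+\)"),
    ("maternal", r"[mM]at\([^()]+\)"),
    ("Minute", r"M\([^()]+\)"),
    ("mitotic mutant", r"[mM]it\([^()]+\)"),
    ("resistance", r"[rR]st\([^()]+\)"),
    ("tumor", r"[tT]u\([^()]+\)"),
    ("enhancer", r"[eE]\([^()]+\)"),
    ("suppressor", r"[sS]u\([^()]+\)"),
]


def placeholder_class(symbol):
    """Return the generic class suggested by a symbol's prefix, or None."""
    for label, pattern in PLACEHOLDER_PATTERNS:
        if re.match(pattern, symbol):
            return label
    return None


print(placeholder_class("l(2)k05007"))  # lethal
print(placeholder_class("CycE"))        # None
```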
Annotation IDs.
Gene annotation IDs, which are distinct from gene symbols, exist for all molecularly defined gene models in the 12 sequenced species of Drosophila.
2.4.1. Format. Annotation IDs share a common format: a species-specific two-letter prefix followed by a four- or five-digit integer. For historical reasons, there are two prefixes for D. melanogaster: CG for protein-coding genes and CR for non-protein-coding genes. For all other species, a single two-letter prefix is used for gene models, regardless of which class of gene they identify.
Prefix | Species |
---|---|
CG, CR | Drosophila melanogaster |
GA | Drosophila pseudoobscura pseudoobscura |
GD | Drosophila simulans |
GE | Drosophila yakuba |
GF | Drosophila ananassae |
GG | Drosophila erecta |
GH | Drosophila grimshawi |
GI | Drosophila mojavensis |
GJ | Drosophila virilis |
GK | Drosophila willistoni |
GL | Drosophila persimilis |
GM | Drosophila sechellia |
2.4.2. Use as approved gene symbols. In the absence of other information, the annotation ID is used as a placeholder for the gene symbol (while the gene name field is left blank) and is subject to replacement by a more meaningful symbol/name according to the rules set out in sections 1.2.1, 1.2.2 and 1.2.4.
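Since an annotation ID standing in as a gene symbol marks a placeholder (2.4.2), it can be useful to recognize such IDs and read off their species. The following is a minimal sketch: parse_annotation_id is a hypothetical helper, and its regular expression encodes only the format stated in 2.4.1.

```python
import re

# Prefix-to-species table from 2.4.1.
ANNOTATION_PREFIXES = {
    "CG": "Drosophila melanogaster",
    "CR": "Drosophila melanogaster",
    "GA": "Drosophila pseudoobscura pseudoobscura",
    "GD": "Drosophila simulans",
    "GE": "Drosophila yakuba",
    "GF": "Drosophila ananassae",
    "GG": "Drosophila erecta",
    "GH": "Drosophila grimshawi",
    "GI": "Drosophila mojavensis",
    "GJ": "Drosophila virilis",
    "GK": "Drosophila willistoni",
    "GL": "Drosophila persimilis",
    "GM": "Drosophila sechellia",
}
# Two-letter prefix followed by a four- or five-digit integer.
ANNOTATION_ID = re.compile(r"([A-Z]{2})(\d{4,5})")


def parse_annotation_id(text):
    """Return (species, gene class) for an annotation ID, or None.

    The gene class is encoded only for D. melanogaster (CG vs CR);
    for other species it is reported as None.
    """
    match = ANNOTATION_ID.fullmatch(text)
    if match is None or match.group(1) not in ANNOTATION_PREFIXES:
        return None
    prefix = match.group(1)
    gene_class = {"CG": "protein-coding", "CR": "non-protein-coding"}.get(prefix)
    return ANNOTATION_PREFIXES[prefix], gene_class


print(parse_annotation_id("CR31044"))  # D. melanogaster, non-protein-coding
```

Any symbol that this function recognizes is, under 2.4.2, still a placeholder awaiting a more meaningful symbol/name.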
Approved gene symbols/names for non-D. melanogaster genes.
FlyBase includes genes from all species of Drosophilidae plus genes from other species that have been introduced into Drosophila.
2.5.1. Species abbreviation prefixes. For species other than Drosophila melanogaster, the FlyBase-approved gene symbol follows a species abbreviation indicating the species of origin. The prefix has the form 'Nnnn\', where N is the initial letter of the genus and nnn is a unique code for a given species of that genus, usually the first three letters of the species name. (For example, Dsim is the species abbreviation for Drosophila simulans.) A complete list of valid abbreviations is available on the species abbreviations page. By convention, a 'Dmel' prefix is not used for D. melanogaster gene symbols in FlyBase (unless this is important in context). Gene names are not prefixed with species information.
2.5.2. Approved gene symbols/names. The FlyBase-approved gene symbols/names may correspond to the meaningful symbol/name of the D. melanogaster orthologs, distinguished by the relevant species prefix (as described in 2.5.1). (It should be noted that the assignment of orthology can be problematic in the absence of whole genome sequence information.) D. melanogaster gene symbols/names that are defined as placeholders (see sections 2.3.1 and 2.4) or contain D. melanogaster-specific cytological information should not be used as the symbols/names of orthologs in other species.
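Section 2.5.1 describes how species abbreviations are usually formed and how they prefix gene symbols. The Python sketch below illustrates that default rule under stated assumptions. Both helper names are hypothetical. The species code is only usually the first three letters of the species name, so a real tool should look the abbreviation up in the FlyBase species abbreviations list rather than derive it.

```python
def default_species_abbreviation(genus, species):
    """Genus initial plus the first three letters of the species name.

    This is only the usual pattern; the FlyBase species abbreviations
    list is authoritative and may differ.
    """
    return genus[0].upper() + species[:3].lower()


def prefixed_symbol(symbol, abbreviation):
    """Attach a species prefix; 'Dmel' is conventionally omitted."""
    if abbreviation == "Dmel":
        return symbol
    return f"{abbreviation}\\{symbol}"


abbrev = default_species_abbreviation("Drosophila", "simulans")
print(prefixed_symbol("Dfd", abbrev))  # Dsim\Dfd
```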
Special cases.
2.6.1. rRNA genes. Genes encoding ribosomal RNAs have symbols of the format 'nSrRNA', where n denotes the respective rRNA's sedimentation coefficient in Svedberg units, e.g., 28SrRNA. By historical convention, the locus containing the genes encoding the 5.8SrRNA, 18SrRNA and 28SrRNA is called bobbed (bb).
2.6.2. tRNA genes. Genes encoding transfer RNAs have symbols of the format 'tRNA:Xn:ma', where X is the one-letter amino-acid code (in upper case); n is a number signifying the particular isoform; m is the cytogenetic map position of the gene; and a (if used) is a lowercase letter to distinguish between functionally similar tRNA genes mapping to the same location, e.g., tRNA:S7:23Ea.
2.6.3. snRNA genes. Genes encoding small nuclear RNAs have symbols of the format 'snRNA:XX:ma', where XX is the type of snRNA; m is the cytogenetic map position of the gene; and a (if used) is a lower-case letter to distinguish functionally similar snRNA genes mapping to the same location, e.g., snRNA:U6:96Aa.
2.6.4. snoRNA genes. Genes encoding small nucleolar RNAs have symbols of the format 'snoRNA:X'. X usually represents the type of modification catalyzed and/or the substrate, e.g. snoRNA:MeU2-C28, which encodes a snoRNA that guides methylation of nucleotide C28 of the U2 snRNA; or snoRNA:Ψ28S-612, which encodes a snoRNA that guides pseudouridylation of nucleotide 612 of the 28S rRNA. If the substrate is unknown, then 'Or' is used in the symbol to indicate that it encodes an 'Orphan' snoRNA. A suffix is used where necessary to distinguish functionally similar snoRNA genes, e.g., snoRNA:Me18S-G1358b, or snoRNA:U3:9B (where the suffix is based on cytogenetic position).
2.6.5. miRNA genes. Genes encoding microRNAs have symbols of the format 'mir-N', where N is a sequential number assigned according to the conventions outlined in Ambros, Bartel et al. (2003), e.g., mir-125.
2.6.6. Pseudogenes. Pseudogenes have symbols of the format symbol_of_parental_gene-psX, where X (if used) is a number to distinguish between multiple pseudogene copies of a particular parental gene. If only one pseudogene copy of a particular gene has been found, it should be given the suffix -ps1.
2.6.7. Mitochondrial genes. Genes encoded by the mitochondrial genome have symbols prefixed with 'mt:', e.g., mt:ND4.
2.6.8. Ribosomal protein genes. Genes encoding ribosomal proteins are named based on orthology to their mammalian counterparts. Genes encoding cytoplasmic ribosomal proteins have symbols of the format 'RpSn' or 'RpLn', where S denotes a gene encoding a protein of the small subunit and L a gene encoding a protein of the large subunit, and n is a number reflecting orthology, e.g., RpL3, RpS6. Genes encoding mitochondrial ribosomal proteins have symbols of a similar format prefixed with 'm', e.g., mRpL1, mRpS2. Some ribosomal proteins are encoded by duplicate genes; these are distinguished by an 'a' or 'b' suffix, e.g., RpS14a and RpS14b. Some ribosomal protein genes were originally named after a mutant phenotype, e.g., sop or tko; these have been retained as the approved gene symbols/names in FlyBase.
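The special-case formats in 2.6 are regular enough to express as patterns. The sketch below is a hypothetical classifier, and its regular expressions are our interpretation of the formats above. For example, it assumes a map position is a division number, a subdivision letter A-F and optional band digits. Real FlyBase symbols may include cases these patterns do not cover.

```python
import re

# Our reading of the symbol formats in 2.6, tried in order with fullmatch.
SPECIAL_CASES = [
    ("pseudogene", re.compile(r"(?P<parent>.+)-ps(?P<copy>\d*)")),
    ("rRNA", re.compile(r"(?P<svedberg>\d+(?:\.\d+)?)SrRNA")),
    ("tRNA", re.compile(r"tRNA:(?P<amino_acid>[A-Z])(?P<isoform>\d+):"
                        r"(?P<location>\d+[A-F]\d*)(?P<suffix>[a-z]?)")),
    ("snRNA", re.compile(r"snRNA:(?P<type>[^:]+):"
                         r"(?P<location>\d+[A-F]\d*)(?P<suffix>[a-z]?)")),
    ("snoRNA", re.compile(r"snoRNA:(?P<detail>.+)")),
    ("miRNA", re.compile(r"mir-(?P<number>\d+)")),
    ("mitochondrial", re.compile(r"mt:(?P<gene>.+)")),
    ("ribosomal protein", re.compile(
        r"(?P<mitochondrial>m?)Rp(?P<subunit>[SL])(?P<number>\d+)(?P<paralog>[ab]?)")),
]


def classify_special_symbol(symbol):
    """Return (category, parts) for a special-case symbol, or None."""
    for category, pattern in SPECIAL_CASES:
        match = pattern.fullmatch(symbol)
        if match:
            return category, match.groupdict()
    return None


print(classify_special_symbol("tRNA:S7:23Ea"))
```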
Allele symbols and names
Superscripts.
Alleles of a particular gene share the gene's name and symbol and are differentiated by distinguishing superscripts. In written text, the allele designation may be separated from that of the gene by a hyphen, e.g., white-apricot.
Symbols.
Allele symbols should be short, preferably no more than three characters long, and cannot contain spaces, superscripts, or subscripts. Whenever possible, superscript characters should be limited to the following set:
a-z A-Z 0-9 - + : .
The + symbol is reserved for the wild-type allele. Consecutive allele numbers should be used wherever possible.
Greek characters may be used but are discouraged.
The character \ is reserved in all gene symbol contexts for species identification.
The character / is reserved as a homologue separator in genotypes and cannot be used in allele symbols.
In text in which superscripting is not possible, such as ASCII files, superscripted text should be enclosed between the characters [ and ].
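The ASCII convention just described (superscripts enclosed in [ and ]) and the preferred superscript character set can be sketched as follows. The helper names are hypothetical. superscript_warnings returns warnings rather than errors, because most of the rules above are stated as preferences ("whenever possible", "preferably"). The exceptions are '/' and spaces, which are not allowed.

```python
import re
import string

# Preferred superscript characters (a-z A-Z 0-9 - + : .).
PREFERRED = set(string.ascii_letters + string.digits + "-+:.")
# 'gene[allele]' with no nested square brackets.
ASCII_ALLELE = re.compile(r"([^\[\]]+)\[([^\[\]]+)\]")


def to_ascii(gene, allele):
    """Render gene plus superscripted allele in ASCII, e.g. w, 1118 -> w[1118]."""
    return f"{gene}[{allele}]"


def from_ascii(text):
    """Split 'gene[allele]' into (gene, allele); None if not in that form."""
    match = ASCII_ALLELE.fullmatch(text)
    return (match.group(1), match.group(2)) if match else None


def superscript_warnings(allele):
    """List departures from the superscript rules above (empty if none)."""
    warnings = []
    if "/" in allele:
        warnings.append("'/' is reserved as the homologue separator")
    if any(ch.isspace() for ch in allele):
        warnings.append("spaces are not allowed")
    if len(allele) > 3:
        warnings.append("longer than the preferred three characters")
    unusual = sorted(set(allele) - PREFERRED - set("/") - set(string.whitespace))
    if unusual:
        warnings.append("outside the preferred character set: " + "".join(unusual))
    return warnings


print(to_ascii("w", "1118"), from_ascii("CycE[k05007]"))
```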
FlyBase makes exceptions to the brevity rule when recording in vitro mutagenesis constructs that are represented as alleles. Where these are not otherwise named, FlyBase confers symbols according to a system that includes the initial of the last name of the first author of the first paper reporting the allele ('I' in the following examples). The most frequently used classes include:
Symbol | Meaning |
---|---|
cIa | for 'construct a of Author-lastname' |
Scer\UAS.cIa | for 'S. cerevisiae UAS construct a of Author-lastname' |
tIa | for 'transgene a of Author-lastname' |
mIa | for 'minigene a of Author-lastname' |
hs.PI | for 'heat shock construct of Author-lastname' |
gene_symbol.PI | for 'gene promoter fusion of Author-lastname' |
In addition, exceptions have been required for some large series of alleles and collections of mutations. Nevertheless, brevity of allele symbols is very much to be encouraged.
3.2.1. It is unacceptable to use, as a superscripted allele symbol, elements of the genotype in which the allele arose, since such a designation implies something more than a trivial connection between allele and element. Alleles that are revertants of a pre-existing allele are an exception to this rule.
3.2.2. Historically, the numeral 1 was the implied superscript of unsuperscripted symbols; because this practice created considerable ambiguity, it is now discouraged. As with all other alleles, the numeral 1 should be explicitly designated (e.g., sc[1], not sc).
3.2.3. For a recessive allele of a gene named as a dominant, or a dominant allele of a gene named as a recessive, the superscripts r and D, respectively, may be used, e.g., Hn[r], Hn[r2] and ci[D].
3.2.4. For a wild-type allele, a superscripted plus character may be used, e.g., b[+] or B[+]. The plus symbol alone implies the normal (wild-type) allele or alleles in any context, such as y[1]/+.
It may be necessary to distinguish among more than one 'wild-type' allele. In such cases, the different wild-type alleles should be given a distinguishing number, which follows the + character in the superscript, e.g., ry[+3].
3.2.5. Absence of a particular locus may informally be noted by a superscript minus character with the symbol, e.g., bb[-]. This is not acceptable as a designation of a particular allele.
3.2.6. Revertants or partial revertants of mutant alleles are designated by the superscript rv followed by a distinguishing number; these are placed after the allele designator, e.g., D[4rv32], the 32nd revertant of D[4]. Revertants of dominant mutations that are deficiencies are treated not as alleles but as deficiencies, and are accordingly not superscripted but listed with the distinguishing number, e.g., Df(2L)Scorv4.
3.2.7. Alleles specifying the absence of a particular enzyme or other protein are designated by the superscript n (null) followed by a distinguishing number or letter, e.g., Adh[n1], or, where lack of function is lethal, by l (lethal) followed by a distinguishing number, e.g., Nrg[l2].
3.2.8. An allele known to be mutant but whose specific identity is unknown is given an asterisk as an allele designation, e.g., w[*].
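Conventions 3.2.3 to 3.2.8 give certain superscripts a fixed meaning. The hypothetical allele_kind function below reads a superscript against those conventions. Its answer is only a hint: ordinary allele designations can also begin with r, D, n or l, and a bare superscript does not say which convention, if any, was intended.

```python
import re

# (label, pattern) pairs tried in order; each pattern must match the whole
# superscript. A heuristic reading of conventions 3.2.3-3.2.8.
CONVENTIONS = [
    ("unknown mutant allele", r"\*"),
    ("locus absent (informal, not an allele)", r"-"),
    ("wild type", r"\+.*"),
    ("revertant", r".+rv\d+"),
    ("null", r"n[0-9A-Za-z]+"),
    ("lethal", r"l\d+"),
    ("recessive allele of a dominantly named gene", r"r\d*"),
    ("dominant allele of a recessively named gene", r"D\d*"),
]


def allele_kind(superscript):
    """Return the convention a superscript appears to follow."""
    for label, pattern in CONVENTIONS:
        if re.fullmatch(pattern, superscript):
            return label
    return "unclassified"


print(allele_kind("4rv32"))  # revertant
```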
Transposons and Transgene Constructs
Transposons or transgene constructs integrated into the Drosophila genome are both alleles and aberrations if they cause a mutant phenotype (similar to other classes of aberrations that are associated with mutant phenotypes). Where such insertions produce no mutant phenotype, they are named purely according to aberration conventions. Where transposon/transgene insertions produce a mutant phenotype by disrupting an endogenous gene, they are named both as an allele of the mutated endogenous gene and as an aberration. The name of the allele follows the conventions outlined in sections 2 and 3. Rules for naming natural transposons and transgene constructs, and their insertions into the genome, follow.
Generic naturally occurring transposons are symbolized as ends{}, where ends stands for the symbol of a given transposon, such as P for P-element. Doc{}, copia{} and P{} are examples. A defined natural variant of the transposon family can be named by including a symbol for that name inside the brackets. A specific insertion of a given transposon is described by including an additional unique symbol following the brackets.
Insertions of natural transposons annotated as genome sequence features also have synonyms of the form TEnnnnn, for example, copia{}910 has the synonym TE20021.
Symbols for constructed transposons, or transgene constructs, must always include a construct symbol, which defines a particular construct. A full transgene construct genotype consists of the source of transposon ends, the included genes and the construct symbol, in the form ends{genes=construct-symbol}. Once defined, ends{construct-symbol} (or, less formally, construct-symbol alone) can be used in most circumstances to refer to a specific transgene construct. A specific insertion of a given transgene construct adds an insertion identifier, giving the form ends{construct-symbol}insertion-identifier. Further details are given in the sections that follow.
Some examples:
Symbol | Meaning |
---|---|
P{w[+mC] ovo[D1-18]=ovoD1-18} | the full genotype of the P-element transgene construct P{ovoD1-18} |
P{ovoD1-18}13X6 | a viable insertion of the construct P{ovoD1-18} |
P{Scer\GAL4[wB] w[+mW.hs] Ecol\ampR Ecol\ori=GawB} | the full genotype of the transgene construct P{GawB} |
P{GawB}h[1J3] | an insertion of the construct P{GawB} that disrupts the h gene |
H{w[+mC] Ecol\ori Tn\kanR Ecol\lacZ[HZ50a]=Lw2} | the full genotype of the hobo transgene construct H{Lw2} |
H{Lw2}dpp[151H] | an insertion of the transgene construct H{Lw2} that disrupts the dpp gene |
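The ends{genes=construct-symbol}insertion-identifier pattern can be taken apart mechanically. The sketch below is a hypothetical parser, not a FlyBase tool, and it rests on three assumptions: superscripts are written in the ASCII [ ] convention, included genes are separated by spaces, and the construct symbol follows the last '=' at the top level of the braces. It also derives the short form ends{construct-symbol}insertion-identifier.

```python
from typing import List, NamedTuple


class Transgene(NamedTuple):
    ends: str         # transposon ends, e.g. "P" or "H"
    genes: List[str]  # included genes in listed (5' to 3') order
    construct: str    # construct symbol, e.g. "GawB"
    insertion: str    # insertion identifier; "" for a construct alone


def _matching_brace(text, open_at):
    """Index of the '}' closing the '{' at open_at (handles nesting)."""
    depth = 0
    for i in range(open_at, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return i
    raise ValueError(f"unbalanced braces in {text!r}")


def parse_transgene(text):
    """Split ends{genes=construct}insertion; raises ValueError if malformed."""
    open_at = text.index("{")
    close_at = _matching_brace(text, open_at)
    inside = text[open_at + 1:close_at]
    # The construct symbol follows the last '=' outside nested braces.
    depth, split_at = 0, -1
    for i, ch in enumerate(inside):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        elif ch == "=" and depth == 0:
            split_at = i
    if split_at >= 0:
        genes = inside[:split_at].split()
        construct = inside[split_at + 1:].strip()
    else:
        genes, construct = [], inside.strip()
    return Transgene(text[:open_at], genes, construct, text[close_at + 1:])


def short_form(transgene):
    """ends{construct}insertion, dropping the included genes."""
    return f"{transgene.ends}{{{transgene.construct}}}{transgene.insertion}"


full = parse_transgene(r"P{Scer\GAL4[wB] w[+mW.hs] Ecol\ampR Ecol\ori=GawB}")
print(full.construct, short_form(full))  # GawB P{GawB}
```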
This nomenclature is formally similar to that used for aberrations, where the ends{symbol} prefix is similar to the Df(n), Dp(n;m), etc., prefixes of aberrations, and the identifier suffix is similar to the gene-allele suffix of aberrations with associated alleles, or the alphanumeric string suffix of other aberrations. Specific rules for assembling the components of a transgene construct genotype follow.
Transposon ends.
Pairs of terminal repeats which together form a transposon are symbolized by opposing braces, {}. The source of the transposon ends is indicated outside the braces, at the left end of the string by a symbol derived from the name of the transposon family:
Transposon ends | Transposon family |
---|---|
P | P-element |
H | H-element (hobo) |
I | I-element |
M | mariner-element |
Mi | Minos-element |
4.1.1. Isolated terminal repeats are indicated with the family symbol followed by 3' or 5', e.g., P5' represents the isolated 5' end of a P{} transposon.
4.1.2. Multiple sets of matched transposon ends are indicated by nesting ends{} symbols, e.g., P{I{neo[RT]W[+]}}. A P transgene construct containing ry[+t7.2] and an isolated hobo terminal repeat from the 5' end of a hobo element would be described as P{ry[+t7.2] H5'}.
Formally, this system can be extended to any insertion of mobile DNA, for example, the copia, gypsy and FB elements. Thus, the ct[MR2] mutation, caused by the insertion of a gypsy element, is called gypsy{}ct[MR2]. When a mobile element inserts into a mutant gene already carrying a mobile element, it is the new insertion that is named. For example, a jockey insertion into ct[MR2] generates ct[MRpD]; this is called jockey{}ct[MRpD]. The name describes the new insertion, which has caused the new phenotype. A full genotype description, including all sets of transposable element ends, is only provided when the progenitor allele is also fully described.
FlyBase uses this nomenclature not only because of its rigor, but also because its more general use may be needed if such elements are engineered.
Included genes.
A full transgene construct description lists within the braces, separated by spaces, all functional genes and elements, including non-Drosophila genes such as antibiotic resistance genes, bacterial and phage origins of replication, and the FLP1 recombination target (FRT). The left-to-right order of these elements reflects their 5' to 3' order (with respect to the transposon ends) within the construct. If the position of a gene is unknown, it is placed at one end of the list, followed or preceded by a comma.
4.2.1. Drosophila melanogaster genes. Valid gene symbols are used to name D. melanogaster genes. Wild-type alleles of intact genes are indicated by a superscripted '+t' followed by an identifier, e.g., ry[+t7.2] or Adh[+t3.2]. A convenient identifier (used in these examples) is the size of the genomic fragment carrying the wild-type gene. Transgene-construct-borne genes that do not confer wild-type function are given unique allele designations without the preceding '+t', e.g., ftz[B] or y[D225]. Replacement of the promoter or other control sequences can be indicated in the allele designation, e.g., dpp[hs.PP] for a dpp gene controlled by a heat shock promoter.
4.2.2. Species of origin. Species of origin is indicated for non-melanogaster Drosophila genes present in transgene constructs. A species code composed of the first letter of the genus (capitalized) and a three-letter code, usually the first three letters of the species (lowercase), is added to the gene symbol with a separating backslash, e.g., Dvir\Dfd[+t7.6] for the wild-type Deformed gene from Drosophila virilis (see sections 2.2.7 and 2.5.1).
For genes from species other than Drosophila, the valid gene symbols are used following a four-letter symbol, as above, indicating the species of origin, e.g., Hsap for humans, Gdom for chicken, Hsim for Herpes simplex and Ecol for E. coli. For viruses, the name or abbreviation (e.g., Abelson, Adeno5, Cmeg) or a symbolic name (e.g., T4, M13, or the Greek symbol lambda) is sometimes used instead of a genus-species-derived four-letter symbol. In all cases, these symbols are separated from the gene symbol by a backslash (\). A file of these species abbreviations is available on FlyBase.
FlyBase considers transposable elements, the mitochondrial DNA and other similar entities to be species (because each can contain several different genes). For this reason, for example, the P-element transposase has the symbol P\T in constructs.
4.2.3. Fusion genes. FlyBase defines fusion genes as fusions of the protein-coding regions of distinct genes constructed by in vitro mutagenesis. They are named using the gene symbols of their component parts, separated by a double colon, e.g., Antp::Scr or Act88F::Scer\act1.
The order of gene symbols stated in the fusion gene will be alphabetical. The complexity of these constructs is such that were each to be named according to its molecular composition, for example in the 5' to 3' direction, the number of named fusion genes would rapidly become impractical.
An exception to the 'alphabetical order' rule is made where the fusion is between a D. melanogaster and a non-melanogaster gene. In such cases, the melanogaster gene symbol is stated first, e.g., tra2::Hsap\SFRS2.
For historical reasons, some promoter fusions involving reporter genes such as Ecol\lacZ, though technically protein fusions, are simply treated as alleles of Ecol\lacZ. The symbol for the additional gene(s) contributing to the fusion is indicated as part of the superscript, e.g., Ecol\lacZ[P\T.A92]. In these special cases, no distinction is made between promoter fusions and protein fusions in the gene name.
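The ordering rules for fusion-gene symbols can be expressed as a small function. This is a sketch under two assumptions not stated in the text. First, a backslash in a component symbol marks a non-melanogaster gene, following the species-prefix convention in 4.2.2. Second, 'alphabetical' means case-insensitive ordering. The handling of fusions with more than two components is our own extrapolation. The reporter-gene fusions treated as alleles of Ecol\lacZ are outside the scope of this sketch.

```python
def fusion_symbol(*components):
    """Join component gene symbols with '::' following the ordering rules."""
    melanogaster = [c for c in components if "\\" not in c]
    other = [c for c in components if "\\" in c]
    if melanogaster and other:
        # Exception: the D. melanogaster component(s) come first.
        ordered = sorted(melanogaster, key=str.lower) + sorted(other, key=str.lower)
    else:
        ordered = sorted(components, key=str.lower)
    return "::".join(ordered)


print(fusion_symbol("Scr", "Antp"))          # Antp::Scr
print(fusion_symbol("Hsap\\SFRS2", "tra2"))  # tra2::Hsap\SFRS2
```

Plain alphabetical order would have put Hsap\SFRS2 before tra2; the melanogaster-first exception overrides it.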
4.2.4. Modified genes. Modified genes, cDNAs and in vitro mutagenized sequences are treated as alleles, and will be curated by FlyBase as such. They should be named, therefore, by the same conventions used to name classical alleles. The following allele symbols have been assigned by FlyBase to the commonly used modified genes of D. melanogaster:
Symbol | Modified gene |
---|---|
w[+mC] | The mini-white gene constructed by Pirrotta (1988) by deleting the HindIII-XbaI fragment from the long 5' intron of the w[+] gene. Carried by Casper plasmids and their derivatives. |
w[+mW.hs] | The mini-white gene constructed by Klemenz et al. (1987). Carried by the W6, W8 family of plasmids and their derivatives. |
Genes modified by the addition of a tag allowing the product to be identified, marked or purified represent a special class of modified genes. Tags are used to mark a transcript, e.g., with a piece of M13 DNA allowing the transcript to be identified by in situ hybridization. Tags are also used to mark a protein for purification (e.g., (His)6), for identification (epitope tags) or for targeting to a cellular compartment (nls tags). FlyBase treats sequences designed for these purposes as tags and curates the modified genes as alleles of the tagged gene. Tagged genes have symbols with the format T:y, where T stands for Tag and y is the species\gene symbol of the tag, e.g., T:Hsap\Myc, T:Ivir\HA1, T:Hsap\p53, T:Zzzz\His6 (the Zzzz 'species' prefix is used when the tag is artificial).
A complete list of tagged gene symbols and their definitions is available from FlyBase through QuickSearch. Change the 'Species' option from the default 'Dmel' to 'All species'. Ensure the 'Search' option is set as 'ID/Symbol/Name' and 'genes' is selected as the 'Data Class'. Type 'T:*' (don't use the quotation marks) in the 'Enter text' field and submit the query.
Construct symbol.
Every construct must be assigned a symbol which, in conjunction with the description of the terminal repeats, uniquely describes a transgene construct, for example, P{lacW}, H{PDelta2-3}. Symbols must be unique, but should be kept as short as possible.
4.3.1. Full genotype. In the full genotype of a transgene construct, the construct symbol is the final entry within the braces, separated from the final gene symbol by an equal sign, e.g., P{lacZ[P\T.W] w[+mC] ampR ori=lacW} is the full genotype of P{lacW}.
4.3.2. Short form and partial genotypes. Once defined, a transgene construct can be referred to in most contexts by either the construct symbol, e.g., P{lacW} (or, less formally, lacW), or the symbol plus insertion identifier (see below). Additional components can be added as needed for clarity. For example, in stock genotypes it is preferable to include the visible markers, as in P{w[+mC]=lacW}th[j5C8] or P{w[+t11.7] ry[+t7.2]=wA}3-1, to avoid misunderstandings about the expected phenotypes of the flies.
Insertion identifier.
The right-most position of the transgene symbol, outside the outermost brace, is reserved for a string that identifies a specific insertion of the defined construct into the genome. There are four cases to consider when naming insertions.
4.4.1. Insertion hits a known gene. When a mutant phenotype associated with a transgene construct insertion is assigned to a known gene, the insertion-induced allele should be named by the normal rules. Since such insertions cause new alleles, the gene-allele description is used as the identifier of the associated insertion (just as with other alleles identified as aberrations). For example, a P{lacW} insertion referred to as l(2)k05007 and then shown to be an allele of CycE becomes P{lacW}CycE[k05007]. Insertion-induced alleles in stock genotypes should include the aberration name of the construct, i.e., P{lacW}CycE[k05007]. In most other circumstances, the insertion aberration prefix can be dropped and the mutation referred to in the usual way, in this case CycE[k05007].
4.4.2. Insertion defines a new gene. Often insertions cause a phenotype that cannot be associated with any known gene. In that case the insertion defines the first allele of a new gene, which is named by the normal rules, e.g., P{lacW}Trf1.
4.4.3. A mapped insertion with no phenotype. If an insertion has no phenotype but has been mapped on the polytene chromosomes, it is preferable to use the polytene chromosome subdivision to which it maps as its identifier, e.g., P{bw+L}60B. If a similar insertion already has this name, the new one would be named P{bw+L}60B-2 or similar.
4.4.4. An unmapped insertion with no phenotype. If the insertion is not mapped, there is no alternative but to give it an arbitrary number or code, e.g., P{A92}A45. This identifier must be unique and as simple as possible, using only characters from the set:
a-z A-Z 0-9 -
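The character-set rule lends itself to a simple check. The following minimal Python sketch assumes that an identifier is valid if it is non-empty and uses only the characters listed above. The helper names are hypothetical. Uniqueness, which also matters, can only be checked against a database and is not modelled here.

```python
import re

# Permitted characters for an arbitrary insertion identifier: a-z A-Z 0-9 -
_INSERTION_ID = re.compile(r"[A-Za-z0-9-]+")


def is_valid_insertion_id(identifier: str) -> bool:
    """True if the identifier is non-empty and uses only permitted characters."""
    return _INSERTION_ID.fullmatch(identifier) is not None


def insertion_symbol(construct: str, identifier: str) -> str:
    """Append an identifier to the right of the construct's outermost brace."""
    if not construct.endswith("}"):
        raise ValueError(f"construct symbol must end with '}}': {construct!r}")
    if not is_valid_insertion_id(identifier):
        raise ValueError(f"invalid insertion identifier: {identifier!r}")
    return construct + identifier


print(insertion_symbol("P{A92}", "A45"))  # P{A92}A45
print(is_valid_insertion_id("60B 2"))     # False (contains a space)
```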
Cytogenetic descriptions
Breakpoints should be designated according to the revised salivary gland chromosome maps published by C. B. and P. N. Bridges (see Lindsley and Zimm, 1992), except for chromosome 4, for which the map of Sorsa (Chromosome maps of Drosophila Vol. II, CRC Press, 1988) should be used.
Range designations.
For the location of a single object (breakpoint of aberration, gene position, site of transposon insertion, etc.) the range is given as "(d1)(S1)(b1)-(d2)(S2)(b2)", where:
Symbol | Designation
---|---
d | numbered division (1 to 102)
S | lettered subdivision (A to F)
b | band number (1 to n, depending upon the particular subdivision)
For ranges not known to the accuracy of a band, see paragraph 5.5.
If the range encompasses two different numbered divisions (i.e., d1 does not equal d2), then the full designations for both the left end and the right end of the range will be used, e.g., 32A3-33A2.
If the range is within a single numbered division (i.e., d1=d2) but within different subdivisions (i.e., S1 does not equal S2), then the numbered division designation is not repeated to the right of the hyphen, e.g., 32A3-D4.
If the range lies within a single lettered subdivision of a single numbered division (i.e., d1S1 = d2S2), then neither the division nor the subdivision designation will be repeated, e.g., 32A3-5.
If a location is known to a single band, then the location will be given as (d1)(S1)(b1) with no hyphen and no repetition of the band location, e.g., 32A3.
If a location is known to a single doublet, then the location will be given as (d1)(S1)(b1)-(b1+1) where (b1) and (b1+1) represent the two succeeding bands of the doublet, e.g., 32A1-2.
If only one end of a location range falls within a doublet, the location will refer to the band number that maximizes the range, e.g., 32C1-D5 will be used, not 32C1,2-D5, and 32B4-C2 will be used, not 32B4-C1,2.
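The rules above drop whatever part of the right-hand designation repeats the left-hand one. The following minimal Python sketch implements that abbreviation. It assumes both ends are known to band resolution and that the doublet rule has already been applied when choosing the two ends. The `Band` type and `format_range` name are hypothetical, and interbands (the + suffix) are not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Band:
    division: int     # numbered division, 1 to 102
    subdivision: str  # lettered subdivision, A to F
    band: int         # band number within the subdivision

    def __str__(self) -> str:
        return f"{self.division}{self.subdivision}{self.band}"


def format_range(left: Band, right: Band) -> str:
    """Format a location range, omitting the parts shared by both ends."""
    if left == right:
        return str(left)                                  # single band: 32A3
    if left.division != right.division:
        return f"{left}-{right}"                          # 32A3-33A2
    if left.subdivision != right.subdivision:
        return f"{left}-{right.subdivision}{right.band}"  # 32A3-D4
    return f"{left}-{right.band}"                         # 32A3-5, or doublet 32A1-2


print(format_range(Band(32, "A", 3), Band(32, "D", 4)))  # 32A3-D4
```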
It is sometimes necessary to represent interbands in data curated by FlyBase. Interbands have the same symbol as the immediately preceding band, with the suffix symbol +. The interband between the Bridges' bands 3A4 and 3A5 is, therefore, represented as 3A4+.
Telomeres.
Telomeres are designated by nAt, where n is a chromosome number, A is the chromosome arm, and t indicates the telomere:
Symbol | Meaning
---|---
1Lt | the telomere of the left arm of X
1Rt | the telomere of the right arm of X
YLt | the telomere of the long arm of Y
YSt | the telomere of the short arm of Y
2Lt | the telomere of the left arm of 2
2Rt | the telomere of the right arm of 2
3Lt | the telomere of the left arm of 3
3Rt | the telomere of the right arm of 3
4Lt | the telomere of the left arm of 4
4Rt | the telomere of the right arm of 4
?t | undefined telomere (use if the telomere is of unknown origin)
Centromeres and centric heterochromatin.
Centromeres are designated as ncen, where n indicates the chromosome, i.e., 1cen, Ycen, 2cen, 3cen and 4cen.
5.3.1. Centric heterochromatic blocks will be indicated as hn, where n is a consecutive number.
Composite chromosome architecture.
The designations of the chromosomes, including polytene band ranges, heterochromatic blocks and centromeres are:
YLt h1 -- h17 Ycen h18 -- h25 YSt
1Lt 1A1 -- 20F4 h26 -- h32 1cen h33 -- h34 1Rt
2Lt 21A1 -- 40F7 h35 -- h37 h38L 2cen h38R h39 -- h46 41A1 -- 60F5 2Rt
3Lt 61A1 -- 80F9 h47 -- h52 h53L 3cen h53R h54 -- h58 81F1 -- 100F5 3Rt
4Lt h59 -- h61 4cen 101F1 -- 102F8 4Rt
Note that the centromeres of chromosomes 2 and 3 lie within heterochromatic bands h38 and h53, respectively. In some stocks, some heterochromatic bands (h25, h42) are divided into two (h25A and h25B; h42A and h42B).
Accuracy of cytological descriptions.
In designating cytological position, the level of accuracy of the determination should be reflected in the specificity of the statement.
Some examples should make these distinctions clear. Note that the polytene subdivision described here, 77B, has 9 bands.
Case 1 - High level of uncertainty about subdivision location:
If the observer thinks that the location of a rearrangement breakpoint might be in 77B but could also possibly be in 77A or 77C, then the position should be reported as 77A-C.
Case 2 - Low level of uncertainty about subdivision location:
If the observer's best estimate is that the true breakpoint position is very likely to be in 77B, then the observer should report the position as 77B.
Case 3 - No uncertainty about subdivision location:
If the observer is absolutely certain that the location is within 77B, then the location should be reported as 77B1-9.
Chromosome aberrations
Chromosome aberrations have names that consist of three parts: a prefix indicating the class of aberration; the chromosome or chromosomes (or chromosome arms) involved, enclosed within parentheses; and a specific designation that identifies the particular rearrangement.
General principles for naming aberrations.
6.1.1. Aberrations not named after a gene: The suffix (i.e., the component of the name following the parentheses) should include only letters and digits. It should contain no spaces and no superscripts or subscripts, except in the particular case of synthetic inversions with L and R superscripts (see 6.4.4). The characters ( and ) are used only to enclose the designation of a chromosome or chromosome arm.
6.1.2. Aberrations named after a gene but not associated with an allele: Here the association with the gene carries circumstantial information about the aberration's breakpoints. The suffix should comprise the gene symbol, followed by a hyphen if needed for clarity, followed by any alphanumeric of the investigator's choosing. There should be no superscripts.
6.1.3. If a gene whose symbol appears in an aberration changes its name, e.g., for reasons of newly discovered allelism, then this name change is propagated to the aberration(s) in question. The old name will become a synonym.
6.1.4. Aberrations named for a specific associated allele: Here the suffix should be exactly the same as the allele designation, i.e. the gene symbol followed by the superscripted allele symbol. If the allele designation (either gene or allele part) changes, that change will be propagated to the aberration.
Translocations.
6.2.1. Translocations have the symbol T(n1;n2...)m, where n1, n2 ... indicate the numbers of the chromosomes involved in the translocation and m is a specific designator.
When chromosomes are listed within the parenthetical information of a translocation symbol they are listed in the order: 1, Y, 2, 3, 4. The numbers of the different chromosomes are separated by semicolons, with no spaces.
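The ordering and separator rules for translocation symbols can be expressed as a sort key. The following minimal Python sketch assumes chromosomes are given as the strings 1, Y, 2, 3 and 4. The helper name is hypothetical.

```python
# Order in which chromosomes are listed inside translocation parentheses.
CHROMOSOME_ORDER = ["1", "Y", "2", "3", "4"]


def translocation_symbol(chromosomes, designator: str) -> str:
    """Build T(n1;n2...)m with chromosomes listed as 1, Y, 2, 3, 4."""
    unknown = set(chromosomes) - set(CHROMOSOME_ORDER)
    if unknown:
        raise ValueError(f"unknown chromosome(s): {sorted(unknown)}")
    ordered = sorted(set(chromosomes), key=CHROMOSOME_ORDER.index)
    if len(ordered) < 2:
        raise ValueError("a translocation involves at least two chromosomes")
    # Semicolons with no spaces between chromosome numbers.
    return f"T({';'.join(ordered)}){designator}"


print(translocation_symbol(["3", "1", "2"], "OR9"))  # T(1;2;3)OR9
```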
6.2.2. The separable components of translocations.
Previous conventions for naming the aneuploid segregants of rearrangements have been difficult to employ and do not carry enough information in the derivative name to permit automated recognition of the relationship between an aneuploid segregant and its euploid progenitor.
FlyBase will employ the following conventions for different classes of euploid chromosomal aberrations and their aneuploid derivatives.
6.2.2.1. Translocation segregants. Translocations, standardly named T(n1;n2)m, consist of two or more translocated chromosomes, each of which can potentially exist as an aneuploid segregant. Such segregants will be named using telomeres of the rearranged chromosomes as landmarks for specific segregants. Two-break translocations are often called reciprocal translocations if two chromosome segments have simply been exchanged.
The general form of the name of a segregant will be Ts(n1Pt;n2Qt)m. Ts stands for 'Translocation segregant,' n1Pt and n2Qt for the designation of the landmark telomere(s) (e.g., 2Lt, 3Rt) and m is the same suffix as the progenitor translocation from which the segregant is derived.
Example 1: Two-break reciprocal translocation. No ambiguity about the locations of either breakpoint relative to the centromere.
T(2;3)rg35 (= T(2;3)27E-F;62C2-D1). The two aneuploid segregants are therefore named:
Ts(2Lt;3Rt)rg35 (= 2Lt-27E|62D1-3Rt)
Ts(2Rt;3Lt)rg35 (= 2Rt-27F|62C2-3Lt)
Example 2: Three-break reciprocal translocation. No ambiguity about the locations of any breakpoint relative to the centromere.
T(1;2;3)OR9 (= T(1;2;3)19-20;49F;81F). The three aneuploid segregants are accordingly named:
Ts(1Lt;3Lt)OR9 (= 1Lt-19|81F-3Lt)
Ts(1Rt;2Rt)OR9 (= 1Rt-20|49F-2Rt)
Ts(2Lt;3Rt)OR9 (= 2Lt-49F|81F-3Rt)
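A segregant name can be assembled from its two landmark telomeres and the progenitor's suffix. The following minimal Python sketch does this. The convention that the two telomeres are listed in the 1, Y, 2, 3, 4 chromosome order is an assumption inferred from the examples above, not a stated rule. The helper name is hypothetical.

```python
import re

CHROMOSOME_ORDER = ["1", "Y", "2", "3", "4"]
_TELOMERE = re.compile(r"([1Y234])([LRS])t")


def segregant_name(telomere_a: str, telomere_b: str, suffix: str) -> str:
    """Name a translocation segregant Ts(n1Pt;n2Qt)m from its landmark telomeres."""
    for t in (telomere_a, telomere_b):
        if not _TELOMERE.fullmatch(t):
            raise ValueError(f"not a telomere designation: {t!r}")
    # Assumption: telomeres are listed by chromosome order, then by arm letter.
    first, second = sorted((telomere_a, telomere_b),
                           key=lambda t: (CHROMOSOME_ORDER.index(t[0]), t[1]))
    return f"Ts({first};{second}){suffix}"


print(segregant_name("3Rt", "2Lt", "rg35"))  # Ts(2Lt;3Rt)rg35
```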
6.2.2.2. Complex segregants and recombinants. For many complex translocations or inversions with four or more breakpoints, multiple aneuploid segregants or recombinants can potentially occur. It is impossible to invent a naming scheme for these complex cases that would automatically reveal the specific aneuploid chromosome complement. In such instances, resulting aneuploids will be given appropriate names as follows:
The first duplication or deletion is assigned the unique suffix of the parental euploid rearrangement. The new order of the resulting chromosome must be reported.
Succeeding duplications or deletions are assigned other unique suffixes. Their new orders must also be reported.
Rings.
Ring chromosomes have the symbol R(n)m, where n indicates the number of the chromosome and m is a specific designation.
Inversions.
6.4.1. Inversions have the symbol In(nA)m, where n indicates the number of the chromosome involved, A the arm or arms involved and m is a specific designator.
In the case of multiple-break intrachromosomal rearrangements, the distinction between inversions and transpositions often becomes ambiguous. An intrachromosomal rearrangement that can be partitioned into a duplicated and a deficient product by exchange with a normal-sequence chromosome is designated a transposition even though it may carry an inverted segment; otherwise, it is designated an inversion.
6.4.2. If it is not known whether an inversion is paracentric (does not include the centromere) or pericentric (includes the centromere), then the indicator of chromosome arm(s) is omitted, i.e., In(n)m.
6.4.3. By convention, In(1) implies In(1L).
6.4.4. Recombinant products between two inversions. Recombination between similar inversions may produce viable recombinant inversions with the left end of one and the right end of the other. Superscripts L and R are used to identify the sources of the two ends, for example, In(2L)CyLtR.
Transpositions.
Among interchromosomal rearrangements, the term transposition is reserved for that class in which the telomeres of the chromosomes involved are coupled (that is to say, form the two ends of a single DNA molecule) as in the wild type. Rearrangements that alter the pairing of telomeres are classified as translocations.
In the case of multiple-break intrachromosomal rearrangements, the distinction between inversions and transpositions often becomes ambiguous. An intrachromosomal rearrangement that can be partitioned into a duplicated and a deficient product by exchange with a normal-sequence chromosome is designated a transposition even though it may carry an inverted segment; otherwise, it is designated an inversion.
6.5.1. Transpositions have the symbol Tp(n1;n2)m, where n1 is the 'donor' chromosome, n2 the 'recipient' chromosome and m a specific designation. For intrachromosomal transpositions n1 = n2.
6.5.2. Separable components of transpositions.
6.5.2.1. Interchromosomal transpositions. Segregants of interchromosomal transpositions will continue to be referred to as in the past. For a transposition with the name Tp(n1;n2)m, the chromosome segregant containing the duplicated material will be named Dp(n1;n2)m, and the chromosome containing the deleted material will be named Df(n1A)m, where A refers to the chromosome arm of the deletion.
Example: Tp(3;1)kar5l (= Tp(3;1)87C7-D1;88E2-3;20)
The two aneuploid segregants are:
Dp(3;1)kar5l (= 1Lt-20|87D1-88E2|20-1Rt)
Df(3R)kar5l (= 3Lt-87C7|88E3-3Rt)
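The interchromosomal rule amounts to swapping the Tp prefix for Dp, and for Df plus the arm of the deletion. The following minimal Python sketch derives both names from a progenitor symbol. The deleted arm must be supplied because the Tp symbol does not carry it. The helper name is hypothetical. Intrachromosomal transpositions, which follow the different rules of 6.5.2.2, are rejected.

```python
import re

# Tp(n1;n2)m with single-character chromosome designations.
_TP = re.compile(r"Tp\(([1Y234]);([1Y234])\)(\S+)")


def tp_segregants(tp_symbol: str, deleted_arm: str) -> tuple:
    """Return (Dp, Df) names for the segregants of an interchromosomal Tp."""
    match = _TP.fullmatch(tp_symbol)
    if match is None:
        raise ValueError(f"not a Tp(n1;n2)m symbol: {tp_symbol!r}")
    donor, recipient, suffix = match.groups()
    if donor == recipient:
        raise ValueError("intrachromosomal transposition: see 6.5.2.2")
    duplication = f"Dp({donor};{recipient}){suffix}"
    deficiency = f"Df({donor}{deleted_arm}){suffix}"
    return duplication, deficiency


print(tp_segregants("Tp(3;1)kar5l", "R"))  # ('Dp(3;1)kar5l', 'Df(3R)kar5l')
```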
6.5.2.2. Intrachromosomal transpositions. Segregants here are produced by recombination with a structurally normal chromosome, not by chromosome segregation. For transpositions in which the transposed segment is in the uninverted orientation relative to the standard map, there may be two potential duplication and two potential deletion derivatives (one set resulting from recombination events in the region between the deficiency and duplication components of the transposition, and one set resulting from recombination events within the transposed segment). For transpositions of the type Tp(n1;n1)m, the reported duplication segregant will be named Dp(n1;n1)m and the new order must be reported to eliminate any ambiguity. Similarly, the reported deletion recombinant is referred to as Df(n1A)m, where A refers to the chromosome arm bearing the deletion. In rare cases in which the alternative duplication or deletion recombinant (generated by recombination within the transposed segment) is also reported, it will be given a different suffix from the progenitor transposition and the new order will be reported.
Example: Tp(3;3)DlII13 (= Tp(3;3)88F5-9;91A3-8;92A2)
The primary aneuploid recombinants would then be:
Dp(3;3)DlII13 (= 3Lt-92A2|88F9-91A3|92A2-3Rt)
Df(3R)DlII13 (= 3Lt-88F5|91A8-3Rt)
If the other deletion or duplication recombinant is subsequently generated, it will be given a novel suffix, perhaps completely unrelated to the progenitor, e.g.:
Df(3R)xxx (= 3Lt-91A3|92A2-3Rt)
Dp(3;3)xxx (= 3Lt-88F5|91A8-92A2|88F5-3Rt)
Deficiencies (deletions).
Deficiencies (deletions) have the symbol Df(nA)m, where n is the number of the deleted chromosome, A is the chromosome arm and m is a specific designator.
Intragenic deletions are not treated as deficiencies, but as alleles; at least two adjacent loci must be removed or disrupted before a lesion is considered a deletion.
Duplications.
Duplications have the symbol Dp(n1;n2)m, where n1 is the 'donor' chromosome, n2 the recipient and m a specific designator; n1 may equal n2.
Duplications may be tandem (in direct or inverted order), insertional or free. Direct and inverted tandem duplications are not distinguished by their symbols. Ambiguity must be avoided by explicit description of the new order (see section 7.1 New order).
6.7.1. When the duplicated sequences are carried as a free centric element, the letter f (free) follows the semicolon within the parentheses, replacing n2; e.g., Dp(1;f)101.
6.7.2. Higher order repeats. Higher-order repeats are also symbolized Dp, with the number of repeats indicated in the parenthetical chromosomal designation, i.e., Dp(1;1) = duplication, Dp(1;1;1) = triplication, and so forth.
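The three duplication forms (ordinary, free, higher-order) differ only in the parenthetical part. The following minimal Python sketch builds them. It assumes that each additional copy in a higher-order repeat repeats the recipient chromosome, as in the Dp(1;1;1) example. Free higher-order repeats are not modelled. The helper name is hypothetical.

```python
from typing import Optional


def duplication_symbol(donor: str, designator: str, recipient: Optional[str] = None,
                       copies: int = 2, free: bool = False) -> str:
    """Build Dp(n1;n2)m, Dp(n1;f)m, or a higher-order repeat such as Dp(1;1;1)m."""
    if free:
        if copies != 2:
            raise ValueError("higher-order repeats of free elements are not modelled")
        return f"Dp({donor};f){designator}"
    if recipient is None:
        raise ValueError("recipient chromosome required unless free=True")
    if copies < 2:
        raise ValueError("a duplication has at least two copies")
    # Assumption: each extra copy repeats the recipient, e.g. Dp(1;1;1) for a triplication.
    fields = [donor] + [recipient] * (copies - 1)
    return f"Dp({';'.join(fields)}){designator}"


print(duplication_symbol("1", "101", free=True))  # Dp(1;f)101
```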
Y derivatives.
In the past, many Y chromosome derivatives (e.g., marked Y chromosomes) were named in a rather special way, as m1Ym2, where m1 is a marker (or markers) carried on YL and m2 a marker (or markers) carried on YS. Such chromosomes should be named as duplications, following the normal rules. Thus a y+Y is Dp(1;Y)y+ and Ymal+ is Dp(1;Y)mal+.
Autosynaptic elements.
A pericentric inversion can be converted to two reciprocal autosynaptic elements by recombination between the inverted segment and a normal homolog. For a pericentric inversion of the type In(nLR)m, the two autosynaptic products are LS(n)m and DS(n)m, where LS refers to the product carrying the two left (L = levo) telomeres and DS to that carrying the two right (D = dextro) telomeres. Chromosome elements of very similar structure to autosynaptic elements can be recovered by other means; by convention, these are also called autosynaptic elements if autosynaptic elements were used in their recovery.
6.9.1. In stocks, autosynaptic elements must be carried as balanced pairs; their symbols are then separated by a double slash thus, LS(n)m1//DS(n)m2. In the special case where the two members of such a balanced pair are reciprocal recombinant products (e.g., LS(n)m1//DS(n)m1) then such a genotype can be called AS(n)m1.
Compound chromosomes.
Compound chromosomes may be subdivided into two classes: homocompounds, consisting of two copies of the same chromosomal arm attached to a common centromere, and heterocompounds, in which two arms from different chromosomes are connected through the centromere of one of them. They are designated by the symbol C followed parenthetically by the designation of the involved chromosome arm or arms.
In stock genotypes, the linkage relationship of markers on compound chromosomes is indicated with a colon, e.g., C(4)RM-P2, ci1 eyR: gvl1 svn.
6.10.1. Homocompounds. Homocompound chromosomes are classified according to the relative orientation of their arms (i.e., tandem, reversed or ring) and the position of their centromeres (i.e., acrocentric or metacentric): reversed acrocentrics (C(n)RA), reversed metacentrics (C(n)RM), reversed rings (C(n)RR), tandem acrocentrics (C(n)TA), tandem metacentrics (C(n)TM), and tandem rings (C(n)TR), where n is the number of a chromosome or chromosome arm. In each case the symbol is followed by a specific designator, separated by a hyphen.
6.10.1.1. When the component arms differ in sequence by something other than whole-arm inversion, the tandem or reversed classification becomes ambiguous. Furthermore, when the component arms are separable from each other by a single break, the terms acrocentric and metacentric are descriptive; however, when elements of the two arms become interspersed (as for example by interarm rearrangements), these terms lose meaning. Consequently, the more-complex compounds are given arbitrary symbols.
Heterocompounds.
Heterocompound chromosomes have the symbol C followed by the chromosome or arms involved within parentheses, e.g., C(1;Y), C(2L;3R). The chromosomal origin of the centromere in such compounds is frequently ambiguous. It is usually necessary to describe the structure of any given heterocompound in some more detail, by its new order. The distinction between some heterocompound chromosomes and whole-arm translocations can be moot.
Free chromosome arms.
The term 'free' is used with respect to the left and right arms of the major autosomes, and to the long and short arms of the Y chromosome, when an arm exists as an individual chromosome element. The symbol for a free arm is: F(nA)m, where n = Y, 2 or 3, A = L, R, or S and m is a symbol (note that L indicates Left for the X chromosome and autosomes, but Long for the Y chromosome). In practice, all free arms carry some chromosome material from another chromosome arm or element.
Complex rearrangements.
Occasionally an author must report an aberration whose cytology is either ambiguous or cannot (with existing knowledge) be described within one of the usual classes of aberration. These aberrations should be named according to the format Ab(N1;N2;...)identifier or, when associated with a named allele, Ab(N)geneallele. Ab stands for Aberration, and N represents the chromosome(s) or chromosome arm(s) that are known to be involved. If one or more of these cannot be identified then a ? symbol is used. If one break is heterochromatic but no further identification is possible then h is used. Examples are: Ab(3R)fafBX9 and Ab(3L;h)ME178.
The Ab prefix is a last resort and should not be used without very good reason. If further information becomes available allowing a more formal description of a complex aberration, then the Ab symbol should be replaced and relegated to synonymy.
Combinations of rearrangements.
The elementary categories of chromosome aberrations are not mutually exclusive, and some aberrations combine several of them. In such cases the symbol used should be the one most relevant to the anticipated value of the aberration, such as Df for a deficient translocation that was generated in a screen for deficiencies. When no preference exists, the symbol used is the one that stands highest in the following ranking: T > interchromosomal Tp > R > In > intrachromosomal Tp > Dp > Df. This is especially so when the components are inseparable.
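When no preference based on anticipated use exists, the fallback ranking can be applied as a simple lookup. The following minimal Python sketch covers only that fallback. Choosing the symbol "most relevant to the anticipated value of the aberration" remains a curatorial judgment. The class labels and helper name are hypothetical.

```python
# Fallback ranking, highest first, used when no preference exists.
RANKING = ["T", "interchromosomal Tp", "R", "In", "intrachromosomal Tp", "Dp", "Df"]


def preferred_class(components) -> str:
    """Return the highest-ranking aberration class among the components."""
    if not components:
        raise ValueError("no aberration classes given")
    unknown = [c for c in components if c not in RANKING]
    if unknown:
        raise ValueError(f"unknown aberration class(es): {unknown}")
    return min(components, key=RANKING.index)


print(preferred_class(["Df", "In", "Dp"]))  # In
```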
FlyBase uses the following verbal definitions for classes of three-break aberrations:
Deficient translocation
A translocation in which one of the four broken ends loses a segment before re-joining, e.g., T(1;3)ct268-21.
Inversion-cum-translocation
The first two breaks are in the same chromosome, and the region between them is rejoined in inverted order to the other side of the first break, such that both sides of break one are present on the same chromosome. The remaining free ends are joined as a translocation with those resulting from the third break, e.g., T(1;2)C324.
Bipartite duplication
The (large) region between the first two breaks listed is lost, and the two flanking segments (one of them centric) are joined as a translocation to the free ends resulting from the third break, e.g., Dp(1;2)K1.
Cyclic translocation
Three breaks in three different chromosomes. The centric segment resulting from the first break listed is joined to the acentric segment resulting from the second, rather than the third, e.g., T(1;2;3)OR14.
Bipartite inversion
Three breaks in the same chromosome; both central segments are inverted in place (i.e., they are not transposed), e.g., In(3LR)BTD7.
Uninverted insertional duplication
A copy of the segment between the first two breaks listed is inserted at the third break; the insertion is in cytologically the same orientation as its flanking segments, e.g., Dp(1;1)hdp-b2.
Uninverted insertional transposition
The segment between the first two breaks listed is removed and inserted at the third break; the insertion is in cytologically the same orientation as its flanking segments, e.g., Tp(1;1)B263-48.
Inverted insertional duplication
A copy of the segment between the first two breaks listed is inserted at the third break; the insertion is in cytologically inverted orientation with respect to its flanking segments, e.g., Dp(1;1)ybl.
Inverted insertional transposition
The segment between the first two breaks listed is removed and inserted at the third break; the insertion is in cytologically inverted orientation with respect to its flanking segments, e.g., In(2R)C72.
Unoriented insertional duplication
A copy of the segment between the first two breaks listed is inserted at the third break; the orientation of the insertion with respect to its flanking segments is not recorded, e.g., Dp(1;1)hdp-b4.
Unoriented insertional transposition
The segment between the first two breaks listed is removed and inserted at the third break; the orientation of the insertion with respect to its flanking segments is not recorded, e.g., Tp(1;2)v+75d.
6.14.1. A complicated rearrangement may be separable genetically into its simpler component aberrations, which are usually sufficiently designated with the distinguishing symbol of the original aberration. When, however, the original is named after a phenotype associated with one of the component aberrations, designation of the other component with the symbol of the mutant is inappropriate.
6.14.2. A rearrangement superimposed upon another rearrangement may be given a name, which more often than not refers to the entire complex since the newly induced aberration is likely to be inseparable from the original; e.g., In(2LR)SM1 is a large pericentric inversion superimposed upon In(2L)Cy In(2R)Cy.
Balancers
Balancers can be described in one of three ways: by a complete genotype, by a short genotype or by a single symbol. For FlyBase purposes a single symbol is needed for every balancer variant. If a symbol is not reported for a new balancer variant FlyBase will assign one.
Balancer symbols should be concise, should contain no spaces, and should use only characters from the following set:
a-z A-Z 0-9 : - ( ) {}
Marked variants of classical balancers should be named beginning with the symbol of the parental variant followed by a hyphen followed by a concise distinguishing string, e.g., TM3-DZ.
Where new balancer variants are reported in the literature, the authors' symbol for the variant, if provided, is used by FlyBase. Commas used by authors in publications may be converted into hyphens by FlyBase, so that a genotype-like string that almost qualifies as a symbol can be used as one. Likewise, when authors use [] to denote the limits of an element insertion, these are converted into {} by FlyBase, to maintain consistency with other sections of the database. The use of invalid gene symbols and of complete transposable element construct/insertion symbols in balancer symbols is discouraged.
As an alternative to the concise balancer symbol, balancers may be reported using balancer short genotypes, which combine the symbol of a classical balancer with new allele, aberration or transgene insertion symbols to define a unique balancer variant, e.g., TM3, ryRK Sb1 (= TM3-vKa).
Balancers may, of course, also be reported using a full balancer genotype that lists all aberration, allele and insertion symbols that comprise the unique balancer variant.
Any variant reported in the literature or donated to a stock center but not given a symbol by the authors is given the symbol 'parental_variant-vIa' by FlyBase, where I is the initial of the first author's last name, and the name 'parental_variant-variant a of' followed by that last name, e.g., TM3-vKa for TM3-variant a of Karess.
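The balancer character set and the default variant symbol can both be checked mechanically. The following minimal Python sketch shows both. The helper names are hypothetical. The `letter` parameter for later variants (b, c, ...) is an assumption extrapolated from "variant a".

```python
import re

# Characters permitted in balancer symbols: a-z A-Z 0-9 : - ( ) { }
_BALANCER = re.compile(r"[A-Za-z0-9:\-(){}]+")


def is_valid_balancer_symbol(symbol: str) -> bool:
    """True if the symbol is non-empty, has no spaces, and uses only permitted characters."""
    return _BALANCER.fullmatch(symbol) is not None


def default_variant_symbol(parent: str, first_author_surname: str, letter: str = "a") -> str:
    """Symbol for an unnamed variant, e.g. TM3-vKa (letter other than 'a' is an assumption)."""
    return f"{parent}-v{first_author_surname[0].upper()}{letter}"


print(default_variant_symbol("TM3", "Karess"))  # TM3-vKa
```

A short genotype such as TM3, ryRK Sb1 fails the check because it contains a comma and spaces, which is why it needs a concise symbol such as TM3-vKa.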
The cytological description of aberrations
For all but the simplest two-break chromosome aberrations the explicit description of the new chromosome order is essential (see paragraph 7.1).
In descriptions of aberrations the cytological breakpoints of the aberration are listed after the symbol, the different items of chromosomal information being separated by semicolons without spaces. Cytological descriptions of new orders are always in roman type.
New order.
The following conventions for specifying sequences of aberrations are to be adopted. The sequence of each chromosome involved in an aberration is specified from one end to the other according to salivary gland chromosome band terminology. Points of breakage and reunion are indicated by vertical bars, and segments between these points are designated by the most extreme band known to be present at each end, separated by a dash. Thus, the new order of
Tp(2;3)P (= Tp(2;3)58E3-F2;60D12-E2;96B5-C1)
is represented as
2Lt-58E3|60E2-2Rt; 3Lt-96B5|60D12-58F2|96C1-3Rt.
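Because elements are separated by semicolons, junctions by vertical bars, and segment ends by a dash, a linear new order can be split into its parts programmatically. The following minimal Python sketch does this under a stated assumption: ambiguous (parenthesized) segments and ring notation are out of scope and rejected. The helper names are hypothetical.

```python
def parse_new_order(new_order: str) -> list:
    """Split a linear new-order string into elements of (start, end) segments."""
    text = new_order.strip().rstrip(".")
    if "(" in text or text.startswith("|"):
        raise ValueError("ambiguities and ring notation are not handled")
    elements = []
    for element in text.split(";"):
        segments = []
        for segment in element.strip().split("|"):
            start, _, end = segment.strip().partition("-")
            # A segment known only to one band has no dash.
            segments.append((start, end or start))
        elements.append(segments)
    return elements


def junctions(element: list) -> list:
    """Pairs of bands joined at each point of breakage and reunion."""
    return [(a[1], b[0]) for a, b in zip(element, element[1:])]


order = parse_new_order("2Lt-58E3|60E2-2Rt; 3Lt-96B5|60D12-58F2|96C1-3Rt.")
print(junctions(order[1]))  # [('96B5', '60D12'), ('58F2', '96C1')]
```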
Ambiguities.
Were the order of the inserted segment 60D12-58F2 not known, the segment would have been enclosed within parentheses, i.e., 3Lt-96B5|(58F2-60D12)|96C1-3Rt.
Hierarchies of ambiguities are represented by parentheses within parentheses.
Complex rearrangements.
Breaks rejoin cyclically to produce chromosome aberrations (e.g., A with B and B with A) and multiple breaks may rejoin in one or more cycles. Thus four breaks may interact to form one four-break rearrangement or two two-break rearrangements. A complex rearrangement consisting of two or more simple cyclic rearrangements is indicated in the descriptive symbol; e.g.
T(1;2)OR72 (= T(1;2)19E;29F + In(2LR)24F;54B) or T(1;2)C314 (= T(1;2)5D;40-41 + T(1;2)9D;51D + T(1;2)20;56F)
New symbols are required if any of these components (or any new combination of these components) were to be derived separately.
Order of description.
Information on new order is written as follows: each chromosomal element starts at the free end with the lower value and the elements are listed in ascending order, Y falling between 20 and 21.
Rings.
Rings are differentiated from rod-shaped chromosomes by vertical bars at the beginning and end of the element; the circle is broken for linear designation at the breakpoint with the lowest numerical value; e.g., |1A4-20 1cen 20F-20A1| for R(1)2.
New orders of Y derivatives.
The constitution of a Y fragment may be designated by listing its genetic elements in order with any ambiguities in order enclosed within parentheses, e.g., KL(bw+--ba+) Ycen bb+ KS. When there is a hierarchy of ambiguities in order, a hierarchy of parentheses is used, as in ((ci+--spa+)KL) Ycen bb+ KS.
Naming genotypes
Gene separators.
In designations of genotypes with several mutant genes, allele symbols of genes on the same chromosome are separated by spaces (e.g., y1 w1 f1 B1).
Homologue separators.
Allele symbols of genes on homologous chromosomes are separated by a slash bar (e.g., y1 w1 f1/B1). The X and Y chromosomes are considered to be homologues for this purpose and the different genotypes of males and females are not usually made explicit. For example, Dp(1;Ybb-)BS/y1 car1 describes a stock in which females are homozygous for the y1 car1 X chromosome, and males are hemizygous for y1 car1 and the BS-marked Y chromosome. If desired, multiple genotypes in a stock can be fully described, using an ampersand (&) to separate the genotypes, e.g., y1 car1 & Dp(1;Ybb-)BS/y1 car1.
By convention, allele symbols are listed only once for a genotype that is homozygous for all of the mutations on a particular chromosome, i.e., y1 w1 f1 implies y1 w1 f1/y1 w1 f1. If, however, any one of these mutations is heterozygous, then the genotype of each homologue is given, i.e., y1 w1 f1/y1 f1.
By convention, genotypes are written with the maternally contributed chromosomes preceding the paternally contributed ones. For example, in the cross of cn1/cn1 females to cn+/cn+ males, the progeny genotype would be written cn1/cn+; from the reciprocal cross it would be written cn+/cn1.
Nonhomologue separators.
Allele symbols of genes on nonhomologous chromosomes are separated by semicolons and spaces (e.g., bw1; es; ey1).
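The three separator rules (spaces within a chromosome, a slash between homologues, and a semicolon plus space between nonhomologues), together with the homozygous shorthand and maternal-first order, compose naturally. The following minimal Python sketch shows this. It assumes each chromosome pair is given as (maternal, paternal) lists of allele symbols already in map order. The helper names are hypothetical.

```python
def chromosome_genotype(maternal, paternal) -> str:
    """Format one homologous pair; list alleles once if both homologues agree."""
    m, p = " ".join(maternal), " ".join(paternal)
    # Maternally contributed homologue is written first.
    return m if m == p else f"{m}/{p}"


def genotype(pairs) -> str:
    """Join nonhomologous chromosome pairs with a semicolon and a space."""
    return "; ".join(chromosome_genotype(m, p) for m, p in pairs)


print(genotype([(["y1", "w1", "f1"], ["y1", "f1"]), (["cn1"], ["cn+"])]))
# y1 w1 f1/y1 f1; cn1/cn+
```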
Chromosome descriptions.
8.4.1. In describing a chromosome, inclusion of several types of information is often desirable, e.g., arrangement and mutant allele content. Such categories are separated by a comma followed by a space, e.g., In(1)FM7, y31d wa vOf B1, which designates an X chromosome carrying the FM7 inversion, the recessive alleles yellow-31d, white-apricot and vermilion-of-Offermann, and the dominant allele Bar-1. Alleles are listed in the order of the standard genetic map irrespective of their order on the chromosome in question.
8.4.2. Description of the gene content of autosynaptic elements requires particular rules. Mutations mapping distal to the breakpoint are indicated after a comma that follows the name of the element itself; mutations mapping proximal to the breakpoint (i.e., within the heterosynaptic region and necessarily hemizygous) are indicated after a second comma, e.g., LS(2)m, b1, cn1 would be homozygous for b1 but hemizygous for cn1. If the status of a particular mutation is unknown, then its symbol is enclosed in parentheses.
8.4.3. Mutant alleles on the different chromosomal components of translocations or interchromosomal transpositions are separated by a colon. The translocated chromosomes are separated from their homologues by a slash. For example: T(2;3)CyO-TM2, Cy1 l(2)DTS5131: Ubx130/S1.
In contrast with past practice the + character is not to be used to indicate the presence of more than one separable aberration on the same chromosome, i.e., In(2L)Cy In(2R)Cy is used, rather than either In(2L+2R)Cy or In(2L)Cy + In(2R)Cy.
Cross descriptions.
It is a convention that when genetic crosses are described the female genotype is written to the left of the times symbol (x), and the male genotype to the right.
Uncertainty.
Uncertainty about specific alleles, genes, and aberrations is indicated in genotypes with an asterisk: e.g., w* for a mutant allele of w when the specific allele is unknown, l(2)* for a lethal allele on the second chromosome when the gene is unknown, and C(1)* for a compound X chromosome when the nature of the attachment is unknown.
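The sketch below shows one way to find the uncertain components of a genotype string. It splits on the separators described in this section (space, semicolon, slash, colon, comma), but only outside brackets, so symbols such as T(2;3)CyO-TM2 stay whole. It then reports the components that contain an asterisk. The tokenizer and the example genotype are assumptions made for illustration, not a FlyBase parser.

```python
OPENERS, CLOSERS = "({[<", ")}]>"
SEPARATORS = " ;/:,"


def components(genotype: str) -> list[str]:
    """Split a genotype on separators that occur outside any brackets,
    so that T(2;3)CyO-TM2 or <P;I> are kept as single components."""
    parts, current, depth = [], [], 0
    for ch in genotype:
        if ch in OPENERS:
            depth += 1
        elif ch in CLOSERS:
            depth = max(depth - 1, 0)
        if ch in SEPARATORS and depth == 0:
            if current:  # skip empty tokens from runs like "; "
                parts.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        parts.append("".join(current))
    return parts


def uncertain_components(genotype: str) -> list[str]:
    """Components flagged as uncertain with an asterisk."""
    return [c for c in components(genotype) if "*" in c]


print(uncertain_components("w*; l(2)*/CyO"))  # ['w*', 'l(2)*']
```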
Nicknames.
In relatively few cases, FlyBase supports an alternative symbol for a genotype component, called a nickname. Nicknames are supported when a simplified symbol is already in use by Drosophila workers and is more widely understood than the rigorous valid symbol. For example, Dp(2;2)Cam11 is a valid nickname for In(2LR)TE35B-226LTE35B-4R and w67c23 is a valid nickname for Df(1)w67c23. Implementation of nicknames within FlyBase is still in progress, and the distinction between nicknames and synonyms may not be evident in FlyBase reports.
Cytotype
It may be necessary to indicate the cytotype of a stock with respect to one or more systems of hybrid dysgenesis. We suggest that this be done by appending the cytotype to the end of the stock description as a single-letter code enclosed within <>. This symbol should be separated from the last component of the genotype by a comma; e.g., y[1] w[1] f[1], <P> would indicate a P-cytotype stock carrying these three markers. If more than one cytotype needs to be designated, the codes should be separated by a semicolon, e.g., <P;I>.
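A minimal sketch of the suggested cytotype suffix follows. The function name is hypothetical. It enforces the single-letter code, the preceding comma, and semicolons between multiple codes.

```python
def with_cytotype(genotype: str, cytotypes: list[str]) -> str:
    """Append a cytotype designation such as ", <P>" or ", <P;I>"."""
    if not cytotypes:
        return genotype
    for code in cytotypes:
        if len(code) != 1 or not code.isalpha():
            raise ValueError(f"cytotype code must be a single letter: {code!r}")
    return f"{genotype}, <{';'.join(cytotypes)}>"


print(with_cytotype("y[1] w[1] f[1]", ["P"]))       # y[1] w[1] f[1], <P>
print(with_cytotype("y[1] w[1] f[1]", ["P", "I"]))  # y[1] w[1] f[1], <P;I>
```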
Representation of gene, allele and aberration names and symbols in text
Italic.
Gene, allele, aberration and transposon/transgene-construct names and symbols are italicized in printed text.
Non-italic.
When a full gene name or gene symbol is used to indicate phenotype rather than genotype, the name or symbol is printed in roman (non-italic) type; e.g., white in italics indicates a genotype, whereas white in roman type indicates a phenotype.
Superscripts and subscripts.
In ASCII text, the characters [ and ] are used to enclose superscripted characters, and [[ and ]] are used to enclose subscripts.
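The sketch below converts the ASCII convention into HTML <sup> and <sub> tags, the markup used for superscripts on the FlyBase wiki. It is a regular-expression sketch, not a FlyBase tool. It assumes brackets are not nested and that a superscript never contains ].

```python
import re

# [[...]] must be handled before [...]; otherwise "[[2]]" would be read
# as a superscript containing "[2".
SUBSCRIPT = re.compile(r"\[\[(.+?)\]\]")
SUPERSCRIPT = re.compile(r"\[(.+?)\]")


def ascii_to_html(text: str) -> str:
    """Convert ASCII superscript/subscript notation to HTML tags."""
    text = SUBSCRIPT.sub(r"<sub>\1</sub>", text)
    return SUPERSCRIPT.sub(r"<sup>\1</sup>", text)


print(ascii_to_html("Ecol\\lacZ[HZ50a]"))  # Ecol\lacZ<sup>HZ50a</sup>
```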
Cytogenetic terms.
Cytogenetic designations are not italicized except when part of an aberration symbol.
Reserved characters.
The following characters are reserved for special use in gene, allele, and aberration names and symbols or in genotypes:
Symbol | Use |
---|---|
\ | reserved for use in symbols of genes from species other than D. melanogaster |
/ | reserved for use as a homologue separator in stock genotypes |
{} | reserved for use in transposon and transgene construct symbols |
<> | reserved for use in transgene construct names and for cytotype designation in stocks |
[] | reserved for indicating superscripts in ASCII text |
[[]] | reserved for indicating subscripts in ASCII text |
() | reserved for use in compound gene names and symbols (e.g., l(1)) and for aberration symbols, and for the indication of ambiguous genotypes |
; | reserved as a separator of chromosome (chromosome arm) numbers in aberration names and symbols, and to separate markers or aberrations on non-homologous chromosomes in stock genotypes |
: | reserved for use in symbols of defined classes, i.e., transgene constructs, genes encoding special RNAs (tRNAs, snRNAs), fusion genes and mitochondrial genes, and, in stock genotypes, to indicate the association between markers on reciprocal components of translocations, or arms of compound chromosomes. |
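Some reserved characters have legitimate uses inside symbols, such as the parentheses in l(1) or the backslash in foreign gene symbols. For that reason, the hypothetical check below reports which reserved characters a proposed symbol contains, together with their reserved use, and leaves the judgement to a curator. The short descriptions paraphrase the table above.

```python
# Reserved characters from the table above, with a short note on each use.
RESERVED = {
    "\\": "gene symbols from species other than D. melanogaster",
    "/": "homologue separator in stock genotypes",
    "{": "transposon and transgene construct symbols",
    "}": "transposon and transgene construct symbols",
    "<": "transgene construct names; cytotype designation",
    ">": "transgene construct names; cytotype designation",
    "[": "superscripts ([ ]) or subscripts ([[ ]]) in ASCII text",
    "]": "superscripts ([ ]) or subscripts ([[ ]]) in ASCII text",
    "(": "compound gene symbols, aberration symbols, ambiguous genotypes",
    ")": "compound gene symbols, aberration symbols, ambiguous genotypes",
    ";": "chromosome separator in aberrations; nonhomologue separator",
    ":": "defined classes; translocation components in genotypes",
}


def reserved_in(symbol: str) -> dict[str, str]:
    """Map each distinct reserved character in `symbol`, in order of first
    appearance, to its reserved use, for a curator to review."""
    return {ch: RESERVED[ch] for ch in dict.fromkeys(symbol) if ch in RESERVED}


print(list(reserved_in("l(1)G0003")))  # ['(', ')']
```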
Representation of gene products in text
Proteins.
Protein product names and symbols should not be italicized in printed text. Where feasible, proteins that are named for the gene should be further distinguished by capitalizing the initial letter of the gene symbol or name. For example, the protein product(s) of the hh (hedgehog) gene could be correctly denoted as Hh or Hedgehog; the protein product(s) of the RpL38 (Ribosomal protein L38) gene could be correctly denoted as RpL38 or Ribosomal protein L38; and the protein product(s) of the AGO1 (Argonaute-1) gene could be correctly denoted as AGO1 or Argonaute-1.
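The capitalization rule fits in a one-line sketch (the function name is hypothetical). Only the initial letter is upper-cased, so symbols that already start with a capital, such as RpL38 or AGO1, pass through unchanged. Italic versus roman type is a typesetting matter and is not modelled here.

```python
def protein_symbol(gene_symbol_or_name: str) -> str:
    """Capitalize the initial letter; leave the rest unchanged."""
    return gene_symbol_or_name[:1].upper() + gene_symbol_or_name[1:]


for g in ("hh", "hedgehog", "RpL38", "AGO1"):
    print(g, "->", protein_symbol(g))
# hh -> Hh
# hedgehog -> Hedgehog
# RpL38 -> RpL38
# AGO1 -> AGO1
```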
There are no fixed rules for denotation of proteins not named for the gene.
RNAs.
There is no convention for symbolically designating generic RNA products of genes in text.
Updates.
Substantive changes made to this document since its presentation at the Atlanta Drosophila meeting in April, 1995, are noted here.
Version 2.01, April 25, 1995: The rules for naming fusion genes (para 3.2.3) have been changed.
Version 2.02, May 13, 1995: A new paragraph (7.7) on the naming of ambiguous genotypes has been added.
Version 2.06, November 22, 1995: Corrections have been made to the examples of names of transposons to conform with current FlyBase practice. The list of 'honorary genes' has been updated.
Version 3.0, March 18, 1996: The symbol for complex aberrations has been changed from complex to Ab. The placement of the <> symbols indicating orientation of an FRT has been changed to conform with current usage. The colon is introduced as a separator of markers on reciprocal components of translocations and arms of compound chromosomes as a way of clarifying the relationships and expected behaviors of these elements in stocks. The list of 'honorary genes' has been updated. A table of contents has been added. Assorted small changes have been made in the document.
Version 3.01, March 29, 1996: The rules for naming genes identified by sequencing projects have been changed, and new FlyBase mirror sites have been added.
Version 3.02, August 7, 1996, corrects the explanation of < and > used to indicate FRT orientation. The list of 'honorary genes' has been removed.
Version 3.03, August 21, 1996, clarifies what constitutes a transposon symbol.
Version 4.0, February 19, 1997, includes naming of in vitro mutagenesis constructs (Section 2.2.) and balancers (Section 5.14.).
Version 4.1, June 3, 1997, includes modification of rules for naming multiple transposon insertions (Section 3.1.2.), a clarification of rules for representing proteins in text (Section 11.1.), and a proposal for naming genes that encode ribosomal proteins (Appendix A.).
Version 4.2, March 8, 1998, includes modified rules for naming genes identified only by genomic sequencing projects (Section 1.1.3).
Version 4.3, May 21, 1998, includes minor changes to the Introduction and format of the document.
Version 4.4, February 9, 1999, includes a change to Section 5.13. supporting the identification of an unknown breakpoint as heterochromatic.
Version 5.0, July 6, 1999, all references to 'honorary genes' have been removed (this category is no longer used by FlyBase) and a description of nicknames has been added (Section 7.7.).
Version 5.01, August 23, 1999, assorted minor corrections were made.
Version 6.0, November 23, 1999, updates Section 9.1. to include FlyBase's new policy on the use of sequence accessions to determine precedence of gene names and symbols. Many links have been added and assorted corrections made.
Version 6.1, December 27, 1999, updates Section 1.1.3. to include genes identified by Celera.
Version 6.2, April 5, 2000, updates Section 1.1.3. with the derivation of anonymous gene symbol prefixes.
Version 6.3, May 12, 2000, updates Section 2.2. to clarify the current rules for allele symbols.
Version 7, August 28, 2000, updates Section 11.2., eliminating the convention (which was never adopted by Drosophilists) that RNA products of genes are designated in text by the gene symbol in all italic capital letters.
Version 7.1, April 18, 2001, updates Section 1.1.3. to make explicit the need for authors to provide the CG gene symbol when renaming a CG-named gene.
Version 7.2, April 24, 2001, updates Section 1.1.3. to clarify the on-going assignment of CG names.
Version 8.0, August 1, 2001, updates Sections 1.1. to clarify the one-gene:one-valid-symbol rule, 1.1.1. to clarify the case of certain gene symbols, and 5.15. to make explicit the various ways in which balancers can be described.
Version 8.1, August 28, 2001, updates Section 5.13. to change 'h?' to 'h' as the symbol for undefined heterochromatic breakpoints in complex aberration symbols.
Version 8.2, October 25, 2001, updates Section 3 to include foreign gene prefixes in example genotypes.
Version 8.3, November 26, 2001, rewords Section 7 to clarify that genotypes specify alleles of genes.
Version 8.4, March 22, 2002, updates cytology in examples in Sections 5.5.2.2. and 6.2.
Version 9, August 16, 2004, updates sequence annotation nomenclature in Section 1.1.3. and emphasizes, in 1.7.2. and 10.5., the prohibition against use of the character / in gene and other symbols. Section 11 is slightly modified to clarify that these options apply to generic proteins and transcripts from a given gene.
Version 10, November 16, 2006, updates the entry on proteins of mitochondrial ribosomes in the Appendix.
Version 10.1, August 23, 2007, clarifies the column heading of the common prefixes table in Section 1.3.
Version 10.2, February 6, 2008, creates Section 1.1.4 and amends Section 1.1.3 to specify the annotation prefixes used for genes identified in Drosophila genomic sequencing projects.
Version 11, October 17, 2008, updates the guidelines on gene symbols and names. The Preamble replaced the "Introduction", and the sections Policy for establishing FlyBase-approved gene symbols and names and Gene symbols and names were created, while the sections "Gene names and symbols" and "Valid Symbols & Synonyms", and Appendix A "Naming of genes encoding ribosomal proteins" were removed. The sections Allele names and symbols to Cytotype were renumbered accordingly.