

The nomenclature guidelines below explain how FlyBase assigns canonical symbols and names to its genetic objects (genes, alleles, transposons, insertions, aberrations and balancers). We encourage the community and journals to adhere to FlyBase-approved symbols/names for consistency in published datasets. While these guidelines cover most circumstances, there may be exceptional cases not clearly covered here. Please contact FlyBase to discuss such cases or any other aspect of the nomenclature.

1. Policy for establishing FlyBase-approved gene symbols and names

1.1. Justification for unique approved symbols/names.

It is of great value to the research community that there is a single officially sanctioned (approved) symbol and name for each gene in FlyBase. Use of unique symbols/names, together with corresponding unique identifiers (e.g., FBgn numbers), minimizes ambiguity in referring to these genes in the scientific literature.

1.2. Assigning approved symbols/names.

It is inevitable that multiple synonyms for a gene arise in the literature, typically as a result of publications on the same gene by multiple laboratories or the realization that genes previously thought to be independent are actually part of the same genetic unit. In such cases, FlyBase adheres to the following rules for establishing or changing the approved gene symbol/name.

1.2.1. Chronological precedence.

Approved gene symbols/names are normally established by the earliest date of publication of the proposed symbol/name in a peer-reviewed primary research paper. (No other form of publication is relevant to chronological precedence.)
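As an illustration only, the precedence rule can be sketched in Python. The `Proposal` record, its field names and the function name are assumptions made for this example, not part of any FlyBase tool; the sketch simply discards proposals that did not appear in a peer-reviewed primary research paper and returns the earliest of the rest.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class Proposal:
    """Hypothetical record of one proposed symbol for a gene."""
    symbol: str
    published: date
    peer_reviewed_primary: bool  # True only for peer-reviewed primary research papers


def earliest_valid_proposal(proposals: Iterable[Proposal]) -> Optional[Proposal]:
    """Return the earliest proposal that counts for chronological precedence."""
    # Other forms of publication are not relevant to precedence, so drop them first.
    eligible = [p for p in proposals if p.peer_reviewed_primary]
    if not eligible:
        return None
    return min(eligible, key=lambda p: p.published)
```

Community usage (section 1.2.3) and the validity criteria (section 1.2.5) can still override the result, so a function like this would only supply a default choice.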

1.2.2. Selection of lower or upper case of initial letter.

Gene symbols/names begin with a lowercase letter if the gene is FIRST named for the phenotype of a recessive mutant allele, and begin with an uppercase letter if they are FIRST named for the phenotype of a dominant mutant allele. Gene symbols/names also begin with an uppercase letter if they are FIRST named for an aspect of the wild-type molecular function or activity of the gene product, which includes genes named after an ortholog or paralog.
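The initial-letter rule can be expressed as a small decision function. The following is a minimal sketch; the `NamingBasis` enumeration and the function name are hypothetical and exist only for this example.

```python
from enum import Enum


class NamingBasis(Enum):
    """What the gene was FIRST named for (hypothetical categories for this sketch)."""
    RECESSIVE_PHENOTYPE = "recessive mutant phenotype"
    DOMINANT_PHENOTYPE = "dominant mutant phenotype"
    WILD_TYPE_FUNCTION = "wild-type function, ortholog or paralog"


def apply_initial_case(symbol: str, basis: NamingBasis) -> str:
    """Set the case of the first character according to the naming basis."""
    if not symbol:
        raise ValueError("symbol must not be empty")
    if basis is NamingBasis.RECESSIVE_PHENOTYPE:
        first = symbol[0].lower()
    else:
        # Dominant phenotypes and wild-type function both give an uppercase initial.
        first = symbol[0].upper()
    return first + symbol[1:]
```

For example, `apply_initial_case("Abc", NamingBasis.RECESSIVE_PHENOTYPE)` gives `"abc"`. The sketch leaves the remaining letters untouched and does not model the community-usage override described in section 1.2.3.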

1.2.3. Community usage.

The chronological precedence and capitalization rules can be overridden in favor of an alternative gene symbol/name that is clearly favored by the research community. This can be on a gene-by-gene basis or to rationalize the nomenclature for an entire gene family or other functional grouping.

1.2.4. Placeholders.

Certain classes of generic gene symbols/names are placeholders (see sections 2.3.1 and 2.4) and are subject to replacement by a more meaningful symbol/name according to the rules of 1.2.1, 1.2.2 and 1.2.5. However, generic symbols/names based on a phenotype shall be retained by FlyBase if they are re-used by the first peer-reviewed research paper to characterize that gene and/or are clearly favored by the research community.

1.2.5. Validity criteria.

Authors' preferred symbols/names will be used as the FlyBase-approved gene symbols/names whenever possible. However, the validity criteria set out in section 2.2 must be adhered to, and FlyBase will modify authors' preferred gene symbols/names where necessary.

2. Gene symbols and names

2.1. Symbols versus names.

The gene symbol is typically an abbreviation of the full gene name and as such, should ordinarily consist of a minimal number of characters. The gene symbol and name should use comparable capitalization and character sets.

2.2. Requirements of FlyBase-approved Drosophila gene symbols and names.

2.2.1. Uniqueness.

Each approved gene symbol and name must be unique amongst all FlyBase-approved symbols and names.

2.2.2. Relevance.

The name should allude to the gene's function, mutant phenotype or other relevant characteristic.

2.2.3. Restricted and non-permissible characters.

There are several characters which have specific meanings in a genotype string. Use of these characters in a gene symbol would complicate interpretation of genotypes. Therefore, approved gene symbols shall adhere to the following rules:

Approved symbols shall not contain the following characters: /, \, {, }, <, >, [, ], ;, *.

Approved symbols shall not contain spaces. Where a separator is needed to keep characters from losing meaning by running together, a hyphen "-" should be used.

Approved symbols shall not contain letters from any character sets other than English or Greek.

Colons ":" shall only be used in the approved symbols of certain classes of non-protein-coding genes, genes encoded in the mitochondrial genome, and synthetic fusion genes, as described in section 2.6.

Round brackets "( )" shall only be used in certain classes of approved gene symbols as separators to designate a chromosome or an allele whose phenotype is modified by the gene in question.
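The character rules above lend themselves to an automated check. The sketch below is one possible reading of them, not FlyBase's own validation code: the function names and the `allow_colon` / `allow_brackets` flags are assumptions, and "English or Greek" is approximated as ASCII letters plus characters whose Unicode name begins with GREEK.

```python
import unicodedata

# Characters that are never allowed in an approved symbol.
FORBIDDEN = set("/\\{}<>[];*")


def _is_english_or_greek(ch: str) -> bool:
    """Approximation: ASCII letters, or letters in the Unicode Greek blocks."""
    if ch.isascii():
        return True
    return unicodedata.name(ch, "").startswith("GREEK")


def check_symbol(symbol: str, allow_colon: bool = False,
                 allow_brackets: bool = False) -> list[str]:
    """Return a list of rule violations; an empty list means no problem was found."""
    problems = []
    for ch in symbol:
        if ch in FORBIDDEN:
            problems.append(f"forbidden character {ch!r}")
        elif ch.isspace():
            problems.append("contains whitespace; use a hyphen as separator")
        elif ch.isalpha() and not _is_english_or_greek(ch):
            problems.append(f"letter {ch!r} is neither English nor Greek")
        elif ch == ":" and not allow_colon:
            problems.append("colon is only allowed for certain gene classes")
        elif ch in "()" and not allow_brackets:
            problems.append("round brackets are only allowed for certain gene classes")
    return problems
```

A caller would set `allow_colon` or `allow_brackets` only for the gene classes in which those characters are permitted; deciding which class a gene belongs to is outside the scope of this sketch.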

2.2.4. Capitalization.

The rules governing the capitalization of the initial letter of gene symbols/names are described in sections 1.2.2 and 1.2.3. Subsequent letters are normally lowercase.

2.2.5. Superscripts and subscripts.

Gene symbols and names should not normally contain superscripts or subscripts. The only exception is when an allele name is an integral part of a gene symbol or name, e.g., su(wa), in which the allele designation "a" is written as a superscript.

2.2.6. Italicization.

All gene symbols and names should be italicized.

2.2.7. Genus/species prefixes.

Genes from all species except D. melanogaster automatically have a unique species abbreviation prefix added to the front of their FlyBase-approved symbol (see section 2.5.1). Any different or additional indication of a gene's origin (e.g., D, Dro or Dm) is redundant and/or ambiguous and will not form part of the FlyBase-approved gene symbol/name.


2.2.8. Inoffensiveness.

Symbols and names must be inoffensive.

2.3. Common prefixes.

2.3.1. Prefixes based on phenotype, EST or STS.

Several generic gene symbol/name prefixes have been used for genes sharing a common mutant phenotype or originally identified by virtue of an EST or STS. A non-exhaustive list is shown below: