FlyBase:RefMan G.
G.1. Nontraditional alleles
In addition to 'alleles' in the traditional sense, FlyBase now names and curates further classes of allele, so that phenotypic or expression pattern data can be captured for in vitro construct alleles and for alleles of reporter (e.g., Ecol\lacZ), effector (e.g., Scer\FLP) or toxin (e.g., Rcom\DT-A) genes. Because these alleles have historically been named by FlyBase rather than by researchers, their presentation in FlyBase requires some explanation:
G.1.1. Alleles of reporter genes
Alleles of reporter genes currently fall into two main classes: those resulting from enhancer trap experiments, and those resulting from analysis of promoters (or other regulatory regions), in which a fragment is used to drive expression of a reporter gene. Ecol\lacZ is used for illustration below.
Enhancer trap results:
- The enhancer trap insertion creates a mutant allele of a gene, and the reporter is expressed in a pattern consistent with insertion in that gene. The insertion is described with the format P{A92}h[L43a], and the Ecol\lacZ allele symbol with the format Ecol\lacZ[h-L43a].
- The reporter gene reflects the expression of a gene without causing a mutant allele of that gene. The insertion is described with the format P{PZ}P2023-44, where P2023-44 is the insertion identifier, and the Ecol\lacZ allele symbol with the format Ecol\lacZ[hh-P2023-44].
- The reporter gene reflects the expression of an undescribed gene or enhancer. The insertion is described with the format P{lacW}1.28, and the Ecol\lacZ allele symbol with the format Ecol\lacZ[1.28].
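The three enhancer trap cases differ only in whether a gene symbol precedes the insertion identifier in the reporter allele's superscript. The following is a minimal Python sketch of that rule, not a FlyBase tool; the function name `enhancer_trap_reporter_allele` is hypothetical, and writing the superscript in square brackets (FlyBase's plain-text convention, as in w[1118]) is an assumption of the sketch.

```python
def enhancer_trap_reporter_allele(reporter, insertion_id, gene=None):
    """Hypothetical helper: build a reporter allele symbol for an enhancer trap.

    reporter:     reporter gene symbol, e.g. "Ecol\\lacZ"
    insertion_id: insertion identifier, e.g. "L43a" or "1.28"
    gene:         symbol of the gene whose expression the reporter reflects,
                  or None for an undescribed gene/enhancer
    The superscript is written in square brackets.
    """
    superscript = f"{gene}-{insertion_id}" if gene else insertion_id
    return f"{reporter}[{superscript}]"


# Known gene (with or without a mutant allele): gene symbol, hyphen, identifier.
print(enhancer_trap_reporter_allele("Ecol\\lacZ", "L43a", gene="h"))       # Ecol\lacZ[h-L43a]
print(enhancer_trap_reporter_allele("Ecol\\lacZ", "P2023-44", gene="hh"))  # Ecol\lacZ[hh-P2023-44]
# Undescribed gene/enhancer: identifier only.
print(enhancer_trap_reporter_allele("Ecol\\lacZ", "1.28"))                 # Ecol\lacZ[1.28]
```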
Promoter analysis results:
- Generally, a fragment of a gene's promoter, intron or 3' region is fused to the reporter gene. In this case the allele superscript has the form 'gene symbol.fragment descriptor', e.g., Ecol\lacZ[eve.prox54]. The fragment descriptor reflects that used in the publication, even if it is long and cumbersome (this may not hold for alleles curated early in the FlyBase project).
- Where a publication simply describes a reporter gene as being driven by, e.g., an arm promoter, the superscript of the Ecol\lacZ allele is 'arm.PI', where I is the first letter of the surname of the paper's first author, e.g., Ecol\lacZ[arm.PV] for 'Ecol\lacZ arm promoter construct of Vincent'.
- For logistical reasons, some fusions involving reporter genes such as Ecol\lacZ, though technically protein fusions, are simply treated as alleles of the reporter gene, like promoter fusions. The symbol of the other gene(s) contributing to the fusion is included in the superscript, e.g., Ecol\lacZ[P\T.A92]. In these special cases the symbol makes no distinction between promoter fusions and protein fusions.
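The first two promoter analysis rules can be sketched as a simple precedence: use the author's fragment descriptor when there is one, otherwise fall back to 'gene.PI'. This is an illustrative Python sketch only; `promoter_reporter_allele` is a hypothetical name, and the square-bracket superscript is an assumed plain-text rendering.

```python
def promoter_reporter_allele(reporter, gene, fragment=None, first_author=None):
    """Hypothetical helper for promoter-fusion reporter alleles.

    Uses 'gene.fragment descriptor' when the publication names a fragment,
    otherwise 'gene.PI', where I is the first author's surname initial.
    """
    if fragment:
        designator = f"{gene}.{fragment}"
    elif first_author:
        designator = f"{gene}.P{first_author[0].upper()}"
    else:
        raise ValueError("need a fragment descriptor or a first-author surname")
    return f"{reporter}[{designator}]"


print(promoter_reporter_allele("Ecol\\lacZ", "eve", fragment="prox54"))       # Ecol\lacZ[eve.prox54]
print(promoter_reporter_allele("Ecol\\lacZ", "arm", first_author="Vincent"))  # Ecol\lacZ[arm.PV]
```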
G.1.2. Alleles of ectopically expressed Drosophila gene products
Gene products may be ectopically expressed either because the gene is juxtaposed with different regulatory sequences in the genome (after insertion into a non-wild-type location by chromosome rearrangement or P-element transposition) or because in vitro construction has created a different constellation of regulatory sequences than in wild type.
By analogy with Ecol\lacZ alleles from enhancer traps, P-element-borne copies of genes such as w or ve that show a qualitatively distinct, position-dependent mutant phenotype are curated as new alleles of that gene, e.g., ve[Stg], caused by a particular insertion of P{HS-rho}, P{HS-rho}Stg.
The 'in vitro construct' ectopic expression alleles currently fall into two main classes: one-component and two-component systems.
One-component systems:
- Gene A is expressed from a promoter of gene B, an allele typically generated by in vitro construction. In such cases the allele symbol has the format 'gene-A[gene-B.PI]', e.g., phyl[sev.PC], or 'gene-A[gene-B.fragment descriptor]' where the author gives a promoter fragment descriptor, e.g., phyl[ninaE.GMR].
- An occasional exception is made for promoter fusions that are widely used to provide essentially wild-type gene function; these alleles carry the minigene '+m construct' designation (see below) followed by, e.g., a heat shock designation, as in w[+mW.hs].
- Authors commonly report a construct in which, e.g., ftz is expressed under a 'heat shock' or Hsp70 promoter without giving further details of the promoter. For these cases the allele superscript hs.PI is used, e.g., Antp[hs.PZ] for 'Antp heat shock construct of Zeng'. The 'hs' designation should be reserved for cases where the heat-inducible promoter, not just the minimal promoter fragment, is used.
- Where the allele is both altered in its coding region and expressed from an ectopic promoter, the sequence 'alteration.promoter' is used in the allele designation, e.g., tor[13D.hs.sev] denotes the tor[13D] coding sequence expressed from an (undefined) heat shock promoter with a sev enhancer. Tags are an exception to this rule: they appear as the last component of the allele symbol (see below).
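The one-component rules above compose a superscript from an optional coding alteration followed by a promoter part. A minimal Python sketch, assuming square brackets for the superscript; `ectopic_allele` and its parameters are hypothetical, and the initial "C" for phyl[sev.PC] is used only because it appears in the example symbol.

```python
def ectopic_allele(gene, driver, author_initial=None, fragment=None, alteration=None):
    """Hypothetical helper for one-component ectopic-expression alleles.

    gene:           the ectopically expressed gene (gene A)
    driver:         symbol of the promoter source (gene B), "hs" for an
                    unspecified heat shock/Hsp70 promoter, or a full
                    promoter string such as "hs.sev"
    author_initial: first author's surname initial, used for '.PI'
    fragment:       promoter fragment descriptor given by the author
    alteration:     coding-region alteration, placed before the promoter
    """
    if fragment:
        promoter = f"{driver}.{fragment}"         # 'gene-B.fragment descriptor'
    elif author_initial:
        promoter = f"{driver}.P{author_initial}"  # 'gene-B.PI' or 'hs.PI'
    else:
        promoter = driver                         # promoter string used as given
    parts = [alteration, promoter] if alteration else [promoter]
    designator = ".".join(parts)                  # 'alteration.promoter'
    return f"{gene}[{designator}]"


print(ectopic_allele("phyl", "sev", author_initial="C"))    # phyl[sev.PC]
print(ectopic_allele("phyl", "ninaE", fragment="GMR"))      # phyl[ninaE.GMR]
print(ectopic_allele("Antp", "hs", author_initial="Z"))     # Antp[hs.PZ]
print(ectopic_allele("tor", "hs.sev", alteration="13D"))    # tor[13D.hs.sev]
```

Tags, which the text places last, are deliberately not handled here.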
Two-component systems:
- GAL4-UAS: The allele symbol of a gene whose expression depends on Scer\GAL4 includes 'Scer\UAS' and an identifier. The identifier should reflect the construct as named by the author, e.g., l(1)sc[DeltaB.Scer\UAS]. In the absence of any other identifier, '.cIa' is used, where 'c' stands for construct, 'I' for the first author's last name initial and 'a' for the first in the series (subsequent constructs are b, c, etc.), e.g., ase[Scer\UAS.cBa] for 'Scer\UAS construct a of Brand'.
- FLP-FRT: Alleles of Scer\FLP are named as outlined above for reporter genes, and the allele symbols of genes whose expression depends on Scer\FLP include 'Scer\FRT'.
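The '.cIa' fallback for unnamed UAS constructs is a simple series counter per author. The Python sketch below illustrates it; `uas_allele_symbols` is a hypothetical name, brackets stand in for the superscript, and the cap of 26 constructs is a limitation of this sketch, not a FlyBase rule.

```python
from string import ascii_lowercase


def uas_allele_symbols(gene, first_author, n_constructs):
    """Hypothetical helper: default 'Scer UAS .cIa' symbols for n unnamed
    constructs of one first author (series letters a, b, c, ...)."""
    if not 1 <= n_constructs <= len(ascii_lowercase):
        raise ValueError("this sketch only handles 1 to 26 constructs")
    initial = first_author[0].upper()
    return [f"{gene}[Scer\\UAS.c{initial}{letter}]"
            for letter in ascii_lowercase[:n_constructs]]


print(uas_allele_symbols("ase", "Brand", 2))
```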
G.1.3. Alleles of ectopically expressed non-Drosophila effector products
A note on ribozymes: FlyBase has a foreign ribozyme gene, LTSV\RBZ, whose alleles capture the different ribozyme variants. For example, a heat-inducible ribozyme targeted against ftz is named LTSV\RBZ[hs.ftz] (syntax 'promoter.target gene').
'+m' minigenes
The minigene allele designation is used in its narrow sense, i.e., where the only difference between the allele and the wild type is the removal of more or less non-essential sequences. The minigene designation is therefore reserved for cases where the gene's own promoter drives its expression.
Minigene allele superscripts begin with 'm' (for minigene), followed by the construct symbol used in the publication. If no construct symbol has been used, the string 'mIa' is used, where 'm' stands for minigene, 'I' for the first author's last name initial and 'a' for the first in the series. If the function of the minigene is stated to be indistinguishable from that of the wild-type allele, the 'm' is preceded by a '+'.
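The minigene rules reduce to three decisions: construct symbol or 'Ia' fallback, then an optional leading '+'. A minimal Python sketch, assuming bracketed superscripts; `minigene_allele`, the gene "geneX" and the author "Doe" are hypothetical placeholders.

```python
def minigene_allele(gene, construct=None, first_author=None, series="a",
                    wild_type_function=False):
    """Hypothetical helper: 'm' + construct symbol, or 'mIa' if no symbol,
    prefixed with '+' when function is stated to be wild type."""
    if construct:
        designator = f"m{construct}"
    elif first_author:
        designator = f"m{first_author[0].upper()}{series}"
    else:
        raise ValueError("need a construct symbol or a first-author surname")
    if wild_type_function:
        designator = "+" + designator
    return f"{gene}[{designator}]"


print(minigene_allele("w", construct="W", wild_type_function=True))  # w[+mW]
print(minigene_allele("geneX", first_author="Doe"))                  # geneX[mDa]
```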
Tags
Genes can be modified by the addition of a tag that allows the product to be identified, purified or targeted to a particular subcellular distribution. Tagged alleles have the syntax 'gene-symbol[x.T:y]', where x is an identifier and y is the name of the tag (e.g., Hsap\MYC, Ivir\HA1, SV40\nls2), as in dap[1gm.T:Hsap\MYC]. Where a tag is artificial, the species prefix Zzzz is used, e.g., T:Zzzz\His6.
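The tag syntax can be sketched as below, with the 'Zzzz' prefix applied when no source species is given. This is an illustration under stated assumptions: `tagged_allele` and "geneX" are hypothetical, and the superscript is rendered in square brackets.

```python
def tagged_allele(gene, identifier, tag, species=None):
    """Hypothetical helper: superscript 'x.T:y', where y is the species-prefixed
    tag name; artificial tags (no source species) get the prefix 'Zzzz'."""
    prefix = species if species else "Zzzz"
    label = prefix + "\\" + tag          # e.g. Hsap\MYC or Zzzz\His6
    return f"{gene}[{identifier}.T:{label}]"


print(tagged_allele("geneX", "1", "MYC", species="Hsap"))  # geneX[1.T:Hsap\MYC]
print(tagged_allele("geneX", "2", "His6"))                 # geneX[2.T:Zzzz\His6]
```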
G.1.4. Classical alleles engineered into transgene constructs, including rescue constructs
A class of alleles is named to capture fragments of genomic DNA used in rescue constructs. The superscript of a rescuing allele begins with '+t', followed by the length of the genomic fragment as stated by the authors; if no length is given, by the construct symbol; and if neither length nor construct symbol is stated, the designation '+tIa' is used, where 't' stands for transgene, 'I' for the first author's last name initial and 'a' for the first in the series. When rescue is incomplete, the construct is considered to carry a mutant allele. Its designator is the construct symbol; 'length of genomic insert.tIa' if no construct symbol is given; or 'tIa' if neither length nor construct symbol is stated.
When a classical allele, e.g., w[a], is put into a transgene construct, it gets a new designation, e.g., w[a.tIa], to reflect its transgenic environment, where 't' stands for transgene, 'I' for the first author's last name initial and 'a' for the first in the series.
FlyBase is happy to discuss and advise on the nomenclature of these non-traditional alleles.
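Note that the two cases above use different precedences: a rescuing allele prefers the stated length, whereas an incompletely rescuing one prefers the construct symbol. The Python sketch below encodes that difference; `rescue_construct_allele`, "geneX", the initial "D" and the length values are hypothetical, and the bracketed superscript is an assumed rendering.

```python
def rescue_construct_allele(gene, complete_rescue, author_initial,
                            length=None, construct=None, series="a"):
    """Hypothetical helper for genomic rescue-construct alleles."""
    series_tag = f"t{author_initial}{series}"  # the 'tIa' designation
    if complete_rescue:
        # Rescuing allele: '+t' then length, else construct symbol, else 'Ia'.
        if length:
            designator = f"+t{length}"
        elif construct:
            designator = f"+t{construct}"
        else:
            designator = "+" + series_tag
    else:
        # Incomplete rescue: a mutant allele; construct symbol takes precedence.
        if construct:
            designator = construct
        elif length:
            designator = f"{length}.{series_tag}"
        else:
            designator = series_tag
    return f"{gene}[{designator}]"


print(rescue_construct_allele("geneX", True, "D", length="8"))    # geneX[+t8]
print(rescue_construct_allele("geneX", False, "D", length="8"))   # geneX[8.tDa]
```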
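Renaming a classical allele for its transgenic context amounts to appending '.tIa' to the existing designator. A one-function Python sketch, with the hypothetical name `transgenic_classical_allele`, a placeholder author initial "D", and brackets assumed for the superscript:

```python
def transgenic_classical_allele(gene, allele, author_initial, series="a"):
    """Hypothetical helper: classical allele designator + '.tIa'."""
    return f"{gene}[{allele}.t{author_initial}{series}]"


print(transgenic_classical_allele("w", "a", "D"))  # w[a.tDa]
```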
The controlled vocabularies currently used by FlyBase are:
- The Gene Ontology (GO). This provides structured controlled vocabularies for the annotation of gene products (although FlyBase at present annotates genes with GO terms, as a surrogate for their products). The GO has three domains: the molecular function of gene products, the biological process (i.e. roles) in which they are involved and their cellular component (location).
- Anatomy. A structured controlled vocabulary of the anatomy of Drosophila melanogaster, used, for example, for the description of phenotypes and where a gene is expressed.
- Development. A structured controlled vocabulary of the development of Drosophila melanogaster, used, for example, for the description of phenotypes and when a gene is expressed.
- The Sequence Ontology (SO). A structured controlled vocabulary for sequence annotation, for the exchange of annotation data and for the description of sequence objects in databases. Its use by FlyBase means that the various components of the genome are described in a consistent and rigorous manner.
- FlyBase controlled vocabulary. A structured controlled vocabulary used for the annotation of various objects in FlyBase, including publications (by type) and alleles (by mutagen, etc.). Although some of these domains will probably always remain local to FlyBase, community ontologies will in time become available for others (e.g., chemical compounds for mutagens), and FlyBase will then adopt them.
All of these structured controlled vocabularies use the same format, the OBO format of the Open Biomedical Ontologies group. Files in this format have the suffix '.obo', e.g., gene_ontology.obo. The OBO format is designed for use with the freely downloadable OBO-Edit tool.
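An OBO file is a header followed by stanzas such as [Term], each holding 'tag: value' lines. The Python sketch below collects [Term] stanzas from OBO text; it is a simplification, not a full OBO parser (it keeps values verbatim, including trailing '!' comments and qualifiers, and ignores escapes and other stanza types), and `parse_obo_terms` is a hypothetical name. For real files, a dedicated OBO library or OBO-Edit is the safer choice.

```python
def parse_obo_terms(text):
    """Collect [Term] stanzas from OBO-format text into a list of dicts.

    Each tag maps to a list of values, because tags such as is_a, synonym
    and replaced_by may occur more than once in a stanza.
    """
    terms = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("!"):
            continue                      # blank lines and whole-line comments
        if line.startswith("[") and line.endswith("]"):
            # A new stanza begins; only [Term] stanzas are kept.
            current = {} if line == "[Term]" else None
            if current is not None:
                terms.append(current)
            continue
        if current is None:
            continue                      # header lines or e.g. [Typedef] stanzas
        tag, sep, value = line.partition(":")
        if sep:
            current.setdefault(tag.strip(), []).append(value.strip())
    return terms


SAMPLE = """format-version: 1.2

[Term]
id: GO:0008150
name: biological_process
namespace: biological_process

[Term]
id: GO:0003674
name: molecular_function
namespace: molecular_function
"""

print([t["id"][0] for t in parse_obo_terms(SAMPLE)])  # ['GO:0008150', 'GO:0003674']
```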
Users should be aware that controlled vocabularies undergo continual development; terms and definitions are refined, added, merged, split and obsoleted in an effort to improve the way they represent their various subjects.
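Because terms can be obsoleted between releases, locally stored term IDs may need remapping against a newer vocabulary. In OBO files an obsoleted term carries 'is_obsolete: true' and may point to a successor with 'replaced_by' (or only suggest alternatives with 'consider'). The Python sketch below remaps IDs that have exactly one replacement and flags the rest for manual review; `resolve_obsolete` and the EX: identifiers are hypothetical, and the input is assumed to be already parsed into tag-to-values dicts.

```python
def resolve_obsolete(term_ids, terms):
    """Map term IDs to current IDs using OBO is_obsolete/replaced_by tags.

    terms: dict of term ID -> dict of OBO tag -> list of values.
    Returns (resolved, needs_review). IDs missing from `terms` are treated
    as current in this sketch.
    """
    resolved, needs_review = {}, []
    for term_id in term_ids:
        stanza = terms.get(term_id, {})
        if stanza.get("is_obsolete", ["false"])[0] != "true":
            resolved[term_id] = term_id       # still a live term
            continue
        replacements = stanza.get("replaced_by", [])
        if len(replacements) == 1:
            resolved[term_id] = replacements[0]
        else:
            needs_review.append(term_id)      # e.g. only 'consider' suggestions
    return resolved, needs_review


TERMS = {  # hypothetical identifiers
    "EX:0000001": {"is_obsolete": ["true"], "replaced_by": ["EX:0000002"]},
    "EX:0000002": {},
    "EX:0000003": {"is_obsolete": ["true"], "consider": ["EX:0000002"]},
}
print(resolve_obsolete(["EX:0000001", "EX:0000002", "EX:0000003"], TERMS))
```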
Both the current 'live' version of each controlled vocabulary and the static version taken when the data for this FlyBase release were frozen can be downloaded from the Precomputed files page under the Files menu of the Navigation bar.
The detail of each controlled vocabulary term is displayed in a CV Term Report in FlyBase. Individual CV Term Reports can be reached either by clicking on the controlled vocabulary term where it is displayed in a report page (e.g. the GENE ONTOLOGY: Function, Process, and Cellular component section of the Gene Report), or by using the TermLink tool, which allows users to search directly for controlled vocabulary terms from any of the controlled vocabularies used by FlyBase.
Controlled vocabulary terms can also be searched using the QueryBuilder tool, via their links to objects (such as genes) in FlyBase. To search using a controlled vocabulary term in QueryBuilder, select the GO/Anatomy CV DB dataset in the query segment box (see the QUERY BUILDER HELP section at the bottom of the QueryBuilder page for more details).