The GO style guide introduces new users to (and reminds old users of) both the philosophy and the practicalities behind developing and maintaining GO. Its main purpose is to serve as a user manual for GO curators. You will find it more useful if you first read An Introduction to GO for more general background information about the GO project and how the ontology works. Information on annotating genes and gene products to GO can be found in the GO Annotation Guide and information on the structure and syntax of the GO files can be found in the GO File Format Guide.
As explained in An Introduction to GO, the purpose of GO is to define particular attributes of gene products. Practically speaking, a term is simply the text string used to describe an entry in GO, e.g. cell, fibroblast growth factor receptor binding or signal transduction. A node refers to a term and all its children. GO does not contain the following:
The following stylistic points should be applied to all aspects of the ontologies.
General database cross references (general dbxrefs) should be used whenever a GO term has an identical meaning to an object in another database. For more information on syntax, and for a complete list of dbxrefs, please refer to the GO File Format Guide. GO cross references the following databases for certain types of terms. This list is up to date as of July 2002:
If you create a new term, or refine a term, define it in go/doc/GO.defs. A definition must contain the following:
For the syntax of these items in the GO.defs file, refer to the GO File Format Guide.
If you use DAG-Edit, the term and the GO ID associated with any new term will automatically be inserted into the GO.defs file when you save your work.
Definitions should explain clearly to the reader what is meant by a particular term. They should be concise, full sentences. They should begin with an upper-case letter and end with a period (full stop). Proofread your definitions carefully to eliminate typos and double spaces. The definition should be written at the same level of specificity as the term itself, e.g. in the case of sensu terms. It should also be consistent with the guidelines for the contents of each ontology.
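The mechanical parts of these rules (initial capital, final period, no double spaces) lend themselves to an automated check. The following is a minimal sketch, not part of any GO tool; the function name check_definition and its messages are assumptions made for illustration. It cannot judge whether a definition is clear or at the right level of specificity.

```python
from typing import List


def check_definition(text: str) -> List[str]:
    """Return the mechanical style problems found in a definition string.

    Hypothetical helper: it covers only the rules that can be checked
    without understanding the biology.
    """
    if not text:
        return ["definition is empty"]
    problems = []
    if not text[0].isupper():
        problems.append("does not begin with an upper-case letter")
    if not text.endswith("."):
        problems.append("does not end with a period")
    if "  " in text:
        problems.append("contains a double space")
    return problems


# A definition that passes, and one that breaks two rules.
good = ("Catalysis of the transfer of a phosphate group, "
        "usually from ATP, to a substrate molecule.")
bad = "a small furry mammal"
print(check_definition(good))
print(check_definition(bad))
```

A check like this only catches typing slips; careful proofreading is still needed.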
Where a 'standard' definition exists for a group of related terms, use it. Below is a list of standard definitions. If you find yourself repeatedly using the same text string in a series of definitions, please send your 'standard definition' to Amelia Ireland, who keeps an up-to-date version of this list.
term: regulation of neuronal synaptic plasticity
A GO ID is really associated with a definition rather than with the term name. If we change the wording but not the meaning of a term, the GO ID stays the same, whereas a new meaning requires a new GO ID, even if the text string doesn't change. Here's a trivial example that illustrates when we do and don't change GO IDs:
Assume that we have a term 'mouse', GO ID GO:0000123, in an ontology; it is defined as a small furry mammal.
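The same rule can be sketched in code. This is an illustration only: the Term class, the revise function, the reworded and redefined definitions, and the second ID GO:0000456 are all hypothetical and do not come from the GO files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    go_id: str
    name: str
    definition: str


def revise(term: Term, name: str, definition: str,
           meaning_changed: bool, fresh_id: str) -> Term:
    """Return the revised term.

    The GO ID follows the meaning, not the text string: it is kept when
    only the wording changes and replaced when the meaning changes.
    """
    go_id = fresh_id if meaning_changed else term.go_id
    return Term(go_id, name, definition)


mouse = Term("GO:0000123", "mouse", "A small furry mammal.")

# Wording changes, meaning does not: the ID stays the same.
reworded = revise(mouse, "mouse", "A small mammal covered in fur.",
                  meaning_changed=False, fresh_id="GO:0000456")

# Same text string, new meaning: a new ID is required.
redefined = revise(mouse, "mouse", "A hand-held pointing device.",
                   meaning_changed=True, fresh_id="GO:0000456")

print(reworded.go_id, redefined.go_id)
```

In the second case the old ID is not discarded; it is kept and marked obsolete.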
The GO.defs file contains an optional field for adding comments about an entry. The purpose of this is to help annotators, especially if you have obsoleted or redefined a term. Comments can be anything relevant to the term or term definition. If you write a comment, use the appropriate syntax, which is detailed in the GO File Format Guide.
If you define a term, you must document where your definition came from. If you use DAG-Edit, the software won't allow you to commit a definition without entering a cross reference for it. Database cross references have two parts, separated by a colon: an abbreviation for the database being cross referenced (a list of accepted abbreviations can be found here) and the ID of the item in that database.
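The two-part structure can be shown with a short parsing sketch. The function name parse_dbxref is an assumption for illustration, and the sketch does not check the abbreviation against the list of accepted databases.

```python
from typing import Tuple


def parse_dbxref(xref: str) -> Tuple[str, str]:
    """Split a cross reference into (database abbreviation, item ID).

    Hypothetical helper. Only the first colon separates the two parts,
    so any later colon stays inside the item ID.
    """
    database, colon, item_id = xref.partition(":")
    if not colon or not database or not item_id:
        raise ValueError("expected 'ABBREVIATION:ID', got %r" % xref)
    return database, item_id


print(parse_dbxref("GO:0000123"))
```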
NEVER delete a GO ID: GO IDs should be conserved at all times so that, even if a term is defunct or has a new GO ID, someone searching using the old GO ID can find it. Here's how we ensure that a GO ID is never lost:
A term that is no longer used is not deleted, but is tagged 'obsolete'. A term can become obsolete when it is removed or redefined, but a term will not be made obsolete due to changes in wording that do not alter the meaning of the term. When a term's definition changes meaning, the term should also be assigned a new GO ID, and the old ID considered obsolete.
In the flat files, an obsolete term becomes a child of the node obsolete. The obsolete node and its children are kept at the end of each ontology file. (Note that the handling of obsolete terms will change once the GO database is in production -- then obsolete GO terms will be identified by a tag.)
When you make a term obsolete, add a comment that explains why the term has become obsolete and suggests alternative terms for annotators to use. The correct syntax for comments in obsoleted terms is described in the GO File Format Guide.
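As a rough sketch of the rules above, the snippet below marks a term obsolete without discarding its GO ID and refuses to do so without an explanatory comment. The TermRecord class, the make_obsolete function and the comment wording are hypothetical; the real comment syntax is the one given in the GO File Format Guide.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TermRecord:
    go_id: str
    name: str
    obsolete: bool = False
    comment: str = ""


def make_obsolete(term: TermRecord, reason: str,
                  alternatives: List[str]) -> TermRecord:
    """Tag a term as obsolete, keeping its GO ID and recording why."""
    if not reason.strip():
        raise ValueError("an obsolete term needs a comment explaining why")
    term.obsolete = True
    note = reason.strip()
    if alternatives:
        # Illustrative wording only, not the official comment syntax.
        note += " Consider instead: " + ", ".join(alternatives) + "."
    term.comment = note
    return term


old = TermRecord("GO:0000123", "mouse")
make_obsolete(old, "This term was redefined.", ["GO:0000456"])
print(old)
```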
Terms may also be merged or split (splits occur very rarely). We merge terms when we notice that two terms actually have the same meaning. (Usually this situation arises when one term exists, and another wording of the same concept is added as a new term instead of as a synonym, either because a curator didn't find the old term or didn't know it meant the same thing.) A term can be split if curators decide that it combines two or more concepts that should be represented by separate terms.
The conservation of GO IDs in these cases is dealt with in the GO File Format Guide.
In all the ontologies, a child (more specialized term) can have multiple parents (less specialized terms); i.e., they are directed acyclic graphs (DAGs). This makes GO a powerful system to describe biology, but can also create some pitfalls for curators. Keeping the following guidelines in mind should help you to avoid these problems.
A child term can be a subclass of (is a) or a part of its parent. For example, the child GOterm3 may be a subclass of its parent GOterm1 and a part of its other parent, GOterm2. Keep in mind that part of means can be a part of, not is always a part of: the parent need not always encompass the child. For example, in the component ontology, replication fork is a part of the nucleoplasm; however, it is only a part of the nucleoplasm at particular times during the cell cycle. For information on how these relationships are represented in the GO flat files, see the GO File Format Guide.
The pathway from a child term all the way up to its top-level parent(s) must always be true. Often, annotating a new gene product reveals relationships in an ontology that break this rule, or species specificity becomes a problem. In such cases, the ontology must be restructured by adding more nodes and connecting terms such that any path upwards is true. When a term is added to the ontology, the curator needs to add all of the parents and children of the new term.
This becomes clear with an example: consider how chitin metabolism is represented in the process ontology. Chitin metabolism is a part of cuticle synthesis in the fly and is also part of cell wall organization in yeast. This was once represented in the process ontology as follows:
cuticle synthesis
  chitin metabolism
cell wall biosynthesis
  chitin metabolism
    chitin biosynthesis
    chitin catabolism
The problem with this organization becomes apparent when one tries to annotate a specific gene product from one species. A fly chitin synthase could be annotated to chitin biosynthesis, and appear in a query for genes annotated to cell wall biosynthesis (and its children), which makes no sense because flies don't have cell walls.
Here's how we revised the ontology to ensure that the true path rule is not broken:
chitin metabolism
  chitin biosynthesis
  chitin catabolism
  cuticle chitin metabolism
    cuticle chitin biosynthesis
    cuticle chitin catabolism
  cell wall chitin metabolism
    cell wall chitin biosynthesis
    cell wall chitin catabolism
The parent chitin metabolism now has the child terms cuticle chitin metabolism and cell wall chitin metabolism, with the appropriate catabolism and synthesis terms beneath them. With this structure, all the daughter terms can be followed up to chitin metabolism, but cuticle chitin metabolism terms do not trace back to cell wall terms, so all the paths are true. In addition, gene products such as chitin synthase can be annotated to nodes of appropriate granularity in both yeast and flies, and queries will yield the expected results.
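The true path rule can be checked mechanically by collecting every ancestor of a term and confirming that each one makes sense for the gene product being annotated. The sketch below is an illustration under stated assumptions: the dictionaries are hand-built from the two chitin structures described above, the function name ancestors is hypothetical, and no distinction is made between is a and part of relationships.

```python
from typing import Dict, List, Set


def ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return every term reachable by following parent links upwards."""
    seen: Set[str] = set()
    stack = list(parents.get(term, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, []))
    return seen


# Original structure: child term -> list of parents.
old = {
    "chitin metabolism": ["cuticle synthesis", "cell wall biosynthesis"],
    "chitin biosynthesis": ["chitin metabolism"],
    "chitin catabolism": ["chitin metabolism"],
}

# Revised structure.
new = {
    "chitin biosynthesis": ["chitin metabolism"],
    "chitin catabolism": ["chitin metabolism"],
    "cuticle chitin metabolism": ["chitin metabolism"],
    "cuticle chitin biosynthesis": ["cuticle chitin metabolism"],
    "cuticle chitin catabolism": ["cuticle chitin metabolism"],
    "cell wall chitin metabolism": ["chitin metabolism"],
    "cell wall chitin biosynthesis": ["cell wall chitin metabolism"],
    "cell wall chitin catabolism": ["cell wall chitin metabolism"],
}

# A fly chitin synthase annotated in the old structure reaches a cell wall term.
print("cell wall biosynthesis" in ancestors("chitin biosynthesis", old))

# In the revised structure the cuticle term never reaches a cell wall term.
print(sorted(ancestors("cuticle chitin biosynthesis", new)))
```

The traversal uses a seen set so that a term with several parents is visited only once.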
These logical relationships must be true in the ontologies:
Avoid species-specific definitions in GO nodes wherever possible. Nevertheless, many functions, processes and components are not common to all life forms. Our current convention is to include any term that can apply to more than one taxonomic class of organism.
Within the ontologies, there are cases where a word or phrase has different meanings when applied to different organisms. For example, embryonic development in insects is very different from embryonic development in mammals. Such terms are distinguished from one another by their definitions and by the sensu designation (sensu means 'in the sense of'), as in the term embryonic development (sensu Insecta). Using the sensu reference makes the node available to other species that use the same process/function/component. A node should be divided into sensu sub-trees where the children are, or are likely to be, different.
A GO node should never be more species-specific than any of its children. Child nodes can be at the same level of species specificity as the parent node(s), or more specific. When adding more species-specific nodes, curators should make sure that non-species-specific parents exist (or add them if necessary). For example, the process term antimicrobial humoral response (sensu Invertebrata) has the parent antimicrobial humoral response. This allows the sister term antimicrobial humoral response (sensu Vertebrata) to be added. Antimicrobial humoral response (sensu Invertebrata) cannot have a child named antibacterial response because the child term must be at least as species specific as the parent term, so this is modified to antibacterial response (sensu Invertebrata).
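A simplified version of this check can be sketched as follows. It assumes that the sensu designation always appears in parentheses at the end of the term name, and it only flags a child that has no sensu designation under a parent that has one; it does not compare taxonomic ranks, which would need a taxonomy. The function names are hypothetical.

```python
import re
from typing import Optional

# Matches a trailing "(sensu Taxon)" qualifier in a term name.
SENSU = re.compile(r"\(sensu ([^)]+)\)\s*$")


def sensu_of(name: str) -> Optional[str]:
    """Return the taxon named in a trailing sensu qualifier, if any."""
    match = SENSU.search(name)
    return match.group(1) if match else None


def child_is_specific_enough(parent: str, child: str) -> bool:
    """Simplified rule: a sensu parent must not have a non-sensu child."""
    return sensu_of(parent) is None or sensu_of(child) is not None


parent = "antimicrobial humoral response (sensu Invertebrata)"
print(child_is_specific_enough(parent, "antibacterial response"))
print(child_is_specific_enough(
    parent, "antibacterial response (sensu Invertebrata)"))
```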
'Note that this term is intended for, but not restricted to, annotation of fungal gene products.'
This comment is only intended as a time-saving feature for annotators, clarifying a point that should already be implicit in the term name and definition.
Some GO terms imply the presence of others in the ontology. Examples from the process ontology include the following:
A biological process is a biological goal that requires more than one function. Mutant phenotypes often reflect disruptions in biological processes.
Where there are several biosynthetic pathways leading to the same product, we list each of them as a subtype of a general pathway. For example, we have:
%phosphatidylethanolamine biosynthesis ; GO:0006646
 %phosphatidyl-N-monomethylethanolamine (PMME) biosynthesis ; GO:0006647
 %dihydrosphingosine-1-P pathway ; GO:0006648
It is straightforward to name well-known pathways (e.g. glycolysis and the pentose-phosphate pathway are two ways to accomplish glucose catabolism), but harder for nameless minor pathways, such as the dihydrosphingosine-1-P pathway above. So that we do not mistakenly give two names to the same minor pathway, use the name of the first intermediate, as a synonym if not as the primary GO name.
A molecular function is an activity or task performed by a gene product. It often corresponds to something (such as a catalytic activity) that can be measured in vitro.
Cellular structures are not functions. Many cellular component references have been made obsolete in the function ontology. For example, mitochondrial primase need only be primase because annotators can assign location to gene products by annotating with appropriate terms from the cellular component ontology. By contrast, there are many cases where component terms are appropriate in the process ontology. For example, Golgi organization and biogenesis is different from lysosome organization and biogenesis, so the anatomical qualifiers 'Golgi' and 'lysosome' are necessary.
Gene products in themselves are not nodes of the function ontology, although doing something with or to a specific gene product can be one. For example, being hedgehog or a hedgehog receptor are not functions, but hedgehog receptor binding and hedgehog binding are functions. Most GO molecular function terms include the word 'activity' to help differentiate them from the physical gene product. When defining molecular function terms, be careful not to describe them as gene products. For example, the molecular function term kinase activity is defined as 'Catalysis of the transfer of a phosphate group, usually from ATP, to a substrate molecule', not 'an enzyme that catalyzes the transfer of a phosphate group, usually from ATP, to a substrate molecule'.
Regulatory and catalytic subunits of kinases, heterotrimeric G proteins, etc., are handled in the function ontology as subclasses of the relevant catalytic activity. For example:
%myosin phosphatase activity
 %myosin phosphatase, intrinsic catalyst activity
 %myosin phosphatase, intrinsic regulator activity
When annotating, if you know which subunit is which, use the specific nodes; otherwise use the parent.
For more information, see the minutes of the June 2003 GO consortium meeting and Sourceforge item 763301.
Generally, a gene product is located in or is a subcomponent of a particular cellular component. The cellular component ontology includes multisubunit enzymes and other protein complexes, but not individual proteins or nucleic acids. Cellular component also does not include multicellular anatomical terms.
To distinguish cellular components from functions, use 'complex' in the term name of a component, and append the word 'activity' to enzyme names in the function ontology. For example, the molecular function term pyruvate dehydrogenase activity (GO:0004738) describes the enzyme activity whereas the cellular component term pyruvate dehydrogenase complex (GO:0045254) describes the multi-subunit structure in which the enzyme activity resides.
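A naming check along these lines can catch the most obvious mix-ups. This is a heuristic sketch only: the function name name_warnings, the ontology labels 'function' and 'component', and the warning texts are assumptions, and a clean result does not prove that a term is in the right ontology.

```python
from typing import List


def name_warnings(name: str, ontology: str) -> List[str]:
    """Flag term names that look as if they belong to the other ontology."""
    words = name.lower().split()
    warnings = []
    if ontology == "function" and "complex" in words:
        warnings.append("function term names a complex")
    if ontology == "component" and words and words[-1] == "activity":
        warnings.append("component term is named as an activity")
    return warnings


print(name_warnings("pyruvate dehydrogenase activity", "function"))
print(name_warnings("pyruvate dehydrogenase complex", "component"))
print(name_warnings("pyruvate dehydrogenase activity", "component"))
```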
Gene product names can be used as synonyms for terms that do not name gene products in the primary text strings. Such synonyms are narrower than the terms. For some biological concepts, it would be awkward to use a wording that avoids mentioning a gene product name. In these cases, we use the word 'class' along with the gene product name, to indicate that the term is not restricted to the gene product named or to the species in which the gene product is found. An example is the class of cell cycle regulators known as p53:
Copyright © 1999-2003 Gene Ontology Consortium. Permission to use the information contained in this database was given by the researchers/institutes who contributed or published the information. Users of the database are solely responsible for compliance with any copyright restrictions, including those applying to the author abstracts. Documents from this server are provided "AS-IS" without any warranty, expressed or implied.
Last modified October 2003. Report problems with this website to email@example.com.