GO Editorial Style Guide





How to use this guide | What is a GO term? | General conventions when adding terms | Spelling conventions/GO Dictionary | Synonyms | Cross-referencing other databases | Defining terms | Always define new terms | Use of 'standard definitions' | Redefining terms | Adding comments to definitions | Database cross-references | Obsoleting, merging and splitting | Obsolete terms | Merges and splits | Moving terms: understanding relationships | Parent-child relationships | True path rule | Logical relationships | Using sensu for species-specific terms | Implicit species-specificity | Ontology-specific guidelines | Biological process guidelines | Molecular function guidelines | Cellular component guidelines | Bi-directional terms | Use of gene product names

 

How to use this guide

The GO style guide introduces new users to (and reminds old users of) both the philosophy and the practicalities behind developing and maintaining GO. Its main purpose is to serve as a user manual for GO curators. You will find it more useful if you first read An Introduction to GO for more general background information about the GO project and how the ontology works. Information on annotating genes and gene products to GO can be found in the GO Annotation Guide and information on the structure and syntax of the GO files can be found in the GO File Format Guide.

 

What is a GO term?

As explained in An Introduction to GO, the purpose of GO is to define particular attributes of gene products. Practically speaking, a term is simply the text string used to describe an entry in GO, e.g. cell, fibroblast growth factor receptor binding or signal transduction. A node refers to a term and all its children. GO does not contain the following:

  • Gene products: e.g. cytochrome c is not in the ontologies, but attributes of cytochrome c, such as electron transporter, are.
  • Processes, functions or components that are unique to mutants or diseases: e.g. oncogenesis is not a valid GO term because causing cancer is not the normal function of any gene.
  • Attributes of sequence such as intron/exon parameters: these are not attributes of gene products and will be described in a separate sequence ontology (see the OBO web page for more information).
  • Protein domains or structural features.
  • Protein-protein interactions.

 

General conventions when adding terms

The following stylistic points should be applied to all aspects of the ontologies.


Spelling conventions: Where the accepted spelling differs between British and US usage, use the US form, e.g. polymerizing, signaling, rather than polymerising, signalling. There is a dictionary of 'words' used in GO terms in the file go/doc/GODict.DAT. This is updated approximately once a week. If in doubt, check this list or consult the GO curators.
Abbreviations: Avoid abbreviations unless they're self-explanatory. Use full element names, not symbols. Use hydrogen for H+. Use copper and zinc rather than Cu and Zn. Use copper(I), copper(II), etc., rather than cuprous, cupric, etc. For biomolecules, spell out the term in full wherever practical: use fibroblast growth factor, not FGF.
Greek symbols: Spell out Greek symbols in full: e.g. alpha, beta, gamma.
Upper vs. lower case: GO terms are all lower case except where demanded by context, e.g. DNA, not dna.
Singular vs. plural: Use the singular form of the term, except where a term is only used in the plural (e.g. caveolae).
Be descriptive: Aim to be reasonably descriptive, even at the risk of some verbal redundancy. Remember, databases that refer to GO terms might list only the finest-level terms associated with a particular gene product. If the parent is 'aromatic amino acid family biosynthesis', then the child should be 'aromatic amino acid family biosynthesis, anthranilate pathway', not just 'anthranilate pathway'.
Anatomical qualifiers: Do not use anatomical qualifiers in the biological process and molecular function ontologies. For example, GO has the molecular function term DNA-directed DNA polymerase but neither nuclear DNA polymerase nor mitochondrial DNA polymerase. Terms with anatomical qualifiers are not necessary because annotators can use the cellular component ontology to attribute location to gene products, independently of process or function.
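
Some of the conventions above are mechanical enough to check automatically. The following is a minimal sketch in Python and is not part of any GO tool: the function name check_term_name and the three small word lists are illustrative assumptions, and the lists are far from complete (they are not taken from GODict.DAT).

```python
import re

# Illustrative, incomplete word lists (assumptions, not taken from GODict.DAT).
BRITISH_TO_US = {"polymerising": "polymerizing", "signalling": "signaling"}
GREEK_SYMBOLS = {"α": "alpha", "β": "beta", "γ": "gamma"}
ELEMENT_SYMBOLS = {"Cu": "copper", "Zn": "zinc"}


def check_term_name(name):
    """Return a list of style warnings for a proposed GO term name."""
    warnings = []
    # Split on whitespace and simple punctuation; hyphenated words stay whole.
    for word in re.findall(r"[^\s,;()]+", name):
        if word.lower() in BRITISH_TO_US:
            us_form = BRITISH_TO_US[word.lower()]
            warnings.append(f"use US spelling '{us_form}' for '{word}'")
        if word in ELEMENT_SYMBOLS:
            warnings.append(f"spell out '{word}' as '{ELEMENT_SYMBOLS[word]}'")
    for symbol, spelled in GREEK_SYMBOLS.items():
        if symbol in name:
            warnings.append(f"spell out '{symbol}' as '{spelled}'")
    return warnings
```

A check like this can only flag candidates for review. Rules such as "be descriptive" or "lower case except where demanded by context" still need a curator's judgment.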

 

Synonyms


Adding synonyms to terms: Often when terms are created, there are several words that could be used for the term name. In such cases, one form will be chosen as term name whilst the other possible names are added as synonyms. Synonyms are also added to facilitate searching of the database, in cases where there is a name in common use that does not adequately describe the concept. An example of this is the synonym 'cytochrome P450 CYP27B' which corresponds inexactly to the term 'calcidiol 1-monooxygenase activity'. Provision of this synonym allows biologists to search on P450 and find relevant information, whilst preserving the full activity name as term name.
  • It is useful to add synonyms to terms because they help other users to find them. GO IDs are not attached to synonyms. The number of synonyms for a term is not limited, and one synonym can be used for more than one GO term.
  • Add synonyms if you edit a term name but the old name is still a valid synonym; for example, if you change respiration to cellular respiration, keep respiration as a synonym. This helps other users to find familiar terms.
  • Add synonyms if the term has (or contains) a commonly used abbreviation. For example, FGF binding could be used as a synonym for fibroblast growth factor binding.
  • Keep track of the synonym type, as described in the GO Synonym Guide.
When NOT to add synonyms: Don't add a synonym if the only difference is case (e.g. start vs. START).
Synonym format: Synonyms, just like term names, are all lower case except where demanded by context (e.g. DNA, not dna). For information on synonym types and their syntax, see the GO Format Guide.

 

Cross-referencing other databases

General database cross references (general dbxrefs) should be used whenever a GO term has an identical meaning to an object in another database. For more information on syntax, and for a complete list of dbxrefs, please refer to the GO File Format Guide. GO cross references the following databases for certain types of terms. This list is up to date as of July 2002:


Ontology    Database                          Sample dbxref
Function    Enzyme Commission                 EC:3.5.1.6
            Transport Protein Database        TC:2.A.29.10.1
            Biocatalysis/Biodegradation DB    UM-BBD_enzymeID:e0310
            Biocatalysis/Biodegradation DB    UM-BBD_pathwayID:dcb
Process     MetaCyc Metabolic Pathway DB      MetaCyc:2ASDEG-PWY
Component   None

 

Defining terms

Always define new terms

If you create a new term, or refine a term, define it in go/doc/GO.defs. A definition must contain the following:

  • term: the GO term to which the definition refers.
  • goid: the term's unique identifier.
  • definition: the definition of the term.
  • definition_reference: one or more references for the definition.

For the syntax of these items in the GO.defs file, refer to the GO File Format Guide.

If you use DAG-Edit, the term and the GO ID associated with any new term will automatically be inserted into the GO.defs file when you save your work.


Write your definitions carefully

Definitions should explain clearly to the reader what is meant by a particular term. They should be concise, full sentences. They should begin with an upper-case letter and end with a period (full stop). Proofread your definitions carefully to eliminate typos and double spaces. The definition should be written at the same level of specificity as the term itself, e.g. in the case of sensu terms. It should also be consistent with the guidelines for the contents of each ontology.
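
The required items and the mechanical style rules above can be checked before a definition is committed. The sketch below is only an illustration: it assumes a definition entry has already been read into a Python dict keyed by the four item names, and it does not parse the GO.defs syntax (for that, see the GO File Format Guide). The function name check_definition is hypothetical.

```python
REQUIRED_FIELDS = ("term", "goid", "definition", "definition_reference")


def check_definition(entry):
    """Return a list of problems found in a definition entry (a dict)."""
    # Every required item must be present and non-empty.
    problems = [f"missing field: {field}" for field in REQUIRED_FIELDS
                if not entry.get(field)]
    text = entry.get("definition", "")
    if text:
        if not text[0].isupper():
            problems.append("definition should begin with an upper-case letter")
        if not text.endswith("."):
            problems.append("definition should end with a period")
        if "  " in text:
            problems.append("definition contains a double space")
    return problems
```

For example, an entry for the made-up term 'mouse' with the definition 'A small furry mammal.' and a reference passes, whereas the same entry without a definition_reference is reported as incomplete.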


Use of 'standard definitions'

Where a 'standard' definition exists for a group of related terms, use it. Below is a list of standard definitions. If you find yourself repeatedly using the same text string in a series of definitions, please send your 'standard definition' to Amelia Ireland, who keeps an up-to-date version of this list.


  • Binding: Interacting selectively with
  • Biosynthesis: The formation from simpler components of
  • Catabolism: The breakdown into simpler components of
  • Fermentation: An anaerobic process in which X is broken down into Y.
  • Metabolism: The chemical reactions and physical changes involving
  • Negative regulation: Any process that stops, prevents or reduces the rate of
  • Positive regulation: Any process that activates or increases the rate of
  • Regulation: Any process that modulates the frequency, rate or extent of
  • Response to: A change in state or activity of an organism (in terms of movement, secretion, enzyme production, gene expression, etc.) as a result of exposure to
  • Secretion: The regulated release of X from a cell or group of cells.
  • Transport: The directed movement of X into, out of, within or between cells.
  • Transporter activity: Enables the directed movement of X into, out of, within or between cells.
Note: There are some "regulation" terms for which the standard definition wording given above does not fit. This is often the case when a process term represents regulation of a trait rather than another process. For example:

term: regulation of neuronal synaptic plasticity
goid: GO:0048168
definition: A process that modulates neuronal synaptic plasticity; the ability of neuronal synapses to change as circumstances require. They may alter function, such as increasing or decreasing their sensitivity; or they may increase or decrease in actual numbers.
comment: Note that the syntax of the definition of this term is different from the usual regulation syntax because it describes regulation of a trait rather than regulation of a process.
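
Where the standard wording does fit, the open-ended stems in the list of standard definitions can be completed mechanically. The sketch below copies five of the stems into a Python dict and appends a subject; the names STANDARD_STEMS and standard_definition are illustrative assumptions, not part of any GO software.

```python
# Stems copied from the list of standard definitions; each is completed
# by appending the name of the substance or process concerned.
STANDARD_STEMS = {
    "binding": "Interacting selectively with",
    "biosynthesis": "The formation from simpler components of",
    "catabolism": "The breakdown into simpler components of",
    "metabolism": "The chemical reactions and physical changes involving",
    "regulation": "Any process that modulates the frequency, rate or extent of",
}


def standard_definition(kind, subject):
    """Build a definition sentence from a standard stem and a subject."""
    return f"{STANDARD_STEMS[kind]} {subject}."
```

Calling standard_definition("catabolism", "chitin") gives 'The breakdown into simpler components of chitin.' A curator would normally extend such a sentence with a short description of the subject itself.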

Redefining terms

A GO ID is really associated with a definition rather than with the term name. If we change the wording but not the meaning of a term, the GO ID stays the same, whereas a new meaning requires a new GO ID, even if the text string doesn't change. Here's a trivial example that illustrates when we do and don't change GO IDs:

Assume that we have a term 'mouse', GO ID GO:0000123, in an ontology; it is defined as a small furry mammal.

  1. We decide to change the wording to 'Mus musculus', keeping the definition the same. In this case we merely update the text; the GO ID stays the same because the meaning stays the same. We may choose to keep 'mouse' as a synonym, but there would still only be one ID associated with the term.
  2. We decide that the term 'mouse' should instead mean a piece of computer equipment. In this case, the old term and ID are moved to the 'obsolete' category (see obsolete terms), and 'mouse' as newly defined gets a new GO ID, GO:0000456. The old GO ID and definitions are saved for posterity in case we ever need to know what happened to them.
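
The two cases can be mimicked with a toy registry that maps GO IDs to term records. This is a sketch of the principle only, assuming a plain Python dict as storage; the functions new_registry, rename and redefine are hypothetical and do not reflect how DAG-Edit or the GO files work.

```python
def new_registry():
    """Return a toy ontology holding only the hypothetical term 'mouse'."""
    return {"GO:0000123": {"name": "mouse", "definition": "A small furry mammal.",
                           "synonyms": [], "obsolete": False}}


def rename(terms, go_id, new_name):
    """Case 1: reword the term. The GO ID is kept; the old name becomes a synonym."""
    term = terms[go_id]
    term["synonyms"].append(term["name"])
    term["name"] = new_name


def redefine(terms, old_id, new_id, new_definition):
    """Case 2: change the meaning. The old ID is kept but marked obsolete,
    and the newly defined term is stored under a new ID."""
    old = terms[old_id]
    terms[new_id] = {"name": old["name"], "definition": new_definition,
                     "synonyms": [], "obsolete": False}
    old["obsolete"] = True
```

After rename the registry still holds a single ID. After redefine it holds two: the obsolete GO:0000123 with its original definition, and GO:0000456 with the new one.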

Adding comments to definitions

The GO.defs file contains an optional field for adding comments about an entry. The purpose of this is to help annotators, especially if you have obsoleted or redefined a term. Comments can be anything relevant to the term or term definition. If you write a comment, use the appropriate syntax, which is detailed in the GO File Format Guide.


Database cross references for definitions

If you define a term, you must document where your definition came from. If you use DAG-Edit, the software won't allow you to commit a definition without entering a cross reference for it. Database cross references have two parts, separated by a colon: an abbreviation for the database being cross referenced (taken from the list of accepted abbreviations) and the ID of the item in that database.

  • If the definition comes from your head, use the database abbreviation GO and your initials in lower case as the ID (e.g. a definition written by Michael Ashburner has the dbxref GO:ma).
  • If the definition comes from a book, use the ISBN (e.g. a dbxref to the Oxford Dictionary of Molecular Biology would be ISBN:0198506732).
  • If the definition comes from a paper, use the PubMed ID (e.g. PMID:119108).
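
The two-part structure of a dbxref makes it easy to split and classify. The following sketch handles only the three conventions listed above; the function names parse_dbxref and describe_source are illustrative assumptions, and the code does not check the abbreviation against the list of accepted abbreviations.

```python
def parse_dbxref(dbxref):
    """Split a dbxref into (database abbreviation, ID) at the first colon."""
    database, separator, identifier = dbxref.partition(":")
    if not separator or not database or not identifier:
        raise ValueError(f"not a valid dbxref: {dbxref!r}")
    return database, identifier


def describe_source(dbxref):
    """Classify the source of a definition from its dbxref."""
    database, identifier = parse_dbxref(dbxref)
    # Curator initials are lower-case letters, e.g. GO:ma.
    if database == "GO" and identifier.isalpha() and identifier.islower():
        return "curator"
    if database == "ISBN":
        return "book"
    if database == "PMID":
        return "paper"
    return "other"
```

Splitting at the first colon only means that IDs which themselves contain punctuation, such as MetaCyc:2ASDEG-PWY, are kept intact.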

 

Obsoleting, merging and splitting terms

NEVER delete a GO ID: GO IDs should be conserved at all times so that, even if a term is defunct or has a new GO ID, someone searching using the old GO ID can find it. Here's how we ensure that a GO ID is never lost:


Obsolete terms

A term that is no longer used is not deleted, but is tagged 'obsolete'. A term can become obsolete when it is removed or redefined, but a term will not be made obsolete due to changes in wording that do not alter the meaning of the term. When a term's definition changes meaning, the term should also be assigned a new GO ID, and the old ID considered obsolete.

In the flat files, an obsolete term becomes a child of the node obsolete. The obsolete node and its children are kept at the end of each ontology file. (Note that the handling of obsolete terms will change once the GO database is in production -- then obsolete GO terms will be identified by a tag.)

When you make a term obsolete, add a comment that explains why the term has become obsolete and suggests alternative terms for annotators to use. The correct syntax for comments in obsoleted terms is described in the GO File Format Guide.


Merges and splits

Terms may also be merged or split (splits occur very rarely). We merge terms when we notice that two terms actually have the same meaning. (Usually this situation arises when one term exists, and another wording of the same concept is added as a new term instead of as a synonym, either because a curator didn't find the old term or didn't know it meant the same thing.) A term can be split if curators decide that it combines two or more concepts that should be represented by separate terms.

The conservation of GO IDs in these cases is dealt with in the GO File Format Guide.

 

Moving terms: understanding relationships

In all the ontologies, a child (more specialized term) can have multiple parents (less specialized terms); i.e., the ontologies are directed acyclic graphs (DAGs). This makes GO a powerful system to describe biology, but can also create some pitfalls for curators. Keeping the following guidelines in mind should help you to avoid these problems.


Parent-child relationships

A child term can be a subclass of (is a) or a part of its parent. For example, the child GOterm3 may be a subclass of its parent GOterm1 and a part of its other parent, GOterm2. Keep in mind that part of means can be a part of, not is always a part of: the parent need not always encompass the child. For example, in the component ontology, replication fork is a part of the nucleoplasm; however, it is only a part of the nucleoplasm at particular times during the cell cycle. For information on how these relationships are represented in the GO flat files, see the GO File Format Guide.


True path rule

The pathway from a child term all the way up to its top-level parent(s) must always be true. Often, annotating a new gene product reveals relationships in an ontology that break this rule, or species specificity becomes a problem. In such cases, the ontology must be restructured by adding more nodes and connecting terms such that any path upwards is true. When a term is added to the ontology, the curator needs to add all of the parents and children of the new term.

This becomes clear with an example: consider how chitin metabolism is represented in the process ontology. Chitin metabolism is a part of cuticle synthesis in the fly and is also part of cell wall biosynthesis in yeast. This was once represented in the process ontology as follows:

   cuticle synthesis
      chitin metabolism
   cell wall biosynthesis
      chitin metabolism
         chitin biosynthesis
         chitin catabolism


The problem with this organization becomes apparent when one tries to annotate a specific gene product from one species. A fly chitin synthase could be annotated to chitin biosynthesis, and appear in a query for genes annotated to cell wall biosynthesis (and its children), which makes no sense because flies don't have cell walls.

Here's how we revised the ontology to ensure that the true path rule is not broken:

   chitin metabolism
      chitin biosynthesis
      chitin catabolism
      cuticle chitin metabolism
         cuticle chitin biosynthesis
         cuticle chitin catabolism
      cell wall chitin metabolism
         cell wall chitin biosynthesis
         cell wall chitin catabolism


The parent chitin metabolism now has the child terms cuticle chitin metabolism and cell wall chitin metabolism, with the appropriate catabolism and synthesis terms beneath them. With this structure, all the daughter terms can be followed up to chitin metabolism, but cuticle chitin metabolism terms do not trace back to cell wall terms, so all the paths are true. In addition, gene products such as chitin synthase can be annotated to nodes of appropriate granularity in both yeast and flies, and queries will yield the expected results.
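
The effect of the restructuring can be seen by collecting every ancestor of a term in each version. The sketch below stores each ontology fragment as a Python dict from a term to its list of parents; this representation and the function name ancestors are assumptions made for illustration, and the old fragment is read as a single chitin metabolism term with two parents.

```python
def ancestors(parents, term):
    """Return every term reachable by following parent links upwards."""
    found = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


# The original structure: chitin metabolism sits under both parents.
old = {
    "chitin metabolism": ["cuticle synthesis", "cell wall biosynthesis"],
    "chitin biosynthesis": ["chitin metabolism"],
    "chitin catabolism": ["chitin metabolism"],
}

# The revised structure, as far as it is shown above.
revised = {
    "chitin biosynthesis": ["chitin metabolism"],
    "chitin catabolism": ["chitin metabolism"],
    "cuticle chitin metabolism": ["chitin metabolism"],
    "cuticle chitin biosynthesis": ["cuticle chitin metabolism"],
    "cuticle chitin catabolism": ["cuticle chitin metabolism"],
    "cell wall chitin metabolism": ["chitin metabolism"],
    "cell wall chitin biosynthesis": ["cell wall chitin metabolism"],
    "cell wall chitin catabolism": ["cell wall chitin metabolism"],
}
```

In the old structure, ancestors(old, "chitin biosynthesis") includes cell wall biosynthesis, which is false for a fly gene product. In the revised structure, the ancestors of cuticle chitin biosynthesis are only cuticle chitin metabolism and chitin metabolism.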


Logical relationships

These logical relationships must be true in the ontologies:

(1) If A is part of B and C is a subtype of B, is A part of C? -- YES
    Example:
      Protein-nucleus import, docking is part of protein-nucleus import.
      Ribosomal protein-nucleus import is a subtype of protein-nucleus import.
      Protein-nucleus import, docking is part of ribosomal protein-nucleus import.

(2) If A is a subtype of B and B is a subtype of C, is A a subtype of C? -- YES
    Example:
      Terminal N-glycosylation is a subtype of terminal glycosylation.
      Terminal glycosylation is a subtype of protein glycosylation.
      Terminal N-glycosylation is a subtype of protein glycosylation.

(3) If A is part of B and B is part of C, is A part of C? -- YES
    Example:
      Laminin-1 is part of basement lamina.
      Basement lamina is part of basement membrane.
      Laminin-1 is part of basement membrane.

(4) If A is a subtype of B and C is part of A, is C part of B? -- NOT NECESSARILY
    Example:
      Meiotic chromosome is a subtype of chromosome.
      Synaptonemal complex is part of meiotic chromosome.
      Synaptonemal complex is not necessarily part of chromosome.
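
Rules (1) to (3) can be applied repeatedly to derive the relationships they imply, while rule (4) is deliberately left out. The sketch below is one way to do this, assuming relationships are held as Python sets of (child, parent) pairs; the function name infer is hypothetical and the code simply encodes the rules as stated in this guide.

```python
def infer(is_a, part_of):
    """Apply rules (1)-(3) until no new relationships appear.

    is_a and part_of are sets of (child, parent) pairs; the inputs are
    not modified. Rule (4) is not applied, so nothing is derived from it.
    """
    is_a, part_of = set(is_a), set(part_of)
    changed = True
    while changed:
        # Rule (2): subtype of a subtype is a subtype.
        new_is_a = {(a, c) for (a, b) in is_a for (b2, c) in is_a if b == b2}
        # Rule (3): part of a part is a part.
        new_part_of = {(a, c) for (a, b) in part_of
                       for (b2, c) in part_of if b == b2}
        # Rule (1): A part of B and C a subtype of B gives A part of C.
        new_part_of |= {(a, c) for (a, b) in part_of
                        for (c, b2) in is_a if b == b2}
        changed = not (new_is_a <= is_a and new_part_of <= part_of)
        is_a |= new_is_a
        part_of |= new_part_of
    return is_a, part_of
```

Given the four examples above as input, the result gains the three derived relationships from examples (1) to (3), but synaptonemal complex is not made part of chromosome.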

Using sensu for species-specific terms

GO nodes should avoid species-specific definitions wherever possible. Nevertheless, many functions, processes and components are not common to all life forms. Our current convention is to include any term that can apply to more than one taxonomic class of organism.

Within the ontologies, there are cases where a word or phrase has different meanings when applied to different organisms. For example, embryonic development in insects is very different from embryonic development in mammals. Such terms are distinguished from one another by their definitions and by the sensu designation (sensu means 'in the sense of'), as in the term embryonic development (sensu Insecta). Using the sensu reference makes the node available to other species that use the same process/function/component. A node should be divided into sensu sub-trees where the children are, or are likely to be, different.

A GO node should never be more species-specific than any of its children. Child nodes can be at the same level of species specificity as the parent node(s), or more specific. When adding more species-specific nodes, curators should make sure that non-species-specific parents exist (or add them if necessary). For example, the process term antimicrobial humoral response (sensu Invertebrata) has the parent antimicrobial humoral response. This allows the sister term antimicrobial humoral response (sensu Vertebrata) to be added. Antimicrobial humoral response (sensu Invertebrata) cannot have a child named antibacterial response because the child term must be at least as species specific as the parent term, so this is modified to antibacterial response (sensu Invertebrata).
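
Part of this rule can be checked from term names alone: if a parent carries a sensu designation, each child must carry one too. The sketch below tests only that; the function names sensu_of and child_is_allowed are illustrative assumptions, and deciding whether a child's taxon actually falls within the parent's would need a taxonomy, which is not modelled here.

```python
import re

# Matches a trailing designation such as "(sensu Insecta)".
SENSU = re.compile(r"\(sensu ([^)]+)\)\s*$")


def sensu_of(name):
    """Return the taxon named in a trailing '(sensu ...)' designation, or None."""
    match = SENSU.search(name)
    return match.group(1) if match else None


def child_is_allowed(parent, child):
    """A parent carrying a sensu designation needs children that carry one too."""
    return sensu_of(parent) is None or sensu_of(child) is not None
```

With the example above, antibacterial response is rejected as a child of antimicrobial humoral response (sensu Invertebrata), whereas antibacterial response (sensu Invertebrata) is accepted.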

Comments to indicate implicit species-specificity

Where a term is only intended to be used for annotation of genes from particular species or groups of species, a comment can be added as an aid to annotators. For example, the term 'invasive growth ; GO:0007125' describes a process observed in fungi, and would not be suitable for annotation of gene products involved in the pathological mammalian process of invasive growth. A comment is added as an aid to annotators:

'Note that this term is intended for, but not restricted to, annotation of fungal gene products.'

This comment is only intended as a time-saving feature for annotators, clarifying a point that should already be implicit in the term name and definition.


Dependent ontology terms

Some GO terms imply the presence of others in the ontology. Examples from the process ontology include the following:

  • If either X biosynthesis or X catabolism exists, then the parent X metabolism must also exist.
  • If X regulation exists, then the process X must also exist. Potentially any process in the ontology can be regulated.
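
Dependencies of this kind can be found by scanning term names. The sketch below assumes that terms are named 'X biosynthesis', 'X catabolism' and 'regulation of X'; the last pattern follows the regulation term quoted earlier in this guide rather than the 'X regulation' shorthand used in the list above. The function name missing_implied_terms is hypothetical.

```python
def missing_implied_terms(names):
    """Return the implied terms that are absent from a collection of term names."""
    names = set(names)
    missing = set()
    for name in names:
        # X biosynthesis or X catabolism implies X metabolism.
        for suffix in (" biosynthesis", " catabolism"):
            if name.endswith(suffix):
                implied = name[: -len(suffix)] + " metabolism"
                if implied not in names:
                    missing.add(implied)
        # regulation of X implies X.
        if name.startswith("regulation of "):
            implied = name[len("regulation of "):]
            if implied not in names:
                missing.add(implied)
    return missing
```

Terms that regulate a trait rather than a process, such as regulation of neuronal synaptic plasticity, would be reported by this check and need to be reviewed by hand.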

 

Ontology-specific guidelines

Biological process guidelines

A biological process is a biological goal that requires more than one function. Mutant phenotypes often reflect disruptions in biological processes.

Different pathways/processes leading to the same product

Where there are several biosynthetic pathways leading to the same product, we list each of them as a subtype of a general pathway. For example, we have:

        %phosphatidylethanolamine biosynthesis ; GO:0006646
         %phosphatidyl-N-monomethylethanolamine (PMME) biosynthesis ; GO:0006647
         %dihydrosphingosine-1-P pathway ; GO:0006648

It is straightforward to name well-known pathways (e.g. glycolysis and the pentose-phosphate pathway are two ways to accomplish glucose catabolism), but harder for nameless minor pathways, such as the dihydrosphingosine-1-P pathway above. So that we do not mistakenly give two names to the same minor pathway, use the name of the first intermediate, as a synonym if not as the primary GO name.


Molecular function guidelines

A molecular function is an activity or task performed by a gene product. It often corresponds to something (such as a catalytic activity) that can be measured in vitro.

Avoid cellular components in the molecular function ontology

Cellular structures are not functions. Many cellular component references have been made obsolete in the function ontology. For example, mitochondrial primase need only be primase because annotators can assign location to gene products by annotating with appropriate terms from the cellular component ontology. By contrast, there are many cases where component terms are appropriate in the process ontology. For example, Golgi organization and biogenesis is different from lysosome organization and biogenesis, so the anatomical qualifiers 'Golgi' and 'lysosome' are necessary.

Avoid gene products in the molecular function ontology

Gene products in themselves are not nodes of the function ontology, although doing something with or to a specific gene product can be one. For example, being hedgehog or a hedgehog receptor are not functions, but hedgehog receptor binding and hedgehog binding are functions. Most GO molecular function terms include the word 'activity' to help differentiate them from the physical gene product. When defining molecular function terms, be careful not to describe them as gene products. For example, the molecular function term kinase activity is defined as 'Catalysis of the transfer of a phosphate group, usually from ATP, to a substrate molecule', not 'an enzyme that catalyzes the transfer of a phosphate group, usually from ATP, to a substrate molecule'.

Function terms for subunits

Regulatory and catalytic subunits of kinases, heterotrimeric G proteins, etc., are handled in the function ontology as subclasses of the relevant catalytic activity. For example:

          %myosin phosphatase activity
           %myosin phosphatase, intrinsic catalyst activity
           %myosin phosphatase, intrinsic regulator activity

When annotating, if you know which subunit is which, use the specific nodes; otherwise use the parent.

For more information, see the minutes of the June 2003 GO consortium meeting and Sourceforge item 763301.


Cellular component guidelines

Generally, a gene product is located in or is a subcomponent of a particular cellular component. The cellular component ontology includes multisubunit enzymes and other protein complexes, but not individual proteins or nucleic acids. Cellular component also does not include multicellular anatomical terms.

Use of ... complex

To distinguish cellular components from functions, use 'complex' in the term name of a component, and append the word 'activity' to enzyme names in function terms. For example, the molecular function term pyruvate dehydrogenase activity (GO:0004738) describes the enzyme activity, whereas the cellular component term pyruvate dehydrogenase complex (GO:0045254) describes the multi-subunit structure in which the enzyme activity resides.


Bi-directional reactions

For bi-directional reactions, we create a single term that describes both directions of the reaction, unless there is a biological justification for treating the two directions as separate concepts.

A single term covering both directions of a reaction:
  • NADP phosphatase activity
    Definition: Catalysis of the reaction: H2O + NADP = NAD + phosphate.
Two separate terms covering the opposing directions of a single reaction:
  • ATPase activity ; GO:0016887
    Definition: Catalysis of the reaction: ATP + H2O = ADP + phosphate. May or may not be coupled to some other reaction.
  • ATP synthesis coupled electron transport ; GO:0042773
    Definition: The transfer of electrons through a series of electron donors and acceptors, generating energy that is ultimately used for synthesis of ATP.

Should gene product names be included in synonyms or term names?

As noted above, GO terms do not represent gene products; GO term strings should avoid using gene product names if possible. In some cases, however, there are practical reasons for including gene product names, because biologists will search for them.

Gene product names can be used as synonyms for terms that do not name gene products in the primary text strings. Such synonyms are narrower than the terms. For some biological concepts, it would be awkward to use a wording that avoids mentioning a gene product name. In these cases, we use the word 'class' along with the gene product name, to indicate that the term is not restricted to the gene product named or to the species in which the gene product is found. An example is the class of cell cycle regulators known as p53:

Example containing p53 class: DNA damage response, signal transduction by p53 class mediator ; GO:0030330

definition: A cascade of processes induced by the cell cycle regulator phosphoprotein p53, or an equivalent protein, in response to the perception of DNA damage.

Copyright © 1999-2003 Gene Ontology Consortium. Permission to use the information contained in this database was given by the researchers/institutes who contributed or published the information. Users of the database are solely responsible for compliance with any copyright restrictions, including those applying to the author abstracts. Documents from this server are provided "AS-IS" without any warranty, expressed or implied.

Last modified October 2003.