Decision making in the curation of GO

Adding Gene Products to GO by use of Slots

Adapted from text by Chris Mungall at cjm@fruitfly.org.

GO should not allow gene product names inside GO terms. It is still useful, however, to be able to annotate to this level of granularity; for instance, to be able to state that a gene product IL18_HUMAN is involved in 'interleukin-13 biosynthesis'. Cross products between orthogonal concepts are currently hard to maintain using available tools.

Some GO terms could have 'slots', which would be filled in the gene_associations file. For instance, 'biosynthesis' would have a 'slot' named 'synthesizes'. In this case the GO term 'interleukin-13 biosynthesis' would not exist. Instead, the annotation for IL18_HUMAN would include an entry for the GO term 'cytokine biosynthesis ; GO:0042089' or just plain 'biosynthesis ; GO:0009058'; this entry/line would also have a column for 'slot', which would read "synthesizes(interleukin-13)". Interleukin-13 could be replaced with an identifier from a product/family/physical-entity ontology.

Examples of use

1) Freddy and the Walrus

Freddy is annotating the walrus genome. Freddy has a bunch of sequence similarities to actin binding genes. He eagerly gets ready to annotate these but then discovers, to his consternation, that there is no actin binding term in GO. He subsequently discovers that he must annotate it as 'binding' and that he must fill the 'binds' slot with 'actin'. He does this either by asking the email list, or perhaps by using the automated feature in AmiGO-2 that writes out correctly formatted gene_association lines for actin binding genes (which means he wouldn't even care about slots; it would appear to all intents and purposes as if actin binding was a term).

A year goes by. Freddy is mauled by a rabid walrus whilst conducting fieldwork. Sadly, Freddy dies. The walrus genome sits lonely and unloved, no one willing to take on the task of maintaining the annotations. Meanwhile, those cheeky GO curators sneak 'actin binding' back into GO. Maybe they decided it was too radical to remove it; maybe they just hadn't got round to making an actin binding term at the time Freddy annotated the walrus (substitute actin binding for a more plausible and obscure cross product term here). Have Freddy's actin binding annotations become outdated, never to be updated? No! It turns out that an annotation that says

Product GO-term Slots
mygene binding binds('actin')

is logically equivalent to an annotation to the GO term 'actin binding'.
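The equivalence in the Freddy example can be sketched in a few lines of Python. This is only an illustration: the slot syntax is inferred from the example line above, and the names parse_slots, CROSS_PRODUCTS and equivalent_term are hypothetical, not part of any GO tool.

```python
import re

# Assumed slot syntax, inferred from the example line: name('value') or name(value).
SLOT_PATTERN = re.compile(r"(\w+)\(\s*'?([^')]*?)'?\s*\)")

def parse_slots(slot_column):
    """Return a dict mapping slot names to slot values."""
    return {name: value for name, value in SLOT_PATTERN.findall(slot_column)}

# Hypothetical table of cross-product terms: (parent term, filled slots) -> term name.
CROSS_PRODUCTS = {
    ("binding", frozenset({("binds", "actin")})): "actin binding",
}

def equivalent_term(go_term, slot_column):
    """Return the named GO term equivalent to a slotted annotation, or None."""
    slots = frozenset(parse_slots(slot_column).items())
    return CROSS_PRODUCTS.get((go_term, slots))

print(equivalent_term("binding", "binds('actin')"))
```

If 'actin binding' is later added to the table, Freddy's old annotation resolves to it without anyone having to edit the annotation itself.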

2) Blossom's Data Extravaganza

Blossom runs a series of crazy microarray experiments and runs an advanced clustering algorithm that also does some automated image analysis on a large set of in-situs, inferring gene network activations for various Drosophila body parts at different developmental stages. Throw in some kind of screen to observe the results of various mutations. (This is a far-fetched experiment, but I'm just using it to illustrate boundary cases.) The result is a list of (possibly shoddy) IEAs that show positive and negative regulation of every body part imaginable. However, on attempting to annotate with GO, Blossom becomes vexed. She sees 'positive regulation of cell growth' but no 'negative regulation of cell growth'. There are terms for regulation of growth of synapses and eyes and vulvas, but nothing for development or regulation of wings, cuticles, antenna complexes (this is the case as of writing; obviously this is something the long-running cross product solution hopes to solve).

Blossom could talk to the GO curators and have all the new terms she requires added, but it seems easier to allow the flexibility, especially for a set of IEA assignments like this.

Maybe a better way to do it would be to have Blossom annotate using the slot system and make a whole bunch of assignments:

CG1 regulation IEA type(+),part('eye'),process(growth)
CG2 regulation IEA type(-),part('leg'),process(growth)
CG2 regulation IEA type(-),part('B800-850 antenna complex'),process(development)

and so on.

(Aside: even though this is a microarray analysis, the associations are based on a possibly dodgy, purely computational analysis.)

Perhaps someone later comes along and checks Blossom's work and turns some of the IEAs into IEPs. This can be seen as a validation of both the association and the implicit term; we could then automatically instantiate the terms. So for example,

CG2 regulation IEA -,B800-850 antenna complex , development

would become "negative regulation of B800-850 antenna complex development" (this would require some kind of grammar synthesis, or perhaps searching MEDLINE for meaningful permutations of the phrase units). Occasionally there will be synonyms more recognisable to a biologist, and of course these can be added manually.

Provided the following four edges are present in the GO graph:

'negative regulation of B800-850 antenna complex development'
      IS-A regulation
      HAS-REGULATION-TYPE negative
      REGULATES-PROCESS development
      HAS-BODY-PART  B800-850 antenna complex

[I have written the above as child/relationship/parent to make it into English sentences; you would invert this to make it look like a normal GO tree.]

Blossom's annotation becomes logically equivalent to the new term, as it satisfies the logical conditions.
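As a rough sketch of how this check and the naive name synthesis might look in Python: the mapping from slot names (type, part, process) to relationship types is an assumption made for illustration, as are all of the function and variable names.

```python
# Hypothetical store of cross-product terms, each defined by its four edges.
TERM_DEFINITIONS = {
    "negative regulation of B800-850 antenna complex development": {
        "IS-A": "regulation",
        "HAS-REGULATION-TYPE": "negative",
        "REGULATES-PROCESS": "development",
        "HAS-BODY-PART": "B800-850 antenna complex",
    },
}

# Assumed correspondence between annotation slots and relationship types.
SLOT_TO_RELATION = {
    "type": "HAS-REGULATION-TYPE",
    "process": "REGULATES-PROCESS",
    "part": "HAS-BODY-PART",
}
TYPE_VALUES = {"+": "positive", "-": "negative"}

def annotation_edges(go_term, slots):
    """Express a slotted annotation as the set of edges it implies."""
    edges = {"IS-A": go_term}
    for name, value in slots.items():
        if name == "type":
            value = TYPE_VALUES[value]
        edges[SLOT_TO_RELATION[name]] = value
    return edges

def matching_terms(go_term, slots):
    """Return the names of terms whose edges equal those of the annotation."""
    edges = annotation_edges(go_term, slots)
    return [name for name, definition in TERM_DEFINITIONS.items()
            if definition == edges]

def synthesize_name(slots):
    """Naive name synthesis; a real system would need a proper grammar."""
    return f"{TYPE_VALUES[slots['type']]} regulation of {slots['part']} {slots['process']}"

slots = {"type": "-", "part": "B800-850 antenna complex", "process": "development"}
print(matching_terms("regulation", slots))
print(synthesize_name(slots))
```

An annotation whose slots differ in any one edge (for example type(+)) matches nothing, so it remains a plain slotted annotation until a corresponding term is instantiated.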

Entry of EC numbers in the Gene ontology

Enzymes in the molecular function ontology carry a cross-reference to an EC number from the enzyme nomenclature database http://www.chem.qmw.ac.uk/iubmb/enzyme/. This allows automated cross-referencing by other databases and in the past was used for automatic transfer of definitions from the enzyme nomenclature database to the gene ontology. The EC numbers are not entered into the cellular component ontology.

Incomplete EC numbers:

The EC numbers all follow the pattern EC n.n.n.n, for example EC
which corresponds to acetoin dehydrogenase. Each of the individual digits indicates a more precise level of definition of enzyme function.
EC 1 Oxidoreductases
EC 1.1 Acting on the CH-OH group of donors
EC 1.1.1 With NAD or NADP as acceptor
EC Alcohol dehydrogenase

In some cases the function of an enzyme has not been completely characterized and so the EC number will also be incomplete and would take the format EC n.n.-.-. Incomplete EC entries have all been removed from the molecular function ontology, but they have been retained in the EC to GO flatfile. In the future we will work through these numbers to distinguish between the EC numbers corresponding to uncharacterized enzymes and those that simply correspond to enzymes whose functions require less than four digits for a full definition.
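A small Python sketch of how complete and incomplete EC numbers might be told apart is given here. It assumes the 'EC n.n.n.n' and 'EC n.n.-.-' spellings described above; the function names are hypothetical and are not taken from any GO or IUBMB software.

```python
import re

# Accepts "EC n.n.n.n" where trailing levels may be "-" (not yet assigned).
EC_PATTERN = re.compile(r"^EC\s*(\d+)\.(\d+|-)\.(\d+|-)\.(\d+|-)$")

def parse_ec(ec_string):
    """Return the defined levels of an EC number as a tuple of ints."""
    match = EC_PATTERN.match(ec_string.strip())
    if match is None:
        raise ValueError(f"not an EC number: {ec_string!r}")
    parts = match.groups()
    defined = [p for p in parts if p != "-"]
    # Dashes may only trail, e.g. "EC 1.1.-.-" but not "EC 1.-.1.1".
    if parts[:len(defined)] != tuple(defined):
        raise ValueError(f"dash before a defined level: {ec_string!r}")
    return tuple(int(p) for p in defined)

def has_four_levels(ec_string):
    """True when all four levels are given."""
    return len(parse_ec(ec_string)) == 4

def ancestors(ec_string):
    """List the broader EC classes above the given number."""
    levels = parse_ec(ec_string)
    return ["EC " + ".".join(str(n) for n in levels[:i])
            for i in range(1, len(levels))]

print(parse_ec("EC 1.1.-.-"))
print(ancestors("EC 1.1.1.1"))
```

Note that has_four_levels only reports how many levels are present; as the text says, a shorter number may still be a full definition, so deciding that needs curation rather than parsing.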

Enzyme pH ontology

Some similar enzymes are found to act at different pHs. In this case a separate entry will be made only if the two enzymes carry different EC numbers.

Coupled and uncoupled enzymatic reactions: ATPase

EC 3.6.3.n and EC 3.6.4.n all correspond to ATPases. EC is the number for the enzyme that can hydrolyse ATP without being coupled to any other reaction, whilst all other ATPases have another reaction coupled to their activity, as in the case of EC, which is the Ca2+-transporting ATPase. For the purposes of GO, the enzymes are categorised according to whether they perform coupled or uncoupled reactions.

What is the distinction between the membrane fraction term and a subset of the membrane term?

These two concepts are very different because one is derived from a type of experimental data (cellular fractionation by centrifugation) and the other corresponds to a cellular component (the membrane):

1) Membrane term GO:0016020: Double layer of lipid molecules that encloses all cells, and, in eukaryotes, many organelles; may be a single or double lipid bilayer, also includes associated proteins.

2) Membrane fraction term GO:0005624: That fraction of cells, prepared by disruptive biochemical methods, that includes the plasma and other membranes.

The experiments that give rise to membrane fraction data involve the complete disruption of cellular structure followed by the mixing of cellular components and their division into soluble and insoluble fractions by centrifugation. There are various other fraction term names to encompass data from similar kinds of experiment which distinguish between substances with different solubilities. It is clear that the xxx fraction term is not synonymous with a part or child of xxx.

Do we derive annotations to the xxx term from annotations to the xxx fraction term?

The experiments that give rise to xxx fraction data involve the complete disruption of cellular structure followed by the mixing of cellular components and their division into soluble and insoluble fractions by centrifugation. Therefore, whilst we can derive information about the solubility of components from the result, we cannot draw any conclusions about their original cellular location. For this reason, the xxx fraction term is retained only as a repository for experimental data, and no interpretation of the data is carried over to the children of the xxx term.


Where more than one expression can be used to refer to the same concept, a single term will be created and the other names will be entered as synonyms. This may also include expressions that are in common usage, and that are likely to be used by scientists searching for the particular term that pertains to their work. It may also include concepts that are a more general or more specific version of an existing term. In this case it is likely that the synonym will eventually be added as a term in its own right, and that the addition of the synonym is simply a temporary measure. In order to distinguish these different types of synonyms, a set of symbols was created to show the different relationships of synonyms to their respective terms:

Synonym types
=	the term is an exact synonym
~	the terms are related (e.g. XXX complex and XXX)
<	the synonym is broader than the term name
>	the synonym is more precise than the term name
!=	the term is not the same as the synonym (no precise relationship assigned)
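For anyone handling these symbols in a script, the table might be modelled as follows. This is a sketch only: the 'symbol, whitespace, synonym text' entry layout and the names SynonymType and parse_synonym are assumptions, not a documented file format.

```python
from enum import Enum

class SynonymType(Enum):
    """The five synonym relationship symbols listed in the table above."""
    EXACT = "="
    RELATED = "~"
    BROADER = "<"
    NARROWER = ">"
    NOT_SAME = "!="

def parse_synonym(entry):
    """Split an assumed 'SYMBOL text' entry into (SynonymType, text)."""
    symbol, text = entry.split(None, 1)
    # SynonymType(symbol) raises ValueError for an unknown symbol.
    return SynonymType(symbol), text.strip()

print(parse_synonym("~ XXX complex"))
```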

For more information about synonyms and their documentation and use please read the Guide to Synonyms.

The Distinction between Function and Process

A gene product can have one or more functions and can be used in one or more processes; a gene product may be a component of one or more supra-molecular complexes. Function is what something does. It describes only what it can do without specifying where or when this usage actually occurs. Process is a biological objective. A process is accomplished via one or more ordered assemblies of functions.

How granular should we get in the ontologies: what belongs?

In general, the answer is that the GO should be as granular as possible, within the bounds already defined. Community feedback into the ontologies is especially important as a means to improve granularity. The general sentiment is that annotation should be done to the highest level of granularity available in order to provide an ontology that is useful for querying domain-specific databases. Note that the stated utility here is in the GO's use in interrogating domain-specific databases, not across domains. One example of the currently appropriate level of granularity was found in a discussion of the sensu Magnoliophyta terms:

Sensu Magnoliophyta

A problem was discovered with the sensu Magnoliophyta terms. Many of these terms seem misleading because they actually refer to phenomena that also occur more broadly outside Magnoliophyta. However, it was pointed out that 'sensu Magnoliophyta' just means 'in the sense of Magnoliophyta' and so does not exclude annotation of non-flowering plant gene products to such a term.

To get rid of the Magnoliophyta term and add a sensu term covering all groups that could actually be annotated to a term would be quite time consuming, because we would have to check, in the case of each term, whether it applied to all plants, whether all green algae were included, etc.

At the moment there are no non-flowering plant species being annotated and so there is not an urgent need for terms to be created for the annotation of non-flowering plants.

With these points in mind it was decided that we should concentrate on making the flowering plant terms exhaustive and stick to sensu Magnoliophyta and then we can do the rest once we have non-flowering plants being annotated.

This illustrates some of the forces governing the level of granularity of the GO. For plant annotation, it is currently appropriate to make terms to this level of granularity, but this will change in the future when other plant species are being annotated to the GO. Therefore, the granularity of the GO may be assumed to be constantly changing in response to the various forces at work.

Sensu terms

Very often, similar processes operate in markedly different ways between organisms. The sensu terms were created to address this problem.

An example of the use of 'sensu':

  • embryonic development ; GO:0009790 The development of an organism from zygote formation until the end of its embryonic life stage. The end of the embryonic stage is organism-specific and may be somewhat arbitrary. For example, it would be at birth for mammals, larval hatching for insects and seed dormancy in plants.

  • embryonic development (sensu Magnoliophyta) ; GO:0009793 The embryonic development of flowering plants that ends with seed dormancy.

  • embryonic development (sensu Animalia) ; GO:0009792 The embryonic development of an animal from zygote formation until the end of its embryonic life stage. The end of the embryonic life stage is organism specific and may be somewhat arbitrary; for mammals it is usually considered to be birth; for insects the hatching of the first instar larva from the eggshell.

If more than one taxonomic group is included then the names are separated by commas, e.g.: female meiotic spindle assembly (sensu Drosophila, sensu Mus) ; GO:0007056

One risk is that the many differences between processes in different groups might lead to the excessive proliferation of sensu terms. It was decided that sensu terms should only be created in the case of homonym terms: those which have the same text strings but unique meanings for each organism. To help enforce this, each time a 'sensu' term is made, an explicit reference is included, so that there need be nothing implicit in the definition.

In addition, the creation of sensu terms was restricted to those with class-level distinctions. For instance, mouse will represent the class Mammalia; Drosophila will represent the class Insecta.
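The naming convention for sensu terms described above could be unpicked programmatically along these lines. This is a hedged sketch; parse_sensu and its regular expression are illustrative assumptions based only on the example term names shown in this section.

```python
import re

# Matches "base name (sensu Taxon)" or "base name (sensu A, sensu B)".
SENSU_PATTERN = re.compile(r"^(?P<base>.+?)\s*\((?P<taxa>sensu [^)]+)\)$")

def parse_sensu(term_name):
    """Return (base term name, list of taxa); the list is empty if no sensu."""
    name = term_name.strip()
    match = SENSU_PATTERN.match(name)
    if match is None:
        return name, []
    taxa = []
    for chunk in match.group("taxa").split(","):
        chunk = chunk.strip()
        if chunk.startswith("sensu "):
            chunk = chunk[len("sensu "):]
        taxa.append(chunk)
    return match.group("base"), taxa

print(parse_sensu("female meiotic spindle assembly (sensu Drosophila, sensu Mus)"))
```

Grouping term names by their base string in this way is one possible means of spotting the homonyms that the sensu rule is meant to cover.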

In spite of the creation of these rules to minimise the number of sensu terms, the word 'sensu' has found its place as one of the most frequently used words in the ontologies, alongside 'of' and 'and'.


Should GO include behaviour terms, or are there too few that are proven to be directly affected by gene activity? Peter Midford in Arizona is already working on behaviour ontologies for loggerhead turtles and jumping spiders, and we feel this level of detail is beyond the scope of GO. However, there still needs to be some descriptive capability for behaviour within GO, both for Drosophila and maybe for mouse, to be able to annotate certain genes, since both species have genes known to directly affect behaviour. The essential questions relate to what should be included in Process. It is clear in Drosophila that one can pin certain genes to behaviours like walking or circadian rhythms because these are hard-wired. Conversely, there is a need for an auxiliary ontology developed specifically to deal with behaviours in mouse, since much knowledge in this area is not tied directly to specific gene activity.

Conclusion - we do want behaviour in GO, but there may be other ontologies, for groups like mouse, that will extend these. In these cases we'll recommend that these auxiliary ontologies be consistent with GO and include any necessary cross-references to GO terms. To support this the GO terms should be at a level that can be used for many organisms for behaviours that have a genetically defined component.

Last modified October-2003