The Beginner's Guide to Modifying the Ontologies






 


This guide is intended for new GO curators. It assumes that you have a basic understanding of GO's structure and scope (see the GO Documentation) but no technical knowledge whatsoever.

The CVS repository

Each of the three ontologies is stored as a separate flatfile in the GO CVS repository at Stanford, along with a further file containing the definitions of the terms from all three ontologies. CVS (Concurrent Versions System) is a tool that allows multiple users to work on the same files at the same time and keeps track of every change. You can learn more about CVS at this CVS web site.

DAG-Edit

Editing the flatfiles in a text editor such as emacs is rather cumbersome, so most of our editing is done using DAG-Edit, a wonderful piece of software for editing directed acyclic graphs (DAGs). For more information on how to use DAG-Edit, see the Gene Ontology DAG-Edit User Guide or the CVS repository DAG-Edit user guide.

GO users can request changes to the ontologies in two ways: through a tracking system called SourceForge, or by e-mail to the GO or GO-FRIENDS mailing lists (see below). We encourage people to register new requests at SourceForge, but bear in mind that not all GO users have SourceForge accounts. If, when dealing with a SourceForge request, you decide that you need to change a significant chunk of an ontology, it's worth mailing GO to let everyone know what you're planning to do.

Setting up

Before you can assign SourceForge requests to yourself and edit the ontologies, you'll need to set up the following:

Accounts

  • UNIX account at your work site
  • GO CVS account - e-mail gocvs-update@genome.stanford.edu
  • GO database account (optional) - An account is not necessary at the moment, because the database cannot yet be edited directly: we modify the flatfiles using DAG-Edit and our modifications are then passed on to the main database. At some point in the future this account will allow users to modify the database directly using DAG-Edit.
  • SourceForge account via the SourceForge web site
Jane and Midori are SourceForge administrators so they can grant the privileges (tech and admin) that you need.

Mailing lists

  1. GO: This is a private mailing list for members of the GO consortium. There's also a publicly available mailing list called GO-FRIENDS, but all GO-FRIENDS messages are automatically sent to GO, so there's no point in signing up to both. For the same reason, if you send a message to GO-FRIENDS, don't send it to GO as well, because all members of the GO mailing list would get your message twice. To subscribe to the GO mailing list, send the words 'subscribe go' in the body of an e-mail to the GO mailing list administrator or to GO-request.

  2. GO-DIFF: This sends a daily update of everything that has changed in the flatfiles. If the update is huge, it may be a warning that someone has inadvertently deleted a chunk of one of the files or something similar, so it's a good way of alerting yourself to the possibility that you've done something that needs fixing. Alternatively, it may indicate that many terms have been added or other significant changes have been made. To subscribe to GO-DIFF, send the word 'subscribe' in the body of an e-mail to the GO-DIFF list administrator.

  3. GO-DATABASE: This is a mailing list for users and developers of the GO database. To join this mailing list, please mail the GO-DATABASE mailing list administrator.

Downloading software

Download and install the latest version of DAG-Edit. For instructions, see the DAG-Edit user guide. If you'd like to use the splitpane GUI, you need to download the plugins as well. Then, in DAG-Edit, go to Plugins, choose DAG-Edit Configuration Manager from the pull-down menu, go to the Display Layout tab and click the 'Import XML file' button. This will allow you to browse for your copy of the plugin and import it.

Accessing the CVS repository

We run CVS in UNIX. If you're new to UNIX, see the notes at the very end of this document.

Open a terminal window (or, if using Windows, access your UNIX account). When you log in to UNIX, you are placed in a program called the shell. Shells come in different flavors, but we use either the C shell (csh) or a variant of it called tcsh. The shell interprets the commands you type and passes them to the kernel, which carries them out. When the shell starts, it reads your .tcshrc file (or your .cshrc file if you use csh). By editing this file you can control your working environment: you can set environment variables (e.g. to tell it which text editor or printer to use) and aliases (short text strings that stand for commands that would normally involve typing a long text string). To edit the file, open it in a text editor such as emacs. Type:

  emacs .tcshrc 

First, set an environment variable to change your default text editor to emacs:

  setenv EDITOR emacs 

If you fail to do this, you might end up having to use a different text editor called vi when you enter a CVS log message (see later).

There are two ways of setting things up so that you don't have to type the long path to the CVS repository every time you need to access it. EITHER set an environment variable, after which the plain cvs command will find the repository:

   setenv CVSROOT \:pserver:[username]@gocvs.geneontology.org:/share/go/cvs 

OR use a UNIX alias. Type:

 alias CVS 'cvs -d:pserver:[username]@gocvs.geneontology.org:/share/go/cvs'

If you use this method, you can change the alias to anything you like. It will also save you a lot of typing in the future if you add some more aliases to your .tcshrc file for when you need to access cvs. If you're quite new to this kind of thing and you're planning to follow this series of guides right through then you can add the aliases below, and then the commands you need to type will correspond exactly to those in the instructions which follow.

alias en 'emacs go/numbers/go_numbers'
alias e 'emacs'

alias logp 'cvs log go/ontology/process.ontology | more'
alias logc 'cvs log go/ontology/component.ontology | more'
alias logf 'cvs log go/ontology/function.ontology | more'
alias logd 'cvs log go/doc/GO.defs | more'
alias logs 'cvs log go/doc/synonyms/Synonyms.txt | more'
alias logn 'cvs log go/numbers/go_numbers'
alias log 'cvs log'

alias upc 'cvs -d :pserver:jenclark@gocvs.geneontology.org:/share/go/cvs update -A go/ontology/component.ontology'
alias upf 'cvs -d :pserver:jenclark@gocvs.geneontology.org:/share/go/cvs update -A go/ontology/function.ontology'
alias upp 'cvs -d :pserver:jenclark@gocvs.geneontology.org:/share/go/cvs update -A go/ontology/process.ontology'
alias upd 'cvs -d :pserver:jenclark@gocvs.geneontology.org:/share/go/cvs update -A go/doc/GO.defs'

alias cpc 'cp go/ontology/component.ontology go/ontology/old/component.ontology'
alias cpf 'cp go/ontology/function.ontology go/ontology/old/function.ontology'
alias cpp 'cp go/ontology/process.ontology go/ontology/old/process.ontology'
alias cpd 'cp go/doc/GO.defs go/doc/old/GO.defs'
alias cpgo 'cpc; cpf; cpp; cpd'

alias dcvsc 'cvs -d :pserver:jenclark@gocvs.geneontology.org:/share/go/cvs diff go/ontology/component.ontology | more'
alias dcvsf 'cvs -d :pserver:jenclark@gocvs.geneontology.org:/share/go/cvs diff go/ontology/function.ontology | more'
alias dcvsp 'cvs -d :pserver:jenclark@gocvs.geneontology.org:/share/go/cvs diff go/ontology/process.ontology | more'
alias dcvsd 'cvs -d :pserver:jenclark@gocvs.geneontology.org:/share/go/cvs diff go/doc/GO.defs | more'

alias dc 'diff go/ontology/component.ontology go/ontology/old/component.ontology | more'
alias df 'diff go/ontology/function.ontology go/ontology/old/function.ontology | more'
alias dp 'diff go/ontology/process.ontology go/ontology/old/process.ontology | more'
alias dd 'diff go/doc/GO.defs go/doc/old/GO.defs | more'

alias cvsp 'cvs ci go/ontology/process.ontology'
alias cvsc 'cvs ci go/ontology/component.ontology'
alias cvsf 'cvs ci go/ontology/function.ontology'
alias cvsd 'cvs ci go/doc/GO.defs'
alias cvsn 'cvs ci go/numbers/go_numbers'

Each alias must be on a line of its own. The update and diff aliases above contain the username jenclark; replace it with your own CVS username. The function of these aliases will be explained later on in the Guide to Addressing a SourceForge Request.
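
As a rough preview of how the copy and diff aliases fit together, the sketch below applies the same cp and diff commands to a throwaway file. The scratch directory, the file contents and the name changes.diff are made up for illustration; only cp and diff themselves come from the aliases above. The commands are plain enough that they should work at either a tcsh or an sh prompt.

```shell
# Build a throwaway stand-in for go/ontology (hypothetical contents).
mkdir -p scratch/ontology/old
printf 'term A\nterm B\n' > scratch/ontology/component.ontology

# What cpc does: keep a copy of the file as it was before editing.
cp scratch/ontology/component.ontology scratch/ontology/old/component.ontology

# Simulate an edit by appending one line.
printf 'term C\n' >> scratch/ontology/component.ontology

# What dc does: compare the edited file with the saved copy.
# diff exits with status 1 when the files differ, hence the "|| true".
diff scratch/ontology/component.ontology scratch/ontology/old/component.ontology > scratch/changes.diff || true
cat scratch/changes.diff
```

Lines beginning with < are present only in the first file given to diff, which here is the edited one, so the appended line shows up as '< term C'.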

Editing the GO files using DAG-Edit

Before you start, you need to download the current version of the GO flatfiles. To fit in with the aliases and directory structures described in this guide you should make sure you are in the 'Documents' directory. Type:

  cvs login 
It'll ask you for your password (which you should have received from gocvs-update@genome.stanford.edu).

'Check out' GO by typing:

  cvs co go 

This downloads the current GO files into a new directory which is called the working directory. You will also need to create a couple of new directories. Find your way to the 'go' directory. Within 'go' you will find the directory called 'ontology', and within this you will need to make a new directory called 'old'. There is another directory called 'go/doc' and within this you should make another directory called 'old'. (These directories can actually be anywhere but the aliases above will only work if you use the directory structure and naming convention described here.)
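
The two extra directories can be made in one step from the directory that contains 'go'. This is a minimal sketch: the -p option tells mkdir to create any missing parent directories and not to complain if a directory already exists.

```shell
# Run from the directory that contains the checked-out 'go' directory.
mkdir -p go/ontology/old go/doc/old

# Confirm that both directories now exist.
ls -d go/ontology/old go/doc/old
```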

Assigning your range of GO numbers

The CVS repository contains a list of who 'owns' each range of GO IDs. To assign yourself some numbers on this list, you need to edit your local copy and then commit it back to the CVS repository. Type:


  emacs go/numbers/go_numbers

Scroll down to

  Allocated number ranges for additions:

and add your chosen range of numbers in the format:

  CB:   GO:0048001 to GO:0050000

For example, if your range of numbers is just above the wormbase set, you could type

   ctrl-s wormbase 

to locate the bottom of the preceding range. Then type

  Yourname's GO numbers 

Then type the range of numbers. Underneath that, start your list of GO numbers, using the following format:

   GO:0048001      CB      erythrose-4-phosphate dehydrogenase 

Once you've finished editing the file, save and close it, then commit it back to the cvs repository:

  cvs ci go/numbers/go_numbers 
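
Before running the commit above, it may be worth checking that your chosen range does not overlap one that is already allocated. The following is only a sketch: it assumes that every allocation line looks like the 'CB:' example above, and the sample file, the 'WB:' line and the proposed range 0049500 to 0051000 are all invented for illustration. To check the real file, point awk at your local go/numbers/go_numbers instead of the sample.

```shell
# Hypothetical excerpt of go/numbers/go_numbers; the real file will differ.
cat > sample_go_numbers << 'EOF'
Allocated number ranges for additions:
WB:   GO:0040001 to GO:0042000
CB:   GO:0048001 to GO:0050000
EOF

# Proposed range GO:0049500 to GO:0051000 (hypothetical), given without leading zeros.
# For each line of the form "XX: GO:n to GO:m", take the numeric parts of both IDs
# and print the line if the proposed range overlaps it.
awk -v s=49500 -v e=51000 '$3 == "to" { lo = substr($2, 4) + 0; hi = substr($4, 4) + 0; if (s <= hi && e >= lo) print }' sample_go_numbers > overlaps.txt
cat overlaps.txt
```

Here the CB line is written to overlaps.txt because 0049500 falls inside that range. An empty overlaps.txt means that no overlap was found among the lines in the assumed format.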


Setting your range of numbers within DAG-Edit

Once you have claimed a set of numbers in the GO numbers file (go/numbers/go_numbers), you must also set these numbers in the DAG-Edit configuration. To do this, open DAG-Edit and choose 'DAG-Edit Configuration Manager' from the 'Plugins' menu. In this window, fill in your range of numbers, starting in the 'start of id range' line and finishing in the 'End of id range' line. Press 'Save Configuration' to save your changes.



Emacs

There are several UNIX text editors, but the easiest to use is emacs. To open a text file in emacs, type

emacs filename

The online emacs tutorial can be accessed by typing

ctrl-h t

Once you have done everything on this web page, you can move on to the Guide to Addressing a SourceForge Request.



 last modified October-2003