Chicago GO Users Meeting - October 2001

Nathan Salomonis, Phone: (415) 826-7500

GenMAPP and Gene Ontology: Developing new tools for the organization and analysis of DNA-microarray data.
There is a critical need for analytical approaches that allow for systematic identification of specific biological cascades altered in a gene expression dataset. To address this need, we have utilized an approach that takes advantage of the vast array of annotations currently being developed by the Gene Ontology project. Using the computer application GenMAPP (Gene Micro-Array Pathway Profiler), a freely available program designed to visualize gene expression data on biological cascades, we have begun to represent Gene Ontology Biological Process, Molecular Function, and Cellular Component groups as MAPP files (GenMAPP format files). Once a MAPP is created, any user can modify it within this program to illustrate specific biological interactions between genes, thus providing a higher-level organization to groups of biologically related genes described within Gene Ontology. In the context of gene expression data, such MAPPs will be amenable to advanced search queries that will enable the identification of those specific MAPPs that contain coordinate changes in gene expression according to the user's own statistical criterion. Such an approach will allow researchers to quickly assess which biological cascades contain altered expression patterns and visually identify such changes on two-dimensional maps. The GenMAPP program, a growing collection of MAPPs, and gene databases for other model organisms can be downloaded from
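The kind of search query described above can be illustrated with a minimal sketch. It assumes each MAPP is simply a named list of gene identifiers and that the user's criterion is an absolute fold-change cutoff. The names Mapp, changed_fraction and rank_mapps, and both thresholds, are hypothetical and do not reflect GenMAPP's actual file format or interface.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Mapp:
    """Hypothetical stand-in for a MAPP: a named group of gene identifiers."""
    name: str
    genes: List[str]


def changed_fraction(mapp: Mapp, fold_changes: Dict[str, float],
                     threshold: float = 2.0) -> float:
    """Fraction of the measured genes on a MAPP that meet the user's cutoff."""
    measured = [g for g in mapp.genes if g in fold_changes]
    if not measured:
        return 0.0
    changed = [g for g in measured if abs(fold_changes[g]) >= threshold]
    return len(changed) / len(measured)


def rank_mapps(mapps: List[Mapp], fold_changes: Dict[str, float],
               threshold: float = 2.0,
               min_fraction: float = 0.5) -> List[Tuple[str, float]]:
    """Return (name, fraction) for MAPPs passing min_fraction, highest first."""
    scored = [(m.name, changed_fraction(m, fold_changes, threshold)) for m in mapps]
    hits = [(name, frac) for name, frac in scored if frac >= min_fraction]
    return sorted(hits, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up identifiers and values, for illustration only.
    example_mapps = [Mapp("apoptosis", ["A", "B", "C"]), Mapp("glycolysis", ["D", "E"])]
    example_fold_changes = {"A": 2.5, "B": -3.0, "C": 1.1, "D": 1.0, "E": 1.2}
    print(rank_mapps(example_mapps, example_fold_changes))
```

A fold-change cutoff is only one possible criterion; any per-gene statistic chosen by the user could be substituted in changed_fraction.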

Elizabeth Nickerson, Phone: 516-367-6977, FAX: 516-367-8389

The Human Genome KnowledgeBase
In response to the rapidly expanding body of knowledge concerning human biology, we have created the Human Genome KnowledgeBase (GKB). The GKB serves as an integrative online resource for the scientific community. Taking the form of peer-reviewed, electronic mini-reviews describing processes in human biology, GKB summations link researchers to all information relevant to the summation topic, including genome and protein databases, literature references and the Gene Ontology. Complete pathways are described comprehensively as text and as associated assertions implementing a controlled vocabulary. Assertions are simple statements describing accepted facts, for example "A binds B", accompanied by all relevant references. All information documented as an assertion is searchable for cross-referencing. We are currently in the pilot phase of this project. The GKB, including a portion of our first summation, DNA Replication, can be found at
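As an illustration of the assertion idea only, the following minimal sketch models an assertion as a subject, a relation drawn from a controlled vocabulary, an object, and its supporting references, with a simple search for cross-referencing. The class name, the vocabulary terms and the reference labels are all hypothetical and are not taken from the GKB itself.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# Hypothetical controlled vocabulary of relation terms (an assumption).
RELATIONS = {"binds", "phosphorylates", "activates"}


@dataclass(frozen=True)
class Assertion:
    """A simple statement such as 'A binds B' plus its supporting references."""
    subject: str
    relation: str
    obj: str
    references: Tuple[str, ...]


def make_assertion(subject: str, relation: str, obj: str,
                   references: Iterable[str]) -> Assertion:
    """Build an assertion, enforcing the vocabulary and requiring references."""
    refs = tuple(references)
    if relation not in RELATIONS:
        raise ValueError(f"relation {relation!r} is not in the controlled vocabulary")
    if not refs:
        raise ValueError("an assertion must be accompanied by at least one reference")
    return Assertion(subject, relation, obj, refs)


def search(assertions: Iterable[Assertion], entity: str) -> List[Assertion]:
    """Return every assertion that mentions the entity as subject or object."""
    return [a for a in assertions if entity in (a.subject, a.obj)]


if __name__ == "__main__":
    # Placeholder entities and reference labels, for illustration only.
    store = [
        make_assertion("A", "binds", "B", ["reference-1"]),
        make_assertion("C", "activates", "A", ["reference-2"]),
    ]
    for hit in search(store, "A"):
        print(hit.subject, hit.relation, hit.obj, hit.references)
```

Restricting the relation to a fixed set of terms is what makes the statements comparable and searchable across summations.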

Christopher Larsen, Phone: 718-777-6558, FAX: 801-459-8850

Ontologies and the world of Ubiquitin.
Cognia is a bioinformatics company involved in the aggregation of biological proteomics knowledge and the creation of databases and tools for protein analysis. We have created a working database on protein turnover and, specifically, the ubiquitin system. With the inclusion of over 1200 individual proteins so far, many novel functions and processes have been uncovered, some not yet in the Gene Ontology. We find that evolution has paralleled many of the originally discovered functions of the ubiquitin system for new and parallel processes. Thus, much of the new enzymology of sister systems to the ubiquitin system is used to promote entirely new effects. Here we present our database, compare the GO usage to our own ontology, and propose new additions to the current GO version. Our current website details the focus of our company and shows the key personnel involved in the project. Contact Chris Larsen for further details, at

Michael D. Gonzales, Phone: 505-995-4436, FAX: 505-995-4432

Interpreting sequence, expression and phenotype data in the context of cellular interaction models.
To facilitate research on biological networks we have developed PathDB 2.0, a curated relational database of mapping and interaction data between cellular building blocks (i.e. genes, proteins and metabolites), intended for building and analyzing complex cellular models. This interaction data is derived from the literature, from sequence annotation and from analytical methods (e.g. prediction of an ATP binding domain or a regulatory site). The data describing how these building blocks are modified and regulated is also curated. Our content development efforts are currently focused on Arabidopsis, for which the content consists mainly of interactions between proteins and metabolites, and on yeast, for which it also includes annotation derived from other public databases as well as protein-protein, protein-DNA and protein-lipid data sets. Our data model includes many features of the Gene Ontology model that allow for easier navigation among biological classes. Information about subcellular location, or phenotype information such as programmed cell death or disease resistance, can be displayed in the context of known biological interactions. Using PathDB, a pathway model can be used to visualize gene, protein and metabolite expression data, and its view can then be synchronized with other software programs using NCGR's integration system ISYS. Since our analytical tools will help generate new interactions among building blocks, we allow users to store this information in the database.

Lisa Matthews, Phone: 978-922-1643, FAX: 978-816-0285

Integration of the Gene Ontology vocabularies into Incyte's model organism and mammalian proteome databases
The BioKnowledge® Library represents a constantly growing collection of searchable databases that integrates knowledge from the research literature with genomic information. The current volumes in the BioKnowledge Library are: YPD for S. cerevisiae, PombePD for S. pombe, MycoPathPD for a collection of human fungal pathogens, WormPD for C. elegans, as well as HumanPSD and GPCR-PD, both focusing on human, mouse and rat proteins. Protein report pages are interlinked to allow searching across multiple proteins and species for common protein characteristics. The descriptive terminology of the Gene Ontology Consortium has now been integrated into the BioKnowledge Library. Ph.D.-level scientific curators use the Gene Ontology (GO) system of controlled vocabulary to record structured textual annotations. Examples will be shown revealing how curation with GO terms alone creates a detailed picture of protein function, based solely on GO molecular function, biological process and cellular component properties. The information in our databases enables transfer of knowledge about known proteins to unknown but related proteins. The common vocabulary provided by GO can also be used to describe the predicted properties of these uncharacterized proteins. Examples will be shown describing how knowledge incorporated into the BioKnowledge Library, combined with GO terms, predicts functional features of proteins of interest.

Simon Twigger, Phone: 414-456-8802, FAX: 414-456-6595

[Rat Genome Database]
The primary focus of RGD is to aid rat researchers in their work on rat as a model organism for human disease. Comparative genomics and the ability to incorporate data from human and mouse studies are key aspects of this focus. To support sequence-based comparative studies we have released VCMap, a dynamic sequence-based homology tool, which enables rat, mouse and human researchers to view mapped genes and sequences and their locations in the other two organisms. Gene Ontology annotations will provide a further layer of annotation for these comparative studies, in addition to creating alternative ways to query the database in a more scientifically meaningful fashion. Two different algorithms have been used to obtain initial informatic annotation results for the rat gene data within RGD. The analysis of these results and our plans for the further incorporation of GO into RGD will be presented.

Natalia Maltsev, Phone: 630-252-5195

In the past decade the scientific community has witnessed a rapid increase in the amount of sequence data and data related to the physiology and biochemistry of organisms. Analysis of the genetic sequences and metabolic processes in a phylogenetically diverse set of organisms has revealed a substantial degree of similarity between the biochemical pathways in eukaryotic and prokaryotic cells, suggesting their common evolutionary origin. Developing a dynamic controlled vocabulary of biological function relevant to metabolism, and providing a functional context for genomic data, is vital to further understanding of biological systems and their evolution. The Computational Biology group at the Mathematics and Computer Science Division of Argonne National Laboratory has substantial expertise in designing integrated systems for sequence analysis and metabolic reconstruction. The WIT2 system, currently available at Argonne, contains the General Functional Overview, a structured dictionary of function that provides a hierarchical representation of major metabolic subsystems in prokaryotic organisms. The talk will concentrate on the possibilities of extending the Gene Ontology approach to the description of metabolic function in prokaryotic organisms. We believe that such development will provide a framework for evolutionary analysis of biological function and will benefit studies of metabolism in both prokaryotic and eukaryotic organisms.

Last modified October 2003