--------------------------------------------
TIGR Arabidopsis thaliana README

The Institute for Genomic Research
9712 Medical Center Drive
Rockville MD 20850
USA
============================================

Table of Contents
--------------------------------------------
1. Statistics
2. Methodology
3. File format
4. Data Access
5. Contacts
============================================

------------------------------------------------------------------------------

1. Information/Statistics
============================================

Arabidopsis thaliana has 29,085 genes, encoding 27,288 gene products.
As of October 8, 2002, TIGR has assigned Gene Ontology (GO) terms to 5,881
of these gene products. TAIR has separately assigned GO terms to
Arabidopsis gene products.

2. Methodology
==============

Each gene product is examined by an annotator. Where experimental
literature exists, GO terms are assigned by translating the reported
experimental results into the appropriate GO terms. Gene products without
such literature are compared with characterized gene products by sequence
similarity, phylogeny and family membership, and are assigned GO terms
based on the quality of the matches.

3. File format
==============

The file format is based on the guidelines provided by the Gene Ontology
Consortium (www.geneontology.org) with several mij

 1. DB
    Database from which the annotated entry has been taken.
    Here: TIGR_Ath1 signifies the database at TIGR where the data are
    stored (http://www.tigr.org).

 2. DB_Object_ID
    A unique identifier in the DB for the item being annotated.
    Here: a TIGR Arabidopsis public locus identifier. It can be used to
    pull the sequence out of the sequence file at
    ftp://ftp.tigr.org/pub/data/a_thaliana/

 3. DB_Object_Symbol
    A unique identifier (locus) of a form that identifies the BAC and the
    order of the gene on the BAC.

 4. NOT
    Here: Currently not applicable; always empty.

 5. GOid
    The GO identifier for the term attributed to the DB_Object_ID.
    Example: GO:0005625

 6. DB:Reference
    Reference cited to support the attribution. For 'Inferred from
    Sequence Similarity' (ISS) evidence, the reference is usually
    'TIGR_Ath1:annotation', which is defined as follows:

      author: TIGR Arabidopsis annotation team
      name: TIGR annotation based upon multiple sources of similarity
        evidence
      description: TIGR_Ath1:annotation denotes a curator's interpretation
        of a combination of evidence. Our internal software tools present
        us with a great deal of evidence based on domains, sequence
        similarities, signal sequences, paralogous proteins, etc. The
        curator interprets the body of evidence to make a decision about
        a GO assignment when an external reference is not available. The
        curator places one or more accessions that informed the decision
        in the "with" field.

    In other words, many sequence similarity hits and other results may
    inform a decision, but only 1-3 of them are entered as "with"
    information, because it is not practical to enter and submit many
    entries for each annotation. Internal calculations of paralogy, and
    new domains we are identifying that have not yet been published, also
    help inform our decisions.

 7. Evidence
    A code that refers to the type of evidence cited. TIGR Arabidopsis GO
    annotations use experimental codes and ISS; IEA has not been used to
    date.

 8. With|From
    Usually populated for ISS, IPI and IGI. See the document at
    http://www.geneontology.org/doc/GO.evidence.html

 9. Aspect
    One of the three ontologies: P (biological process), F (molecular
    function) or C (cellular component).
    Example: P

10. DB_Object_Name
    The common name of the gene product.
    Example: DEAD/DEAH box helicase carpel factory (CAF)

11. Synonym
    Here: Currently not applicable; always empty.

12. DB_Object_Type
    What kind of entity is being annotated.
    Here: 'protein' or 'gene product'.
    Example: protein

13. Taxon_ID
    Identifier for the species being annotated.
    Here: taxon:37546

14. Date
    Date on which the analysis was performed.

4. Data Access
==============

All sequences are available from ftp://ftp.tigr.org/pub/data/a_thaliana/
The FASTA file header contains the locus identifier to which GO terms were
assigned.

5. Contacts
===========

Linda Hannick (lhannick@tigr.org)
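Section 3 lists the annotation file's columns but not the delimiter. Files following the GO Consortium guidelines are tab-delimited, so the sketch below assumes the TIGR file is too, with the 14 columns in the order given in section 3. Treating lines that start with "!" as comments, and splitting multiple With|From accessions on "|", are further assumptions. The `Annotation` class and the `parse_line` and `read_annotations` functions are hypothetical helpers for illustration, not part of any TIGR distribution.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List

# Column names in the order listed in section 3 of this README.
COLUMNS = (
    "DB", "DB_Object_ID", "DB_Object_Symbol", "NOT", "GOid",
    "DB:Reference", "Evidence", "With|From", "Aspect",
    "DB_Object_Name", "Synonym", "DB_Object_Type", "Taxon_ID", "Date",
)

# Allowed values of the Aspect column (column 9).
ASPECTS = {"P": "biological process", "F": "molecular function",
           "C": "cellular component"}


@dataclass
class Annotation:
    db: str
    object_id: str          # TIGR locus identifier
    object_symbol: str
    qualifier: str          # the NOT column; always empty in this file
    go_id: str
    reference: str          # e.g. TIGR_Ath1:annotation
    evidence: str           # e.g. ISS
    with_from: List[str]    # accessions from the With|From column
    aspect: str
    object_name: str
    synonym: str
    object_type: str
    taxon: str
    date: str


def parse_line(line: str) -> Annotation:
    """Parse one annotation line, assuming tab-separated columns."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {len(fields)}")
    fields = fields[:len(COLUMNS)]  # ignore any extra trailing columns
    if not fields[4].startswith("GO:"):
        raise ValueError(f"bad GO identifier: {fields[4]!r}")
    if fields[8] not in ASPECTS:
        raise ValueError(f"bad Aspect value: {fields[8]!r}")
    # Assumption: several accessions in With|From are separated by '|'.
    with_from = [acc for acc in fields[7].split("|") if acc]
    return Annotation(*fields[:7], with_from, *fields[8:])


def read_annotations(lines: Iterable[str]) -> Iterator[Annotation]:
    """Yield parsed records, skipping blank lines and '!' comment lines."""
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue
        yield parse_line(line)
```

Keeping each column as a named field makes simple queries direct; for example, filtering records on `evidence == "ISS"` separates similarity-based assignments from experimentally supported ones.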
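Section 4 notes that the FASTA headers carry the locus identifier to which GO terms were assigned, which is also the DB_Object_ID column of the annotation file. A minimal sketch of joining the two follows. It assumes the locus ID is the first whitespace-delimited token after ">" and that it matches the DB_Object_ID value exactly, including case, and that the annotation file is tab-delimited with GOid in column 5; none of these details is stated in this README. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map locus ID -> sequence, taking the first header token as the ID."""
    seqs: Dict[str, str] = {}
    locus = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if locus is not None:          # finish the previous record
                seqs[locus] = "".join(chunks)
            tokens = line[1:].split()
            locus = tokens[0] if tokens else ""
            chunks = []
        elif locus is not None:
            chunks.append(line)            # sequence may span many lines
    if locus is not None:
        seqs[locus] = "".join(chunks)
    return seqs


def go_terms_by_locus(annotation_lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Collect GO IDs (column 5) per DB_Object_ID (column 2)."""
    terms: Dict[str, Set[str]] = defaultdict(set)
    for line in annotation_lines:
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.rstrip("\r\n").split("\t")
        if len(cols) >= 5:
            terms[cols[1]].add(cols[4])
    return terms


def annotated_sequences(fasta_lines: Iterable[str],
                        annotation_lines: Iterable[str]
                        ) -> Dict[str, Tuple[str, List[str]]]:
    """Return locus -> (sequence, sorted GO IDs) for annotated loci found in the FASTA."""
    seqs = read_fasta(fasta_lines)
    terms = go_terms_by_locus(annotation_lines)
    return {locus: (seqs[locus], sorted(ids))
            for locus, ids in terms.items() if locus in seqs}
```

Loci that are annotated but absent from the FASTA file are silently dropped here; if the header and DB_Object_ID differ in case or format, normalizing both keys before the lookup would be the first thing to adjust.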