------------------------------------------------------------
06/10/2004 bhaas

** Please note that the file datestamps may differ from the actual
version 5 release date. Also, the directory structure has been refined
due to some IT-related issues. The data remains unchanged and
corresponds to the original January 2004 release 5 data set.
--------------------------------------------------------------

TIGR releases latest Arabidopsis Annotation (Version 5.0)
Date: Thu 29 Jan 2004 - 03:45:32 GMT

We are pleased to announce to the plant community that TIGR has
released its final contribution to the Arabidopsis genome reannotation,
ATH1 version 5.0. This has been provided to NCBI and to TAIR. Both the
sequence and the annotation data are available from the TIGR ftp site
at:

ftp://ftp.tigr.org/pub/data/a_thaliana/ath1/

The PSEUDOMOLECULES directory contains the fully annotated chromosome
sequences in XML format along with associated data. The SEQUENCES
directory contains just the sequence files for chromosomes, CDS,
proteins, etc. TIGR annotation for individual BACs in XML format is in
the BACS directory.

Release 5.0 provides the annotation for 26,207 protein-coding genes and
3,786 pseudogenes. Nearly 15,000 of the protein-coding gene structures
are supported by full-length cDNA alignments. This release also
provides our most comprehensive annotation of alternative splicing
isoforms in the Arabidopsis genome: just over 2,000 genes are annotated
with alternatively spliced transcripts based on EST and full-length
cDNA alignments. During the course of this three-year process, we have
revisited both the structural and functional annotations of essentially
every gene in the genome. Over 18,000 transcripts are annotated with
upstream and/or downstream untranslated regions (UTRs).
In the SEQUENCES directory, the ATH1.cdna fasta file contains the
tentative cDNA sequences for protein-coding genes, with UTRs provided
in lowercase and the protein-coding region (CDS) provided in uppercase.

The following TIGR web resources have been recently updated using the
release v5.0 annotation data:

-Genes found within segmentally duplicated regions of the Arabidopsis
 genome:
 http://www.tigr.org/tdb/e2k1/ath1/Arabidopsis_genome_duplication.shtml

-Genes found within tandem gene duplications:
 http://www.tigr.org/tdb/e2k1/ath1/TandemDups/TandemGenes.html

-User-customizable interface that permits retrieval of sets of
 Arabidopsis genes based on the type of physical evidence support,
 including matches to other Arabidopsis proteins, non-Arabidopsis
 proteins, Arabidopsis ESTs, other plant ESTs, and full-length cDNAs:
 http://www.tigr.org/tigr-scripts/e2k1/arab_gene_phys_ev_classification.cgi

Additional resources and links to the pages above can be found on our
TIGR Arabidopsis Annotation Database (ATH1) page at:

http://www.tigr.org/tdb/e2k1/ath1/ath1.shtml

We thank TAIR, MIPS and individual community members for their input,
assistance and feedback over the past three years. We believe that
Arabidopsis is one of the most completely sequenced and best annotated
genomes at this time, and we hope that these latest annotation
improvements and updated resources will be useful to the community.

On behalf of all the users of the Arabidopsis genome sequence and
annotation, we would like to thank NSF for their continuous and
generous support of this project.

At this point, TAIR will assume responsibility for future public
releases of updated annotation. The ATH1 database, more or less in its
present form, will remain accessible at TIGR for the foreseeable
future.

Please send comments or questions to arabhelp@tigr.org
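
Note on working with ATH1.cdna: because the file described above marks
UTRs in lowercase and the CDS in uppercase, the three regions of a
transcript can be recovered from letter case alone. The following is a
minimal Python sketch of that idea, not a TIGR-provided tool. It
assumes each record holds a single contiguous uppercase CDS flanked by
optional lowercase UTRs. The function names and the example record are
hypothetical, and the exact header layout of ATH1.cdna is not assumed.

```python
import re
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence text before the first header is ignored.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_utr_cds(seq: str) -> Tuple[str, str, str]:
    """Split a cDNA into (5' UTR, CDS, 3' UTR) using letter case.

    Assumes lowercase UTRs around one uppercase CDS block; raises
    ValueError if the sequence does not follow that layout.
    """
    match = re.fullmatch(r"([a-z]*)([A-Z]+)([a-z]*)", seq)
    if match is None:
        raise ValueError("expected lowercase UTR / uppercase CDS layout")
    return match.group(1), match.group(2), match.group(3)


if __name__ == "__main__":
    # Made-up record for illustration only; not taken from ATH1.cdna.
    example = [">example_transcript", "acgtATGGCC", "TAAttga"]
    for name, seq in read_fasta(example):
        five_utr, cds, three_utr = split_utr_cds(seq)
        print(name, len(five_utr), len(cds), len(three_utr))
```

To run this against the real file, pass an open file handle for
ATH1.cdna to read_fasta in place of the example list. Records with no
annotated UTR yield empty strings for the missing regions.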