The TIGR XML Frequently Asked Questions with Answers:

1. Where can I learn about the TIGR XML format and corresponding data models?

:The tigrxml.dtd both specifies the format of the XML files and describes in considerable detail where and how the annotation data is provided, along with the basis for the underlying data structure. Illustrations are included to facilitate comprehension.

2. How do I identify the alternative splicing variants?

:A gene (<TU>) with annotated alternative splicing isoforms contains multiple child <MODEL> elements, each model corresponding to a different splicing isoform.
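
:As a minimal sketch of the check described above, the following Python counts the <MODEL> children of each <TU> using the standard library. The sample fragment, including the <GENES> wrapper element, is hypothetical and heavily simplified; real files contain many more elements, so consult the tigrxml.dtd for the actual nesting.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragment; the <GENES> wrapper is an assumption
# made only so the example has a single root element.
SAMPLE = """
<GENES>
  <TU><MODEL/><MODEL/></TU>
  <TU><MODEL/></TU>
</GENES>
"""

def count_isoforms(root):
    """Return the number of direct <MODEL> children of each <TU>, in document order."""
    return [len(tu.findall("MODEL")) for tu in root.iter("TU")]

root = ET.fromstring(SAMPLE)
counts = count_isoforms(root)

# Positions of the <TU> elements that carry more than one isoform.
spliced = [i for i, n in enumerate(counts) if n > 1]
print(counts, spliced)
```

:A <TU> with a count greater than one is annotated with alternative splicing isoforms.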

3. How can alternative splicing variants encode identical proteins?

:In many cases, the splicing variations are restricted to the untranslated regions of the gene. In these cases, the differences appear within the <EXON> and <UTR> elements, while the <CDS> elements are identical, so the encoded protein sequences are identical as well.
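
:One way to spot such isoforms is to group the models of a gene by their CDS coordinates. The sketch below assumes that each <CDS> and <UTR> stores its coordinates as <END5> and <END3> child elements; that layout and the sample values are assumptions for illustration, so check the tigrxml.dtd before applying it to real data.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical fragment: two models that differ only in their 5' UTR.
# The <END5>/<END3> coordinate layout is an assumption, not taken from the DTD.
SAMPLE_TU = """
<TU>
  <MODEL>
    <EXON><UTR><END5>1</END5><END3>200</END3></UTR>
          <CDS><END5>201</END5><END3>400</END3></CDS></EXON>
    <EXON><CDS><END5>600</END5><END3>800</END3></CDS></EXON>
  </MODEL>
  <MODEL>
    <EXON><UTR><END5>50</END5><END3>200</END3></UTR>
          <CDS><END5>201</END5><END3>400</END3></CDS></EXON>
    <EXON><CDS><END5>600</END5><END3>800</END3></CDS></EXON>
  </MODEL>
</TU>
"""

def cds_signature(model):
    """Return the sorted (END5, END3) pairs of every <CDS> below a <MODEL>."""
    return tuple(sorted(
        (int(cds.findtext("END5")), int(cds.findtext("END3")))
        for cds in model.iter("CDS")
    ))

def group_models_by_cds(tu):
    """Map each CDS signature to the positions of the models sharing it."""
    groups = defaultdict(list)
    for index, model in enumerate(tu.findall("MODEL")):
        groups[cds_signature(model)].append(index)
    return dict(groups)

tu = ET.fromstring(SAMPLE_TU)
groups = group_models_by_cds(tu)
print(groups)
```

:Models that fall into the same group have identical coding segments and therefore encode the same protein, even though their exon and UTR boundaries differ.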

4. How can I identify the annotated gene models which are supported by full-length cDNA sequences?

:The EST and full-length cDNA (FL-cDNA) accessions that support the structures of gene annotations can be identified by the <CDNA_SUPPORT> elements, which are children of the <MODEL> elements. The IS_FLI attribute differentiates between FL-cDNAs (value: 1) and ESTs (value: 0).
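
:A minimal sketch of this filter follows. Because the exact element that carries IS_FLI should be confirmed in the tigrxml.dtd, the code looks for the attribute on the <CDNA_SUPPORT> element and on any of its descendants. The <ACCESSION> child and the accession strings in the sample are placeholders, not real identifiers.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; the <ACCESSION> child and its text are placeholders.
SAMPLE_TU = """
<TU>
  <MODEL>
    <CDNA_SUPPORT><ACCESSION IS_FLI="1">cdna_example_1</ACCESSION></CDNA_SUPPORT>
    <CDNA_SUPPORT><ACCESSION IS_FLI="0">est_example_1</ACCESSION></CDNA_SUPPORT>
  </MODEL>
  <MODEL>
    <CDNA_SUPPORT><ACCESSION IS_FLI="0">est_example_2</ACCESSION></CDNA_SUPPORT>
  </MODEL>
  <MODEL/>
</TU>
"""

def has_full_length_support(model):
    """True if any <CDNA_SUPPORT> child of the model is flagged IS_FLI="1"."""
    for support in model.findall("CDNA_SUPPORT"):
        # iter() yields the element itself and all descendants, so the check
        # works wherever the attribute sits inside <CDNA_SUPPORT>.
        if any(el.get("IS_FLI") == "1" for el in support.iter()):
            return True
    return False

tu = ET.fromstring(SAMPLE_TU)
fl_flags = [has_full_length_support(m) for m in tu.findall("MODEL")]
print(fl_flags)
```

:In this sample only the first model is supported by a full-length cDNA; the second has EST support only, and the third has no transcript support at all.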

5. How can the annotated untranslated regions be identified?

:The coordinates of untranslated regions (UTRs) are provided explicitly in the XML-formatted data set. The untranslated regions of each annotated gene model appear as <UTR> elements under the relevant <EXON> element. This is described further in the tigrxml.dtd.
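
:The lookup can be sketched as follows. As an assumption for illustration only, each <UTR> here stores its coordinates in <END5> and <END3> child elements; the sample values are invented, and the tigrxml.dtd remains the authority on the real layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: one model with a UTR at each end.
# The <END5>/<END3> coordinate layout is an assumption, not taken from the DTD.
SAMPLE_MODEL = """
<MODEL>
  <EXON>
    <UTR><END5>1</END5><END3>200</END3></UTR>
    <CDS><END5>201</END5><END3>400</END3></CDS>
  </EXON>
  <EXON>
    <CDS><END5>600</END5><END3>800</END3></CDS>
    <UTR><END5>801</END5><END3>950</END3></UTR>
  </EXON>
</MODEL>
"""

def utr_coords(model):
    """Return (END5, END3) for every <UTR> found directly under an <EXON>."""
    coords = []
    for exon in model.iter("EXON"):
        for utr in exon.findall("UTR"):
            coords.append((int(utr.findtext("END5")), int(utr.findtext("END3"))))
    return coords

model = ET.fromstring(SAMPLE_MODEL)
utrs = utr_coords(model)
print(utrs)
```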