Abstract for presentation at the 11th International Congress of Human Genetics

Prominent use of distal 5' transcription start sites and discovery of a large number of additional exons in ENCODE regions

  • France Denoeud, IMIM, Barcelona, Spain
  • Philipp Kapranov, Affymetrix Inc., Santa Clara, California, United States
  • Catherine Ucla, University of Geneva, Switzerland
  • Adam Frankish, Wellcome Trust Sanger Institute, United Kingdom
  • Robert Castelo, IMIM, Spain
  • Jorg Drenkow, Affymetrix Inc., United States
  • Julien Lagarde, IMIM, Spain
  • Caroline Manzano, University of Geneva, Switzerland
  • Jacqueline Chrast, University of Lausanne, Switzerland
  • Sujit Dike, Affymetrix Inc., United States
  • Carine Wyss, University of Geneva, Switzerland
  • Charlotte Henrichsen, University of Lausanne, Switzerland
  • Jennifer Harrow, Wellcome Trust Sanger Institute, United Kingdom
  • Nancy Holroyd, University of Lausanne, Switzerland
  • Mark Dickson, Stanford University School of Medicine, United States
  • Ruth Taylor, Wellcome Trust Genome Campus; Hinxton, Cambridgeshire, United Kingdom
  • Zahra Hance, Wellcome Trust Genome Campus; Hinxton, Cambridgeshire, United Kingdom
  • Richard Myers, Stanford University, United States
  • Jane Rogers, Wellcome Trust Sanger Institute, United Kingdom
  • Tim Hubbard, Wellcome Trust Sanger Institute, United Kingdom
  • Roderic Guigo, IMIM, Barcelona, Spain
  • Tom Gingeras, Affymetrix Inc., Santa Clara, California, United States
  • Stylianos Antonarakis, University of Geneva, Switzerland
  • Alexandre Reymond, Center for Integrative Genomics, University of Lausanne, Switzerland

This report presents the results of a systematic empirical annotation of mRNA products from 399 annotated protein-coding loci across the 1% of the human genome targeted by the Encyclopedia of DNA Elements (ENCODE) pilot project, using a combination of 5' Rapid Amplification of cDNA Ends (RACE) and high-density, high-resolution tiling arrays. RACE allows the detection of low-copy-number transcripts and isoforms and a high-resolution analysis of individual genes, while pooling strategies and array hybridization provide a high-throughput readout. We identified previously unannotated and often tissue- or cell-line-specific transcribed fragments (RACEfrags), both 5' distal to the annotated 5' terminus and internal to the annotated gene bounds, for the vast majority (81.5%) of the tested genes. Half of the distal RACEfrags span large segments of genomic sequence away from the main portion of the coding transcript and often overlap with the gene(s) annotated upstream. These novel exons have a lower GC content than annotated exons. Notably, more than 50% of the novel transcripts resulting from the inclusion of novel exons have changes in their open reading frames. A significant fraction of distal RACEfrags show expression levels comparable to those of known exons of the same locus, suggesting that they are not part of rare, minor splice forms. These results may revise our current understanding of the architecture of protein-coding genes. They have significant implications for our views on the locations of regulatory regions in the genome and for the interpretation of sequence polymorphisms mapping to regions hitherto considered to be "non-coding", and ultimately for the identification of disease-related sequence alterations.

    Conference Organiser - ICMS Pty Ltd