Islands of euchromatic-like sequence and expressed genes within the short arm of HSA21: sequence and copy number variability
Although the sequence of the euchromatic portion of the human genome is essentially complete, the heterochromatic regions remain largely uncharacterized. These regions include the short arms of the acrocentric chromosomes. Among these, the short arm of HSA21 (21p) has special significance because of its involvement in translocations resulting in trisomy 21. We constructed a BAC library from the human-mouse somatic hybrid cell line WAV17, which is monoallelic for HSA21, and generated 1.3 Mb of 21p sequence from 8 BACs. Surprisingly, 21p contains islands of sequence showing euchromatic-like features, with an interspersed repeat content similar to that found on 21q. In silico and EST-based predictions identified 29 gene models, a third of which were shown to correspond to bona fide genes by RT-PCR in 24 human tissues. We mapped the 5' ends of these transcripts by RACE and defined their structures. Analysis of these transcripts in different individuals shows extensive nucleotide variability and alternatively spliced isoforms among different tissues, suggesting multiple inter- and intrachromosomal copies. Moreover, they map to the short arms of multiple acrocentric chromosomes, as determined with monochromosomal cell hybrids. Quantification of their copy number by qPCR suggests that they are present in 4 to 50 copies in the human genome. Since the gene content of the heterochromatic regions of the genome appears to be underestimated, more effort should be made towards the characterization of these unexplored regions. To this end, we have end-sequenced the entire WAV17 BAC library and selected 47 clones that do not correspond to 21q for further sequencing.