The human genome structural variation project
Most studies of human genetic variation focus on the pattern and nature of single-nucleotide differences within “unique” regions of the genome. To complete our understanding of the full spectrum of variation, we initiated a project to catalogue all structural variation (>8 kb) in 10 individuals using a fosmid paired-end sequence analysis approach. Based on an analysis of a single individual, we recently identified 295 putative inversions, deletions and insertions by selecting clusters of fosmids that were discordant by length or orientation when mapped to the human reference genome. Complete sequencing of the corresponding fosmids from these regions confirmed 84% (165/198) of the structural variants tested and provided insight into the molecular mechanisms underlying these changes. Nine additional individuals have been selected for this analysis from the HapMap collection on the basis of greatest within-population genetic diversity (5 Yoruban Nigerians, 2 Asians and 2 Europeans). Preliminary analysis of three individuals suggests that this form of variation is common among humans and occurs preferentially within duplicated regions and specific gene-rich regions of the genome. We anticipate that the Human Genome Structural Variation (HGSV) project will provide a complete characterization of ~2000 structural variants, including 90% of common variants (>5% frequency). The nature and pattern of such variation will likely be an important consideration in genetic association studies of human disease.
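
The selection of discordant fosmid clusters described above can be illustrated with a minimal Python sketch. It is not the project's actual pipeline: the span thresholds (32 to 48 kb), the minimum cluster support of two fosmids, the "+-" strand convention and all names (`EndPair`, `classify`, `cluster_discordant`) are assumptions made for illustration only. An end pair that maps too far apart in the reference suggests a deletion in the sampled genome, one that maps too close together suggests an insertion, and one with unexpected strand orientation suggests an inversion.

```python
from dataclasses import dataclass

# Illustrative thresholds only (assumptions; not taken from the abstract).
MIN_SPAN = 32_000   # shortest reference span still treated as concordant
MAX_SPAN = 48_000   # longest reference span still treated as concordant
MIN_SUPPORT = 2     # discordant fosmids required to report a cluster


@dataclass(frozen=True)
class EndPair:
    name: str
    chrom: str
    start: int    # leftmost mapped coordinate of the two end reads
    end: int      # rightmost mapped coordinate of the two end reads
    strands: str  # "+-" is assumed to be the expected orientation


def classify(pair):
    """Label one mapped end pair as concordant or as a putative variant type."""
    if pair.strands != "+-":
        return "inversion"          # discordant by orientation
    span = pair.end - pair.start
    if span > MAX_SPAN:
        return "deletion"           # ends too far apart in the reference
    if span < MIN_SPAN:
        return "insertion"          # ends too close together in the reference
    return "concordant"


def cluster_discordant(pairs, min_support=MIN_SUPPORT):
    """Merge overlapping discordant pairs of the same type on one chromosome."""
    groups = {}
    for p in pairs:
        kind = classify(p)
        if kind != "concordant":
            groups.setdefault((p.chrom, kind), []).append(p)

    calls = []
    for (chrom, kind), members in sorted(groups.items()):
        members.sort(key=lambda p: p.start)
        current, reach = [members[0]], members[0].end
        for p in members[1:]:
            if p.start <= reach:            # overlaps the running cluster
                current.append(p)
                reach = max(reach, p.end)
            else:                           # gap: close the running cluster
                if len(current) >= min_support:
                    calls.append((chrom, kind, current[0].start, reach, len(current)))
                current, reach = [p], p.end
        if len(current) >= min_support:
            calls.append((chrom, kind, current[0].start, reach, len(current)))
    return calls


pairs = [
    EndPair("f1", "chr1", 100_000, 140_000, "+-"),  # concordant
    EndPair("f2", "chr1", 500_000, 570_000, "+-"),  # spans too much reference
    EndPair("f3", "chr1", 510_000, 585_000, "+-"),  # overlaps f2, same type
    EndPair("f4", "chr2", 200_000, 240_000, "++"),  # lone orientation discordance
    EndPair("f5", "chr1", 900_000, 915_000, "+-"),  # lone short span
]
calls = cluster_discordant(pairs)
```

Requiring support from more than one fosmid is one simple way to filter chimeric clones and mapping errors before committing to full sequencing of a candidate region; the real support threshold and size cutoffs would depend on the measured insert-size distribution of the library.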