Functional conserved non-coding sequences add a new dimension to candidate disease regions of the human genome
Our recent comparative analyses have identified thousands of non-coding regions that are conserved between the human and pufferfish genomes, an evolutionary distance of 900 million years. The evolutionary constraint upon such regions, which are not found in invertebrate genomes, suggests that they play fundamental roles in defining the vertebrate lineage. Our highly effective functional assay in zebrafish has established that the majority of these conserved non-coding elements (CNEs) are able to up-regulate a GFP reporter gene in a tissue-specific manner. The CNEs are largely distributed around developmental genes, which is significant when considering developmental anomalies and disease implications. Importantly, CNEs can act over long distances and do not necessarily act upon the nearest gene. Historically, disease association has focused on the coding region of a candidate gene. The discovery of large repertoires of functional non-coding elements around many developmental genes permits analysis of the complete regulatory landscapes of the genome.
We are now accumulating functional profiles of individual CNEs and compiling a database as a public resource for bioinformatic interrogation. Common profiles are being used to decipher the sequence language underlying discrete enhancer activity. In relation to this, we are studying duplicate genes and their associated CNEs in order to investigate neo- or sub-functionalisation of regulatory regions. Using recombineering, we are able to analyse sections of the genome so that we can observe the function of the CNEs in context, and then investigate the impact of deleting one or more CNEs from the region. At a more global level, we are mapping CNEs in relation to breakpoints in human chromosomal rearrangements, so that we may include these data in disease consideration.
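
As an illustration of the breakpoint mapping described above, the following is a minimal sketch, not the pipeline used in this work. It assumes CNEs are available as (name, chromosome, start, end) tuples in a single consistent coordinate system and that each breakpoint is a single position. The function names, the example coordinates and the window size are all hypothetical. Because CNEs can act over long distances, the search window is left as a parameter rather than tied to the nearest gene.

```python
from collections import defaultdict


def index_cnes(cnes):
    """Group CNE intervals by chromosome (hypothetical helper).

    `cnes` is an iterable of (name, chrom, start, end) tuples; the
    coordinates are assumed to be inclusive and on one assembly.
    """
    by_chrom = defaultdict(list)
    for name, chrom, start, end in cnes:
        by_chrom[chrom].append((start, end, name))
    return by_chrom


def cnes_near_breakpoint(index, chrom, position, window):
    """Return (name, distance) pairs for CNEs within `window` bp of a
    breakpoint, nearest first. Distance is 0 when the breakpoint falls
    inside the CNE itself."""
    hits = []
    for start, end, name in index.get(chrom, []):
        if start <= position <= end:
            distance = 0
        elif position < start:
            distance = start - position
        else:
            distance = position - end
        if distance <= window:
            hits.append((name, distance))
    return sorted(hits, key=lambda hit: hit[1])


# Made-up example data, for illustration only.
example_cnes = [
    ("cne_a", "chr7", 1000, 1200),
    ("cne_b", "chr7", 500000, 500300),
    ("cne_c", "chr2", 100, 400),
]
example_index = index_cnes(example_cnes)
nearby = cnes_near_breakpoint(example_index, "chr7", 1500, window=1000)
```

A linear scan per chromosome is sufficient for a few thousand elements; an interval tree or a sorted index would be a reasonable substitute for larger sets.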