Regions transcribed in the human germline have characteristic nucleotide substitution patterns
Transcription-coupled repair occurs on the transcribed strand of active genes and is thought to result in a prevalence of T over A and G over C nucleotides on the coding strand of sequences transcribed in germline cells. The extent of compositional asymmetry between DNA strands appears to be related to the extent of germline transcription; however, replication is also believed to contribute to compositional asymmetry. We aimed to exploit the influence of neighbouring nucleotides on substitution tendencies to clarify the contribution of transcription to strand compositional asymmetry in the human genome. Using alignments of human, mouse and rat sequences, we contrasted the substitution patterns of a set of genes known to be transcribed in the human germline with those of a set of putatively non-transcribed regions. The rate of substitution was estimated for each pair of dinucleotide motifs differing by a single nucleotide. Asymmetry between the rates of dinucleotide substitutions and their complementary substitutions was detected in both the putatively non-transcribed regions and the transcribed regions. Considerably more variation in substitution rate asymmetry among dinucleotide-complement pairs was evident in the transcribed dataset. A “signature” set of substitutions characteristic of our transcribed dataset was identified. For these substitutions, the member of a dinucleotide-complement pair estimated to have the faster substitution rate on the transcribed strand was generally the same for each gene in the transcribed dataset. The signature substitution pattern of transcription we have identified potentially provides a means of predicting sequences transcribed in the germline and of evaluating the contribution of transcription-associated processes towards generating heritable, disease-causing mutations.
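
To make the dinucleotide comparison concrete, the following minimal Python sketch enumerates every substitution between dinucleotides that differ at a single position and pairs each one with its complementary substitution. It is an illustration only, not the estimation procedure used in the study. It assumes that the complementary substitution is the same change read 5'→3' on the opposite strand (the reverse complement of both motifs). The function names are hypothetical, and the log ratio is just one simple way to express asymmetry between two rate estimates.

```python
import math
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(motif):
    """Return the motif as read 5'->3' on the opposite strand."""
    return "".join(COMPLEMENT[base] for base in reversed(motif))


def single_step_substitutions():
    """List all (from, to) dinucleotide pairs differing at exactly one position."""
    dinucleotides = ["".join(p) for p in product("ACGT", repeat=2)]
    return [
        (src, dst)
        for src in dinucleotides
        for dst in dinucleotides
        if sum(x != y for x, y in zip(src, dst)) == 1
    ]


def complement_pairs():
    """Group each substitution with its strand-complementary substitution.

    Assumption: the complement of src->dst is rc(src)->rc(dst).
    """
    pairs = set()
    for src, dst in single_step_substitutions():
        comp = (reverse_complement(src), reverse_complement(dst))
        pairs.add(tuple(sorted([(src, dst), comp])))
    return sorted(pairs)


def log_rate_asymmetry(rate, comp_rate):
    """Illustrative asymmetry measure: 0 means the two rates are equal."""
    return math.log(rate / comp_rate)


if __name__ == "__main__":
    print(len(single_step_substitutions()), "substitutions")
    print(len(complement_pairs()), "dinucleotide-complement pairs")
```

Under these assumptions, the substitution CG→TG is paired with CG→CA, and a positive log ratio for a pair would indicate that the first substitution is estimated to be faster than its complement on the strand being analysed.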