Alignment of biological sequence pairs using two PHMMs
The pair hidden Markov model (PHMM) has been widely described in the literature as a useful device for aligning biological sequences. It is commonly used within a recursive algorithm that sums the contributions made to the likelihood function by all possible alignments. Summing in this way ensures that the likelihood surface is continuously differentiable across the parameter space. Hence, maximum likelihood estimators with sensible confidence intervals can be found by numerical optimisation, even for a complex function. Our approach to aligning biological sequences is motivated by the observation that hydrophilic patches tend to occur in loop regions of the protein. Within these patches, substitutions have little or no effect on phenotype, so the influence of natural selection is weaker inside these regions than outside them. We therefore employ two PHMMs in our pairwise alignment method, and measure the effect on the alignment of each of the three main parameters (hydrophilicity, branch length, and indels) across the conserved and non-conserved regions of the protein sequence pair. In this way we aim to measure indirectly the influence of natural selection on the protein. To test our method, we constructed a random sample of 120 phylogenetically independent pairs taken from the BAliBASE database. We sampled across four bins defined by the evolutionary distance between the sequences in each pair, from 0.25-0.50 (first bin) to 1.00-1.25 (fourth bin). Runs completed at the time of writing show that our estimators for the two regions are highly unstable across all four bins. For almost every pair, the branch length estimator for one region was too small and the estimator for the other region was too large. When the hydrophilicity parameter was allowed to vary between the two regions, its estimators reached either the upper or the lower bound in almost every case. Estimators for the indel rate parameter were generally stable across the four bins, while the frequency of unstable estimators for the indel length parameter increased with evolutionary distance. We are presently reassessing the specification of our PHMMs, and outcomes from this work will be presented.
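
The recursive summation over all possible alignments referred to above is usually carried out with the forward algorithm. The sketch below is illustrative only: it assumes the standard three-state pair HMM (match, insert in x, insert in y) with an end state, not the two-PHMM specification used in this work. The function name, the parameter names (delta for gap opening, epsilon for gap extension, tau for termination) and the toy emission model (a single identity probability over a uniform 20-letter alphabet) are all assumptions introduced for the example.

```python
def pair_hmm_forward(x, y, delta=0.1, epsilon=0.3, tau=0.05,
                     identity=0.6, alphabet_size=20):
    """Likelihood of the pair (x, y) summed over all alignments.

    Hypothetical three-state pair HMM: M emits an aligned pair,
    X emits a residue of x against a gap, Y a residue of y against a gap.
    """
    n, m = len(x), len(y)

    # Toy emission model (an assumption, not taken from the source).
    q = 1.0 / alphabet_size                      # single-residue emission
    p_same = identity / alphabet_size            # aligned identical pair
    p_diff = (1.0 - identity) / (alphabet_size * (alphabet_size - 1))

    # Forward tables; cell (i, j) covers prefixes x[:i] and y[:j].
    f_m = [[0.0] * (m + 1) for _ in range(n + 1)]
    f_x = [[0.0] * (m + 1) for _ in range(n + 1)]
    f_y = [[0.0] * (m + 1) for _ in range(n + 1)]
    f_m[0][0] = 1.0  # the begin state is treated as a silent match state

    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                p = p_same if x[i - 1] == y[j - 1] else p_diff
                f_m[i][j] = p * (
                    (1.0 - 2.0 * delta - tau) * f_m[i - 1][j - 1]
                    + (1.0 - epsilon - tau)
                    * (f_x[i - 1][j - 1] + f_y[i - 1][j - 1])
                )
            if i > 0:
                f_x[i][j] = q * (delta * f_m[i - 1][j]
                                 + epsilon * f_x[i - 1][j])
            if j > 0:
                f_y[i][j] = q * (delta * f_m[i][j - 1]
                                 + epsilon * f_y[i][j - 1])

    # Transition into the end state from any of the three states.
    return tau * (f_m[n][m] + f_x[n][m] + f_y[n][m])


if __name__ == "__main__":
    print(pair_hmm_forward("ACDE", "ACE"))
```

Because every cell is a sum of products of the parameters, the returned likelihood is a smooth function of delta, epsilon and tau, which is what makes it suitable for numerical optimisation. For sequences of realistic length the recursion would normally be carried out in log space or with scaling to avoid underflow; that is omitted here for brevity.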