Postdoctoral Fellow
NESCent Project:
Recent advances in statistical methodology allow phylogeny inference to use the information in insertions and deletions and to average over uncertainty in multiple sequence alignments. However, the accuracy of these methods could be improved by including some key features of the biological process that generates insertion and deletion mutations (indels). Two of these features are (I) spatial variation in the rate of insertion and deletion, and (II) elevated rates of mutations that change the number of tandem repeats (variable number tandem repeats, VNTR). Ignoring spatial variation in indel rates can decrease phylogenetic accuracy because the evidential weight of a shared indel depends on the local indel rate: where indels are frequent, the same indel can more easily arise independently in different lineages. For example, proteins have higher indel rates in solvent-exposed regions, so indels there should be down-weighted relative to indels in the hydrophobic core. Similarly, when handling nearly neutral sequences such as intergenic spacers, ignoring the high rate of VNTR mutations can undermine phylogeny inference by giving shared changes in repeat number too much weight.
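A toy calculation makes the dependence on the local rate concrete. It is not taken from the project: it assumes that indels at a single alignment position follow a Poisson process with rate `rate` per unit branch length, ignores the rest of the tree, and compares one indel inherited by two taxa from a shared branch of length `t` against two independent (homoplastic) indels on two separate branches of length `t`. The function names are hypothetical.

```python
import math


def prob_event(rate: float, t: float) -> float:
    """Probability of at least one indel at a position on a branch of
    length t, assuming a Poisson process with the given rate."""
    return 1.0 - math.exp(-rate * t)


def shared_indel_weight(rate: float, t: float) -> float:
    """Toy log-likelihood ratio (in nats) favouring one shared indel
    (common descent, probability p) over two parallel indels
    (homoplasy, probability p * p).  The ratio p / p**2 is 1 / p."""
    p = prob_event(rate, t)
    return -math.log(p)


if __name__ == "__main__":
    for rate in (0.01, 0.1, 1.0):
        print(f"rate={rate}: weight={shared_indel_weight(rate, t=1.0):.2f} nats")
```

With `t = 1`, the weight falls from about 4.61 nats at rate 0.01 to about 0.46 nats at rate 1.0. This is the sense in which a shared indel in a fast, solvent-exposed region carries less evidence than one in the slowly evolving hydrophobic core.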
I propose to extend BAli-Phy, software that jointly estimates alignments and phylogenies, to handle indel hotspots. I have developed a simple transducer-based model for multiple alignments that allows each column to fall into a fast or slow rate category and clusters fast columns together. I am developing MCMC transition kernels to Gibbs-sample alignments and column labels simultaneously. Additionally, I plan to apply importance sampling to posterior samples from BAli-Phy so that VNTR mutations receive their correct weight. I will then estimate the degree of indel rate heterogeneity and the increase in VNTR rates in several data sets.
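The label-sampling step can be illustrated with a simplified stand-in. This sketch is not the transducer model or BAli-Phy code: it holds the alignment fixed, assumes hypothetical per-column log-likelihoods `loglik[i][k]` under a slow (`k = 0`) and fast (`k = 1`) indel rate, and models the labels as a two-state Markov chain whose "sticky" parameter `stay` makes fast columns cluster. All labels are then drawn jointly from their conditional posterior by forward filtering, backward sampling, which is one standard way to implement a single Gibbs update of the labels.

```python
import math
import random


def sample_column_labels(loglik, stay=0.9, init=(0.5, 0.5), rng=random):
    """Draw slow/fast labels (0 = slow, 1 = fast) for every column jointly.

    loglik[i][k]: hypothetical log-likelihood of column i under category k.
    stay: probability that the next column keeps the same label; values
          near 1 encourage runs of fast columns (clustering).
    """
    n = len(loglik)
    if n == 0:
        return []
    trans = [[stay, 1.0 - stay], [1.0 - stay, stay]]

    # Forward filtering: alpha[i][k] = P(label_i = k | columns 0..i).
    alpha = []
    prev = None
    for i in range(n):
        m = max(loglik[i])
        emit = [math.exp(ll - m) for ll in loglik[i]]  # rescaled for stability
        if prev is None:
            row = [init[k] * emit[k] for k in range(2)]
        else:
            row = [emit[k] * sum(prev[j] * trans[j][k] for j in range(2))
                   for k in range(2)]
        total = sum(row)
        row = [x / total for x in row]  # normalise to avoid underflow
        alpha.append(row)
        prev = row

    # Backward sampling: draw the last label, then each earlier label
    # conditional on the label already drawn to its right.
    labels = [0] * n
    labels[-1] = 0 if rng.random() < alpha[-1][0] else 1
    for i in range(n - 2, -1, -1):
        nxt = labels[i + 1]
        w = [alpha[i][k] * trans[k][nxt] for k in range(2)]
        labels[i] = 0 if rng.random() < w[0] / (w[0] + w[1]) else 1
    return labels
```

For example, `sample_column_labels([[0.0, -50.0]] * 3 + [[-50.0, 0.0]] * 3 + [[0.0, -50.0]] * 3, rng=random.Random(0))` labels the middle three columns fast. In the proposed kernels the alignment and labels are updated together, which this fixed-alignment sketch does not attempt.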
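The importance-sampling idea can be sketched in a few lines. The proposal does not give details, so this assumes that each posterior sample (alignment plus tree) can be scored under both the original model and a hypothetical VNTR-aware model with the same prior; each sample's weight is then proportional to the likelihood ratio. All function names are hypothetical.

```python
import math


def reweight_samples(loglik_old, loglik_new):
    """Self-normalised importance weights that retarget samples drawn from
    the 'old' posterior to a 'new' model sharing the same prior.
    Weight_i is proportional to exp(loglik_new[i] - loglik_old[i])."""
    log_w = [new - old for old, new in zip(loglik_old, loglik_new)]
    m = max(log_w)
    w = [math.exp(x - m) for x in log_w]  # subtract the max to avoid overflow
    total = sum(w)
    return [x / total for x in w]


def effective_sample_size(weights):
    """Kish effective sample size for normalised weights; a value far below
    the number of samples warns that a few samples dominate."""
    return 1.0 / sum(x * x for x in weights)


def weighted_posterior_prob(weights, indicator):
    """Reweighted posterior probability of an event, e.g. a given split
    being present in each sampled tree (indicator[i] is True or False)."""
    return sum(w for w, flag in zip(weights, indicator) if flag)
```

Checking the effective sample size matters here: if the VNTR-aware model differs strongly from the sampling model, the weights concentrate on a few samples and the reweighted estimates become unreliable.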
Improved probabilistic models of insertion/deletion for phylogenetic inference
PI(s): Benjamin D. Redelings
Start Date: 1-Sep-2009
End Date: 31-Aug-2012
Keywords: phylogenetics, gene structure and function
Related products
Publications
- Revell, L. J., D. L. Mahler, P. R. Peres-Neto, and B. D. Redelings. In press. A new method for identifying exceptional phenotypic diversification. Evolution.
- Law, S. H. W., B. D. Redelings, and S. W. Kullman. In press. Comparative Genomics of Duplicate γ-Glutamyl Transferase Genes in Teleosts: Medaka (Oryzias latipes), Stickleback (Gasterosteus aculeatus), Green Spotted Pufferfish (Tetraodon nigroviridis), Fugu (Takifugu rubripes), and Zebrafish (Danio rerio). JEZ Part B.
- Gaya, E., B. D. Redelings, P. Navarro-Rosines, X. Llimona, M. De Cáceres, and F. M. Lutzoni. 2010. Align, or not to align? Resolving species complexes within the Caloplaca saxicola group as a case study. Mycologia. DOI: 10.3852/10-120