Long-term Sabbatical
Evolutionary models accounting for multi-layer selection pressures
PI(s): Tal Pupko (Tel-Aviv University)
Start Date: 1-Sep-2010
End Date: 31-Aug-2011
The evolutionary selection forces acting on a protein are commonly inferred using probabilistic codon models, by contrasting the rate of nonsynonymous substitutions with the rate of synonymous substitutions. Most of these models assume that the synonymous rate is homogeneous across all sites, so that it serves as a baseline against which the nonsynonymous rate is compared. However, we and others have previously shown that synonymous substitution rates may vary substantially among sites, reflecting constraints acting at the mRNA and DNA levels. These observations led to the understanding that protein evolution is shaped by multiple layers of mutation and selection pressures. I intend to develop novel evolutionary models that distinguish between mutation pressure, selection at the DNA and mRNA levels, and selection at the amino-acid level. These models will be tested on a large set of protein-coding genes as well as by extensive simulation studies. Our preliminary results show that the proposed models fit vertebrate coding sequences significantly better than commonly used codon models, which ignore these multi-layer selection forces. Furthermore, a preliminary genomic screen shows that, in vertebrates, accounting for variability of selection at the DNA and mRNA levels substantially reduces the number of genes predicted to evolve under positive selection. The novel methods proposed here should provide a robust approach to detect extremely conserved regions within protein-coding genes, as well as to infer sites and lineages experiencing adaptive evolution, a central task in today's genomic analyses.
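The baseline argument above can be illustrated with a toy calculation. This is a minimal sketch, not one of the proposed models: real codon models estimate substitution rates by maximum likelihood over a phylogeny, whereas here the per-site rates dN and dS are simply given, and all site names and numbers are hypothetical. It compares omega = dN/dS at each site under a single shared synonymous baseline (assumed, for illustration, to be the mean dS across sites) against a site-specific synonymous baseline.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    dn: float  # nonsynonymous substitution rate at this site (hypothetical value)
    ds: float  # synonymous substitution rate at this site (hypothetical value)


def omega_shared_baseline(sites: list[Site]) -> dict[str, float]:
    """Homogeneous-dS assumption: every site's dN is divided by one shared
    synonymous baseline, taken here (an assumption) as the mean dS."""
    baseline = sum(s.ds for s in sites) / len(sites)
    return {s.name: s.dn / baseline for s in sites}


def omega_site_specific(sites: list[Site]) -> dict[str, float]:
    """Site-specific dS: each site's dN is contrasted with its own dS."""
    return {s.name: s.dn / s.ds for s in sites}


def flagged(omegas: dict[str, float], threshold: float = 1.0) -> list[str]:
    """Naive positive-selection call: sites whose omega exceeds the threshold."""
    return sorted(name for name, w in omegas.items() if w > threshold)


# Hypothetical per-site rates chosen only to illustrate the effect.
sites = [
    Site("s1", dn=0.2, ds=1.0),  # typical site under purifying selection
    Site("s2", dn=1.5, ds=3.0),  # fast-evolving site: both rates elevated
    Site("s3", dn=0.1, ds=0.2),  # synonymous changes constrained (low dS)
    Site("s4", dn=2.0, ds=0.8),  # dN exceeds dS under either baseline
]

print(flagged(omega_shared_baseline(sites)))  # ['s2', 's4']
print(flagged(omega_site_specific(sites)))    # ['s4']
```

With these made-up rates, the shared baseline (mean dS = 1.25) flags s2 and s4, while the site-specific baseline flags only s4: site s2 has an elevated nonsynonymous rate but an equally elevated synonymous rate, so its high dN does not by itself indicate adaptive evolution. Site s3 shows the opposite effect: its unusually low dS, the kind of DNA- or mRNA-level constraint discussed above, is masked when a single baseline is used.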
Related products
Publications
- Burstein, D., Gould, S.B., Zimorski, V., Kloesges, T., Kiosse, F., Major, P., Martin, W.F., Pupko, T., & Dagan, T. (2012). A machine learning approach to identify hydrogenosomal proteins in Trichomonas vaginalis. Eukaryotic Cell, 11(2), 217-228. doi: 10.1128/EC.05225-11
- Gelfman, S., Burstein, D., Penn, O., Savchenko, A., Amit, M., Schwartz, S., Pupko, T., & Ast, G. (2012). Changes in exon-intron structure during vertebrate evolution affect the splicing pattern of exons. Genome Research, 22(1), 35-50. doi: 10.1101/gr.119834.110
- Turner, D., Amit, S., Chalom, S., Penn, O., Pupko, T., Katchman, E., Matus, N., Tellio, H., Katzir, M., & Avidor, B. (2012). Emergence of an HIV-1 cluster harbouring the major protease L90M mutation among treatment-naïve patients in Tel Aviv, Israel. HIV Medicine, 13, 202-206. doi: 10.1111/j.1468-1293.2011.00960.x
- Cohen, O., & Pupko, T. (2011). Inference of gain and loss events from phyletic patterns using stochastic mapping and maximum parsimony: A simulation study. Genome Biology and Evolution, 3, 1265-1275. doi: 10.1093/gbe/evr101
- Cohen, O., Gophna, U., & Pupko, T. (2011). The complexity hypothesis revisited: connectivity rather than function constitutes a barrier to horizontal gene transfer. Molecular Biology and Evolution, 28(4), 1481-1489. doi: 10.1093/molbev/msq333
- Pupko, T. (2011). Evolution after gene duplication. Trends in Evolutionary Biology. doi: 10.4081/eb.2011.e1
- Privman, E., Penn, O., & Pupko, T. (2011). Improving the performance of positive selection inference by filtering unreliable alignment regions. Molecular Biology and Evolution. doi: 10.1093/molbev/msr177
- Barzel, A., Privman, E., Peeri, M., Naor, A., Shachar, E., Burstein, D., Lazary, R., Gophna, U., Pupko, T., & Kupiec, M. (2011). Native homing endonucleases can target conserved genes in humans and in animal models. Nucleic Acids Research, 39(15), 6646-6659.
- Cohen, O., Ashkenazy, H., Belinky, F., Huchon, D., & Pupko, T. (2010). GLOOME: gain loss mapping engine. Bioinformatics, 26(22), 2914-2915. doi: 10.1093/bioinformatics/btq549