Catalysis Meeting

SimBank: Planning population genetic simulations to test statistical genetics software

PI(s): Michael Whitlock (University of British Columbia); Victoria Sork (UCLA); Matthew Hahn (Indiana University)
Start Date: 15-Nov-2013
End Date: 30-Nov-2014
Keywords: population genetics, landscape ecology, software

This catalysis group will plan the implementation of SimBank, a large, openly available series of population genetic landscape simulations. The simulations are intended to make testing and validating statistical genetics methods easy, and will be based on realistic scenarios from natural populations across a range of taxa.
Genetic and genomic data allow us to estimate numerous biological parameters through statistical genetics techniques. However, these techniques necessarily make many assumptions that do not match biology. As a result, the value of these statistical approaches may depend on the biological details of the evolutionary and demographic history of the populations being studied. Statistical genetics techniques need more thorough testing and validation than they currently receive, and the best way to achieve this is to compare their results against genetic simulations of biologically reasonable situations.
This catalysis group will plan the creation of a test bank of simulated genomic data. We will create a core list of biological scenarios that can test a wide variety of statistical methods over a range of assumptions about evolutionary history, demography, and genetic details.
Creating such simulations is non-trivial, because a variety of scenarios must be coded and large-scale simulations require substantial processor time. However, many types of statistical genetics techniques can be tested on a common set of simulations. In this way, we can share processor time and discuss as a community which issues are most important to cover.
The group will bring together statisticians, programmers, and empirical biologists, with combined expertise in evolution, landscape ecology, and geospatial pattern analysis.