Postdoctoral Fellow

A phylogenetic database for comparative biology in land plants

PI(s): Gordon Burleigh
Start Date: 1-Nov-2006
End Date: 15-Aug-2008
Keywords: phylogenetics, database, comparative methods

The rapid growth of available sequence data provides unprecedented opportunities for building large, well-supported phylogenies. These comprehensive phylogenies can be an invaluable resource for comparative biology and for examining macroevolutionary patterns and processes. Yet published phylogenetic trees represent only a small fraction of the sequence data and taxonomic coverage available in GenBank. The goal of this study is to build a database that represents a clade-based organization of phylogenetic information for land plants in GenBank. This will be accomplished by organizing the land plant DNA sequence data in GenBank into alignments of all potentially phylogenetically informative clusters of homologous sequences. These clusters will be filtered to remove paralogous sequences, and the resulting putative clusters of orthologs will be combined into supermatrices in order to build the largest possible phylogenetic trees of land plants. The database will give comparative biologists easy access to benchmark sequence alignments for all phylogenetically informative genes available for the taxa in a clade, as well as to the most comprehensive phylogenetic trees available for that clade. The sequence and tree databases will be designed so that comparative biologists can easily integrate their own data sets of morphological characters, natural history, or geographic distributions with the available sequence data from GenBank, or with large phylogenetic trees, for large-scale macroevolutionary analyses.
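
The supermatrix step described above can be illustrated with a minimal sketch. It is not the project's actual pipeline. It assumes each ortholog cluster has already been aligned and reduced to one sequence per taxon, and it pads taxa that lack a gene with a missing-data symbol. The function name build_supermatrix, the "?" missing-data symbol, and the toy gene and taxon names are all illustrative assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: one aligned sequence per taxon for each gene.
Alignment = Dict[str, str]


def build_supermatrix(
    alignments: Dict[str, Alignment], missing: str = "?"
) -> Tuple[Dict[str, str], List[Tuple[str, int, int]]]:
    """Concatenate per-gene alignments into one supermatrix.

    Returns the concatenated sequence for every taxon and a list of
    (gene, start, end) partitions using 1-based inclusive columns.
    """
    # Every taxon seen in any gene gets a row in the supermatrix.
    taxa = sorted({taxon for aln in alignments.values() for taxon in aln})
    rows: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    partitions: List[Tuple[str, int, int]] = []
    start = 1

    # Genes are processed in sorted order so the column layout is reproducible.
    for gene in sorted(alignments):
        aln = alignments[gene]
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not aligned to one length")
        width = lengths.pop()

        for taxon in taxa:
            # Taxa without this gene are filled with missing-data symbols.
            rows[taxon].append(aln.get(taxon, missing * width))

        partitions.append((gene, start, start + width - 1))
        start += width

    matrix = {taxon: "".join(parts) for taxon, parts in rows.items()}
    return matrix, partitions


if __name__ == "__main__":
    # Toy data for illustration only; not real GenBank sequences.
    toy = {
        "rbcL": {"Arabidopsis": "ATGC", "Oryza": "ATGA"},
        "matK": {"Arabidopsis": "GG-T", "Pinus": "GGCT"},
    }
    matrix, partitions = build_supermatrix(toy)
    for taxon, seq in matrix.items():
        print(f"{taxon:12s} {seq}")
    print(partitions)
```

Keeping the partition boundaries alongside the matrix means each gene can later be analyzed under its own substitution model, and padding with missing data lets sparsely sampled taxa remain in the matrix instead of being dropped.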

