2024 W. Main Street, Suite A200
Durham, NC 27705-4667
Tel: (919) 668-4551
Fax: (919) 668-9198


Computational Phyloinformatics

A Course at NESCent

Phylogenetics is key to studying evolution, systematics, comparative genomics, and bioinformatics, and phylogenies are now ubiquitous in the biological literature. However, with the growth in computational power and DNA sequencing, and with ever more complex substitution models and analytical methods, it is less and less practical to run simple, one-shot analyses on a personal computer with an off-the-shelf program. As a result, we increasingly rely on custom-scripted analyses or custom-designed computational pipelines, often run on large compute servers or clusters. While books and courses on phylogenetics are common, it is much harder to find guidance on scripting large-scale, complex analyses or on writing your own scripts and programs.

This course aims to fill this gap by teaching these skills in a practical, hands-on setting at NESCent. Compared with the first course, held in 2007, this year's course places even more emphasis on hands-on training in Perl, Java, and R, three of the most popular languages for scripting and programming phylogenetic and comparative analysis workflows. The course has also been restructured so that students with less prior programming experience can fully benefit from the material.

The course is divided into three parts:

  • Part I: Students bring their programming skills up to par with a review and tutorial in either Perl, Java, or R (while optional, this part is strongly recommended).
  • Part II: Students have the choice of pursuing (1) a Perl track, with focus on BioPerl and Bio::Phylo; (2) a Java track, with focus on how to write a Mesquite module and program workflows that utilize Mesquite modules; or (3) an R track, with a focus on programming using phylogenetic libraries Ape, Ouch, and Phylobase.
  • Part III: Students choose between (1) an SQL track (with a focus on BioSQL and querying tree topologies); (2) a HyPhy track (with a focus on scripting hypothesis tests in a phylogenetic framework); or (3) an advanced R track (with a focus on automating analyses using vectorized calculations, advanced plotting and animation, and combined R/LaTeX documents using Sweave).

Regardless of which tracks they choose, students will learn how to write basic phylogenetic or comparative analysis scripts: parsing NEXUS files, traversing and computing over trees, and making practical use of phylogenetic libraries. These skills will be taught in a biological context, touching on a diverse array of topics (depending on the track) such as automated base calling, ancestral state and continuous character reconstruction, model selection, and parametric bootstrapping.
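As a concrete illustration of the first two skills, here is a minimal sketch that reads the tree string from a small NEXUS TREES block, parses the Newick description into linked nodes, and walks the tree in post-order to list its tips and sum its branch lengths. It is written in Python only for brevity (the course itself uses Perl, Java, and R), and every function name in it is hypothetical, not part of BioPerl, Bio::Phylo, Mesquite, or ape. It assumes simple input: unquoted labels, one TREE command per line, and no TRANSLATE table.

```python
import re
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Node:
    """One tree node: an optional label, its branch length, and its children."""
    label: Optional[str] = None
    length: float = 0.0
    children: List["Node"] = field(default_factory=list)


def tree_strings_from_nexus(text: str) -> List[str]:
    """Collect the Newick strings from 'TREE name = ...;' lines in a TREES block.

    Simplified (hypothetical helper): bracketed comments such as [&R] are
    stripped, but TRANSLATE tables, quoted labels, and TREE commands that
    span several lines are not handled.
    """
    in_trees = False
    trees = []
    for line in text.splitlines():
        stmt = re.sub(r"\[[^\]]*\]", "", line).strip()  # drop [...] comments
        upper = stmt.upper()
        if upper.startswith("BEGIN TREES"):
            in_trees = True
        elif upper.startswith("END"):
            in_trees = False
        elif in_trees and upper.startswith("TREE "):
            trees.append(stmt.split("=", 1)[1].strip())
    return trees


def parse_newick(newick: str) -> Node:
    """Recursive-descent parser for simple Newick strings like '((A:1,B:2):0.5,C:3);'."""
    s = "".join(newick.split())  # remove whitespace (assumes unquoted labels)
    pos = 0

    def peek() -> str:
        return s[pos] if pos < len(s) else ""

    def read_until(stops: str) -> str:
        nonlocal pos
        start = pos
        while pos < len(s) and s[pos] not in stops:
            pos += 1
        return s[start:pos]

    def subtree() -> Node:
        nonlocal pos
        node = Node()
        if peek() == "(":
            pos += 1  # consume '('
            node.children.append(subtree())
            while peek() == ",":
                pos += 1  # consume ','
                node.children.append(subtree())
            if peek() != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ')'
        node.label = read_until(",():;") or None  # optional node label
        if peek() == ":":
            pos += 1  # consume ':'
            node.length = float(read_until(",();"))  # optional branch length
        return node

    return subtree()


def postorder(node: Node) -> Iterator[Node]:
    """Yield every node after its descendants (children first, then parent)."""
    for child in node.children:
        yield from postorder(child)
    yield node


def tip_labels(root: Node) -> List[str]:
    """Labels of the tips (nodes without children), in left-to-right order."""
    return [n.label for n in postorder(root) if not n.children]


def tree_length(root: Node) -> float:
    """Total tree length: the sum of all branch lengths."""
    return sum(n.length for n in postorder(root))


nexus = """#NEXUS
BEGIN TREES;
    TREE t1 = [&R] ((A:1.0,B:2.0):0.5,C:3.0);
END;
"""
root = parse_newick(tree_strings_from_nexus(nexus)[0])
print(tip_labels(root))   # ['A', 'B', 'C']
print(tree_length(root))  # 6.5
```

Post-order traversal is used here because many per-tree computations need each node's children processed before the node itself, accumulating values from the tips toward the root. A real analysis would rely on a full NEXUS reader rather than this sketch.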


When: July 24 - August 4, 2008

Where: National Evolutionary Synthesis Center, 2024 W. Main Street, Suite A200, Durham, NC 27705

Application Deadline: April 22, 2008
Target Students: The course is intended for graduate students, post-docs, and researchers in biology who are interested in developing skills in phyloinformatics.
Prerequisites: Biology: A solid understanding of phylogenetics, for example from having taken the Workshop on Molecular Evolution, equivalent coursework, or equivalent experience.
  Computing: Prior experience with Perl, Java, or R (or having studied the books we recommend on these languages), and prior experience with basic Unix operations. We will offer two days of review to bring everyone up to speed, but the onus is on students to study these languages ahead of time.