Working Group
The potential for synthetic research based on aggregating, integrating, and re-using data is enormous, yet most resources are not interoperable. To realize this potential, software and databases that handle evolutionary trees (and their associated annotations) must be interoperable. Interoperability, in turn, requires tools based on common standards. In the past few years, evolutionary informaticists, with help from NESCent, have been building a software toolbox for solving interoperability problems, based on the EvoIO “stack” of NeXML, CDAO and PhyloWS. This toolbox makes it possible to begin building a worldwide network of interoperable evolutionary resources. The HIP (Hackathons, Interoperability, Phylogenies) working group aims to use the hackathon mechanism (which we have helped to develop at NESCent) to grow this network directly, by adding links to it, and indirectly, by creating examples for others to follow. To support this project within a working-group budget, we leverage support from strategic partners. Each of the planned series of three hackathons will bring together scientific programmers with related challenges. The hackathons target early-career scientists, who often have the most technical expertise and the most potential to pass along their skills and enthusiasm.
HIP: Hackathons, Interoperability, Phylogenies
PI(s): Arlin Stoltzfus (Center for Advanced Research in Biotechnology), Enrico Pontelli (New Mexico State University), Rutger Vos (University of Reading)
Start Date: 1-May-2011
End Date: 30-Apr-2013
Keywords: database, software, meta-analysis, phylogenetics, biodiversity
Related products
Software and Datasets
- Vaidya, G. 2013. Species Autocomplete. This tool provides basic JavaScript and PHP code to run an autocompletion script. An included Perl script allows the ITIS-DwCA resource to be used to generate the autocompletion database (in SQLite). Further improvements may allow this software to be integrated into websites that need to validate the input of scientific names, either for data entry or to power searches.
- The TNRS group contacted David Mitchell, ITIS Data Specialist at the NMNH, who gave us permission to make the entire ITIS database available as a DarwinCore Archive. This file was generated by dwca-hunter (https://github.com/GlobalNamesArchitecture/dwca-hunter), a Ruby program developed by the Global Names Architecture. I wrote a small script to automatically organize dwca-hunter's results, and have been running it regularly and uploading the results to http://gaurav.github.com/itis-dwca/.
- A major upgrade to this well-used tool. Phylomatic version 3 is a fork of the phylocom phylomatic code and exists only as a web service. It can now read in trees as NeXML and CDAO, graft taxa into the megatree, and write out in a number of formats.
- A proof-of-concept JavaScript framework that invokes the right Phylotastic REST services at the right time.
- The Newick to CDAO ingestor is a Perl module. The module takes as input a tree described in Newick format and produces a CDAO representation of the tree. The module is capable of contacting the Phylotastic TNRS to resolve names and adds the result of the name resolution to the CDAO representation of the tree.
- To provide end-users with a familiar graphical user interface for accessing PhyloTastic services, I have developed several wrapper classes that enable interaction with TNRS, DateLife, BabelPhysh and pruning functionality within the Galaxy web application. A demo Galaxy instance is available at http://galaxy.phylotastic.org, the source code is at https://github.com/phylotastic/arch-galaxy, and a screencast that demonstrates the currently available functionality is at http://youtu.be/kMME658xOu4
- Jim Balhoff, Karen A. Cranston, Mark T. Holder, Hilmar Lapp, Emily J. McTavish, and Enrico Pontelli. June, 2012. PhylotasticTreeStore. PhylotasticTreeStore RESTful adaptor for RDF-based tree store. It is a web2py application, which provides a RESTful interface by translating queries for trees into SPARQL queries to a triple store and then using DendroPy to translate the resulting RDF to NeXML. Built at the Phylotastic hackathon. Christopher Baron, Jeet Sukumaran, and Cam Webb provided helpful feedback.
- The CDAO Comparative Data Analysis Ontology was revised to meet OBO library ontology standards, such as numeric class identifiers.
- Lapp H. 2012. Ontology and RDF model for Taxonomic Name Resolution Service (TNRS) results. The ontology describes the entities that make up a TNRS result and the relationships among them, as well as the relationships between an OTU and a TNRS resolution result. The RDF model is accompanied by an instance document and a graph visualization.
- Rutger Vos, 2012. A phylotastic pruning service based on MapReduce. HIP working group of NESCent. This pruner was developed to provide automated pruning services, as part of the Phylotastic project. Given a set { S } of OTU names, and the name of a source tree, the pruner returns a topology for the OTUs that it can match from { S }. This kind of pruning can be done by recursive calls into a database (which probably would need to hit the database many times) or by loading the whole tree into memory (which might take a while to read in the file, and cost a bit of memory). The way it is done here is much cooler, because it never requires the whole tree to be in memory or in a database: the pruning is done in parallel using MapReduce. Some tests on the entire dump of the Tree of Life Web Project showed that this returns a pruned subtree within a few seconds, fast enough for a web service. The pruner has two interfaces, a web forms interface (with explanatory text and examples) and a web-services interface. The code and some documentation are available at a location indicated on the web page.
- A proof-of-concept SADI-based web service which uses RDF and SPARQL to return subtrees from larger phylogenetic trees.
- Taxosaurus is a meta TNRS that implements the TNRastic API. It is composed of two main modules (the handler_library and the processor) that sit behind an HTTP handler. The handler_library implements the TNRastic API, whereas the processor coordinates the execution of the downstream calls to the sources. The processor itself has a modular design that allows new services to be added via adaptors that are registered through a simple JSON description.
- The TNRastic API is a lightweight RESTful API specification that provides a generalized framework to access Taxonomic Name Resolution Services. It's composed of a set of services that are essential for name resolution.
- The Perl controller coordinates stub CGI implementations of the Phylotastic TNRS, tree store, topology, and branch length services which produce correct output for one example input. However, the user may substitute real service implementations into the controller workflow via CGI parameters, allowing the services to be tested for conformance to the Phylotastic specification. Usage instructions for the CGI controller, along with example input/output files for the stub services, are provided at https://github.com/phylotastic/cgi.
- Reconcili-o-tastic starts with a gene tree, discovers the species sources, gets a tree for the species on the fly (phylotastically), then runs reconciliation software to identify which branchings represent speciations vs duplications.
- Midford, P. E. 2012. Mesquite-o-tastic - a Mesquite package for retrieving trees from Phylotastic. This is a prototype package that allows a user to retrieve a tree from phylotastic that matches the taxa present in a Mesquite character matrix. See the demo video by Arlin Stoltzfus at http://www.youtube.com/watch?v=Lak-zjwFuhQ&feature=youtube_gdata_player
- The phylogeny from Goloboff, et al. "Phylogenetic analysis of 73 060 taxa corroborates major eukaryotic groups" (Cladistics 2009, 25:211-230) is a valuable resource, useful for the Phylotastic project. However, it is available only in TNT format, a nested-parenthesis format like Newick. The tree file has only numeric codes, with names encoded in a separate file. Converting this information into a single Newick tree is a 3-step process: (1) convert the TNT trees (in any of the *.tre files) to Newick; (2) pick the tree you want and put it in a file by itself; (3) replace the numeric codes in the tree with species names from Taxon_Names_Only.tnt. I developed and tested Perl scripts for steps 1 and 3. The scripts contain documentation using POD. They are available from a public repository using the URL below.
- Landing page for the Phylotastic project
- Arlin Stoltzfus, Hilmar Lapp, Naim Matasci, Helena Deus, Brian Sidlauskas, Christian M Zmasek, Gaurav Vaidya, Enrico Pontelli, Karen Cranston, Rutger Vos, Campbell O Webb, Luke J Harmon, Megan Pirrung, Brian O'Meara, Matthew W Pennell, Siavash Mirarab, Michael S Rosenberg, James P Balhoff, Holly M Bik, Tracy A Heath, Peter E Midford, Joseph W Brown, Emily Jane McTavish, Jeet Sukumaran, Mark Westneat, Michael E Alfaro, Aaron Steele and Greg Jordan. 2013. Phylotastic! Making tree-of-life knowledge accessible, reusable and convenient. BMC Bioinformatics, volume 14, issue 1, pp. 158
- Arlin Stoltzfus, Enrico Pontelli and Brian O'Meara. 2014. "Collaborative Research: ABI Development: An open infrastructure to disseminate phylogenetic knowledge". National Science Foundation, 3 years funding from July 2015 to 2018.
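The pruning service listed above returns a topology for the names in { S } without loading the whole source tree. As a rough illustration of why that task splits into independent pieces, here is a minimal Python sketch. It is not the pruner's actual code: the child-to-parent dictionary, the function names, and the in-process "map" and "reduce" steps are all assumptions made for illustration. Each matched tip is mapped to its path to the root, and the paths are then merged and unbranched nodes collapsed.

```python
from collections import defaultdict

def path_to_root(tip, parent):
    """Map step: the tip followed by each of its ancestors up to the root."""
    path = [tip]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path

def prune(parent, names):
    """Return a Newick topology for the requested names found as tips.

    `parent` is a hypothetical child -> parent lookup for the source tree.
    """
    internal = set(parent.values())
    tips = [n for n in sorted(set(names)) if n in parent and n not in internal]
    if not tips:
        return ""
    # Map: one root path per matched tip; each path is computed independently,
    # which is what would let this step run in parallel.
    paths = [path_to_root(t, parent) for t in tips]
    # Reduce: merge the paths into the child lists of the induced subtree.
    children = defaultdict(set)
    for path in paths:
        for child, par in zip(path, path[1:]):
            children[par].add(child)
    root = paths[0][-1]

    def newick(node):
        kids = sorted(children.get(node, ()))
        if not kids:
            return node
        if len(kids) == 1:
            return newick(kids[0])  # collapse a node left with one child
        return "(" + ",".join(newick(k) for k in kids) + ")"

    return newick(root) + ";"

# Example source tree: (((A,B)n1,C)n2,D)root
example_parent = {"A": "n1", "B": "n1", "n1": "n2", "C": "n2", "n2": "root", "D": "root"}
print(prune(example_parent, ["A", "C", "D", "X"]))  # "X" is not in the tree and is skipped
```

Names that cannot be matched are silently dropped, mirroring the description that the pruner returns a topology only for the OTUs it can match.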
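Step 3 of the Goloboff tree conversion listed above (replacing numeric codes with species names) can be sketched in Python as follows. This is not the Perl script mentioned in that entry. The layout of Taxon_Names_Only.tnt is not described here, so the sketch assumes the code-to-name mapping has already been loaded into a dictionary; the function name and the example names are hypothetical, and names are assumed to contain no spaces or Newick punctuation.

```python
import re

# A tip label is a run of digits that follows "(" or "," and precedes
# ",", ")", ":" or ";". Branch lengths follow ":" and so are left alone.
LABEL = re.compile(r"(?<=[(,])(\d+)(?=[,):;])")

def relabel(newick, names):
    """Replace numeric tip codes in a Newick string using a code -> name dict.

    Codes missing from `names` are left unchanged.
    """
    def swap(match):
        code = match.group(1)
        return names.get(code, code)
    return LABEL.sub(swap, newick)

# Hypothetical mapping, standing in for the contents of the names file.
example_names = {"0": "Homo_sapiens", "1": "Pan_troglodytes", "2": "Mus_musculus"}
print(relabel("((0:1.5,1:2),2:10);", example_names))
```

Leaving unknown codes in place, rather than raising an error, makes it easy to spot taxa that are missing from the names file in the output tree.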