Application of taxon concepts in biodiversity informatics

PI(s): Robert Peet (University of North Carolina-Chapel Hill)
Start Date: 1-Sep-2008
End Date: 15-Aug-2009

Accurate identification and labeling of organisms is a critical part of collecting, recording, and reporting biological data. Names are also required for any subsequent re-interpretation or reuse of archived data, and for discovery of relevant articles and archived datasets. However, unless biologists label their data according to an agreed, stable naming standard, it will be impossible to accurately resolve, integrate, and compare datasets with respect to the entities studied. Unfortunately, owing to the continual discovery of new information, a standard naming system with static, globally unambiguous identifiers for groups of organisms is unattainable. Although time-tested rules of nomenclature determine which scientific names should be applied to taxa in a particular taxonomic classification, these classifications evolve over time. Because the processes of nomenclature and taxonomy are partly independent, the circumscription of the organisms associated with a scientific name may vary with taxonomic revision, with the taxonomic authority cited, and even with the geographic range of the classification. Consequently, a valid scientific name can have multiple taxonomic interpretations, and multiple scientific names can refer to the same taxon.

Increasingly, research in biodiversity and ecology is based on the integration (and re-use) of multiple datasets discovered and obtained over the web. Integration and synthesis of large quantities of mixed-provenance data containing organism identifications becomes nearly impossible if based only on traditional Linnaean nomenclature. Over the last several years I have worked with colleagues to resolve this problem by developing a new approach in which organisms are identified by ‘taxon concept’, which in its simplest form can be thought of as a name as used by a specific authority. My principal objectives for this leave will be to develop, refine, and implement protocols, procedures, and standards for documenting relationships among concepts and for identifying organisms using concepts. This work should significantly contribute to moving concept-based biodiversity informatics from theoretical discussion to productive implementation, thereby creating new opportunities for large-scale data sharing, integration, and synthesis.
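To illustrate why a name alone is not enough, the following is a minimal sketch, not the project's actual data model, that treats a taxon concept as a (name, authority) pair. Everything in it is hypothetical: the class `TaxonConcept`, the placeholder names "Genus alpha" and "Genus beta", the authorities "Flora A (1990)" and "Flora B (2005)", the plot records, and the single "includes" relationship type. The scenario assumes Flora A treats "Genus alpha" broadly while Flora B splits it into a narrower "Genus alpha" and a separate "Genus beta".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaxonConcept:
    """A scientific name as used by a specific authority (hypothetical model)."""
    name: str
    authority: str  # the treatment whose circumscription is meant


# Placeholder concepts: Flora A treats "Genus alpha" broadly; Flora B splits it in two.
ALPHA_A = TaxonConcept("Genus alpha", "Flora A (1990)")
ALPHA_B = TaxonConcept("Genus alpha", "Flora B (2005)")
BETA_B = TaxonConcept("Genus beta", "Flora B (2005)")

# Hypothetical asserted relationships, keyed (broader, narrower) -> "includes".
RELATIONSHIPS = {
    (ALPHA_A, ALPHA_B): "includes",
    (ALPHA_A, BETA_B): "includes",
}

# Hypothetical plot records, each labeled with the concept the observer applied.
RECORDS = [("plot-1", ALPHA_A), ("plot-2", ALPHA_B), ("plot-3", BETA_B)]


def plots_by_name(name, records):
    """Name-only lookup: ignores which authority's circumscription was meant."""
    return [plot for plot, concept in records if concept.name == name]


def plots_within(target, records, relationships):
    """Plots labeled with `target` or with a concept asserted to fall inside it.

    Only direct assertions are followed; chains of relationships are not resolved.
    """
    return [
        plot
        for plot, concept in records
        if concept == target or relationships.get((target, concept)) == "includes"
    ]
```

With this data, `plots_by_name("Genus alpha", RECORDS)` returns `["plot-1", "plot-2"]`. That result mixes the broad and narrow senses of the name and misses plot-3. By contrast, `plots_within(ALPHA_A, RECORDS, RELATIONSHIPS)` returns all three plots, because the concept relationships record how Flora B's two concepts fit inside Flora A's broader one.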