About the Authors:
Indra Neil Sarkar
Affiliations: Center for Clinical and Translational Science, University of Vermont, Burlington, Vermont, United States of America; Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, Vermont, United States of America; Department of Computer Science, University of Vermont, Burlington, Vermont, United States of America
Michael Trizna
Affiliation: Consortium for the Barcode of Life, Smithsonian Institution, Washington D.C., United States of America
Introduction
Computational approaches have increasingly become a keystone of progress in biological science [1], [2]. This has especially been the case in light of significant advances in sequencing technologies, such as those employed by major initiatives like DNA Barcoding [3], [4]. The basic premise of DNA Barcoding is to use defined segments of molecular sequence data for robust identification of life on Earth. DNA Barcoding has emerged as a systematic framework for the identification of fauna [4], [5] (and, more recently, flora [6]) according to a standardized reference library of molecular sequences [7], [8]. A number of tools and approaches have been developed for the analysis of DNA Barcode data [9], [10], [11], [12], including those associated with classification of previously unidentified sequences. Collectively, the analytic tasks in DNA Barcoding span a wide range of procedures (from taxon identification to data visualization to linkage with complementary knowledge sources). However, the community at large has been constrained by the limited availability of these innovative approaches.
To meet the ultimate goal of developing robust and efficient mechanisms for identification of biota using fragments of molecular sequence, the DNA Barcoding community will depend on a reliable framework for leveraging the latest innovations in classification approaches. Before approaches and classification systems are incorporated into production-level interfaces (such as the Barcode of Life Data System [BOLD; http://www.boldsystems.org]), researchers may benefit greatly from exploring a range of computational approaches that are still in an “experimental” phase.
Service-oriented workflow management and analysis toolkits (e.g., Taverna [14] or CIPRES [http://www.phylo.org/]) have been developed to serve a range of communities. These systems are pioneering a new set of computational approaches that integrate analytic methods across the Internet through the use of “Web services.” Amidst...