About the Authors:
Guillaume Marçais
Roles Conceptualization, Data curation, Investigation, Methodology, Software, Validation, Writing - original draft, Writing - review & editing
* E-mail: [email protected] (GM); [email protected] (AZ)
Affiliations: Institute for Physical Science and Technology, University of Maryland, College Park, Maryland, United States of America; Computational Biology Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
ORCID http://orcid.org/0000-0002-5083-5925
Arthur L. Delcher
Roles Data curation, Software, Validation, Writing - review & editing
Affiliation: Center for Computational Biology, Johns Hopkins School of Medicine, Baltimore, Maryland, United States of America
Adam M. Phillippy
Roles Data curation, Software, Validation, Writing - review & editing
Affiliation: National Human Genome Research Institute, Bethesda, Maryland, United States of America
Rachel Coston
Roles Formal analysis, Validation
Affiliation: Center for Computational Biology, Johns Hopkins School of Medicine, Baltimore, Maryland, United States of America
Steven L. Salzberg
Roles Funding acquisition, Methodology, Validation, Writing - review & editing
Affiliations: Center for Computational Biology, Johns Hopkins School of Medicine, Baltimore, Maryland, United States of America; Departments of Biomedical Engineering, Computer Science, and Biostatistics, Johns Hopkins University, Baltimore, Maryland, United States of America
Aleksey Zimin
Roles Conceptualization, Data curation, Funding acquisition, Investigation, Methodology, Project administration, Software, Supervision, Validation, Visualization, Writing - original draft, Writing - review & editing
Affiliations: Institute for Physical Science and Technology, University of Maryland, College Park, Maryland, United States of America; Center for Computational Biology, Johns Hopkins School of Medicine, Baltimore, Maryland, United States of America
ORCID http://orcid.org/0000-0001-5091-3092
Abstract
The MUMmer system and the genome sequence aligner nucmer included within it are among the most widely used alignment packages in genomics. Since the last major release of MUMmer version 3 in 2004, it has been applied to many types of problems including aligning whole genome sequences, aligning reads to a reference genome, and comparing different assemblies of the same genome. Despite its broad utility, MUMmer3 has limitations that can make it difficult to use for large genomes and for the very large sequence data sets that are common today. In this paper we describe MUMmer4, a substantially improved version of MUMmer that addresses genome size constraints by changing the 32-bit suffix tree data structure at the core of MUMmer to a 48-bit suffix array, and that offers improved...