Wednesday, 30 April 2014

PubMed highlight: Comparison of mapping algorithms

Mapping millions of reads to a reference genome sequence is, in most cases, the first step in NGS data analysis. Proper mapping is essential for downstream variant identification and for assessing the quality of each sequenced base. Various tools exist to perform this task, and this paper presents an interesting new tool to benchmark results from different aligners. The authors developed CuReSim, a tool that generates simulated read datasets resembling different NGS technologies, and CuReSimEval, which evaluates the performance of a given aligner on the simulated dataset.
In the paper they apply this new method to compare the performance of some popular aligners (such as BWA, TMAP and Bowtie) on Ion Torrent data. "The application of this procedure to Ion Torrent data from the whole genome sequencing of small genomes [...] demonstrate that it is helpful for selecting a mapper based on the intended application, questions to be addressed, and the technology used," they report in the abstract.

Comparison of mapping algorithms used in high-throughput sequencing: application to Ion Torrent data.
Caboche S, Audebert C, Lemoine Y, Hot D

Abstract
BACKGROUND: The rapid evolution in high-throughput sequencing (HTS) technologies has opened up new perspectives in several research fields and led to the production of large volumes of sequence data. A fundamental step in HTS data analysis is the mapping of reads onto reference sequences. Choosing a suitable mapper for a given technology and a given application is a subtle task because of the difficulty of evaluating mapping algorithms.
RESULTS: In this paper, we present a benchmark procedure to compare mapping algorithms used in HTS using both real and simulated datasets and considering four evaluation criteria: computational resource and time requirements, robustness of mapping, ability to report positions for reads in repetitive regions, and ability to retrieve true genetic variation positions. To measure robustness, we introduced a new definition for a correctly mapped read taking into account not only the expected start position of the read but also the end position and the number of indels and substitutions. We developed CuReSim, a new read simulator, that is able to generate customized benchmark data for any kind of HTS technology by adjusting parameters to the error types. CuReSim and CuReSimEval, a tool to evaluate the mapping quality of the CuReSim simulated reads, are freely available. We applied our benchmark procedure to evaluate 14 mappers in the context of whole genome sequencing of small genomes with Ion Torrent data for which such a comparison has not yet been established.
CONCLUSIONS: A benchmark procedure to compare HTS data mappers is introduced with a new definition for the mapping correctness as well as tools to generate simulated reads and evaluate mapping quality. The application of this procedure to Ion Torrent data from the whole genome sequencing of small genomes has allowed us to validate our benchmark procedure and demonstrate that it is helpful for selecting a mapper based on the intended application, questions to be addressed, and the technology used. This benchmark procedure can be used to evaluate existing or in-development mappers as well as to optimize parameters of a chosen mapper for any application and any sequencing platform.
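
To make the idea of a stricter "correctly mapped read" more concrete, here is a minimal Python sketch. It is not the CuReSimEval implementation: the `Alignment` record, the `is_correctly_mapped` function and both tolerance parameters are hypothetical names invented for illustration, and the exact thresholds used in the paper may differ. The sketch only shows the general principle from the abstract, namely that a read counts as correct when its start position, its end position and its numbers of indels and substitutions all agree with what the simulator expected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    """Hypothetical summary of one read alignment (not a CuReSim format)."""
    start: int          # leftmost reference position of the read
    end: int            # rightmost reference position of the read
    indels: int         # number of insertions plus deletions
    substitutions: int  # number of mismatched bases


def is_correctly_mapped(expected, observed, pos_tolerance=0, edit_tolerance=0):
    """Return True if the observed alignment agrees with the expected one.

    Assumption: agreement means that start, end, indel count and
    substitution count each differ by no more than the given tolerance.
    """
    start_ok = abs(observed.start - expected.start) <= pos_tolerance
    end_ok = abs(observed.end - expected.end) <= pos_tolerance
    indels_ok = abs(observed.indels - expected.indels) <= edit_tolerance
    subs_ok = abs(observed.substitutions - expected.substitutions) <= edit_tolerance
    return start_ok and end_ok and indels_ok and subs_ok


# Simulated truth for one read, and a mapper result with a shifted end position.
truth = Alignment(start=100, end=199, indels=1, substitutions=2)
shifted_end = Alignment(start=100, end=205, indels=1, substitutions=2)

print(is_correctly_mapped(truth, truth))
print(is_correctly_mapped(truth, shifted_end))
print(is_correctly_mapped(truth, shifted_end, pos_tolerance=10))
```

A check based only on the start position would accept the second alignment, while the stricter check rejects it unless a position tolerance is allowed. That is the kind of difference the new definition is meant to capture.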