
Saturday, 25 May 2013

Evaluating variant detection methods: comparison of aligners and callers

Most current NGS studies aim to identify genetic variants related to a condition of interest. To get to this final result you start with your bunch of sequencing reads, align them to a reference genome, refine the aligned data and finally call the variants, both SNPs and indels... Straightforward, isn't it? Actually, there are several tools to perform each of these steps, and each of them produces different results and relies on different algorithms, which makes it more suitable for specific applications. So deciding which one is better for your NGS data analysis is certainly not so easy...

Recently I came across this good comparison of variant detection pipelines published on the Blue Collar Bioinformatics blog. It considers the major aligners (bwa and novoalign), post-alignment processing (using popular tools such as Picard and samtools rmdup) and variant callers (GATK UnifiedGenotyper and HaplotypeCaller, and freebayes).
For every step the author reports detailed metrics on the SNPs and indels called, their concordance and so on, giving a framework for evaluating the various solutions and assembling your own analysis workflow.
For example, in this picture from the original post on Blue Collar you can see how even the choice of aligner can impact your final variant dataset, mainly due to different strictness in dealing with indels, which results in different depth of coverage in some regions.
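To give an idea of what "concordance" means in practice, here is a minimal Python sketch that compares the calls from two pipelines by exact match on chromosome, position, reference and alternate allele. This is not the method used in the Blue Collar post: the function names (`variant_keys`, `concordance`) and the toy records are made up for illustration, and a real comparison would also need to normalize indel representation and look at genotypes.

```python
def variant_keys(vcf_lines):
    """Collect (chrom, pos, ref, alt) keys from VCF-formatted text lines."""
    keys = set()
    for line in vcf_lines:
        # Skip blank lines and VCF header lines ("##..." and "#CHROM...")
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
        # A multi-allelic record contributes one key per alternate allele
        for alt in alts.split(","):
            keys.add((chrom, int(pos), ref, alt))
    return keys


def concordance(calls_a, calls_b):
    """Summarize the overlap between two sets of variant keys."""
    shared = calls_a & calls_b
    union = calls_a | calls_b
    return {
        "concordant": len(shared),
        "only_a": len(calls_a - calls_b),
        "only_b": len(calls_b - calls_a),
        "fraction_concordant": len(shared) / len(union) if union else 0.0,
    }


# Toy records, invented for illustration only (not real calls)
caller_a = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr1\t100\t.\tA\tG",
    "chr1\t200\t.\tC\tT",
    "chr2\t50\t.\tGA\tG",
]
caller_b = [
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr1\t100\t.\tA\tG",
    "chr2\t50\t.\tGA\tG",
    "chr2\t75\t.\tT\tC,G",
]

summary = concordance(variant_keys(caller_a), variant_keys(caller_b))
print(summary)
```

Exact matching like this tends to undercount concordance for indels, because the same event can be written at different positions by different callers.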


As reported by the author: "This evaluation work is part of a larger community effort to better characterize variant calling methods. A key component of these evaluations is a well characterized set of reference variations for the NA12878 human HapMap genome, provided by NIST’s Genome in a Bottle consortium. The diagnostic component of this work supplements emerging tools like GCAT (Genome Comparison and Analytic Testing), which provides a community platform for comparing and discussing calling approaches."

Don't miss this!
