This paper recently published on Genome Medicine analyzes the differences in variants annotation when using different annotation tools and transcript datasets. Authors extensively report on pitfalls and peculiar aspects of the two mostly diffused softwares (ANNOVAR and VEP), using transcripts definition from both RefSeq and Ensembl databases.
From the paper abstract clearly emerges a high level of discrepancy, expecially for functional variants (namely missense and LoF) that are the far the most relevant in NGS analysis. "We find only 44% agreement in annotations for putative loss-of-function variants when using the REFSEQ and ENSEMBL transcript sets as the basis for annotation with ANNOVAR. The rate of matching annotations for loss-of-function and nonsynonymous variants combined is 79% and for all exonic variants it is 83%. When comparing results from ANNOVAR and VEP using ENSEMBL transcripts, matching annotations were seen for only 65% of loss-of-function variants and 87% of all exonic variants, with splicing variants revealed as the category with the greatest discrepancy" authors write.
What impress me the most is the low level of concordance between different transcript datasets, reflecting the fact that also the annotation of mRNA forms are far from definitely established.
So be careful with your annotations!
Here the full paper:
Choice of transcripts and software has a large effect on variant annotation
Davis J McCarthy, Peter Humburg, Alexander Kanapin, Manuel A Rivas, Kyle Gaulton, The WGS500 Consortium, Jean-Baptiste Cazier and Peter Donnelly
No comments:
Post a Comment