Thursday, 16 February 2012
Broken genes in healthy people: sequencing errors and genome robustness
A new paper published in Science by Daniel MacArthur provides the first accurate estimates of the presence and impact of loss-of-function (LoF) variants in healthy human genomes.
This extensive analysis, based on data from the 1000 Genomes Project, applied both computational and experimental filters to distinguish true LoF variants from those that are due to errors in sequencing, variant calling algorithms or gene annotation. The results indicate that LoF variants are subject to strong purifying selection, which tends to eliminate them from populations. On average, each genome harbors about 100 genuine LoF variants, most of them in a heterozygous state or in genes with few protein interactions. Completely disrupted genes are therefore rare, and previous estimates of their abundance have to be revised downward. Still, the fact that knocking out some genes produces no apparent phenotypic effect invites us to reconsider the robustness of the human genome and the redundancy of its genes.
But the most interesting aspect, also stressed by the author in his post on MassGenomics, is the evaluation of the errors that produce a high false positive rate when identifying LoF variants. The authors state that "the greater the predicted functional impact of a sequence variant, the more likely it is to be a false positive". The reason is that disruptive mutations are subject to purifying selection and thus tend to be removed from the population, whereas errors are equally distributed across all types of variants. False positives therefore make up a larger proportion of the calls when looking at LoF variants.
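The arithmetic behind that statement can be sketched in a few lines of Python. This is only an illustration: the variant classes, the counts of true variants and the number of erroneous calls per class are made-up values, not figures from the paper, and assuming the same number of errors in every class is a simplification of "errors are equally distributed".

```python
# Hypothetical counts of TRUE variants per class in one genome.
# Purifying selection leaves few true variants in the most damaging class.
true_variants = {
    "synonymous": 10000,
    "missense": 8000,
    "loss_of_function": 100,
}

# Assumption: errors do not care about functional impact, so each class
# receives the same number of erroneous calls (hypothetical value).
errors_per_class = 50


def false_positive_fraction(true_count, error_count):
    """Fraction of all calls in a class that are errors."""
    return error_count / (true_count + error_count)


fractions = {
    name: false_positive_fraction(count, errors_per_class)
    for name, count in true_variants.items()
}

for name, fraction in fractions.items():
    print(f"{name}: {fraction:.1%} of calls are false positives")
```

With these invented numbers, one third of the LoF calls are errors, against roughly half a percent of the synonymous calls, even though the absolute number of errors is identical in every class.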
So in the frenzied search for disease-causing mutations, you'd better be careful!