A brief note before starting: where values in the literature disagree, I've reported each estimate with its own reference. In particular, I think the differences between the values reported by (1) and (2) can be explained mainly by two causes:
1. In (1), the NHLBI GO Exome Sequencing Project uses either Roche/NimbleGen or Agilent reagents for exome capture, while in (2) the 1000 Genomes Project defines the exome according to GENCODE. The first therefore targets sequences based on the NCBI Consensus CDS database (CCDS) (protein-coding genes, some miRNA and snoRNA genes, plus UTRs), while the second includes all protein-coding loci with their alternative transcripts, non-coding loci with transcript evidence, and pseudogenes.
2. The dataset analyzed by the 1000 Genomes Project includes a wider representation of Asian and African populations. Because the reference genome currently adopted is derived essentially from subjects of American/Caucasian origin, this more diverse sample yields a higher average number of variants per individual.
SNVs in Exomes:
Average SNVs per individual exome: 12,400-15,000 (mean 13,600, of which 66% heterozygous) (1); 24,000 (2).
Average indels per individual (2): 440
Expected novel SNVs per exome, given the data currently in public databases: 200-500 (1). Note also that, for any exome sequence, 3.3% of observed heterozygous variants are predicted to be novel according to a recent model of human population growth (3).
Number of SNVs with a functional effect on protein-coding genes expected in one genome: 320-510 (about 95% of functional SNVs are rare, with MAF < 0.5%).
Indels in protein-coding genes: 110-186 (2).
Frameshift indels: 30-50 (2).
SNVs in disease genes reported by HGMD: 41-84 (2).
SNVs in COSMIC (Catalogue Of Somatic Mutations In Cancer) genes: 33-51 (2).
Mean number of SNVs per gene: 30-40 (2).
Large deletions (>100kb) per exome: 39 (2).
Note also that, owing to the recent exponential growth of the human population, rare SNVs are expected to account for about 15-20% of total diversity (1, 3).
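As a rough consistency check, the figures from (1) and (3) can be combined: 66% of about 13,600 SNVs are heterozygous, and 3.3% of those would be novel, i.e. roughly 300 novel variants per exome, inside the 200-500 range of (1). The Python sketch below only does this arithmetic, and it rests on my own assumption (not a claim made by either paper) that the 3.3% fraction from (3) can be applied directly to the heterozygous counts from (1). The function name is hypothetical, for illustration only.

```python
# Back-of-envelope check combining figures quoted from (1) and (3).
# Assumption: the 3.3% novelty fraction from (3) applies directly to the
# heterozygous SNVs counted in (1); the two studies did not combine them.

MEAN_SNVS_PER_EXOME = 13_600   # mean SNVs per exome, from (1)
HET_FRACTION = 0.66            # fraction of SNVs that are heterozygous, from (1)
NOVEL_HET_FRACTION = 0.033     # predicted novel share of het variants, from (3)


def expected_novel_het_snvs(total_snvs: float,
                            het_fraction: float = HET_FRACTION,
                            novel_fraction: float = NOVEL_HET_FRACTION) -> float:
    """Expected number of novel heterozygous SNVs in one exome (hypothetical helper)."""
    het_snvs = total_snvs * het_fraction
    return het_snvs * novel_fraction


if __name__ == "__main__":
    # Low end, mean and high end of the range reported in (1).
    for total in (12_400, MEAN_SNVS_PER_EXOME, 15_000):
        novel = expected_novel_het_snvs(total)
        print(f"{total:>6} SNVs -> ~{novel:.0f} novel heterozygous SNVs")
```

Across the 12,400-15,000 range this gives roughly 270-330 novel heterozygous SNVs per exome.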
Variants in Genome (2):
SNPs / genome (autosomes - ChrX): 3.6 M - 105 k.
Indels / genome (autosomes - ChrX): 344 k - 13 k.
Large deletions / genome (autosomes - ChrX): 717 - 26.
Loss-of-Function variants:
LoF sites per individual: 100-120 (estimated in 7); 30-40 (observed in 1).
Number of genes completely inactivated by homozygous LoF variants: about 20 (estimated in 7); at least 1 (observed in 1).
Genes affected by LoF variants have been reported to be less evolutionarily conserved: they show a higher ratio of protein-altering to silent substitutions in coding regions between human and macaque (P = 2.8 × 10⁻⁵²) and less conservation in their promoter regions (GERP score; P = 3.7 × 10⁻¹⁶). On average, they also have more closely related gene family members (paralogs) than other genes (P = 0.0058) and greater sequence identity to those paralogs (P = 0.0068). These data suggest that LoF variants mainly strike genes with redundant or non-essential functions (7).
De novo SNVs (4):
Mutation rate per gene per cell division: 10⁻⁶ to 10⁻⁷ (5).
Observed de novo SNVs per individual: 74, giving a mutation rate of 1.18 × 10⁻⁸ per position.
Observed de novo indels per individual: 3, giving a mutation rate of 4 × 10⁻¹⁰ per position, with deletions being 3 times more frequent than insertions.
Observed de novo CNVs (>100 kb): about 1 in every 50 individuals. However, it's worth noting that 10% of subjects with intellectual disability, autism spectrum disorders or schizophrenia carry large CNVs.
The number of de novo SNVs and CNVs is strongly influenced by parental age and ethnicity, with an increase of about two mutations per year of paternal age. An exponential model estimates that paternal mutations double every 16.5 years (6). Maternal age, on the other hand, correlates with an increased probability of aneuploidies.
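The per-position rates above are counts divided by the number of sites at risk, so they can be cross-checked with a few lines of Python. This is only back-of-envelope arithmetic: the denominator is not taken from (4) but inferred from the SNV figures (74 / 1.18 × 10⁻⁸ ≈ 6.3 × 10⁹ sites, about twice the length of the haploid human genome) and then reused for indels, which is my assumption. The paternal-age helpers simply restate the linear (about two mutations per year) and exponential (doubling every 16.5 years) models as functions. All function names are hypothetical, chosen for illustration.

```python
# Back-of-envelope arithmetic on the de novo figures quoted from (4) and (6).
# Assumption: rate = (count per individual) / (sites at risk), with the
# sites-at-risk denominator inferred from the SNV numbers and reused for indels.

SNVS_PER_INDIVIDUAL = 74      # observed de novo SNVs, from (4)
SNV_RATE = 1.18e-8            # per position, from (4)
INDELS_PER_INDIVIDUAL = 3     # observed de novo indels, from (4)
DOUBLING_YEARS = 16.5         # paternal doubling time, from (6)


def implied_sites(count: float, rate: float) -> float:
    """Number of positions implied by count = rate * sites."""
    return count / rate


def per_site_rate(count: float, sites: float) -> float:
    """Mutation rate per position for a given count and number of sites."""
    return count / sites


def paternal_fold_change(age: float, reference_age: float,
                         doubling_years: float = DOUBLING_YEARS) -> float:
    """Fold change in paternal de novo count under the exponential model of (6)."""
    return 2 ** ((age - reference_age) / doubling_years)


def paternal_linear_increase(age: float, reference_age: float,
                             per_year: float = 2.0) -> float:
    """Extra mutations under the ~2 mutations/year linear approximation."""
    return per_year * (age - reference_age)


if __name__ == "__main__":
    sites = implied_sites(SNVS_PER_INDIVIDUAL, SNV_RATE)
    indel_rate = per_site_rate(INDELS_PER_INDIVIDUAL, sites)
    print(f"implied sites at risk: {sites:.2e}")
    print(f"implied indel rate:    {indel_rate:.1e} per position")
    print(f"father 40 vs 20, exponential model: x{paternal_fold_change(40, 20):.2f}")
    print(f"father 40 vs 20, linear model:      +{paternal_linear_increase(40, 20):.0f}")
```

With these inputs the implied indel rate is about 4.8 × 10⁻¹⁰ per position, the same order of magnitude as the 4 × 10⁻¹⁰ quoted above. Under the exponential model a 40-year-old father would pass on about 2.3 times as many paternal de novo mutations as a 20-year-old one, versus about 40 extra mutations under the linear approximation.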
References:
(1) Evolution and Functional Impact of Rare Coding Variation from Deep Sequencing of Human Exomes (May 2012)
(3) Recent Explosive Human Population Growth Has Resulted in an Excess of Rare Genetic Variants (May 2012)
(4) De novo mutations in human genetic disease (Aug 2012)
(5) A Quantitative Measurement of the Human Somatic Mutation Rate (Sep 2005)
(6) Rate of de novo mutations and the importance of father’s age to disease risk (Aug 2012)
(7) A Systematic Survey of Loss-of-Function Variants in Human Protein-Coding Genes (Feb 2012)