Friday, 17 May 2013
Deletions of recessive disease genes: CNV contribution to carrier states and disease-causing alleles
Philip M Boone, Ian M Campbell, Brett C Baggett, Zachry T Soens, Mitchell M Rao, Patricia M Hixson, Ankita Patel, Weimin Bi, Sau Wai Cheung, Seema R Lalani, Arthur L Beaudet, Pawel Stankiewicz, Chad A Shaw and James R Lupski
Over 1,200 recessive disease genes have been described in humans. The prevalence, allelic architecture, and per-genome load of pathogenic alleles in these genes remain to be fully elucidated, as does the contribution of DNA copy-number variants (CNVs) to carrier status and recessive disease. We mined CNV data from 21,470 individuals obtained by array comparative genomic hybridization in a clinical diagnostic setting to identify deletions encompassing or disrupting recessive disease genes. We identified 3,212 heterozygous potential carrier deletions affecting 419 unique recessive disease genes. Deletion frequency of these genes ranged from one occurrence to 1.5%. When compared with recessive disease genes never deleted in our cohort, the 419 recessive disease genes affected by at least one carrier deletion were longer and were located farther from known dominant disease genes, suggesting that the formation and/or prevalence of carrier CNVs may be affected by both local and adjacent genomic features and by selection. Some subjects had multiple carrier CNVs (307 subjects) and/or carrier deletions encompassing more than one recessive disease gene (206 deletions). Heterozygous deletions spanning multiple recessive disease genes may confer carrier status for multiple single-gene disorders, for complex syndromes resulting from the combination of two or more recessive conditions, or may potentially cause clinical phenotypes due to a multiply heterozygous state. In addition to carrier mutations, we identified homozygous and hemizygous deletions potentially causative for recessive disease. We provide further evidence that CNVs contribute to the allelic architecture of both carrier and recessive disease-causing mutations. Thus, a complete recessive carrier screening method or diagnostic test should detect CNV alleles.
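To make the core idea of the abstract concrete, here is a minimal sketch of how one might flag deletions that encompass or disrupt genes from a recessive disease gene list. It is not the authors' pipeline: the `Interval` type, the `carrier_genes` helper, the gene names and all coordinates are made up for illustration, and a real analysis would also have to handle zygosity, array resolution and annotation versions.

```python
from typing import Dict, List, NamedTuple


class Interval(NamedTuple):
    """A genomic interval; coordinates are 0-based, half-open (hypothetical toy type)."""
    chrom: str
    start: int
    end: int
    name: str


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def carrier_genes(deletions: List[Interval], genes: List[Interval]) -> Dict[str, List[str]]:
    """Map each deletion to the recessive disease genes it overlaps (fully or partially)."""
    hits: Dict[str, List[str]] = {}
    for deletion in deletions:
        hit = [gene.name for gene in genes if overlaps(deletion, gene)]
        if hit:
            hits[deletion.name] = hit
    return hits


# Toy data with invented coordinates and placeholder gene names.
genes = [
    Interval("chr1", 100, 200, "GENE_A"),
    Interval("chr1", 300, 400, "GENE_B"),
    Interval("chr2", 100, 200, "GENE_C"),
]
deletions = [
    Interval("chr1", 150, 350, "del1"),  # spans parts of GENE_A and GENE_B
    Interval("chr2", 500, 600, "del2"),  # touches no listed gene
]

result = carrier_genes(deletions, genes)
print(result)
```

In this toy example a single deletion hits two genes, which is the situation the abstract describes for deletions encompassing more than one recessive disease gene. The nested loop is fine for a sketch; with thousands of deletions and genes, an interval tree or a sorted sweep would be the more sensible choice.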
Thursday, 2 May 2013
PubMed Highlight: Zebrafish genome sequenced and the systematic genome-wide analysis of zebrafish protein-coding gene function
The new assembly of the zebrafish (D. rerio) genome was recently published in Nature, together with a description of the complete set of proteins encoded in the teleost DNA and their relationship to human orthologs. In a second paper, the research group proposes a genome-wide analysis of genotype-phenotype correlation for every protein-coding gene in the assembly.
The first paper describes the latest assembly of the D. rerio genome.
The first assembly of the zebrafish genome was made available in 2002 (Zv1), and now the Zebrafish Genome Project has produced the latest detailed assembly based on NGS Illumina technology. The new version represents a great improvement: it provides better coverage of the entire genome sequence, helps resolve tricky artifacts scattered along the fish genome, and allows better identification of the complete set of protein-coding genes (about 26,000). The catalog of small RNAs and repeated elements has also been updated, and some mis-annotated genes have been removed (mostly genes from other species that were assigned to zebrafish in previous assemblies).
Having a complete and robust assembly of this teleost genome is a key resource for the research community, given its important role as an animal model for studies on development and for characterizing the functional impact of mutations in disease genes.
The zebrafish reference genome sequence and its relationship to the human genome. Nature. 2013 Apr 25;496(7446):498-503.
Howe K, et al.
Wellcome Trust Sanger Institute
Zebrafish have become a popular organism for the study of vertebrate gene function. The virtually transparent embryos of this species, and the ability to accelerate genetic studies by gene knockdown or overexpression, have led to the widespread use of zebrafish in the detailed investigation of vertebrate gene function and increasingly, the study of human genetic disease. However, for effective modelling of human genetic disease it is important to understand the extent to which zebrafish genes and gene structures are related to orthologous human genes. To examine this, we generated a high-quality sequence assembly of the zebrafish genome, made up of an overlapping set of completely sequenced large-insert clones that were ordered and oriented using a high-resolution high-density meiotic map. Detailed automatic and manual annotation provides evidence of more than 26,000 protein-coding genes, the largest gene set of any vertebrate so far sequenced. Comparison to the human reference genome shows that approximately 70% of human genes have at least one obvious zebrafish orthologue. In addition, the high quality of this genome assembly provides a clearer understanding of key genomic features such as a unique repeat content, a scarcity of pseudogenes, an enrichment of zebrafish-specific genes on chromosome 4 and chromosomal regions that influence sex determination.
The second paper describes a project for the genome-wide characterization of mutations in every protein-coding gene in D. rerio.
In this context, the research group working on the D. rerio genome has published another interesting paper describing their active project to identify and phenotype disruptive mutations in every zebrafish protein-coding gene, using high-throughput sequencing and efficient chemical mutagenesis. They have already identified potentially disruptive mutations in more than 38% of all known zebrafish protein-coding genes and assessed the effects of each mutation during embryogenesis. Moreover, they have analysed the phenotypic consequences of over 1,000 alleles, making all data available to the community for genotype-phenotype correlation studies.
A systematic genome-wide analysis of zebrafish protein-coding gene function. Nature. 2013 Apr 25;496(7446):494-7.
Kettleborough RN, Busch-Nentwich EM, Harvey SA, Dooley CM, de Bruijn E, van Eeden F, Sealy I, White RJ, Herd C, Nijman IJ, Fényes F, Mehroke S, Scahill C, Gibbons R, Wali N, Carruthers S, Hall A, Yen J, Cuppen E, Stemple DL.
Wellcome Trust Sanger Institute
Since the publication of the human reference genome, the identities of specific genes associated with human diseases are being discovered at a rapid rate. A central problem is that the biological activity of these genes is often unclear. Detailed investigations in model vertebrate organisms, typically mice, have been essential for understanding the activities of many orthologues of these disease-associated genes. Although gene-targeting approaches and phenotype analysis have led to a detailed understanding of nearly 6,000 protein-coding genes, this number falls considerably short of the more than 22,000 mouse protein-coding genes. Similarly, in zebrafish genetics, one-by-one gene studies using positional cloning, insertional mutagenesis, antisense morpholino oligonucleotides, targeted re-sequencing, and zinc finger and TAL endonucleases have made substantial contributions to our understanding of the biological activity of vertebrate genes, but again the number of genes studied falls well short of the more than 26,000 zebrafish protein-coding genes. Importantly, for both mice and zebrafish, none of these strategies are particularly suited to the rapid generation of knockouts in thousands of genes and the assessment of their biological activity. Here we describe an active project that aims to identify and phenotype the disruptive mutations in every zebrafish protein-coding gene, using a well-annotated zebrafish reference genome sequence, high-throughput sequencing and efficient chemical mutagenesis. So far we have identified potentially disruptive mutations in more than 38% of all known zebrafish protein-coding genes. We have developed a multi-allelic phenotyping scheme to efficiently assess the effects of each allele during embryogenesis and have analysed the phenotypic consequences of over 1,000 alleles. All mutant alleles and data are available to the community and our phenotyping scheme is adaptable to phenotypic analysis beyond embryogenesis.
Monday, 15 April 2013
This demonstrates how far we have come in our ability to map the activity of single cells and opens amazing possibilities for future studies, promising to provide knowledge on how the brain responds to stimuli or coordinates body activities.
The construction of a complete and informative map of neuron interactions seems feasible, and it is a hot topic right now. The US BRAIN project (recently funded with $100 million by the Obama administration) and the European Human Brain Project (HBP) (funded with 1 billion euros over ten years as one of the EU FET Flagships) are two huge initiatives that have just started working towards this ambitious goal.
Some have already pointed to the brain map as the third revolutionary achievement after the Human Genome Project and the ENCODE project.
See your brain, with plenty of colourful neurons...
This time you can literally say: this is a brilliant idea!
And THIS is amazing science!
Wednesday, 10 April 2013
Tuesday, 9 April 2013
The topic of sequencing from nanograms or picograms of DNA is extensively covered in this post on CoreGenomics, which also provides some examples of recently published papers and discusses the new library preparation kits that make NGS from low-input DNA fast and easy!
Now the stories!
We reported in September 2012 on Revive and Restore (see our post here), a company founded with the ambitious and controversial aim of sequencing and reconstructing the genomes of extinct species, with the ultimate goal of eventually bringing them back to life.
We were not sure how far this initiative would go, but last month they surprised us again with the announcement of an actual project to resurrect the extinct passenger pigeon.
However, the idea not only requires hard work, it could also get really dicey. Indeed, according to Shapiro, "because the last common ancestor of the two species flew about 30 million years ago, their genomes will likely differ at millions of locations." Fitting the pieces together will be grueling, if not impossible. GenomeWeb has a post on this, and Wired has also covered the story in this article by Kelly Servick.
Whether one considers this real science or fantasy science, the general topic of de-extinction is getting increasing attention now that DNA sequencing technology allows genomes to be effectively assembled from ancient and degraded samples (remember, for example, the Neanderthal genome or the mammoth genome). The collection of DNA from different species is part of some huge projects intended to preserve and study biodiversity, and experts are now discussing whether, and eventually how, we should deal with species that go extinct over time. If we as the human race are responsible for the disappearance of a specific organism and we have the ability to bring it back to life, should we do it? What are the risks of re-introducing extinct species into our ecosystem?
Recently a TEDx event was organized to explore the topic of de-extinction, and the recent advances in the field have also received attention from National Geographic and The New York Times (the latter dealing with bringing an extinct frog back to life). Revive and Restore has a list of candidate organisms waiting for de-extinction and is searching for collaborators! Looking at their list, I might like to see a dodo walking again in the garden... but I want to raise my doubts about a saber-toothed cat!!
The second piece of news comes directly from the scientific literature. In their paper recently published in the Journal of Applied Genetics, Khairat et al. from the University of Tübingen report the first metagenome analysis of ancient Egyptian mummies. Their dataset comprises seven sequencing experiments performed on DNA obtained from five randomly selected Third Intermediate to Graeco-Roman Egyptian mummies (806 BC-124 AD) and two unearthed pre-contact Bolivian lowland skeletons. Analyzing the data, they were able to identify genetic material from bacteria, presumably due to contamination from the mummy conservation procedures, and also from plants, potentially associated with their use in embalming reagents. The paper demonstrates that DNA from ancient mummies can also be a proper template for NGS, despite its age and the several treatments performed on the samples in the course of the conservation protocols.
Saturday, 30 March 2013
Friday, 29 March 2013
I've just finished an intensive course on NGS data analysis where command-line-based solutions were, of course, presented as the best way to manage and make sense of data.
Playing with scripts, Unix code and the R language makes you feel a sort of bioinformatic power. You start to blame all those wet-lab colleagues spending hours on Excel spreadsheets. You are amazed by the results of your last programming trick and the effectiveness of your command-line skills. Even if this makes you proud, keep in mind that a screen full of symbols and over-a-million-row tables have, for most biologists and geneticists, the same appeal as the flowing characters of The Matrix... As in the famous movie, not everyone can see the meaning behind the code; most of them will just see a bunch of characters and numbers, doubting that this is the real world!
A good visualization of genomic data from NGS experiments makes your results nicer to look at and easier to explain and explore. Moreover, a colorful alignment of reads in genome-browser style or a Circos graph surely makes a better impact when you show it in your presentations! The scientific community constantly asks for visualization tools that simplify the task of explaining and exploring NGS data, so that they become accessible to everyone, even to the old-school ones.
The latest special issue of Briefings in Bioinformatics offers an extensive review of the main visualization tools, with an overview of their particular advantages and main features. Web-based browsers, UCSC Genome Browser, IGV, Tablet, BamView and GBrowse are all covered, making this issue the ideal answer to the colleague asking you: "I've just received this great NGS data, but what are all these bam and vcf files? I want to see them nicely placed on my favourite chromosome!".
Main articles in the special issue:
Jun Wang, Lei Kong, Ge Gao, and Jingchu Luo
A brief introduction to web-based genome browsers
Robert M. Kuhn, David Haussler, and W. James Kent
The UCSC genome browser and associated tools
Lincoln D. Stein
Using GBrowse 2.0 to visualize and share next-generation sequence data
Oscar Westesson, Mitchell Skinner, and Ian Holmes
Visualizing next-generation sequencing data with JBrowse
Helga Thorvaldsdóttir, James T. Robinson, and Jill P. Mesirov
Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration
Iain Milne, Gordon Stephen, Micha Bayer, Peter J.A. Cock, Leighton Pritchard, Linda Cardle, Paul D. Shaw, and David Marshall
Using Tablet for visual exploration of second-generation sequencing data
Tim Carver, Simon R. Harris, Thomas D. Otto, Matthew Berriman, Julian Parkhill, and Jacqueline A. McQuillan
BamView: visualizing and interpretation of next-generation sequencing read alignments
Michael C. Schatz, Adam M. Phillippy, Daniel D. Sommer, Arthur L. Delcher, Daniela Puiu, Giuseppe Narzisi, Steven L. Salzberg, and Mihai Pop
Hawkeye and AMOS: visualizing and assessing the quality of genome assemblies
Thursday, 28 March 2013
These days I'm attending an intensive course on NGS data analysis... Every day we deal with about 9 hours of bioinformatics, both theory and scripting... And tons of useful tools have been cited during the course...
Since the names of these software tools are anything but easy to remember, I found myself wishing for a summary that gives a compact and organized overview of, and quick access to, the main ones.
Considering that bioinformatics tricks have become as essential as chemical elements, The Elements of Bioinformatics table from Eagle Genomics is an efficient and fun answer to my needs.
If programming, analyzing DNA data and talking about stats and complex biology don't satisfy your need to look nerdy, using this table to remember strange-named tools should improve your reputation as a real geek!!
Have fun (if you read this blog I'm sure you will!)
Thursday, 21 March 2013
PubMed Highlight: The origin, evolution and functional impact of short insertion-deletion variants identified in 179 human genomes
However, a detailed genome-wide assessment of the impact and distribution of indels was still missing... until now.
In this interesting paper published in Genome Research, Montgomery et al. address exactly this question, with amazing results. First of all, the authors had to deal with the challenge of calling short indels, which is one of the biggest issues when analyzing NGS data. Starting with DNA sequences from 179 individuals from 3 population groups, they made several optimizations to the standard pipeline used by the 1000 Genomes Project to obtain a set of high-quality indels. Even if indels in homopolymeric regions remain out of reach, the improved pipeline described in the paper is certainly a guideline for anyone working in the field. Among the other interesting findings, the authors confirmed that rates of indel mutagenesis are highly heterogeneous, with 43-48% of indels occurring in 4.03% of the genome (loci defined as indel hotspots by the authors), and they proposed fork stalling and template switching (FoSTeS), together with polymerase slippage, as the main mechanisms generating the indels.
Take a look!
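For a feeling of what the hotspot numbers mean, here is a back-of-the-envelope calculation (my own arithmetic, not a figure from the paper): if 43-48% of indels fall in 4.03% of the genome, the indel density in hotspots is the ratio of the two fractions relative to a uniform distribution. The `fold_enrichment` helper is a hypothetical name used only for this illustration.

```python
def fold_enrichment(event_fraction: float, genome_fraction: float) -> float:
    """Density of events in a region relative to a uniform spread over the genome."""
    return event_fraction / genome_fraction


# Fractions quoted in the post: 43-48% of indels in 4.03% of the genome.
low = fold_enrichment(0.43, 0.0403)
high = fold_enrichment(0.48, 0.0403)

print(f"Hotspots carry roughly {low:.1f}x to {high:.1f}x the genome-wide average indel density")
```

In other words, hotspots would be roughly 11 to 12 times denser in indels than expected if indels were spread evenly, which is one way to read "highly heterogeneous".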