Thursday, 30 May 2013

Now ready to start real sequencing

As anticipated a couple of posts ago, Life Technologies technicians have been installing our brand new ION Proton NGS sequencer over the past few days. All tests ran fine, so we are excited to announce that we are now ready for sequence production!

We hope this will be the beginning of an exciting NGS quest! 
As enthusiastic as we are right now, we have put our lab on the world map of sequencing centers (High-Throughput Sequencing Map site) conceived by James Hadfield (Cancer Research UK, Cambridge) and built by Nick Loman (University of Birmingham).

Meanwhile, take a look at our Now-Good-for-Sequencing lab!

Monday, 27 May 2013

Personalized advertising based on your genomic data

Companies market products and services according to your online browsing behavior, shopping habits, age, and social-network activities. In the near future, they may be able to advertise to you on the basis of your genetic makeup.
Miinome, a Minneapolis-based startup, plans to develop the first “member-controlled human genetic marketplace.”
The company, which has just three full-time employees and is still hunting for financing, is notable mostly for its bold idea: to sell DNA information to marketers.
An MIT Technology Review article hypothesizes: "Do you carry the genetic variants associated with lactose intolerance? Here, Lactaid has a coupon for you. The genes for male-pattern baldness? That’s accelerated by stress, so maybe you should come in for a discounted massage at Jimmy’s Spa & Bath".
“Today, it’s such a niche market, but there’s tremendous growth opportunities there,” said geneticist Michael Schatz of the Cold Spring Harbor Laboratory. “In the endgame, it’s certain [genetics is] going to become one of the factors that big retailers would consider, but I think that’s pretty far off.” More info in this Wired article.

Saturday, 25 May 2013

Evaluating variant detection methods: comparison of aligners and callers

Most current NGS studies aim to identify genetic variants related to a condition of interest. To get to this final result, you start with your bunch of sequencing reads, align them to a reference genome, refine the aligned data and finally call the variants, both SNPs and indels... Straightforward, isn't it? Actually, there are several tools for each of these steps, and each one produces different results and relies on different algorithms that make it more suitable for specific applications. So deciding which one is best for your NGS data analysis is certainly not so easy...

Recently I came across this good comparison of variant detection pipelines published on the Blue Collar Bioinformatics blog. It considers the major aligners (bwa and novoalign), post-alignment processing (using popular tools such as Picard and samtools rmdup) and variant callers (GATK UG and HC, and freebayes).
For every step the author reports detailed metrics on the SNPs and indels called, their concordance and so on, giving a framework for evaluating the various solutions and assembling your own analysis workflow.
For example, in this picture from the original post on Blue Collar you can appreciate how even the choice of aligner can impact your final variant dataset, mainly due to different strictness in dealing with indels, which results in different depth of coverage in some regions.
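
To make the idea of concordance between two pipelines more concrete, here is a minimal Python sketch. It is not the method used in the Blue Collar post: the `Variant` tuple, the `concordance` function and the toy calls are all hypothetical, and the sketch simply treats two calls as identical when chromosome, position, reference and alternate alleles match exactly.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal variant key (not a full VCF record)."""
    chrom: str
    pos: int   # 1-based position, as in VCF
    ref: str
    alt: str


def concordance(calls_a: set[Variant], calls_b: set[Variant]) -> dict[str, float]:
    """Summarize the overlap between two variant call sets."""
    shared = calls_a & calls_b
    union = calls_a | calls_b
    return {
        "shared": len(shared),
        "only_a": len(calls_a - calls_b),
        "only_b": len(calls_b - calls_a),
        # Fraction of all distinct calls that both pipelines agree on.
        "concordance": len(shared) / len(union) if union else 1.0,
    }


# Toy calls invented for illustration; they are not real data.
pipeline_a = {
    Variant("chr1", 1000, "A", "G"),
    Variant("chr1", 2000, "CT", "C"),
    Variant("chr2", 500, "G", "T"),
}
pipeline_b = {
    Variant("chr1", 1000, "A", "G"),
    Variant("chr2", 500, "G", "T"),
    Variant("chr2", 900, "T", "TA"),
}

summary = concordance(pipeline_a, pipeline_b)
print(summary)
```

Exact matching is the simplest possible choice. The same indel can be written in more than one way, so a real comparison would normally normalize the variant representation before intersecting the two sets.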

As reported by the author: "This evaluation work is part of a larger community effort to better characterize variant calling methods. A key component of these evaluations is a well characterized set of reference variations for the NA12878 human HapMap genome, provided by NIST’s Genome in a Bottle consortium. The diagnostic component of this work supplements emerging tools like GCAT (Genome Comparison and Analytic Testing), which provides a community platform for comparing and discussing calling approaches."

Don't miss this!

Friday, 24 May 2013

Breaking News: the real stuff has arrived!

We are happy to announce that we are finally entering the NGS arena, and this time for real!
These not-so-mysterious boxes contain a just-delivered ION Proton Sequencer with all the related instruments!
This morning we finally received all the materials (besides the consumables) that will enable us to perform a whole experiment, from data production to data analysis! We are now waiting for the instrument installation to perform our first run! Stay tuned to follow the NGS Brescia evolution!

Friday, 17 May 2013

PubMed Highlight: CNV contribution to carrier states and disease-causing alleles

This paper, published in Genome Research, analyzes a large cohort of patients by array CGH to assess the impact of CNVs on known disease-causing genes. Among the interesting results: complex phenotypes could arise from structural variations affecting multiple disease-causing genes, and dominant deleterious genes tend to be less affected (as expected from natural selection).

Deletions of recessive disease genes: CNV contribution to carrier states and disease-causing alleles
Philip M Boone, Ian M Campbell, Brett C Baggett, Zachry T Soens, Mitchell M Rao, Patricia M Hixson, Ankita Patel, Weimin Bi, Sau Wai Cheung, Seema R Lalani, Arthur L Beaudet, Pawel Stankiewicz, Chad A Shaw and James R Lupski

Over 1,200 recessive disease genes have been described in humans. The prevalence, allelic architecture, and per-genome load of pathogenic alleles in these genes remain to be fully elucidated, as does the contribution of DNA copy-number variants (CNVs) to carrier status and recessive disease. We mined CNV data from 21,470 individuals obtained by array comparative genomic hybridization in a clinical diagnostic setting to identify deletions encompassing or disrupting recessive disease genes. We identified 3,212 heterozygous potential carrier deletions affecting 419 unique recessive disease genes. Deletion frequency of these genes ranged from one occurrence to 1.5%. When compared with recessive disease genes never deleted in our cohort, the 419 recessive disease genes affected by at least one carrier deletion were longer and were located farther from known dominant disease genes, suggesting that the formation and/or prevalence of carrier CNVs may be affected by both local and adjacent genomic features and by selection. Some subjects had multiple carrier CNVs (307 subjects) and/or carrier deletions encompassing more than one recessive disease gene (206 deletions). Heterozygous deletions spanning multiple recessive disease genes may confer carrier status for multiple single-gene disorders, for complex syndromes resulting from the combination of two or more recessive conditions, or may potentially cause clinical phenotypes due to a multiply heterozygous state. In addition to carrier mutations, we identified homozygous and hemizygous deletions potentially causative for recessive disease. We provide further evidence that CNVs contribute to the allelic architecture of both carrier and recessive disease-causing mutations. Thus, a complete recessive carrier screening method or diagnostic test should detect CNV alleles.

Thursday, 2 May 2013

PubMed Highlight: Zebrafish genome sequenced and the systematic genome-wide analysis of zebrafish protein-coding gene function

The new assembly of the Zebrafish (D. rerio) genome has recently been published in Nature, together with a description of the complete set of proteins encoded in the teleost DNA and their relationship to human orthologs. In a second paper, the research group presents a genome-wide analysis of genotype-phenotype correlation for every single protein-coding gene in the assembly.

The first paper describes the latest assembly of the D. rerio genome.
The first assembly of the Zebrafish genome was made available in 2002 (Zv1), and now the Zebrafish Genome Project has produced the latest detailed assembly based on NGS Illumina technology. The new version represents a great improvement: it provides better coverage of the entire genome sequence, helps resolve tricky artifacts that were still scattered along the fish genome and allows better identification of the complete set of protein-coding genes (about 26,000). The catalog of small RNAs and repeated elements has also been updated, and some mis-annotated genes have been removed (mostly genes from other species that were assigned to zebrafish in previous assemblies).
Having a complete and robust assembly of this teleost genome is a key asset for the research community, given its important role as an animal model for studies on development and for characterizing the functional impact of mutations in disease genes.

The zebrafish reference genome sequence and its relationship to the human genome. Nature. 2013 Apr 25;496(7446):498-503.
Howe K, et al.
Wellcome Trust Sanger Institute

Zebrafish have become a popular organism for the study of vertebrate gene function. The virtually transparent embryos of this species, and the ability to accelerate genetic studies by gene knockdown or overexpression, have led to the widespread use of zebrafish in the detailed investigation of vertebrate gene function and increasingly, the study of human genetic disease. However, for effective modelling of human genetic disease it is important to understand the extent to which zebrafish genes and gene structures are related to orthologous human genes. To examine this, we generated a high-quality sequence assembly of the zebrafish genome, made up of an overlapping set of completely sequenced large-insert clones that were ordered and oriented using a high-resolution high-density meiotic map. Detailed automatic and manual annotation provides evidence of more than 26,000 protein-coding genes, the largest gene set of any vertebrate so far sequenced. Comparison to the human reference genome shows that approximately 70% of human genes have at least one obvious zebrafish orthologue. In addition, the high quality of this genome assembly provides a clearer understanding of key genomic features such as a unique repeat content, a scarcity of pseudogenes, an enrichment of zebrafish-specific genes on chromosome 4 and chromosomal regions that influence sex determination.

The second paper describes a project for genome-wide characterization of mutations in every single protein-coding gene in D. rerio.
In this context, the research group working on the D. rerio genome has published another interesting paper describing their active project to identify and phenotype the disruptive mutations in every zebrafish protein-coding gene, using high-throughput sequencing and efficient chemical mutagenesis. They have already identified potentially disruptive mutations in more than 38% of all known zebrafish protein-coding genes and assessed the effects of each mutation during embryogenesis. Moreover, they have analysed the phenotypic consequences of over 1,000 alleles, making all data available to the community for genotype-phenotype correlation studies.

A systematic genome-wide analysis of zebrafish protein-coding gene function. Nature. 2013 Apr 25;496(7446):494-7.
Kettleborough RN, Busch-Nentwich EM, Harvey SA, Dooley CM, de Bruijn E, van Eeden F, Sealy I, White RJ, Herd C, Nijman IJ, Fényes F, Mehroke S, Scahill C, Gibbons R, Wali N, Carruthers S, Hall A, Yen J, Cuppen E, Stemple DL.
Wellcome Trust Sanger Institute

Since the publication of the human reference genome, the identities of specific genes associated with human diseases are being discovered at a rapid rate. A central problem is that the biological activity of these genes is often unclear. Detailed investigations in model vertebrate organisms, typically mice, have been essential for understanding the activities of many orthologues of these disease-associated genes. Although gene-targeting approaches and phenotype analysis have led to a detailed understanding of nearly 6,000 protein-coding genes, this number falls considerably short of the more than 22,000 mouse protein-coding genes. Similarly, in zebrafish genetics, one-by-one gene studies using positional cloning, insertional mutagenesis, antisense morpholino oligonucleotides, targeted re-sequencing, and zinc finger and TAL endonucleases have made substantial contributions to our understanding of the biological activity of vertebrate genes, but again the number of genes studied falls well short of the more than 26,000 zebrafish protein-coding genes. Importantly, for both mice and zebrafish, none of these strategies are particularly suited to the rapid generation of knockouts in thousands of genes and the assessment of their biological activity. Here we describe an active project that aims to identify and phenotype the disruptive mutations in every zebrafish protein-coding gene, using a well-annotated zebrafish reference genome sequence, high-throughput sequencing and efficient chemical mutagenesis. So far we have identified potentially disruptive mutations in more than 38% of all known zebrafish protein-coding genes. We have developed a multi-allelic phenotyping scheme to efficiently assess the effects of each allele during embryogenesis and have analysed the phenotypic consequences of over 1,000 alleles. All mutant alleles and data are available to the community and our phenotyping scheme is adaptable to phenotypic analysis beyond embryogenesis.