

Monday, 22 July 2013

PubMed Highlight: silencing the extra copy of chromosome 21 in Down's syndrome cells using the XIST gene

Having been part, in 1991, of the team that originally cloned the mouse Xist gene, I've been really excited by this news. Scientists at the University of Massachusetts discovered that XIST, the gene involved in X-chromosome inactivation, can be used to turn off the extra chromosome 21 in Down syndrome.

The study has been published in the latest issue of Nature.



Nature. 2013 Jul 17. doi: 10.1038/nature12394.

Translating dosage compensation to trisomy 21.

Source

Department of Cell and Developmental Biology, University of Massachusetts Medical School, 55 Lake Avenue North, Worcester, Massachusetts 01655, USA.

Abstract

Down's syndrome is a common disorder with enormous medical and social costs, caused by trisomy for chromosome 21. We tested the concept that gene imbalance across an extra chromosome can be de facto corrected by manipulating a single gene, XIST (the X-inactivation gene). Using genome editing with zinc finger nucleases, we inserted a large, inducible XIST transgene into the DYRK1A locus on chromosome 21, in Down's syndrome pluripotent stem cells. The XIST non-coding RNA coats chromosome 21 and triggers stable heterochromatin modifications, chromosome-wide transcriptional silencing and DNA methylation to form a 'chromosome 21 Barr body'. This provides a model to study human chromosome inactivation and creates a system to investigate genomic expression changes and cellular pathologies of trisomy 21, free from genetic and epigenetic noise. Notably, deficits in proliferation and neural rosette formation are rapidly reversed upon silencing one chromosome 21. Successful trisomy silencing in vitro also surmounts the major first step towards potential development of 'chromosome therapy'.
PMID: 23863942

Friday, 19 July 2013

PubMed Highlight: A new type of virus, the Pandoraviruses

Pandoraviruses: Amoeba Viruses with Genomes Up to 2.5 Mb Reaching That of Parasitic Eukaryotes



Abstract:
Ten years ago, the discovery of Mimivirus, a virus infecting Acanthamoeba, initiated a reappraisal of the upper limits of the viral world, both in terms of particle size (>0.7 micrometers) and genome complexity (>1000 genes), dimensions typical of parasitic bacteria. The diversity of these giant viruses (the Megaviridae) was assessed by sampling a variety of aquatic environments and their associated sediments worldwide. We report the isolation of two giant viruses, one off the coast of central Chile, the other from a freshwater pond near Melbourne (Australia), without morphological or genomic resemblance to any previously defined virus families. Their micrometer-sized ovoid particles contain DNA genomes of at least 2.5 and 1.9 megabases, respectively. These viruses are the first members of the proposed “Pandoravirus” genus, a term reflecting their lack of similarity with previously described microorganisms and the surprises expected from their future study.


Wednesday, 17 July 2013

PubMed Highlight: RNA-Seq analysis made simple

Use of RNA-Seq data to assess differential expression and to analyze variation in splicing and isoforms is becoming a recurrent task for many labs interested in gene expression. As usual with NGS, generating the data is quite fast and simple, but strong bioinformatic know-how is required to actually answer the biological question.
With this paper published in BMC Bioinformatics, Boria et al. provide a simple and automated analysis pipeline for RNA-Seq data, with the ability to detect differentially expressed genes, differential splicing events and new gene transcripts. The suite is freely available upon registration at this web address.
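Just to make the idea of gene-level differential expression concrete (this is not the NGS-Trex pipeline, which runs entirely on its own web servers), here is a minimal, generic sketch that normalises invented read counts to counts per million and flags genes with a large fold change between two samples:

```python
# Minimal differential-expression sketch with made-up counts, not NGS-Trex:
# normalise raw read counts to counts per million (CPM) and report the
# log2 fold change of each gene between two samples.
import math

counts_a = {"GENE1": 150, "GENE2": 30, "GENE3": 900}   # sample A read counts
counts_b = {"GENE1": 40,  "GENE2": 35, "GENE3": 870}   # sample B read counts

total_a, total_b = sum(counts_a.values()), sum(counts_b.values())

for gene in counts_a:
    cpm_a = counts_a[gene] / total_a * 1e6
    cpm_b = counts_b[gene] / total_b * 1e6
    log2_fc = math.log2((cpm_a + 1) / (cpm_b + 1))   # +1 avoids log(0)
    flag = "candidate" if abs(log2_fc) > 1 else ""
    print(f"{gene}\tlog2FC={log2_fc:+.2f}\t{flag}")
```

A real analysis would of course use replicates and a proper statistical test, which is exactly the kind of work a suite like NGS-Trex automates.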

BMC Bioinformatics. 2013 Apr 22;14 Suppl 7:S10.
NGS-Trex: Next Generation Sequencing Transcriptome profile explorer. 

Boria I, Boatti L, Pesole G, Mignone F. 

Abstract
BACKGROUND: Next-Generation Sequencing (NGS) technology has exceptionally increased the ability to sequence DNA in a massively parallel and cost-effective manner. Nevertheless, NGS data analysis requires bioinformatics skills and computational resources well beyond the possibilities of many "wet biology" laboratories. Moreover, most of projects only require few sequencing cycles and standard tools or workflows to carry out suitable analyses for the identification and annotation of genes, transcripts and splice variants found in the biological samples under investigation. These projects can take benefits from the availability of easy to use systems to automatically analyse sequences and to mine data without the preventive need of strong bioinformatics background and hardware infrastructure.
RESULTS: To address this issue we developed an automatic system targeted to the analysis of NGS data obtained from large-scale transcriptome studies. This system, we named NGS-Trex (NGS Transcriptome profile explorer) is available through a simple web interface http://www.ngs-trex.org and allows the user to upload raw sequences and easily obtain an accurate characterization of the transcriptome profile after the setting of few parameters required to tune the analysis procedure. The system is also able to assess differential expression at both gene and transcript level (i.e. splicing isoforms) by comparing the expression profile of different samples. By using simple query forms the user can obtain list of genes, transcripts, splice sites ranked and filtered according to several criteria. Data can be viewed as tables, text files or through a simple genome browser which helps the visual inspection of the data.
CONCLUSIONS: NGS-Trex is a simple tool for RNA-Seq data analysis mainly targeted to "wet biology" researchers with limited bioinformatics skills. It offers simple data mining tools to explore transcriptome profiles of samples investigated taking advantage of NGS technologies.

Wednesday, 10 July 2013

PubMed Highlight: Evaluation of bioinformatic tools for prediction of functional impact of missense variants

This interesting paper evaluates the performance (sensitivity and specificity) of nine different tools commonly used in bioinformatics to predict the functional effect of a missense mutation. The authors also developed a publicly available web tool, CoVEC (Consensus Variant Effect Classification), which estimates a consensus score by combining the results from four of these tools (SIFT, PolyPhen2, SNPs&GO and Mutation Assessor).
Since automated prediction of functional impact is part of most SNV prioritization pipelines, this paper could certainly be useful for building a robust NGS secondary analysis pipeline.
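As a toy illustration of how such a consensus strategy can work (this is just my own sketch, not the published CoVEC implementation), imagine each tool casting a binary vote on a variant:

```python
# Hypothetical consensus scoring sketch, not the actual CoVEC code:
# each prediction tool votes "deleterious" or "neutral" and the consensus
# score is simply the number of deleterious votes.
def consensus_score(predictions):
    """predictions: dict mapping tool name -> 'deleterious' or 'neutral'."""
    return sum(1 for call in predictions.values() if call == "deleterious")

# Example calls for one variant (invented for illustration)
variant_calls = {
    "SIFT": "deleterious",
    "PolyPhen2": "deleterious",
    "SNPs&GO": "neutral",
    "Mutation Assessor": "deleterious",
}

score = consensus_score(variant_calls)   # 0-4, higher = stronger agreement
print(f"Consensus score: {score}/{len(variant_calls)}")
if score >= 3:
    print("Variant flagged as likely damaging for downstream prioritization")
```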

Genomics. 2013 Jul 3. pii: S0888-7543(13)00126-2
Predicting the functional consequences of non-synonymous DNA sequence variants - evaluation of bioinformatics tools and development of a consensus strategy.

Frousios K, Iliopoulos CS, Schlitt T, Simpson MA. 

Abstract
The study of DNA sequence variation has been transformed by recent advances in DNA sequencing technologies. Determination of the functional consequences of sequence variant alleles offers potential insight as to how genotype may influence phenotype. Even within protein coding regions of the genome, establishing the consequences of variation on gene and protein function is challenging and often requires substantial laboratory investigation. However, a series of bioinformatics tools have been developed to predict whether non-synonymous variants are neutral or disease-causing. In this study we evaluate the performance of nine such methods (SIFT, PolyPhen2, SNPs&GO, PhD-SNP, PANTHER, Mutation Assessor, MutPred, Condel and CAROL) and developed CoVEC (Consensus Variant Effect Classification), a tool that integrates the prediction results from four of these methods. We demonstrate that the CoVEC approach outperforms most individual methods and highlights the benefit of combining results from multiple tools.

Made with Love (and Science): first child born following embryo screening with NGS


Above is a picture of Connor Levy, from NewsWorks.

The news was reported on July 8 at the annual meeting of the European Society of Human Reproduction and Embryology (ESHRE) by Dr Dagan Wells of the NIHR Biomedical Research Centre at the University of Oxford, UK.
According to The Guardian, after standard treatment at a US clinic, a Philadelphia couple had 13 in vitro fertilization embryos to choose from. The doctors cultured the embryos for five days, took a few cells from each and sent them to Dr Wells in Oxford for genetic screening. Tests performed using NGS on an Ion Torrent platform showed that while most of the embryos looked healthy, only three had the right number of chromosomes. Based on the screening results, the US doctors transferred one of the healthy embryos into the mother and left the rest in cold storage. The single embryo implanted, and on 18 May 2013 a healthy boy, named Connor, was born.
Apparently the Oxford team used NGS to test for aneuploidy, mutations in the cystic fibrosis gene and mtDNA.
Dr Wells, who led the international research team behind the study, said: "Many of the embryos produced during infertility treatments have no chance of becoming a baby because they carry lethal genetic abnormalities. Next generation sequencing improves our ability to detect these abnormalities and helps us identify the embryos with the best chances of producing a viable pregnancy. Potentially, this should lead to improved IVF success rates and a lower risk of miscarriage".
The abstract of the ESHRE communication can be downloaded here.

Tuesday, 2 July 2013

Incredible But True: Life Technologies introduces an amplicon-based exome sequencing kit

Life Technologies has launched an exome capture kit that makes use of its AmpliSeq technology.
According to the manufacturer, the AmpliSeq™ Exome Kit minimizes the high cost and complexity of exome sequencing, enabling the enrichment and sequencing of ~294,000 amplicons (!!!).
The kit targets 97% of coding regions, as described by the Consensus Coding Sequences (CCDS) annotation, in 12 primer pools for highly specific enrichment of exons within the human genome totaling ~58 Mb (it is not clear to me whether the amplicons include sequences like UTRs and miRNAs). The novel technology, designed for the Ion Proton platform, delivers >94% of targeted bases covered at 10x even with two exomes per Ion PI chip. The total workflow from DNA to annotated variants of an exome can be completed in two days, including six hours for exome library preparation and three hours of sequencing time. Compared to hybridization approaches for exome sequencing, one advantage of an amplicon-based approach is that the input amount is small (as little as 50 nanograms).
Additional information can be found in this Application Note from Life Technologies.

Sunday, 30 June 2013

PubMed Highlight: Prioritization of synonymous variants

The final step of variant prioritization is a key point in NGS studies focused on the identification of disease-causing mutations. Until now, all the tools developed in this area have considered only missense mutations, relying on various algorithms and integration with known information to suggest the best causative variants within a list of candidates. However, recent studies have shown that synonymous mutations can also be responsible for disease. The new Silent Variant Analyzer (SilVA), described by Buske et al. in Bioinformatics, is the first effort to prioritize synonymous variants and identify the ones that may be deleterious. I'm sorry, but it seems we can no longer throw away synonymous SNVs to simplify data analysis...

Identification of deleterious synonymous variants in human genomes
Orion J. Buske, AshokKumar Manickaraj, Seema Mital, Peter N. Ray and Michael Brudno

Abstract
Motivation: The prioritization and identification of disease-causing mutations is one of the most significant challenges in medical genomics. Currently available methods address this problem for non-synonymous single nucleotide variants (SNVs) and variation in promoters/enhancers; however, recent research has implicated synonymous (silent) exonic mutations in a number of disorders.
Results: We have curated 33 such variants from literature and developed the Silent Variant Analyzer (SilVA), a machine-learning approach to separate these from among a large set of rare polymorphisms. We evaluate SilVA’s performance on in silico ‘infection’ experiments, in which we implant known disease-causing mutations into a human genome, and show that for 15 of 33 disorders, we rank the implanted mutation among the top five most deleterious ones. Furthermore, we apply the SilVA method to two additional datasets: synonymous variants associated with Meckel syndrome, and a collection of silent variants clinically observed and stratified by a molecular diagnostics laboratory, and show that SilVA is able to accurately predict the harmfulness of silent variants in these datasets.
Availability: SilVA is open source and is freely available from the project website: http://compbio.cs.toronto.edu/silva

Thursday, 27 June 2013

PubMed Highlight: The state of the art in Genomic Medicine

For anyone interested in genomic studies, this review in Science Translational Medicine, which gives a comprehensive overview of the impact and future perspectives of genomics applied to medicine, is not to be missed. The authors guide you through a decade of genomic research that led to the identification of genetic causes for many Mendelian diseases, as well as the dissection of genetic factors underlying complex diseases. They also show how recent advances in sequencing technology have finally allowed the development of clinical, genomics-driven patient care, at least in some fields such as cancer pharmacogenomics and genetic diagnosis.

Personalized medicine, the final goal that pushed us to decipher the whole DNA sequence, now seems close... or at least NGS has put this goal within reach.

Genomic Medicine: A Decade of Successes, Challenges, and Opportunities
Jeanette J. McCarthy, Howard L. McLeod and Geoffrey S. Ginsburg

Sci Transl Med 12 June 2013: Vol. 5, Issue 189, p. 189sr4 


Abstract
Genomic medicine—an aspirational term 10 years ago—is gaining momentum across the entire clinical continuum from risk assessment in healthy individuals to genome-guided treatment in patients with complex diseases. We review the latest achievements in genome research and their impact on medicine, primarily in the past decade. In most cases, genomic medicine tools remain in the realm of research, but some tools are crossing over into clinical application, where they have the potential to markedly alter the clinical care of patients. In this State of the Art Review, we highlight notable examples including the use of next-generation sequencing in cancer pharmacogenomics, in the diagnosis of rare disorders, and in the tracking of infectious disease outbreaks. We also discuss progress in dissecting the molecular basis of common diseases, the role of the host microbiome, the identification of drug response biomarkers, and the repurposing of drugs. The significant challenges of implementing genomic medicine are examined, along with the innovative solutions being sought. These challenges include the difficulty in establishing clinical validity and utility of tests, how to increase awareness and promote their uptake by clinicians, a changing regulatory and coverage landscape, the need for education, and addressing the ethical aspects of genomics for patients and society. Finally, we consider the future of genomics in medicine and offer a glimpse of the forces shaping genomic medicine, such as fundamental shifts in how we define disease, how medicine is delivered to patients, and how consumers are managing their own health and affecting change.

Tuesday, 25 June 2013

Computational diet: how to make your Gb human genome as light as a few Mb

With the technology rapidly developing, whole exome/genome sequencing of individual subjects is nowadays performed by many labs all around the world. From the first examples of individual genomes, such as Venter's or Watson's complete DNA, we have now reached a point where the entire 6 billion bases can be sequenced in about a day. With such a high data production rate, the data storage problem has suddenly emerged as a painful thorn in the NGS boot, getting worse with every (sequencing) run!
In a future where genome-based personalized medicine becomes reality, your DNA sequence will need to be stored throughout your life like any other medical record.
Long-term storage of at least the complete DNA sequence could be a crucial factor, since, as new genotype-phenotype correlations are discovered, the owner of the genome could be updated with the newly relevant information.

Different solutions have been proposed to reduce the amount of information that has to be stored for a single genome sequence, so as to make storage of large genomic datasets feasible. At least the final exome/genome sequence of an individual can now be reduced to a few Mb of disk space, so you can accommodate several of them even on a standard hard disk.
This incredible result is achieved using various compression algorithms. The first generation basically used information on repetitive regions in the genome to reduce the final file size to a few hundred Mb. Second-generation algorithms are instead based on a reference sequence and only store the differences between the reference and the sequenced DNA. The best performing tool until now was DNAZip, which reduced the Watson genome sequence to only 4 Mb... something that you can easily share as an email attachment, as the authors stated. Now Pavlichin et al. from Stanford University have pushed compression even further. Their solution is based on the same approach, and takes advantage of the dbSNP database so that the positions of already known SNPs don't have to be stored in the final file. To further shrink the file size, positions of novel SNPs are not stored individually but as distances from the previous SNP. These improvements, together with a brand new compression function and a haplotype-based trick, push the file size for a single complete genome down to 2.5 Mb.
The main disadvantage is that you need the reference sequence and the dbSNP database in order to reconstruct your genome, but when dealing with thousands of them it is a minor drawback.
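To make the approach concrete, here is a toy sketch, under my own simplified assumptions (a tiny invented variant list and a dictionary standing in for dbSNP), of how storing known SNPs as catalogue indices and novel SNPs as distances from the previous one keeps the numbers small and easy to compress; it is not the authors' actual code:

```python
# Toy sketch of reference + dbSNP based compression (not DNAZip or the
# Pavlichin et al. implementation). Known SNPs are stored as indices into a
# shared catalogue; novel SNPs are stored as deltas from the previous novel
# position. All positions and alleles below are invented for illustration.

# variants observed in "your" genome: (1-based position, alternate allele)
genome_variants = [(10_523, "A"), (10_950, "T"), (52_301, "G"), (53_120, "C")]

# shared catalogue of known SNP positions (stand-in for dbSNP): position -> index
known_catalogue = {10_523: 0, 53_120: 1}

known_refs, novel_deltas = [], []
prev_novel_pos = 0
for pos, alt in genome_variants:
    if pos in known_catalogue:
        # only the catalogue index is stored, not the genomic coordinate
        known_refs.append((known_catalogue[pos], alt))
    else:
        # store the distance from the previous novel SNP instead of the
        # absolute position
        novel_deltas.append((pos - prev_novel_pos, alt))
        prev_novel_pos = pos

print("known SNPs :", known_refs)     # [(0, 'A'), (1, 'C')]
print("novel SNPs :", novel_deltas)   # [(10950, 'T'), (41351, 'G')]
```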

Unfortunately, less can be done for the wealth of sequence-related information that is so useful for research purposes (such as base and read quality scores, genotype scores, alignment metrics and so on)... it seems that you still need a tower of drives to store it all! However, a lot of effort has been put into this area as well, since the possibility of reanalyzing older datasets with new techniques and new tools can lead to new discoveries. There is also a prize for data compression, the Sequence Squeeze competition, which has updated information on the best performing tools.