
Friday, 29 March 2013

PubMed Highlight: Next-Generation sequencing visualization

I've just finished an intensive course on NGS data analysis where command-line-based solutions were, of course, presented as the best way to manage and make sense of the data.
Playing with scripts, Unix commands and the R language makes you feel a sort of bioinformatic power. You start to blame all those wet-lab colleagues who spend hours on Excel spreadsheets. You are amazed by the results of your latest programming trick and the effectiveness of your command-line skills. Even if this makes you proud, keep in mind that a screen full of symbols and tables with over a million rows have, for most biologists and geneticists, the same appeal as the flowing characters of The Matrix... As in the famous movie, not everyone can see the meaning behind the code; most will just see a bunch of characters and numbers, doubting that this is the real world!
A good visualization of genomic data from NGS experiments makes your results nicer to look at and easier to explain and explore. Moreover, a colorful alignment of reads in genome-browser style or a Circos plot surely makes a better impact when you show it in your presentations! The scientific community constantly asks for visualization tools that simplify the task of explaining and exploring NGS data, so that the data become accessible to everyone, even to the old-school ones.

The latest special issue of Briefings in Bioinformatics offers an extensive review of the main visualization tools, with an overview of their distinctive advantages and main features. Web-based browsers, the UCSC Genome Browser, IGV, Tablet, BamView and GBrowse are all covered, making this issue the ideal answer to the colleague asking you: "I've just received this great NGS data, but what are all these bam and vcf files? I want to see them nicely placed on my favourite chromosome!".

Main articles in the special issue:
Jun Wang, Lei Kong, Ge Gao, and Jingchu Luo
A brief introduction to web-based genome browsers

Robert M. Kuhn, David Haussler, and W. James Kent
The UCSC genome browser and associated tools

Lincoln D. Stein
Using GBrowse 2.0 to visualize and share next-generation sequence data

Oscar Westesson, Mitchell Skinner, and Ian Holmes
Visualizing next-generation sequencing data with JBrowse

Helga Thorvaldsdóttir, James T. Robinson, and Jill P. Mesirov
Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration

Iain Milne, Gordon Stephen, Micha Bayer, Peter J.A. Cock, Leighton Pritchard, Linda Cardle, Paul D. Shaw, and David Marshall
Using Tablet for visual exploration of second-generation sequencing data

Tim Carver, Simon R. Harris, Thomas D. Otto, Matthew Berriman, Julian Parkhill, and Jacqueline A. McQuillan
BamView: visualizing and interpretation of next-generation sequencing read alignments

Michael C. Schatz, Adam M. Phillippy, Daniel D. Sommer, Arthur L. Delcher, Daniela Puiu, Giuseppe Narzisi, Steven L. Salzberg, and Mihai Pop
Hawkeye and AMOS: visualizing and assessing the quality of genome assemblies

Thursday, 28 March 2013

Elementary elements in bioinformatics

These days I'm attending an intensive course on NGS data analysis... Every day we deal with about 9 hours of bioinformatics, both theory and scripting... And tons of useful tools have been mentioned during the course...
Since the names of these programs are anything but easy to remember, I found myself wishing for a summary that gives a compact, organized overview of the main ones and quick access to them.

Considering that bioinformatics tricks have become as essential as chemical elements, the Elements of Bioinformatics table from Eagle Genomics is an efficient and fun answer to my needs.
If programming, analyzing DNA data and talking about stats and complex biology don't satisfy your need to look nerdy, using this table to remember strangely named tools should improve your reputation as a real geek!!

Have fun (if you read this blog I'm sure you will!)

Thursday, 21 March 2013

PubMed Highlight: The origin, evolution and functional impact of short insertion-deletion variants identified in 179 human genomes

The role of short indels (<50 bp) as major players in shaping human genome variability and contributing to various Mendelian diseases has been underlined by several recent findings.
However, a detailed genome-wide assessment of the impact and distribution of indels was still missing... until now.

In this interesting paper, which appeared in Genome Research, Montgomery et al. address exactly this question, with amazing results. First of all, the authors had to deal with the challenge of calling short indels, which is one of the biggest issues when analyzing NGS data. Starting with DNA sequences from 179 individuals from 3 population groups, they made several optimizations to the standard pipeline used by the 1000 Genomes Project to obtain a set of high-quality indels. Even if indels in homopolymeric regions remain out of reach, the improved pipeline described in the paper is certainly a guideline for anyone working in the field. Among the other interesting findings, the authors confirmed that rates of indel mutagenesis are highly heterogeneous, with 43-48% of indels occurring in 4.03% of the genome (loci defined as indel hotspots by the authors), and they proposed polymerase slippage as the main mechanism generating indels, with fork stalling and template switching (FoSTeS) implicated in some of the insertions.
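
To get a feel for what those hotspot figures imply, here is a rough back-of-the-envelope sketch in Python. It assumes indels are spread evenly within the hotspots and evenly across the rest of the genome, which is my own simplification; the function name `hotspot_density_ratio` is made up for illustration and does not come from the paper.

```python
def hotspot_density_ratio(indel_fraction, genome_fraction):
    """Return how many times denser indels are inside hotspots than outside.

    indel_fraction:  share of all indels that fall in hotspots (between 0 and 1)
    genome_fraction: share of the genome covered by hotspots (between 0 and 1)
    """
    if not (0 < indel_fraction < 1 and 0 < genome_fraction < 1):
        raise ValueError("fractions must be strictly between 0 and 1")
    # Indels per unit of genome, inside and outside the hotspots
    # (assumes a uniform spread within each of the two compartments).
    density_inside = indel_fraction / genome_fraction
    density_outside = (1 - indel_fraction) / (1 - genome_fraction)
    return density_inside / density_outside


# Figures quoted from the paper: 43-48% of indels in 4.03% of the genome.
low = hotspot_density_ratio(0.43, 0.0403)
high = hotspot_density_ratio(0.48, 0.0403)
print(f"Indel density in hotspots vs. the rest: {low:.1f}x to {high:.1f}x")
```

With the quoted numbers this comes out at roughly 18- to 22-fold, under the uniformity assumption above.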

Take a look!


The origin, evolution and functional impact of short insertion-deletion variants identified in 179 human genomes

Short insertions and deletions (indels) are the second most abundant form of human genetic variation, but our understanding of their origins and functional effects lags behind that of other types of variants. Using population-scale sequencing, we have identified a high-quality set of 1.6 million indels from 179 individuals representing 3 diverse human populations. We show that rates of indel mutagenesis are highly heterogeneous, with 43-48% of indels occurring in 4.03% of the genome we classify as indel hotspots, while in the remaining 96% their prevalence is 16-times lower than that for SNPs. Polymerase slippage can explain upwards of 3/4 of all indels, including virtually all hotspot indels. The remainder are mostly simple deletions in complex sequence, but insertions do occur and are significantly associated with pseudo-palindromic sequence features compatible with the fork stalling and template switching (FoSTeS) mechanism more commonly associated with large structural variations. We introduce a quantitative model of polymerase slippage showing an excellent fit to observed levels of variation, which enables us to identify a minority of indel-hypermutagenic protein-coding genes, some of which are associated with recurrent mutations leading to disease. Accounting for mutational rate heterogeneity due to sequence context, we find that indels across functional sequence are generally subject to stronger purifying selection than SNPs. We find that indel length modulates selection strength, as is well known of frameshift mutations in coding regions, but also longer indels and indels affecting multiple functionally constrained nucleotides are more strongly selected against in various non-coding contexts. We further find that indels are enriched in associations with gene expression, and find evidence for a contribution of nonsense-mediated decay to this association.
Finally, we show that indels can be integrated in existing GWAS studies, and although we do not find direct evidence that potentially causal protein-coding indels are enriched with strong associations to known disease-associated SNPs, many of our findings suggest that the causal variant underlying some of these associations may be indels.

Wednesday, 20 March 2013

PubMed Highlight: the genome of a HeLa cell line has been sequenced


HeLa cells, sampled in 1951 from the cervical tumor of a woman named Henrietta Lacks, are probably the world's most commonly used human cell line and have been used as a standard for understanding many fundamental biological processes, leading to more than 60,000 scientific publications.
In a new study published in G3 (Genes, Genomes, Genetics), scientists announce that they have successfully sequenced the genome of a HeLa cell line. While previous work had shown that HeLa cells have extra copies of each chromosome and sometimes multiple extra chromosomes, the analysis of the HeLa genome revealed additional features commonly associated with cancer cells, such as the loss of healthy copies of genes. In particular, the researchers found that countless regions of the chromosomes in each cell were arranged in the wrong order and had extra or fewer copies of genes.

The results of the study are also discussed in a Nature commentary.

G3 (Genes, Genomes, Genetics), Published Early Online March 11, 2013, doi:10.1534/g3.113.005777

The Genomic and Transcriptomic Landscape of a HeLa Cell Line

Jonathan J. M. Landry, Paul Theodor Pyl, Tobias Rausch, Thomas Zichner, Manu M. Tekkedil, Adrian M. Stütz, Anna Jauch, Raeka S. Aiyar, Gregoire Pau, Nicolas Delhomme, Julien Gagneur, Jan O. Korbel, Wolfgang Huber and Lars M. Steinmetz

Abstract

HeLa is the most widely used model cell line for studying human cellular and molecular biology. To date, no genomic reference for this cell line has been released, and experiments have relied on the human reference genome. Effective design and interpretation of molecular genetic studies done using HeLa cells requires accurate genomic information. Here we present a detailed genomic and transcriptomic characterization of a HeLa cell line. We performed DNA and RNA sequencing of a HeLa Kyoto cell line and analyzed its mutational portfolio and gene expression profile. Segmentation of the genome according to copy number revealed a remarkably high level of aneuploidy and numerous large structural variants at unprecedented resolution. The extensive genomic rearrangements are indicative of catastrophic chromosome shattering, known as chromothripsis. Our analysis of the HeLa gene expression profile revealed that several pathways, including cell cycle and DNA repair, exhibit significantly different expression patterns from those in normal human tissues. Our results provide the first detailed account of genomic variants in the HeLa genome, yielding insight into their impact on gene expression and cellular function as well as their origins. This study underscores the importance of accounting for the strikingly aberrant characteristics of HeLa cells when designing and interpreting experiments, and has implications for the use of HeLa as a model of human biology.