Saturday, 31 December 2011

Disease genes identified by NGS in 2011

In the last post of the year on our baby NGS blog, I believe it's worth mentioning the survey performed by Dan Koboldt on Massgenomics about the disease-causing mutations discovered by NGS in 2011. Although the numbers are impressive (60 studies linking genetic variation to Mendelian disease), to be honest I thought they would be even higher.
However, as the author states at the end of his post "one can only imagine what we’ll know by next December, as large federally-funded initiatives ramp up their efforts to systematically apply exome and whole-genome sequencing to inherited disorders".
I'm sure everybody agrees that 2012 will be a very exciting year for Next Generation Sequencing and Personal Genomics.

Happy ...

CNV in zebrafish genome

Scientists know that the zebrafish is a wonderful animal model for developmental studies. However, from a genomic point of view Danio rerio is often a nightmare because of the high level of sequence variation, even within zebrafish strains.
It is thus not surprising to read in a recent article in PNAS that the "amount of copy number variation is four times that previously observed in other vertebrates, including humans". The authors of the study came to this conclusion after constructing a genome-wide, high-resolution CNV map comprising 6,080 CNV elements and encompassing 14.6% of the zebrafish reference genome. The analysis was carried out using 80 zebrafish genomes, representing three commonly used laboratory strains and one native population.

Friday, 30 December 2011

Will it be possible to build an online game capable of predicting phenotype from genotype?

Maybe this day is not so far away. In the field of proteomics, the gaming community has already started "playing" a relevant role in creating accurate protein structure models using the multiplayer online game Foldit.
This topic is reviewed in the recent "Games with a scientific purpose" article in Genome Biology:
"The scientific value of the Foldit system was first demonstrated by showing that game players could solve specific structure prediction problems. In the first major publication to discuss Foldit, Baker and colleagues (6) showed that game players could, in many cases, generate better structure predictions than the state-of-the-art Rosetta structure prediction program. Foldit then unleashed their army of folders on the task of solving the structure of the Mason-Pfizer monkey virus retroviral protease, a problem that was previously intractable to both computational and experimental methods (7). After 3 weeks of game play, the best solutions were screened and, remarkably, a solution to this previously unsolved structure was identified and subsequently validated. This achievement established Foldit as a legitimate resource for the structural biology community"
Zoran Popović, one of the founders of the Foldit project, has an audacious ambition for their community of game players: "Our ultimate goal is to have ordinary people play the game and eventually be candidates for winning the Nobel Prize".
Considering the enormous amount of data generated by NGS, and the importance of data curation, it won't be surprising to see a future for multiplayer online games in Personal Genomics.

Statistical methods on detecting differentially expressed genes for RNA-seq data

Interesting paper published this month. From the abstract:

"To detect differentially expressed genes under two conditions, statistical methods such as Poisson distributions are often used. However, to accurately detect differential expression of gene with low expression levels, more powerful statistical methods are desirable. In statistical literature, several methods have been proposed to compare two Poisson means (rates).

Through simulation study and real data analysis, the authors find that the Wald test with the data being log transformed is more powerful than other methods, including the likelihood ratio test, the variance stabilizing transformation test, the conditional exact test and the Fisher exact test.

When the count data in RNA-seq can be reasonably modelled as Poisson distribution, the Wald-Log test is more powerful and should be used to detect the differentially expressed genes."

Chen et al. (2011) Statistical methods on detecting differentially expressed genes for RNA-seq data. BMC Systems Biology 5(Suppl 3), S1

I'll read it soon and come back with comments.
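
While waiting for the proper comments, here is a minimal sketch of what a Wald test on log-transformed Poisson counts can look like. It is my own illustration built on the textbook delta-method approximation (the variance of the log rate ratio is roughly 1/x1 + 1/x2), so it is not necessarily identical to the statistic used by Chen et al. The function name, the library-size arguments and the refusal of zero counts are all assumptions of this sketch.

```python
import math


def wald_log_test(x1, x2, n1=1.0, n2=1.0):
    """Two-sided Wald test on the log rate ratio of two Poisson counts.

    x1, x2: read counts for one gene in the two conditions.
    n1, n2: exposures (e.g. library sizes); hypothetical parameters
            introduced for this sketch.
    Returns the z statistic and the two-sided p-value.
    """
    if x1 <= 0 or x2 <= 0:
        # The log transform is undefined for zero counts; how to handle
        # them (e.g. pseudo-counts) is left open here.
        raise ValueError("counts must be positive")
    # Estimated log rate ratio between the two conditions.
    log_ratio = math.log(x1 / n1) - math.log(x2 / n2)
    # Delta-method standard error for the log of a Poisson rate ratio.
    se = math.sqrt(1.0 / x1 + 1.0 / x2)
    z = log_ratio / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Toy example: 100 reads versus 50 reads with equal library sizes.
z, p = wald_log_test(100, 50)
print(f"z = {z:.2f}, p = {p:.2g}")
```

With equal counts the statistic is zero and the p-value is 1; the larger the imbalance relative to the counts themselves, the smaller the p-value.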

Friday, 23 December 2011

Useful tools to keep in your toolbox

I am passing along this post from a really good bioinformatics blog pointing to biotoolbox, a free collection of tools based on BioPerl...
This is really good work, including scripts for manipulation and basic analysis of NGS and microarray data that will save you a lot of time and some headaches! Thanks for sharing!

Thursday, 22 December 2011

WES is contained in WGS...maybe not completely!

When you look at Next Generation Sequencing and think about whole genome (WGS) or whole exome sequencing (WES) there are several options you can choose from, both in terms of sequencing platforms and exome enrichment kits. Most of the time a researcher is interested in using the data generated for sequence variant discovery, but initial comparisons of datasets showed that the results obtained using different techniques are not as similar as we might hope. So the question "which is the best solution?" naturally arises.
In the past few months the group of Michael Snyder at Stanford University has published a pair of interesting papers in Nature Biotechnology addressing this question: the first one (September) compares the three main solutions for exome capture (NimbleGen, Agilent and Illumina TruSeq), the second one (December) is a performance comparison of WGS performed with Illumina sequencers and the innovative Complete Genomics platform (see this article in Science). Both studies have also been extensively commented on in the GenomeWeb blog (the first here and the second here). The take-home message is that there is no winner among sequencing platforms... and this is not such a big surprise... but they also found that WGS and WES are not simply like matryoshka dolls, one contained in the other!

In their paper on exome techniques the authors also compared SNVs detected by exome sequencing with those detected by whole genome sequencing of the same sample. The results are quite interesting! They found that, considering the coding regions, a significant proportion of the identified variants (a few thousand) differ between WGS and WES data. In most cases this is explained by the fact that WGS covers some exon regions that capture kits simply miss, while WES has deeper coverage and so allows detection of variants in some regions that are poorly covered by whole genome sequencing. However, a few hundred variants remain that are uniquely identified by one of the two techniques... and this is quite far from what is expected, since one usually thinks that a WGS approach with adequate coverage will identify all the variants from a WES (plus, of course, many others located in the non-coding regions). Moreover, about 300 SNVs that were identified by all three of the exome sequencing platforms but not by WGS are associated with human diseases, suggesting that exome sequencing can pick up variants with clinical relevance that WGS alone would miss.
"It was definitely surprising to me that the exome [sequencing] was finding information that the genome [sequencing] did not pick up," said Snyder. "Some of these are important regions; you can't just blow these off." Given these results, it might make sense to do both WGS and exome sequencing "to make sure you are really covering your exome variants," he said. "If you can afford it, that's a good thing to do since you will get extra information from your exome that you would not have gotten from the genome." I am also posting an image from the original article that helps to visualize this idea!

The authors also compare various aspects of the three most used exome sequencing kits (Agilent, NimbleGen and Illumina TruSeq), giving a clear picture of the pros and cons of each kit. Even if it is not possible to choose a winner, this information can be extremely useful when choosing the correct enrichment package for your application.
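
If you want to try the same kind of WGS-versus-WES comparison on your own call sets, the core of it is plain set arithmetic. Below is a minimal sketch, assuming each variant has already been reduced to a (chromosome, position, ref, alt) tuple restricted to the coding regions; the function name and the toy variants are invented for illustration and are not taken from the paper.

```python
def compare_call_sets(wgs_calls, wes_calls):
    """Split variants into shared, WGS-only and WES-only groups.

    Each call is a (chrom, pos, ref, alt) tuple; this representation is
    an assumption of the sketch, not the format used by the authors.
    """
    wgs = set(wgs_calls)
    wes = set(wes_calls)
    return {
        "shared": wgs & wes,    # found by both techniques
        "wgs_only": wgs - wes,  # e.g. exons missed by the capture kit
        "wes_only": wes - wgs,  # e.g. regions poorly covered in WGS
    }


# Toy data, purely illustrative.
wgs_calls = [
    ("chr1", 1000, "A", "G"),
    ("chr1", 2000, "C", "T"),
    ("chr2", 500, "G", "A"),
]
wes_calls = [
    ("chr1", 1000, "A", "G"),
    ("chr2", 500, "G", "A"),
    ("chr3", 42, "T", "C"),
]

result = compare_call_sets(wgs_calls, wes_calls)
for group, variants in result.items():
    print(group, len(variants))
```

The interesting work starts afterwards, of course: checking the coverage at every WGS-only and WES-only site to understand why the other technique missed it.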

Best Wishes!

Wednesday, 21 December 2011

Repetita (non) juvant?

In the new issue of Nature Reviews Genetics (Jan 2012) we can read a very interesting and useful review about computational tools for next generation sequencing.
“Rather than describing the algorithmic details of these programs, we will discuss their shared strategies for solving repeat-induced analysis problems in each situation and address some of their limitations” write Todd J. Treangen and Steven L. Salzberg in their introduction.
In this review the authors compile a guide to the challenge represented by the abundance of repeats in genome assembly and RNA-seq analysis. Two lists of computational tools are provided, one for NGS genome alignment and assembly, and one for NGS transcriptome analysis.
The range of tools available is wide and growing rapidly, making the analysis of repeats easier, with the ultimate purpose of learning more about their role in disease and their contributions to gene function, genome structure and evolution.

Tuesday, 20 December 2011

World Map of High-throughput Sequencers

A couple of interesting considerations: please zoom to China and count the number of NGS instruments. Then move to Italy....

What do you think about having a Neanderthal ancestor?

With the release of the draft sequence of the Neanderthal Genome, the questions about our evolution and our relationship with our bigger-body and bigger-brain cousins have gained new life. Comparative analyses have shown that the two Homo species are closer than previously supposed and that the two groups may have interbred before the Neanderthal extinction. The latest studies in the field support the theory that less than 100,000 years ago Neanderthals and the ancestors of all non-African Homo sapiens lived side by side, and a few of them may have shacked up. So some of our ancestors may have carried Neanderthal DNA, and some of us are more "Neanderthalian" than others...
What do you think about yourself? Do you feel like a big and smart H. neanderthalensis or just a common H. sapiens?
You may not be so interested, but Eric Durand, formerly at the Department of Integrative Biology at the University of California, Berkeley, and now employed by 23andMe, took this question very seriously. He developed a bioinformatics tool to determine exactly how much Neanderthal is in you! This will definitely take the ancestry genetic test package offered by 23andMe to the ultimate level! As trivial as this may seem, the new interest in Neanderthal genetic testing has captured the attention of the Nature Blog, as you can read here.

Additional readings: Sleeping with the Enemy, The New Yorker; The Perfect Gift This Holiday Season: The Neanderthal Test, Discovery Magazine.

UCSC Genome Browser now supports Variant Call Format (VCF)

I just received this interesting email from the UCSC Genome Bioinformatics Group (see below). Official support for the Variant Call Format will allow you to display, within the UCSC Genome Browser window, variant calls from data generated in your lab as well as from several personal genomes that have been made publicly available.

From: Steve Heitner
Subject: [Genome-announce] New Variant Call Format (VCF) support
Date: 19 December 2011 23:58:16 GMT+01:00

We are pleased to announce that the UCSC Genome Browser now supports Variant Call Format (VCF). VCF is a flexible and extendable line-oriented text format developed by the 1000 Genomes Project for releases of single nucleotide variants, indels, copy number variants and structural variants discovered by the project. Similar to bigBed, bigWig and BAM, the Browser transfers only the portions of VCF files necessary to display viewed regions, making VCF a fast and attractive option for large data sets. VCF files will need to be compressed and indexed using the tabix package available from This new format is available for use in custom tracks and data hubs. For more information about VCF and tabix, please see our VCF Track Format help page at
Steve Heitner
UCSC Genome Bioinformatics Group
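
For readers who have never opened a VCF file: each data line is simply a tab-separated record whose first eight columns are CHROM, POS, ID, REF, ALT, QUAL, FILTER and INFO. Here is a minimal sketch of a parser for one such line, using only the Python standard library. The class and function names are my own, and the sketch ignores header lines and per-sample genotype columns; compressing and indexing the file for the browser is still a job for the tabix package mentioned in the announcement.

```python
from typing import Dict, List, NamedTuple, Union


class VcfRecord(NamedTuple):
    """The eight fixed VCF columns (name chosen for this sketch)."""
    chrom: str
    pos: int
    id: str
    ref: str
    alt: List[str]
    qual: str
    filter: str
    info: Dict[str, Union[str, bool]]


def parse_vcf_line(line: str) -> VcfRecord:
    """Parse one VCF data line (not a '#' header line)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("a VCF data line needs at least 8 columns")
    chrom, pos, vid, ref, alt, qual, flt, info = fields[:8]
    info_dict: Dict[str, Union[str, bool]] = {}
    if info != ".":
        for item in info.split(";"):
            key, sep, value = item.partition("=")
            # Keys without '=' are flags, stored here as True.
            info_dict[key] = value if sep else True
    # ALT may list several alleles separated by commas.
    return VcfRecord(chrom, int(pos), vid, ref, alt.split(","), qual, flt, info_dict)


# Example data line in the style of the VCF specification.
example = "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;AF=0.5;DB;H2"
record = parse_vcf_line(example)
print(record.chrom, record.pos, record.ref, record.alt, record.info["DP"])
```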

Genome of a Genghis Khan's descendant sequenced

The genome of a great - great - ... grandson (34th-generation) of the legendary conqueror Genghis Khan has been sequenced.
This was the first individual genome sequencing of a Mongolian, said Zhou Huanmin, project leader and head of the biological research lab at the Inner Mongolia Agricultural University. The results of this study are important for the detection of ethnicity-specific genome inheritances and the evolutionary features of Mongolians.
The research team will continue by sequencing the genomes of another 199 ethnic Mongolians and building a database of the Mongolian genetic code.

Monday, 19 December 2011

23andMe Christmas sales

As reported by the excellent Italian "my GenomiX" blog site, 23andMe is offering a $23 Christmas discount for their SNP genotyping analysis. More info can be found on the 23andMe site.
Unfortunately, when ordering from Italy you have to add $63.95 in shipping costs....

Best Xmas gifts...If you are fond of DNA science

Christmas is coming! So don't miss the latest and coolest DNA gifts on the market.
The GenomeWeb blog has a nice list for you:
Instead of Seven Lords A-Leaping, Try These | The Daily Scan | GenomeWeb

Thursday, 15 December 2011

Some stats about the Next Generation Sequencing Market

"Competition is the soul of trade", we say in Italy. Here are some statistics about the new market born around Next Generation Sequencing techniques. Illumina has the biggest slice of the cake, but new competitors are growing fast... Who will best satisfy the hungry researchers in 2012?

Tuesday, 13 December 2011

Do you want $20000 + your genome sequenced for free?

Know a great software engineer? If you refer and they hire him ......

Apparently it is becoming more and more difficult to find skilled informatics professionals.

Sunday, 11 December 2011

Some interesting reflections on the evolution of the Ion Torrent platform

... (and a hypothetical 320 chip) can be found on K. Robison's Omics! Omics! blog.

Cosmetic mutations...

This blog is undergoing some "cosmetic mutations": we are still trying to find our identity while preserving the readability of the site.

Another gold mine of genomic variations available

Digging around the internet, I found that on Nov 16th the NHLBI Exome Sequencing Project (ESP) released data from the sequencing of approx. 5400 (yes, 5400!!!) exomes on their Exome Variant Server.
You can query either your gene of interest or a particular chromosomal region. The SNV data can be downloaded in either text or VCF format.
The goal of the NHLBI GO Exome Sequencing Project (ESP) "is to discover novel genes and mechanisms contributing to heart, lung and blood disorders by pioneering the application of next-generation sequencing of the protein coding regions of the human genome across diverse, richly-phenotyped populations and to share these datasets and findings with the scientific community to extend and enrich the diagnosis, management and treatment of heart, lung and blood disorders". The study has been performed on European American and African American populations.

Wednesday, 7 December 2011

Not feeling well? Spit on your smartphone!

"Hi sir. How are you feeling today? Not so well? Could you spit on the phone screen, please". This is not exactly what you would expect to hear from your doctor, but it could happen in the not-so-distant future.
Researchers at the Korea Advanced Institute of Science and Technology in Daejeon are developing “a process that allows a droplet of fluid to be pressed against a smartphone’s touch screen for instant disease detection”, as an article on the Forbes magazine website reports about this potential future technology. Wait a second and try to think about the many potential applications of this feature! A doctor could theoretically check several biomedical parameters from blood or urine tests performed by patients at home by simply placing a drop of the biological sample on the smartphone's screen (the researchers are also developing a special cover or film to protect the touchscreen from contamination during testing). The touch screen of modern smartphones, in fact, can do far more than sense a finger touch: potentially it could detect much smaller things, such as droplets of human fluids containing biomarkers indicating a disease status, Forbes adds.

Tuesday, 6 December 2011

Genomic research: a new field requires young people!

Genomic studies are the future, or so we believe! As the genomics community is growing rapidly, it is a great pleasure to see young scientists producing great work in this exciting field of research. Stay tuned for the latest news from the Sixth Annual Young Investigators, with studies covering a wide variety of topics such as single-cell genomics, the role of microRNA in disease, and the discovery of new biomarkers!

Friday, 2 December 2011

In the era of Clouds we still need the Bricks

A recent article in the New York Times underlines once again the incredibly rapid escalation in genomic data production boosted by NGS technology and by the set-up of huge sequencing centers, like the famous BGI (a monster with 162 HiSeq 2000 sequencers capable of producing about 2,000 genomes/day).
So now it seems that the real bottleneck in genomics is no longer data production but storage, analysis and sharing. The analysis step opens up a large new business for hi-tech and bioinformatics companies, and rapid progress is expected. However, I think that even with super-fast computers and super-efficient analysis pipelines we cannot multiply the brains in researchers' heads, and finding the biological meaning of all these data is the challenge for research today and tomorrow.
But the real challenge comes with the sharing part... These days, with everybody talking about cloud technologies, rapid data transfers and the near futility of local storage, we are in fact facing the hard problem of delivering genomic data from the service provider (say, the BGI in China) to the researcher who has to analyze the results (say, me in my little lab in Italy). As already pointed out in a review in Nature, the TB-scale data easily reached by genomic projects require us to get back to Bricks, also known as good old physical hard drives. The experts at BGI agreed that the fastest and most affordable way to deliver the results is to save them on a hard drive and send it, well packed, with FedEx. In some ways it is as if we had suddenly gone back from the present era of smartphones to the times of handwritten letters... Think of a large consortium, involving research units located in different countries, working on a large genomic project: the work will be considerably slowed down and the costs will rise, since every unit in the consortium has to wait days for the results and analyses to be physically delivered on a hard drive... We are no longer in the dreamy world of instant transfer. Of course, after the initial analysis you only have to move a small subset of the total data, but the fact remains that we are facing a situation in which moving the researchers to the data could become cheaper than moving the data to the labs, and in which analyzing and moving the data could cost more than resequencing.
So consider adding a wheelbarrow to your lab equipment: it's cheap, and with NGS expanding there will be a lot of bricks to move!
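
To put some numbers on the bricks-versus-cloud argument, here is a back-of-the-envelope sketch that compares the time needed to push a dataset through a network link with the time a courier takes. Every figure in the example (dataset size, link speed, three days of shipping) is an assumption chosen for illustration, not a measurement, and the function names are made up.

```python
def transfer_hours(size_tb, bandwidth_mbit_s, efficiency=1.0):
    """Hours needed to move size_tb terabytes over a network link.

    size_tb: dataset size in terabytes (1 TB = 1e12 bytes).
    bandwidth_mbit_s: nominal link speed in megabits per second.
    efficiency: fraction of the nominal speed actually achieved
                (an assumed fudge factor, 1.0 = ideal link).
    """
    bits = size_tb * 1e12 * 8
    seconds = bits / (bandwidth_mbit_s * 1e6 * efficiency)
    return seconds / 3600.0


def faster_option(size_tb, bandwidth_mbit_s, courier_hours):
    """Return 'network' or 'courier', whichever delivers sooner."""
    if transfer_hours(size_tb, bandwidth_mbit_s) < courier_hours:
        return "network"
    return "courier"


# Assumed scenario: 100 Mbit/s link, courier delivery in 72 hours.
for size_tb in (0.1, 1, 10):
    hours = transfer_hours(size_tb, 100)
    print(f"{size_tb} TB: {hours:.1f} h by network ->",
          faster_option(size_tb, 100, 72))
```

Under these assumptions a single terabyte still travels faster over the wire, but at ten terabytes the hard drive in a FedEx box wins comfortably, and a real link rarely sustains its nominal speed for days.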

Say hello to grandpa LUCA

About a year ago I was reading the last pages of a semi-scientific adventure book: "The Fifth Day" by Frank Schätzing. Without spoiling too much (in case someone else is going to read it), I found two intriguing concepts in the story: one, we may not be the most intelligent and powerful species on earth, and two, maybe we share this planet with other forms of intelligent organisms, so different in their appearance and biological organization that we may not even recognize them until they want to interact with us directly... But even more fascinating to me is the fact that the author imagines a giant multicellular organism spread across the oceans, in which every single cell is both a separate entity and a part of a super-organism communicating through a complex exchange of DNA molecules. Maybe this is not so cool compared to Indiana Jones adventures, but what can be more fascinating for a lab guy working on genetics?
Ok, now I have found in recent scientific papers that this super-organism hypothesis may be closer to a real fact than to a science-fiction story. With the exponential rise in the accumulation of genomic and proteomic data from a lot of different organisms, both eukaryotes and prokaryotes, a lot of effort has also been made in recent years to reconstruct the features of the Last Universal Common Ancestor... LUCA, the grandpa of us all!
LUCA is considered the unique organism that predates Eukarya, Bacteria and Archaea, and its study can provide useful information on the essential biochemical and genetic mechanisms of a cell. Based on the papers published in the last 2 years, LUCA seems to have been a kind of RNA-based super-organism, a huge mix of cells spanning the primordial oceans and constantly exchanging pieces of RNA and useful proteins in an attempt to obtain an efficient energy processing machine. As Michael Marshall summarizes in New Scientist: "The latest results suggest LUCA was the result of early life's fight to survive, attempts at which turned the ocean into a global genetic swap shop for hundreds of millions of years. Cells struggling to survive on their own exchanged useful parts with each other without competition - effectively creating a global mega-organism. [...] In order to cope, the early cells must have shared their genes and proteins with each other. New and useful molecules would have been passed from cell to cell without competition, and eventually gone global. [...] It was more important to keep the living system in place than to compete with other systems, says Caetano-Anollés. He says the free exchange and lack of competition mean this living primordial ocean essentially functioned as a single mega-organism".
So, Christmas is coming, add a chair for LUCA!