Monday 30 January 2012

NHGRI presents Current Topics in Genome Analysis 2012

The National Human Genome Research Institute (NHGRI) has recently started a lecture series covering major topics in genomics and bioinformatics. The lectures will be held from January 11th to April 25th and will cover several interesting topics, from Biological Sequence Analysis (how to use BLAST, Genome Browsers and so on) to Next-Generation Sequencing and Genomics (see below for a complete list of the events). Videos of the lectures, as well as slides in PDF, will be available at this NHGRI page.
P.S. Thanks to the myGenomix blog for bringing this to my attention!

January 11 - The Genomic Landscape circa 2012, Eric Green, NHGRI
January 18 - Biological Sequence Analysis I, Andy Baxevanis, NHGRI
January 25 - Genome Browsers, Tyra Wolfsberg, NHGRI
February 1 - Biological Sequence Analysis II, Andy Baxevanis, NHGRI
February 15 - Regulatory and Epigenetic Landscapes of Mammalian Genomes, Laura Elnitski, NHGRI
February 22 - Next-Generation Sequencing Technologies, Elaine Mardis, Washington University in St. Louis
March 7 - Introduction to Population Genetics, Lynn Jorde, University of Utah
March 14 - Genome-Wide Association Studies, Karen Mohlke, University of North Carolina
March 21 - Pharmacogenomics, Howard McLeod, University of North Carolina
March 28 - Large-Scale Expression Analysis, Paul Meltzer, NCI
April 11 - Genomic Medicine, Bruce Korf, University of Alabama at Birmingham
April 18 - Applications of Genomics to Improve Public Health, Colleen McBride, NHGRI
April 25 - Genomics of Microbes and Microbiomes, Julie Segre, NHGRI


Friday 27 January 2012

What is a ... "MitoExome"?

The latest issue of Science Translational Medicine reports an article describing the targeted sequencing of mitochondrial DNA and the exons of 1,037 nuclear genes encoding mitochondrial proteins in 42 unrelated infants with clinical and biochemical evidence of mitochondrial oxidative phosphorylation disease. Ten patients had mutations in genes previously linked to disease, while 13 had mutations in nuclear genes not previously linked to disease. The pathogenicity of two such genes, NDUFB3 and AGK, was supported by complementation studies and evidence from multiple patients. For about half of the patients studied, the genetic mutations causing their mitochondrial disorders remain unknown.
The study was carried out by several groups at the Broad Institute, Massachusetts General Hospital, and elsewhere. The team will apply the same technique to adults who develop the disease later in life. Researchers estimate that mutations in more than 200 different mitochondrial genes could give rise to mitochondrial diseases, but to date only about 100 of these disease genes have been identified.


Thursday 26 January 2012

"If you can't beat them, join them!" (or... buy them): Roche's hostile bid for Illumina.


"If you can't beat them, join them!" is a proverb often used in politics and war. In this case we use it for a financial strategy chosen by the "commanders" of Roche. Some rumors started to appear in the past weeks, but now Roche comes out with an official bid of (please take a sit) $5.7 billions to acquire Illumina, more precisesly $44.50 per share in cash, an 18 percent premium over Illumina's closing share price of $37.69 yesterday.
Apparenlty Roche started to negotiate silently with Illumina to find a way to reach a deal, but Illumina decided to refuse any kind of offer. Franz Humer, Roche’s chairman, decided, after numerous efforts, to release a public letter to Illumina chief executive and chairman Jay Flatley bemoaning “the lack of any substantive progress in our efforts to negotiate a business combination between Illumina and Roche” and a January 18 letter confirmed a lack of interest by Illumina’s board.
By acquiring Illumina, Roche will strengthen its position in the DNA sequencing and microarrays market "to address the growing demand for genetic/genomic solutions." More importantly, however, is the hope that Illumina's technology will "help accelerate the transition of DNA sequencing into clinical routine diagnostics" the firm said.
At the moment the "commanders" of Illumina refuse to accept all the offers made by Roche, but, as a smart conqueror, Roche's CEO Severin Schwan try to seduce the lower military ranks: "we therefore believe that the shareholders of Illumina will consider positively the opportunity to sell their shares at a price above the current market values."

Tuesday 24 January 2012

More than 50 million unique variants in dbSNP


Today the MassGenomics blog reports an EXCELLENT survey of the current state of dbSNP written by Dan Koboldt.
In the image above it is impressive to notice the impact of the 1000 Genomes Project on the identification of novel human sequence variations. We have to keep in mind that, despite its name, dbSNP also collects insertion-deletion variants (indels), multiple nucleotide polymorphisms (MNPs), as well as other classes of mixed polymorphisms (such as short tandem repeats). This is a must-read if you are working in the personal genomics field, a sort of mini-review with interesting stats about these classes of human variation. Did you know, for instance, that in Build 135 of dbSNP there are more than 40,000 variants predicted to cause premature termination (nonsense) or a shift in translation frame (frameshift) in the encoded protein?
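As a side note, the distinction between these variant classes can be read directly off a variant's reference and alternate alleles. Here is a toy Python sketch of that classification logic; it is deliberately simplified and does not reproduce dbSNP's actual rules.

```python
# Toy classifier for the variant classes mentioned above, based only on the
# lengths of the reference and alternate alleles. Purely illustrative.
def variant_class(ref, alt):
    if len(ref) == len(alt) == 1:
        return "SNP"      # single base substituted
    if len(ref) == len(alt):
        return "MNP"      # several bases substituted, no length change
    return "indel"        # length change: insertion or deletion

for ref, alt in (("A", "G"), ("AT", "GC"), ("A", "ATTG"), ("ACGT", "A")):
    print(ref, "->", alt, ":", variant_class(ref, alt))
```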

Friday 20 January 2012

VarSifter: useful software for managing NGS data

For beginners (like us) in next-generation sequencing data analysis, finding a tool that offers a simple and user-friendly way to analyse and visualize the huge amount of data produced by exome sequencing is like seeing an oasis in the vastness of a desert. This is the feeling I had when I used VarSifter for the first time.

"VarSifter is a program designed to view massively parallel sequencing variation output. It allows sorting on any field (as well as combinations of fields), and filtering on different types of information (variant type, inheritance, etc). Additionally, it allows custom filtering. The program is written in Java, and should run on any platform with a current Java Virtual Machine." This is the description provided by the authors of the program, Dr. Jamie K. Teer and Dr. James C. Mullikin of the National Human Genome Research Institute (NHGRI) at NIH.
The software provides several filters for analysing and comparing multiple types of data related to genome variants. A set of precomputed tools allows you, for example, to specify which of your samples are cases and which are controls, and which samples are affected or normal. From there you can apply many other filters: for example, you can select variants present only in case samples but not in controls, or separate homozygous variants from heterozygous ones, and so on. Furthermore, VarSifter offers the possibility to create a personalized set of filters based on your needs. You can see a presentation of VarSifter by Dr. Teer here, and find the paper describing the software in Bioinformatics here.
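To give a concrete idea of the kind of sifting involved, here is a minimal Python sketch of case/control variant filtering. To be clear, this is not VarSifter code: the records, field names and genotypes below are invented for illustration.

```python
# Illustrative sketch only: a toy version of the case/control sifting that
# VarSifter automates. Variant records and genotypes are made up.

variants = [
    {"gene": "GENE_A", "pos": 1200345,
     "genotype": {"case1": "het", "case2": "het", "ctrl1": "ref"}},
    {"gene": "GENE_B", "pos": 8844210,
     "genotype": {"case1": "hom", "case2": "ref", "ctrl1": "het"}},
]

cases = ["case1", "case2"]
controls = ["ctrl1"]

def in_all_cases_not_in_controls(v):
    """Keep variants carried by every case sample and absent from all controls."""
    return (all(v["genotype"][s] != "ref" for s in cases)
            and all(v["genotype"][s] == "ref" for s in controls))

def homozygous_in_any_case(v):
    """Keep variants homozygous (non-reference) in at least one case sample."""
    return any(v["genotype"][s] == "hom" for s in cases)

# Filters can be chained, much as VarSifter lets you combine its built-in ones.
selected = [v for v in variants if in_all_cases_not_in_controls(v)]
for v in selected:
    print(v["gene"], v["pos"])
```

VarSifter packages this kind of logic behind a point-and-click interface, so no programming is required.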
NGS technologies generate an incredible amount of data. Software like VarSifter can help us find our path through this vast, data-filled landscape.

Thursday 19 January 2012

60 genomes in the clouds....


The diffusion of NGS technology has made it clear that no one has the capacity to analyze in depth all the data produced in large-scale WGS or WES projects.
It is thus not surprising that many NGS studies, whether carried out by a large consortium, a single research center or a private company, are applying an open-data model: we provide the data, you do your analysis, we all share the honours.
This means that today a huge amount of genomic data (complete exomes and genomes) can be accessed and analyzed for free, creating the opportunity to perform genomic research with no more than a PC (...well, maybe not your old laptop...).

In this scenario, here is an interesting initiative from Complete Genomics, a life sciences company that has developed and commercialized an innovative DNA sequencing platform: a few months ago they released on the BioNimbus Cloud a dataset of 60 complete human genomes at high coverage (average 55x). You can read the Complete Genomics announcement here and access the data here. This dataset is intriguing for several reasons:
- these are complete genomes from different populations;
- all have high coverage;
- they include some trios;
- data can be accessed through the BioNimbus cloud computing service.

This last aspect means that one can subscribe to the BioNimbus Cloud and use their virtual machines, hosted on Amazon's Elastic Compute Cloud, to do the NGS data analysis. Not only do you no longer have to produce the sequences, you don't even need the computational power to perform the bioinformatics analysis! We are living in a new genomic era, in a rapidly evolving world where, to do research, you will only need some bioinformatic skills and an open mind to come up with a good scientific question. The rest is provided by the NGS community!

Wednesday 18 January 2012

BarraCUDA

BarraCUDA is GPU-accelerated short-read DNA sequence alignment software based on the Burrows-Wheeler Aligner (BWA).
Researchers from the University of Cambridge and University College Cork used Nvidia's Compute Unified Device Architecture (CUDA) to design the software to take advantage of GPU parallelism and accelerate the alignment of the millions of sequencing reads generated by NGS instruments. Quite impressively, BarraCUDA demonstrated a throughput six times that of a CPU core for gapped alignment, and even faster when gap opening is disabled. Mapping accuracy is not compromised by using GPUs.
Notably, many modern supercomputers (including the Chinese Tianhe-1A, the second most powerful system in the world) also contain multiple GPU nodes on top of traditional CPU nodes, to take advantage of the parallel computing capability of GPUs.
A BMC Research Note describes the features and performance of BarraCUDA. Additional info can be found at the Project Home Page.

Flash Report: most impressive NGS papers of 2011

Kevin Davies, founding editor of Nature Genetics and Bio-IT World, highlights in the NGS Leaders blog some of the top NGS papers published in 2011.
Kevin also reminds us that BGI published twenty NGS papers in 2011.

Saturday 14 January 2012

Freely "Explore" the Pediatric Cancer Genome Project data

St. Jude Children’s Research Hospital announced in a press release the launch of “Explore”, a freely available website for published research results from the St. Jude Children’s Research Hospital – Washington University Pediatric Cancer Genome Project (PCGP). The PCGP is the largest effort to date aimed at sequencing the entire genomes of both normal and cancer cells from pediatric cancer patients, comparing differences in the DNA to identify genetic mistakes that lead to childhood cancers (see my previous post).
The Explore website is designed to expand access to high-quality genomic data related to pediatric cancers, accelerate discovery and hypothesis testing, and provide comprehensive visualizations of the data. Explore is also designed to make it easier for clinical and basic researchers to search published results from the PCGP. Explore allows researchers to access the genome project’s unique, published data specific to pediatric cancers and to make discoveries of their own.
I tried to play with Explore, after a simple registration obtained by providing my email address. The site is quite sophisticated, but not always intuitive and at times apparently unresponsive. One should know that, right now, access is granted to 16.3% of the 600 cancer genomes' data (only the published ones), corresponding to the T-ALL and RB tumors. Data include copy number variations, single nucleotide variations, structural variations and gene expression.

Pediatric Cancer Genome Project

Two years ago St. Jude Children's Research Hospital and Washington University School of Medicine in St. Louis announced a $65-million, three-year joint effort to identify the genetic changes that give rise to some of the world's deadliest childhood cancers. The aim was to decode the genomes of more than 600 childhood cancer patients. No one had sequenced a complete pediatric cancer genome prior to the PCGP, which has sequenced more than 250 sets to date.
The project is now starting to produce important results, with two studies published in the latest issue of Nature.
In the first one (The genetic basis of early T-cell precursor acute lymphoblastic leukemia) researchers sequenced the genomes of cancer cells from twelve patients with early T-cell precursor acute lymphoblastic leukemia and discovered that genetically, the subtype had more in common with a different type of leukemia than with other acute lymphoblastic leukemias. This might point the way toward better treatments, according to the St. Jude researchers.
In the other study (A novel retinoblastoma therapy from genomic and epigenetic analyses), investigators sequenced the tumors of four young patients with retinoblastoma, a rare childhood tumor of the retina. The findings also led investigators to a new treatment target and a possible therapy.

A "negative cost" for the genome in the future?

In the video above (by Gloria Oh/Medill), Northwestern University's genetics community offers insight into the $1,000 genome and dropping costs, as well as the implications for clinical application.
The short movie is accompanied by an article on Medill Reports discussing some of the implications of the dropping cost of a full genome sequence. It contains a couple of considerations by Atul Butte, Associate Professor and Division Chief of Systems Medicine in the Department of Pediatrics at Stanford University School of Medicine: "There's no place to put the hard disk. If we were to save the short reads per patient, it would be a terabyte of information. It would be cheaper to re-sequence and retrieve a person's DNA at a trivial zero-genome cost than to store the sequence." Butte also adds that "we'll eventually head to a zero-dollar genome; you could imagine a negative genome in the future, where the insurance pays for your genome sequence."
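Butte's point is essentially a break-even calculation between storage and re-sequencing costs. A minimal sketch, with prices that are purely our assumptions for illustration:

```python
# A back-of-the-envelope sketch of Butte's argument. All prices below are
# assumptions picked for illustration, not real quotes.

TB_PER_PATIENT = 1.0          # short-read data per genome, as quoted above
STORAGE_COST_TB_YEAR = 100.0  # assumed $/TB/year for replicated storage
YEARS_RETAINED = 10

def storage_cost(tb=TB_PER_PATIENT, years=YEARS_RETAINED):
    """Total cost of keeping one patient's raw reads on disk."""
    return tb * STORAGE_COST_TB_YEAR * years

def cheaper_to_resequence(seq_cost):
    """True if re-running the sequencer costs less than keeping the data."""
    return seq_cost < storage_cost()

for seq_cost in (1000, 500, 100):  # hypothetical falling genome prices
    verdict = "re-sequence" if cheaper_to_resequence(seq_cost) else "store"
    print(f"${seq_cost} genome vs ${storage_cost():.0f} storage -> {verdict}")
```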
Although we have read similar statements before, it's still shocking to think about.

Friday 13 January 2012

NGS: a ramp up for Stem Cells?

NGS techniques are rapidly changing many aspects of the research world. Their potential applications are many, and probably not all of them are known yet. In the new Nature Biotechnology issue (January 2012), N.D. DeWitt et al. describe one of the most challenging and interesting kinds of support that NGS techniques can offer in biomedical research: an engagement in the stem cell field. "There is an urgent need", DeWitt says, "to ramp up the efforts to establish stem cells as a leading model system for understanding human biology and disease states." Several analyses of human iPS cells (hiPSCs) and human ES cells (hESCs) have detected structural and sequence variations under some culture conditions. It is not yet clear whether these variations are present in the original cell (for hiPSCs), or whether they are due to the derivation process or even to the culture conditions. What is clear is the need to understand what causes the variations and to determine what kind of changes in cellular behaviour they lead to.

Furthermore, it is important to note that hESCs and iPSCs offer a controlled cellular system for the mapping of molecular changes during development and differentiation. It is also well known that cellular and gene expression pathways implicated in cancer are increasingly found to be active in ESCs.
Next-generation sequencing technology will be fundamental for moving these critical studies forward, and will represent a powerful tool to deepen our knowledge of the genomic, epigenomic and transcriptomic changes that occur in hESCs and hiPSCs.

Thursday 12 January 2012

AmpliSeq Inherited Disease Gene Panel

Amid all this exciting news, I had not fully appreciated the fact that Life Technologies announced (early access in Q1, full release in Q2 2012) the AmpliSeq Inherited Disease Gene Panel, targeting genes implicated in over 140 inherited diseases (300 genes, 10,000 amplicons) at ~30X coverage using the 316 chip.

A $1 genome by 2017?

It sounds unbelievable, but with the current decline in sequencing costs the $100 genome could arrive by mid-2014. By 2016, the faster pace set by Life Tech could bring full-genome sequencing down to a cost of just $3. Do you believe it?
This and other speculations appear in "The Promise and Peril of the Mass-Market Genome", an article by Alex Planes (originally posted on The Motley Fool).
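For fun, the extrapolation arithmetic behind such forecasts fits in a few lines. In this sketch the starting price and the halving time are our own assumptions, not figures from the article:

```python
# Sketch of the extrapolation arithmetic behind such forecasts. The starting
# price and the rate of decline are assumptions, not figures from the article.
from datetime import date, timedelta

START_COST = 1000.0   # assume a $1,000 genome at the start of 2012
HALVING_MONTHS = 7    # assumed halving time of sequencing cost

def when_cost_reaches(target, cost=START_COST, start=date(2012, 1, 1)):
    """Project the approximate date when the per-genome cost falls below `target`."""
    months = 0
    while cost > target:
        cost /= 2
        months += HALVING_MONTHS
    return start + timedelta(days=30 * months)

for target in (100, 3, 1):
    print(f"${target} genome around {when_cost_reaches(target)}")
```

With a 7-month halving time the $100 genome lands in 2014 and the few-dollar genome around 2017, which is roughly the trajectory the article sketches; change the assumed halving time and the dates move accordingly.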

Flash Report: Cancer Genome and Exome Sequencing in 2011

As a follow-up to his previous post (Disease-causing Mutations Discovered by NGS in 2011) Dan Koboldt compiled for MassGenomics a nice survey of cancer genome and exome sequencing studies published last year.
According to the author, around 700 individual tumors representing 17 major cancer types were characterized in the studies listed.

Wednesday 11 January 2012

An exciting new year

Now that both Life Technologies and Illumina have released their new generation of sequencers, it seems that the fabled $1000 genome goal is suddenly within reach. After the announcements made by both NGS leaders in two press releases yesterday, the new sequencers have immediately been at the center of attention, with many blogs, newspapers and commentaries (you can find a lot of information by surfing some of the blogs and sites we follow) speculating about the actual possibilities and future perspectives of the astonishing new Ion Proton and HiSeq2500. So what can we say by now about these newborns?

Ion Proton
It relies on the same technology that powers the Ion Torrent PGM, so it promises to be just as fast and cheap (and, of course, to have the same quality issues with homopolymers). It seems that the advancements are essentially based on larger chips (165M wells for the Ion Proton Chip I and 660M for Chip II, vs 11M wells on the 316 and 318 chips) and better materials for the semiconductor sensors.
The chemistry will also be the same as the PGM's. So they are expected to start with 2x200 bp paired-end reads, even though the Ion Proton Chip I is announced for mid-2012 and we think that 400 bp reads will also be ready to deliver by then. In any case, don't forget that, at this time, paired-end reads on Ion Torrent are made of two overlapping reads (not contiguous ones, as on Illumina), so this greatly increases accuracy but not productivity.
Based on the current standards for the PGM (200 bp reads), we can speculate that the Proton Chip I (declared for exomes) will deliver at least 12-14 Gb of sequence, so one could obtain a minimum of 2 exomes (considering the largest available exome kit from Agilent, about 70 Mb) at about 90X, or 3 exomes at a decent 50X. Since the Proton Chip II (declared for genomes) contains 4 times more wells, we can suppose about 60 Gb of sequence (or will they follow the growth curve of the PGM chips, with a 10-fold increase for every chip?), enough to get a single whole human genome at about 20X. You can also see the predictions made in this post on the SEQanswers forum (which are a little more optimistic than ours).
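For what it's worth, the back-of-the-envelope arithmetic behind these numbers (and the HiSeq2500 figures later in this post) is simple. Here is a minimal Python sketch using the speculative throughput values quoted above; real runs lose some yield to duplicates and off-target reads, which is why our estimates in the text are more conservative than the raw division:

```python
# Back-of-the-envelope coverage arithmetic. Throughput numbers are the
# speculative ones quoted in the text, not official specifications, and
# real-world coverage comes out lower after duplicates and off-target reads.

def coverage(throughput_gb, target_mb, samples=1):
    """Average depth when a run's output is split across `samples` targets."""
    return throughput_gb * 1000 / (target_mb * samples)

EXOME_MB = 70      # largest Agilent exome kit, as above
GENOME_MB = 3200   # approximate human genome size

print(f"Proton Chip I, 14 Gb, 2 exomes:  {coverage(14, EXOME_MB, 2):.0f}X")
print(f"Proton Chip I, 14 Gb, 3 exomes:  {coverage(14, EXOME_MB, 3):.0f}X")
print(f"Proton Chip II, 60 Gb, 1 genome: {coverage(60, GENOME_MB):.0f}X")
print(f"HiSeq2500 fast run, 120 Gb, 1 genome: {coverage(120, GENOME_MB):.0f}X")
```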
Now the costs: the Ion Proton machine itself costs about $150k (Life's CEO also announced discounted prices for labs upgrading from the PGM), but you also have to consider $75k for the Ion Proton Server necessary to run the machine, and some more bucks for the Ion Proton OneTouch System (essential for library preparation). So we can think of an overall price of about $240k for a working platform. However, don't forget that this system will also include the new analysis software developed by Life Technologies, so there should be no need for additional computer equipment, at least at the beginning (obviously, we can say that for sure only once the tech specifications for the new server are available). The most discussed topic remains the actual cost of the chips and of the entire workflow. In its press release, Life Technologies stated a $1,000 overall price for both Chip I and Chip II, but most experts believe that this may be true for Chip I (exome) but unlikely for Chip II (genome), since rumors place the chip itself at a $1,000 cost. Will the $1,000 genome remain out of grasp? We will see as Life releases tech specifications and commercial prices.
Additional considerations: nothing is known by now about compatibility between the new and the old instruments. Even if this is unlikely for the sequencer itself, it could prove really useful for the side equipment like the OneTouch and the Server. The chemicals are thought to be the same between the PGM and the Proton, at least in this early phase. Another interesting question is: what future is there now for the PGM? Speaking at the JP Morgan Healthcare conference yesterday, Life CEO Greg Lucier declared that he expects the PGM to continue to run side by side with the new Proton, since they answer substantially different needs. They also assured that the development of the PGM technology will continue, even if I personally think that they will improve read lengths and chemicals but that a new chip is unlikely to be developed. So the actual scenario is: the PGM works fine for diagnostics (since it is fast, cheap and really good on "small" targets), cancer (and other relatively small pre-made gene panels), compact genome applications (bacteria) and RNA-seq, while the new Proton is suitable for research applications based on exome and whole-genome sequencing.

HiSeq2500 and new MiSeq
As for the Proton, this new machine is based on the same chemistry and technology as the latest version, and so promises to fulfill the same high quality standards. In this case, advancements have been made in cluster generation and detection, and the cluster generation procedure has been moved directly into the sequencing machine, so that the entire procedure is quicker and more automated (as it is on the MiSeq).
The chemistry will also be the same, so they are expected to start with 2x150 bp paired-end reads. But even in this case the 300 bp read protocol seems almost ready, and one can expect a rapid increase in productivity.
Illumina declared that the new instrument will come with a fast-run mode producing 120 Gb of data (1 human genome at a little more than 30X) in a 27-hour run. However, even if this time is supposed to include the on-machine cluster generation, you have to add some time for sample preparation. In standard run mode, the HiSeq2500 will produce 600 Gb of data per flowcell in a 10-day run (about 5 genomes at 30X). CoreGenomics tried to estimate the cost per single genome, concluding that it will be around $1,000, as for Life's new platform. What's more, Illumina has already discussed the possibility of 1 Tb runs.
Coming to costs, the HiSeq2500 will be sold at $720-740k, and it is not clear to me whether this price is for a ready-to-go package or whether additional equipment is necessary for template preparation or data analysis. However, Illumina has announced that, luckily, owners of the HiSeq2000 can upgrade to the new version for $50k, a pretty good deal.
Illumina has also made its move in the field of benchtop sequencers, announcing substantial improvements in protocols and productivity for the MiSeq platform. They stated a 3-fold increase in sequencing throughput, with a single run now producing 7 Gb of data (enough for at least 1 exome at 100X). They also get 2x250 bp paired-end reads and cycling that is 20 to 40% faster depending on the application (2.5 h for a microbial genome is declared in the press release).

Who is gonna win this battle?
Of course, we all have to wait for further details before declaring a winner. However, it is quite clear that the battle in the new NGS market has boosted innovation, with great advantages for the end users!
For now I think that, with the new HiSeq2500, Illumina remains the leader in the field of large sequencers. Its platform has proved to produce really good and reliable results, even if the high cost places the HiSeq out of reach for most small and medium-sized research centers. However, it seems the best investment for large sequencing facilities, thanks to its huge productivity (1 Tb in 10 days means 100 Gb/day/machine).
With its ultra-fast and quite cheap platforms, the PGM and the new Proton, Life Technologies has placed itself in a really good position in the "consumer" market (research centers and small labs interested in exome sequencing and WGS, as well as diagnostic centers interested in mutation detection). The Ion machines are also strong competitors against Illumina's MiSeq, even more so considering their semiconductor technology. This principle has shown some weaknesses in the past year, but it surely still has great potential in terms of flexibility and quick improvement.
Last, but not least, do not forget that some new companies, above all Oxford Nanopore Technologies, seem ready to come out with their third-generation platforms this year!

The NASDAQ response: Illumina vs LIFE


How did the financial market react to the earthquake generated by the two big announcements made by Illumina and Life Technologies?

Let's check on Google Finance what happened yesterday. Life Technologies released its Ion Proton news just before the Wall Street opening (and the market rewarded them with +4% from the start). Now try to guess from the plot (red line) when Illumina released the news of the new HiSeq2500...
The first battle, on the financial field, seems to have been won by Life (Life +9%, Illumina +4% at the close); who will win the war in the end?

Flash Report: another "Genome in a Day" DNA sequencer announced yesterday


It comes as no surprise that yesterday Illumina also put out a press release introducing the HiSeq 2500, an evolution of their HiSeq 2000 platform that will enable researchers and clinicians to sequence a "Genome in a Day".

Our 2 cents: "the Chip is (not) the machine"

In Italy we had a TV ad saying "to paint a big wall you need a big paintbrush". Likewise, to sequence a large genome you need a big chip, so large that it cannot fit in the "old" Ion Torrent Personal Genome Sequencer.
After the excitement about the announcement of the new Ion Proton NGS sequencers, a number of considerations have been expressed on various NGS forums and blogs.
An interesting one is on the "Pathogens: Genes and Genomes" blog: Ion Torrent Proton Announced: The Chip Is (Not) The Machine.
I just want to add a few personal thoughts:
- The fact that they do not mention it in the press release or on their web site implicitly means, I guess, that the Ion Proton cannot use the 414, 416 and 418 chips :-(
- It is not fair that Life Technologies does not state more clearly that to run the Ion Proton you need to buy an Ion Proton Server for $75,000.
- What will be the name of the third-generation Ion Torrent-like sequencers? Maybe "Ion H+".
- With the release of the Ion Proton, Life Technologies has clearly killed off the family of SOLiD Genetic Analysis Systems.

Tuesday 10 January 2012

How the little ones can do it well in NGS

NGS techniques developed rapidly last year, with data production capacity increasing at a quite unexpected rate. With all these data to analyze, a new series of questions arises about how to organize a lab to actually manage and store sequencing projects, and how to be effective in the analysis of the data themselves. Most of the comments that have appeared in the field point to the necessity of large computational facilities, proposing a frustrating scenario in which every small lab willing to work on NGS would need to build a massive bioinformatics research group (it sounded like: who needs biologists anymore? Let's turn to computer science!) or see its possibilities severely limited to simple and rigid workflows. Now this commentary in Nature Biotechnology shows how even small research groups can be organized to do their best in NGS analysis without having to completely turn themselves into a computational center. The paper, based on the authors' experience in their own center, suggests some interesting tricks on how to organize your activity and train your researchers so that they can be flexible and productive in the new field of high-throughput genomics. It is really worth a read!

Breaking News: "Ion Proton", the evolution of the Ion Torrent PGM


This is big news for the NGS market. Today Life Technologies announced the Ion Proton, an evolution of the Ion Torrent PGM designed to sequence an entire human genome in a day for $1,000.
You can find more details here.


Monday 9 January 2012

Flash Report: Survey of 2011 NGS market


Jeffrey M. Perkel discusses in a Biocompare editorial article the events that shaped the next-gen sequencing market in 2011.

Flash Report: 3000 human genomes delivered in 2011 by Complete Genomics

Complete Genomics today announced that the company delivered approximately 3,000 genomes to its customers in 2011 and is entering 2012 with contracts for approximately 5,800 genomes.

Flash Report: Life Technologies press release on Ion Torrent new products and protocols

Today Life Technologies Corporation announced several new products and protocols to improve workflows for Ion Torrent DNA sequencing applications. You can read the press release here.

Flash Report: six months of Ion Torrent

The EdgeBio NGS service provider describes in its blog six months of experience (300 runs!) with the Ion Torrent PGM. Something they say at the end of the post makes us think they are trying to squeeze an exome onto 318 chips.

Flash Report: paired ends on the Ion Torrent platform

Ion Torrent released a set of paired-end datasets over the Christmas holiday. The interesting news is discussed in this article on Omics! Omics!. It is quite encouraging to read that they are apparently able to reduce indel errors by about 5-fold on an E. coli DH10B dataset using this procedure.

Sunday 8 January 2012

Flash Report: a soft genome

Here we go with a new category of posts on this blog: Flash Reports. Whenever we want to share an interesting news article with our readers but we do not have time to write a post for the NGS blog, we will create a "Flash Report" with a brief description and the link to the original article.

We start with a Flash Report about the release of the first ‘gold-standard’ genome sequence for Gossypium raimondii, the New World cotton progenitor. Interestingly, 49 additional cotton species will follow.

BGI ... GPU ... NGS...???


First of all, let me state that I'm not a fan of video games. However, it seems that video games are somehow starting to heavily influence the world of genomics and NGS. Last fall, BGI scientists were able to shrink the time needed to completely analyze the sequence data of a human genome from 4 days to 6 hours. The result was achieved using servers built around graphics processing units, or GPUs, the sort of processors that were originally designed to draw images on personal computers and are heavily used by modern video games. However, don't get too excited just yet. The latest-generation gaming PC in your apartment is most likely not powerful enough to analyze your grandma's or grandpa's genome, at least judging from the hardware used by BGI (see the NVIDIA(R) Tesla(TM) GPU-based server farm pictured below).
You can read the entire story in this article recently published on Wired. NVIDIA, the GPU chip-maker collaborating with BGI, also has a press release on this achievement. SEQanswers has a number of posts about GPU based systems for NGS.




Don't feel guilty for playing games


A few days ago I described in an NGS post an on-line game (Foldit) that allows users to contribute to generating protein structure predictions. I was not aware of the existence of Phylo, a scientific video game in which "every puzzle completed contributes to mapping diseases within human DNA". This description, provided on the developers' Facebook page, is vague at best.
The explanation given by a high school biology teacher in the Ars Technica open forum is much clearer: "Phylo is a flash game where you drag around DNA sequences to try to achieve a better alignment than the computer's algorithm for comparing genes between several species".
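To make "a better alignment" concrete, here is a minimal Python sketch of the classic sum-of-pairs score used to compare multiple sequence alignments, column by column. The scoring values and the sequences are arbitrary, chosen only for illustration, and Phylo's actual scoring scheme may differ.

```python
# A toy version of the scoring behind "a better alignment": the classic
# sum-of-pairs score over alignment columns. Values are arbitrary.
from itertools import combinations

MATCH, MISMATCH, GAP = 1, -1, -2

def pair_score(a, b):
    if a == "-" or b == "-":
        return GAP if a != b else 0   # a pair of gaps: no penalty here
    return MATCH if a == b else MISMATCH

def sum_of_pairs(alignment):
    """Score an alignment (equal-length gapped strings) column by column."""
    return sum(pair_score(col[i], col[j])
               for col in zip(*alignment)
               for i, j in combinations(range(len(alignment)), 2))

human = "ACG-TT"
mouse = "ACGATT"
chick = "A-GAT-"
print(sum_of_pairs([human, mouse, chick]))  # the player's goal: beat this score
```

Moving the gaps around changes the score; a Phylo player is, in effect, searching for a gap placement that scores higher than the one the computer found.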
Phylo was developed by Dr. Jérôme Waldispuhl of the McGill School of Computer Science and collaborator Mathieu Blanchette. Last month the researchers released the results computed from the Phylo solutions collected over the last year. The game has over 17,000 registered users, and since it was launched in November 2010 the researchers have received more than 350,000 solutions to sequence alignment problems. "Phylo has contributed to improving our understanding of the regulation of 521 genes involved in a variety of diseases," says Jérôme Waldispuhl in McGill's news release. "There's a lot of excitement in the idea of playing a game and contributing to science at the same time," says Phylo co-creator Mathieu Blanchette in the school release. "It's guilt-free playing; now you can tell yourself it's not just wasted time."
More info can be obtained from the Phylo web site.