Pages

Thursday, 29 March 2012

There's more to pores than just Oxford Nanopore

Oxford Nanopore has recently presented at AGBT its new GridION and MinION, the first available DNA sequencers based on nanopore technology, and there is a lot of hype around getting hands on these new machines. Another manufacturer, Genia, has stepped up with a nanopore-based sequencing platform, promising to develop the $100 genome.

However, nanopores have such great potential for new applications (see this extensive review in Nature Biotechnology) that many others are entering this arena with novel methods suitable for DNA sequencing and other applications.

In one paper published in Nature Biotechnology and commented on in this post, the authors combined an MspA biological nanopore with a phi29 DNA polymerase to develop a new DNA sequencing strategy. The method also involves the use of a DNA oligomer to initially inhibit the polymerase activity. It is quite complicated to explain in words, but you can see how it works in the figure below, taken from the paper itself. The authors successfully sequenced short oligomers, as well as short random sequences, even if they had to admit that issues remain with homopolymers (basically due to the influence of adjacent bases on the reading of the single base moving through the pore) and with the blocking oligomer. However, Gundlach, one of the authors, said the system has "the potential of being a very good reader with very high confidence and quality."

(a) Crystal structure of M2-NNN MspA. Charged vestibule residues are indicated in blue (negative) or red (positive). (b) A schematic depicting a standard experiment. Roman numerals correspond to positions in the current trace in c. (c) The measured blockage current (Ib) as a fraction of the open pore current (Io) is shown for a sample event. (i) A single MspA pore (purple) in a lipid bilayer (gray). The template strand (black) contains the sequence to be read. A primer strand (blue) is hybridized to the template's 3′ end. A blocking oligomer (red) with a 3′ end of several abasic sites is adjacent to the primer. The phi29 DNAP (green) binds to the DNA to form a complex that is driven into MspA. A positive voltage is applied to the trans side. The single stranded 5′ end of the DNA-motor complex threads through MspA and the ionic current drops. (ii) The electric force on the captured strand draws the DNA through the phi29 DNAP, unzipping the blocking oligomer. Arrows show the direction of motion of the DNA template strand. The ionic current exhibits distinct steps while nucleotides pass through the pore. (iii) The blocking oligomer is removed and DNA reverses direction (marked by blue dashed line). (iv) The phi29 DNAP incorporates nucleotides into the primer strand, pulling the template toward the cis side. The current repeats previously observed levels in reverse time-order. Two abasic sites produce a high current peak (~0.6 Io) indicated by red Xs. This marker is first seen during unzipping and then again during synthesis. When synthesis is complete, the DNA and DNAP escape to the cis volume, marked by the return to Io.
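To make the read-out idea concrete: as the template ratchets through the pore, the measured Ib/Io settles on a discrete level that depends on the bases in the pore's constriction. Here is a toy Python sketch of that decoding step, with invented current levels and a naive one-base-per-level model (real MspA level tables are calibrated experimentally, and each level actually depends on several adjacent bases, which is exactly why homopolymers are hard):

```python
# Hypothetical level-to-base table: fraction of open-pore current (Ib/Io).
# These values are invented for illustration only.
LEVEL_TO_BASE = {0.20: "A", 0.30: "C", 0.40: "G", 0.50: "T"}

def call_bases(trace, tol=0.04):
    """Collapse consecutive current samples into discrete steps, then map
    each step's mean blockage current (Ib/Io) to the nearest known level."""
    steps, current = [], [trace[0]]
    for sample in trace[1:]:
        if abs(sample - current[-1]) < tol:   # same level: extend the step
            current.append(sample)
        else:                                 # level change: close the step
            steps.append(sum(current) / len(current))
            current = [sample]
    steps.append(sum(current) / len(current))
    bases = []
    for level in steps:
        nearest = min(LEVEL_TO_BASE, key=lambda k: abs(k - level))
        bases.append(LEVEL_TO_BASE[nearest])
    return "".join(bases)

print(call_bases([0.21, 0.19, 0.31, 0.29, 0.50, 0.51]))  # ACT
```

Note that two identical consecutive bases would produce no level change and collapse into a single step in this sketch, which mirrors the homopolymer problem the authors describe.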



A second paper, published in Nature Methods and also commented on by GenomeWeb, takes a more "technological" approach, based on a new CMOS sensor (see the figure below, taken from the paper) to increase the sensitivity of current-change detection, which is at the basis of every nanopore system. Using solid-state nanopore platforms (rather than biological nanopores), they developed a current preamplifier that improves the recorded signal, allowing faster measurements. One of the main drawbacks of current nanopore technology is that the DNA moving through the pore must be slowed down to make the current changes distinguishable, but "it would be nice to come closer to the natural rate at which DNA goes through the pore," said Jacob Rosenstein, first author of the paper. The speed at which weak currents through nanopores can be measured is not limited by the speed of the electronics, Rosenstein explained, but by the signal-to-noise ratio. "Whenever you try to measure something faster, you inevitably have more noise in the measurement."
The authors applied their new technology to measure the current trace of short DNA oligonucleotides of up to 50 base pairs, and while they were unable to determine individual bases from the current trace, they could record the current signal with a bandwidth of 1 megahertz, about ten times faster than commercially available amplifiers.
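Rosenstein's point about noise can be illustrated with a back-of-the-envelope calculation: for an idealized flat current-noise density, the integrated RMS noise grows as the square root of the bandwidth, so measuring ten times faster costs roughly sqrt(10) ≈ 3.2x in SNR. All numbers below are invented for illustration, not taken from the paper:

```python
import math

def rms_noise_pA(noise_density_fA_per_rtHz, bandwidth_Hz):
    """RMS current noise in pA for an (assumed) flat noise density
    integrated over a measurement bandwidth B: noise ~ density * sqrt(B)."""
    return noise_density_fA_per_rtHz * math.sqrt(bandwidth_Hz) / 1000.0

signal_pA = 300.0                      # hypothetical blockage-current step
for bw in (100e3, 1e6):                # 100 kHz vs 1 MHz bandwidth
    noise = rms_noise_pA(10.0, bw)     # assumed 10 fA/sqrt(Hz) density
    print(f"{bw/1e3:.0f} kHz: noise {noise:.1f} pA, SNR {signal_pA/noise:.1f}")
```

The numbers are made up, but the scaling is the real constraint: the electronics can always sample faster, yet each doubling of bandwidth admits sqrt(2) more noise against the same tiny signal.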

(a) Schematic of the measurement setup. (b) Cross-section schematic of the low-capacitance thin-membrane chip. (c) Optical micrograph of the 8-channel CMOS voltage-clamp current preamplifier. (d) Magnified image of one preamplifier channel. (e) Optical image of a solid-state silicon nitride membrane chip mounted in the fluid cell. (f) Transmission electron microscope image of a 4-nm-diameter silicon nitride nanopore.

Monday, 26 March 2012

Flash Report: AmpliSeq, a new amplicon designer tool from Life Technologies

Today Life Technologies has opened access to the Ion AmpliSeq™ Designer, a web-based primer design tool to create custom, ultrahigh-multiplex primer pools for Ion Torrent sequencing.
James Hadfield (who runs a genomics core facility in Cambridge, UK) was an AmpliSeq beta tester and, on his CoreGenomics blog, shares his apparently positive experience with the amplicon designer tool.
I just tried to test it myself but apparently the AmpliSeq site is already "under attack" by impatient Ion Torrent users.

Tuesday, 20 March 2012

iPOP: not a new gadget from Apple but a first example of comprehensive omic approach

Since the completion of the human genome sequence, everybody has been waiting for the great promises of Personalized Medicine to become reality, but the gap between genomic studies and real progress in personalized health care has proved larger than expected.
However, things are starting to change: this new masterpiece published in Cell demonstrates that an integrated omics approach is both feasible and useful for monitoring and prevention of diseases (in this case type 2 diabetes).
Image from Chen R. et al., Cell 148 (2012)

In this paper the authors report the results of a huge study based on a survey of several omics profiles, which they called iPOP (integrative Personal Omics Profile), including proteomic, metabolomic, transcriptomic and genomic data, in a healthy individual over 14 months. During this period they collected several peripheral blood samples and evaluated the changes in the various omics profiles in response to infections and other events.
This seems to me like the first flight of the Wright brothers in 1903: it was known that flying was theoretically possible, but they were the first to show that it could be a reality, even if only for a few meters. From that point on there was rapid technological progress, and now we have intercontinental and supersonic airplanes.
Now that the omics approach to personalized medicine has been shown to be feasible and also extremely informative, we can expect a rapid progression from a single proof of concept to routine screenings.
Quoting the article's discussion: "We focused on a generally healthy subject who exhibited no apparent disease symptoms. This is a critical aspect of personalized medicine, which is to perform iPOP and evaluate the importance and changes of all the profiles in ordinary individuals. These results have important implications and suggest new paradigm shifts: first, genome sequencing can be used to direct the monitoring of specific diseases (in this study, aplastic anemia and diabetes) and second, by following large numbers of molecules a more comprehensive view of disease states can be analyzed to follow physiological states. [...] Although this cannot be proven with the analyses from a single individual, this study nonetheless serves as proof-of-principle that iPOP can be performed and provide valuable information. [...] Finally, we believe that the wealth of data generated from this study will serve as a valuable resource to the community in the developing field of personalized medicine. A large database with the complete time-dynamic profiles for more individuals that acquire infections and other types of diseases will be extremely valuable in the early diagnostics, monitoring and treatment of diseased states."

Friday, 16 March 2012

Tumor evolution, one cell at a time

When applying NGS to study genome modifications occurring in specialized cell populations extracted from a tissue, one can run into confusing results due to contamination from the common cells present in the sample. Thus the "evolution" process resulting in the specialized function can be hard to evaluate. One interesting approach to circumvent this limitation is single-cell sequencing, which allows one to obtain detailed genomic information from a single cell extracted from any biological sample.
This has recently been applied to tumors to elucidate the exact mutational process that transforms a normal tissue cell into a cancerous one. Knowing the exact timeline of the mutations that finally result in the malignant phenotype can be crucial to understanding cancer biology and identifying early genetic markers of transformation. Moreover, this kind of study can identify mechanisms of cancer evolution that make cancer cells able to respond to treatment by developing various kinds of resistance.

After a first paper applying this approach appeared in Nature in April 2011, two more studies have now been published in Cell, both supported by the BGI institute and both based on single-cell exome sequencing: the first, by Xu et al., presents results from a clear cell renal cell carcinoma; the second, by Hou et al., studies the evolution of a JAK2-negative myeloproliferative neoplasm.

Graphical abstract from Hou et al.
Graphical abstract from Xu et al.



Tuesday, 13 March 2012

First steps of exome sequencing as diagnostic tool

The advent of exome sequencing over the past two years has caused an earthquake in the traditional approach to discovering new mutations involved in Mendelian diseases, and it is increasingly considered a powerful diagnostic tool for unresolved disorders. The question of whether exome sequencing will soon be an effective clinical diagnostic tool, and whether it can be successfully fitted into the standard procedures of clinical routine, is a relevant topic well addressed in a review recently published in Annals of Neurology. Exome sequencing could represent a dramatic improvement in the diagnosis of many Mendelian disorders (such as retinitis pigmentosa, Charcot-Marie-Tooth, etc.) characterized by locus heterogeneity. Sanger sequencing of all the exons of the candidate genes is a time-consuming procedure, which is a problem when a rapid molecular diagnosis is crucial for more specific and effective care. The new NGS platforms seem to satisfy the need for speed and accuracy, and the recent commercialization of kits for sequencing panels of genes involved in the main Mendelian diseases is further proof.
The possibility of sequencing just a set of genes rather than the entire exome is another crucial point, one that introduces some ethical concerns, especially in countries with medical care based on private health insurance. A recent post on GenomeWeb by Matthew Dublin discusses UCLA's decision to offer exome sequencing in a CLIA-certified (Clinical Laboratory Improvement Amendments) laboratory. It offers the service at $4,500 for an individual, $6,550 for a trio of exomes, and $2,500 for any additional exome. "There's a very large number of people with a clear Mendelian disease that do not have a molecular diagnosis," said Stan Nelson, a professor of human genetics at UCLA; "it's clearly more efficient to sequence the exome first, as the first genetic test" rather than subject a patient to a whole host of different tests, he noted. And on how to choose patients for exome sequencing: "Anyone with a very rare serious phenotype that's likely to be a single genetic event" would be eligible. Through exome sequencing and bioinformatic analysis of the data, they generate a list of candidate variants, which are highlighted for further evaluation by a genomic data board. This step consists of a discussion among expert physicians, genetic counselors, bioinformaticians, medical geneticists, and the patient's primary physician. Only results that are relevant to the patient's disease are returned. The test is "explicitly to diagnose," Nelson said. Other institutes, such as the Medical College of Wisconsin and Children's Hospital, have decided to be more stringent in the number of cases analyzed with exome sequencing. They turn to this technique only after all other options are exhausted, but they return any relevant results that the patient's parents want, including variants that confer risk for adult-onset diseases.
A crucial point is: how will insurance companies treat exome sequencing? UCLA has no specific, formal agreements about reimbursement of the cost of exome sequencing, but Nelson thinks the test will eventually be reimbursed, because it will ultimately save insurers money. But a new doubt arises: once they reimburse it, will insurance companies demand to know all the variants of their customers?

Monday, 12 March 2012

A Huge(Seq) tool that can make your life easier

I want to report on a tool called HugeSeq, an automated pipeline to identify and annotate genetic variants. The authors developed a 3-step process:
  1. Mapping, that is, aligning the reads and generating the BAM files
  2. Sorting, that is, sorting the BAM files by chromosome and cleaning them up (removal of PCR duplicates, base recalibration and sequence realignment)
  3. Reduction, that is, detecting the variants (SNPs, indels and SVs) and finally annotating them.
The entire process is highly optimized and built to run in parallel, reducing the analysis time. The various steps rely on robust and widely used software such as BWA, Picard, SAMtools, VCFtools, GATK and ANNOVAR. As stressful as this can be, these tools often use non-standard input files and produce non-standard output as well... but luckily, now you have HugeSeq to make your life easier!
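To give an idea of what the three steps boil down to in practice, here is a minimal Python sketch that just assembles illustrative command lines for the tools named above. These are assumptions about typical usage, not HugeSeq's actual wrappers, and the file names are invented:

```python
def build_pipeline(sample, ref):
    """Return illustrative shell commands for one sample, in run order,
    following the Mapping / Sorting / Reduction scheme described above."""
    return [
        # 1. Mapping: align reads and emit a BAM
        f"bwa mem {ref} {sample}_R1.fq {sample}_R2.fq | samtools view -b -o {sample}.bam -",
        # 2. Sorting: coordinate-sort, then mark PCR duplicates
        f"samtools sort -o {sample}.sorted.bam {sample}.bam",
        f"picard MarkDuplicates I={sample}.sorted.bam O={sample}.dedup.bam M={sample}.dup_metrics.txt",
        # 3. Reduction: call SNVs/indels for downstream annotation
        f"gatk HaplotypeCaller -R {ref} -I {sample}.dedup.bam -O {sample}.vcf.gz",
    ]

for cmd in build_pipeline("NA12878", "hg19.fa"):
    print(cmd)   # or run with subprocess.run(cmd, shell=True, check=True)
```

Each independent step here is exactly what HugeSeq parallelizes (e.g. per chromosome), which is where the time savings come from.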

For details see the related article in Nature Biotechnology.

Thursday, 8 March 2012

Who wants to live forever?

Finding a genetic signature of extreme longevity and determining the exact impact of genetic background on human lifespan have been among the most discussed topics of recent times. Everybody is asking: is living past 100 a matter of lifestyle, or is it written in your DNA?
We remember the excitement generated by the article published in Science in 2010 describing a set of SNPs significantly associated with extreme longevity, and the disappointment when these findings turned out to be inconsistent (the paper was retracted in 2011). However, the authors reanalyzed the data, corrected the issues and were able to produce robust conclusions and a new publication, this time in PLOS ONE, in early 2012.
Together with other clues already discussed in the literature, such as the strong familial clustering of extreme longevity, this is strong evidence of the role of genetic background in influencing human life expectancy.
Taking advantage of NGS technologies, several projects have been launched aiming to sequence a sizeable group of people older than 100, to get clues about which variants and/or genetic mechanisms are at work to give these subjects a long and surprisingly healthy life. Examples are the Medco 100 over 100 Prize and the New England Centenarian Study.
Sebastiani (the first author of the paper cited above) has also recently published the results of the whole-genome sequencing of two individuals, one male and one female, both over 114 years old. The paper reports interesting findings on the distribution of genetic variants and tests the four main models proposed for the genetics of extreme longevity: presence of alterations in metabolic pathways, lack of disease-associated variants, presence of rare variants, and enrichment in longevity-associated variants. Even if two individuals are clearly not sufficient for drawing final conclusions, the data support a scenario in which disease-associated variants are not depleted, but are likely counterbalanced by an enrichment of longevity-associated variants (the two subjects tested turned out to be enriched in variants near those previously identified as longevity-associated). Moreover, a detailed annotation of the identified variants showed that modifications of splicing events may be an important factor, calling for future RNA-Seq studies on ultracentenarians.
Overall, this is the first reported WGS of subjects over 100 years old and an interesting pilot for future, larger studies.

Wednesday, 7 March 2012

If Italy cries, Japan doesn't laugh...


Although Japan is among the five leading countries in biomedical research, its DNA sequencing capacity is surprisingly low (there are more next-generation sequencing machines in Australia or in Spain). Art Wuster, a postdoctoral fellow at the Sanger Institute in Cambridge, provides in his Seqonomics blog an interesting overview of the state of DNA sequencing in the land of the Rising Sun.
Interestingly, another post on the same blog (Who are the sequencing superpowers?) analyzes the sequencing capacity of several countries relative to their R&D expenditure (see the graph above). Here too, the low number of sequencers in Japan (34) stands out against the vast research expenditure ($144bn) of that nation.
By the way, Italy escapes the bottom of the list only because our research and development expenditure is much lower (about 1.2% of GDP vs. more than 3% for Japan).

P.S.: The original saying goes "If Athens cries, Sparta doesn't laugh" and refers not to the current Greek economic crisis but to the Peloponnesian War (431-404 BC).

Monday, 5 March 2012

Far beyond variant discovery: NGS applied to the study of spatial organization of the mouse genome

When it comes to NGS, we usually think about genetic variant discovery in individuals with medical conditions.
However, the potential of the new sequencing technologies goes much further, as demonstrated by this masterpiece published in Cell. In this paper, Zhang et al. used high-throughput genome-wide translocation sequencing (HTGTS) together with the genome-wide chromosome conformation capture (Hi-C) technique to obtain a detailed map of physical chromosome interactions in the mouse genome and to evaluate the impact of the spatial distribution of chromosomes on translocation events.
Reading the Methods section, we find where NGS provides its support. First, the authors used Roche 454 for the HTGTS application. Then they applied NGS to Hi-C, with a really interesting method. Briefly, after cross-linking, they digested the mouse genomic DNA with the HindIII restriction enzyme, labeled the resulting fragments with a biotinylated dCTP and then ligated adjacent fragments. These steps produced short sequences composed of two pieces of DNA that had been in spatial proximity, with the restriction sites and the labeled dCTP in the middle. They then reversed the cross-linking and proceeded with the standard protocol for NGS library preparation: fragmentation by sonication and end-repair. They added a bead-capture step to select only the fragments containing the biotinylated dCTP, and then finalized the library by size selection and adapter ligation. Using Illumina paired-end sequencing, they were able to massively sequence both ends of these fragments and map them against all the HindIII restriction sites in the mouse genome, obtaining a precise map of all the proximity regions occurring between different chromosomes!
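The final analytic step described above — turning mapped read pairs into a proximity map — can be sketched in a few lines of Python. This is a toy illustration with an invented input format (a fragment assignment per read end), not the authors' actual analysis code:

```python
from collections import Counter

def contact_map(read_pairs):
    """read_pairs: iterable of ((chrom1, frag1), (chrom2, frag2)) tuples,
    one per paired-end read whose two ends mapped to HindIII fragments.
    Returns a Counter of contacts keyed by unordered chromosome pair."""
    counts = Counter()
    for (c1, _f1), (c2, _f2) in read_pairs:
        counts[tuple(sorted((c1, c2)))] += 1
    return counts

pairs = [(("chr1", 10), ("chr1", 12)),   # intra-chromosomal contact
         (("chr1", 10), ("chr7", 3)),    # inter-chromosomal contact
         (("chr7", 3), ("chr1", 44))]
print(contact_map(pairs))   # chr1-chr1: 1 contact, chr1-chr7: 2 contacts
```

Aggregating such counts over millions of read pairs (and normalizing for fragment density) is what yields the genome-wide proximity map the authors then compare against translocation frequencies.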
The results are really interesting! The authors demonstrated that spatial distribution, together with the probability of double-strand break (DSB) events, is a major factor in determining preferential translocations. Besides providing an accurate map of the spatial distribution of chromosomes in mammals, as the authors state in the discussion, "this finding has great relevance to translocations in cancer". The paper shows that "formation of translocations between randomly generated DSBs, such as those induced by chemotherapies and radiotherapies, will likely reflect a strong influence of spatial proximity [...]. [The results] also suggest that spatial proximity may be a major driving force for the activation of certain oncogenes via translocation to a wide range of recurrent partners".

The horse has arrived on the NGS ark

With this article published in BMC Genomics, the horse now has its spot in the ever-growing group of animals whose genomes have been resequenced using NGS technologies.
The authors produced about 60 Gb of DNA sequence from the genome of a Quarter Horse mare (mean coverage of 25X) and then compared the newly assembled genome to the horse reference. In the paper they report 19.1 Mb of newly assembled genomic sequence and identify 3.1 million SNPs, 193 thousand INDELs, and 282 CNVs.
Besides resulting in a better horse genome reference, this paper, as the authors state, "increased the catalog of genetic variants for use in equine genomics by the addition of novel SNPs, INDELs, and CNVs. The genetic variants described here will be a useful resource for future studies of genetic variation regulating performance traits and diseases in equids."

Thursday, 1 March 2012

Personalized medicine in pediatric cancer: a first clinical trial

In November 2011 the Neuroblastoma and Medulloblastoma Translational Research Consortium (NMTRC), supported by the Translational Genomics Research Institute (TGen), started an interesting clinical trial that will apply next-generation sequencing to provide rapid, personalized treatment to pediatric patients affected by refractory neuroblastoma, a deadly form of childhood cancer.
The trial is testing the hypothesis that molecular aberrations in the tumors of individual patients can be identified in real time through genomic analysis to predict responsiveness to targeted therapies. This would allow researchers to predict which of the 150-200 available chemotherapy drugs will be most effective. This kind of genomic analysis produces about 30 Tb of data for a single patient, a huge amount that has to be rapidly shared among the Consortium's researchers and clinicians to deliver effective results. Since time is a crucial factor in fighting this type of pathology, an adequate hardware infrastructure is required for the project to become reality.

Luckily Dell, as part of its "Powering the Possible" charitable program, has donated a huge platform that has increased the computational power of the TGen center by about 12 times (see the reports by TGen and Dell itself). According to HPC (a site dedicated to cloud computing), this platform comprises Dell PowerEdge blade servers, PowerVault storage arrays, Dell Compellent Storage Center arrays and Dell Force10 network infrastructure. It features 148 CPUs, 1,192 cores, 7.1 TB of RAM, 265 TB of disk (data storage) and everything necessary for cloud-based applications.
According to Dell representative Jamie Coffin, with TGen's translational technology and the Dell cloud platform, work that used to take a year can now be accomplished in two weeks, and early results reflect success rates of 24-30%.

Flash Report: 99 exomes sequenced at no charge

Scientists at Washington University School of Medicine in St. Louis are reaching out to patient advocacy groups and offering to decode the DNA of 99 patients with rare diseases to help find the genetic alterations responsible for their illnesses.
The initiative is known as the Rare99X Clinical Exome Challenge. The patients’ DNA will be sequenced at the university’s Genomics and Pathology Services (GPS) at no cost to patients or the advocacy groups. GPS will begin accepting proposals for exome sequencing from patient advocacy groups on Feb. 29, which has been designated as Rare Disease Day.
More info at the Rare Genomics page