NGS: News on Genomic Studies: February 2012

Wednesday, 29 February 2012

The nuclear genome of Otzi the iceman is finally published

A team of international scientists (from 19 institutions!) has published the complete genome sequence of Ötzi, the Tyrolean Iceman, in Nature Communications.

Ötzi’s 5,300-year-old body was discovered in 1991 in the Alps, on the border between Italy and Austria.

"The Iceman probably had brown eyes, belonged to blood group O and was lactose intolerant. His genetic predisposition shows an increased risk for coronary heart disease and may have contributed to the development of previously reported vascular calcifications."

The researchers also found traces of DNA from the bacterium Borrelia burgdorferi, which means Ötzi may well be the earliest recorded human case of Lyme disease.

Interestingly, when the researchers compared Ötzi's genome with those of modern-day populations, they found that he was most closely related not to people from Northern Italy (where he was discovered) but to present-day inhabitants of the Tyrrhenian islands, specifically men from Sardinia and Corsica.

The image above has been reproduced from the Nature commentary "Iceman's DNA reveals health risks and relations".

Tuesday, 28 February 2012

Flash Report: exome sequencing using an Ion Torrent PGM

Dr. Marilyn Li, director of the Cancer Genetics Laboratory (CGL) at Baylor College of Medicine, presented data at AGBT 2012 on whole exome sequencing with the Ion Torrent PGM. The study was performed on one HapMap sample (NA12763) and on the DNA of a patient with Charcot-Marie-Tooth (CMT) neuropathy.

Ten runs on Ion Torrent 318 chips were merged (about 7 Gb in total) to achieve a high average coverage (43- to 57-fold, see the slide above). The experiment was not particularly cheap, but the results appear encouraging: variant calls for the HapMap sample showed 99.1% concordance with its SNP array data. Variants for the CMT sample were compared with the Illumina exome data, showing 92% SNP concordance, and the 2 causative mutations in the SH3TC2 gene were identified.
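Concordance figures like these are essentially the share of sites called in both data sets where the genotypes agree. Below is a minimal Python sketch of that calculation, not the pipeline used in the talk; it assumes each call set is a dictionary mapping a (chromosome, position) site to an unphased genotype string such as "0/1", and the example calls are invented.

```python
from typing import Dict, Tuple

Site = Tuple[str, int]  # (chromosome, position)

def normalize(gt: str) -> str:
    """Treat genotypes as unordered allele pairs, so "1/0" and "0|1" both become "0/1"."""
    return "/".join(sorted(gt.replace("|", "/").split("/")))

def genotype_concordance(calls_a: Dict[Site, str], calls_b: Dict[Site, str]) -> Tuple[float, int]:
    """Return (fraction of shared sites with matching genotypes, number of shared sites)."""
    shared = calls_a.keys() & calls_b.keys()  # only sites called in both sets
    if not shared:
        return 0.0, 0
    matches = sum(normalize(calls_a[s]) == normalize(calls_b[s]) for s in shared)
    return matches / len(shared), len(shared)

# Invented example calls: sequencing vs. SNP array
seq_calls = {("chr1", 1000): "0/1", ("chr1", 2000): "1/1", ("chr2", 500): "1/0"}
array_calls = {("chr1", 1000): "0/1", ("chr1", 2000): "0/1", ("chr2", 500): "0/1",
               ("chrX", 700): "1/1"}
print(genotype_concordance(seq_calls, array_calls))  # (0.6666666666666666, 3)
```

Sites called by only one platform (chrX here) are simply ignored; how such sites and no-calls are treated can change the result, so it is worth checking what a reported concordance actually counts.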

The AGBT presentation can be seen in this YouTube video (the exome part starts at around minute 15:00).

Friday, 24 February 2012

The New York Genome Center plans to sequence 1,000 genomes from Alzheimer's patients over the next four years

The AGBT meeting ended a few days ago. Many exciting announcements were made, most of them related to technological aspects of NGS, but AGBT was also the stage for important biomedical projects. The New York Genome Center (NYGC), for instance, chose it to reveal its future plans. As stated by its director Nancy Kelley, the brand-new center plans to start its research activity with a challenging large-scale sequencing project: in collaboration with the Feinstein Institute for Medical Research and Illumina, NYGC is starting to sequence 130 genomes from Alzheimer's patients, a number that will reach 1,000 over the next 4 years. The study will also include the sequencing of a healthy elderly control group, and all the data will be made freely available. The goal of the project is to understand the genetic basis of susceptibility to Alzheimer’s disease, about which very little is currently known. “The Feinstein Institute’s commitment to sharing the data resulting from these efforts with the greater research community could significantly accelerate the speed of translational research in Alzheimer’s disease, with a profound impact on patient care and clinical outcomes, which is in line with the vision of NYGC,” director Nancy Kelley said. Here is the link to the NYGC press release.

Wednesday, 22 February 2012

Italian Nobel laureate Renato Dulbecco dies at 97

Renato Dulbecco, who shared the 1975 Nobel Prize in medicine for "discoveries concerning the interaction between tumour viruses and the genetic material of the cell", has died in California.

In 1986, Dulbecco wrote a seminal editorial in the journal Science (A turning point in cancer research: sequencing the human genome) in which he called for sequencing the human genome to understand tumor virology and cancer in general, an effort that eventually became the Human Genome Project.

Dulbecco worked on the Italian Genome Project from 1992 until 1997, but funding dried up and he returned full-time to the Salk Institute.

I'm sure Prof. Dulbecco was happy to see, before he died, that the $1000 genome was just around the corner.

Sunday, 19 February 2012

The 3rd generation is here: first details on the new Oxford Nanopore sequencing platforms

There has been a lot of hype around the new technology developed by Oxford Nanopore Technologies (ONT). Nanopore sequencing had been the center of attention in recent months, as it was becoming clear that this new platform really had the power to start a revolution and take NGS to the 3rd generation. Quick, affordable and robust single-molecule sequencing with extremely long reads was the promise, and Oxford Nanopore did not disappoint!
They made their move on the stage of the AGBT conference, with a talk and an official press release announcing the main specs of their new platforms (GridION and the small MinION). Now everybody is talking about nanopore sequencing, its potential and the issues related to this amazing new technology. Bloggers have published a number of interesting posts: Keith Robison had the opportunity to talk directly with the ONT team and wrote a detailed commentary; the CoreGenomics blog has a good summary of the main innovations introduced by ONT; Bio-IT World published an interesting interview with Oxford Nanopore CTO Clive Brown; and you can find a lot more on the other blogs we follow.
These are the main characteristics of the new platforms that make everybody say "WOW!":

- long reads: at least 10kb, up to 40kb, with a 4% error rate;

- high accuracy: ONT stated that at the time of release they will reach 99% accuracy in base calling;

- fast sequencing: a GridION machine will be able to read sequences at roughly 600 million base pairs per hour, about 14 Gbp a day, or a high-coverage (30x) human genome in just under 6 days;

- low costs: even if exact costs are not available yet, a single human genome will cost around $2000, with the promise of quickly dropping to $1000. It seems that buyers will be able to choose between different configurations of the machine: one will cost more up front but will have low running expenses, while the other will have a small purchase price (even close to zero), with the spending shifted to consumables whenever you sequence;

- quick sample preparation: the new single-molecule technology will require very limited sample preparation. Simply extract the DNA, perhaps perform a partial fragmentation, and you are ready to load and run the machine. This will eliminate the long and expensive library preparation steps required by all other platforms.
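The throughput figures can be sanity-checked with a little arithmetic. The Python sketch below reads the rate as 600 million bp per hour, which is what the ~14 Gbp/day figure implies (600 Mbp × 24 h = 14.4 Gbp); the ~3.0 Gb human genome size is our own assumption, not a number from ONT.

```python
# Back-of-the-envelope check of the GridION throughput figures.
# Assumption: haploid human genome of ~3.0 Gb (not stated by ONT).

def days_for_coverage(rate_bp_per_hour: float, genome_bp: float, coverage: float) -> float:
    """Days needed to produce coverage x genome_bp bases at a constant hourly rate."""
    bp_per_day = rate_bp_per_hour * 24
    return genome_bp * coverage / bp_per_day

rate = 600e6  # 600 million bp per hour
print(f"{rate * 24 / 1e9:.1f} Gbp/day")                          # 14.4 Gbp/day
print(f"{days_for_coverage(rate, 3.0e9, 30):.2f} days for 30x")  # 6.25 days for 30x
```

With a 3.0 Gb genome, 30x takes about 6.25 days, close to the six days quoted; the exact result depends on the genome size you assume.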

Beyond the technical specs, the new nanopore technology has a number of other remarkable features, among them the possibility of reading nucleotide modifications, which opens the way to a variety of new experimental approaches.

The MinION machine is truly surprising, embodying the concept of personal genomics. It is a device with a 1Gb sequencing capacity packaged as a USB pen drive and, as Oxford Nanopore showed, just as simple to use. You can buy one, load your sample, plug the MinION into a USB port and have the sequence on your laptop!

Its big brother, the GridION, is also amazing. A single instrument, a node, operates with a single-use cartridge that contains the reagents needed to perform an experiment. Multiple nodes can be aggregated into larger co-operating units or clusters that communicate with each other (a 20-node system could deliver an entire human genome in 15 minutes), and a user can run one or more nodes for minutes or days, depending on how much data is needed to complete the experiment.

Are you ready for the 3rd generation in genomic sequencing?

Saturday, 18 February 2012

Incredible but true: a disposable USB next-generation sequencer from Oxford Nanopore for less than $1000

Fireworks at AGBT yesterday: Oxford Nanopore Technologies presented two new products, the high-throughput GridION™ and the MinION™, a sequencer the size of a USB memory stick. Both will be commercially available by the end of 2012.

The MinION™ (see the picture above of the device connected to an Apple laptop :-) will cost less than $900 per unit and generate about 150 Mbp of sequence per hour for about six hours. It will be connected via USB to a computer that will analyze the data (bases are streamed in real time in FASTQ format). Then you pull the MinION out and throw it away. A device the size of a fat USB pen has (more or less) the productivity of an Ion Torrent with a 318 chip. Moreover, DNA sample preparation is almost non-existent (“You add blood, and some buffer and some enzymes”) and the read length of the nanopore technology is (... have a seat, please) in the order of 20-50kb (yes, kilobases)!!! On the downside, the accuracy is less than stellar: right now it is around 96%, the errors are deletions, and the error profile will improve through software.
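Since the MinION is expected to stream bases as FASTQ, a minimal reader that consumes standard four-line FASTQ records from a text stream and keeps a running yield could look like the Python sketch below. We don't know how ONT's software will actually deliver the stream; an in-memory string stands in for it here, and multi-line FASTQ is not handled. For scale, 150 Mbp per hour for six hours is about 900 Mbp per device.

```python
import io
from typing import Iterator, TextIO, Tuple

def read_fastq(stream: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from four-line FASTQ records."""
    while True:
        header = stream.readline()
        if not header:
            return  # end of stream
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = stream.readline().rstrip("\n")
        plus = stream.readline()
        qual = stream.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(seq) != len(qual):
            raise ValueError("sequence/quality length mismatch")
        yield header[1:].strip(), seq, qual

def running_yield(stream: TextIO) -> int:
    """Total number of bases seen so far in the stream."""
    total = 0
    for _, seq, _ in read_fastq(stream):
        total += len(seq)
    return total

demo = io.StringIO("@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nGGCA\n+\n!!!!\n")
print(running_yield(demo))  # 12
```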

Oxford Nanopore's GridION™ platform consists of a scalable network device (a node) designed for use with a consumable cartridge. Each cartridge is initially designed for real-time sequencing by 2,000 individual nanopores at any one time (the number of nanopores will increase to 8,000 in early 2013). Nodes may be clustered in a similar way to computing devices, allowing users to increase the number of nanopore experiments being conducted at any one time if a faster time-to-result is required. A 20-node installation using an 8,000 nanopore configuration would be expected to deliver a complete human genome in 15 minutes. Sigh.

The future is now... no, actually, the future was yesterday at the AGBT.

On the "Pathogens: Genes and Genomes" blog you can read an exclusive interview with Oxford Nanopore’s Dan Turner (Director of Applications), Clive Brown (Chief Technical Officer) and Zoe McDougall (Director of Comms).

Friday, 17 February 2012

Flash Report: two new next-generation sequencers presented at AGBT

GnuBIO

- Read Length: 1000bp

- Accuracy: Phred 70+

- Homopolymers: Phred 70+ up to 9bp

- Yield/run: 1.2 Gb

- Real-time variant calling, each base queried 6 times.

- Sample Prep: < 1min (!!!)

- System is rackable, ~30 lbs

- Dry instrument, fluidics in cartridge.

- Beta Mid-2012, commercial end of 2012?

Sources:

http://seqanswers.com/forums/showthread.php?t=17764

http://t.co/JPKbR7L2

http://t.co/5KSzCcVS

LaserGen

- Raw base accuracy 99.8%

- 100bp reads

- ~$99K for instrument?

- $1K per run?

Source:

http://seqanswers.com/forums/showthread.php?t=17764
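The two accuracy claims use different units: GnuBIO quotes a Phred score, LaserGen a raw percentage. The standard Phred definition, Q = -10·log10(P_error), puts them on the same scale; the Python sketch below only does the unit conversion and says nothing about whether the claims hold up.

```python
import math

def phred_to_error(q: float) -> float:
    """Phred score Q -> probability that a base call is wrong: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)

def accuracy_to_phred(accuracy: float) -> float:
    """Per-base accuracy (e.g. 0.998) -> Phred score Q = -10 * log10(1 - accuracy)."""
    return -10 * math.log10(1 - accuracy)

print(f"{phred_to_error(70):.0e}")      # 1e-07: about one error in 10 million bases
print(f"{accuracy_to_phred(0.998):.1f}")  # 27.0: LaserGen's 99.8% raw accuracy
```

In other words, Phred 70 claims an error rate roughly four orders of magnitude lower than 99.8% raw accuracy (about Q27).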

Google Refine: a powerful tool also for genomic data?

Managing and working with genomic data, such as taxonomic databases or next-generation sequencing data, is challenging and tortuous. One of the worst problems is combining different data sets in an easy and fast way and, why not, in a comprehensible and intuitive fashion. As experience teaches us, some software that seems to be the right choice turns out not to be the best one. For example, as one of us wrote on this blog some days ago, Excel shows difficulties and embarrassing limits when formatting text containing certain gene names, potentially leading to mistakes in bioinformatic analysis. Fortunately, a number of new tools have recently been developed to help us with bioinformatic data analysis; one of them is VarSifter (we had a post about it a few weeks ago), a very useful tool for managing next-generation sequencing data and analyzing genomic variants.

Here we want to talk about a new tool called Google Refine, developed by Google "for working with messy data, cleaning it up, transforming it from one format into another, extending it with web services, and linking it to databases like Freebase".

Google Refine is not specific to bioinformatic data, but it is very flexible and supports different file formats such as TSV, Excel, CSV and XML. It offers a huge number of tools that let you manage your data in almost infinite ways. It would be impossible to explain all the potential applications of Google Refine here. We suggest trying this tool yourself, and we are sure that you will find it extremely powerful and user-friendly at the same time!
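One of the cleanup features Google Refine offers is clustering of values that are probably the same thing written differently, such as species names typed inconsistently in a taxonomic table. Its key-collision clustering is based on "fingerprint" keys; the Python sketch below is our own simplified approximation of that idea (normalize case, accents and punctuation, then sort the words), not Refine's actual code, and the species list is invented.

```python
import re
import unicodedata
from collections import defaultdict
from typing import Iterable, List

def fingerprint(value: str) -> str:
    """Approximate fingerprint key: lowercase, drop accents and punctuation,
    then sort and de-duplicate the whitespace-separated tokens."""
    value = unicodedata.normalize("NFKD", value.strip().lower())
    value = "".join(ch for ch in value if not unicodedata.combining(ch))
    value = re.sub(r"[^\w\s]", "", value)
    return " ".join(sorted(set(value.split())))

def cluster(values: Iterable[str]) -> List[List[str]]:
    """Group values whose fingerprints collide; return only groups with 2+ members."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return [g for g in groups.values() if len(g) > 1]

species = ["Homo sapiens", "homo  sapiens", "Sapiens, Homo", "Mus musculus"]
print(cluster(species))  # [['Homo sapiens', 'homo  sapiens', 'Sapiens, Homo']]
```

Each cluster is only a suggestion: in Refine, as here, a person still decides which spelling to keep.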

Here is a link to the YouTube video describing the main features of Google Refine.

Flash Report: effects of acute intake of alcohol on NGS scientists at AGBT...

A new sequencing technology enters the ring: SHTseq(TM)

"... Using CrapBio-SHTseq technology we regularly get 10Mb reads and we have even seen reads of 100Mb which completely sequenced E. coli 20 times in a single read...."

"... AngstromRealtimeSensors™ can accurately read the DNA while still inside cells. Better than that, you don’t even need to take the DNA to the analysis machine you can simply take an image of the organism you wish to sequence and upload it to the SHTcloud and have the DNA information extracted directly by out highly trained SHTtechnicians..."

Apparently they are having fun in Marco Island...

Thursday, 16 February 2012

Broken genes in healthy people: sequencing errors and genome robustness

A new paper published in Science by Daniel MacArthur provides the first accurate estimates of the presence and impact of loss-of-function (LoF) variants in healthy human genomes.

This extensive analysis, based on data from the 1000 Genomes Project, applied computational as well as experimental filters to distinguish true LoF variants from those due to errors in sequencing, variant-calling algorithms or gene annotation. The results indicate that LoF variants are subject to strong purifying selection, which tends to eliminate them from populations. On average, each genome harbors about 100 genuine LoF variants, most of them in a heterozygous state or in genes with few protein interactions. Completely disrupted genes are therefore rare, which lowers previous estimates of their abundance. However, the fact that knocking out a gene can produce no phenotypic effect calls for reconsidering the robustness of the human genome and the redundancy of its genes.

But the most interesting aspect, also stressed in a post on MassGenomics, is the evaluation of the errors that cause a high false positive rate when identifying LoF variants. The authors state that "the greater the predicted functional impact of a sequence variant, the more likely it is to be a false positive". This happens because disruptive mutations are subject to purifying selection and thus tend to be removed from the population, whereas errors are equally distributed across all types of variants, resulting in a greater proportion of false positives among LoF variants.

So, in the frenzied search for disease-causing mutations, you'd better be careful!
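The argument about false positives is easy to see with numbers. The Python sketch below uses entirely hypothetical counts: it assumes a pipeline makes the same number of errors in every functional category, while purifying selection leaves far fewer real variants in the loss-of-function category. With these invented figures, errors are about 0.5% of synonymous calls but a third of LoF calls.

```python
# Hypothetical illustration only: none of these counts come from the paper.
ERRORS_PER_CATEGORY = 50  # assumed identical across functional categories

true_variants = {          # hypothetical real variants per genome
    "synonymous": 10_000,
    "missense": 5_000,
    "loss-of-function": 100,  # depleted by purifying selection
}

def false_positive_fraction(n_true: int, n_errors: int) -> float:
    """Share of all calls in a category that are errors."""
    return n_errors / (n_true + n_errors)

for category, n_true in true_variants.items():
    fp = false_positive_fraction(n_true, ERRORS_PER_CATEGORY)
    print(f"{category:>17}: {fp:.1%} of calls are false positives")
```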

Live coverage of the AGBT Meeting

Yesterday (Wednesday, February 15) the 2012 Advances in Genome Biology & Technology meeting opened in Marco Island (a boring place, if I trust the pictures I've seen).

Those who have not been able to fly to Florida can follow the AGBT highlights through the Twitter feed provided by Dan Koboldt, the curator of the MassGenomics blog.

On his blog, Dan provides a lot of information about the meeting.

When he comes back he hopes to have an answer to questions like these:

Which is better: IonTorrent PGM or Illumina MiSeq?

What’s the current product quality and sequencing backlog of Complete Genomics?

Should I start saying “Roche” instead of “Illumina”?

Is there a future in 454 sequencing?

Exome kits: Are researchers leaning toward Nimblegen, Agilent, or Illumina?

Who’s got a real shot at the $1,000 genome this year?

Is this genome-in-a-day thing realistic, or pie-in-the-sky?

Like many of us, Dan is particularly looking forward to the presentation from Oxford Nanopore, who earlier this month promised to have a commercial instrument available in 2012.

Stay tuned.

Addendum: additional extensive AGBT coverage is available at #AGBT. A good summary of tweets is also available on the "Pathogens: Genes and Genomes" blog (AGBT 2012 Day 1 and Day 2 Tweets).

Wednesday, 15 February 2012

Have you ever seen an Ion Proton sequencer in action?

Judging from this interview with Jonathan Rothberg, it seems you can even take it with you to your chalet when you go skiing: Le Ion Proton, un décodeur d’ADN révolutionnaire (The Ion Proton, a revolutionary DNA decoder).

Rothberg also made an appearance a few weeks ago on Fox News with one of the Ion Proton prototypes. You can watch the video of the interview on YouTube.

Interestingly, Life Tech just announced that it shipped over 700 Ion Torrent PGMs in 2011. I wonder how many orders for the Ion Proton have already been placed...

Gene name errors inadvertently introduced when using Excel in bioinformatics

If you are in the right mood you can read this (funny?) story (Genes Families hacked by Microsoft Excel) on the errors that can be introduced by the automatic text formatting feature of Excel when dealing with gene names such as SEPT1, FEB1, DEC1 (guess why?).

Incredible but true, a scientific article has been published in BMC Bioinformatics titled "Mistaken Identifiers: Gene name errors can be introduced inadvertently when using Excel in bioinformatics".
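A simple defensive step before a gene list goes anywhere near a spreadsheet is to flag the symbols at risk. The Python sketch below uses our own heuristic patterns: month names or abbreviations followed by a number (SEPT1, MARCH1, DEC1), and digit-E-digit identifiers such as 2310009E13 that can be read as scientific notation. It is not a complete list of Excel's conversion rules; importing the column as text remains the real fix.

```python
import re
from typing import Iterable, List

# Heuristic patterns (our assumption, not an exhaustive list of Excel's rules).
MONTHS = "JAN|FEB|MAR|MARCH|APR|APRIL|MAY|JUN|JUNE|JUL|JULY|AUG|SEP|SEPT|OCT|NOV|DEC"
DATE_LIKE = re.compile(rf"^(?:{MONTHS})-?\d{{1,2}}$", re.IGNORECASE)
SCI_LIKE = re.compile(r"^\d+E\d+$", re.IGNORECASE)  # e.g. 2310009E13

def at_risk(symbols: Iterable[str]) -> List[str]:
    """Return the symbols a spreadsheet might silently turn into dates or numbers."""
    return [s for s in symbols if DATE_LIKE.match(s) or SCI_LIKE.match(s)]

genes = ["SEPT1", "MARCH1", "DEC1", "TP53", "BRCA2", "2310009E13"]
print(at_risk(genes))  # ['SEPT1', 'MARCH1', 'DEC1', '2310009E13']
```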

A story about a reporter, a firefighter, a sequenced genome and a gene named JAK2

If you are considering having your genome sequenced, sooner or later, I believe it would be quite instructive to read this scary article by John Lauerman, a Bloomberg reporter working on a story about genomics: "Harvard Mapping of DNA Turns Scary".

Monday, 13 February 2012

Flash Report: extensive RNA editing in human transcriptome

A study published online on 12 February 2012 in Nature Biotechnology (Comprehensive analysis of RNA-Seq data reveals extensive RNA editing in a human transcriptome) seems to confirm the data reported last year in Science on the presence of a large number of sequence differences between mRNA and DNA in the human transcriptome.

While the 2011 paper was strongly contested by other scientists because of technical issues and a lack of academic rigor, the BGI team led by Dr. Jun Wang, Executive Director of BGI, developed a more rigorous pipeline for approaching these problems and addressed some of the concerns raised.

This latest analysis was performed on more than 750 million sequencing reads from poly(A)+, poly(A)− and small RNA samples of a male Han Chinese individual. The study identified "22,688 RNA editing events in noncoding genes and introns, untranslated regions and coding sequences of protein-coding genes. Most changes (~93%) converted A to I(G), consistent with known editing mechanisms based on adenosine deaminase acting on RNA (ADAR)."

“The evidence of extensive RNA editing identified in a human transcriptome underscores the necessity of an effective method to fully detect these events in order to further advance our understanding of human development and normal pathophysiological condition,” said Jun Wang. “With continual improvement of the new approach, we believe this could be achieved in the near future.”
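A figure like "~93% A to I(G)" is a proportion of RNA-DNA differences by substitution type, with inosine read as G by the sequencer. The Python sketch below shows such a tally on invented sites; it is not BGI's pipeline. One detail it does handle: for a gene on the minus strand, an A-to-G change in the transcript appears as T-to-C against the reference, so mismatches are first expressed on the transcribed strand.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def on_transcript_strand(ref_base: str, rna_base: str, strand: str) -> str:
    """Express a reference-vs-RNA difference on the transcribed strand, e.g. 'A>G'."""
    if strand == "-":
        ref_base, rna_base = ref_base.translate(COMPLEMENT), rna_base.translate(COMPLEMENT)
    return f"{ref_base}>{rna_base}"

# Invented RNA-DNA differences: (reference base, base seen in RNA reads, gene strand)
sites = [("A", "G", "+"), ("T", "C", "-"), ("A", "G", "+"), ("C", "T", "+")]

counts = Counter(on_transcript_strand(*site) for site in sites)
a_to_g = counts["A>G"] / sum(counts.values())
print(f"A>G (A-to-I) fraction: {a_to_g:.0%}")  # 75% for these invented sites
```

A real analysis also needs the filters the paper describes (read quality, mapping artifacts, known SNPs) before any tally is meaningful.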

Flash Report: BGI Opens a Genome Research Center in Europe

On February 10th, 2012, BGI officially opened its first European Genome Research Center located in Copenhagen Bio Science Park (COBIS). This research center is about 1,200 square meters and equipped with "only" 10 Illumina HiSeq 2000 sequencers.

More info in this BGI press release and on the BGI Europe website.

Wednesday, 8 February 2012

A 30-fold coverage sequence of an extinct human relative available through Amazon Web Services

In 2008, scientists discovered a population of premodern humans present in Asia less than 50,000 years ago that was genetically distinct from both modern humans and Neanderthals. Svante Pääbo's team described in Nature the complete mitochondrial DNA sequence retrieved from a bone excavated in Denisova Cave in southern Siberia. The sample came from a layer of material dated to between 30,000 and 50,000 years ago. Interestingly, Neanderthal DNA was found in a sample from the same time period less than 100 km away, while artifacts indicate that modern humans were also present in the region by 40,000 years ago. While Denisova mtDNA shows about 385 differences from the typical human mitochondrial genome, Neanderthals differ from modern humans by an average of only 202. This indicated that the Denisova lineage split off about a million years ago, well before modern humans and Neanderthals did.

Scientists at the Max Planck Institute subsequently sequenced the Denisova genome to about 2x coverage. The analysis, published in December 2010 in Nature, suggested a more complex picture, with the Denisova population as a sister group to Neanderthals. Unlike Neanderthals, the Denisovans did not contribute genes at a detectable level to present-day people across Eurasia. However, the data suggest that they contributed 4–6% of their genetic material to the genomes of present-day Melanesians. Interestingly, they did not contribute to the genomes of modern Asian populations such as Han Chinese and Mongolians, who live near Denisova. The Denisovans obviously interbred with the ancestors of modern Melanesians at some point, but this seems unlikely to have happened at Denisova, which suggests that the Denisovans lived over a considerable area of eastern Asia.

Only two days ago the Denisova Genome Consortium released the raw sequence data and alignments for additional sequences generated using the Illumina GAIIx sequencing platform and corresponding to about 30-fold coverage (!!!) of the genome.
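The jump from about 2x to about 30x coverage is bigger than it sounds. Under the idealized Lander-Waterman (Poisson) model, where reads land uniformly at random, the expected fraction of bases covered at least once at mean coverage c is 1 - e^(-c), about 86% at 2x. Ancient DNA, with short, damaged and unevenly distributed fragments, falls short of this ideal, so treat the figures from the Python sketch below as optimistic.

```python
import math

def fraction_covered(mean_coverage: float, min_depth: int = 1) -> float:
    """Expected fraction of bases with depth >= min_depth, assuming Poisson-distributed depth."""
    p_below = sum(math.exp(-mean_coverage) * mean_coverage ** k / math.factorial(k)
                  for k in range(min_depth))
    return 1.0 - p_below

for c in (2, 30):
    print(f"{c}x: {fraction_covered(c):.1%} of bases seen at least once, "
          f"{fraction_covered(c, 10):.2%} at least 10 times")
```

Depth of ten or more reads matters because confident genotype calls, especially heterozygous ones, need several independent observations of each base.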

Although the researchers are still working on a paper describing their findings, they have decided to release the sequence, both on the Max Planck website and through Amazon Web Services.

As stated by the authors "the data are available for use, but users are expected to allow the data producers to make the first presentations and to publish the first paper containing genome-wide analyses of the data. Researchers who use small amounts of the data (eg: for single locus analyses) are not required to request permission. Researchers who have queries about whether they may present or submit Denisova genome data for publication may contact Svante Paabo."

It will be interesting to see whether the new genomic data will help shed light on the origin and fate of the Denisovans and their relationship with Neanderthals and modern humans.

In the meantime, we can read a nice Nature Reviews Genetics article entitled "Learning about human population history from ancient and modern genomes".

Saturday, 4 February 2012

Does Illumina also have a problem with DNA homopolymers?

Apparently it does, according to a note published on the Ion Community website (free registration required). While it is widely known that both Roche 454 and Ion Torrent have trouble counting the number of bases in DNA homopolymer tracts, MiSeq seems to have difficulty calling the base right after a homopolymer run, often calling it as one more base of the homopolymer instead. I wonder whether HiSeq and GAIIx are also affected by the problem. For more information, have a look at this post on the Omics! Omics! blog.
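To see where such errors could affect your own sequences, it helps to list the homopolymer runs and the base that follows each one, since that following base is where the MiSeq miscalls are reported. A minimal Python sketch; the minimum run length of 4 is an arbitrary choice of ours.

```python
import re
from typing import List, Optional, Tuple

def homopolymers(seq: str, min_len: int = 4) -> List[Tuple[int, str, int, Optional[str]]]:
    """Return (start, base, run_length, next_base) for each run of >= min_len
    identical bases; next_base is None when the run ends the sequence."""
    seq = seq.upper()
    runs = []
    for m in re.finditer(r"A+|C+|G+|T+", seq):
        run = m.group(0)
        if len(run) >= min_len:
            nxt = seq[m.end()] if m.end() < len(seq) else None
            runs.append((m.start(), run[0], len(run), nxt))
    return runs

print(homopolymers("ACGAAAAATCCCCG"))  # [(3, 'A', 5, 'T'), (9, 'C', 4, 'G')]
```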

Which RNAs will you be drinking for breakfast tomorrow?

Likely the ones transcribed by approx. 70% of the cow's genes. This is the result of a recent study entitled "Transcriptional profiling of bovine milk using RNA sequencing" published in BMC Genomics. The RNA comes from milk somatic cells (primarily leukocytes and some epithelial cells shed from the lining of the mammary gland) of Holstein cows at three stages of lactation. Gene expression analysis was conducted by Illumina RNA sequencing.

Friday, 3 February 2012

Flash Report: did you know that circular RNAs ...

... are far more common than previously thought?

This is the conclusion reached by Patrick O. Brown and colleagues in a study published in the February issue of PLoS ONE (Circular RNAs Are the Predominant Transcript Isoform from Hundreds of Human Genes in Diverse Cell Types).

Do you know how many species there are on our planet?

Likely somewhere between 1.8 and 8.7 million (honestly, I thought there were more). The numbers come from two studies recently published in PLoS Biology (How Many Species Are There on Earth and in the Ocean?) and Systematic Biology (Predicting total global species richness using rates of species description and estimates of taxonomic effort). A post on the iPhylo blog discusses the different approaches used by the authors, which lead to different estimates. What I also find surprising is that "The Species 2000 & ITIS Catalogue of Life", the most comprehensive catalogue of all known species of organisms on Earth, already contains more than 1,400,000 species!!!

Today "only" a few thousand genomes have been either completely or partially sequenced in Eukaryotes (1215) and Prokaryotes (1865). This means that there is plenty of sequencing work ahead for BGI & Co.

Thursday, 2 February 2012

If you see a crocodile don't forget to...sequence it!

Recently a team led by David Ray, from Mississippi State University, published in Genome Biology their plan to sequence the genomes of 3 crocodilian species: the American alligator (Alligator mississippiensis), saltwater crocodile (Crocodylus porosus) and Indian gharial (Gavialis gangeticus). The work relies on both Illumina and 454 technologies to assemble mid-coverage genomes. Data will likely be available in June.

This will add valuable information to reptilian genomics and shed light on the evolution of archosaurs and amniotes. Beyond their charm as mythological creatures, crocodilians are among the most ancient lineages living on Earth, and they show many peculiar biological and behavioral features. Citing the authors: "In addition to their ecological, sociological and economic significance, crocodilians have genomes that will be useful sources of data for biological and biomedical research. Alligator serum has been shown to contain broad spectrum antibiotic peptides. The American alligator has been used extensively as a model for examining the environmental impact of various contaminants, including endocrine disrupting xenobiotics. Crocodilians represent important research organisms for diverse fields that include evolution and phylogenetics, functional morphology, osmoregulation, sex determination, hybridization and population genetics. To provide the genomic resources necessary to expand our understanding of these fascinating organisms, the ICGWG is obtaining and assembling genome sequences for the American alligator, saltwater crocodile, and gharial, one representative from each of the extant crocodilian families."

Flash Report: Genome Research special issue on Cancer Genomics

The February issue of Genome Research, entitled "Cancer Genomics", includes a number of articles that highlight insights gained from cutting-edge genomic and epigenomic analyses of cancer.

Wednesday, 1 February 2012

2012 will be the year of Nanopore Single Molecule DNA Sequencing

In a couple of weeks we will know more. It has been confirmed that Clive Brown, the Chief Technology Officer of Oxford Nanopore Technologies, will be talking at the AGBT meeting about “Single Molecule ‘Strand’ Sequencing Using Protein Nanopores and Scalable Electronic Devices”. On February 17th we will likely know whether the new technology will compete with the (relatively) few but long reads of the PacBio RS or will be launched directly into the $1000 genome race.

You can read some interesting speculations on what is going on in the following posts on the Core Genomics blog: Oxford Nanopore confirmed at AGBT; Nanopore sequencing: is the hype about to end? How does a nanopore sequencer work?

Today InSequence has an article with an intriguing title on the same topic "Oxford Nanopore to Commercialize DNA Strand Sequencing this Year; Illumina not Involved".

A press release on the ONT website clearly states "Oxford Nanopore intends to commercialise DNA strand sequencing products, directly to customers within 2012. At the AGBT presentation, Oxford Nanopore will show DNA strand sequencing data and other disruptive features of the Company's proprietary electronics-based sensing devices."

Flash Report: miRdSNP, a database of disease-associated SNPs and microRNA target sites on 3'UTRs of human genes

miRdSNP is a database of manually curated disease-associated SNPs (dSNPs) on the 3’UTRs of human genes. "miRdSNP annotates genes with experimentally confirmed targeting by miRNAs and indexes miRNA target sites predicted by TargetScan and PicTar as well as potential miRNA target sites newly generated by dSNPs. A robust web interface and search tools are provided for studying the proximity of miRNA binding sites to dSNPs in relation to human diseases."

An article describing miRdSNP has been published in the latest issue of BMC Genomics.
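The question miRdSNP is built around, how close a disease-associated SNP lies to a miRNA target site, reduces to a nearest-interval distance along the 3'UTR. The Python sketch below uses hypothetical coordinates and a plain linear scan; we don't know how miRdSNP computes proximity internally.

```python
from typing import List, Tuple

def distance_to_nearest_site(snp_pos: int, sites: List[Tuple[int, int]]) -> int:
    """Distance in bp from snp_pos to the closest (start, end) site, 1-based inclusive;
    0 if the SNP falls inside a site. Raises ValueError if sites is empty."""
    def dist(site: Tuple[int, int]) -> int:
        start, end = site
        if start <= snp_pos <= end:
            return 0
        return start - snp_pos if snp_pos < start else snp_pos - end
    return min(dist(s) for s in sites)

# Hypothetical 3'UTR coordinates of three predicted 7-nt target sites
target_sites = [(120, 126), (305, 311), (880, 886)]
for snp in (123, 300, 600):
    print(snp, distance_to_nearest_site(snp, target_sites))
```

A linear scan is fine for the handful of sites on a single 3'UTR; genome-wide, sorted intervals or an interval tree would be the usual choice.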
