Monday, 14 July 2014
About a decade after the first appearance of NGS, we have seen incredible improvements in throughput, accuracy and analysis methods, and sequencing is now widespread and within reach even for small labs. Researchers have produced tons of sequencing data, and the new technology has allowed us to investigate DNA and human genomic variation at unprecedented scale and precision.
However, besides the milestones achieved, we now have to deal with new challenges that were largely underestimated in the early days of NGS.
MassGenomics has a nice blog post highlighting the main ones, which I report here:
Where do we put all the data from large genomic sequencing projects? Can we afford the cost of storing everything, or do we have to be more selective about what to keep on our hard drives?
GWAS have shown us that large numbers of samples, on the order of 10,000, are needed to achieve statistical significance in association studies, particularly for common diseases. Even at the present low price of $1,000 per genome, such a sequencing project would cost around $10 million. So we can either reduce our sample size (and thus statistical power) or create mega-consortia, with all the management issues they bring.
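For readers who want to play with the numbers, here is a back-of-the-envelope sketch of that cost estimate in Python; the figures are just the round numbers quoted above, not a real budget.

```python
# Back-of-the-envelope sequencing cost for an association study,
# using the round figures mentioned above (illustrative only).
samples_needed = 10000   # order of magnitude needed for common diseases
cost_per_genome = 1000   # USD, the often-quoted "$1,000 genome"

total_cost = samples_needed * cost_per_genome
print(f"Estimated sequencing cost: ${total_cost:,}")  # -> $10,000,000
```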
Samples have become precious resources.
In the present scenario, sequencing power is no longer the limitation. The real challenge is finding enough well-characterized samples to sequence!
Whole genome and whole exome approaches let researchers rapidly identify new variants potentially related to phenotypes. But which of them are truly relevant? Our present knowledge does not allow a confident prediction of the functional impact of genetic variation, and thus functional studies are often needed to assess the actual role of each variant. These studies, based on cellular or animal models, can be expensive and complicated.
With the large and increasing amount of genomic data available to the community, and studies showing that people's ancestry and place of residence can be traced from them (at least in a proportion of cases), there are concerns about how "anonymous" this kind of data can really be. This is going to become a real problem as more and more genomes are sequenced.
Friday, 4 July 2014
Literome: PubMed-scale genomic knowledge base in the cloud
Hoifung Poon, Chris Quirk, Charlie DeZiel and David Heckerman
Motivation: Advances in sequencing technology have led to an exponential growth of genomics data, yet it remains a formidable challenge to interpret such data for identifying disease genes and drug targets. There has been increasing interest in adopting a systems approach that incorporates prior knowledge such as gene networks and genotype–phenotype associations. The majority of such knowledge resides in text such as journal publications, which has been undergoing its own exponential growth. It has thus become a significant bottleneck to identify relevant knowledge for genomic interpretation as well as to keep up with new genomics findings.
Results: In the Literome project, we have developed an automatic curation system to extract genomic knowledge from PubMed articles and made this knowledge available in the cloud with a Web site to facilitate browsing, searching and reasoning. Currently, Literome focuses on two types of knowledge most pertinent to genomic medicine: directed genic interactions such as pathways and genotype–phenotype associations. Users can search for interacting genes and the nature of the interactions, as well as diseases and drugs associated with a single nucleotide polymorphism or gene. Users can also search for indirect connections between two entities, e.g. a gene and a disease might be linked because an interacting gene is associated with a related disease.
Availability and implementation: Literome is freely available at literome.azurewebsites.net. Download for non-commercial use is available via Web services.
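The "indirect connections" idea described in the abstract is essentially a two-hop lookup over two relations: gene-gene interactions and gene-disease associations. Here is a minimal, self-contained Python sketch of that idea; the toy data and function names are invented for illustration and are not Literome's actual API.

```python
# Toy sketch of the "indirect connection" idea: a gene and a disease may be
# linked because an interacting gene is associated with a disease.
# Hypothetical data and function, not Literome's API.

gene_interactions = {
    "GENE_A": {"GENE_B", "GENE_C"},
    "GENE_B": {"GENE_A"},
}

gene_disease = {
    "GENE_B": {"disease_X"},
    "GENE_C": {"disease_Y"},
}

def indirect_links(gene):
    """Return (interacting gene, disease) pairs reachable in two hops."""
    links = []
    for partner in sorted(gene_interactions.get(gene, set())):
        for disease in sorted(gene_disease.get(partner, set())):
            links.append((partner, disease))
    return links

print(indirect_links("GENE_A"))
# -> [('GENE_B', 'disease_X'), ('GENE_C', 'disease_Y')]
```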
Wednesday, 25 June 2014
The project aims to investigate the relationship between genomics and environmental factors, to define their impact on human life and to determine what advantages this kind of genomic screening could provide for human health. This massive longitudinal project would sequence the genomes of 100,000 US babies and collect loads of environmental, lifestyle and medical data on them until the age of 21.
However, the NIH director, Francis Collins, has recently announced that the project will be put on hold pending a detailed review of the methodologies applied and of whether it can be completed in its present form. A few key questions have to be addressed: Is the study actually feasible, particularly in light of budget constraints? If so, what changes need to be made? If not, are there other methods for answering the key research questions the study was designed to address?
As GenomeWeb reports, the National Academy of Sciences (NAS) released a report saying the NCS needs some major changes to its design, management and oversight. The NAS recommendations include revising the core hypotheses behind the study, beefing up scientific input and oversight, and enrolling subjects during pregnancy instead of at birth, as the current plan foresees.
Monday, 23 June 2014
The group of professor James Dale, from the Queensland University of Technology, received a $10 million grant from the Bill and Melinda Gates Foundation to support this nine-year project.
Dale said that by 2020 vitamin A-enriched banana varieties would be grown by farmers in Uganda, where about 70% of the population survive on the fruit.
The genome of a baby sequenced before birth raises questions about the promise and pitfalls of genome screening
Moreover, if the public health service continues to stand against whole genome screening, people will soon turn to private companies, which can already provide this kind of service. This policy will thus increase the risk of incomplete or misleading interpretations, without any kind of support from medical staff.
The topic of genomic risk assessment in healthy people has also been discussed recently in the New England Journal of Medicine, which published a review on clinical whole exome and whole genome sequencing. The journal also presented the hypothetical scenario of a subject who discovers several cancer-affected relatives and wants to undergo genetic testing. They propose two strategies, a gene panel or whole exome/genome sequencing, and the case is open for readers to comment, with even a poll to vote for your preferred solution.
- Mathematics Department
- Computer Science Department
- Data Science Department
- Chemistry Department
- Biology Department
- Computational Biology Department
- Evolutionary Biology Department
- Systems Biology Department
- Neurosciences Department
- Translational Sciences Department
- Humanities Department
Morena Pappalardo and Mark N. Wass
Overall, improvements have to be made to the base-calling software, reliability of the flow cells, and library shelf-life, and new software needs to be developed by the community to take advantage of the MinION reads. Oxford Nanopore said a new chemistry will be available in the coming months, which might include some of these improvements.
So, here is a quick update on what has been published in recent months!
Monday, 19 May 2014
Performance comparison of SNP detection tools with illumina exome sequencing data-an assessment using both family pedigree information and sample-matched SNP array data.
Nucleic Acids Research. 2014 May 15. pii: gku392
To apply exome-seq-derived variants in the clinical setting, there is an urgent need to identify the best variant caller(s) from a large collection of available options. We have used an Illumina exome-seq dataset as a benchmark, with two validation scenarios-family pedigree information and SNP array data for the same samples, permitting global high-throughput cross-validation, to evaluate the quality of SNP calls derived from several popular variant discovery tools from both the open-source and commercial communities using a set of designated quality metrics. To the best of our knowledge, this is the first large-scale performance comparison of exome-seq variant discovery tools using high-throughput validation with both Mendelian inheritance checking and SNP array data, which allows us to gain insights into the accuracy of SNP calling through such high-throughput validation in an unprecedented way, whereas the previously reported comparison studies have only assessed concordance of these tools without directly assessing the quality of the derived SNPs. More importantly, the main purpose of our study was to establish a reusable procedure that applies high-throughput validation to compare the quality of SNP discovery tools with a focus on exome-seq, which can be used to compare any forthcoming tool(s) of interest.
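One of the two validation strategies described in the abstract is Mendelian inheritance checking within a family pedigree. As a rough illustration of what such a check does for a biallelic SNP in a parent-offspring trio, here is a small Python sketch; the genotype encoding and function names are my own, not the authors' pipeline.

```python
# Minimal sketch of a Mendelian consistency check for a biallelic SNP in a
# father-mother-child trio. Genotypes are encoded as the number of alternate
# alleles (0, 1 or 2). Illustration of the validation idea only, not the
# procedure used in the paper.
from itertools import product

def alleles(genotype):
    """Possible alleles a parent with this genotype can transmit."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]

def is_mendelian(father, mother, child):
    """True if the child's genotype can arise from one allele per parent."""
    possible = {f + m for f, m in product(alleles(father), alleles(mother))}
    return child in possible

# Example: a het father (1) and hom-ref mother (0) can have a het child,
# but not a hom-alt child, so the second call flags a likely genotyping error.
print(is_mendelian(1, 0, 1))  # True
print(is_mendelian(1, 0, 2))  # False
```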