A blog with news and curiosities on genomics, with a particular interest in topics related to Next Generation Sequencing, Personal Genomics and Bioinformatics. We work at the University of Brescia (Italy) and are new to the field, but with a lot of energy to share.
This new chemistry, based on isothermal amplification, will provide longer reads, up to 600bp on the PGM and 400bp on the Proton PI, with lower costs for template preparation.
New protocols for Ampliseq library preparation on Ion Chef system.
The company aims to transfer all the processing steps to the new Ion Chef, which will become the all-in-one solution from library prep to chip loading, minimizing hands-on time. Take a look at the video on YouTube.
News on the PII chip.
They finally reported a fully working version of the PII chip, producing up to 300M reads of 100bp length. However, an official release date was not announced.
NeoPrep, the new automated system for library preparation.
Having already presented some new sequencers a few months ago, Illumina comes to the stage with this new piece of equipment, which promises fast and accurate library preparation and can prepare 16 libraries with as little as 30 minutes of hands-on time. NeoPrep is based on electrowetting microfluidic technology and uses a cartridge loaded with reagents and your sheared DNA. The instrument generates and quantifies each library, ready for pooling. Roughly is required per run. All this at the price of $49K ($39K introductory pricing for the first 6 months). More details at the Illumina page or on the Omics! Omics! blog.
10X Genomics is the real innovation this year. They revealed their new GemCode technology to reconstruct long reads and haplotypes from standard short-read sequencing. The new instrument will integrate into the standard Illumina-based library preparation, positioning it as an add-on for already equipped laboratories. Using dedicated open source software and genome browsers from 10X Genomics, one can then generate standard file formats (such as phased VCF or BAM with phase tags) and visualize the reconstructed haplotypes.
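To make the "phased VCF" output concrete, here is a minimal sketch of what phasing means at the file-format level: per the VCF specification, phased genotypes use `|` as the allele separator (with haplotype order meaningful), while unphased ones use `/`. The genotype strings below are invented for illustration, not taken from real GemCode output.

```python
def parse_genotype(gt_field):
    """Return (alleles, phased) from a VCF GT string like '0|1' or '0/1'."""
    phased = "|" in gt_field
    sep = "|" if phased else "/"
    alleles = tuple(int(a) for a in gt_field.split(sep))
    return alleles, phased

print(parse_genotype("0|1"))  # ((0, 1), True)  - haplotype-resolved
print(parse_genotype("0/1"))  # ((0, 1), False) - phase unknown
```

Reconstructing long-range haplotypes essentially means converting as many `/` separators as possible into `|`, consistently across neighboring variants.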
Here it is, another edition of the exciting AGBT conference (Advances in Genome Biology and Technology).
As usual there are a lot of new technologies announced and hundreds of interesting talks touching every aspect of genomics and presenting the latest technologies and methods!
On this first day there is a lot of talk about the new platform from 10X Genomics, which promises to use massive fragment labeling to assemble long reads from standard short reads produced by Illumina technology. The new instrument will thus be an add-on for sequencing labs rather than a completely new sequencing technology, with the ability to deliver long haplotypes.
Let's see what they will reveal at the official presentations.
Don't forget to follow the conference in real time on Twitter with #AGBT15
On October 29th, the Exome Aggregation Consortium released its browser, based on the impressive number of 63,000 human exomes.
This database is the largest collection of human exome data so far and provides both a web-based interface to retrieve variants in your gene of interest and a downloadable VCF file containing all the annotated variants.
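For those who download the VCF rather than use the browser, retrieving the variants in a gene of interest boils down to interval filtering. Below is a minimal sketch of that idea on a tiny inline example; the gene coordinates are hypothetical placeholders, and for the real multi-gigabyte ExAC file an indexed approach (tabix/pysam) would be far more practical.

```python
# Hypothetical gene interval (chromosome, start, end) - replace with the
# real coordinates of your gene of interest.
GENE_REGION = ("17", 41196312, 41277500)

def variants_in_gene(vcf_lines, region):
    """Collect (CHROM, POS, REF, ALT) for variants inside the region."""
    chrom, start, end = region
    hits = []
    for line in vcf_lines:
        if line.startswith("#"):  # skip meta-information and header lines
            continue
        fields = line.rstrip("\n").split("\t")
        c, pos = fields[0], int(fields[1])
        if c == chrom and start <= pos <= end:
            hits.append((c, pos, fields[3], fields[4]))
    return hits

# Tiny invented example instead of the real 63,000-exome file
example = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "17\t41200000\t.\tA\tG\t100\tPASS\tAC=3",
    "2\t500\t.\tC\tT\t90\tPASS\tAC=1",
]
print(variants_in_gene(example, GENE_REGION))  # [('17', 41200000, 'A', 'G')]
```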
The final dataset is based on sequences from several consortia working on complex disorders and also includes 1000G and ESP6500 data.
The first aim of the Consortium is to study the distribution of "human knockouts", that is, people having both copies of a given gene inactivated by severe mutations. The analysis of associated phenotype data promises to reveal a lot of interesting information on the actual role of single human genes. Moreover, the study of subjects carrying inactivating mutations in known disease genes but not showing the expected phenotype could lead to the identification of new therapeutic targets.
Five papers that summarize the latest data from the ENCODE and modENCODE consortia have recently been published in Nature. Together, the publications add more than 1,600 new data sets, bringing the total number of data sets from ENCODE and modENCODE to around 3,300.
The authors analyzed RNA-Seq data produced in the three species, and an extensive effort was conducted in Drosophila to investigate genes expressed only in specific tissues, at specific developmental stages or only after specific perturbations. The analysis also identified many new candidate long non-coding RNAs, including ones that overlap with previously defined mutations associated with developmental defects.
Other data sets derive from chromatin binding assays focused on transcription-regulatory factors in human cell lines, Drosophila and C. elegans, and on studies of DNA accessibility and certain modifications to histone proteins. These new chromatin data sets led to the identification of several features common to the three species, such as shared histone-modification patterns around genes and regulatory regions.
The new transcriptome data sets will result in more precise gene annotations in all three species, which should be released soon. Access to the data on chromatin features, regulatory-factor binding sites and regulatory-element predictions seems more difficult: we will have to wait for them to be integrated into user-friendly portals for data visualization and flexible analyses. The UCSC Genome Browser, Ensembl and the ENCODE consortium are all working to provide a solution.
About a decade after the first appearance of NGS sequencing, we have seen incredible improvements in throughput, accuracy and analysis methods, and sequencing is now more widespread and easier to achieve, even for small labs. Researchers have produced tons of sequencing data, and the new technology has allowed us to investigate DNA and human genomic variation at unprecedented scale and precision.
However, besides the milestones achieved, we now have to deal with new challenges that were largely underestimated in the early days of NGS.
Where do we put all the data from large genomic sequencing projects? Can we afford the cost of storing everything, or do we have to be more selective about what we keep on our hard drives?
GWAS have shown us that large numbers, on the order of 10,000 samples, are needed to achieve statistical significance for association studies, particularly for common diseases. Even at the present low price of $1,000 per genome, such a sequencing project would require around $10 million. So we can either reduce our sample size (and thus significance) or create mega-consortia, with all the associated management issues.
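The back-of-envelope calculation behind that figure is simply sample size times per-genome price:

```python
samples = 10_000          # order of magnitude needed for GWAS significance
cost_per_genome = 1_000   # USD, the often-quoted "$1,000 genome"
total = samples * cost_per_genome
print(f"${total:,}")      # $10,000,000
```

And this covers sequencing alone, before storage, analysis and sample-management costs.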
Samples have become precious resources.
In the present scenario, sequencing power is no longer a limitation. The real challenge is finding enough well-characterized samples to sequence!
Whole genome and whole exome approaches let researchers rapidly identify new variants potentially related to phenotypes. But which of them are truly relevant? Our present knowledge does not allow for a confident prediction of the functional impact of genetic variation, and thus functional studies are often needed to assess the actual role of each variant. These studies, based on cellular or animal models, can be expensive and complicated.
With a large and increasing amount of genomic data available to the community, and studies showing that people's ancestry and place of residence can be traced using them (at least in a proportion of cases), there are concerns about how "anonymous" this kind of data can really be. This is going to become a real problem as more and more genomes are sequenced.
This tool mines the "genomic" literature for your gene of interest and reports a list of interactions with other genes, also specifying the kind of relation (inhibit, activate, regulate...). It can also search for a SNP and find phenotypes associated with it by GWAS. You can then filter the results and also report whether the listed interactions are actually real or not.
Good stuff to quickly identify relevant papers in the large body of genomic research!
Hoifung Poon, Chris Quirk, Charlie DeZiel and David Heckerman
Abstract Motivation: Advances in sequencing technology have led to an exponential growth of genomics data, yet it remains a formidable challenge to interpret such data for identifying disease genes and drug targets. There has been increasing interest in adopting a systems approach that incorporates prior knowledge such as gene networks and genotype–phenotype associations. The majority of such knowledge resides in text such as journal publications, which has been undergoing its own exponential growth. It has thus become a significant bottleneck to identify relevant knowledge for genomic interpretation as well as to keep up with new genomics findings. Results: In the Literome project, we have developed an automatic curation system to extract genomic knowledge from PubMed articles and made this knowledge available in the cloud with a Web site to facilitate browsing, searching and reasoning. Currently, Literome focuses on two types of knowledge most pertinent to genomic medicine: directed genic interactions such as pathways and genotype–phenotype associations. Users can search for interacting genes and the nature of the interactions, as well as diseases and drugs associated with a single nucleotide polymorphism or gene. Users can also search for indirect connections between two entities, e.g. a gene and a disease might be linked because an interacting gene is associated with a related disease.
Availability and implementation: Literome is freely available at literome.azurewebsites.net. Download for non-commercial use is available via Web services.
One of the most ambitious projects, and one of the few attempts to really perform "personal genomics", is (or should I say was) the National Children's Study (NCS), sustained by the NIH and the US government.
The project aimed to investigate the relationship between genomics and environmental factors, to define their impact on human life and to determine which advantages this kind of genomic screening could provide for human health. This massive longitudinal project would have sequenced the genomes of 100,000 US babies and collected loads of environmental, lifestyle and medical data on them until the age of 21.
However, the NIH director, Francis Collins, has recently announced that the project will be stopped, pending a detailed review of the methodologies applied and of the opportunity to complete it in its present form. A few key questions have to be addressed: Is the study actually feasible, particularly in light of budget constraints? If so, what changes need to be made? If not, are there other methods for answering the key research questions the study was designed to address?
As GenomeWeb reports, the National Academy of Sciences (NAS) has released a report saying the NCS needs some major changes to its design, management and oversight. The NAS recommendations include making some changes to the core hypotheses behind the study, beefing up scientific input and oversight, and enrolling the subjects during pregnancy, instead of at birth as in the current plan.
According to GenomeWeb and The Guardian, researchers from Australia are tweaking the genome of the banana in order to get it to deliver higher levels of vitamin A. The study aims to supplement vitamin A in Uganda and other similar populations, where the banana is one of the main food sources and vitamin A deficiency causes blindness and death in children.
The group of Professor James Dale, from the Queensland University of Technology, received a $10 million grant from the Bill and Melinda Gates Foundation to support this nine-year project.
Dale said that by 2020 vitamin A-enriched banana varieties would be grown by farmers in Uganda, where about 70% of the population survive on the fruit.
Khan, a graduate student at the University of California, Davis, and blogger at The Unz Review, decided that he wanted detailed genetic information on his child as soon as he knew that his wife was pregnant. After a genetic test for chromosomal abnormalities, he asked to have the DNA sample back and managed to have the baby's genome sequenced on one of the university's NGS instruments.
MIT Technology Review reports the whole story, and Khan recounts the many difficulties he faced in getting the genome sequencing done. Most of the medical staff tried to discourage him from performing this kind of test, afraid that the couple could take irrevocable decisions, such as pregnancy termination, based on the presence of putatively deleterious mutations in the baby's genome. This case raises again the question of how much information can be extracted from a single genome, which part of this information is really useful for medical care and which part is actionable nowadays.
It seems to me that, for now, our ability to robustly correlate genotypes to phenotypes is still limited. This is due to incomplete knowledge of causative and risk-associated mutations, as well as of the molecular and genetic mechanisms that lead from genetic variants to phenotypes. Studies in recent years have demonstrated that this path is not straightforward and that actual phenotypes often depend on the interaction of several genetic components and regulatory mechanisms, leaving aside environmental factors.
Several disease mutations show incomplete penetrance, and many examples exist of variants linked to phenotypes only in specific populations, so a reliable interpretation of genomic data seems a long way off for now.
However, many decisions can already be made knowing your DNA sequence, and this information will become even more interesting as researchers continue to find new associations and elucidate the mechanisms behind genotype-phenotype correlations.
Moreover, if public health services continue to stand against whole genome screening, people will soon turn to private companies, which can already provide this kind of service. Such a policy will thus increase the risk of incomplete or misleading interpretations, without any kind of support from medical staff.
A lot remains to be discussed from both practical and ethical points of view, but we have to face the reality that, since these kinds of tests are going to become easily accessible in the near future, we also have to find a way to provide correct information to the subjects analyzed.
The topic of genomic risk assessment in healthy people has also been discussed recently in the New England Journal of Medicine, which published a review on clinical whole exome and whole genome sequencing. The journal also presented the hypothetical scenario of a subject who discovers some cancer-affected relatives and wants to undergo genetic testing. They propose two strategies, a gene panel or whole exome/genome sequencing, and the case is open for readers to comment, with even a poll to vote for your preferred solution.
This paper is a great resource for anyone looking to get started in computational biology, or just looking for insight into specific topics ranging from natural language processing to evolutionary theory. The author describes hundreds of video courses that are foundational to a good understanding of computational biology and bioinformatics. The table of contents breaks the curriculum down into 11 "departments" with links to online courses in each subject area:
Computer Science Department
Data Science Department
Computational Biology Department
Evolutionary Biology Department
Systems Biology Department
Translational Sciences Department
Listings in the catalog can take one of three forms: Courses, Current Topics, or Seminars. All listed courses are video-based and free of charge. The author has tested most of the courses, having enrolled in up to a dozen at a time, and shares his experience in this paper. So you can find commentary on the importance of each subject and an opinion on the quality of instruction. For the courses that the author completed, listings have an "evaluation" section, which rates the course on difficulty, time requirements, lecture/homework effectiveness, assessment quality, and overall opinion. Finally, there are also autobiographical annotations reporting why the courses have proved useful in a bioinformatics career.