Showing posts with label NGS PubMed Highlights.

Wednesday, 13 April 2016

Resilience project identifies the first 13 genetic heroes!

The Resilience Project has just published in Nature Biotechnology a new paper on the analysis of genomic data from more than 500k subjects in search of the so-called "genetic heroes". The authors first aggregated genomic data from various sources, including the 23andMe genotyping database, the 1000G, ESP6500 and UK10K sequencing projects, Swedish exomes for schizophrenia research, the CHOP sequencing program and others, to reach a total of 589,306 subjects with genomic data. They then applied strict filtering criteria to identify 13 healthy people carrying pathogenic mutations for severe Mendelian childhood diseases without showing any clinical symptoms.

By analyzing genomic data from these 13 "genetic heroes", the authors are now trying to identify protective variants and understand the molecular mechanisms that rescued the pathogenic mutations, with the potential to provide useful insight into how to treat the corresponding diseases.

Thursday, 10 March 2016

Recent interesting facts in genomics!

Human genetic knockouts point to a resilient human genome

According to this paper published in Science, the human genome is more resilient than previously expected and can tolerate a certain number of disrupted genes without any observable phenotypic effect. The authors "sequenced the exomes of 3222 British Pakistani-heritage adults with high parental relatedness, discovering 1111 rare-variant homozygous genotypes with predicted loss of gene function (knockouts) in 781 genes. [...] Linking genetic data to lifelong health records, knockouts were not associated with clinical consultation or prescription rate."
Interested? Read the full paper: "Health and population effects of rare gene knockouts in adult humans with related parents"


Genetic alterations in regulatory elements could predict personal health history

This paper, published in PLoS Computational Biology, analyzes the impact of personal genetic variants on conserved regulatory elements and how this information could be used to predict health-related traits. By analyzing the transcription factor binding sites disrupted by an individual’s variants and then looking for their most significant congregation next to a group of functionally related genes, the authors found that the top enriched function is invariably reflective of medical histories. As stated by the authors, these "results suggest that erosion of gene regulation by mutation load significantly contributes to observed heritable phenotypes that manifest in the medical history". Based on their approach, they also developed a computational test to interpret personal genomes that "promise to shed new light on human disease penetrance, expressivity and the sensitivity with which we can detect them".
Interested? Read the full paper: "Erosion of Conserved Binding Sites in Personal Genomes Points to Medical Histories"


Don't forget about exonic splice-affecting mutations

In this interesting paper in PLoS Genetics, the authors evaluate the prevalence of splice-affecting exonic variants. Variants of this kind are often neglected in the canonical pipelines searching for causative mutations, even though aberrant splicing can obviously have a major impact on gene function. Using MLH1 as a model gene, the authors found that the frequency of this kind of mutation is higher than expected, suggesting that they deserve more attention in future analyses. Moreover, the paper also provides a comparative evaluation of different in silico prediction algorithms, assessing their performance in classifying splice-affecting variants.
Interested? Read the full paper: "Exonic Splicing Mutations Are More Prevalent than Currently Estimated and Can Be Predicted by Using In Silico Tools"

The health impact of your Neanderthal ancestry

Another interesting story published recently in Science pointed out the influence of Neanderthal ancestry on human health-related traits. The authors analyzed how alleles inherited from Neanderthals in the present-day European population impact clinically relevant phenotypes, and they found associations for neurological, psychiatric, immunological, and dermatological phenotypes. The results indicate that archaic admixture influences disease risk in modern humans, including risk for depression, skin lesions resulting from sun exposure, hypercoagulation and tobacco use.
Interested? Read the full paper: "The phenotypic legacy of admixture between modern humans and Neandertals"


A map of the transcriptomic cellular landscape in visual cortex by single cell RNA-Seq

This study from Nature Neuroscience used single cell RNA-Seq on more than 1,600 cells to construct a cellular taxonomy of the primary visual cortex in adult mice. The authors identified 49 transcriptomic cell types, displaying specific and differential electrophysiological and axon projection properties, confirming that single-cell transcriptomic signatures can be associated with specific cellular properties. These results open new perspectives on cell-level organization within brain tissue, first of all on the potential causal relationships between transcriptomic signatures and specific morphological, physiological and functional properties. Another interesting point, as noted by the authors, is to investigate whether "certain transcriptomic differences [are] representative of cell state or activity, rather than cell type."
Interested? Read the full paper: "Adult mouse cortical cell taxonomy revealed by single cell transcriptomics"


Thursday, 1 October 2015

1000G and UK10K publish results of large scale human genome sequencing

Both 1000G and UK10K consortia have recently published the results of their analysis on the variability of human genomes, based on their large scale genomics projects.


In the 1000G papers, which appeared in Nature, the consortium described its SNV and structural variant findings based on the phase 3 dataset.
Citing the abstract, they have analyzed "2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries."

You can find the papers here:
A global reference for human genetic variation (Nature, 2015)

An integrated map of structural variation in 2,504 human genomes (Nature, 2015)



The UK10K consortium also published a detailed description of human genetic variability based on around 10,000 samples, partly low-coverage WGS of control samples and partly deep-coverage WES focused on various complex and rare diseases.
Citing the abstract, "Here we describe insights from sequencing whole genomes (low read depth, 7×) or exomes (high read depth, 80×) of nearly 10,000 individuals from population-based and disease collections. In extensively phenotyped cohorts we characterize over 24 million novel sequence variants, generate a highly accurate imputation reference panel and identify novel alleles associated with levels of triglycerides (APOB), adiponectin (ADIPOQ) and low-density lipoprotein cholesterol (LDLR and RGAG1) from single-marker and rare variant aggregation tests. We describe population structure and functional annotation of rare and low-frequency variants, use the data to estimate the benefits of sequencing for association studies, and summarize lessons from disease-specific collections."
They also published an improved haplotype reference panel that can be used to improve imputation of low-frequency and rare variants, and developed online tools to explore their association results.
The third paper is a first example of the disease-oriented results obtained by the consortium: they identified EN1 as a gene involved in reduced bone density and recurrent fracture.

You can find the papers here:
The UK10K project identifies rare variants in health and disease (Nature, 2015)

Improved imputation of low-frequency and rare variants using the UK10K haplotype reference panel (Nature Communications, 2015)

Monday, 28 September 2015

PubMed Highlights: In a recent special issue Science discusses recent advances in human genomics, their impact on human health and future perspectives

Science Magazine has just published a special issue on the recent advances in genomics, as well as the promise and challenges of the new technologies for human health. The special issue is composed of a set of reviews discussing the main topics in the field:

  • the use of new sequencing technologies for the identification of genetic mutations causing human disease and the improvements in our ability to interpret the consequences of such mutations
  • the state of the art in somatic variant detection and the impact of the new technology on cancer research
  • the genetics and genomics of psychiatric diseases, reporting the difficulties in understanding the interplay of inherited genetics and spontaneous mutations in complex diseases
  • the impact on mitochondrial disease research


In addition, there are some commentaries on the ethical and privacy-related challenges emerging from the wide adoption of the new genomics technologies and their use for preimplantation screens and human health care. Finally, there is a report on the recent effort from NIH to deliver personalized medicine.

A must read issue!

Tuesday, 2 September 2014

PubMed Highlight: New release of ENCODE and modENCODE

Five papers that summarize the latest data from the ENCODE and modENCODE consortia have recently been published in Nature. Together, the publications add more than 1,600 new data sets, bringing the total number of data sets from ENCODE and modENCODE to around 3,300.

The growth of ENCODE and modENCODE data sets.

The authors analyzed RNA-Seq data produced in the three species, and an extensive effort was conducted in Drosophila to investigate genes expressed only in specific tissues or developmental stages, or only after specific perturbations. The analysis also identified many new candidate long non-coding RNAs, including ones that overlap with previously defined mutations that have been associated with developmental defects.
Other data sets derive from chromatin binding assays focused on transcription-regulatory factors in human cell lines, Drosophila and C. elegans, and from studies of DNA accessibility and certain modifications to histone proteins. These new chromatin data sets led to the identification of several features common to the three species, such as shared histone-modification patterns around genes and regulatory regions.
The new transcriptome data sets will result in more precise gene annotations in all three species, which should be released soon. Access to the data on chromatin features, regulatory-factor binding sites and regulatory-element predictions seems more difficult: we have to wait for them to be integrated into user-friendly portals for data visualization and flexible analyses. The UCSC Genome Browser, Ensembl and the ENCODE consortium are all working to provide a solution.

Meanwhile, take a look at the papers:
Diversity and dynamics of the Drosophila transcriptome

Regulatory analysis of the C. elegans genome with spatiotemporal resolution

Comparative analysis of metazoan chromatin organization

Friday, 4 July 2014

PubMed highlight: Literome helps you find relevant papers in the "genomic" literature

This tool mines the "genomic" literature for your gene of interest and reports a list of interactions with other genes, also specifying the kind of relation (inhibit, activate, regulate...). It can also search for a SNP and find the phenotypes associated with it by GWAS. You can then filter the results and also report whether the listed interactions are actually real or not.

Good stuff to quickly identify relevant papers in the large amount of genomic research!

Literome: PubMed-scale genomic knowledge base in the cloud

Hoifung Poon, Chris Quirk, Charlie DeZiel and David Heckerman

Abstract
Motivation: Advances in sequencing technology have led to an exponential growth of genomics data, yet it remains a formidable challenge to interpret such data for identifying disease genes and drug targets. There has been increasing interest in adopting a systems approach that incorporates prior knowledge such as gene networks and genotype–phenotype associations. The majority of such knowledge resides in text such as journal publications, which has been undergoing its own exponential growth. It has thus become a significant bottleneck to identify relevant knowledge for genomic interpretation as well as to keep up with new genomics findings.
Results: In the Literome project, we have developed an automatic curation system to extract genomic knowledge from PubMed articles and made this knowledge available in the cloud with a Web site to facilitate browsing, searching and reasoning. Currently, Literome focuses on two types of knowledge most pertinent to genomic medicine: directed genic interactions such as pathways and genotype–phenotype associations. Users can search for interacting genes and the nature of the interactions, as well as diseases and drugs associated with a single nucleotide polymorphism or gene. Users can also search for indirect connections between two entities, e.g. a gene and a disease might be linked because an interacting gene is associated with a related disease.

Availability and implementation: Literome is freely available at literome.azurewebsites.net. Download for non-commercial use is available via Web services.

Monday, 23 June 2014

PubMed Highlight: Complete review of computational biology free courses

This paper is a great resource for anyone looking to get started in computational biology, or just looking for insight into specific topics ranging from natural language processing to evolutionary theory. The author describes hundreds of video courses that are foundational to a good understanding of computational biology and bioinformatics. The table of contents breaks the curriculum down into 11 "departments" with links to online courses in each subject area:
  • Mathematics Department
  • Computer Science Department
  • Data Science Department
  • Chemistry Department
  • Biology Department
  • Computational Biology Department
  • Evolutionary Biology Department
  • Systems Biology Department
  • Neurosciences Department
  • Translational Sciences Department
  • Humanities Department

Listings in the catalog can take one of three forms: Courses, Current Topics, or Seminars. All listed courses are video-based and free of charge. The author has tested most of the courses, having enrolled in up to a dozen at a time, and he shares his experience in this paper, so you can find commentary on the importance of each subject and an opinion on the quality of instruction. For the courses that the author completed, listings have an "evaluation" section, which ranks the course in difficulty, time requirements, lecture/homework effectiveness, assessment quality, and overall opinions. Finally, there are also autobiographical annotations reporting why the courses have proved useful in a bioinformatics career.

Don't miss this!

PubMed Highlight: VarMod, modelling the functional effects of non-synonymous variants

In Nucleic Acids Research, authors from the University of Kent published the VarMod tool. By incorporating protein sequence and structural feature cues into the non-synonymous variant analysis, their Variant Modeller method provides clues to understanding genotype effects on phenotype, the study authors note. Their proof-of-principle analysis of nearly 3,000 such variants suggests VarMod predicts protein function and structural effects with accuracy that's on par with that offered by the PolyPhen-2 tool.


Abstract
Unravelling the genotype–phenotype relationship in humans remains a challenging task in genomics studies. Recent advances in sequencing technologies mean there are now thousands of sequenced human genomes, revealing millions of single nucleotide variants (SNVs). For non-synonymous SNVs present in proteins the difficulties of the problem lie in first identifying those nsSNVs that result in a functional change in the protein among the many non-functional variants and in turn linking this functional change to phenotype. Here we present VarMod (Variant Modeller) a method that utilises both protein sequence and structural features to predict nsSNVs that alter protein function. VarMod develops recent observations that functional nsSNVs are enriched at protein–protein interfaces and protein–ligand binding sites and uses these characteristics to make predictions. In benchmarking on a set of nearly 3000 nsSNVs VarMod performance is comparable to an existing state of the art method. The VarMod web server provides extensive resources to investigate the sequence and structural features associated with the predictions including visualisation of protein models and complexes via an interactive JSmol molecular viewer. VarMod is available for use at http://www.wasslab.org/varmod.

Insects, sheep, polar bears, crow, beans and eucalyptus...all the genomes you want!

I'm always amazed by the explosion of new species genomes since the introduction of NGS. In the last two years the sequencing and assembly of genomes from various animals and plants has accelerated even more and has also focused on "exotic" species, so much so that we now have almost a new genome per month! All these data can tell us a lot about the basic mechanisms of evolution and provide information to study how complex biological processes have developed and why they act the way we see now. Moreover, many species have peculiar properties and produce biopeptides or other biological molecules that could be useful for life science and medicine.
So, here is a quick update of what has been published in the last months!

The amazing spiderman: Social velvet and tarantula genomes to study silk and venom
Authors from BGI-Shenzhen and Aarhus University reported in Nature Communications the assembly of the full genomes of the social velvet spider and the tarantula. Besides the genome sequencing and analysis, the authors also performed transcriptome sequencing and proteomic analysis by mass spectrometry. A de novo assembly of the velvet spider (S. mimosarum) was generated from 91 × coverage sequencing of paired end and mate pair libraries and assembled into contigs and scaffolds spanning 2.55 Gb. By also integrating transcriptome data, the authors reconstructed a gene set of 27,235 protein-coding gene models. Approximately 400 gene models had no homology to known proteins but were supported by proteomic evidence, identifying putative ‘spider’-specific proteins. Unlike other arthropod genomes, the velvet spider genome is characterized by an intron-exon structure very similar to that of the human genome. The tarantula genome, whose size is estimated at about 6 Gb, was sequenced at 40 × coverage from a single female A. geniculata using a similar combination of paired end and mate pair libraries as for the velvet spider. The authors sequenced proteins from different spider tissues (venom, thorax, abdomen, haemolymph and silk), identifying 120 proteins in venom, 15 proteins in silk and 2,122 proteins from body fluid and tissue samples, for a total of 2,193 tarantula proteins. Introns were found to be longer than those of the velvet spider.
By combining three different omics approaches, the paper reconstructed species-specific gene duplications and the set of peculiar proteins involved in spider silk and venom. The analysis revealed an enrichment in cysteine-rich peptides with neurotoxic effects and in proteases that specifically activate protoxins in spider venom.

Stick insects: a large sequencing effort to study evolution and speciation
In this paper published in Science (it appears on the magazine cover), the authors performed whole genome sequencing on several subjects from different populations of stick insects to investigate the role and mechanism of action of selection in adaptation and parallel speciation. The researchers performed a parallel experiment, moving four groups of individuals from the original population from their natural host plant to a new one. They sequenced them and their first offspring generation and analyzed genomic variations and their role in adaptation to the new environment. Comparing genomic changes in the four groups allows the analysis of parallel speciation and the genomic mechanisms behind it.

Polar bear genome: population genomics to dissect adaptation to extreme environments
In this paper from Cell, the authors reconstructed a draft assembly of the polar bear genome and then analyzed 89 complete genomes of polar bears and brown bears using population genomic modeling. Results show that the species diverged 479–343 thousand years ago and that the polar bear lineage has been under stronger positive selection than the brown bear lineage. Several genes specifically selected in polar bears are associated with cardiomyopathy and vascular disease, implying an important reorganization of the cardiovascular system. Another group of genes showing strong evidence of selection are those related to lipid metabolism, transport and storage, like APOB. Functional mutations in this gene may explain how polar bears are able to cope with life-long elevated LDL levels.

Sheep genome: now all the major livestock animals have their genome sequence
Researchers from the International Sheep Genomics Consortium published in Science the first complete assembly of the sheep genome. The team built an assembly that spans 2.61 billion bases of the sheep genome to an average depth of around 150-fold. That assembly covers around 99 percent of the sheep's 26 autosomal chromosomes and X chromosome. In addition to the high-quality reference genome, the team generated transcriptome sequences representing 40 sheep tissues, which contributed to its subsequent analysis of sheep features. Like cattle, sheep are known for feeding on plants and deriving useful proteins from lignocellulose-laden material with the help of fermentation and microbes in the rumen. Specialized features of the sheep metabolism go to work on the volatile fatty acids that gut bugs produce during that process, and other adaptations in fatty acid metabolism seem to feed into the production of wool fibers, which contain lanolin formed from waxy ester molecules. By adding in transcript sequence data for almost 100 samples taken from 40 sheep tissue types, the researchers looked at the protein-coding genes present in the sheep genome and their relationship to those found in 11 other ruminant and non-ruminant mammals.

Two Crow species: genomes reveal what makes them look different
Researchers published in Science a genomic study on two crow species, the all-black carrion crow and the gray-coated hooded crow, and found that a very small percentage of the birds' genes are responsible for their different looks. The researchers started by assembling a high-quality reference genome for the hooded crow species C. cornix. The 16.4-million-base assembly (covered to an average depth of 152-fold) contained nearly 20,800 predicted protein-coding genes. The team then resequenced the genomes of 60 hooded or carrion crows at average depths of between 7.1- and 28.6-fold apiece, identifying more than 5.27 million SNPs shared between the two species and more than 8.4 million SNPs in total. Comparison of the two species' genomes revealed that varied expression of less than 0.28 percent of the entire genome was enough to maintain different coloration between the two species. This particular 1.95 megabase pair-long area of the genome is located on avian chromosome 18, and it harbors genes associated with pigment coloration, visual perception, and hormonal balance. Together, the team's findings hint that distinctive physical features are maintained in hooded and carrion crow species despite gene flow across all but a fraction of the genome.

Eucalyptus genome: tandem duplications and essential oils encoded in the DNA
An international team published in Nature a reference genome for the eucalyptus tree. The researchers used whole-genome Sanger sequencing to build the genome assembly of an E. grandis representative belonging to the BRASUZI genotype. Using those sequences, together with bacterial artificial chromosome sequences and a genetic linkage map, the team covered more than 94 percent of the plant's predicted 640 million base sequence at an average depth of nearly seven-fold. To facilitate transcript identification, they added RNA sequences representing different eucalyptus tissue types and developmental stages and reconstructed 36,376 predicted protein-coding eucalyptus genes. The genomes of a sub-tropical representative from E. grandis BRASUZI and a temperate eucalyptus species called E. globulus were re-sequenced with Illumina instruments. Comparison of the different genomes revealed that eucalyptus displays the greatest number of tandem duplications of any plant genome sequenced so far, and that the duplications appear to have prioritized genes for wood formation. The plant also has the highest diversity of genes for producing various essential oils.

Common Bean genome: genomic effects of plant domestication
The reference genome for the common bean, Phaseolus vulgaris L., was recently published in Nature Genetics. The authors used a whole-genome shotgun sequencing strategy combining linear libraries and paired libraries of varying insert sizes, sequenced with the Roche 454 platform. To these data they added 24.1 Gb of Illumina-sequenced fragment libraries and sequences from fosmid and BAC libraries obtained with the canonical Sanger platform, for a total assembled sequence coverage level of 21.0X. The final assembly covers 473 Mb of the 587-Mb genome, and 98% of this sequence is anchored in 11 chromosome-scale pseudomolecules. Using resequencing of 60 wild individuals and 100 landraces from genetically differentiated Mesoamerican and Andean gene pools, the authors performed a genome-wide analysis of dual domestications and confirmed two independent domestications from genetic pools that diverged before human colonization. They also identified a set of genes linked with increased leaf and seed size. These results identify regions of the genome that have undergone intense selection and thus provide targets for future crop improvement efforts.

Monday, 19 May 2014

Pubmed highlight: SNP detection tools comparison

Performance comparison of SNP detection tools with Illumina exome sequencing data: an assessment using both family pedigree information and sample-matched SNP array data.

Nucleic acids research. 2014 May 15. pii: gku392

Abstract

To apply exome-seq-derived variants in the clinical setting, there is an urgent need to identify the best variant caller(s) from a large collection of available options. We have used an Illumina exome-seq dataset as a benchmark, with two validation scenarios (family pedigree information and SNP array data for the same samples), permitting global high-throughput cross-validation, to evaluate the quality of SNP calls derived from several popular variant discovery tools from both the open-source and commercial communities using a set of designated quality metrics. To the best of our knowledge, this is the first large-scale performance comparison of exome-seq variant discovery tools using high-throughput validation with both Mendelian inheritance checking and SNP array data, which allows us to gain insights into the accuracy of SNP calling through such high-throughput validation in an unprecedented way, whereas the previously reported comparison studies have only assessed concordance of these tools without directly assessing the quality of the derived SNPs. More importantly, the main purpose of our study was to establish a reusable procedure that applies high-throughput validation to compare the quality of SNP discovery tools with a focus on exome-seq, which can be used to compare any forthcoming tool(s) of interest.
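
To give an idea of what Mendelian inheritance checking means in practice, here is a minimal Python sketch. It is not the authors' pipeline: the genotype representation (a tuple of two allele strings) and the function names are assumptions made for illustration, and it only covers diploid autosomal sites where all three family members have a called genotype.

```python
from itertools import product


def is_mendelian_consistent(child, mother, father):
    """Return True if the child's genotype can be built from one maternal
    and one paternal allele. Genotypes are tuples of two alleles, e.g. ("A", "G").
    This representation is an assumption of this sketch."""
    child_sorted = tuple(sorted(child))
    return any(
        tuple(sorted((m, f))) == child_sorted
        for m, f in product(mother, father)
    )


def mendelian_error_rate(trio_calls):
    """Fraction of sites whose trio genotypes violate Mendelian inheritance.
    trio_calls is an iterable of (child, mother, father) genotype tuples."""
    trio_calls = list(trio_calls)
    if not trio_calls:
        return 0.0
    errors = sum(
        not is_mendelian_consistent(c, m, f) for c, m, f in trio_calls
    )
    return errors / len(trio_calls)


# Hypothetical calls from one variant caller at two sites of a trio.
calls = [
    (("A", "G"), ("A", "A"), ("G", "G")),  # consistent
    (("G", "G"), ("A", "A"), ("A", "G")),  # inconsistent: mother has no G
]
print(mendelian_error_rate(calls))
```

A lower error rate across many sites would suggest more reliable calls, although a real comparison would also have to account for missing genotypes, sex chromosomes and true de novo mutations.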

Wednesday, 30 April 2014

Pubmed highlight: Comparison of mapping algorithms

Mapping millions of reads to a reference genome sequence is in most cases the first step in NGS data analysis. Proper mapping is essential for downstream variant identification and for assessing the quality of each sequenced base. Various tools exist to perform this task, and this paper presents an interesting new tool to benchmark results from the different aligners. The authors developed CuReSim, a tool able to generate simulated read datasets resembling different NGS technologies, and CuReSimEval, which evaluates the performance of a given aligner on the simulated dataset.
In the paper they apply this new method to compare the performance of some popular aligners (like BWA, TMAP and BowTie) working on Ion Torrent data. "The application of this procedure to Ion Torrent data from the whole genome sequencing of small genomes [...] demonstrate that it is helpful for selecting a mapper based on the intended application, questions to be addressed, and the technology used," they report in the abstract.

Comparison of mapping algorithms used in high-throughput sequencing: application to Ion Torrent data.
Caboche S, Audebert C, Lemoine Y, Hot D

Abstract
BACKGROUND: The rapid evolution in high-throughput sequencing (HTS) technologies has opened up new perspectives in several research fields and led to the production of large volumes of sequence data. A fundamental step in HTS data analysis is the mapping of reads onto reference sequences. Choosing a suitable mapper for a given technology and a given application is a subtle task because of the difficulty of evaluating mapping algorithms.
RESULTS: In this paper, we present a benchmark procedure to compare mapping algorithms used in HTS using both real and simulated datasets and considering four evaluation criteria: computational resource and time requirements, robustness of mapping, ability to report positions for reads in repetitive regions, and ability to retrieve true genetic variation positions. To measure robustness, we introduced a new definition for a correctly mapped read taking into account not only the expected start position of the read but also the end position and the number of indels and substitutions. We developed CuReSim, a new read simulator, that is able to generate customized benchmark data for any kind of HTS technology by adjusting parameters to the error types. CuReSim and CuReSimEval, a tool to evaluate the mapping quality of the CuReSim simulated reads, are freely available. We applied our benchmark procedure to evaluate 14 mappers in the context of whole genome sequencing of small genomes with Ion Torrent data for which such a comparison has not yet been established.
CONCLUSIONS: A benchmark procedure to compare HTS data mappers is introduced with a new definition for the mapping correctness as well as tools to generate simulated reads and evaluate mapping quality. The application of this procedure to Ion Torrent data from the whole genome sequencing of small genomes has allowed us to validate our benchmark procedure and demonstrate that it is helpful for selecting a mapper based on the intended application, questions to be addressed, and the technology used. This benchmark procedure can be used to evaluate existing or in-development mappers as well as to optimize parameters of a chosen mapper for any application and any sequencing platform.
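
The abstract's definition of a correctly mapped read (start position, end position, and number of indels and substitutions all taken into account) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Alignment` class, the function name and the two tolerance parameters are hypothetical and are not taken from CuReSimEval, whose actual criteria and thresholds may differ.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Summary of where a read maps and how it differs from the reference.
    Field names are assumptions of this sketch."""
    start: int
    end: int
    indels: int
    substitutions: int


def is_correctly_mapped(observed, expected, pos_tolerance=0, edit_tolerance=0):
    """Compare a mapper's alignment with the simulated (true) alignment.
    A read counts as correct only if start, end, indel count and
    substitution count all agree within the given tolerances."""
    return (
        abs(observed.start - expected.start) <= pos_tolerance
        and abs(observed.end - expected.end) <= pos_tolerance
        and abs(observed.indels - expected.indels) <= edit_tolerance
        and abs(observed.substitutions - expected.substitutions) <= edit_tolerance
    )


# Hypothetical simulated read and two candidate mapper outputs.
truth = Alignment(start=1000, end=1199, indels=1, substitutions=2)
good = Alignment(start=1000, end=1199, indels=1, substitutions=2)
clipped = Alignment(start=1000, end=1150, indels=0, substitutions=2)

print(is_correctly_mapped(good, truth))
print(is_correctly_mapped(clipped, truth))
```

Checking only the start position would accept the second alignment even though its end position and indel count do not match the simulated read, which is the kind of case the stricter definition is meant to catch.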