PubMed Highlight: New release of ENCODE and modENCODE

Five papers summarizing the latest data from the ENCODE and modENCODE consortia have recently been published in Nature. Together, the publications add more than 1,600 new data sets, bringing the total number of data sets from ENCODE and modENCODE to around 3,300.

The growth of ENCODE and modENCODE data sets.

The authors analyze RNA-Seq data produced in the three species (human, Drosophila and C. elegans). An extensive effort was conducted in Drosophila to investigate genes expressed only in specific tissues, at specific developmental stages, or only after specific perturbations. The analysis also identified many new candidate long non-coding RNAs, including ones that overlap with previously defined mutations associated with developmental defects.
Other data sets derive from chromatin-binding assays focused on transcription-regulatory factors in human cell lines, Drosophila and C. elegans, and from studies of DNA accessibility and certain modifications to histone proteins. These new chromatin data sets led to the identification of several features common to the three species, such as shared histone-modification patterns around genes and regulatory regions.
The new transcriptome data sets will result in more precise gene annotations in all three species, which should be released soon. Access to the data on chromatin features, regulatory-factor binding sites and regulatory-element predictions seems more difficult: we will have to wait for them to be integrated into user-friendly portals for data visualization and flexible analysis. The UCSC Genome Browser, Ensembl and the ENCODE consortium are all working to provide a solution.

