I'm always amazed by the explosion of new species genomes since the introduction of NGS. In the last two years, the sequencing and assembly of animal and plant genomes has accelerated even further and has also turned to "exotic" species, so much so that we now get almost one new genome per month! All these data can tell us a lot about the basic mechanisms of evolution and help us study how complex biological processes developed and why they work the way they do today. Moreover, many species have peculiar properties and produce biopeptides or other biological molecules that could be useful for life science and medicine.
So, here is a quick update on what has been published in recent months!
The amazing spiderman: Social velvet and tarantula genomes to study silk and venom
Authors from BGI-Shenzhen and Aarhus University reported in Nature Communications the assembly of the full genomes of the social velvet spider and of a tarantula. Besides genome sequencing and analysis, the authors also performed transcriptome sequencing and proteomic analysis by mass spectrometry. A de novo assembly of the velvet spider (S. mimosarum) was generated from 91× coverage sequencing of paired-end and mate-pair libraries and assembled into contigs and scaffolds spanning 2.55 Gb. By also integrating transcriptome data, the authors reconstructed a set of 27,235 protein-coding gene models. Approximately 400 gene models had no homology to known proteins but were supported by proteomic evidence, identifying putative ‘spider’-specific proteins. Unlike other arthropod genomes, the velvet spider genome has an exon-intron structure very similar to that of the human genome. The tarantula genome is estimated at about 6 Gb and was sequenced at 40× coverage from a single female A. geniculata, using a combination of paired-end and mate-pair libraries similar to the one used for the velvet spider. The authors sequenced proteins from different spider tissues (venom, thorax, abdomen, haemolymph and silk), identifying 120 proteins in venom, 15 in silk and 2,122 in body fluid and tissue samples, for a total of 2,193 distinct tarantula proteins. Tarantula introns were found to be longer than those of the velvet spider.
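The coverage figures above follow the usual definition of average sequencing depth, C = N × L / G, for N reads of length L over a genome of size G. Here is a minimal Python sketch of that arithmetic. It assumes the 2.55 Gb assembly span and the ~6 Gb size estimate stand in for the true genome sizes. The function names are illustrative and do not come from the paper.

```python
def raw_bases(genome_size_bp: float, coverage: float) -> float:
    """Total sequenced bases implied by an average depth: N * L = C * G."""
    return coverage * genome_size_bp


def average_depth(n_reads: int, read_length: int, genome_size_bp: float) -> float:
    """Average depth C = N * L / G for N reads of length L over a genome of size G."""
    return n_reads * read_length / genome_size_bp


# Figures from the text; the assembly span / size estimate is used as the
# genome size, which is an assumption rather than the authors' exact input.
spiders = {
    "velvet spider (S. mimosarum)": (2.55e9, 91),
    "tarantula (A. geniculata)": (6e9, 40),
}
for name, (genome_bp, depth) in spiders.items():
    print(f"{name}: ~{raw_bases(genome_bp, depth) / 1e9:.0f} Gb of raw sequence")
```

This works out to roughly 232 Gb of raw sequence for the velvet spider and 240 Gb for the tarantula. The tarantula needs slightly more raw sequence despite its lower depth because its genome is more than twice as large.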
By combining three different omics approaches, the paper reconstructed species-specific gene duplications and the set of peculiar proteins involved in spider silk and venom. The analysis revealed an enrichment of cysteine-rich peptides with neurotoxic effects and of proteases that specifically activate protoxins in spider venom.
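The tarantula protein counts deserve a quick sanity check. 120 + 15 + 2,122 = 2,257, which is more than the 2,193 total, so some proteins must have been detected in more than one sample type. The sketch below bounds how many proteins are shared. It assumes the three counts are per-sample counts of distinct proteins and that 2,193 is their union. The function is hypothetical and is not part of the paper's analysis.

```python
import math
from typing import List, Tuple


def shared_bounds(group_counts: List[int], union_size: int) -> Tuple[int, int]:
    """Bounds on how many distinct proteins were detected in more than one group.

    A protein seen in k groups is counted k times in sum(group_counts) but once
    in the union, so surplus = sum - union equals the sum of (k - 1) over shared
    proteins. With g groups, each shared protein contributes between 1 and g - 1.
    """
    surplus = sum(group_counts) - union_size
    if surplus < 0:
        raise ValueError("the union cannot be larger than the sum of the groups")
    if surplus == 0:
        return (0, 0)
    if len(group_counts) < 2:
        raise ValueError("a surplus with a single group is inconsistent")
    lower = math.ceil(surplus / (len(group_counts) - 1))  # if each shared protein is in all groups
    upper = surplus  # if each shared protein is in exactly two groups
    return (lower, upper)


# Tarantula counts from the text: venom, silk, body fluid/tissue; 2,193 total.
print(shared_bounds([120, 15, 2122], 2193))
```

Under these assumptions the surplus is 64. Somewhere between 32 and 64 proteins were therefore seen in more than one sample type. These are loose bounds because they ignore the sizes of the individual groups.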
Stick insects: a large sequencing effort to study evolution and speciation
In this paper published in Science (it made the magazine's cover), the authors performed whole-genome sequencing on several individuals from different stick insect populations to investigate the role and mode of action of selection in adaptation and parallel speciation. The researchers ran a replicated experiment, moving four groups of individuals from the original population off their natural host plant and onto a new one. They sequenced these insects and their first generation of offspring, then analyzed genomic variation and its role in adaptation to the new environment. Comparing the genomic changes across the four groups allows analysis of parallel speciation and of the genomic mechanisms behind it.
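One simple way to look for the parallel changes described above is to ask which SNPs shifted in allele frequency in the same direction in all four transplanted groups. The following toy Python sketch uses made-up frequencies and an arbitrary 0.05 threshold. It is not the statistical method used in the paper, which would also need to separate selection from drift and sampling noise.

```python
from typing import Dict, List


def parallel_shifts(
    baseline: Dict[str, float],
    replicates: List[Dict[str, float]],
    min_shift: float = 0.05,
) -> List[str]:
    """Return SNP ids whose allele frequency moved in the same direction,
    by at least min_shift, in every replicate relative to the baseline."""
    hits = []
    for snp, p0 in baseline.items():
        deltas = [rep[snp] - p0 for rep in replicates]
        all_up = all(d >= min_shift for d in deltas)
        all_down = all(d <= -min_shift for d in deltas)
        if all_up or all_down:
            hits.append(snp)
    return hits


# Hypothetical frequencies in the source population and in four groups
# moved to a new host plant (made-up numbers for illustration only).
baseline = {"snp1": 0.50, "snp2": 0.30, "snp3": 0.70}
replicates = [
    {"snp1": 0.60, "snp2": 0.32, "snp3": 0.60},
    {"snp1": 0.58, "snp2": 0.25, "snp3": 0.62},
    {"snp1": 0.65, "snp2": 0.35, "snp3": 0.64},
    {"snp1": 0.57, "snp2": 0.28, "snp3": 0.61},
]
print(parallel_shifts(baseline, replicates))
```

In this toy data, snp1 rises and snp3 falls in every group, so both count as parallel shifts. snp2 moves in inconsistent directions and is left out.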
Polar bear genome: population genomics to dissect adaptation to extreme environments
In this paper from Cell, the authors reconstructed a draft assembly of the polar bear genome and then analyzed 89 complete genomes of polar bears and brown bears using population genomic modeling. The results show that the species diverged 479–343 thousand years ago and that the polar bear lineage has been under stronger positive selection than the brown bear lineage. Several genes specifically selected in polar bears are associated with cardiomyopathy and vascular disease, implying an important reorganization of the cardiovascular system. Another group of genes with strong evidence of selection is involved in lipid metabolism, transport and storage, such as APOB. Functional mutations in this gene may explain how polar bears cope with life-long elevated LDL levels.
Sheep genome: now all the major livestock animals have their genome sequence
Researchers from the International Sheep Genomics Consortium published in Science the first complete assembly of the sheep genome. The team built an assembly spanning 2.61 billion bases, sequenced to an average depth of around 150-fold. It covers around 99 percent of the sheep's 26 autosomes and the X chromosome. In addition to this high-quality reference genome, the team generated transcriptome sequences for almost 100 samples from 40 sheep tissue types, which supported the subsequent analysis of sheep-specific features. Like cattle, sheep are known for feeding on plants and deriving useful proteins from lignocellulose-laden material through microbial fermentation in the rumen. Specialized features of sheep metabolism act on the volatile fatty acids that gut bugs produce during this process. Other adaptations in fatty acid metabolism seem to feed into the production of wool fibers, which contain lanolin formed from waxy ester molecules. Using the transcript data, the researchers examined the protein-coding genes in the sheep genome and their relationship to those of 11 other ruminant and non-ruminant mammals.
Two crow species: genomes reveal what makes them look different
Researchers published in Science a genomic study of two crow species, the all-black carrion crow and the gray-coated hooded crow. They found that a very small fraction of the birds' genome is responsible for their different looks. The researchers started by assembling a high-quality reference genome for the hooded crow, C. cornix. The 16.4-million-base assembly, covered to an average depth of 152-fold, contained nearly 20,800 predicted protein-coding genes. The team then resequenced the genomes of 60 hooded or carrion crows at average depths of between 7.1- and 28.6-fold apiece. This identified more than 5.27 million SNPs shared between the two species and more than 8.4 million SNPs in total. Comparison of the two species' genomes revealed that differences in less than 0.28 percent of the entire genome were enough to maintain the different coloration of the two species. This particular 1.95-megabase region is located on avian chromosome 18 and harbors genes associated with pigmentation, visual perception and hormonal balance. Together, the team's findings hint that the distinctive physical features of hooded and carrion crows are maintained despite gene flow across all but a fraction of the genome.
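The resequencing depths quoted above (7.1- to 28.6-fold) can be translated into expected genome coverage with the classic Lander-Waterman idealization. In that model, per-base read depth follows a Poisson distribution whose mean equals the average depth. The sketch below assumes this idealized model and an arbitrary minimum of 3 reads per base. Real data are more uneven, so these figures are optimistic.

```python
import math


def fraction_at_least(depth: float, min_reads: int) -> float:
    """Expected fraction of bases with >= min_reads reads, assuming per-base
    depth is Poisson-distributed with mean `depth` (Lander-Waterman idealization)."""
    p_below = sum(
        math.exp(-depth) * depth**i / math.factorial(i) for i in range(min_reads)
    )
    return 1.0 - p_below


# Lowest and highest average resequencing depths quoted for the crows.
for depth in (7.1, 28.6):
    uncovered = math.exp(-depth)  # Poisson probability that a base gets zero reads
    print(f"{depth}-fold: uncovered {uncovered:.2e}, "
          f">= 3 reads {fraction_at_least(depth, 3):.3f}")
```

Under this model, at 7.1-fold roughly 0.08 percent of bases would get no reads at all and about 97 percent would reach 3 reads. At 28.6-fold, essentially every base would.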
Eucalyptus genome: tandem duplications and essential oils encoded in the DNA
An international team published in Nature a reference genome for the eucalyptus tree. The researchers used whole-genome Sanger sequencing to build the genome assembly of an E. grandis individual belonging to the BRASUZ1 genotype. Using those sequences, together with bacterial artificial chromosome (BAC) sequences and a genetic linkage map, the team covered more than 94 percent of the plant's predicted 640-million-base genome at an average depth of nearly seven-fold. To help identify transcripts, they added RNA sequences from different eucalyptus tissue types and developmental stages and reconstructed 36,376 predicted protein-coding genes. The genomes of the subtropical E. grandis BRASUZ1 and of a temperate eucalyptus species, E. globulus, were resequenced with Illumina instruments. Comparison of the different genomes revealed that eucalyptus has the greatest number of tandem duplications of any plant genome sequenced so far, and that these duplications appear to have prioritized genes for wood formation. The plant also has the highest diversity of genes for producing essential oils.
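A tandem duplication leaves copies of a gene next to each other on the same chromosome, which is why tandem arrays can be spotted by scanning gene order. Here is a minimal Python sketch. It assumes genes have already been assigned to families, whereas real pipelines infer families from sequence similarity and usually tolerate a few intervening genes. The coordinates are made up, and the family labels (TPS, MYB, CesA) are illustrative only.

```python
from itertools import groupby
from typing import List, Tuple

# (chromosome, start position, gene family); families assumed pre-assigned.
Gene = Tuple[str, int, str]


def tandem_arrays(genes: List[Gene], min_copies: int = 2) -> List[List[Gene]]:
    """Return runs of adjacent genes on the same chromosome that belong to the
    same family (strict adjacency: no intervening genes allowed)."""
    ordered = sorted(genes, key=lambda g: (g[0], g[1]))
    arrays = []
    for _, run in groupby(ordered, key=lambda g: (g[0], g[2])):
        members = list(run)
        if len(members) >= min_copies:
            arrays.append(members)
    return arrays


# Made-up gene order for illustration only.
genes = [
    ("chr1", 100, "TPS"), ("chr1", 200, "TPS"), ("chr1", 300, "TPS"),
    ("chr1", 400, "CesA"),
    ("chr2", 50, "TPS"), ("chr2", 90, "MYB"), ("chr2", 120, "MYB"),
]
for array in tandem_arrays(genes):
    chrom, _, family = array[0]
    print(f"{chrom}: {len(array)} tandem {family} copies")
```

Sorting by position first matters. `itertools.groupby` only merges consecutive items, and that behavior is exactly the adjacency rule a tandem array needs.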
Common Bean genome: genomic effects of plant domestication
The reference genome for the common bean, Phaseolus vulgaris L., was recently published in Nature Genetics. The authors used a whole-genome shotgun strategy combining linear and paired libraries of varying insert sizes, sequenced on the Roche 454 platform. To these data they added 24.1 Gb of Illumina-sequenced fragment libraries plus fosmid and BAC library sequences obtained by conventional Sanger sequencing, for a total assembled sequence coverage of 21.0×. The final assembly covers 473 Mb of the 587-Mb genome, and 98% of this sequence is anchored in 11 chromosome-scale pseudomolecules. Using resequencing of 60 wild individuals and 100 landraces from the genetically differentiated Mesoamerican and Andean gene pools, the authors performed a genome-wide analysis of dual domestication. They confirmed two independent domestications from gene pools that diverged before human colonization. They also identified a set of genes linked with increased leaf and seed size. These results identify regions of the genome that have undergone intense selection and thus provide targets for future crop improvement efforts.
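The assembly figures above can be put on a common scale with a few lines of arithmetic. This minimal Python sketch assumes the 98% anchoring figure applies to the full 473 Mb of assembled sequence. The function name is illustrative.

```python
def assembly_summary(assembled_mb: float, genome_mb: float, anchored_frac: float) -> dict:
    """Share of the estimated genome captured by the assembly and by the
    chromosome-anchored portion of it."""
    anchored_mb = assembled_mb * anchored_frac
    return {
        "assembled_pct_of_genome": 100 * assembled_mb / genome_mb,
        "anchored_mb": anchored_mb,
        "anchored_pct_of_genome": 100 * anchored_mb / genome_mb,
    }


# Common bean figures from the text: 473 Mb assembled, 587 Mb genome, 98% anchored.
s = assembly_summary(473, 587, 0.98)
print({k: round(v, 1) for k, v in s.items()})
```

With these inputs, the assembly captures about 80.6% of the estimated genome. The anchored portion is about 463.5 Mb, or roughly 79% of the genome.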