A new assembly of the zebrafish (D. rerio) genome has recently been published in Nature, together with a description of the complete set of proteins encoded in the teleost genome and their relationship to human orthologs. In a second paper, the same research group describes a genome-wide effort to correlate genotype and phenotype for every protein-coding gene in the assembly.
The first paper describes the latest assembly of the D. rerio genome.
The first assembly of the zebrafish genome was made available in 2002 (Zv1). The Zebrafish Genome Project has now produced a much more detailed assembly, built from completely sequenced large-insert clones that were ordered and oriented using a high-resolution, high-density meiotic map. The new version is a major improvement: it covers the genome sequence more completely, helps resolve difficult assembly artifacts scattered along the fish genome, and allows better identification of the complete set of protein-coding genes (more than 26,000). The catalogs of small RNAs and repeat elements have also been updated, and some misannotated genes have been removed (mostly genes from other species that had been assigned to zebrafish in previous assemblies).
A complete and robust assembly of this teleost genome is essential for the research community, given the important role of zebrafish as an animal model for studying development and for characterizing the functional impact of mutations in disease genes.
The zebrafish reference genome sequence and its relationship to the human genome. Nature. 2013 Apr 25;496(7446):498-503.
Howe K, et al.
Wellcome Trust Sanger Institute
Abstract
Zebrafish have become a popular organism for the study of vertebrate gene function. The virtually transparent embryos of this species, and the ability to accelerate genetic studies by gene knockdown or overexpression, have led to the widespread use of zebrafish in the detailed investigation of vertebrate gene function and increasingly, the study of human genetic disease. However, for effective modelling of human genetic disease it is important to understand the extent to which zebrafish genes and gene structures are related to orthologous human genes. To examine this, we generated a high-quality sequence assembly of the zebrafish genome, made up of an overlapping set of completely sequenced large-insert clones that were ordered and oriented using a high-resolution high-density meiotic map. Detailed automatic and manual annotation provides evidence of more than 26,000 protein-coding genes, the largest gene set of any vertebrate so far sequenced. Comparison to the human reference genome shows that approximately 70% of human genes have at least one obvious zebrafish orthologue. In addition, the high quality of this genome assembly provides a clearer understanding of key genomic features such as a unique repeat content, a scarcity of pseudogenes, an enrichment of zebrafish-specific genes on chromosome 4 and chromosomal regions that influence sex determination.
The second paper describes a project for the genome-wide characterization of mutations in every protein-coding gene in D. rerio.
Along the same lines, the research group working on the D. rerio genome has published another interesting paper describing their ongoing project to identify and phenotype disruptive mutations in every zebrafish protein-coding gene, using high-throughput sequencing and efficient chemical mutagenesis. They have already identified potentially disruptive mutations in more than 38% of all known zebrafish protein-coding genes and developed a multi-allelic phenotyping scheme to assess the effects of each allele during embryogenesis. They have also analysed the phenotypic consequences of over 1,000 alleles, and all mutant alleles and data are available to the community for genotype-phenotype correlation studies.
A systematic genome-wide analysis of zebrafish protein-coding gene function. Nature. 2013 Apr 25;496(7446):494-7.
Kettleborough RN, Busch-Nentwich EM, Harvey SA, Dooley CM, de Bruijn E, van Eeden F, Sealy I, White RJ, Herd C, Nijman IJ, Fényes F, Mehroke S, Scahill C, Gibbons R, Wali N, Carruthers S, Hall A, Yen J, Cuppen E, Stemple DL.
Wellcome Trust Sanger Institute
Abstract
Since the publication of the human reference genome, the identities of specific genes associated with human diseases are being discovered at a rapid rate. A central problem is that the biological activity of these genes is often unclear. Detailed investigations in model vertebrate organisms, typically mice, have been essential for understanding the activities of many orthologues of these disease-associated genes. Although gene-targeting approaches and phenotype analysis have led to a detailed understanding of nearly 6,000 protein-coding genes, this number falls considerably short of the more than 22,000 mouse protein-coding genes. Similarly, in zebrafish genetics, one-by-one gene studies using positional cloning, insertional mutagenesis, antisense morpholino oligonucleotides, targeted re-sequencing, and zinc finger and TAL endonucleases have made substantial contributions to our understanding of the biological activity of vertebrate genes, but again the number of genes studied falls well short of the more than 26,000 zebrafish protein-coding genes. Importantly, for both mice and zebrafish, none of these strategies are particularly suited to the rapid generation of knockouts in thousands of genes and the assessment of their biological activity. Here we describe an active project that aims to identify and phenotype the disruptive mutations in every zebrafish protein-coding gene, using a well-annotated zebrafish reference genome sequence, high-throughput sequencing and efficient chemical mutagenesis. So far we have identified potentially disruptive mutations in more than 38% of all known zebrafish protein-coding genes. We have developed a multi-allelic phenotyping scheme to efficiently assess the effects of each allele during embryogenesis and have analysed the phenotypic consequences of over 1,000 alleles. All mutant alleles and data are available to the community and our phenotyping scheme is adaptable to phenotypic analysis beyond embryogenesis.