Dovetail Genomics’ Cichlid Genome Empowers Studies of the Origins of Species

Axel Meyer’s group at the University of Konstanz works to understand the fundamental evolutionary forces of adaptation and speciation. To do so, they study the highly diverse cichlid fish family. Dovetail’s improved genome assembly gives them a more contiguous reference with which to address evolutionary questions about the relationships among species.

Cichlids are exceptionally well suited to investigating the genomic bases of two key evolutionary processes: adaptation and speciation, the origin of new species. This family of freshwater fishes comprises thousands of species, making it one of the largest families of vertebrates. Cichlids have become textbook examples of the formation of so-called species flocks, in which tens or hundreds of closely related but phenotypically diverse species evolve extremely rapidly within single lakes. Cichlids are astonishingly diverse in body coloration and anatomical specializations, which allows them to exploit a wide variety of ecological niches. Many cichlid species diverged so recently that they can still be crossed with one another, which allows researchers to investigate the genetic basis of speciation and adaptation.

The ability to produce reference-quality genome assemblies for evolutionary and ecological model organisms opens the door to the investigation of the genetic and genomic basis of many evolutionary phenomena, including speciation and adaptation. A well-developed “genomic infrastructure” is an invaluable tool to enable genetic screens and genome-scale analyses such as gene expression analysis, genetic mapping, comparative genomics, and population genomics. It is for these reasons that Axel Meyer and his group at the University of Konstanz, in Germany, have endeavored to generate a reference-quality genome assembly for Amphilophus citrinellus, the Midas cichlid from Nicaragua.

The Meyer group’s initial assembly was generated from a combination of sequencing libraries with different insert sizes. Most of the sequences were obtained from libraries with short insert sizes (< 300 bp) in order to create overlapping reads after paired-end sequencing (Illumina 2×100 bp and 2×250 bp). To scaffold the contigs, the group used medium (3 to 6 kb, Illumina 2×100 bp) and long (~16 kb, Illumina 2×100 bp and Roche 454) insert size mate-pair libraries. The data were assembled de novo using the Broad Institute’s ALLPATHS-LG assembler. This initial assembly was 845 Mbp in length, included 6,643 contigs, and had a scaffold N50 of 1.21 Mbp. Unfortunately, the assembly was still too fragmented for the analyses the group wished to perform. It was this realization that led them to Dovetail.

To begin, Dovetail extracted high molecular weight DNA from samples provided by the Meyer group. This DNA was used to construct a Chicago™ library, which was sequenced to produce ~150 million 2×150 bp reads. This level of sequencing ultimately yielded ~115X physical coverage of the genome. After scaffolding with Dovetail’s HiRise™ software pipeline, the scaffold N50 improved from 1.2 Mbp to 3.2 Mbp, roughly a 2.7-fold increase. Similarly, the N90 improved from 147 kbp to 534 kbp, roughly a 3.6-fold increase.
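For readers unfamiliar with these contiguity statistics, the following is a minimal Python sketch of how N50 and N90 are conventionally computed from a list of scaffold lengths, together with the fold-change arithmetic for the figures quoted above. It is an illustration only, not the method used by HiRise or the Meyer group; the function name `nx` and the toy scaffold lengths are hypothetical.

```python
def nx(lengths, percent):
    """Return the Nx statistic for a list of sequence lengths.

    Nx is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least `percent` percent of
    the total assembly length (percent=50 gives N50, 90 gives N90).
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)  # longest first
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Integer comparison avoids floating-point rounding issues.
        if running * 100 >= percent * total:
            return length
    return ordered[-1]


# Hypothetical toy assembly (arbitrary units), NOT the cichlid data.
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = nx(toy_scaffolds, 50)
toy_n90 = nx(toy_scaffolds, 90)

# Fold improvements implied by the values reported in the text.
n50_fold = 3.2 / 1.2   # scaffold N50, Mbp after / Mbp before
n90_fold = 534 / 147   # scaffold N90, kbp after / kbp before

print(f"toy N50 = {toy_n50}, toy N90 = {toy_n90}")
print(f"N50 fold change = {n50_fold:.2f}, N90 fold change = {n90_fold:.2f}")
```

A higher N50 or N90 means that more of the genome sits in long scaffolds, which is why these values serve as shorthand for how fragmented an assembly is.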

Currently, the Meyer group is using the Dovetail assembly to map RAD (restriction-site associated DNA) markers for population genomic analysis. Subsequent analyses will aim to improve understanding of two fundamental pillars of evolution: adaptation and speciation.