Dovetail™ Omni-C™ Technology incorporates a sequence-independent endonuclease approach to chromatin fragmentation in the proximity ligation protocol. This provides the following key advantages over traditional Hi-C approaches, which rely on restriction enzyme (RE) fragmentation:
Genome-wide coverage that is free of restriction enzyme density bias
Shotgun-like coverage that captures SNPs, SVs and chromatin topology in a single NGS library
Support for sample inputs down to 1,000 cells
Compatible with hybrid capture approaches for targeted studies
By employing an endonuclease, Omni-C™ increases the genomic coverage of a proximity-ligation assay, improving the efficiency of each sequencing run by covering more of the genome and reducing the biases imposed by RE site density. As such, Omni-C™ generates libraries in which more of the genome is included in analyses, making the data more versatile and free of RE site bias.
Endonuclease-based molecular biology
The Omni-C™ process starts with endogenous chromatin being fixed in place (cross-linked) to create a stabilized nucleosome scaffold. A sequence-independent endonuclease digests the cross-linked chromatin in situ. Upon release of the digested chromatin from the cell, free ends are ligated in a two-step process that incorporates a biotinylated bridge between chromatin ends. The probability of two chromatin ends ligating together depends on their proximity to each other within the scaffold. Following ligation and cross-link reversal, ligation products are enriched and sequenced using Illumina paired-end chemistry.
The Omni-C™ Kit is currently validated for mammalian cells and tissues. The assay has a 2-day workflow from sample to sequencing-ready NGS library: sample prep and proximity ligation are completed on day 1, and library generation on day 2.
The assay incorporates three molecular biology-based quality control checks to enable identification of suboptimal libraries prior to sequencing. The 8-reaction Omni-C™ Kit provides the enzymes and buffers necessary to complete day 1. The Dovetail™ Library Module for Illumina, or another paired-end sequencing kit for Illumina of your choosing, is required for day 2. Omni-C™ data is compatible with a variety of open-source analysis and visualization tools.
Built-in assay quality control steps enable users to confirm library quality before sequencing.
Quality control is achieved using standard molecular biology tools
Provided QC Workbook streamlines sample quality control
Calculated Chromatin Digestion Efficiency (CDE) and Chromatin Digestion Index (CDI) confirm suitable chromatin digestion
Validation of Sample Input Types
Cells and tissues from humans and mice were used as inputs to validate Omni-C™ libraries. All libraries were sequenced to 20-40 million read pairs (2×150 bp) and processed through the Dovetail Genomics Omni-C™ QC pipeline.
Shown in the figure to the left, long-range cis read pairs were plotted as a percent of total cis reads in the library, and complexity was plotted as percent of unique molecules projected per 300 million read pairs.
The Omni-C™ Assay is highly flexible. The workflow supports a wide range of input amounts, from 1 million down to 1,000 cells and from 50 mg down to 5 mg of tissue. The assay is also compatible with standard targeted enrichment approaches, such as hybrid capture, thereby reducing the sequencing burden and increasing resolution around sites of interest.
Low Input Options
The quality of long-range information is preserved across the range of sample inputs tested. Grey circles represent the short-range intra-chromosomal cis interactions (< 1 kb), the blue circles represent the long-range intra-chromosomal cis interactions (> 1 kb), and the orange circles represent the inter-chromosomal trans reads (n = 2 or 3).
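The three read-pair classes in this legend can be sketched in a few lines of Python. This is a minimal illustration, not the Dovetail QC pipeline: the function names, the tuple layout of a read pair, and the treatment of pairs at exactly 1 kb (counted as short-range here) are assumptions. Only the 1 kb threshold and the "percent of total cis reads" metric come from the text.

```python
from collections import Counter

# Threshold taken from the figure legend (> 1 kb = long-range cis).
LONG_RANGE_MIN_BP = 1_000


def classify_pair(chrom1, pos1, chrom2, pos2, threshold=LONG_RANGE_MIN_BP):
    """Return 'trans', 'cis_long', or 'cis_short' for one read pair.

    Assumption: a pair separated by exactly `threshold` bp counts as short;
    the legend does not say which class it belongs to.
    """
    if chrom1 != chrom2:
        return "trans"
    return "cis_long" if abs(pos2 - pos1) > threshold else "cis_short"


def summarize(pairs):
    """Count each class and report long-range cis as a percent of all cis pairs."""
    counts = Counter(classify_pair(*pair) for pair in pairs)
    cis_total = counts["cis_short"] + counts["cis_long"]
    long_cis_pct = 100.0 * counts["cis_long"] / cis_total if cis_total else 0.0
    return counts, long_cis_pct


# Hypothetical read pairs: (chrom1, pos1, chrom2, pos2)
example_pairs = [
    ("chr1", 100, "chr1", 600),          # 500 bp apart: short-range cis
    ("chr1", 100, "chr1", 50_100),       # 50 kb apart: long-range cis
    ("chr2", 5_000, "chr2", 1_005_000),  # 1 Mb apart: long-range cis
    ("chr1", 100, "chr3", 200),          # different chromosomes: trans
]
counts, long_cis_pct = summarize(example_pairs)
```

In practice the pairs would come from an aligned and deduplicated library rather than a hand-written list.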
The Omni-C™ Assay provides superior coverage across the genome. The data exhibit a narrow per-base coverage histogram very similar to that achieved with a shotgun library. Typical Hi-C approaches, which rely on restriction enzyme fragmentation, miss a significant portion of the genome. This leads to wider histograms and portions of the genome with no coverage at all. Uneven coverage is also reflected in the pile-up of reads at RE sites.
The improved coverage inherent to the Omni-C™ Assay enables applications relying on coverage uniformity, such as:
Structural variant calling
Contact Map Resolution at a Fixed Sequencing Depth
Omni-C™ libraries were generated from cell and muscle inputs from both humans and mice. The resulting libraries were sequenced to 20-40 million read pairs (2×150 bp) and assessed with the Omni-C™ QC pipeline.
Omni-C libraries display shotgun-like coverage without the sequence bias inherent to RE-based proximity ligation methods.
Omni-C (blue lines) and RE-based Hi-C (orange – multi-RE; pink – one-RE) libraries were compared to a shotgun library (black dashed line) at 300 M read pairs. A. Histogram plot of per base coverage. B. Coverage at the RE sites GATC (DpnII, MboI, Sau3AI) and GANTC (HinfI) plotted separately as the average of the absolute value both upstream and downstream of RE sites.
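The RE-site analysis in panel B can be approximated as follows. This is a sketch under stated assumptions: the helper names and the toy sequence and coverage values are hypothetical, and the reading of "average of the absolute value both upstream and downstream" as "mean coverage at each absolute distance from a site" is an interpretation of the caption, not a documented method. Only the two recognition motifs, GATC and GANTC, come from the text.

```python
import re

# Recognition motifs named in the caption; N in GANTC matches any base.
RE_MOTIFS = {"GATC": "GATC", "GANTC": "GA[ACGT]TC"}


def find_sites(seq, pattern):
    """Return 0-based start positions of every motif match (lookahead allows overlaps)."""
    return [m.start() for m in re.finditer(f"(?={pattern})", seq.upper())]


def coverage_by_abs_distance(coverage, sites, max_dist):
    """Mean coverage at each absolute distance 0..max_dist from the given sites.

    Upstream and downstream positions at the same distance are pooled.
    """
    totals = [0.0] * (max_dist + 1)
    counts = [0] * (max_dist + 1)
    for site in sites:
        for dist in range(max_dist + 1):
            for pos in {site - dist, site + dist}:  # set avoids double-counting dist 0
                if 0 <= pos < len(coverage):
                    totals[dist] += coverage[pos]
                    counts[dist] += 1
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]


# Hypothetical 15 bp sequence with one GATC site and one GANTC site.
seq = "AAGATCTTGAGTCAA"
sites = {name: find_sites(seq, pattern) for name, pattern in RE_MOTIFS.items()}

# Hypothetical per-base coverage with a pile-up at the GATC site (index 2).
coverage = [1, 2, 9, 4, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
gatc_profile = coverage_by_abs_distance(coverage, sites["GATC"], max_dist=2)
```

A profile that peaks at distance 0 and falls off quickly would reflect the read pile-up at RE sites described above, while a flat profile would indicate shotgun-like coverage.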
In addition to uniform coverage, Omni-C™ technology delivers on conformation. In the first figure below, Omni-C™ libraries from GM12878 cells were sequenced to 1.77 billion read pairs and loops were called using HiCCUPS. The resulting loops were then compared to loops found by Rao et al. (2014) after sequencing GM12878 cells to 4.9 billion read pairs.
Contact matrices generated from Omni-C™ libraries are more complete than those from RE-based Hi-C libraries. Areas with low RE coverage result in blank vectors due to normalization, as seen in the Hi-C matrices, whereas the uniformity of Omni-C™ data generates a more complete contact matrix at these sites. Below, we highlight three regions where Omni-C™ reveals topological features interrupted in RE-based Hi-C.
Loop calling with Omni-C™ data from GM12878 via HiCCUPS detected 16,628 chromatin loops, 5,062 of which overlap with those of Rao et al. (2014), despite roughly 3-fold less sequencing. The number of overlapping loop calls between Omni-C™ and Rao et al. is similar to that seen in other such comparisons.
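For reference, the two ratios implied by these figures can be worked out directly. The variable names are illustrative, and the numbers are the ones quoted in the text.

```python
# Figures quoted in the text
omni_c_loops = 16_628          # loops called from Omni-C data via HiCCUPS
overlapping_loops = 5_062      # of those, loops also reported by Rao et al. (2014)
omni_c_read_pairs = 1.77e9     # Omni-C sequencing depth
rao_read_pairs = 4.9e9         # Rao et al. (2014) sequencing depth

# Share of Omni-C loop calls that overlap the Rao et al. loop set
overlap_fraction = overlapping_loops / omni_c_loops

# How much deeper the Rao et al. data set was sequenced
depth_ratio = rao_read_pairs / omni_c_read_pairs

print(f"Overlap: {overlap_fraction:.1%} of Omni-C loop calls")
print(f"Depth ratio: {depth_ratio:.2f}x")
```

The overlap comes to about 30% of the Omni-C™ loop calls, and the depth ratio to about 2.8, which is the "3-fold" difference in sequencing cited above.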
Omni-C™ Libraries Generate More Complete Contact Matrices
Blank bands in the contact matrix occur in regions where coverage is too low, causing the contacts to be normalized with a zero value in the denominator during contact matrix balancing. Here, we present several examples in which these results could cause confounding interpretations from RE-based Hi-C data, and the subsequent improvements to these matrices by Omni-C™.
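The zero-denominator effect can be reproduced on a toy matrix. The sketch below uses a single square-root coverage normalization step as a stand-in, because the text does not name the balancing method used. The function name and the 4×4 matrix are hypothetical.

```python
import math


def sqrt_coverage_normalize(matrix):
    """Divide each contact count by sqrt(coverage_i * coverage_j).

    Coverage is the row sum of a symmetric contact matrix. Where either bin
    has zero coverage the denominator is zero, so the entry is undefined and
    is returned as NaN, which renders as a blank band in a heat map.
    """
    n = len(matrix)
    coverage = [sum(row) for row in matrix]
    normalized = []
    for i in range(n):
        row = []
        for j in range(n):
            denom = math.sqrt(coverage[i] * coverage[j])
            row.append(matrix[i][j] / denom if denom > 0 else math.nan)
        normalized.append(row)
    return normalized


# Hypothetical symmetric contact matrix; bin 2 has no coverage,
# as might happen in a region devoid of RE sites.
raw = [
    [4, 2, 0, 2],
    [2, 4, 0, 2],
    [0, 0, 0, 0],
    [2, 2, 0, 4],
]
norm = sqrt_coverage_normalize(raw)
blank_bins = [i for i, row in enumerate(norm) if all(math.isnan(v) for v in row)]
```

Here bin 2 becomes an all-NaN row and column, which is the blank band. Real balancing tools typically also mask bins whose coverage is low but not zero, and that threshold is a tool-specific setting not covered here.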
(Top left) Chr5 at a cancer susceptibility locus that is often over-expressed in lung cancer. (Top middle) An ~5 Mbp region in Chr9 that encompasses the TRAF2 gene, which plays a key role in apoptotic signaling. (Top right) Chr17, containing a suspected oncogene, FASN, which is often over-expressed in breast cancer. (Bottom left and right) Zoomed-in sections of Chr5 that encompass TERT, which is vital to telomere maintenance and often over-expressed in lung cancers, at 4 kbp and 1 kbp resolutions, respectively.
The blue boxes below these comparisons denote 3 kbp regions devoid of RE sites, and the black and red arrows are gene tracks.
A staple application of proximity ligation data is scaffolding contigs for genome assembly. The ability of Hi-C data to scaffold correctly depends on the RE site density captured within each assembled contig. An analysis of human genome scaffolding shows that RE-dependent Hi-C scaffolding misses contigs that Omni-C™ data can include. The Omni-C™ assay is RE agnostic, enabling it to scaffold contigs more efficiently than Hi-C data where RE frequency per contig is low. Where RE frequency increases, RE-based Hi-C and Omni-C™ scaffold at similar rates, which is expected, as RE-based Hi-C is an efficient means of scaffolding high-quality input assemblies.
Omni-C™ is More Efficient at Scaffolding Contigs with Low RE Site Density
The human genome, HG38, was cut into contigs of random size, and libraries were prepared with both Omni-C™ and RE-based Hi-C. The contigs were scaffolded using HiRise™ and then binned into groups by the number of RE sites per contig. Scaffolding efficiency was determined by dividing the number of contigs scaffolded by the total number of contigs in each RE site group.
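The efficiency calculation described here can be sketched as below. The function name, the bin edges, and the example contigs are all hypothetical, and the actual grouping used in the analysis is not given in the text. The sketch only shows the "scaffolded contigs divided by total contigs per RE site group" step.

```python
from bisect import bisect_right


def scaffolding_efficiency(contigs, bin_edges):
    """Fraction of contigs scaffolded within each RE-site-count bin.

    contigs:   iterable of (re_site_count, was_scaffolded) tuples
    bin_edges: sorted lower edges; bin k covers [bin_edges[k], bin_edges[k+1]),
               and the last bin is open-ended.
    Returns one value per bin, or None for a bin that contains no contigs.
    """
    total = [0] * len(bin_edges)
    placed = [0] * len(bin_edges)
    for re_sites, was_scaffolded in contigs:
        k = bisect_right(bin_edges, re_sites) - 1
        if k < 0:
            continue  # below the first edge; not expected for counts >= 0
        total[k] += 1
        placed[k] += int(was_scaffolded)
    return [p / t if t else None for p, t in zip(placed, total)]


# Hypothetical bins: 0-4, 5-19, and 20 or more RE sites per contig.
edges = [0, 5, 20]

# Hypothetical contigs: (number of RE sites, scaffolded?)
example_contigs = [
    (0, False), (2, True), (3, False), (4, False),
    (7, True), (12, True), (15, False), (19, True),
    (40, True), (100, True),
]
efficiency = scaffolding_efficiency(example_contigs, edges)
```

Running the same function on the scaffolding results from each library type would give one efficiency curve per method, which could then be compared bin by bin.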