What’s Involved In Running A Dovetail® Hi-C Assay? (Part 2)
This is the second part of a two-part blog; Part 1 can be found here. Okay, you’ve created your Hi-C library and the analytical QC checks discussed in Part 1 look good, so now you are ready to sequence. What do you need to know?
Again, we will focus on the requirements of the Dovetail® Omni-C® and Micro-C Kits, and this time delve into the computational QC process that lets you build confidence in the quality of the data generated.
Four steps to check your kit-generated data quality
With the assay completed and your library in hand, the following four steps let you assess the quality of your library.
Step 1: Sequence the library
Sequence the library using paired-end sequencing (2 x 75 bp, 2 x 100 bp, or 2 x 150 bp). Given the Dovetail® computational QC process requirements, you’ll need to sequence between 1 and 2 million total read pairs. Whether you perform the sequencing yourself or submit to a service provider, your Omni-C or Micro-C libraries can be treated like any other high-diversity next-generation sequencing library. No special sequencing considerations are required.
TIP: Check the quality of your sequencing run and, if needed, reach out to your sequencing service provider.
Step 2: Run the sequences through the Dovetail QC analysis workflow available on readthedocs
Dovetail’s readthedocs page has easy-to-use guidelines for a QC analysis that takes a few hours to run, depending on the sample genome and the sequencing depth. The readthedocs pages for the Dovetail® Omni-C® and Micro-C Kits contain a transparent description of the workflow and the tools used to process the read pairs. You can rest easy knowing that our analysis approach is aligned with the 4D Nucleome Consortium best practices. To access these pages, click on the links below:
QC analysis starts with FASTQ files of paired-end reads that are mapped to the corresponding reference genome using BWA-MEM.
The pairs are filtered to remove unmapped read pairs, read pairs with low mapping quality, and PCR duplicates using pairtools.
The filtered pairs are then categorized as artifactual or valid based on (1) whether the pair maps to the same chromosome (cis) or different chromosomes (trans) and (2) the distance between the interacting points in cis.
The counts of read pairs in each category are determined.
Finally, the library complexity is estimated using the preseq tool.
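To make the categorization step more concrete, here is a minimal Python sketch (not Dovetail’s own script) that counts cis and trans pairs from lines in the tab-separated .pairs format written by pairtools. The function names and the 1 kb cutoff separating short-range from long-range cis pairs are assumptions for illustration; take the actual distance thresholds, and which categories count as valid, from the readthedocs page for your kit.

```python
from collections import Counter


def classify_pair(chrom1, pos1, chrom2, pos2, short_cutoff=1000):
    """Label one mapped read pair as 'trans', 'cis_short', or 'cis_long'.

    short_cutoff (in bp) is an illustrative assumption, not a documented value.
    """
    if chrom1 != chrom2:
        return "trans"
    distance = abs(pos2 - pos1)
    return "cis_long" if distance >= short_cutoff else "cis_short"


def count_categories(pairs_lines, short_cutoff=1000):
    """Count pair categories from an iterable of .pairs lines."""
    counts = Counter()
    for line in pairs_lines:
        # Header lines in the .pairs format start with '#'.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # Columns: readID, chrom1, pos1, chrom2, pos2, strand1, strand2, ...
        chrom1, pos1 = fields[1], int(fields[2])
        chrom2, pos2 = fields[3], int(fields[4])
        counts[classify_pair(chrom1, pos1, chrom2, pos2, short_cutoff)] += 1
    return counts


# Tiny made-up example; real input would be the filtered .pairs file.
example = [
    "## pairs format v1.0",
    "#columns: readID chrom1 pos1 chrom2 pos2 strand1 strand2",
    "r1\tchr1\t100\tchr1\t600\t+\t-",
    "r2\tchr1\t100\tchr1\t50100\t+\t-",
    "r3\tchr1\t100\tchr2\t300\t+\t+",
]
print(count_categories(example))
```

For a real library you would pass an open file handle for the filtered .pairs file instead of the made-up example list.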
Once the workflow is completed, a simple Python script counts and summarizes the key QC metrics of a proximity-ligation library and displays them in a table. We have created a guide that walks you through how each QC metric was computed and what it means.
The summary table reports key QC metrics: library complexity and percentages of PCR duplicates, valid interaction read pairs, and long-range interactions captured in the library.
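As a rough illustration of how such a table can be derived from raw counts, the sketch below turns category counts into percentages. It is not the script from the readthedocs pages: the function names, the example counts, and the choice of denominators (duplicates relative to mapped pairs, the other metrics relative to non-duplicate pairs) are all assumptions. Library complexity is left out because it comes from preseq rather than from simple counting.

```python
def pct(part, whole):
    """Percentage of part in whole, rounded to two decimals."""
    return round(100.0 * part / whole, 2) if whole else 0.0


def summarize_qc(mapped, duplicates, valid, long_range):
    """Build a small QC summary from read-pair counts.

    Denominators are illustrative assumptions; check the kit
    documentation for how each metric is actually defined.
    """
    nodup = mapped - duplicates
    return {
        "PCR duplicates (% of mapped)": pct(duplicates, mapped),
        "Valid pairs (% of non-duplicate)": pct(valid, nodup),
        "Long-range pairs (% of non-duplicate)": pct(long_range, nodup),
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
summary = summarize_qc(
    mapped=1_000_000, duplicates=50_000, valid=760_000, long_range=380_000
)
for name, value in summary.items():
    print(f"{name:<40}{value:>8.2f}")
```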
Step 4: Decide if you’re satisfied with your data quality or you need to start over
It is now time to put to work the cut-off values that we determined for the QC metrics and assess the quality of your library before you move forward with deep sequencing. The cut-off values for the QC metrics are included in the readthedocs pages for each product. Based on the expected Omni-C and Micro-C library complexity, we do not recommend sequencing a library beyond 300 M read pairs. If you require deeper sequencing, multiple libraries can be generated from a single proximity ligation assay. If your library QC metrics meet all cut-offs, you are ready to move on to deep sequencing and data interpretation. That will have to be a topic for a future blog.
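If you want to automate the pass/fail decision, a small check like the following sketch could compare your metrics against the cut-offs. The metric names and threshold numbers here are placeholders, not Dovetail’s values; substitute the cut-offs listed on the readthedocs page for your kit.

```python
def check_cutoffs(metrics, cutoffs):
    """Compare each metric with its cut-off.

    cutoffs maps a metric name to ("min", value) or ("max", value).
    Returns a list of (name, observed_value, passed) tuples.
    """
    results = []
    for name, (kind, threshold) in cutoffs.items():
        value = metrics[name]
        passed = value >= threshold if kind == "min" else value <= threshold
        results.append((name, value, passed))
    return results


# Placeholder cut-offs for illustration only; NOT the documented values.
example_cutoffs = {
    "pct_valid_pairs": ("min", 50.0),
    "pct_pcr_duplicates": ("max", 10.0),
}
# Hypothetical metrics from a shallow QC sequencing run.
example_metrics = {"pct_valid_pairs": 80.0, "pct_pcr_duplicates": 5.0}

results = check_cutoffs(example_metrics, example_cutoffs)
for name, value, passed in results:
    print(f"{name}: {value} -> {'PASS' if passed else 'FAIL'}")

ready_for_deep_sequencing = all(passed for _, _, passed in results)
print("Ready for deep sequencing:", ready_for_deep_sequencing)
```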