Stage 0: Install the Tools
The HiChIP informatics workflow relies on several open-source software packages to turn sequence data into biological insights. To use these packages, you will need to spend a little time configuring your bioinformatics environment. But don’t worry, we provide detailed instructions on how to complete this here.
Step 1: Alignment and Pairs Generation
Once the tools are installed and all relevant files have been gathered, you will perform the following actions:
- Align your HiChIP *.fastq file to your genome reference file
- Parse and sort the valid ligation events
- Remove any PCR duplicates
More details on how to complete this step are provided here. The output stats file generated during this step will be used in the next step to QC your library.
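To make the duplicate-removal idea concrete, here is a minimal Python sketch. It is an illustration only, not the tool used in the workflow: the `Pair` record and `remove_duplicates` function are hypothetical names, and we assume that two pairs are PCR duplicates when both ends share exactly the same chromosome, position, and strand (real tools may also tolerate small position offsets).

```python
from typing import Iterable, NamedTuple


class Pair(NamedTuple):
    """One ligation event: the mapped 5' position of each read in the pair."""
    chrom1: str
    pos1: int
    strand1: str
    chrom2: str
    pos2: int
    strand2: str


def remove_duplicates(pairs: Iterable[Pair]) -> tuple[list[Pair], int]:
    """Keep one pair per unique set of coordinates and count the rest as duplicates."""
    seen: set[Pair] = set()
    unique: list[Pair] = []
    n_dups = 0
    # Sorting first mirrors the parse -> sort -> dedup order described above.
    for pair in sorted(pairs):
        if pair in seen:
            n_dups += 1
        else:
            seen.add(pair)
            unique.append(pair)
    return unique, n_dups


pairs = [
    Pair("chr1", 100, "+", "chr1", 5000, "-"),
    Pair("chr1", 100, "+", "chr1", 5000, "-"),  # identical coordinates: duplicate
    Pair("chr1", 100, "+", "chr2", 300, "+"),
]
unique, n_dups = remove_duplicates(pairs)
```

The counts of unique and duplicate pairs are the kind of numbers that end up in the stats file used for QC.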
Step 2: Library QC
When assessing data quality, we are interested in confirming:
- The amount of long-range information captured by the library
- The degree of enrichment achieved by the chromatin immunoprecipitation (ChIP) step
Within the stats file generated in Step 1, you will find metrics describing the long-range quality of the library, so you are already halfway there.
Tip: we consider a mammalian sample acceptable if the “mapped no-dup cis read pairs > 1kb” stat is greater than 20% of the total mapped no-dup pairs.
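The 20% rule of thumb is simple arithmetic, sketched below in Python. The function name and arguments are our own for illustration; we assume you have already read the two counts out of the stats file, since its exact layout is not described here.

```python
def passes_long_range_qc(cis_gt_1kb: int, total_nodup: int,
                         threshold: float = 0.20) -> bool:
    """Return True if long-range cis pairs exceed the threshold fraction.

    cis_gt_1kb  -- the "mapped no-dup cis read pairs > 1kb" count
    total_nodup -- the total mapped no-dup pair count
    threshold   -- 0.20 reflects the mammalian rule of thumb in the tip above
    """
    if total_nodup <= 0:
        raise ValueError("total_nodup must be a positive count")
    return cis_gt_1kb / total_nodup > threshold


# Hypothetical counts, for illustration only.
good_library = passes_long_range_qc(cis_gt_1kb=30_000_000, total_nodup=100_000_000)
poor_library = passes_long_range_qc(cis_gt_1kb=10_000_000, total_nodup=100_000_000)
```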
It is useful to note that HiChIP data is composed of primary peaks that represent direct protein binding, and secondary peaks that result from interactions occurring in 3D chromatin space. For the purpose of assessing ChIP success, we want to focus on primary peaks in our HiChIP data.
In a successful ChIP experiment, we expect the read-pair coverage distribution to be biased towards our primary peaks, given that these are the targets of enrichment. Therefore, to assess the extent of enrichment, we can compare the number of HiChIP reads observed at primary peaks to the number expected if the read pairs were evenly distributed across the genome. This observed/expected coverage ratio provides a measure of enrichment.
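One way to formalize the observed/expected ratio is sketched below. This is our own reading of the description above, not a published formula: we assume the expected count is the total read count scaled by the fraction of the genome covered by primary peaks. All names and numbers are hypothetical.

```python
def observed_over_expected(reads_in_peaks: int, total_reads: int,
                           peak_bp: int, genome_bp: int) -> float:
    """Ratio of reads observed in peaks to reads expected under uniform coverage.

    reads_in_peaks -- reads overlapping primary peaks (observed)
    total_reads    -- all mapped reads in the library
    peak_bp        -- total number of bases covered by primary peaks
    genome_bp      -- genome size in bases
    """
    if min(total_reads, peak_bp, genome_bp) <= 0:
        raise ValueError("total_reads, peak_bp and genome_bp must be positive")
    # Under uniform coverage, peaks would receive reads in proportion to their size.
    expected = total_reads * peak_bp / genome_bp
    return reads_in_peaks / expected


# Hypothetical example: peaks cover 1% of the genome but hold 10% of the reads.
ratio = observed_over_expected(
    reads_in_peaks=100_000,
    total_reads=1_000_000,
    peak_bp=30_000_000,
    genome_bp=3_000_000_000,
)
```

A ratio near 1 would suggest no enrichment at the peaks, while larger values indicate that coverage is concentrated there.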
A visual check is also useful: plotting normalized sequence coverage around our primary peaks offers a profile of ChIP enrichment.
Tip: the script plot_chip_enrichment.py identifies ChIP peak regions, sets 1 kb windows up- and down-stream of the peak centers, calculates read coverage within these windows, and then plots the global fold coverage change.
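For intuition, the windowing idea can be sketched in a few lines of Python. This is not the plot_chip_enrichment.py script; it is a simplified stand-in that assumes a single chromosome, takes read positions and peak centers as plain integers, and uses the outermost bins as a background estimate. The function names, bin size, and background choice are all our own assumptions.

```python
from bisect import bisect_left


def coverage_profile(read_positions, peak_centers, flank=1000, bin_size=100):
    """Sum read counts in bins tiled across [center - flank, center + flank)."""
    positions = sorted(read_positions)
    n_bins = 2 * flank // bin_size
    profile = [0] * n_bins
    for center in peak_centers:
        start = center - flank
        for i in range(n_bins):
            lo = start + i * bin_size
            hi = lo + bin_size
            # Number of reads with lo <= position < hi.
            profile[i] += bisect_left(positions, hi) - bisect_left(positions, lo)
    return profile


def fold_over_flanks(profile):
    """Divide each bin by the mean of the two outermost bins (assumed background)."""
    background = (profile[0] + profile[-1]) / 2
    if background == 0:
        raise ValueError("no reads in the outermost bins; cannot compute a fold change")
    return [count / background for count in profile]


# Toy data: one peak centered at 5000, with reads piling up near the center.
reads = [4100, 4900, 4950, 5010, 5100, 5900]
profile = coverage_profile(reads, peak_centers=[5000], flank=1000, bin_size=500)
fold = fold_over_flanks(profile)
```

Plotting `fold` against the bin offsets gives the kind of profile described here, with the highest values over the peak center in an enriched library.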
The figure below is an example of a passing QC plot. Note the markedly higher coverage over the peak centers compared with the flanking regions. The height and shape of this plot will depend on the antibody used, however.
At this point, you should be confident that your data is of high quality and be ready to move on to visualizing your HiChIP data for final analysis!
Step 3: Data Analysis & Visualization
This is where it starts to get fun. As with all Hi-C data types, the next step is to generate your contact matrix. The contact matrix allows the proximal ligations produced in the HiChIP assay to be visualized and serves as an input for the subsequent steps. You have two options for viewing your data: Juicebox (as part of Juicer tools) or HiGlass.
Keep in mind that, unlike standard Hi-C data, HiChIP data identifies significant interactions between your protein of interest and the surrounding genome. As such, HiChIP data highlights chromatin “loops” that are anchored at a protein binding site. So, next up in our analysis path is calling and visualizing those loops.
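If you are wondering what a contact matrix actually is, the Python sketch below shows the core idea: divide the genome into fixed-size bins and count how many read pairs connect each pair of bins. It is a toy illustration on a single chromosome with hypothetical names, not a replacement for the matrix-building tools used with Juicebox or HiGlass.

```python
from collections import Counter


def contact_matrix(pairs, bin_size):
    """Count cis pairs per (bin_i, bin_j), stored as a sparse upper triangle.

    pairs    -- iterable of (pos1, pos2) positions on the same chromosome
    bin_size -- matrix resolution in base pairs
    """
    counts = Counter()
    for pos1, pos2 in pairs:
        # A contact between bins i and j is symmetric, so store it with i <= j.
        i, j = sorted((pos1 // bin_size, pos2 // bin_size))
        counts[(i, j)] += 1
    return counts


# Toy data at 5 kb resolution.
toy_pairs = [(100, 5200), (5100, 900), (12000, 300)]
matrix = contact_matrix(toy_pairs, bin_size=5000)
```

Each entry of `matrix` corresponds to one pixel in the heatmap you see in a contact-map viewer, and a smaller `bin_size` gives a higher-resolution map.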
There are a number of different tools that can be used for this, some of which we call out in the table below:
[Table: loop-calling tools and whether each considers interaction types*]
*What are “interaction types,” you may be wondering? There are three main interaction types that help describe the loops we’ll be calling in a bit: peak-to-peak, peak-to-all, and all-to-all. We describe each of these in the FAQs section at the end.
In our step-by-step guide, we take you through an example workflow using FitHiChIP that enables you to plot your HiChIP data and visualize the chromatin loops. After some editing in Adobe Illustrator (or another PDF editor), you should end up with something similar to the following figure:
And voilà! We are left with a beautiful figure that reflects interactions between our protein of interest and the DNA surrounding it. Now, determining what it all means is your job, and we can’t wait to read all about it in future publications!
Still have questions? For step-by-step instructions, please check out our complete HiChIP documentation that will walk you through the concepts described above in detail.