Counting of valid read pairs between pairs of restriction fragments

Mapped Hi-C read pairs are typically transformed into contact matrices in which the pairs are counted between windows of fixed size, typically 5 kbp (Forcato et al., 2017, provide a review of the methodology). Diachromatic was developed for capture Hi-C, which achieves a much higher resolution than Hi-C. Therefore, Diachromatic determines the read pair counts at the level of individual restriction fragments.

Required input files

GOPHER digest file

Because the counts are determined at the restriction fragment level, the digest file must be passed to Diachromatic count. If the captured viewpoints were designed with GOPHER, this file also includes information about active and inactive restriction fragments.

BAM file with unique valid pairs

The second required input file contains the unique valid mapped read pairs in BAM format. If this file was generated with the Diachromatic align subcommand, no further preparation is needed. If the BAM file was produced in a different way, make sure that the two reads of any given pair occur consecutively and that duplicates have already been removed.
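The requirement that the two reads of a pair occur consecutively can be checked with a few lines of code. The following is a minimal sketch, not part of Diachromatic: it assumes that both reads of a pair carry the same read name and that the names have already been extracted in file order. The function name `first_unpaired_read` and the example read names are hypothetical.

```python
from itertools import zip_longest
from typing import Iterable, Optional


def first_unpaired_read(query_names: Iterable[str]) -> Optional[str]:
    """Return the first read name whose mate does not directly follow it.

    Returns None if all records form consecutive pairs.
    Assumption: both reads of a pair share the same read name.
    """
    it = iter(query_names)
    # Walk through the records two at a time: (1st, 2nd), (3rd, 4th), ...
    # An odd number of records yields a final (name, None) tuple.
    for first, second in zip_longest(it, it):
        if first != second:
            return first
    return None


# Hypothetical read names in file order.
paired = ["read1", "read1", "read2", "read2"]
interleaved = ["read1", "read2", "read1", "read2"]

print(first_unpaired_read(paired))       # None
print(first_unpaired_read(interleaved))  # read1
```

With a BAM library such as pysam, the names could be taken from the `query_name` attribute of each record while iterating over the file; this is left out here to keep the sketch free of dependencies.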

Running the count subcommand

Use the following command to run the counting step:

$ java -jar Diachromatic.jar count \
    -v prefix.valid_pairs.aligned.bam \
    -d hg19_HinDIII_DigestedGenome.txt \
    -x prefix \
    -o outdir

Short option  Long option        Example                                     Required  Description                                              Default
-v            --valid-pairs-bam  prefix.valid_pairs.aligned.bam              yes       Path to BAM file containing unique valid pairs.
-d            --digest-file      /data/GOPHER/hg38_DpnII_DigestedGenome.txt  yes       Path to the digest file produced with GOPHER.
-o            --out-directory    cd4v2                                       yes       Output directory for the generated files.                results
-x            --out-prefix       stim_rep1                                   yes       Prefix for all generated files in the output directory.  prefix

Output files

The default name of the output file with statistics is:

  • prefix.count.stats.txt

Interaction counts

The interactions are written to a tab-separated text file that has the following name by default:

  • prefix.interaction.counts.table.tsv

The structure of this file is similar to that of iBED files. Each line stands for one pair of interacting fragments. Consider the following example:

chr7    42304777        42314850        A       chr7    152941166      152943990      I       14
chr7    42304777        42314850        A       chr7    38624777       38625305       I       11

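The first three columns contain the coordinates of a restriction fragment on chromosome 7. The A in column 4 indicates that this fragment is defined to be active, i.e. it is part of a viewpoint that was enriched using capture technology. Columns 5 to 8 describe the second fragment of the pair in the same way, with I marking an inactive fragment. The last column contains the number of read pairs counted for the interaction. The information about the active states of fragments originates from the GOPHER digest file passed to Diachromatic using the -d option.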
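To make the column layout concrete, the following sketch parses one line of the interaction counts file into two fragments and a count. It is an illustration only and not Diachromatic code: the names `Fragment` and `parse_interaction` are hypothetical, and the nine-column layout is assumed from the example lines above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Fragment:
    chrom: str
    start: int
    end: int
    state: str  # "A" (active) or "I" (inactive)


def parse_interaction(line: str) -> Tuple[Fragment, Fragment, int]:
    """Split one interaction line into its two fragments and the count."""
    fields = line.split()  # handles tabs as well as runs of spaces
    if len(fields) != 9:
        raise ValueError(f"expected 9 columns, found {len(fields)}")
    first = Fragment(fields[0], int(fields[1]), int(fields[2]), fields[3])
    second = Fragment(fields[4], int(fields[5]), int(fields[6]), fields[7])
    return first, second, int(fields[8])


# First example line from the text.
line = "chr7\t42304777\t42314850\tA\tchr7\t152941166\t152943990\tI\t14"
first, second, count = parse_interaction(line)
print(first.state, second.state, count)  # A I 14
```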

In addition, interactions are written to a simple pairwise interaction file format for long-range interactions established by WashU:

chr13:84250549-84256429    chr13:105017710-105020949       1
chr3:74550953-74553110     chr3:83489595-83490326          1

Trans and short-range (< 10,000 bp) interactions are discarded.
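The filtering and the pairwise format can be sketched as follows. This is an illustration under stated assumptions, not the Diachromatic implementation: the distance between two fragments is taken to be the gap between them in bp, the output columns are tab-separated, and the function name `to_washu_line` is hypothetical.

```python
from typing import Optional, Sequence

MIN_DISTANCE = 10_000  # threshold from the text, assumed to be in bp


def to_washu_line(fields: Sequence[str],
                  min_distance: int = MIN_DISTANCE) -> Optional[str]:
    """Convert the nine columns of one interaction to the pairwise format.

    Returns None for trans and short-range interactions.
    """
    chrom_a, start_a, end_a = fields[0], int(fields[1]), int(fields[2])
    chrom_b, start_b, end_b = fields[4], int(fields[5]), int(fields[6])
    count = fields[8]
    # Trans interactions connect fragments on different chromosomes.
    if chrom_a != chrom_b:
        return None
    # Assumption: distance is the gap between the two fragments.
    gap = max(start_a, start_b) - min(end_a, end_b)
    if gap < min_distance:
        return None
    return f"{chrom_a}:{start_a}-{end_a}\t{chrom_b}:{start_b}-{end_b}\t{count}"


example = "chr7 42304777 42314850 A chr7 152941166 152943990 I 14".split()
print(to_washu_line(example))
```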

Read counts at interacting fragments

Another file that is created contains the counts of reads at interacting fragments. By default the name of this file is:

  • prefix.interacting.fragments.counts.table.tsv

The structure is again similar to that of BED files. Consider the following example:

chr7    42304777       42314850        A       25
chr7    152941166      152943990       I       14
chr7    38624777       38625305        I       11

The first three columns contain the coordinates of interacting restriction fragments. This is again followed by either an A or I in column 4, whereby A means active and I inactive. The fifth column contains the read counts aggregated from all interactions that end in the corresponding fragment. Compare these counts to the two interactions given above: the active fragment takes part in both interactions and therefore has 14 + 11 = 25 reads, whereas each inactive fragment carries the count of its single interaction.
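The aggregation can be reproduced from the two example interactions. The sketch below is only meant to illustrate the relationship between the two files and is not taken from Diachromatic; the function name `fragment_counts` is hypothetical.

```python
from collections import Counter
from typing import Iterable

# The two example interactions from the interaction counts file.
interactions = [
    "chr7\t42304777\t42314850\tA\tchr7\t152941166\t152943990\tI\t14",
    "chr7\t42304777\t42314850\tA\tchr7\t38624777\t38625305\tI\t11",
]


def fragment_counts(lines: Iterable[str]) -> Counter:
    """Sum the interaction counts for each fragment taking part in them."""
    counts: Counter = Counter()
    for line in lines:
        f = line.split()
        n = int(f[8])
        # Each interaction contributes its count to both of its fragments.
        counts[(f[0], int(f[1]), int(f[2]), f[3])] += n
        counts[(f[4], int(f[5]), int(f[6]), f[7])] += n
    return counts


for (chrom, start, end, state), n in fragment_counts(interactions).items():
    print(chrom, start, end, state, n, sep="\t")
```

The printed lines correspond to the three example lines of the fragment counts file shown above.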