Counting of valid read pairs between pairs of restriction fragments¶
Mapped Hi-C read pairs are typically transformed into contact matrices, whereby the pairs are counted between windows of fixed size, typically 5 kbp (Forcato et al., 2017) provide a review of the methodology). Diachromatic was developed for capture Hi-C, which achieves a much higher resolution than Hi-C. Therefore, for Diachromatic the read counts are determined for each restriction digest.
Required input files¶
GOPHER digest file¶
Due to the fact that the counts are determined on the restriction fragment level, the digest file needs to be passed to Diachromatic count
. If the captured viewpoints were designed with GOPHER,
this file also includes information about active and inactive restriction fragments.
BAM file with unique valid pairs¶
The second required input file contains the unique valid mapped read pairs in BAM format. If this file was generated using Diachromatic with the align subcommand, nothing has to be done or taken care of. If the BAM file was produced in a different way, make sure that the two reads of any given pair occur consecutively. Furthermore, make sure that duplicates were previously removed.
Running the count subcommand¶
Use the following command to run the counting step:
$ java -jar Diachromatic.jar count \
-v prefix.valid_pairs.aligned.bam \
-d hg19_HinDIII_DigestedGenome.txt\
-x prefix \
-o outdir
Short option | Long option | Example | Required | Description | Default |
-v | --valid-pairs-bam | prefix.valid_pairs.aligned.bam | yes | Path to BAM file containing unique valid pairs. | – |
-d | --digest-file | /data/GOPHER/hg38_DpnII_DigestedGenome.txt | yes | Path to the digest file produced with GOPHER. | – |
-o | --out-directory | cd4v2 | yes | Directory containing the output of the align subcommand. | results |
-x | --out-prefix | stim_rep1 | yes | Prefix for all generated files in output directory. | prefix |
Output files¶
The default name of the output file with statistics is:
prefix.count.stats.txt
Interaction counts¶
The interactions are written to a tab separated text file that has the following name by default:
prefix.interaction.counts.table.tsv
The structure of this file is similar to that of (iBED) files. Each line stands for one pair of interacting fragments. Consider the following example:
chr7 42304777 42314850 A chr7 152941166 152943990 I 14
chr7 42304777 42314850 A chr7 38624777 38625305 I 11
The first three columns contain the coordinates of a restriction fragment on chromosome 7. The A
in column 4
indicates that this fragment is defined to be active, i.e. it is part of a viewpoint that was enriched using capture technology.
The information about active states of fragments originates from the GOPHER digest file passed to Diachromatic
using the -d
option.
In addition, interactions are written to a simple pairwise interaction file format for long-range interactions established by WashU:
chr13:84250549-84256429 chr13:105017710-105020949 1
chr3:74550953-74553110 chr3:83489595-83490326 1
Trans and short-range (<10,000) interactions are discarded.
Read counts at interacting fragments¶
Another file that is created contains the counts of reads at interacting fragments. By default the name of this file is:
prefix.interacting.fragments.counts.table.tsv
The structure is again similar to that of BED files. Consider the following example:
chr7 42304777 42314850 A 25
chr7 152941166 152943990 I 14
chr7 38624777 38625305 I 11
The first three columns contain the coordinates of interacting restriction fragments. This is again followed by either an A
or I
in column 4,
whereby A
means active and I
inactive. The fifth column contains the read counts aggregated from all
interactions that end in the corresponding fragment. For better understanding, compare these counts to the two
interactions given above.