Truncation of chimeric reads

Valid Hi-C read pairs originate from chimeric fragments with DNA from two different loci linked by the ligation junction. Diachromatic searches read sequences in 5’-3’ direction and truncates chimeric reads at the location of the ligation site, thereby removing the following sequence.

For Capture Hi-C, the sticky ends are filled in with biotinylated nucleotides, and the resulting blunt ends are ligated. The corresponding ligation junctions can then be observed as two consecutive copies of the overhang sequence at restriction enzyme cutting sites. For Capture-C, no fill in of the overhangs is performed, and the ligation junctions occur as plain restriction sites.

_images/sticky_and_blunt_ends.png

Use the --sticky-ends option if no fill in was performed.

Running the truncate subcommand

Use the following command to run the truncation step:

$ java -jar Diachromatic.jar truncate \
    -q test_1.fastq \
    -r test_2.fastq \
    -e HinDIII \
    -x prefix \
    -o outdir

Available arguments:

Short option Long option Example Required Description Default
-q --fastq-r1 forward.fq.gz yes Path to the forward FASTQ file.
-r --fastq-r2 reverse.fq.gz yes Path to the reverse FASTQ file.
-e --enzyme HindIII yes Symbol of the restriction enzyme. null
-s --sticky-ends false no True, if no fill-in of sticky ends was performed. false
-o --out-directory cd4v2 yes Directory containing the output of the truncate command. results
-x --out-prefix stim_rep1 yes Prefix for all generated files in output directory. prefix

Output files

The default names of the truncated and gzipped FASTQ files are:

  • prefix.truncated_R1.fastq.gz
  • prefix.truncated_R2.fastq.gz

In addition, a file is produced that contains summary statistics about the truncation step.

  • prefix.truncation.stats.txt