Supported Formats

Pre-processed Matrices: If the data has already been processed into matrices of intra-chromosomal contacts, the matrices for all chromosomes of a cell must be stored in the same folder, with the chromosome names as file names (e.g., scHiC/cell_1/chr1.txt). You only need to provide the folder name for each cell (e.g., scHiC/cell_1).
- npy: numpy.array / numpy.matrix
- npz: scipy.sparse.coo_matrix
- matrix: matrix stored as pure text
- matrix_txt: matrix stored as .txt file
- HiCRep: the format required by HiCRep package
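The folder layout described above can be checked with a short script before loading. The following is only a minimal sketch: the helper name `list_chromosome_files`, the temporary directory, and the tiny whitespace-delimited matrices are illustrative assumptions, not part of the package or a statement of its exact text-matrix layout.

```python
import os
import tempfile


def list_chromosome_files(cell_dir):
    """Map chromosome name -> file path for one cell folder (hypothetical helper)."""
    files = {}
    for name in sorted(os.listdir(cell_dir)):
        path = os.path.join(cell_dir, name)
        if os.path.isfile(path):
            # The file name without its extension is taken as the chromosome name.
            chrom, _ext = os.path.splitext(name)
            files[chrom] = path
    return files


# Build a throwaway scHiC/cell_1 folder holding two tiny text matrices.
# The whitespace-delimited layout is an assumption made for illustration.
root = tempfile.mkdtemp()
cell_dir = os.path.join(root, "scHiC", "cell_1")
os.makedirs(cell_dir)
for chrom in ("chr1", "chr2"):
    with open(os.path.join(cell_dir, chrom + ".txt"), "w") as fh:
        fh.write("0 2 1\n2 0 3\n1 3 0\n")

chrom_files = list_chromosome_files(cell_dir)
print(sorted(chrom_files))
```

Only the folder path (here `cell_dir`) would then be passed on for that cell, one folder per cell.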

Edge List
For all formats below, the column abbreviations are:
str - strand (forward / reverse)
chr - chromosome
pos - position
score - contact reads
frag - fragments (will be ignored)
mapq - map quality

Shortest

<chr1> <pos1> <chr2> <pos2>

Shortest_Score

<chr1> <pos1> <chr2> <pos2> <score>

Short

<str1> <chr1> <pos1> <frag1> <str2> <chr2> <pos2> <frag2>

Short_Score

<str1> <chr1> <pos1> <frag1> <str2> <chr2> <pos2> <frag2> <score>

Medium

<readname> <str1> <chr1> <pos1> <frag1> <str2> <chr2> <pos2> <frag2> <mapq1> <mapq2>

Long

<str1> <chr1> <pos1> <frag1> <str2> <chr2> <pos2> <frag2> <mapq1> <cigar1> <sequence1> <mapq2> <cigar2> <sequence2> <readname1> <readname2>
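To make the column layouts above concrete, here is a minimal sketch of a parser that reduces a line in any of these formats to a contact. The lowercase format keys, the function name `parse_contact`, and the default score of 1.0 for formats without a score column are assumptions made for illustration, not the package's own API.

```python
# Column layouts copied from the format descriptions above.
_SHORT = ["str1", "chr1", "pos1", "frag1", "str2", "chr2", "pos2", "frag2"]
EDGE_LIST_COLUMNS = {
    "shortest": ["chr1", "pos1", "chr2", "pos2"],
    "shortest_score": ["chr1", "pos1", "chr2", "pos2", "score"],
    "short": _SHORT,
    "short_score": _SHORT + ["score"],
    "medium": ["readname"] + _SHORT + ["mapq1", "mapq2"],
    "long": _SHORT + ["mapq1", "cigar1", "sequence1", "mapq2", "cigar2",
                      "sequence2", "readname1", "readname2"],
}


def parse_contact(line, fmt):
    """Return (chr1, pos1, chr2, pos2, score) for one whitespace-separated line."""
    names = EDGE_LIST_COLUMNS[fmt]
    fields = line.split()
    if len(fields) != len(names):
        raise ValueError(
            "expected %d columns for %r, got %d" % (len(names), fmt, len(fields))
        )
    record = dict(zip(names, fields))
    # Assumption: a line without a score column counts as a single contact.
    score = float(record["score"]) if "score" in record else 1.0
    return (record["chr1"], int(record["pos1"]),
            record["chr2"], int(record["pos2"]), score)


contact = parse_contact("chr1 3000000 chr1 3001000 2", "shortest_score")
print(contact)
```

Strand, fragment, and mapping-quality columns are read but dropped here, which matches the note that fragments are ignored; filtering on map quality would be a separate step.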

## pairs format v1.0
#columns: readID chr1 position1 chr2 position2 strand1 strand2
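A file with this header can be read by skipping the lines that start with `#` and splitting the rest into the seven listed columns. The sketch below assumes whitespace-separated fields; the function name `read_pairs` and the sample data line are made up for illustration.

```python
def read_pairs(lines):
    """Collect (chr1, position1, chr2, position2) from pairs-format lines."""
    contacts = []
    for line in lines:
        line = line.strip()
        # Header lines such as "## pairs format v1.0" and "#columns: ..." start with '#'.
        if not line or line.startswith("#"):
            continue
        read_id, chr1, pos1, chr2, pos2, strand1, strand2 = line.split()[:7]
        contacts.append((chr1, int(pos1), chr2, int(pos2)))
    return contacts


# Illustrative input: the two header lines above plus one invented data line.
example = [
    "## pairs format v1.0",
    "#columns: readID chr1 position1 chr2 position2 strand1 strand2",
    "read_1\tchr1\t3000000\tchr1\t3001000\t+\t-",
]
pairs = read_pairs(example)
print(pairs)
```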

.hic format: read with code adapted from “straw” in JuiceTools.
.mcool format: read with code adapted from “dump” in cooler.
Other formats: simply give the column indices (starting from 1) in one of the following orders:
“chromosome1 - position1 - chromosome2 - position2 - score” or
“chromosome1 - position1 - chromosome2 - position2” or
“chromosome1 - position1 - chromosome2 - position2 - mapq1 - mapq2”.
For example, you can provide “2356” or [2, 3, 5, 6] if the file takes this format:

<name> <chromosome1> <position1> <frag1> <chromosome2> <position2> <frag2> <strand1> <strand2>
contact_1 chr1 3000000 1 chr1 3001000 1 + -
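The way these 1-based indices select columns can be sketched as follows. The helper names are hypothetical, and the digit-string form is assumed to hold one single-digit index per character, so columns past the ninth would need the list form.

```python
def normalize_indices(spec):
    """Turn "2356", 2356, or [2, 3, 5, 6] into a list of 1-based column indices."""
    if isinstance(spec, (str, int)):
        # Assumption: each character of the digit string is one column index.
        return [int(ch) for ch in str(spec)]
    return [int(i) for i in spec]


def extract_columns(line, spec):
    """Pick the requested columns from one whitespace-separated line."""
    fields = line.split()
    # Indices start from 1, so subtract 1 for Python's 0-based lists.
    return [fields[i - 1] for i in normalize_indices(spec)]


line = "contact_1 chr1 3000000 1 chr1 3001000 1 + -"
selected = extract_columns(line, "2356")
print(selected)
```

For the sample line above, both `"2356"` and `[2, 3, 5, 6]` select chromosome1, position1, chromosome2, and position2, skipping the name and fragment columns.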

Last updated on May 5, 2019
