Supported Format

  • Pre-processed Matrices: If the data is already processed into matrices for intra-chromosomal contacts, the chromosome from the same cell must be stored in the same folder with chromosome names as file names (e.g., scHiC/cell_1/chr1.txt). You only need to provide the folder name for a cell (e.g., scHiC/cell_1).

    • npy: numpy.array / numpy.matrix
    • npz: scipy.sparse.coo_matrix
    • matrix: matrix stored as pure text
    • matrix_txt: matrix stored as .txt file
    • HiCRep: the format required by HiCRep package
  • Edge List
    For all formats below:
      str - strand (forward / reverse)
      chr - chromosome
      pos - position
      score - contact reads
      frag - fragments (will be ignored)
      mapq - map quality

    • Shortest
    <chr1> <pos1> <chr2> <pos2>
    
    • Shortest_Score
    <chr1> <pos1> <chr2> <pos2> <score>
    
    • Short
    <str1> <chr1> <pos1> <frag1> <str2> <chr2> <pos2> <frag2>
    
    • Short_Score
    <str1> <chr1> <pos1> <frag1> <str2> <chr2> <pos2> <frag2> <score>
    
    • Medium
    <readname> <str1> <chr1> <pos1> <frag1> <str2> <chr2> <pos2> <frag2> <mapq1> <mapq2>
    
    • Long
    <str1> <chr1> <pos1> <frag1> <str2> <chr2> <pos2> <frag2> <mapq1> <cigar1> <sequence1> <mapq2> <cigar2> <sequence2> <readname1> <readname2>
    
    • 4DN
    ## pairs format v1.0
    #columns: readID chr1 position1 chr2 position2 strand1 strand2
    
  • .hic format: we adapted “straw” from JuiceTools.

  • .mcool format: we adapted “dump” from cool.

  • Other formats: simply give the indices (start from 1) in the order of
    “chromosome1 - position1 - chromosome2 - position2 - score” or
    “chromosome1 - position1 - chromosome2 - position2” or
    “chromosome1 - position1 - chromosome2 - position2 - mapq1 - mapq2”.
    For example, you can provide “2356” or [2, 3, 5, 6] if the file takes this format:

<name> <chromosome1> <position1> <frag1> <chromosome2> <position2> <frag2> <strand1> <strand2>
contact_1 chr1 3000000 1 chr1 3001000 1 + -
Previous
Next