Clustering

Two functions are available for clustering cells: clustering and scHiCluster.

>>> label = loaded_data.clustering(
...     n_clusters=4, clustering_method='kmeans', similarity_method='innerproduct',
...     aggregation='median', n_strata=None
... )

The clustering function returns a NumPy array of cluster labels, one per cell.

  • n_clusters (int): Number of clusters.

  • clustering_method (str): Clustering method. One of ‘kmeans’, ‘spectral_clustering’ or ‘HAC’ (hierarchical agglomerative clustering).

  • similarity_method (str): Reproducibility measure used to compare cells. One of ‘InnerProduct’, ‘HiCRep’ or ‘Selfish’.

  • aggregation (str): Method to aggregate different chromosomes. Value is either ‘mean’ or ‘median’. Default: ‘median’.

  • n_strata (int or None): Only consider contacts within this genomic distance (number of strata). If None, all strata kept during loading (the keep_n_strata argument) are used. Default: None.

  • print_time (bool): Whether to print the processing time. Default: False.
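
To make the n_strata and aggregation parameters more concrete, here is a minimal NumPy sketch of the two ideas. It is an illustration only, not the library's implementation: the helper names first_strata and aggregate_similarity and the toy data are assumptions introduced here. A stratum is taken to be the k-th diagonal of a contact matrix, and aggregation is taken to mean combining per-chromosome cell-by-cell similarity matrices element-wise.

```python
import numpy as np


def first_strata(matrix, n_strata):
    """Return the first n_strata diagonals (offsets 0..n_strata-1) of a square matrix.

    Hypothetical helper: stratum k holds the contacts between bins that are
    k bins apart, so a small n_strata keeps only short-range contacts.
    """
    matrix = np.asarray(matrix, dtype=float)
    return [np.diagonal(matrix, offset=k) for k in range(n_strata)]


def aggregate_similarity(per_chromosome, aggregation="median"):
    """Combine per-chromosome (n_cells x n_cells) similarity matrices element-wise.

    Hypothetical helper: the input is stacked to shape
    (n_chromosomes, n_cells, n_cells) and reduced over the chromosome axis.
    """
    stacked = np.asarray(per_chromosome, dtype=float)
    if aggregation == "median":
        return np.median(stacked, axis=0)
    if aggregation == "mean":
        return np.mean(stacked, axis=0)
    raise ValueError("aggregation must be 'mean' or 'median'")


# Toy 4x4 "contact matrix" with entries 0..15, used only to show the indexing.
toy_contacts = np.arange(16).reshape(4, 4)
strata = first_strata(toy_contacts, n_strata=2)

# Toy similarity matrices for 2 cells on 3 chromosomes.
per_chromosome = [
    [[1.0, 0.2], [0.2, 1.0]],
    [[1.0, 0.4], [0.4, 1.0]],
    [[1.0, 0.9], [0.9, 1.0]],
]
median_sim = aggregate_similarity(per_chromosome, "median")
mean_sim = aggregate_similarity(per_chromosome, "mean")
```

In this toy case the median is less affected than the mean by the one chromosome with an unusually high similarity, which is one plausible reason to prefer 'median' as the default.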

>>> hicluster = loaded_data.scHiCluster(dim=2, cutoff=0.8, n_PCs=10, k=4)

The scHiCluster function returns two components. The first is a NumPy array containing the embedding of the cells computed by HiCluster. The second is a NumPy array of the cell labels assigned by HiCluster.

  • dim (int): Number of dimensions of the embedding. Default: 2.

  • cutoff (float): The cutoff proportion used to convert the real-valued contact matrix into a binary matrix. Default: 0.8.

  • n_PCs (int): Number of principal components. Default: 10.

  • k (int): Number of clusters. Default: 4.
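
The cutoff parameter can be pictured with a small NumPy sketch. This is an assumption-laden illustration, not the library's code: it interprets cutoff as a quantile of the matrix entries, so that entries above the 0.8 quantile (roughly the top 20%) become 1 and the rest become 0. The helper name binarize_by_quantile and the toy matrix are hypothetical.

```python
import numpy as np


def binarize_by_quantile(matrix, cutoff=0.8):
    """Set entries strictly above the `cutoff` quantile to 1 and all others to 0.

    Hypothetical helper; treating `cutoff` as a quantile is an assumption.
    """
    matrix = np.asarray(matrix, dtype=float)
    threshold = np.quantile(matrix, cutoff)
    return (matrix > threshold).astype(int)


# Toy 5x5 matrix with entries 1..25 (not a realistic contact map).
contacts = np.arange(1, 26, dtype=float).reshape(5, 5)
binary = binarize_by_quantile(contacts, cutoff=0.8)

# Fraction of entries kept as 1 after binarization.
kept_fraction = binary.sum() / binary.size
```

With 25 distinct values and cutoff=0.8, the five largest entries are kept, so kept_fraction is 0.2.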
