
Table 2 Cluster processing pipeline parameters used in experiments

From: Improving clustering performance using independent component analysis and unsupervised feature learning

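| Stage | Component^a | Hyper-param. 1 | Hyper-param. 2 |
|---|---|---|---|
| Feature extraction | PCA | # PCs = # classes | |
| Feature extraction | ICA BSS | # ICs = # classes | |
| Feature extraction | UFL | Method: RICA & SFT | # feats. = 256 |
| Graph construction | kNN graph | k = 5 | σ = mean of kth nearest neighbors |
| Graph embedding | Spectral clustering^b | L_sym & L_rw | |
| Graph embedding | GNMF | Max iters. = 100 | α = 100 |
| Graph embedding | NMF | Max iters. = 100 | |
| Clustering | K-means | 200 repetitions | |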
PC: principal component. IC: independent component.
PCA: the number of features is set to the number of classes.
ICA BSS: the number of source signals is set to the number of classes.
UFL: the number of features is 256 for both RICA and SFT, following the source code examples in [27, 28].
kNN graph: the nearest-neighbor value k is set to 5, as in Cai et al. [7]. The scaling factor σ is set to the mean value of all of the kth nearest neighbors in the similarity matrix [19]. On large datasets, σ is calculated from a smaller subset of the observations, whose size is chosen using the method of estimating population proportions [51].
Spectral clustering: two types of normalized graph Laplacian were used, the symmetric Laplacian (L_sym) and the random walk Laplacian (L_rw) [19].
GNMF and NMF: the default parameters provided in the code available from Cai et al. [7] were used.
K-means: the algorithm was repeated 200 times to obtain stable clusters.
^a Some processing components have more than one hyper-parameter; a second hyper-parameter is listed where applicable.
^b "Spectral clustering" also refers to the eigendecomposition of the graph Laplacian matrix.
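The PCA setting above (number of principal components equal to the number of classes) can be illustrated with a minimal NumPy sketch. The function name `pca_features` is hypothetical, and computing PCA through an SVD of the centred data is an implementation choice assumed here, not necessarily what the experiments used.

```python
import numpy as np

def pca_features(X, n_classes):
    """Project centred data onto its first n_classes principal components.

    Hypothetical helper: the number of components equals the number of
    classes, as in the PCA row of the table.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_classes].T, s[:n_classes]
```

For a 10-class dataset, `pca_features(X, 10)` returns a 10-dimensional feature matrix whose columns are uncorrelated.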
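The kNN graph settings (k = 5 and a σ derived from kth-nearest-neighbor values) can be sketched as follows. This assumes Euclidean distances, Gaussian (heat-kernel) weights exp(-d²/(2σ²)), and σ taken as the mean distance from each point to its kth nearest neighbor. These are assumptions about one common reading of the footnote, and `knn_heat_kernel_graph` is a hypothetical name.

```python
import numpy as np

def knn_heat_kernel_graph(X, k=5):
    """Symmetric kNN affinity matrix with Gaussian (heat-kernel) edge weights.

    Assumption: sigma is the mean, over all points, of the distance to the
    point's k-th nearest neighbour; weights are exp(-d^2 / (2 sigma^2)).
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise Euclidean distances via |a|^2 + |b|^2 - 2 a.b (clipped at 0).
    sq = np.sum(X ** 2, axis=1)
    D = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0))
    np.fill_diagonal(D, np.inf)  # a point is not its own neighbour
    nbrs = np.argsort(D, axis=1)[:, :k]  # indices of the k nearest neighbours
    rows = np.repeat(np.arange(n), k)
    cols = nbrs.ravel()
    sigma = D[np.arange(n), nbrs[:, -1]].mean()  # mean k-th neighbour distance
    W = np.zeros((n, n))
    W[rows, cols] = np.exp(-D[rows, cols] ** 2 / (2.0 * sigma ** 2))
    # Symmetrize: keep an edge if either endpoint lists the other as a neighbour.
    return np.maximum(W, W.T), sigma
```

Symmetrizing with the element-wise maximum keeps every kNN edge. Keeping only mutual neighbors (element-wise minimum) is a common alternative.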
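The footnote says σ is estimated from a subset of observations on large datasets, with the subset size chosen by the method of estimating population proportions [51]. The sketch below assumes that method is Cochran's formula n₀ = z²p(1−p)/e² with a finite-population correction, using a 95% confidence level (z = 1.96), a 5% margin of error, and p = 0.5. These settings and both function names are illustrative assumptions, not values stated in the source.

```python
import math
import numpy as np

def proportion_sample_size(population, z=1.96, margin=0.05, p=0.5):
    """Assumed Cochran sample size with finite-population correction."""
    n0 = z ** 2 * p * (1.0 - p) / margin ** 2
    n = n0 / (1.0 + (n0 - 1.0) / population)
    return min(population, math.ceil(n))

def sigma_from_subsample(X, k=5, seed=0):
    """Mean k-th nearest-neighbour distance over a random subsample of rows.

    Neighbours are searched in the full data set; only the points whose
    k-th neighbour distance is averaged are subsampled.
    """
    X = np.asarray(X, dtype=float)
    N = X.shape[0]
    m = proportion_sample_size(N)
    rng = np.random.default_rng(seed)
    idx = rng.choice(N, size=m, replace=False)
    sq = np.sum(X ** 2, axis=1)
    # Distances from each sampled point to every point (m x N).
    d = np.sqrt(np.maximum(sq[idx, None] + sq[None, :] - 2.0 * X[idx] @ X.T, 0.0))
    d[np.arange(m), idx] = np.inf  # exclude each point's distance to itself
    kth = np.partition(d, k - 1, axis=1)[:, k - 1]  # k-th smallest per row
    return kth.mean(), m
```

Under these assumptions, a dataset of 10,000 observations would need a subsample of 370 points.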
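The graph-embedding and clustering stages (eigendecomposition of L_sym or L_rw, then K-means with 200 repetitions) can be sketched in NumPy. The sketch uses the standard definitions L_sym = I − D^(−1/2) W D^(−1/2) and L_rw = I − D^(−1) W. It obtains the L_rw eigenvectors as D^(−1/2) times those of L_sym, and row-normalizes the L_sym embedding. The function names, the row normalization, the random initialization, and the per-run `max_iter` are assumptions, not details taken from the source.

```python
import numpy as np

def spectral_embedding(W, n_clusters, laplacian="sym"):
    """Bottom eigenvectors of a normalized graph Laplacian (W needs no zero-degree nodes).

    "sym": L_sym = I - D^{-1/2} W D^{-1/2}; rows are scaled to unit length.
    "rw":  L_rw = I - D^{-1} W; its eigenvectors are D^{-1/2} times those of
           L_sym, so a symmetric eigensolver can be used for both.
    """
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_sym = np.eye(len(d)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)  # eigenvalues in ascending order
    U = vecs[:, :n_clusters]
    if laplacian == "sym":
        return U / np.linalg.norm(U, axis=1, keepdims=True)
    if laplacian == "rw":
        return d_inv_sqrt[:, None] * U
    raise ValueError("laplacian must be 'sym' or 'rw'")

def kmeans(X, n_clusters, n_repetitions=200, max_iter=100, seed=0):
    """Lloyd's k-means restarted n_repetitions times; keeps the lowest-SSE run."""
    rng = np.random.default_rng(seed)

    def assign(centers):
        # Squared Euclidean distance of every row of X to every center.
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        return dist.argmin(axis=1)

    best_labels, best_sse = None, np.inf
    for _ in range(n_repetitions):
        centers = X[rng.choice(len(X), size=n_clusters, replace=False)]
        for _ in range(max_iter):
            labels = assign(centers)
            new_centers = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(n_clusters)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        labels = assign(centers)
        sse = float(((X - centers[labels]) ** 2).sum())
        if sse < best_sse:
            best_labels, best_sse = labels, sse
    return best_labels, best_sse
```

Keeping the run with the lowest within-cluster sum of squares is one way to turn many repetitions into a single stable partition. Library implementations such as scikit-learn's `KMeans` expose the same idea through `n_init`.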