Table 2 Cluster processing pipeline parameters used in experiments

From: Improving clustering performance using independent component analysis and unsupervised feature learning

Pipeline stages (column groups in the original table): Feature extraction, Graph construction, Graph embedding, Clustering

| Component (a) | PCA | ICA BSS | UFL | kNN graph | Spectral clustering (b) | GNMF | NMF | K-means |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Hyper-param. 1 | # PCs = # classes | # ICs = # classes | Method: RICA & SFT | k = 5 | Lsym & Lrw | Max iters. = 100 | Max iters. = 100 | 200 repetitions |
| Hyper-param. 2 | – | – | # feats. = 256 | σ = mean of kth nearest neighbors | – | α = 100 | – | – |

  1. PC: principal component. IC: independent component. PCA: The number of features is set to the number of classes. ICA BSS: The number of source signals is set to the number of classes. UFL: The number of features is 256 for both RICA and SFT, as determined by the source code examples from [27, 28]. kNN graph: The nearest neighbor value k is set to 5, as in Cai et al. [7]. The σ scaling factor is set to the mean of the values of all kth nearest neighbors from the similarity matrix [19]. On large datasets, a smaller set of observations from the data is used to calculate σ, using the method of estimating population proportions [51]. Spectral clustering: Two types of normalized graph Laplacian were used, the symmetric Laplacian (Lsym) and the random walk Laplacian (Lrw) [19]. GNMF and NMF: The default parameters provided in the code available from Cai et al. [7] were used. K-means: The algorithm was repeated 200 times in order to obtain stable clusters.
  2. a Some processing components have more than one hyper-parameter. Thus, a second hyper-parameter is provided where needed
  3. b Spectral clustering also refers to the use of eigendecomposition for the Laplacian matrix
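
The kNN graph and spectral clustering settings above can be illustrated with a minimal sketch. It is not the authors' code, and several details are assumptions: the Gaussian weight exp(-d² / (2σ²)), the reading of σ as the mean distance from each point to its kth nearest neighbor, the symmetrization of the graph, and the helper names `knn_affinity` and `normalized_laplacians`. Only the Python standard library is used, so the example is self-contained.

```python
import math


def knn_affinity(points, k=5):
    """Build a symmetric kNN affinity matrix with Gaussian weights.

    Assumption: sigma is the mean Euclidean distance from each point to
    its kth nearest neighbor, and weights are exp(-d^2 / (2 * sigma^2)).
    Requires more than k points and at least one nonzero kth-neighbor distance.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # For each point, the other points ordered from nearest to farthest.
    order = [
        sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        for i in range(n)
    ]
    sigma = sum(dist[i][order[i][k - 1]] for i in range(n)) / n
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in order[i][:k]:
            weight = math.exp(-dist[i][j] ** 2 / (2 * sigma ** 2))
            # Keep an edge if either endpoint lists the other as a neighbor.
            W[i][j] = W[j][i] = weight
    return W, sigma


def normalized_laplacians(W):
    """Return (L_sym, L_rw) for an affinity matrix W with positive degrees.

    L_sym = I - D^(-1/2) W D^(-1/2) and L_rw = I - D^(-1) W,
    where D is the diagonal degree matrix.
    """
    n = len(W)
    deg = [sum(row) for row in W]
    L_sym = [
        [(1.0 if i == j else 0.0) - W[i][j] / math.sqrt(deg[i] * deg[j])
         for j in range(n)]
        for i in range(n)
    ]
    L_rw = [
        [(1.0 if i == j else 0.0) - W[i][j] / deg[i] for j in range(n)]
        for i in range(n)
    ]
    return L_sym, L_rw


# Toy data (hypothetical): two well-separated groups of six 2-D points.
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05), (0.2, 0.0),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1), (5.05, 5.05), (5.2, 5.0),
]
W, sigma = knn_affinity(points, k=5)
L_sym, L_rw = normalized_laplacians(W)
```

In a full pipeline, the eigenvectors of either Laplacian associated with the smallest eigenvalues would form the graph embedding, and K-means would then be run on that embedding; a numerical library would normally handle the eigendecomposition.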