Table 4 Comparison of clustering performance across different datasets and clustering techniques

From: Improving clustering performance using independent component analysis and unsupervised feature learning

Performance of applied processing on different datasets (NMI, ACC)

| Method | COIL20 | COIL100 | CMU-PIE | USPS | MNIST | REUTERS-10K |
|---|---|---|---|---|---|---|
| **Baseline** | | | | | | |
| K-means | 0.735, 0.597 | 0.822, 0.615 | 0.532, 0.239 | 0.659, 0.694 | 0.527, 0.553 | 0.356, 0.541 |
| **Deep learning** | | | | | | |
| AE+K-means (2016) | | | | | –, 0.818 | –, 0.666 |
| NMF-D (2014) | 0.692, – | 0.719, – | 0.920, 0.810 | 0.287, 0.382 | 0.152, 0.175 | |
| TSC-D (2016) | 0.928, 0.899 | | | | 0.651, 0.692 | |
| DEN (2014) | 0.870, 0.725 | | | | | |
| DBC (2017) | 0.895, 0.793 | 0.905, 0.775 | | 0.724, 0.743 | 0.917, 0.964 | |
| IEC (2016) | | 0.787, 0.546 | | 0.641, 0.767 | 0.542, 0.609 | |
| AEC (2013) | | | | 0.651, 0.715 | 0.669, 0.760 | |
| DCN (2016) | | | | | 0.810, 0.830 | |
| DEC (2016) | | | 0.924, 0.801 | 0.586, 0.619 | –, 0.818 | –, 0.722 |
| DCEC (2017) | | | | 0.826, 0.790 | 0.885, 0.890 | |
| DEPICT (2017) | | | 0.974, 0.883 | 0.927, 0.964 | 0.917, 0.965 | |
| JULE-SF (2016) | 1.000, – | 0.978, – | 0.984, 0.980 | 0.858, 0.922 | 0.906, 0.959 | |
| JULE-RC (2016) | 1.000, – | 0.985, – | 1.000, 1.000 | 0.913, 0.950 | 0.913, 0.964 | |
| VaDE (2016) | | | | | –, 0.945 | –, 0.798 |
| IMSAT (2017) | | | | | –, 0.984 | –, 0.719 |
| SpectralNet (2018) | | | | | 0.924, 0.971 | |
| **Non-deep learning** | | | | | | |
| AC-GDL (2012) | 0.865, – | 0.797, – | 0.934, 0.842 | 0.824, 0.867 | 0.017, 0.113 | |
| AC-PIC (2013) | 0.855, – | 0.840, – | 0.902, 0.797 | 0.840, 0.855 | 0.017, 0.015 | |
| SEC (2011) | | | | 0.511, 0.544 | 0.779, 0.804 | |
| LDMGI (2010) | | | | 0.563, 0.580 | 0.802, 0.842 | |
| Ours^a | 0.965, 0.930 | 0.962, 0.897 | 0.986, 0.937 | 0.868, 0.926 | 0.824, 0.882 | 0.460, 0.714 |

^a The maximum clustering performance obtained by the processing components proposed in this study is given for each dataset. When available, both NMI and ACC are presented in each cell, with NMI listed first. A blank cell indicates that the clustering method was not applied to that dataset. Italic font indicates the maximum performance obtained for a dataset. The full names and references of the compared methods are: Deep Embedding Network (DEN) [11], Discriminatively Boosted Clustering (DBC) [55], Infinite Ensemble Clustering (IEC) [56], Autoencoder-based Clustering (AEC) [10], Deep Embedded Clustering (DEC) [8], Deep Clustering Network (DCN) [13], Deep Convolutional Embedded Clustering (DCEC) [57], Deep Embedded Regularized Clustering (DEPICT) [14], Variational Deep Embedding (VaDE) [15], autoencoder with K-means clustering (AE+K-means) [8], Information Maximizing Self-Augmented Training (IMSAT) [58], NMF with a deep learning model (NMF-D) [59], Joint Unsupervised Learning (JULE-SF and JULE-RC) [16], Task-specific Deep Architecture for Clustering (TSC-D) [60], Graph Degree Linkage-based Agglomerative Clustering (AC-GDL) [61], Agglomerative Clustering via Path Integral (AC-PIC) [62], Spectral Embedded Clustering (SEC) [63], and Local Discriminant Models and Global Integration (LDMGI) [52].
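For readers unfamiliar with the two metrics reported in the table, the following is a minimal sketch (not the authors' code) of how NMI and unsupervised clustering accuracy (ACC) are conventionally computed: NMI comes directly from scikit-learn, while ACC requires finding the best one-to-one mapping between predicted cluster indices and ground-truth classes via the Hungarian algorithm. The K-means baseline here runs on scikit-learn's small 8x8 digits set as a stand-in for the much larger USPS/MNIST benchmarks used in the paper.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.metrics import normalized_mutual_info_score


def clustering_accuracy(y_true, y_pred):
    """ACC: best one-to-one match between cluster indices and class labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    n = int(max(y_pred.max(), y_true.max())) + 1
    # Contingency matrix: rows = predicted clusters, cols = true classes.
    w = np.zeros((n, n), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        w[p, t] += 1
    # linear_sum_assignment minimizes cost, so negate to maximize matches.
    rows, cols = linear_sum_assignment(w.max() - w)
    return w[rows, cols].sum() / y_true.size


# Toy stand-in for the table's K-means baseline (8x8 digits, not full MNIST).
X, y = load_digits(return_X_y=True)
pred = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(X)
print(f"NMI: {normalized_mutual_info_score(y, pred):.3f}")
print(f"ACC: {clustering_accuracy(y, pred):.3f}")
```

Because ACC optimizes the cluster-to-class assignment before scoring, it rewards a clustering that recovers the classes regardless of how the cluster indices are numbered, which is why the table can report ACC for methods that never see the labels during training.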