Improving clustering performance using independent component analysis and unsupervised feature learning

Table 1 Description of datasets used in experiments

Dataset	COIL20	COIL100	CMU-PIE	USPS	MNIST	REUTERS-10K
# Observations	1,440	7,200	2,856	9,298	70,000	10,000
# Classes	20	100	68	10	10	4
Dimensions	32 × 32 × 1	32 × 32 × 3	32 × 32 × 1	16 × 16 × 1	28 × 28 × 1	2000
Type	Image, pixel	Image, pixel	Image, pixel	Image, pixel	Image, pixel	Text, tf-idf
Task	Object rec.	Object rec.	Face rec.	Digit rec.	Digit rec.	Topic rec.

COIL20: grayscale images of 20 objects, each positioned at 72 different angles, used for object recognition [29]. COIL100: RGB images of 100 objects at 72 different poses [29]. The images were downsampled from the original 128 × 128 pixels to 32 × 32 pixels to facilitate unsupervised feature learning [24]. CMU-PIE: grayscale images of 68 human faces with 4 different poses [30]. USPS: grayscale images of handwritten digits (0–9) from the US Postal Service [31]. MNIST: grayscale images of handwritten digits (0–9) obtained from NIST [32]. REUTERS-10K: English-language text documents from the Reuters news service, used for topic recognition and processed according to Xie et al. [8]. The term frequency–inverse document frequency (tf-idf) [33] feature matrix was computed using the 2000 most frequent words.
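The tf-idf construction described for REUTERS-10K can be illustrated with a minimal sketch. The exact tf-idf variant and tokenization are not specified here, so the code below makes several assumptions: whitespace tokenization, "most frequent" measured by total corpus count, raw-count term frequency, and idf = log(N / df). The function name `tfidf_top_k` and the toy documents are hypothetical and are not taken from the original pipeline.

```python
import math
from collections import Counter


def tfidf_top_k(docs, k=2000):
    """Return (vocab, matrix): tf-idf features over the k most frequent words."""
    # Assumption: naive tokenization (lowercase, split on whitespace).
    tokenized = [doc.lower().split() for doc in docs]

    # Rank words by total corpus frequency; ties are broken alphabetically.
    totals = Counter(tok for toks in tokenized for tok in toks)
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    vocab = [word for word, _ in ranked[:k]]
    index = {word: j for j, word in enumerate(vocab)}

    # Document frequency: number of documents containing each vocabulary word.
    doc_freq = Counter()
    for toks in tokenized:
        doc_freq.update({tok for tok in toks if tok in index})

    n_docs = len(docs)
    matrix = []
    for toks in tokenized:
        counts = Counter(tok for tok in toks if tok in index)
        row = [0.0] * len(vocab)
        for word, count in counts.items():
            # Assumption: tf = raw count, idf = log(N / df).
            row[index[word]] = count * math.log(n_docs / doc_freq[word])
        matrix.append(row)
    return vocab, matrix


# Toy usage with k=2 instead of 2000.
docs = ["oil price oil", "price rise", "oil market"]
vocab, matrix = tfidf_top_k(docs, k=2)
```

With k = 2000 and 10,000 documents, the resulting matrix would have the 10,000 × 2000 shape listed in Table 1. Other common tf-idf variants (smoothed idf, sublinear tf, row normalization) would change the values but not the shape.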