Table 1 Description of datasets used in experiments

From: Improving clustering performance using independent component analysis and unsupervised feature learning

| Dataset | COIL20 | COIL100 | CMU-PIE | USPS | MNIST | REUTERS-10K |
| --- | --- | --- | --- | --- | --- | --- |
| # Observations | 1440 | 7200 | 2856 | 9298 | 70,000 | 10,000 |
| # Classes | 20 | 100 | 68 | 10 | 10 | 4 |
| Dimensions | 32 × 32 × 1 | 32 × 32 × 3 | 32 × 32 × 1 | 16 × 16 × 1 | 28 × 28 | 2000 |
| Type | Image, pixel | Image, pixel | Image, pixel | Image, pixel | Image, pixel | Text, tf-idf |
| Task | Object rec. | Object rec. | Face rec. | Digit rec. | Digit rec. | Topic rec. |

  1. COIL20: grayscale images for object recognition, showing 20 objects each positioned at 72 different angles [29]. COIL100: RGB images of 100 objects, each at 72 different poses [29]. The images were downsampled from the original 128 × 128 pixels to 32 × 32 pixels to facilitate analysis for unsupervised feature learning [24]. CMU-PIE: grayscale images of 68 human faces with 4 different poses [30]. USPS: grayscale images of handwritten digits (0–9) from the US Postal Service [31]. MNIST: grayscale images of handwritten digits (0–9) obtained from NIST [32]. REUTERS-10K: English text documents from the Reuters news service, used for topic recognition and processed according to Xie et al. [8]. The term frequency–inverse document frequency (tf-idf) [33] feature matrix was computed using the 2000 most frequent words.
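
The tf-idf step described for REUTERS-10K can be illustrated with a small sketch. This is not the preprocessing of Xie et al. [8]; it is a minimal standard-library version under stated assumptions: whitespace tokenization, lowercasing, raw counts as term frequency, and idf = log(N / df). The function name `tfidf_top_k` and the toy documents are hypothetical, and `k` stands in for the 2000-word vocabulary limit.

```python
import math
from collections import Counter


def tfidf_top_k(docs, k):
    """Return (vocab, matrix) restricted to the k most frequent words.

    Assumptions (not taken from the paper): whitespace tokenization,
    lowercasing, raw counts as tf, and idf = log(N / df).
    """
    tokenized = [doc.lower().split() for doc in docs]

    # Corpus-wide counts decide which k words enter the vocabulary.
    totals = Counter(word for tokens in tokenized for word in tokens)
    # Sort by descending count, breaking ties alphabetically.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    vocab = [word for word, _ in ranked[:k]]

    # Document frequency: number of documents containing each word.
    n_docs = len(tokenized)
    df = Counter(word for tokens in tokenized for word in set(tokens))
    idf = {word: math.log(n_docs / df[word]) for word in vocab}

    # One row per document, one column per vocabulary word.
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        matrix.append([counts[word] * idf[word] for word in vocab])
    return vocab, matrix


# Hypothetical toy corpus; k=3 plays the role of the 2000-word limit.
docs = [
    "oil prices rise as oil supply falls",
    "wheat prices fall",
    "oil exports rise",
]
vocab, matrix = tfidf_top_k(docs, k=3)
```

With `k=2000` on a 10,000-document corpus, the resulting matrix would have the 10,000 × 2000 shape listed in the table. Real pipelines usually add stop-word removal and a smoothed or normalized idf, which this sketch leaves out.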