
Table 1 Description of datasets used in experiments

From: Improving clustering performance using independent component analysis and unsupervised feature learning

Dataset         COIL20        COIL100       CMU-PIE       USPS          MNIST         REUTERS-10K
# Observations  1440          7200          2856          9298          70,000        10,000
# Classes       20            100           68            10            10            4
Dimensions      32 × 32 × 1   32 × 32 × 3   32 × 32 × 1   16 × 16 × 1   28 × 28       2000
Type            Image, pixel  Image, pixel  Image, pixel  Image, pixel  Image, pixel  Text, tf-idf
Task            Object rec.   Object rec.   Face rec.     Digit rec.    Digit rec.    Topic rec.
  1. COIL20: grayscale images of 20 objects, each positioned at 72 different angles, for object recognition [29]. COIL100: RGB images of 100 objects, each at 72 different poses [29]. The images were downsampled from the original 128 × 128 pixels to 32 × 32 pixels to facilitate unsupervised feature learning [24]. CMU-PIE: grayscale images of 68 human faces with 4 different poses [30]. USPS: grayscale images of handwritten digits (0–9) from the US Postal Service [31]. MNIST: grayscale images of handwritten digits (0–9) obtained from NIST [32]. REUTERS-10K: English text documents from the Reuters news service, used for topic recognition and processed according to Xie et al. [8]. The term frequency–inverse document frequency (tf-idf) [33] feature matrix was computed using the 2000 most frequent words.
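
The tf-idf step described for REUTERS-10K can be illustrated with a minimal sketch. It is not the preprocessing of Xie et al. [8]: the whitespace tokenizer, the relative-frequency tf, the unsmoothed logarithmic idf, and the function name `tfidf_top_k` are all assumptions made for illustration. Only the restriction of the vocabulary to the most frequent words (2000 in the table) comes from the footnote.

```python
import math
from collections import Counter


def tfidf_top_k(docs, k=2000):
    """Return (vocab, matrix); matrix[i][j] is the tf-idf of vocab[j] in docs[i].

    Assumed weighting (not taken from the source):
    tf = count / document length, idf = log(N / document frequency).
    """
    # Assumption: lowercase and split on whitespace; no stemming or stop words.
    tokenized = [doc.lower().split() for doc in docs]

    # Corpus-wide counts decide which k words form the vocabulary.
    totals = Counter(word for tokens in tokenized for word in tokens)
    vocab = [word for word, _ in totals.most_common(k)]
    index = {word: j for j, word in enumerate(vocab)}

    # Document frequency: number of documents containing each kept word.
    df = Counter(
        word for tokens in tokenized for word in set(tokens) if word in index
    )

    n_docs = len(tokenized)
    matrix = []
    for tokens in tokenized:
        counts = Counter(t for t in tokens if t in index)
        row = [0.0] * len(vocab)
        for word, count in counts.items():
            tf = count / len(tokens)
            idf = math.log(n_docs / df[word])
            row[index[word]] = tf * idf
        matrix.append(row)
    return vocab, matrix
```

With `k=2000` the result has one row per document and at most 2000 columns, matching the dimensionality listed for REUTERS-10K in the table. Under this unsmoothed idf, a word that occurs in every document receives weight zero; library implementations often smooth the idf to avoid that.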