
A multilevel features selection framework for skin lesion classification


Melanoma is considered one of the deadliest types of skin cancer, and its incidence has risen in the last few years; early diagnosis, however, significantly increases patients' chances of survival. To this end, a few computer-based methods capable of diagnosing skin lesions at an initial stage have recently been proposed. Despite some success, room for improvement remains, and the machine learning community still considers this an outstanding research challenge. In this work, we propose a novel framework for skin lesion classification, which integrates deep feature information to generate the most discriminant feature vector, with the advantage of preserving the original feature space. We utilize recent deep models for feature extraction by taking advantage of transfer learning. Initially, the dermoscopic images are segmented and the lesion region is extracted, which is later used to retrain the selected deep models to generate fused feature vectors. In the second phase, a framework for most discriminant feature selection and dimensionality reduction is proposed: entropy-controlled neighborhood component analysis (ECNCA). This hierarchical framework optimizes the fused features by selecting the principal components and discarding redundant and irrelevant data. The effectiveness of our design is validated on four benchmark dermoscopic datasets: PH2, ISIC MSK, ISIC UDA, and ISBI-2017. To validate the proposed method, a fair comparison with existing techniques is also provided. The simulation results show that the proposed design classifies skin lesions with 98.8%, 99.2%, 97.1%, and 95.9% accuracy on the four datasets with the selected classifiers, while utilizing less than 3% of the features.


Melanoma belongs to the category of inoperable skin cancers, and its occurrence rate has increased tremendously over the past three decades [1]. According to statistics provided by the World Health Organization (WHO), almost 132,000 new cases of melanoma are reported each year worldwide. It has been reported [2] that diagnosis of melanoma in its early stages significantly increases the patient's chances of survival. Dermatoscopy, also known as dermoscopy, is a non-invasive clinical procedure used for melanoma detection, in which physicians apply gel to the affected skin prior to examining it with a dermoscope. It allows recognition of sub-surface structures of the infected skin that are invisible to the naked eye. With this clinical procedure, the skin lesion is magnified up to 100 times, thereby easing the examination [3].

For the diagnosis of melanoma, dermatologists mostly rely on the ABCD rule [4], the seven-point checklist [5], and the Menzies method [6]. These methods were formally approved at the 2000 Consensus Net Meeting on Dermoscopy (CNMD) [7] and are widely used by physicians for diagnosis. Even though these manual inspection methods have shown improved performance, they have not proven feasible in practice due to a number of constraints, including the large number of patients, human error, and infrastructure. Additionally, melanoma at its initial stages exhibits features similar to those of benign lesions, which makes it difficult to recognize; Fig. 1 presents two such examples. Furthermore, a physician's analysis may be quite subjective, since it depends on clinical experience and human vision, which makes the diagnostic procedure quite challenging.

Fig. 1

A few examples of pigmented skin lesions: a benign lesion, b malignant lesion

To handle such constraints, there is still a need for an automated system that can differentiate melanoma from benign lesions at its very initial stages. A computer-aided diagnosis (CAD) system may help physicians to use technological developments in the field of dermoscopy, and it may also provide a second opinion. CAD systems adopt various machine learning techniques, for example, extracting various features (color, shape, and texture) from each dermoscopic image, followed by applying a state-of-the-art classifier [8, 9]. These classification approaches mostly rely on the extracted set of features for training, which are broadly divided into three main levels: low, mid, and high [10]. Various existing classification methods simply concatenate the extracted features to generate a fused feature vector. Feature fusion, on one hand, increases the classification accuracy by taking into account the advantages of all the host models, but on the other hand increases the computational time and memory requirements [11].

Recently, convolutional neural networks (CNN) [12] have been introduced in this domain, and their models have been widely accepted for feature extraction, leading to improved classification [13, 14]. In such solutions, discriminant deep features are extracted from the images by using a set of convolution, pooling, and feedforward layers, and by embedding the concept of transfer learning (TL) using fine-tuning and feature descriptors [15]. To achieve further improvement in the classification results, in terms of overall accuracy (OA), computational time, and memory, the feature selection process plays a pivotal role by identifying the most discriminant features. This is something we exploit in the proposed framework, entropy-controlled neighborhood component analysis (ECNCA), for skin lesion classification. The framework exploits the resilience of deep features and utilizes them in lower dimensions while preserving the original feature space information. We demonstrate that our approach utilizes less than 3% of the deep features, equivalent to a 97.55% average reduction rate, and is substantively superior to state-of-the-art approaches in terms of OA. Most of the existing literature, to the best of our knowledge, does not reduce the deep features to this level.

The exclusive contributions of this work are enumerated below:

  1. We investigate the influence of the selected layers of deep architectures, including DenseNet 201, Inception-ResNet-v2, and Inception-V3, on the performance of classifiers.

  2. We propose to fine-tune the existing pre-trained models with a smaller learning rate and to keep the weights of the initial layers frozen to avoid distortion of the complete model. We exploit a feature fusion technique, which takes advantage of all three selected architectures to generate a denser feature space.

  3. We propose a hierarchical architecture for feature selection and dimensionality reduction, which in the initial step relies upon entropy for feature selection, followed by dimensionality reduction using neighborhood component analysis (NCA).

The rest of the article is organized as follows. The "Literature review" section presents a detailed overview of the existing literature in this domain. The "Mathematical model" section presents the mathematical model, whereas materials and methods are discussed in the "Materials and methods" section. The proposed framework is detailed in the "Proposed framework" section, and the "Results and discussion" section contains the experimental results and discussion. We conclude the manuscript in the "Conclusion" section.

Literature review

In literature, several CAD systems [16, 17] have been proposed for melanoma detection, which, to some extent, try to mimic the procedure performed by dermatologists, based on a range of features extracted using machine learning approaches. These systems mostly follow four primary steps [18]: (1) preprocessing, (2) lesion segmentation, (3) feature extraction and selection, and (4) classification.

Lesion image segmentation is one of the primary steps and has a lasting effect [19] on the classification process. Accurate segmentation of a lesion is an arduous task for a number of reasons. First, lesions vary widely in size, shape, color, and skin texture. Second, there is sometimes a smooth transition between skin color and lesion [19, 20]. Further constraints include specular reflection, presence of hair, falloff towards the edges, and air and immersion-fluid bubbles. Sumithra [21] proposed to remove the unwanted hair from the lesion prior to applying the segmentation algorithm. Feature extraction was performed subsequently using color and texture features. For the classification, both support vector machine (SVM) and K-nearest neighbor (KNN) were used. Similarly, Attia et al. [22] implemented a hybrid framework for hair segmentation by combining convolutional and recurrent layers. They utilized deep encoded features for hair delineation, which are later fed into recurrent layers to capture the spatial dependencies among the incoherent image patches. The segmentation accuracy, calculated using the Jaccard index, is 77.8%, compared with 66.5% for the existing methods.

Joseph [23] used fast marching and a 2D derivative-of-Gaussian inpainting algorithm for hair artifact removal. Cheerla et al. [24] proposed an automatic method for segmentation. They used Otsu's thresholding for segmentation, and local binary patterns (LBP) [25] for texture feature extraction. Neural network classifiers were used for classification, which yielded 97% sensitivity and 93% specificity. Hawas et al. [26] proposed an optimized clustering estimation using neutrosophic graph-cut (OCE-NGC) algorithm for skin lesion segmentation. They made use of a bio-inspired technique (genetic algorithm) to optimize the histogram-based clustering procedure, which searches for the optimal centroid/threshold values. In the following step, they grouped the pixels with the neutrosophic c-means algorithm, using the generated threshold value. Finally, a graph-cut methodology [27] is implemented to segregate the foreground and background regions in the dermoscopic image. The authors claimed to achieve 97.12% average accuracy and 86.28% average Jaccard values. Similarly, [28] implemented a novel scheme (transform domain representation-driven CNN) for skin lesion segmentation. They trained the model from scratch and successfully managed to cope with constraints including small data sets, artifact removal, excessive data augmentation, and contrast stretching. The authors claimed to achieve a 6% higher Jaccard index and a shorter training time on the publicly available ISBI 2016 and 2017 datasets. Euijoon et al. [29] proposed a saliency [30] based segmentation algorithm, in which detection of the background was based on spatial layout, including color and boundary information. To minimize the detection error, they implemented a Bayesian framework.

Features, which play a vital role in classification, are extracted by following local, global, or local–global scenarios [7]. Barata et al. [31] adopted a local–global method for detecting melanoma from dermoscopic images. Local methods were applied to extract features using bag-of-words, whilst global methods were explored for the classification of skin lesions. Promising results were achieved in terms of greater sensitivity and specificity. Abbas et al. [32] suggested a perceptually oriented framework for border identification, combining the strengths of both edge and region based segmentation. Later, a hill-climbing [33] approach was utilized to identify the region-of-interest (ROI), followed by an adaptive threshold mechanism to detect the optimal lesion border.

Chatterjee et al. [34] proposed a cross-correlation based technique for feature extraction with an application to skin lesion classification. The authors considered both spatial and spectral features of the lesion region based on visual coherency using the cross-correlation technique. Kernel patches are then selected based on the skin disease categories and classified using the proposed multi-label ensemble multi-class classifier. The sensitivities acquired for nevus, melanoma, BCC, and SK diseases are 99.01%, 98.7%, 98.87%, and 99.41%. Lei et al. [35] proposed a lesion detection and recognition methodology built on a multi-scale lesion-biased representation (MLR) and joint reverse classification. This algorithm takes advantage of multiple scales and rotations to detect the lesion, in contrast to the conventional single rotation method. Omer et al. [36] provided a solution for skin lesion segmentation using global thresholding based on color features. As a following feature extraction step, they utilized the 2D fast Fourier transform (2D-FFT) and the 2D discrete Fourier transform (2D-DFT). Mahbod et al. [37] introduced an ensemble technique by combining inter- and intra-architecture CNN features. The deep features extracted from each CNN are later utilized in classification using multi-SVM classifiers. The proposed method proved to be robust in terms of feature extraction, fusion, and classification for skin lesion images. Kahn et al. [18] presented a technique for classification of skin lesions using a probabilistic distribution, with an entropy based method for feature selection. Al-masni et al. [38] investigated a set of deep frameworks for both segmentation and classification. Initially, they implemented a full resolution convolution network for lesion segmentation. Later, the lesion regions are used to extract features using multiple deep architectures, including Inception-ResNet-v2 and DenseNet 201. Their framework is trained on three datasets, ISIC 2016, ISIC 2017, and ISIC 2018, and achieves promising results. Similarly, several researchers [39,40,41] utilize deep frameworks to detect multiple abnormalities with an application to skin lesion classification.

From the detailed review, it is concluded that various existing methods show improved performance on dermoscopic images, but only when the following conditions are satisfied:

  1. High contrast between the lesion area and the surrounding region.

  2. Color uniformity inside the lesion area.

  3. Marginal existence or absence of different artifacts including dark corners, hair, color chart, to name but a few.

Therefore, our primary focus is to develop a technique that efficiently handles the cases in which the aforementioned conditions do not hold.

Mathematical model

Given a dermoscopic image database, we are required to assign a label to each and every image, marking it as belonging to either the benign or the malignant class. Let \(D \subset {\mathbb {R}}^{(r\times c\times p)}\) be a dermoscopic image and \(\psi = \psi (j)|j \in {\mathbb {R}}\) be a formally specified image dataset, where \(\Big ( \big ( \psi _1(j),\ldots ,\psi _k(j) \subset \psi \big )\in {\mathbb {R}}\Big )\) are the pixel values of k-channels. The number of classes \({\mathbb {C}}\) is provided by the user; a class is therefore discriminated as \(\overset{\sim }{\psi }\), a modified version of \(\psi\), interpreted as \(\overset{\sim }{\psi }: \psi \rightarrow \overset{\sim }{\psi }\). The modeling of \(\psi\) to achieve output \(\overset{\sim }{\psi }\) is described in terms of:

$$\overset{\sim }{\psi }\triangleq \big ( \psi ^f,\psi ^{fu},\kappa (\psi ^{fu})\big ) \in {\mathbb {Z}}^3$$

where \(\psi ^f\) represents the extracted features after applying transfer learning, \(\psi ^{fu}\) represents the fused features from fully connected layers of different architectures, and \(\kappa ( \psi ^{fu} )\) is the selected features’ representation after processing through a hierarchical structural design.

Materials and methods

Convolutional neural networks

CNNs are among the most powerful deep feedforward neural network models used for object detection and classification [42]. In a CNN, all neurons are connected to a set of neurons in the next layer in a feedforward fashion. The basic CNN architecture, as given in Fig. 2, incorporates three primary sub-blocks: convolution, pooling, and fully connected layers.

Fig. 2

Basic architecture of convolutional neural network

  1. Convolution layer: A fundamental unit in the CNN architecture, the convolution layer, detects and extracts local features from an input image sample \(X_{p}^{(r \times c \times p)}\), where \(r=c\) for a square input. Let us consider an input image sample, \(X_{p}=\{x_{1}, x_{2},\ldots , x_{n}\}\), where n represents the size of the training dataset. For each input image, the corresponding output is \(y_{p}= \{y_{1}, y_{2}, \ldots , y_{n} \}\), where \(y_{p}\in \{ 1,2, \ldots , C \}\) and C represents the number of classes. The convolution layer includes a kernel that slides across the input image as \(X ^{(r \times c \times p)} * H^{(r^{'}\times c^{'}\times p)}\), and local features \(f \in f_{l}\) are extracted using the following relation:

    $${\mathbb {F}}_{i}^{l}=\sigma \left( \sum _{i=1}^{n}x_{i}^{l -1} \times \omega _{i}^{l}+ b_{l}^{j} \right)$$

    where \({\mathbb {F}}_{i}^{l}\) provides the feature map output for layer l; \(\omega _{i}^{l}\) and \(b_{l}^{j}\) are the trainable parameters for layer l; and \(\sigma (.)\) represents an activation function.

  2. Pooling layer: The pooling layer is another substantial concept in CNN, and is considered to be a non-linear down sampling technique. It is a meaningful combination of two fundamental concepts, max pooling and convolution. Here the max-pooling step extracts a set of maximum responses with the objective of feature reduction, as well as robustness against noise and variations. Max pooling is represented by the following equation:

    $${\mathbb {F}}_{i}^{l}=\max \left(z_{2i-1}^{l-1}, z_{2i}^{l-1}\right), \quad l=2\varsigma \,\forall \, \varsigma \in {\mathbb {R}}$$

  3. Fully connected layer: Convolution and pooling layers are followed by a fully connected feedforward layer, FC. It follows the same principle as a traditional fully connected feedforward network having a set of input and output units. This layer extracts responses based on the feature weights calculated from the previous layer.

    $$V_{j}^{l}=Sig\left( \sum _{i=1}^{n}x_{i}^{l-1}\times \omega _{ji}^{l}+ b_{l}^{j} \right)$$
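As a rough illustration of the three layer types described above, the following pure-Python sketch applies a 1-D convolution followed by an activation, pairwise max pooling, and a single sigmoid fully connected unit. The input signal, kernel, weights, and the choice of ReLU as the activation are illustrative assumptions, not values or choices taken from this work.

```python
import math

def conv1d(x, kernel, bias=0.0):
    """Valid 1-D convolution (cross-correlation form) followed by ReLU."""
    k = len(kernel)
    out = []
    for start in range(len(x) - k + 1):
        s = sum(x[start + i] * kernel[i] for i in range(k)) + bias
        out.append(max(0.0, s))  # ReLU is an assumed stand-in for the activation
    return out

def max_pool_pairs(z):
    """Non-overlapping max pooling over pairs: max(z[2i-1], z[2i]) in 1-based terms."""
    return [max(z[i], z[i + 1]) for i in range(0, len(z) - 1, 2)]

def fully_connected(x, weights, bias):
    """One sigmoid unit computing Sig(sum_i x_i * w_i + b)."""
    s = sum(xi * wi for xi, wi in zip(x, weights)) + bias
    return 1.0 / (1.0 + math.exp(-s))

# Toy values, chosen only for illustration.
signal = [1.0, 2.0, 0.0, -1.0, 3.0, 1.0]
feature_map = conv1d(signal, kernel=[1.0, -1.0])
pooled = max_pool_pairs(feature_map)
score = fully_connected(pooled, weights=[0.5, -0.5], bias=0.0)
```

Real CNNs operate on 2-D multi-channel inputs with many kernels per layer; the 1-D form is used here only to keep the arithmetic easy to follow.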
Fig. 3

A fundamental model of transfer learning

Transfer learning

Conventional algorithms assume that the feature characteristics of the training and testing data are nearly identical and can be comfortably approximated [43]. Several pretrained models are trained on natural images, and hence are not directly suitable for specialized applications. Additionally, data collection for real world applications is a tedious task. Therefore, TL is a solution that provides accurate classification with a limited number of training samples. This concept is briefly defined as a system's capability to transfer the skills and knowledge learnt while solving one class of problems to a different class of problems (the source–target relation, Fig. 3). The real potential of TL may be best leveraged when the target and source domain datasets are highly disparate in size, such that the target domain dataset is significantly smaller than the source domain dataset [44]. Consider a source domain, \({D_S} = \left\{ {\left( {x_1^S,y_1^S} \right), \ldots ,\left( {x_i^S,y_i^S} \right), \ldots ,\left( {x_n^S,y_n^S} \right)} \right\},\) where \(\left( {x_n^S, y_n^S} \right) \in {\mathbb {R}}\), with specified learning task \(L _{S}\), and a target domain \({D_T} = \left\{ {\left( {x_1^T,y_1^T} \right), \ldots ,\left( {x_i^T,y_i^T} \right), \ldots ,\left( {x_m^T,y_m^T} \right)} \right\}\) having learning task \(L _{T}\), where \(\left( {x_m^T,y_m^T} \right) \in {\mathbb {R}}\). Let \(\left( {\left( {m,n} \right)\left| {\left( {m \ll n} \right)} \right.} \right)\) be the training data sizes and \(y^{S}\) and \(y^{T}\) be the respective labels. The fundamental function of TL is to boost the learning capability of the target function on \(D _{T}\) by utilizing the knowledge gained from the source \(D _{S}\) and the target \(D _{T}\).

Table 1 Pretrained deep models description: following a yearly sequence

Pre-trained CNN models

Several researchers have proposed CNN architectures for computer vision applications such as segmentation and classification [53, 54]. In this work, we utilize three widely used pre-trained models for feature extraction: Inception-V3, Inception-ResNet-V2, and DenseNet-201. These models are selected on the basis of their Top-1 accuracy (Table 1).


Inception-V3 is trained on the ImageNet database. It comprises two fundamental units: feature extraction and classification. Inception-V3 employs inception units that allow the framework to increase the depth and width of the network while lowering the number of computational parameters.


Inception-ResNet-V2 is an extension of Inception-V3, and is also trained on the ImageNet database. At its core, it combines the inception unit with the ResNet module. The residual connections allow bypasses in the model, which make the network behave more robustly. Inception-ResNet-V2 fuses the computational efficiency of the inception units with the optimization benefit contributed by the residual connections.


DenseNet 201 is also trained on ImageNet database. It is designed on a more sophisticated connectivity pattern that iteratively integrates all output features in a regular feedforward fashion. Moreover, it mitigates the vanishing-gradient problem, reduces number of input/functional parameters, and strengthens feature propagation.


In this work, we have performed our simulations on four publicly available datasets:

  1. \(PH^{2}\): This dataset is composed of 200 RGB images, classified as 160 benign and 40 melanoma. These images were collected at the Hospital Pedro Hispano, Matosinhos, during clinical examination with the help of a dermoscope [55]. Manually segmented ground truth, prepared with the help of physicians, is also provided, and each lesion is classified as normal, atypical nevus (benign), or melanoma.

  2. ISIC-MSK: The second dataset used in this research is a subset of the International Skin Imaging Collaboration (ISIC) archive [56]. This dataset contains 225 RGB dermoscopic images, acquired from various international hospitals with the help of different devices.

  3. ISIC-UDA: This is another subset of ISIC. We have collected 557 images from the ISIC-UDA dataset, with 446 training and 111 testing samples.

  4. ISBI-2017: ISBI-2017 [57] is another publicly available dataset used for characterization of skin cancer in dermoscopic images. It contains 2750 images, with 2200 training and 550 testing samples. The ISBI-2017 dataset has three disease classes: melanoma, keratosis, and benign; however, since keratosis is a common benign skin condition, we have divided the samples into two classes: malignant and benign.

Manual annotations by dermatologists are provided for all the datasets discussed above and serve as ground truth for evaluation. The partitioning of these datasets is shown in Table 2. Note that we divide each target dataset into two sets, with a pre-defined 80% for training and 20% for testing. The training portion is further split into a training set (70%), used to train the models, and a validation set (10%), used for model evaluation and fine-tuning.
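The 70/10/20 partition described above can be sketched as follows. This is a minimal illustration that shuffles sample identifiers with a fixed seed; the random shuffle, the seed, and the function name `split_dataset` are assumptions made for the example, and the sketch does not stratify by class.

```python
import random

def split_dataset(items, train=0.7, val=0.1, seed=0):
    """Shuffle and split into train/validation/test; the test set takes the remainder."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = round(n * train)
    n_val = round(n * val)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

# ISBI-2017 contains 2750 images; integer ids stand in for image files.
image_ids = list(range(2750))
train_set, val_set, test_set = split_dataset(image_ids)
```

For ISBI-2017 this gives 1925 training, 275 validation, and 550 testing samples, so the training and validation sets together match the 2200 training samples reported above.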

Table 2 Splitting datasets into training and testing samples

Proposed framework

In dermoscopy, cancer classification is still an outstanding challenge, which the proposed design, discussed below, addresses efficiently. Most of the constraints enumerated in the "Literature review" section are addressed, and a cascaded framework is proposed, which comprises four fundamental blocks: preprocessing, lesion segmentation, feature extraction and selection, and labeling/classification. Figure 4 summarizes the adopted methodology.


The preprocessing step copes with image imperfections introduced during acquisition by eliminating artifacts such as hair or ruler markings. If left in place, these artifacts may affect segmentation, which in turn leads to inaccurate classification. Ideally, the collected image should be free from these artifacts; however, physically removing the hair before acquisition is strenuous. Therefore, an algorithmic approach is preferred. In this work, a widely used software tool, Dull Razor [58], is utilized, which localizes the hair and removes it using bilinear interpolation. Additionally, it applies an adaptive median filter to smooth the replaced hair pixels.

Fig. 4

Proposed system architecture for skin lesion segmentation and classification

Lesion/image segmentation

Segmentation is one critical step that plays its primary role in classification of the skin lesion. In addition to solving various problems, including color variations, hair presence, and lesion irregularity, a robust segmentation method has a capacity to identify infected regions with improved accuracy. Once the images have been transformed to keep the same aspect ratio, the following two steps are performed in turn to complete the segmentation process:

  1. Contrast stretching, to make the lesion (foreground) region distinct from the background.

  2. Segmentation of the lesion region using a mean and mean deviation based procedure.

The immediate objective of the contrast stretching scheme is to make the foreground (lesion region) maximally distinguishable from the background. Additionally, this pre-processing step refines the images considerably, which leads to improved classification accuracy [59]. Initially, each channel of a three dimensional RGB image (\(I_{{\mathbb {D}}} \in {\mathbb {R}}^{r\times c\times p}\)) is processed independently to make the foreground region visually distinguishable. Each channel passes through a series of interlinked steps, enumerated below:

  1. Initially, gradients are computed for each channel using the Sobel–Feldman operator, with a fixed kernel size of \((3 \times 3)\).

  2. Divide each channel into equal sized blocks (4, 8, 12, …), and arrange them in descending order. Weights are then assigned to each block according to gradient magnitude.

    $$W_{\xi }={\left\{ \begin{array}{ll} w_{b}^1 & \text{ if } I_s(x,y)\le \xi _1;\\ w_{b}^2 & \text{ if } \xi _1 < I_{s}(x,y)\le \xi _2;\\ w_{b}^3 & \text{ if } \xi _2 < I_{s}(x,y)\le \xi _3;\\ w_{b}^4 & \text{otherwise}.\\ \end{array}\right. }$$

    where \(w_{b}^i(i= 1,\ldots,4)\) is a weight coefficient and \(\xi\) represents the threshold values applied to the computed gradient.

  3. Compute the overall weighted gray value for each block

    $$W_g(b) = \sum _{k=1}^{4}w_b^k\,n_k(b)$$

    where \(n_k(b)\) represents the number of gray pixels in block b that fall in the k-th gradient range.
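The steps above can be sketched in pure Python as follows: a 3 × 3 Sobel gradient magnitude, a four-level weighting by gradient range, and the weighted sum over one block. The threshold values, the weight coefficients, and the reading of the weighted gray value as "weight times pixel count per gradient range" are assumptions made for this illustration; the function names are hypothetical.

```python
def sobel_magnitude(img):
    """Gradient magnitude with 3x3 Sobel kernels; border pixels are left at 0."""
    rows, cols = len(img), len(img[0])
    gx_k = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
    gy_k = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
    mag = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = sum(gx_k[i][j] * img[r - 1 + i][c - 1 + j]
                     for i in range(3) for j in range(3))
            gy = sum(gy_k[i][j] * img[r - 1 + i][c - 1 + j]
                     for i in range(3) for j in range(3))
            mag[r][c] = (gx * gx + gy * gy) ** 0.5
    return mag

def weighted_gray_value(block, thresholds, weights):
    """Sum over the four gradient ranges of weight * pixel count in that range."""
    counts = [0, 0, 0, 0]
    for row in block:
        for g in row:
            # Number of thresholds exceeded gives the range index 0..3.
            level = sum(g > t for t in thresholds)
            counts[level] += 1
    return sum(w * n for w, n in zip(weights, counts)), counts

# Toy 4x4 channel with a vertical edge; the whole image is treated as one block.
image = [[0, 0, 10, 10] for _ in range(4)]
gradient = sobel_magnitude(image)
value, counts = weighted_gray_value(gradient,
                                    thresholds=[10, 30, 50],     # hypothetical xi values
                                    weights=[0.25, 0.5, 0.75, 1.0])  # hypothetical w_b values
```

In this toy case the four interior pixels have gradient magnitude 40 and fall in the third range, while the twelve border pixels fall in the first.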

To get improved results, a few aspects are stringently considered: (a) standard block size, (b) optimized weight criteria, and (c) selection of regions with maximum information. Upon careful examination of dermoscopic images, the region with maximum information (the lesion) occupies roughly 25% to 75% of the image. Therefore, the worst case is considered and we partition the image into 12 basic cells, each covering about 8.3% of the image. Later, these cells are selected based on the criterion of maximum information (the summation of pixels in each cell). Finally, weights are assigned to each block according to its edge points, \(E_p^c\).

$$C_{wi} = \frac{E_p^c}{E^c_{max}}$$

where \(E^c_{max}\) represents the cell with the maximum number of edges. A subsequent log operation further refines the channel [18], \(I_c(x,y)\), compared to the original, \(I_s(x,y)\).

$$I_{c}^l = C \times log(\beta + I_c(x,y))$$

where \(\beta\) is chosen to be 3 by following a trial and error method.
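A minimal sketch of the log refinement follows. It assumes the natural logarithm and a scaling constant C = 1, neither of which is specified in the text; `log_refine` is a hypothetical helper name.

```python
import math

def log_refine(channel, beta=3.0, c=1.0):
    """Apply I = c * log(beta + pixel) to every pixel of a 2-D channel."""
    return [[c * math.log(beta + v) for v in row] for row in channel]

# Toy 2x2 channel; values are illustrative only.
refined = log_refine([[0.0, 1.0], [7.0, 17.0]])
```

Because the logarithm grows slowly, large intensities are compressed more strongly than small ones, while the offset β keeps the argument positive for zero-valued pixels.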

The contrast stretching block helps the segmentation step to extract the lesion area with improved accuracy. The probabilistic methods (mean segmentation and mean deviation based segmentation) are applied independently on the same image, and their outputs are fused in the following step.

Mean segmentation is calculated using:

$$I(\mu )= \frac{1}{(1+(\frac{\mu }{I_c^l}))^{\varsigma }}+\frac{1}{2\mu }+C$$
$$I_\mu ^{MS} = \left\{ {\begin{array}{*{20}{c}} 1&{{\rm{if}}\;I(\mu ) \ge {\varphi _{thresh}};} \\ 0&{{\rm{otherwise}}.} \end{array}} \right.$$

where \(\varphi _{\tiny {thresh}}\) is Otsu’s threshold and \(\varsigma\) is a scaling factor, selected to be 7 by trial and error. C is a constant whose value is in the range of 0 to 1.

Similarly, mean deviation based segmentation is calculated on the enhanced image using an activation function, with \(\sigma _{MD}\) set to 0.7979 by trial and error.

$$I(\kappa) = \frac{1}{\left(1+\left(\frac{\sigma _{MD}}{I_c^l}\right)\right)^{\varsigma }}+\frac{1}{2\sigma _{MD}}+C$$
$$I_{(\kappa , \sigma ^2)}^{\tiny {MD}} = {\left\{ \begin{array}{ll} {1} \quad \text{if } I(\kappa) \ge \varphi _{\tiny {thresh}};\\ {0} \quad \text{otherwise}.\\ \end{array}\right. }$$

The segmented images from both distributions are then fused to obtain the resultant image.

$$I_{seg} = I_{(\kappa , \sigma ^2)}^{\tiny {MD}} \cap I_{\mu }^{MS}$$
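The two binarisations and their fusion can be sketched as below, working on a flattened list of strictly positive pixel intensities. The sketch assumes that μ is the mean of the enhanced image, takes C = 0.5 as an arbitrary value in the stated range, and uses a simple exhaustive Otsu search; these choices and all function names are illustrative assumptions rather than the authors' implementation.

```python
def otsu_threshold(values):
    """Exhaustive Otsu: pick the cut that maximises between-class variance."""
    candidates = sorted(set(values))
    best_t, best_var = candidates[0], -1.0
    n = len(values)
    for t in candidates[1:]:
        low = [v for v in values if v < t]
        high = [v for v in values if v >= t]
        w0, w1 = len(low) / n, len(high) / n
        m0, m1 = sum(low) / len(low), sum(high) / len(high)
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t

def activation(pixels, centre, scale=7, c=0.5):
    """I = 1 / (1 + centre / p)^scale + 1 / (2 * centre) + c, applied per pixel."""
    return [1.0 / (1.0 + centre / p) ** scale + 1.0 / (2.0 * centre) + c
            for p in pixels]

def binarise(pixels, centre):
    """Threshold the activation response with Otsu's threshold."""
    response = activation(pixels, centre)
    t = otsu_threshold(response)
    return [1 if r >= t else 0 for r in response]

def segment(pixels, sigma_md=0.7979):
    """Fuse mean and mean-deviation masks by pixel-wise intersection."""
    mean = sum(pixels) / len(pixels)  # assumed definition of mu
    ms_mask = binarise(pixels, mean)
    md_mask = binarise(pixels, sigma_md)
    return [a & b for a, b in zip(ms_mask, md_mask)]

# Toy intensities with two clearly separated groups.
mask = segment([1.0, 1.0, 1.0, 9.0, 9.0, 9.0])
```

The intersection keeps only pixels that both masks mark as foreground, which makes the fused result more conservative than either mask alone.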
Fig. 5

Segmentation results: (1) original image; (2) fused segmented image; (3) mapped RGB image; (4) ground truth

Sample segmentation results are provided in Fig. 5, where it can be observed that they are visually similar to the available ground truths. In some cases, the foreground and background are not distinct enough, and the segmentation is then not sufficiently accurate; Fig. 6 shows such examples.

Fig. 6

Segmented skin lesions: a correctly segmented, b incorrectly segmented

Deep features extraction

The proposed framework can be observed in Fig. 4, showing various stages from extraction to the final classification. Following the segmentation step, the proposed hierarchical design is applied on the extracted set of features to conserve the salient deep features.

Feature layers

It has been observed that systems relying on deep features extracted from a single layer of a single pre-trained model are not robust enough [60]. Therefore, an alternative strategy is adopted: multiple models, and even multiple layers, are utilized. The most discriminant features from all three re-trained (transfer learning) models are selected by exploiting their fully connected output layers, named fc1000 and predictions. During the training phase, transferred weights are kept frozen at their initial values to extract off-the-shelf deep features. Complete information regarding the selected deep layers, along with their notations, is provided in Table 3. The fully connected layers of Densenet-201, Inception-Resnet-V2, and Inception-V3 are denoted FV0, FV1, and FV2, respectively.

Table 3 Fully connected layers of different pre-trained models and their notations

Fusion mechanism

Rather than utilizing independent features from the selected pre-trained models, we adopt a feature fusion strategy. Feature sets originating from different re-trained models are consolidated to generate a fused feature set that retains the most discriminant features. Our objective here is to explore the classifier’s behavior upon fusing multiple ConvNet features. A rudimentary fusion strategy is adopted: the feature sets are serially concatenated to construct a resultant feature vector, which takes advantage of all feature spaces. Let us consider a joint vector \(FV \in {\mathbb {R}}^{\{1 \times 3\}} = \{FV_k^i\}\), where \(i \in \{1,2,3\}\) represents the selected pre-trained architecture and \(k \in \{1, 2, 3\}\) is a selected layer.

The fused feature vector \(FV^{\kappa } = FV_k^i || FV_l^j\) combines two or three pre-trained models, giving \(\kappa = \{1,\ldots,4\}\) combinations. Systems that adopt a feature fusion strategy do not necessarily perform better than those using a single layer, because fusion increases feature redundancy, which makes the classifier behave inefficiently. Therefore, adding feature selection and dimensionality reduction steps decreases not only the redundancy but also the computation time, and in turn increases the overall classification accuracy.
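The serial concatenation described above can be sketched as follows. This is a minimal illustration only: the helper names (`fuse`, `fusion_combinations`) and the 4-dimensional toy vectors are hypothetical and are not taken from the original implementation.

```python
from itertools import combinations


def fuse(*feature_vectors):
    """Serially concatenate per-model feature vectors into one fused vector."""
    fused = []
    for vector in feature_vectors:
        fused.extend(vector)
    return fused


def fusion_combinations(named_vectors):
    """Return every fusion of two or more vectors, keyed by the joined names."""
    names = list(named_vectors)
    fused = {}
    for size in range(2, len(names) + 1):
        for combo in combinations(names, size):
            fused["-".join(combo)] = fuse(*(named_vectors[n] for n in combo))
    return fused


# Hypothetical 4-dimensional features for one image (real vectors are far longer).
features = {
    "FV0": [0.1, 0.2, 0.3, 0.4],
    "FV1": [0.5, 0.6, 0.7, 0.8],
    "FV2": [0.9, 1.0, 1.1, 1.2],
}
fused_sets = fusion_combinations(features)
print(sorted(fused_sets))
```

With three source vectors, the three pairs and the single triple give the four fused combinations indexed by \(\kappa\).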

Entropy-controlled NCA

Our proposed strategy revolves around one core concept: achieve the best classification accuracy by exploiting the minimum number of features. In this regard, a hierarchical framework is implemented, which consolidates both feature selection and dimensionality reduction so as to avoid the curse of dimensionality.

Feature selection

The resultant fused vector \(FV^\kappa\) may include redundant or irrelevant features, and is therefore passed through an attribute (variable) selection procedure. This process of selecting a subset of the most discriminant variables is termed feature selection [60]. In the proposed work, the concept of entropy [61] is utilized, which can analyze uncertain data and unveil a signal’s randomness by quantifying the system’s disorder.

Let \(FV^\kappa = \{(x_1,t_1),\ldots,(x_k,t_k),\ldots,(x_N,t_N)\}\) be a training set containing N labeled samples, where \(X = \{x_j\}^N_{j=1}\), \(x_j \in {\mathbb {R}}^\nu\), is the set of \(\nu\)-dimensional feature vectors, and \(T = \{t_j\}_{j=1}^N\) are the class labels, with \(t_j \in \{0,\,1\}\) for the binary case. If this feature space has a probability measure \(\phi\) with \(\phi (X) = 1\), then the entropy is calculated as:

$${\mathbb {E}}(X) = - \sum _{j=1}^{N}\phi (x_j)\log \phi (x_j)$$

where \(\phi (x_j)\) is the observation probability of a particular feature \(x_j \in X\). The basic purpose of applying entropy is to identify a set of unique features having natural variability; the entropy value tends towards 0 as feature variability decreases. The concept of entropy has been adopted in a recent work [18], where the authors applied entropy to a distance matrix generated from the feature space, which yielded a restricted OA. In the proposed approach, by contrast, we assign ranks to the features to obtain \(FV^{{\mathbb {E}}}\), having \((R<N)\) dimensions. The top 80% of features with maximum entropy value are included in the resultant set. At this stage, the rank-based selection criterion only down-samples the original feature space, while conserving the original information for the next level, dimensionality reduction.
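The rank-based selection can be illustrated with a short sketch. It assumes that the observation probabilities are estimated from an equal-width histogram of each feature over the training samples; the text does not state how they are estimated, so the binning, the bin count, and the function names below are assumptions made for illustration.

```python
import math


def feature_entropy(values, bins=8):
    """Shannon entropy of one feature, with probabilities from an equal-width histogram."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant feature has no variability
    counts = [0] * bins
    for v in values:
        index = min(int((v - lo) / (hi - lo) * bins), bins - 1)
        counts[index] += 1
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in counts if c)


def select_by_entropy(samples, keep=0.8, bins=8):
    """Return the sorted column indices of the top `keep` fraction of features by entropy."""
    n_features = len(samples[0])
    scores = [feature_entropy([row[f] for row in samples], bins) for f in range(n_features)]
    ranked = sorted(range(n_features), key=lambda f: scores[f], reverse=True)
    n_keep = max(1, round(keep * n_features))
    return sorted(ranked[:n_keep])


# Toy data: 4 samples x 5 features; feature 2 is constant and should be dropped.
samples = [
    [0.1, 5.0, 1.0, 0.3, 9.0],
    [0.4, 6.0, 1.0, 0.1, 2.0],
    [0.7, 7.0, 1.0, 0.9, 4.0],
    [1.0, 8.0, 1.0, 0.5, 7.0],
]
selected = select_by_entropy(samples)
reduced = [[row[f] for f in selected] for row in samples]
print(selected)
```

Only columns are removed at this stage; the values of the retained features are left unchanged for the dimensionality reduction step.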

Dimensionality reduction

Classifiers behave poorly when there are too many variables or when these variables are highly correlated. At this stage, dimensionality reduction techniques play a vital role by reducing the number of random variables and retaining the resultant vectors in a lower-dimensional space, \(FV^{{\mathbb {S}}}\), where \((S \ll R)\). For this application, we implement NCA as a dimensionality reduction technique, although it is mostly used as a feature selection method. NCA, originally introduced by Goldberger et al. [62], is a distance metric learning algorithm that selects the projection by optimizing the performance of a nearest neighbor classifier in the projected space. NCA learns projections from both the features and their associated labels that are cogent enough to partition the classes in the projected space. To this end, NCA optimizes a criterion related to the leave-one-out (LOO) accuracy of a stochastic NN classifier in the projected space induced by the training set. The selected entropy-controlled fused training vector, \(FV^{{\mathbb {E}}}\), consists of \(\{(x_1,t_1),\ldots,(x_R,t_R)\}\), where \(x_j \in {\mathbb {R}}^{m}\). NCA learns a projection matrix \({\mathbb {Q}} \in {\mathbb {R}}^{s \times m}\), representing a transformation that projects \(x_j\) into an s-dimensional space, \(\varpi _j = {\mathbb {Q}}x_j \in {\mathbb {R}}^s\), with \(s\le m\). The projection matrix \({\mathbb {Q}}\) defines a Mahalanobis distance metric, calculated between two samples \(x_j\) and \(x_k\) in the projected space:

$${\mathfrak {D}}(x_j,x_k) = ({\mathbb {Q}}x_j - {\mathbb {Q}}x_k)^T({\mathbb {Q}}x_j - {\mathbb {Q}}x_k)$$
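
Factoring out \({\mathbb {Q}}\) makes the Mahalanobis form explicit. The symbol \({\mathbb {M}}\) below is introduced here only for illustration and does not appear elsewhere in the article:

$${\mathfrak {D}}(x_j,x_k) = (x_j - x_k)^T{\mathbb {Q}}^T{\mathbb {Q}}(x_j - x_k) = (x_j - x_k)^T{\mathbb {M}}(x_j - x_k), \quad {\mathbb {M}} = {\mathbb {Q}}^T{\mathbb {Q}}$$

Since \({\mathbb {M}}\) is symmetric and positive semi-definite by construction, \({\mathfrak {D}}\) is non-negative and equals the squared Euclidean distance \(\Vert \varpi _j - \varpi _k\Vert ^2\) between the projected samples.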

The primary objective of this method is to learn a projection \({\mathbb {Q}}\) that maximizes the separation of labeled data by constructing a cost function in the transformed space based on soft-neighbor assignments. The rationale is that every sample \(x_j\) selects a neighboring sample \(x_k\) as its reference with some associated probability, \(p_{jk}\):

$$p_{jk} ={\left\{ \begin{array}{ll} \frac{\varUpsilon ({\mathfrak {D}}(x_j,x_k))}{ \sum _{l \ne j}\varUpsilon ({\mathfrak {D}}(x_j,x_l))} & \text{ if } j \ne k,\\ 0 & \text{otherwise}.\\ \end{array}\right. }$$

where \(\varUpsilon (\psi ) = exp(-\psi /\varsigma )\) is a kernel function with kernel width \(\varsigma\), which directly influences the probability of each sample being selected as a reference; this additional step makes the model more robust. Under this stochastic selection rule, the optimization criterion can be defined in terms of the soft-neighbor assignments. The probability that sample \(x_j\) is assigned the correct class label is:

$$p_j= \sum _{k}\omega _{jk}p_{jk}$$
$$\omega _{jk} = {\left\{ \begin{array}{ll} 1 & \text{ iff } t_j = t_k,\\ 0 & \text{ otherwise }.\\ \end{array}\right. }$$
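
The soft-neighbor assignment can be sketched numerically as follows. The sketch uses the squared Euclidean distance between projected samples and an exponential kernel of width `width`; the toy data, the identity projection, and the function names are assumptions chosen only to keep the example small.

```python
import math


def project(Q, x):
    """Apply the projection matrix Q (s rows, m columns) to a sample x."""
    return [sum(q * v for q, v in zip(row, x)) for row in Q]


def soft_neighbor_probabilities(X, Q, width=1.0):
    """p[j][k]: probability that sample j selects sample k as its reference neighbor."""
    Z = [project(Q, x) for x in X]
    n = len(Z)
    p = []
    for j in range(n):
        kernel = [0.0] * n  # kernel[j] stays 0: a sample never selects itself
        for k in range(n):
            if k != j:
                dist = sum((a - b) ** 2 for a, b in zip(Z[j], Z[k]))
                kernel[k] = math.exp(-dist / width)
        total = sum(kernel)
        p.append([value / total for value in kernel])
    return p


def correct_class_probabilities(p, labels):
    """p_j: total probability that sample j selects a neighbor sharing its label."""
    n = len(labels)
    return [sum(p[j][k] for k in range(n) if labels[k] == labels[j]) for j in range(n)]


# Toy data: two well-separated classes in two dimensions, identity projection.
X = [[0.0, 0.0], [0.1, 0.0], [3.0, 3.0], [3.1, 3.0]]
labels = [0, 0, 1, 1]
Q = [[1.0, 0.0], [0.0, 1.0]]
p = soft_neighbor_probabilities(X, Q)
p_correct = correct_class_probabilities(p, labels)
print([round(v, 4) for v in p_correct])
```

Because the two classes are far apart relative to the kernel width, almost all of the probability mass of each sample falls on its same-class neighbor.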

The optimization criterion seeks to maximize the number of correctly labeled samples under the leave-one-out policy:

$$\varXi ({\mathbb {Q}}) = \sum _j\sum _k \omega _{jk}p_{jk} = \sum _{j}p_j$$

To perform feature reduction, as well as to avoid overfitting, a regularization term \(\hbar > 0\) is introduced as a standard weight in the cost function, which can be tuned via cross validation [63]:

$$\varXi ({\mathbb {Q}}) = \sum _j\sum _k \omega _{jk}p_{jk} - \hbar \sum _{r=1}^{d} q_r^2$$

where \(q_r\) is the weight associated with the \(r\)-th of the \(d\) features, following [63]. This complete criterion gives rise to a gradient rule used to maximize \(\varXi ({\mathbb {Q}})\), obtained by differentiating it with respect to \(q_r\) as follows:

$$\frac{\partial \varXi ({\mathbb {Q}})}{\partial q_{r}} = 2 q_r\left( \frac{1}{\varsigma }\sum _{j}\left( p_j \sum _{k \ne j}p_{jk} |x_{jr} - x_{kr}| - \sum_{k}\omega _{jk}p_{jk}|x_{jr} - x_{kr}|\right) - \hbar \right)$$

where \(x_{jr}\) denotes the \(r\)-th component of sample \(x_j\).

To maximize the objective function, several gradient-based optimizers can be employed; in this article, we employ the conjugate gradient method. Algorithm 1 explains the proposed approach from feature extraction (after transfer learning) to final classification.
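
The regularized criterion and its gradient can be sketched as below. Two assumptions are made and should be read as such: the distance takes the diagonal, feature-weight form of [63], \(\sum _r q_r^2 |x_{jr} - x_{kr}|\), which is the form the absolute differences in the gradient suggest, and plain gradient ascent is used in place of the conjugate gradient method for brevity. The toy data, the step size, and all function names are hypothetical.

```python
import math


def neighbor_probs(q, X, width):
    """Soft-neighbor probabilities p[j][k] under D = sum_r q_r^2 |x_jr - x_kr|."""
    n = len(X)
    probs = []
    for j in range(n):
        kernel = []
        for k in range(n):
            if k == j:
                kernel.append(0.0)  # a sample never selects itself
            else:
                dist = sum(w * w * abs(a - b) for w, a, b in zip(q, X[j], X[k]))
                kernel.append(math.exp(-dist / width))
        total = sum(kernel)
        probs.append([value / total for value in kernel])
    return probs


def objective(q, X, t, width=1.0, reg=0.1):
    """Regularized leave-one-out criterion: sum_j p_j - reg * sum_r q_r^2."""
    probs = neighbor_probs(q, X, width)
    n = len(X)
    correct = sum(probs[j][k] for j in range(n) for k in range(n) if t[j] == t[k])
    return correct - reg * sum(w * w for w in q)


def gradient(q, X, t, width=1.0, reg=0.1):
    """Partial derivatives of the objective with respect to each weight q_r."""
    probs = neighbor_probs(q, X, width)
    n = len(X)
    grad = []
    for r, weight in enumerate(q):
        acc = 0.0
        for j in range(n):
            diffs = [abs(X[j][r] - X[k][r]) for k in range(n)]
            same = [t[k] == t[j] for k in range(n)]
            p_j = sum(p for p, s in zip(probs[j], same) if s)
            all_term = sum(p * d for p, d in zip(probs[j], diffs))
            same_term = sum(p * d for p, d, s in zip(probs[j], diffs, same) if s)
            acc += p_j * all_term - same_term
        grad.append(2.0 * weight * (acc / width - reg))
    return grad


def fit_weights(X, t, steps=200, rate=0.1, width=1.0, reg=0.1):
    """Plain gradient ascent from unit weights (not the conjugate gradient method)."""
    q = [1.0] * len(X[0])
    for _ in range(steps):
        g = gradient(q, X, t, width, reg)
        q = [w + rate * gr for w, gr in zip(q, g)]
    return q


# Toy data: feature 0 separates the two classes, feature 1 does not.
X = [[0.0, 0.3], [0.2, 0.9], [1.0, 0.1], [1.2, 0.8]]
t = [0, 0, 1, 1]
q = fit_weights(X, t)
print([round(w, 3) for w in q])
```

In this toy setting the weight of the uninformative feature shrinks towards zero while the informative one grows until the regularization term balances it, which is the mechanism by which the regularizer performs feature reduction.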

Results and discussion

Simulations are performed on four publicly available datasets (Table 2). Three families of state-of-the-art classifiers are utilized for classification: KNN, SVM, and Ensemble (ES). The proposed framework is evaluated using three simulation setups. In the first, classification results are obtained from a few selected individual layers of the pre-trained models. The second setup incorporates two cases: in the first case, we simply fuse the selected layers; in the second, we combine the NCA technique with the proposed feature reduction approach. In the third, we test the proposed technique with other state-of-the-art classifiers. All the base parameters for the selected classifiers are given in Table 4. Additionally, a fair comparison with recent methods is provided, with remarks on the effectiveness of the proposed technique relative to the state-of-the-art approaches.

Table 4 Selected classifiers and their base parameters

Evaluation of the single layer features

Figure 7 presents the classification results of each of the layers on the four datasets discussed in the "Dataset" section. It has been observed that pre-trained CNN architectures are powerful feature representatives. Among the selected pre-trained models, DenseNet-201 and Inception-ResNet-V2 show almost similar performance on all datasets. For example, on the ISIC-UDA dataset, the OA of FV0 is 80.5%, whereas the OA of FV1 is 81.6%. It has also been observed that Inception-V3 shows a decline in performance; hence, it is not a suitable candidate for skin cancer detection.

Fig. 7 OA with different layers of pre-trained models used as a feature extractor on \(PH^{2}\), ISIC MSK, ISIC UDA and ISBI-2017 datasets

Evaluation of the proposed technique

Prior to the feature selection and dimensionality reduction step, the features extracted from the various architectural layers are concatenated; we create four combinational feature vectors for each dataset. Table 5 shows the reduction percentage of the fused feature vectors achieved after applying the hierarchical framework of entropy and NCA, before the classification phase. It is evident from the figures that the maximum reduction achieved is 98.50% on the \(PH^2\) dataset, whilst the average reduction across all datasets is \(95.17\%\).
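
To make the reduction percentage concrete, the following sketch converts between feature counts and reduction percentages. The fused length of 3000 (three 1000-dimensional vectors) is a hypothetical figure used only for illustration; the actual dimensions are those reported in Table 5.

```python
def reduction_percentage(original_dim, reduced_dim):
    """Percentage of features removed relative to the original dimension."""
    return 100.0 * (original_dim - reduced_dim) / original_dim


def retained_features(original_dim, reduction_pct):
    """Number of features left after removing `reduction_pct` percent of them."""
    return round(original_dim * (1.0 - reduction_pct / 100.0))


# Hypothetical fused length: three 1000-dimensional vectors concatenated.
fused_dim = 3000

# Stage 1 (entropy ranking) keeps the top 80% of the fused features.
after_entropy = round(0.8 * fused_dim)

# Stage 2 (NCA) brings the overall reduction to the reported maximum of 98.50%.
after_nca = retained_features(fused_dim, 98.50)

print(after_entropy, after_nca, reduction_percentage(fused_dim, after_nca))
```

Under this assumed length, a 98.50% reduction corresponds to retaining 1.5% of the fused features.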

Table 5 Features fusion and reduction percentage

Table 6 presents a comparison of classification results, in terms of OA, for two different cases: (1) the simple fusion approach, and (2) entropy-controlled NCA (proposed). The two cases are implemented on the fused feature vector, on four different datasets, using the selected classifiers. A discussion of the two cases is given below:

  • Case 1: on \(PH^{2}\) dataset, the best classification accuracy achieved is 83.2% using Fine KNN (F-KNN), 82.2% using SVM and 82.4% using ES-KNN classifier, when FV0–FV1–FV2 are fused. Similarly, on ISIC-MSK dataset, by using the same fusion, F-KNN outperforms SVM and ES-KNN by achieving 76.4%. In case of ISIC-UDA, F-KNN yields 76.5% classification accuracy, which is greater than SVM (73.5%) and ES-KNN (76.0%). On ISBI-2017 dataset ES-KNN gives 76.1% accuracy, which is greater than both SVM and F-KNN. It has been observed, and hence concluded, that irrespective of the given dataset, the best classification results are obtained with the fusion of FV0–FV1–FV2, thereby validating the strength of the feature fusion approach.

  • Case 2: using the entropy-controlled feature fusion approach, on the \(PH^{2}\), ISIC-MSK, and ISIC-UDA datasets, F-KNN yields the best accuracy of 98.8%, 99.2%, and 97.1%, respectively, owing to the feature fusion approach. In the case of the ISBI-2017 dataset, however, ES-KNN gives the maximum accuracy of 95.9%. Note that the number of image samples in ISBI-2017 is larger than in the other datasets; it may be concluded that ES-KNN gives better classification results than the other classifiers for datasets having a greater number of samples.
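
For readers unfamiliar with the measure, the sketch below shows how OA is obtained from a nearest-neighbor classifier. It assumes that F-KNN corresponds to a single nearest neighbor with Euclidean distance; the parameters in Table 4 remain authoritative, and the toy samples and function names are hypothetical.

```python
import math


def nearest_neighbor_predict(train_X, train_t, query):
    """Label of the training sample closest to `query` (k = 1, Euclidean distance)."""
    best = min(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    return train_t[best]


def overall_accuracy(predicted, actual):
    """Overall accuracy (OA) as the percentage of correctly labeled samples."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * hits / len(actual)


# Toy reduced feature vectors: two training samples per class, three test samples.
train_X = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.9, 1.1]]
train_t = [0, 0, 1, 1]
test_X = [[0.1, 0.1], [1.0, 1.0], [0.4, 0.4]]
test_t = [0, 1, 1]

predictions = [nearest_neighbor_predict(train_X, train_t, x) for x in test_X]
oa = overall_accuracy(predictions, test_t)
print(predictions, round(oa, 2))
```

The third test sample lies closer to a class-0 training sample than to any class-1 sample, so it is misclassified and lowers the OA on this toy set.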

Table 6 Classification results of the proposed technique compared to simple feature fusion with four datasets using F-KNN, ES-KNN and SVM

Table 7 shows the average classification time and average accuracy over all datasets. From this table it is evident that the proposed technique outperforms the simple fusion approach by a substantial time margin while also achieving the maximum classification accuracy. Additionally, confidence intervals are plotted in Fig. 8 for all selected datasets using the two classifiers that work best (F-KNN and ES-KNN). Moreover, to provide better insight and to facilitate researchers working in this domain, a comprehensive comparison of a set of classifiers is provided in Table 8. From these statistics, it is quite clear that classifiers belonging to the KNN family perform best in terms of both average classification accuracy (94.73%) and average computational time (1.30 s). The second-best family is SVM, with an average classification accuracy of 93.83% and an average computational time of 1.96 s. The Ensemble and Tree families do not show improved average classification accuracy (89.87% and 84.91%, respectively); the average computational time of the Ensemble family is 6.05 s, whereas the Tree family is time efficient, taking only 1.57 s. The same trend is observed for AUC.

Table 7 Average classification time and accuracy on all datasets

Fig. 8 Confidence interval on all selected datasets using state-of-the-art classifiers (F-KNN, ES-KNN)

Table 8 Performance comparison of state-of-the-art classifiers on selected datasets using set of performance measures

Comparison with state of the art techniques

A comprehensive comparison with existing techniques utilizing the \(PH^{2}\), ISBI-2017 and ISIC-MSK datasets is given in Table 9. It can be clearly observed that our proposed methodology achieves the best classification accuracy on all the given datasets. The maximum classification accuracy achieved by previous works on the \(PH^{2}\) dataset is 96.00% using color and texture features, while the proposed methodology achieves 98.80%. Similarly, on the ISBI-2017 dataset, the maximum accuracy achieved by the proposed methodology is 95.90%, compared to other methods, e.g. [64], which achieves 94.08% on the same dataset. On ISIC-MSK, the accuracy achieved by [18] is 97.20%, while the proposed methodology gives 99.20%.

Table 9 Comparison with state-of-the-art methods


Considering the recent success of deep architectures, we presented an effective approach for the classification of skin lesions. In contrast to conventional techniques, we introduced a hierarchical framework of discriminant feature selection followed by a dimensionality reduction step. We exploited information extracted from the selected pre-trained models after fine tuning, which contributed significantly to the improvement in classification accuracy. With the proposed method, we utilized less than 3% of the total features, which not only improves the classification accuracy by removing redundancy but also minimizes the computational time. Having implemented this idea, we are in a position to put forth two claims: (a) fusing the features extracted from a set of pre-trained models improves the overall accuracy, and (b) adding a feature selection and dimensionality reduction step significantly improves the classification results. As future work, improved segmentation criteria will be our primary focus, along with extended feature selection criteria. Moreover, we will include a few more challenging datasets in order to provide a comprehensive comparison.

Availability of data and materials

Four datasets are utilized in this research: PH2, ISBI-2017, ISIC-UDA, and ISIC-MSK. These datasets are publicly available.



Entropy-controlled neighborhood component analysis
International Skin Imaging Collaboration
International Symposium on Biomedical Imaging
World Health Organization
Consensus Net Meeting on Dermoscopy
Computer-aided diagnosis
Convolutional neural networks
Transfer learning
Overall accuracy
Neighborhood component analysis
Support vector machines
K Nearest Neighbors
Local binary patterns
Region of interest
Multi-scale lesion biased representation
Fast Fourier transform
Berkeley Vision and Learning Center
Random UnderSampling Boost

  1. Skin cancer facts, 2017. URL

  2. Barata C, Ruela M, Francisco M, Mendonca T, Marques J (2014) Two systems for the detection of melanomas in dermoscopy images using texture and color features. Syst J 8:965–979

  3. Hoshyar AN, Al-Jumaily A (2014) The beneficial techniques in preprocessing step of skin cancer detection system comparing. Procedia Comput Sci 42:25–31

  4. Nachbar F, Stolz W, Merkle T, Cognetta AB, Vogt T, Landthaler M, Bilek P, Braunfalco O, Plewig G (1994) The ABCD rule of dermatoscopy. J Am Acad Dermatol 4:521–527

  5. Delfino M, Argenziano G, Fabbrocini G, Carli P, Giorgi VD, Sammarco E (1998) Epiluminescence microscopy for the diagnosis of doubtful melanocytic skin lesions. Comparison of the ABCD rule. Arch Dermatol 134:1563–1570

  6. Menzies SW, Ingvar C, Crotty KA, McCarthy WH (1996) Frequency and morphologic characteristics of invasive melanomas lacking specific surface microscopic features. Arch Dermatol 132:1178–1182

  7. Argenziano G, Soyer HP, Chimenti S, Talamini R, Corona R, Binder M, Sera F, Cerroni L, De Rosa G, Ferrara G (2003) Dermoscopy of pigmented skin lesions: results of a consensus meeting via the internet. J Am Acad Dermatol 48:679–693

  8. Ruela M, Barata C, Mendonca T, Marques J (2013) On the role of shape in the detection of melanomas. In: 8th international symposium on image and signal processing and analysis (ISPA 2013)

  9. Nida N, Irtaza A, Javed A, Yousaf MH, Mahmood MT (2019) Melanoma lesion detection and segmentation using deep region based convolutional neural network and fuzzy C-means clustering. Int J Med Inform 124:37–48

  10. Fernando B, Fromont E, Tuytelarrs T (2014) Mining mid-level features for image classification. J Comput Vis 108(3):186–203

  11. Akram T, Khan MA, Sharif M, Yasmin M (2018) Skin lesion segmentation and recognition using multichannel saliency estimation and M-SVM on selected serially fused features. J Ambient Intell Humaniz Comput.

  12. Khatami A, Nazari A, Khosravi A, Lim CP, Nahavandi S (2020) A weight perturbation-based regularisation technique for convolutional neural networks and the application in medical imaging. Expert Syst Appl 149:113196

  13. Li Y, Shen L (2018) Skin lesion analysis towards melanoma detection using deep learning network. Sensors 18(2):556

  14. Dolz J, Desrosiers C, Wang L, Yuan J, Shen D, Ayed IB (2020) Deep CNN ensembles and suggestive annotations for infant brain MRI segmentation. Comput Med Imaging Gr 79:101660

  15. Bi L, Kim J, Ahn E, Kumar A, Fulham M, Feng D (2017) Dermoscopic image segmentation via multi-stage fully convolutional networks. IEEE Trans Biomed Eng 64(9):2065–2074

  16. Marques JS, Barata C, Mendonca T (2012) On the role of texture and color in the classification of dermoscopy images. In: Annual international conference of the IEEE engineering in medicine and biology society (EMBC)

  17. Ganster H, Pinz A, Rohrer R, Wildling E, Blinder M, Kittler H (2001) Automated melanoma recognition. IEEE Trans Biom Eng 20(3):233–239

  18. Khan MA, Tallha A, Muhammad S, Aamir S, Khursheed A, Musaed A, Syed IH, Abdualziz A (2018) An implementation of normal distribution based segmentation and entropy-controlled features selection for skin lesion detection and classification. BMC Cancer 18(1):638

  19. Naeem S, Riaz F, Hassan A, Miguel Tavares C, Nisar R (2015) Description of visual content in dermoscopy images using joint histogram of multiresolution local binary patterns and local contrast. In: Proceedings of 16th international conference on intelligent data engineering and automated learning (IDEAL 2015), Poland

  20. Khan MA, Sharif M, Akram T, Bukhari SA, Nayak RS (2020) Developed newton-raphson based deep features selection framework for skin lesion recognition. Pattern Recognit Lett 129:293–303

  21. Sumithra R, Suhil M, Guru DS (2015) Segmentation and classification of skin lesions for disease diagnosis. Procedia Comput Sci 45:76–85

  22. Attia M, Hossny M, Zhou H, Nahavandi S, Asadi H, Yazdabadi A (2019) Digital hair segmentation using hybrid convolutional and recurrent neural networks architecture. Comput Methods Progr Biomed 177:17–30

  23. Joseph S, Panicker JR (2016) Skin lesion analysis system for melanoma detection with an effective hair segmentation method. In: International conference on information science (ICIS). IEEE, New York, pp 91–96

  24. Cheerla N, Frazier D (2014) Automatic melanoma detection using multi-stage neural networks. Int J Innov Res Sci Eng Technol 3(2):9164–9183

  25. Khan KA, Shanir PP, Khan YU, Farooq O (2020) A hybrid local binary pattern and wavelets based approach for EEG classification for diagnosing epilepsy. Expert Syst Appl 140:112895

  26. Hawas AR, Guo Y, Du C, Polat K, Ashour AS (2020) OCE-NGC: a neutrosophic graph cut algorithm using optimized clustering estimation algorithm for dermoscopic skin lesion segmentation. Appl Soft Comput 86:105931

  27. Hajiaghayi M, Kortsarz G, MacDavid R, Purohit M, Sarpatwar K (2020) Approximation algorithms for connected maximum cut and related problems. Theor Comput Sci 814:74–85

  28. Pour MP, Seker H (2020) Transform domain representation-driven convolutional neural networks for skin lesion segmentation. Expert Syst Appl 144:113129

  29. Ahn E, Bi L, Jung YH, Kim J, Li C, Fulham M, Feng DD (2015) Automated saliency-based lesion segmentation in dermoscopic images. In: 2015 37th annual international conference of the IEEE engineering in medicine and biology society (EMBC). IEEE, New York, pp 3009-3012

  30. Khan MA, Akram T, Sharif M, Javed K, Rashid M, Bukhari SAC (2019) An integrated framework of skin lesion detection and recognition through saliency method and optimal deep neural network features selection. Neural Comput Appl.

  31. Barata C, Ruela M, Francisco M, Mendona T, Marques JS (2014) Two systems for the detection of melanomas in dermoscopy images using texture and color features. IEEE Syst J 8(3):965–979

  32. Qaisar A, Garcia IF, Emre Celebi M, Ahmad W, Mushtaq Q (2013) A perceptually oriented method for contrast enhancement and segmentation of dermoscopy images. Skin Res Technol 19(1):e490–e497

  33. Nagarajan G, Babu LD (2019) A hybrid of whale optimization and late acceptance hill climbing based imputation to enhance classification performance in electronic health records. J Biomed Inform 94:103190

  34. Chatterjee S, Dey D, Munshi S, Gorai S (2019) Extraction of features from cross correlation in space and frequency domains for classification of skin lesions. Biomed Signal Process Control 53:101581

  35. Bi L, Kim J, Ahn E, Feng D, Fulham M (2016) Automatic melanoma detection via multi-scale lesion-biased representation and joint reverse classification. In: 2016 IEEE 13th international symposium on biomedical imaging (ISBI). IEEE, New York, pp 1055–1058

  36. Abuzaghleh O, Faezipour M, Barkana BD (2016) A portable real-time noninvasive skin lesion analysis system to assist in melanoma early detection and prevention

  37. Mahbod A, Schaefer G, Ellinger I, Ecker R, Pitiot A, Wang C (2018) Fusing fine-tuned deep features for skin lesion classification. Comput Med Imaging Gr 71:19

  38. Al-masni MA, Kim DH, Kim TS (2020) Multiple skin lesions diagnostics via integrated deep convolutional networks for segmentation and classification. Comput Methods Progr Biomed 190:105351

  39. Ibtehaz N, Rahman MS (2020) MultiResuNet: rethinking the U-Net architecture for multimodal biomedical image segmentation. Neural Netw 121:74–87

  40. Hajabdollahi M, Esfandiarpoor R, Sabeti E, Karimi N, Soroushmehr SM, Samavi S (2020) Multiple abnormality detection for automatic medical image diagnosis using bifurcated convolutional neural network. Biomed Signal Process Control 57:101792

  41. Kadampur MA, Al Riyaee S (2020) Skin cancer detection: applying a deep learning based model driven architecture in the cloud for classifying dermal cell images. Inform Med Unlocked 18:100282

  42. Xie D, Lei Z, Li B (2017) Deep learning in visual computing and signal processing. Appl Comput Intell Soft Comput.

  43. Karl W, Khoshgoftaar TM, Wang DD (2016) A survey of transfer learning. J Big Data 3(1):9

  44. Yosinski J, Clune J, Bengio Y, Lipson H (2014) How transferable are features in deep neural networks? Advances in neural information processing systems. Springer, Singapore, pp 3320–3328

  45. Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems. MIT Press, Cambridge, pp 1097–1105

  46. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556

  47. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1–9

  48. Iandola FN, Han S, Moskewicz MW, Ashraf K, Dally WJ, Keutzer K (2016) Squeezenet: alexnet-level accuracy with 50× fewer parameters and < 0.5 mb model size. arXiv preprint arXiv:1602.07360

  49. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778

  50. Huang G, Liu Z, Weinberger KQ (2016) Densely connected convolutional networks. CoRR, abs/1608.06993, arXiv:1608.06993

  51. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2818–2826

  52. Szegedy C, Ioffe S, Vanhoucke V, Alemi AA (2017) Inception-v4, inception-resnet and the impact of residual connections on learning. In: AAAI, vol 4, p 12

  53. Duan Y, Fang L, Licheng J, Peng Z, Zhang L (2017) SAR image segmentation based on convolutional-wavelet neural network and markov random field. Pattern Recognit 64:255–267

  54. Tang P, Hanli W (2017) Richer feature for image classification with super and sub kernels based on deep convolutional neural network. Comput Electr Eng 62:499–510

  55. Mendoncya T, Ferreira PM, Marques J, Marcyal ARS, Rozeira J (2013) A dermoscopic image database for research and benchmarking. Presentation in proceedings of PH2 IEEE EMBC

  56. Gutman D, Codella NCF, Celebi E, Helba B, Marchetti M, Mishra N, Halpern A (2016) Skin lesion analysis toward melanoma detection: a challenge. In: The international symposium on biomedical imaging (ISBI), hosted by the international skin imaging collaboration (ISIC). arXiv preprint arXiv:1605.01397. 2016

  57. Codella NCF, Gutman D, Celebi ME, Helba B, Marchetti MA, Dusza SW, et al (2017) Skin lesion analysis toward melanoma detection: a challenge at the 2017 int. symp. biomed. imaging. arXiv preprint arXiv:1710.05006

  58. Tim L, Vincent N, Richard G, Andrew C, David M (1997) Dullrazor: a software approach to hair removal from images. Comput Biol Med 27(6):533–543

  59. Duan Q, Akram T, Duan P, Wang X (2016) Visual saliency detection using information contents weighting. Optik 127(19):7418–7430

  60. Akram T, Laurent B, Naqvi SR, Alex MM, Muhammad N (2018) A deep heterogeneous feature fusion approach for automatic land-use classification. Inf Sci 467:199–218

  61. Sankar AS, Nair SS, Dharan VS, Sankaran P (2015) Wavelet sub band entropy based feature extraction method for BCI. Procedia Comput Sci 46:1476–1482

  62. Goldberger J, Hinton GE, Roweis ST, Salakhutdinov RR (2005) Neighbourhood components analysis. Advances in neural information processing systems. MIT Press, Cambridge, pp 513–520

  63. Wei Y, Kuanquan W, Wangmeng Z (2012) Neighborhood component feature selection for high-dimensional data. JCP 7(1):161–168

  64. Bi L, Kim J, Ahn E, Kumar A, Feng D, Fulham M (2019) Step-wise integration of deep class-specific learning for dermoscopic image segmentation. Pattern Recognit 85:78–89

  65. Zaqout I (2016) Diagnosis of skin lesions based on dermoscopic images using image processing techniques. Int J Signal Process Image Process Pattern Recognit 9(9):189–204

  66. Shehzad K, Uzma J, Kashif S, Usman Akram M, Manzoor W, Ahmed W, Sohail A (2016) Segmentation of skin lesion using Cohen-Daubechies-Feauveau biorthogonal wavelet. SpringerPlus 5(1):1603

  67. Waheed Z, Waheed A, Zafar M, Raiz F (2017) An efficient machine learning approach for the detection of melanoma using dermoscopic images. In: International conference on communication, computing and digital systems (C-CODE). IEEE, New York, pp 316–319

  68. Sultana NN, Mandal B, Puhan NB (2018) Deep residual network with regularised fisher framework for detection of melanoma. IET Comput Vis 12(8):1096–1104

  69. Harangi B, Baran A, Hajdu A (2018) Classification of skin lesions using an ensemble of deep neural networks. In: 2018 40th annual international conference of the IEEE engineering in medicine and biology society (EMBC), pp 2575–2578


Research is funded by Deanship of Scientific Research at University of Ha’il.


Author information



Conceptualization: TA. Data curation: SRN, NNQ. Funding acquisition: MAlhaisoni. Investigation: HMJL. Methodology: SRN, TA. Project administration: NNQ. Resources: SAH, MAli. Software: SN, MAli. Supervision: TA, SRN. Validation: TA, MAlhaisoni. Visualization: HMJL, SAH. Writing original draft: SN, SRN, TA. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Syed Rameez Naqvi.

Ethics declarations

Competing interests

Authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit

About this article

Cite this article

Akram, T., Lodhi, H.M.J., Naqvi, S.R. et al. A multilevel features selection framework for skin lesion classification. Hum. Cent. Comput. Inf. Sci. 10, 12 (2020).
