Symmetry
  • Article
  • Open Access

Published: 14 February 2022

Reduced-Kernel Weighted Extreme Learning Machine Using Universum Data in Feature Space (RKWELM-UFS) to Handle Binary Class Imbalanced Dataset Classification

Department of Computer Science and Engineering, Maulana Azad National Institute of Technology, Bhopal 462003, India
* Author to whom correspondence should be addressed.

Abstract

Class imbalance is a phenomenon of asymmetry that degrades the performance of traditional classification algorithms such as the Support Vector Machine (SVM) and the Extreme Learning Machine (ELM). Various modifications of SVM and ELM have been proposed to handle the class imbalance problem, each focusing on a different aspect of the problem. The Universum Support Vector Machine (USVM) incorporates prior information into the classification model by adding Universum data to the training data, and several other SVM variants likewise use Universum data during model generation. In contrast, the existing ELM-based classification models intended to handle class imbalance do not consider prior information about the data distribution during training. An ELM-based classification model creates two symmetric planes, one for each class; a Universum-based ELM classification model tries to create a third plane between those two planes using Universum data. This paper proposes a novel hybrid framework called the Reduced-Kernel Weighted Extreme Learning Machine Using Universum Data in Feature Space (RKWELM-UFS) to handle binary class-imbalanced classification problems. The proposed RKWELM-UFS combines Universum learning with the Reduced-Kernelized Weighted Extreme Learning Machine (RKWELM) for the first time, inheriting the advantages of both techniques. To generate efficient Universum samples in the feature space, this work uses the kernel trick. The performance of the proposed method is evaluated on 44 benchmark binary class-imbalanced datasets and compared with 10 state-of-the-art classifiers using AUC and the G-mean. The statistical t-test and the Wilcoxon signed-rank test are used to quantify the performance enhancement of the proposed RKWELM-UFS over the other evaluated classifiers.

1. Introduction

The performance of a classification algorithm is affected by various data complexity measures such as class imbalance, class overlap, the length of the decision boundary, small disjuncts of classes, etc. Most real-world classification problems are class imbalanced. Examples of such problems are cancer detection [,], fault detection [], intrusion detection [], software test optimization [], speech quality assessment [], pressure prediction [], etc. A problem in which the number of samples in one class outnumbers the number of samples in another class is considered a class-imbalanced (asymmetric) problem. The class with the greater number of instances is the majority class, and the class with fewer instances is the minority class. In real-world problems, the minority class instances usually have more importance than those of the majority class.
Traditional classifiers such as the support vector machine (SVM), Naive Bayes, the decision tree, and the extreme learning machine (ELM) are biased towards the correct classification of the majority class. Various approaches have been proposed to handle such class-imbalanced classification problems; they can be categorized as data sampling, algorithmic, and hybrid methods [].
In classification, the idea of using additional data along with the original training data has been widely used for better training of the model. The virtual example method, the oversampling method, the noise injection method, and the Universum data creation method are some examples that use additional data. The oversampling method generates additional data in the minority class to balance the data distribution between the classes. In the virtual example and noise injection methods, labeled synthetic data are created that may not come from the same distribution as the original data. Universum data creation methods allow the classifier to encode prior knowledge by representing meaningful concepts in the same domain as the problem at hand, as stated in []. In Universum learning-based classification models, the Universum data are added to the training data to enhance performance. Universum data are data that do not belong to any of the target classes. The two main factors that affect the performance of Universum learning are the number of Universum samples created and the method used to create them. Different methods have been used for the creation of Universum data; among these, the two most widely used are the use of examples from other classes and random averaging [].
Several methods have been proposed that use Universum data in the training of SVM-based classifiers to handle the class imbalance problem, such as the Universum Support Vector Machine (USVM) [], the Twin Support Vector Machine with Universum data (TUSVM) [], and the Cost-Sensitive Universum-SVM (CS-USVM) []. A Universum support vector machine-based model for EEG signal classification was proposed in []. A nonparallel support vector machine for classification problems with Universum learning was proposed in []. An improved non-parallel Universum support vector machine and its safe sample screening rule were proposed in []. Tencer et al. [] used Universum data with other classifiers, such as fuzzy models, to demonstrate its usefulness in combination with them. Recently, the Multiple Universum Empirical Kernel Learning (MUEKL) [] classifier was proposed to handle class imbalance by combining Universum learning with Multiple Empirical Kernel Learning (MEKL).
The Extreme Learning Machine (ELM) [] is a single-hidden-layer feed-forward neural network designed for regression and classification, offering fast training speed and good generalization performance; however, it cannot handle class-imbalanced classification problems effectively. Various ELM-based models have been proposed to handle such problems, such as the Weighted Extreme Learning Machine (WELM) [], the Class-Specific Cost Regulation Extreme Learning Machine (CCR-ELM) [], the Class-Specific Kernelized Extreme Learning Machine (CSKELM) [], the Reduced-Kernelized Weighted Extreme Learning Machine (RKWELM) [], the UnderBagging-based Kernelized Weighted Extreme Learning Machine (UBKWELM) [], and the UnderBagging-based Reduced-Kernelized Weighted Extreme Learning Machine (UBRKWELM) []. The proposed work is motivated by the observation that none of the existing ELM-based classification models encode prior knowledge in the training model using Universum data.
This work proposes a novel hybrid classification model called the Reduced-Kernel Weighted Extreme Learning Machine using Universum data in Feature Space (RKWELM-UFS), which incorporates Universum data into the RKWELM model. The contributions of the proposed approach are listed below.
  • This work is the first attempt to utilize Universum data in a Reduced-Kernelized Weighted Extreme Learning Machine (RKWELM)-based classification model to handle the class imbalance problem.
  • The Weighted Kernelized Synthetic Minority Oversampling Technique (WKSMOTE) [] is an oversampling-based classification method in which the synthetic samples are created in the feature space of the Support Vector Machine (SVM). Inspired by WKSMOTE, the proposed work creates the Universum samples in the feature space.
  • The proposed method uses the kernel trick to create the Universum samples in the feature space between randomly selected instances of the majority and minority classes.
  • In a classification problem, the samples located near the decision boundary contribute more to better training. The creation of Universum samples in feature space ensures that the Universum samples lie near the decision boundary.
The rest of the paper is structured as follows. The related work section discusses Universum learning, class imbalance learning, the ELM classifier, and its variants in detail. The proposed work section provides a detailed explanation of the proposed RKWELM-UFS classifier. The experimental setup and result analysis section provides the specification of the datasets used in the experiments, the parameter settings of the proposed algorithm, the evaluation metrics used for performance evaluation, and the experimental results in the form of various tables and figures. The last section provides concluding remarks and future research directions.

3. Proposed Method

This work proposes a novel Reduced-Kernel Weighted Extreme Learning Machine using Universum data in Feature Space (RKWELM-UFS) to handle the class imbalance classification problem. In the proposed work, the Universum data are provided to the classifier for training along with the original training data, to improve its learning capability. The proposed method creates the Universum samples in the feature space because the mapping of input data from the input space to the feature space is not conformal.
The following subsections describe the process of creation of the Universum samples in the input space, the process of creation of the Universum samples in the feature space, the proposed RKWELM-UFS classifier, and the computational complexity of the proposed RKWELM-UFS classification model. Algorithm 1 provides the pseudo-code of the proposed RKWELM-UFS.

3.1. Generation of Universum Samples in the Input Space

To generate a Universum sample $x_u$ between a majority sample $x_m$ and a minority sample $x_n$, the following equation can be used:
$$x_u = x_m + \delta (x_n - x_m)$$
where $\delta$ represents a random number drawn from the uniform distribution $U[0, 1]$.
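For illustration, a minimal NumPy sketch of this input-space construction might look as follows (the function name is our own, not the paper's):

```python
import numpy as np

def universum_input_space(x_m, x_n, rng=None):
    # x_u = x_m + delta * (x_n - x_m), with delta drawn from U[0, 1]
    rng = np.random.default_rng() if rng is None else rng
    return x_m + rng.uniform(0.0, 1.0) * (x_n - x_m)
```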

3.2. Generation of Universum Samples in the Feature Space

To generate a Universum sample in the feature space between a majority sample $x_m$ and a minority sample $x_n$, the following equation can be utilized:
$$\phi(x_{mn}) = \phi(x_m) + \delta_{mn}\big(\phi(x_n) - \phi(x_m)\big)$$
where $\phi(\cdot)$ is the feature transformation function, which is generally unknown, and $\delta_{mn}$ is a random number in [0, 1]; the proposed work uses $\delta_{mn} = 0.5$. As in SVM, LS-SVM, and PSVM, the transformation function $\phi(\cdot)$ need not be known to the user; instead, its kernel function $K(x_m, x_n)$ can be deployed. If the feature mapping $\phi(\cdot)$ is unknown, one can apply Mercer's conditions on ELM to define the kernel matrix for KELM [] as follows:
$$\Omega_{KELM} = HH^{T}: \quad \Omega_{KELM}(m, n) = K(x_m, x_n) = \phi(x_m)^{T} \phi(x_n)$$
In the proposed work, we have to calculate the kernel function $K(x_i, x_j^{mn})$, where $x_i$ is an original target training sample and $x_j^{mn}$ is a Universum sample. According to [], without computing $\phi(x_i)$ and $\phi(x_j^{mn})$, we can obtain the corresponding kernel value $K(x_i, x_j^{mn})$ as follows:
$$\begin{aligned}
K(x_j^{mn}, x_i) &= \phi(x_i)^{T} \phi(x_j^{mn}) \\
&= \phi(x_i)^{T} \big(\phi(x_m) + \delta_{mn}(\phi(x_n) - \phi(x_m))\big) \\
&= \phi(x_i)^{T} \phi(x_m) + \delta_{mn}\, \phi(x_i)^{T} \phi(x_n) - \delta_{mn}\, \phi(x_i)^{T} \phi(x_m) \\
&= K(x_i, x_m) + \delta_{mn} K(x_i, x_n) - \delta_{mn} K(x_i, x_m) \\
K(x_j^{mn}, x_i) &= (1 - \delta_{mn}) K(x_i, x_m) + \delta_{mn} K(x_i, x_n)
\end{aligned}$$
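Because the final expression uses only kernel values, a Universum kernel entry can be computed without ever evaluating $\phi(\cdot)$. A minimal NumPy sketch (the vectorized form and names are our own):

```python
import numpy as np

def universum_kernel_row(k_im, k_in, delta_mn=0.5):
    # K(x_j^mn, x_i) = (1 - delta_mn) K(x_i, x_m) + delta_mn K(x_i, x_n),
    # evaluated for all training points x_i at once; k_im and k_in hold the
    # precomputed kernel vectors [K(x_i, x_m)]_i and [K(x_i, x_n)]_i.
    return (1.0 - delta_mn) * np.asarray(k_im) + delta_mn * np.asarray(k_in)
```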

3.3. Proposed Reduced-Kernel Weighted Extreme Learning Machine Using Universum Samples in Feature Space (RKWELM-UFS)

Training of an ELM []-based classifier requires the computation of the output-layer weight matrix $\beta$. The proposed RKWELM-UFS uses the same equation as RKWELM [] to obtain $\beta$, which is reproduced below:
$$\beta = \left(\frac{I}{C} + \Omega_{RKELM-UFS}^{T}\, W\, \Omega_{RKELM-UFS}\right)^{-1} \Omega_{RKELM-UFS}^{T}\, W\, T$$
where W is the diagonal weight matrix, which assigns different weights to the majority-class, minority-class, and Universum instances using Equation (10); T is the target vector, in which the class label of the Universum samples is set to 0 (given that the class labels of the majority and minority classes are +1 and −1, respectively); and $\Omega_{RKELM-UFS}$ is the kernel matrix of the proposed RKWELM-UFS.
In the proposed work, the Universum instances are added to the training process along with the original training instances. $\beta$ can be computed in the same manner as in RKWELM because the proposed RKWELM-UFS computes the kernel matrix $\Omega_{RKELM-UFS}$ using only the original training instances (excluding the Universum instances) as centroids. $\Omega_{RKELM-UFS}$ is obtained by augmenting the two matrices $\Omega_{KELM}$ and $\Omega_{UFS}$. The following subsections describe the computation of $\Omega_{KELM}$, $\Omega_{UFS}$, and $\Omega_{RKELM-UFS}$.

3.3.1. Computation of $\Omega_{KELM}$

The proposed work computes the kernel matrix for the N original training instances, termed $\Omega_{KELM}$, in the same manner as in KELM [], which is represented as:
$$\Omega_{KELM} = \begin{bmatrix}
K(x_1, x_1) & K(x_1, x_2) & \cdots & K(x_1, x_N) \\
K(x_2, x_1) & K(x_2, x_2) & \cdots & K(x_2, x_N) \\
\vdots & \vdots & \ddots & \vdots \\
K(x_N, x_1) & K(x_N, x_2) & \cdots & K(x_N, x_N)
\end{bmatrix}_{N \times N}$$

3.3.2. Computation of $\Omega_{UFS}$

Equation (20) can be used to create a Universum sample $\phi(x_{mn})$ between two original training samples $\phi(x_m)$ and $\phi(x_n)$ in the feature space. As discussed, the transformation function $\phi(\cdot)$ is unknown to the user, so $\phi(x_{mn})$ cannot be computed directly. For convenience, we refer to the Universum sample $\phi(x_{mn})$ as $\phi(u_i)$. In the proposed work, without computing $\phi(u_i)$, we can directly compute the corresponding kernel value $K(u_i, x_j)$ using Equation (22). In the proposed algorithm, only the original training samples are used as centroids, so the matrix $\Omega_{UFS}$ for p Universum samples and N original training samples can be represented as:
$$\Omega_{UFS} = \begin{bmatrix}
K(u_1, x_1) & K(u_1, x_2) & \cdots & K(u_1, x_N) \\
K(u_2, x_1) & K(u_2, x_2) & \cdots & K(u_2, x_N) \\
\vdots & \vdots & \ddots & \vdots \\
K(u_p, x_1) & K(u_p, x_2) & \cdots & K(u_p, x_N)
\end{bmatrix}_{p \times N}$$
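Since the centroids are the original training points, each row of $\Omega_{UFS}$ is a convex combination of two rows of $\Omega_{KELM}$, which the following hypothetical helper makes explicit (a sketch, not the paper's implementation):

```python
import numpy as np

def build_omega_ufs(omega_kelm, maj_idx, min_idx, p, delta_mn=0.5, rng=None):
    # Assemble the p x N matrix of Eq. (25): each row is the kernel row of one
    # Universum sample placed between a randomly chosen majority/minority pair,
    # obtained via Eq. (22) without evaluating the feature map phi(.).
    rng = np.random.default_rng() if rng is None else rng
    rows = []
    for _ in range(p):
        m = rng.choice(maj_idx)   # random majority instance index
        n = rng.choice(min_idx)   # random minority instance index
        rows.append((1.0 - delta_mn) * omega_kelm[m] + delta_mn * omega_kelm[n])
    return np.vstack(rows)
```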

3.3.3. Computation of $\Omega_{RKELM-UFS}$

The addition of Universum samples to the training process requires that the original kernel matrix $\Omega_{KELM}$ be augmented with the matrix $\Omega_{UFS}$. The final hidden-layer output kernel matrix of the proposed RKWELM-UFS, denoted $\Omega_{RKELM-UFS}$, is obtained by this augmentation:
$$\Omega_{RKELM-UFS} = \begin{bmatrix}
K(x_1, x_1) & K(x_1, x_2) & \cdots & K(x_1, x_N) \\
\vdots & \vdots & \ddots & \vdots \\
K(x_N, x_1) & K(x_N, x_2) & \cdots & K(x_N, x_N) \\
K(u_1, x_1) & K(u_1, x_2) & \cdots & K(u_1, x_N) \\
\vdots & \vdots & \ddots & \vdots \\
K(u_p, x_1) & K(u_p, x_2) & \cdots & K(u_p, x_N)
\end{bmatrix}_{(N + p) \times N}$$
The output of RKWELM-UFS can be obtained using Equation (18) of RKWELM, which is reproduced below:
$$f(x_t) = \mathrm{sign}\left(\begin{bmatrix} K(x_t, x_1) \\ \vdots \\ K(x_t, x_N) \end{bmatrix}^{T} \left(\frac{I}{C} + \Omega_{RKELM-UFS}^{T}\, W\, \Omega_{RKELM-UFS}\right)^{-1} \Omega_{RKELM-UFS}^{T}\, W\, T\right)$$
Here, $x_t$ represents a test instance and $x_i$, i = 1, 2, …, N, represent the training instances.
Algorithm 1 Pseudocode of the proposed RKWELM-UFS
INPUT: Training dataset $X = \{(x_i, t_i)\}_{i=1}^{N}$
Number of Universum samples to be generated: p
OUTPUT: Output weight matrix $\beta$; predicted class labels
1: Calculate the kernel matrix $\Omega_{KELM} \in \mathbb{R}^{N \times N}$ shown in Equation (24) for the N original training instances using Equation (21).
2: Calculate the kernel matrix $\Omega_{UFS} \in \mathbb{R}^{p \times N}$ shown in Equation (25) for the N training instances and p Universum instances as follows:
       for j = 1 to p
          randomly select one majority instance $x_m$
          randomly select one minority instance $x_n$
          for i = 1 to N
             calculate $K(x_j^{mn}, x_i)$ using Equation (22)
          end
       end
3: Augment the matrix $\Omega_{KELM}$ with the matrix $\Omega_{UFS}$ to obtain the reduced kernel matrix using Universum samples, $\Omega_{RKELM-UFS}$, shown in Equation (26).
4: Obtain the output weight matrix $\beta$ using Equation (23).
5: Determine the class label of a test instance $x_t$ using Equation (27).
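A consolidated NumPy sketch of Algorithm 1 is given below, reusing build_omega_ufs from the sketch in Section 3.3.2. The Gaussian-kernel parametrisation and the weighting scheme are assumptions: the paper's Equation (10) is not reproduced in this section, so a common WELM choice (1 / class size) plus a user-set Universum weight is used instead:

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    # Gaussian kernel K(a, b) = exp(-||a - b||^2 / sigma); the exact
    # parametrisation of the kernel width sigma (KP) is an assumption.
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / sigma)

def train_rkwelm_ufs(X, t, p, C, sigma, universum_weight=1.0, rng=None):
    # Steps 1-4 of Algorithm 1; t holds class labels +1 (majority) / -1 (minority).
    rng = np.random.default_rng() if rng is None else rng
    N = X.shape[0]
    maj_idx, min_idx = np.where(t == 1)[0], np.where(t == -1)[0]
    omega_kelm = rbf_kernel(X, X, sigma)                       # Step 1, Eq. (24)
    omega_ufs = build_omega_ufs(omega_kelm, maj_idx, min_idx,
                                p, rng=rng)                    # Step 2, Eq. (25)
    omega = np.vstack([omega_kelm, omega_ufs])                 # Step 3, Eq. (26)
    # Diagonal of the assumed weight matrix W (per-class weight, Universum weight).
    w = np.concatenate([np.where(t == 1, 1.0 / len(maj_idx), 1.0 / len(min_idx)),
                        np.full(p, universum_weight)])
    T = np.concatenate([t.astype(float), np.zeros(p)])         # Universum label 0
    A = np.eye(N) / C + omega.T @ (w[:, None] * omega)
    beta = np.linalg.solve(A, omega.T @ (w * T))               # Step 4, Eq. (23)
    return beta

def predict_rkwelm_ufs(X_train, beta, X_test, sigma):
    # Step 5, Eq. (27): sign of the kernel expansion over the N training centroids.
    return np.sign(rbf_kernel(X_test, X_train, sigma) @ beta)
```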

3.4. Computational Complexity

For training an ELM-based classification algorithm, it is necessary to obtain the output-layer weight matrix $\beta$. For the proposed RKWELM-UFS, $\beta$ is obtained using Equation (23), which is reproduced below:
$$\beta = \left(\frac{I}{C} + \Omega_{RKELM-UFS}^{T}\, W\, \Omega_{RKELM-UFS}\right)^{-1} \Omega_{RKELM-UFS}^{T}\, W\, T$$
Here, $\Omega_{RKELM-UFS}$ is a matrix of size $(N + p) \times N$, where N is the number of training instances and p is the number of Universum samples. The weight matrix W is of size $(N + p) \times (N + p)$, and the target matrix T is of size $(N + p) \times c$, where c is the number of target class labels; here c = 2, because binary classification problems are considered. To compute $\Omega_{RKELM-UFS}$, we first need to compute $\Omega_{KELM}$ and $\Omega_{UFS}$. The computational complexity of computing $\beta$ is identified step by step below:
  • The computational complexity of calculating $\Omega_{KELM}$, i.e., the kernel matrix shown in Equation (24), is $O(nN^2)$, where n is the number of features of the training data in the input space.
  • The computational complexity of calculating the matrix $\Omega_{UFS}$ shown in Equation (25) is $O(p)$.
  • The computational complexity of the output weights $\beta$ can be calculated as:
    3.1 Matrix multiplication $\Omega_{RKELM-UFS}^{T}\, W\, \Omega_{RKELM-UFS}$: computational complexity $O(2N(N + p)^2)$.
    3.2 Inverting the $N \times N$ matrix computed in Step 3.1: computational complexity $O(N^3)$.
    3.3 Matrix multiplications $\Omega^{T} W T$: computational complexity $O(N^2(N + p) + Nc(N + p))$.
    3.4 Multiplying the two matrices obtained in Steps 3.1 and 3.3: computational complexity $O(N^2 c)$.
The final computational complexity of calculating $\beta$ is $O(2N(N + p)^2 + N^3 + N^2(N + p) + nN^2 + cN^2 + Nc(N + p) + p)$, which can be simplified to $O(N^3)$ because c = 2, n is smaller than N, and the maximum value of p is N.

4. Experimental Setup and Result Analysis

This section describes the experiments performed to evaluate the proposed work, including the specification of the datasets used for experimentation, the parameter settings of the proposed algorithm, the evaluation metrics used for performance comparison, and the experimental results along with a performance comparison with state-of-the-art classifiers.

4.1. Dataset Specifications

The proposed work uses 44 binary class-imbalanced datasets for the experiments. These datasets were downloaded from the KEEL dataset repository [,] in 5-fold cross-validation format. Table 2 provides the specification of these datasets; # Attributes denotes the number of features, # Instances denotes the number of instances, and IR denotes the class imbalance ratio. The class imbalance ratio (IR) for a binary-class dataset is defined as follows:
$$IR = \frac{\text{number of instances in the majority class}}{\text{number of instances in the minority class}}$$
Table 2. Specification of 44 benchmark datasets from KEEL dataset repository.
The datasets used for the experiments are normalized to the range [−1, 1] using min-max normalization:
$$x' = \frac{x - min_n}{max_n - min_n} \times 2 - 1$$
Here, x denotes the original value of the nth feature, $min_n$ its minimum value, and $max_n$ its maximum value.
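A minimal NumPy sketch of this per-feature scaling (with a guard for constant features, which is our addition and not discussed in the paper):

```python
import numpy as np

def minmax_normalize(X):
    # Per-feature scaling to [-1, 1]: x' = (x - min_n) / (max_n - min_n) * 2 - 1.
    mn, mx = X.min(axis=0), X.max(axis=0)
    span = np.where(mx > mn, mx - mn, 1.0)   # avoid division by zero
    return (X - mn) / span * 2.0 - 1.0
```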

4.2. Evaluation Metrics

The confusion matrix, also called the error matrix, can be employed to evaluate the performance of a classification model, as it allows the performance of an algorithm to be visualized. In a confusion matrix, TP denotes True Positives, TN denotes True Negatives, FP denotes False Positives, and FN denotes False Negatives.
Accuracy is not a suitable measure of classifier performance on class-imbalanced problems. The performance metrics used instead are the G-mean and the AUC (area under the ROC curve). The AUC measures the entire two-dimensional area under the ROC (receiver operating characteristic) curve, a graph that shows the performance of a model by plotting the true positive rate against the false positive rate.
$$G\text{-}mean = \sqrt{TP_{rate} \times TN_{rate}}, \qquad AUC = \frac{TP_{rate} + TN_{rate}}{2}$$
Here,
$$TP_{rate} = \frac{TP}{TP + FN} \quad \text{and} \quad TN_{rate} = \frac{TN}{TN + FP}$$
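For concreteness, both metrics follow directly from the confusion-matrix counts; for example, TP = 40, FN = 10, TN = 90, FP = 10 gives $TP_{rate} = 0.8$, $TN_{rate} = 0.9$, a G-mean of about 0.849, and an AUC of 0.85. A minimal sketch:

```python
import numpy as np

def gmean_and_auc(tp, fn, tn, fp):
    # Compute the G-mean and single-point AUC from confusion-matrix counts.
    tp_rate = tp / (tp + fn)
    tn_rate = tn / (tn + fp)
    return np.sqrt(tp_rate * tn_rate), (tp_rate + tn_rate) / 2.0

print(gmean_and_auc(40, 10, 90, 10))   # -> (0.8485..., 0.85)
```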

4.3. Parameter Settings

The proposed RKWELM-UFS creates Universum samples between randomly selected pairs of majority and minority samples. Because of this randomness, this work reports the mean (denoted tstR or TestResult) and standard deviation (denoted std) of the test G-mean and test AUC over 10 trials. The proposed RKWELM-UFS has two parameters, the regularization parameter C and the kernel width parameter σ (denoted KP). Their optimal values are obtained by grid search, varying them over the ranges $\{2^{-18}, 2^{-16}, \ldots, 2^{48}, 2^{50}\}$ and $\{2^{-18}, 2^{-16}, \ldots, 2^{18}, 2^{20}\}$, respectively.
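A minimal sketch of such a grid search is shown below; evaluate stands in for the 5-fold cross-validated scoring routine, which is assumed rather than taken from the paper:

```python
# Exponent grids: 2^-18, 2^-16, ..., 2^50 for C and 2^-18, ..., 2^20 for sigma.
C_grid = [2.0 ** e for e in range(-18, 51, 2)]
sigma_grid = [2.0 ** e for e in range(-18, 21, 2)]

def grid_search(X, t, evaluate):
    # evaluate(X, t, C, sigma) -> mean cross-validated G-mean or AUC (assumed helper)
    scores = {(C, s): evaluate(X, t, C, s) for C in C_grid for s in sigma_grid}
    return max(scores, key=scores.get)   # best (C, sigma) pair
```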

4.4. Experimental Results and Performance Comparison

The proposed RKWELM-UFS is compared with three sets of algorithms used to handle class imbalance learning. The first set contains existing approaches that use Universum samples in classification model generation to handle class-imbalanced problems, namely MUEKL [] and USVM []. The second set consists of single classifiers, namely KELM [], WKELM [], CCR-KELM [], and WKSMOTE []. The third set contains popular ensemble classifiers, namely RUSBoost [], BWELM [], UBRKELM-MV [], UBRKELM-SV [], UBKELM-MV [], and UBKELM-SV [].
The statistical t-test and the Wilcoxon signed-rank test are used to evaluate the performance of the proposed RKWELM-UFS against the other methods under consideration. In the t-test result, the value of H (null hypothesis) is 1 if the test rejects the null hypothesis at the 5% significance level, and 0 otherwise.
In the Wilcoxon signed-rank test result, the value of H is 1 if the test rejects, at the 5% significance level, the null hypothesis that there is no difference between the medians. In the statistical tests, the p-value indicates the level of significant difference between the compared algorithms; the lower the p-value, the more significant the difference. This work uses AUC and the G-mean as the performance evaluation measures. The AUC results of the classifiers MUEKL and USVM shown in Table 3 are taken from the MUEKL work [].
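Both tests are available in SciPy; a minimal sketch of the per-dataset comparison (array names are hypothetical) might be:

```python
from scipy.stats import ttest_rel, wilcoxon

def compare_classifiers(scores_proposed, scores_other, alpha=0.05):
    # Paired tests over per-dataset AUC (or G-mean) scores of two classifiers.
    _, p_t = ttest_rel(scores_proposed, scores_other)
    _, p_w = wilcoxon(scores_proposed, scores_other)
    # H = 1 means the null hypothesis is rejected at the 5% significance level.
    return {"t-test": (int(p_t < alpha), p_t),
            "wilcoxon": (int(p_w < alpha), p_w)}
```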
Table 3. Performance comparison of the proposed RKWELM-UFS with other existing Universum-based classifiers in terms of average AUC (std, KP, and C denote the standard deviation, Kernel width parameter, and regularization parameter, respectively).

4.4.1. Performance Analysis in Terms of AUC

Table 3, Table 4 and Table 5 provide the performance of the proposed RKWELM-UFS and the other classification models in terms of AUC. The reported test AUC of the proposed RKWELM-UFS in Table 3, Table 4 and Table 5 is the average test AUC over 10 trials, using 5-fold cross-validation in each trial. Table 3 compares the proposed RKWELM-UFS with the existing Universum-based classifiers MUEKL and USVM on 35 datasets in terms of average AUC; the proposed RKWELM-UFS outperforms the other classifiers on 32 datasets. Table 4 compares the proposed RKWELM-UFS with the existing single classifiers KELM, WKELM, CCR-KELM, and WKSMOTE on 21 datasets in terms of average AUC; the proposed RKWELM-UFS outperforms the other classifiers on 14 datasets. Table 5 compares the proposed RKWELM-UFS with the existing ensemble classifiers RUSBoost, BWELM, UBRKELM-MV, UBRKELM-SV, UBKELM-MV, and UBKELM-SV on 21 datasets in terms of average AUC; the proposed RKWELM-UFS outperforms the other classifiers on 10 datasets.
Table 4. Performance comparison of the proposed RKWELM-UFS with existing single classifiers in terms of average AUC (std, KP, and C denote the standard deviation, Kernel width parameter, and regularization parameter, respectively).
Table 5. Performance comparison of the proposed RKWELM-UFS with existing ensemble classifiers in terms of average AUC (std, KP, and C denote the standard deviation, Kernel width parameter, and regularization parameter, respectively).
Figure 1a–c shows the boxplot diagrams for the AUC results of the classifiers on the datasets in Table 3, Table 4 and Table 5, respectively. The boxplots create a visual representation of the results. It can be seen in Figure 1a,b that the proposed RKWELM-UFS has the highest median value and smallest interquartile range, which shows that RKWELM-UFS performs better than MUEKL, USVM, KELM, WKELM, CCR-KELM, and WKSMOTE. It can be seen in Figure 1c that RKWELM-UFS performs better than RUSBoost. Table 6 provides the t-test results and Table 7 provides the Wilcoxon signed-rank test results on the AUC of the algorithms compared in Table 3, Table 4 and Table 5. The results in Table 6 and Table 7 suggest that the proposed RKWELM-UFS performs significantly better than MUEKL, USVM, KELM, WKELM, CCR-KELM, RUSBoost, and BWELM, and that its performance is approximately similar to that of WKSMOTE, UBRKELM-MV, UBRKELM-SV, UBKELM-MV, and UBKELM-SV in terms of AUC.
Figure 1. Boxplot diagrams; each box visually represents the performance in terms of average AUC of algorithms labeled on X axis. (a) Boxplot for results of Table 3. (b) Boxplot for results given in Table 4. (c) Boxplot for results given in Table 5.
Table 6. T-test results for performance comparison in terms of AUC between the methods given in Table 3, Table 4 and Table 5.
Table 7. Wilcoxon Signed-rank test results for performance comparison in terms of AUC between the methods given in Table 3, Table 4 and Table 5.

4.4.2. Performance Analysis in Terms of G-mean

Table 8 and Table 9 provide the performance of the proposed RKWELM-UFS and the other classification models in terms of the G-mean. The reported test G-mean of the proposed RKWELM-UFS in Table 8 and Table 9 is the average test G-mean over 10 trials, using 5-fold cross-validation in each trial. Table 8 compares the proposed RKWELM-UFS with the existing single classifiers KELM, WKELM, CCR-KELM, and WKSMOTE on 21 datasets in terms of average G-mean; the proposed RKWELM-UFS outperforms the other classifiers on 16 datasets. Table 9 compares the proposed RKWELM-UFS with the existing ensemble classifiers RUSBoost, BWELM, UBRKELM-MV, UBRKELM-SV, UBKELM-MV, and UBKELM-SV on 21 datasets in terms of average G-mean; the proposed RKWELM-UFS outperforms the other classifiers on seven datasets.
Table 8. Performance comparison of the proposed RKWELM-UFS with existing single classifiers in terms of average G-mean (std, KP, and C denote the standard deviation, Kernel width parameter, and regularization parameter, respectively).
Table 9. Performance comparison of the proposed RKWELM-UFS with existing ensemble classifiers in terms of average G-mean (std, KP, and C denote the standard deviation, Kernel width parameter, and regularization parameter, respectively).
Figure 2a,b shows the boxplot diagrams for the G-mean results of the classifiers on the datasets in Table 8 and Table 9, respectively. It can be seen in Figure 2a that the proposed RKWELM-UFS has the highest median value and smallest interquartile range, which shows that RKWELM-UFS performs better than KELM, WKELM, CCR-KELM, and WKSMOTE in terms of the G-mean. It can be seen in Figure 2b that RKWELM-UFS performs better than RUSBoost and BWELM in terms of the G-mean. Table 10 provides the t-test results and Table 11 provides the Wilcoxon signed-rank test results on the G-mean of the algorithms compared in Table 8 and Table 9. The results in Table 10 and Table 11 suggest that the proposed RKWELM-UFS performs significantly better than KELM, CCR-KELM, WKSMOTE, and RUSBoost, and performs approximately similarly to WKELM, BWELM, UBRKELM-MV, UBRKELM-SV, UBKELM-MV, and UBKELM-SV in terms of the G-mean.
Figure 2. Boxplot diagrams; each box visually represents the performance in terms of average G-mean of algorithms labelled on X axis. (a) Boxplot for G-mean results given in Table 8. (b) Boxplot for G-mean results given in Table 9.
Table 10. T-test results for performance comparison in terms of G-mean between the methods given in Table 8 and Table 9.
Table 11. Wilcoxon signed-rank test results for performance comparison in terms of G-mean between the methods given in Table 8 and Table 9.

5. Conclusions and Future Work

The use of additional data for training along with the original training data has been employed in many approaches. Universum data are used to add prior knowledge about the distribution of the data to the classification model. Various ELM-based classification models have been suggested to handle the class imbalance problem, but none of these models use such prior knowledge. The proposed RKWELM-UFS is the first attempt to employ Universum data to enhance the performance of the RKWELM classifier. This work generates the Universum samples in the feature space using the kernel trick; the reason for creating the Universum instances in the feature space is that the mapping of input data to the feature space is not conformal. The proposed work is evaluated on 44 benchmark datasets with imbalance ratios between 0.45 and 43.80 and instance counts between 129 and 2308. The proposed method is compared with 10 state-of-the-art methods used for class-imbalanced dataset classification, with the G-mean and AUC as the evaluation metrics. The paper also incorporates statistical tests to verify the significance of the performance differences between the proposed and compared methods.
In Universum data-based learning, it has been observed that the efficiency of such classifiers depends on the quality and volume of Universum data created. The methodology of choosing or creating the appropriate Universum samples should be the subject of further research. In the proposed work, the Universum samples are created between randomly selected pairs of majority and minority class samples. In the future, some strategic concepts can be used to select the majority and minority samples instead of random selection. In the future, Universum data can be incorporated in other ELM-based classification models to enhance their learning capability on class imbalance problems. The future work also includes the development of a multi-class variant of the proposed RKWELM-UFS.

Author Contributions

Conceptualization, R.C.; methodology, R.C. and S.S.; software, R.C.; validation, R.C. and S.S.; formal analysis, R.C.; investigation, R.C.; resources and data curation, R.C.; writing—original draft preparation, R.C.; writing—review and editing, S.S.; supervision, S.S. All authors have read and agreed to the published version of the manuscript.

Funding

The work received no funding from any organization, institute, or person.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Schaefer, G.; Nakashima, T. Strategies for addressing class imbalance in ensemble classification of thermography breast cancer features. In Proceedings of the 2015 IEEE Congress on Evolutionary Computation (CEC), Sendai, Japan, 25–28 May 2015; pp. 2362–2367. [Google Scholar]
  2. Sadewo, W.; Rustam, Z.; Hamidah, H.; Chusmarsyah, A.R. Pancreatic Cancer Early Detection Using Twin Support Vector Machine Based on Kernel. Symmetry 2020, 12, 667. [Google Scholar] [CrossRef] [Green Version]
  3. Hao, W.; Liu, F. Imbalanced Data Fault Diagnosis Based on an Evolutionary Online Sequential Extreme Learning Machine. Symmetry 2020, 12, 1204. [Google Scholar] [CrossRef]
  4. Mulyanto, M.; Faisal, M.; Prakosa, S.W.; Leu, J.-S. Effectiveness of Focal Loss for Minority Classification in Network Intrusion Detection Systems. Symmetry 2020, 13, 4. [Google Scholar] [CrossRef]
  5. Tahvili, S.; Hatvani, L.; Ramentol, E.; Pimentel, R.; Afzal, W.; Herrera, F. A novel methodology to classify test cases using natural language processing and imbalanced learning. Eng. Appl. Artif. Intell. 2020, 95, 103878. [Google Scholar] [CrossRef]
  6. Furundzic, D.; Stankovic, S.; Jovicic, S.; Punisic, S.; Subotic, M. Distance based resampling of imbalanced classes: With an application example of speech quality assessment. Eng. Appl. Artif. Intell. 2017, 64, 440–461. [Google Scholar] [CrossRef]
  7. Mariani, V.C.; Och, S.H.; dos Santos Coelho, L.; Domingues, E. Pressure prediction of a spark ignition single cylinder engine using optimized extreme learning machine models. Appl. Energy 2019, 249, 204–221. [Google Scholar] [CrossRef]
  8. He, H.; Garcia, E.A. Learning from imbalanced data. IEEE Trans. Knowl. Data Eng. 2009, 21, 1263–1284. [Google Scholar]
  9. Weston, J.; Collobert, R.; Sinz, F.; Bottou, L.; Vapnik, V. Inference with the universum. In Proceedings of the 23rd International Conference on Machine Learning, New York, NY, USA, 25 June 2006; pp. 1009–1016. [Google Scholar]
  10. Qi, Z.; Tian, Y.; Shi, Y.J. Twin support vector machine with Universum data. Neural Netw. 2012, 36, 112–119. [Google Scholar] [CrossRef]
  11. Dhar, S.; Cherkassky, V. Cost-sensitive Universum-svm. In Proceedings of the 2012 11th International Conference on Machine Learning and Applications, Boca Raton, FL, USA, 12–15 December 2012; pp. 220–225. [Google Scholar]
  12. Richhariya, B.; Tanveer, M.J. EEG signal classification using Universum support vector machine. Expert Syst. Appl. 2018, 106, 169–182. [Google Scholar] [CrossRef]
  13. Qi, Z.; Tian, Y.; Shi, Y. A nonparallel support vector machine for a classification problem with Universum learning. J. Comput. Appl. Math. 2014, 263, 288–298. [Google Scholar] [CrossRef]
  14. Zhao, J.; Xu, Y.; Fujita, H.J. An improved non-parallel Universum support vector machine and its safe sample screening rule. Knowl. Based Syst. 2019, 170, 79–88. [Google Scholar] [CrossRef]
  15. Tencer, L.; Reznáková, M.; Cheriet, M.J. Ufuzzy: Fuzzy models with Universum. Appl. Soft Comput. 2017, 59, 1–18. [Google Scholar] [CrossRef]
  16. Wang, Z.; Hong, S.; Yao, L.; Li, D.; Du, W.; Zhang, J. Multiple universum empirical kernel learning. Eng. Appl. Artif. Intell. 2020, 89, 103461. [Google Scholar] [CrossRef]
  17. Huang, G.-B.; Zhu, Q.-Y.; Siew, C.-K. Extreme learning machine: Theory and applications. Neurocomputing 2006, 70, 489–501. [Google Scholar] [CrossRef]
  18. Zong, W.; Huang, G.-B.; Chen, Y.J. Weighted extreme learning machine for imbalance learning. Neurocomputing 2013, 101, 229–242. [Google Scholar] [CrossRef]
  19. Xiao, W.; Zhang, J.; Li, Y.; Zhang, S.; Yang, W. Class-specific cost regulation extreme learning machine for imbalanced classification. Neurocomputing 2017, 261, 70–82. [Google Scholar] [CrossRef]
  20. Raghuwanshi, B.S.; Shukla, S. Class-specific kernelized extreme learning machine for binary class imbalance learning. Appl. Soft Comput. 2018, 73, 1026–1038. [Google Scholar] [CrossRef]
  21. Raghuwanshi, B.S.; Shukla, S. Underbagging based reduced kernelized weighted extreme learning machine for class imbalance learning. Eng. Appl. Artif. Intell. 2018, 74, 252–270. [Google Scholar] [CrossRef]
  22. Raghuwanshi, B.S.; Shukla, S. Class imbalance learning using UnderBagging based kernelized extreme learning machine. Neurocomputing 2019, 329, 172–187. [Google Scholar] [CrossRef]
  23. Mathew, J.; Pang, C.K.; Luo, M.; Leong, W.H. Classification of imbalanced data by oversampling in kernel space of support vector machines. IEEE Trans. Neural Netw. Learn. Syst. 2017, 29, 4065–4076. [Google Scholar] [CrossRef]
  24. Chen, S.; Zhang, C. Selecting informative Universum sample for semi-supervised learning. In Proceedings of the Twenty-First International Joint Conference on Artificial Intelligence, Pasadena, CA, USA, 11–17 July 2009. [Google Scholar]
  25. Zhao, J.; Xu, Y.J. A safe sample screening rule for Universum support vector machines. Knowl. Based Syst. 2017, 138, 46–57. [Google Scholar] [CrossRef]
  26. Cherkassky, V.; Dai, W. Empirical study of the Universum SVM learning for high-dimensional data. In Proceedings of the International Conference on Artificial Neural Networks, Limassol, Cyprus, 14–17 September 2009; pp. 932–941. [Google Scholar]
  27. Hamidzadeh, J.; Kashefi, N.; Moradi, M. Combined weighted multi-objective optimizer for instance reduction in two-class imbalanced data problem. Eng. Appl. Artif. Intell. 2020, 90, 103500. [Google Scholar] [CrossRef]
  28. Lin, W.-C.; Tsai, C.-F.; Hu, Y.-H.; Jhang, J.-S. Clustering-based undersampling in class-imbalanced data. Inf. Sci. 2017, 409, 17–26. [Google Scholar] [CrossRef]
  29. Ofek, N.; Rokach, L.; Stern, R.; Shabtai, A. Fast-CBUS: A fast clustering-based undersampling method for addressing the class imbalance problem. Neurocomputing 2017, 243, 88–102. [Google Scholar] [CrossRef]
  30. Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. 2002, 16, 321–357. [Google Scholar] [CrossRef]
  31. Zhu, T.; Lin, Y.; Liu, Y. Synthetic minority oversampling technique for multiclass imbalance problems. Pattern Recognit. 2017, 72, 327–340. [Google Scholar] [CrossRef]
  32. Agrawal, A.; Viktor, H.L.; Paquet, E. SCUT: Multi-class imbalanced data classification using SMOTE and cluster-based undersampling. In Proceedings of the 2015 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K), Lisbon, Portugal, 12–14 November 2015; pp. 226–234. [Google Scholar]
  33. Wang, Z.; Chen, L.; Fan, Q.; Li, D.; Gao, D. Multiple Random Empirical Kernel Learning with Margin Reinforcement for imbalance problems. Eng. Appl. Artif. Intell. 2020, 90, 103535. [Google Scholar] [CrossRef]
  34. Raghuwanshi, B.S.; Shukla, S. Class-specific extreme learning machine for handling binary class imbalance problem. Neural Netw. 2018, 105, 206–217. [Google Scholar] [CrossRef]
  35. Guo, W.; Wang, Z.; Hong, S.; Li, D.; Yang, H.; Du, W. Multi-kernel Support Vector Data Description with boundary information. Eng. Appl. Artif. Intell. 2021, 102, 104254. [Google Scholar] [CrossRef]
  36. Seiffert, C.; Khoshgoftaar, T.M.; Van Hulse, J.; Napolitano, A. RUSBoost: A hybrid approach to alleviating class imbalance. IEEE Trans. Syst. Man Cybern.-Part A Syst. Hum. 2009, 40, 185–197. [Google Scholar] [CrossRef]
  37. Haixiang, G.; Yijing, L.; Yanan, L.; Xiao, L.; Jinling, L. BPSO-Adaboost-KNN ensemble learning algorithm for multi-class imbalanced data classification. Eng. Appl. Artif. Intell. 2016, 49, 176–193. [Google Scholar] [CrossRef]
  38. Shen, C.; Wang, P.; Shen, F.; Wang, H. UBoost: Boosting with the Universum. IEEE Trans. Pattern Anal. Mach. Intell. 2011, 34, 825–832. [Google Scholar] [CrossRef]
  39. Freund, Y.; Schapire, R.; Abe, N.J. A short introduction to boosting. J. Jpn. Soc. Artif. Intell. 1999, 14, 1612. [Google Scholar]
  40. Zhang, Y.; Liu, B.; Cai, J.; Zhang, S. Ensemble weighted extreme learning machine for imbalanced data classification based on differential evolution. Neural Comput. Appl. 2017, 28, 259–267. [Google Scholar] [CrossRef]
  41. Li, K.; Kong, X.; Lu, Z.; Wenyin, L.; Yin, J. Boosting weighted ELM for imbalanced learning. Neurocomputing 2014, 128, 15–21. [Google Scholar] [CrossRef]
  42. Huang, G.B.; Zhou, H.M.; Ding, X.J.; Zhang, R. Extreme Learning Machine for Regression and Multiclass Classification. IEEE Trans. Syst. Man Cybern. Part B 2012, 42, 513–529. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  43. Deng, W.Y.; Ong, Y.S.; Zheng, Q.H. A Fast Reduced Kernel Extreme Learning Machine. Neural Netw. 2016, 76, 29–38. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  44. Alcala-Fdez, J.; Fernandez, A.; Luengo, J.; Derrac, J.; Garcia, S.; Sanchez, L.; Herrera, F. KEEL Data-Mining Software Tool: Data Set Repository, Integration of Algorithms and Experimental Analysis Framework. J. Mult.-Valued Log. Soft Comput. 2011, 17, 255–287. [Google Scholar]
  45. Alcala-Fdez, J.; Sanchez, L.; Garcia, S.; del Jesus, M.J.; Ventura, S.; Garrell, J.M.; Otero, J.; Romero, C.; Bacardit, J.; Rivas, V.M.; et al. KEEL: A software tool to assess evolutionary algorithms for data mining problems. Soft Comput. 2009, 13, 307–318. [Google Scholar] [CrossRef]
  46. Zeng, Y.J.; Xu, X.; Shen, D.Y.; Fang, Y.Q.; Xiao, Z.P. Traffic Sign Recognition Using Kernel Extreme Learning Machines with Deep Perceptual Features. IEEE Trans. Intell. Transp. Syst. 2017, 18, 1647–1653. [Google Scholar] [CrossRef]
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
