A Boundary-Information-Based Oversampling Approach to Improve Learning Performance for Imbalanced Datasets

Oversampling is the most popular data preprocessing technique. It makes traditional classifiers applicable to learning from imbalanced data. Through an overall review of oversampling techniques (oversamplers), we find that some of them can be regarded as danger-information-based oversamplers (DIBOs), which create samples near danger areas so that these positive examples can be correctly classified, while others are safe-information-based oversamplers (SIBOs), which create samples near safe areas to increase the correct rate of predicted positive values. However, DIBOs cause the misclassification of too many negative examples in the overlapped areas, and SIBOs cause the incorrect classification of too many borderline positive examples. Based on their respective advantages and disadvantages, a boundary-information-based oversampler (BIBO) is proposed. A concept of boundary information that considers safe information and danger information at the same time is introduced, which places created samples near decision boundaries. The experimental results show that DIBOs and BIBO perform better than SIBOs on the basic metrics of recall and negative class precision; SIBOs and BIBO perform better than DIBOs on the basic metrics of specificity and positive class precision; and BIBO is better than both DIBOs and SIBOs in terms of integrated metrics.


Introduction
Data is said to be imbalanced when one of its classes (the majority, or negative, class) has many more examples than the other (the minority, or positive, class). This occurs in many real-world cases, such as customer credit risk prediction [1], bankruptcy prediction [2], product fault diagnosis [3], medical data analyses [4], fraud prediction [5], etc. In these cases, the minority class is typically the interesting and important one and carries high misclassification costs. However, when traditional classifiers are used to classify such data, the classifications are usually biased toward the majority class. This paper calls these unsatisfactory learning results imbalanced data learning problems.
Many studies have identified the causes of imbalanced data learning problems, including: (1) a high imbalance ratio, where misclassifications of positive examples are regarded as tolerable errors because the total classification accuracy remains high even when those examples are classified as negative; (2) the small disjuncts problem [6,7], where a small number of positive examples form subclusters that cannot be ignored but are usually misclassified to reduce model complexity; and (3) class overlapping [8,9], where an area contains examples of both the majority and the minority class. Positive examples in the overlapped area are typically sacrificed in order to minimize structural risk.
To improve the performance of learning from imbalanced data, many kinds of methods have been proposed. This paper divides them into five classes as follows: (1) Algorithmic modification. Some traditional classifiers work well on imbalanced data after their internal operations are changed. Yu et al. [10] utilized an optimized decision threshold adjustment strategy in a support vector machine (SVM). Zhao et al. [11] proposed a weighted maximum margin criterion to optimize the data-dependent kernel in an SVM. In addition to kernel-based SVMs, fuzzy rule-based classification systems have been modified to deal with imbalanced data, such as in López et al. [12] and Alshomrani et al. [13]. (2) Cost-sensitive classification. This class takes into consideration that minority class misclassification costs are higher than those of the majority class. Zhou and Liu [14] moved an output threshold toward the majority class so that minority class examples become harder to misclassify when training a cost-sensitive neural network. Siers and Islam [15] proposed a cost-sensitive voting technique to minimize the classification costs for a decision forest. Lee et al. [16] adjusted factor scores by categorizing instances based on an SVM's margin for AdaBoost. (3) Ensemble learning. This class can be used to reduce variance by aggregating the predictions of a set of base classifiers. Sun et al. [17] investigated cost-sensitive boosting algorithms with different weight updating strategies for imbalanced data. Sun et al. [18] turned an imbalanced dataset into multiple balanced sub-datasets and used them in base classifiers. Another very common type of ensemble learning combines it with resampling techniques, such as SMOTEBagging [19], random balance boost [20], and the synthetic oversampling ensemble [21]. (4) Data particle geometrical divide (GD).
The GD technique creates class-based data particles to classify data examples by comparing data gravitation between different data particles. Rybak and Dudczyk [22] developed a new GD method with four algorithms for determining the mass of a data particle to effectively improve gravitational classification on the Moons and Circles datasets. Furthermore, Rybak and Dudczyk [23] proposed a variant of the GD method named the unequal geometrical divide to improve classification performance on imbalanced occupancy detection datasets. (5) Resampling techniques. Here, the aim is to balance the class distribution by removing majority class examples (undersampling) or by inflating minority class examples (oversampling). Since the synthetic minority oversampling technique (SMOTE) [24] was proposed in 2002, it has become one of the most influential data preprocessing/oversampling techniques in machine learning and data mining. To improve SMOTE, undersampling techniques, e.g., the condensed nearest neighbor (CNN) rule [25] and Tomek links [26], have been applied after oversampling. SMOTE_IPF [27] is another combined resampling method. It uses an iterative-partitioning filter [28] to remove noisy samples in both the majority and minority classes to clean up boundaries and make them more regular. Li et al. [29] used the mega-trend-diffusion technique [30] for undersampling and a two-parameter Weibull distribution estimation for oversampling in their work. More oversampling techniques are introduced in Section 2.
Among the above referenced techniques, resampling is the most common for handling imbalanced data, since it can be regarded as preprocessing for the preceding techniques, and it is simple to use because it does not involve complex classifier algorithms. Instead of undersampling, which may discard useful data, worsen variance, and produce warped posterior probabilities [31], developing oversampling techniques (or oversamplers) has attracted more attention. The next section introduces more oversamplers. In Section 3, their advantages and shortcomings are discussed, and the motivation of the study is provided. Then, our method using boundary information is proposed in Section 4. In Section 5, two experiments are designed to provide strength comparisons of different oversamplers and performance verification of the proposed method. The experimental results are shown and discussed in Section 6, and conclusions are drawn in Section 7.

Oversampling Techniques
More oversampling techniques are introduced in this section. For convenience, suppose m is a minority class example selected to be oversampled; p (or n) is a positive (or negative) example selected to be oversampled together with m; and s is a synthetic sample generated from m and p (or n).
The simplest method for oversampling is random oversampling (ROS). Because it samples with replacement, each duplicated example is learned at least twice by the classifier, which causes overfitting. Chawla et al. [24] suggested that ROS makes classifiers learn overly specific patterns. Instead of duplication, SMOTE creates s by s = m + r × (p − m), where p is a random one of the k nearest positive neighbors (kNPNs) of m, and r is a real value between 0 and 1. Because s is different from m and p, it is called a synthetic sample. Synthetic samples can help classifiers create more general patterns.
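The SMOTE interpolation s = m + r × (p − m) can be sketched in a few lines of NumPy. This is a minimal illustration, not the reference implementation: the function names are ours, and the minority class is assumed to be passed as a plain (n, d) array.

```python
import numpy as np

def smote_sample(m, p, rng):
    """One synthetic sample on the segment between a minority example m
    and one of its k nearest positive neighbors p: s = m + r * (p - m),
    with r drawn uniformly from [0, 1)."""
    r = rng.random()
    return m + r * (p - m)

def smote(minority, n_new, k=5, rng=None):
    """Plain SMOTE over a minority-class array of shape (n, d)."""
    if rng is None:
        rng = np.random.default_rng(0)
    minority = np.asarray(minority, dtype=float)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(minority))
        m = minority[i]
        # distances from m to all other minority examples
        d = np.linalg.norm(minority - m, axis=1)
        d[i] = np.inf                      # exclude m itself
        neighbors = np.argsort(d)[:k]      # the k nearest positive neighbors
        p = minority[rng.choice(neighbors)]
        synthetic.append(smote_sample(m, p, rng))
    return np.array(synthetic)
```

Because each sample is a convex combination of two minority examples, all synthetic points lie inside the convex hull of the minority class.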
To generate more helpful synthetic samples, some studies have suggested that dangerous minority class examples are more important and that new samples should be created from them. Han et al. [32] regarded m as dangerous if at least half of its k nearest neighbors (kNNs, containing both minority and majority class examples) are majority class examples. Then, only the dangerous examples are selected to be oversampled using SMOTE. This procedure is called Borderline-SMOTE1 (B1_SMOTE). If s is created not only between two dangerous examples but also between a dangerous example m and one of its k nearest negative neighbors (kNNNs, n), the approach is called Borderline-SMOTE2 (B2_SMOTE), where s is computed as s = m + r × (n − m) and r is a real value between 0 and 0.5, so that s is closer to m. Similarly, the adaptive synthetic sampling approach (ADASYN) [33] deems that harder-to-learn examples are more important. It defines the difficulty level of learning m as the ratio of the number of majority class examples to the number of minority class examples among the kNNs of m. Then, examples with greater difficulty levels are more likely to be oversampled using SMOTE. Instead of using counts to determine important examples, borderline over-sampling (BOS) [34] and synthetic informative minority over-sampling (SIMO) [35] identify dangerous examples based on decision boundaries trained by an SVM. BOS then generates synthetic samples using interpolation or extrapolation based on the ratio of the majority class to the minority class; SIMO generates synthetic samples based on the distance to the decision boundaries, so examples misclassified by the SVM are more likely to be oversampled. The majority weighted minority oversampling technique (MWMOTE) [36] was designed as a new approach to determine boundary examples. Initially, the majority set near the minority set is considered to be the borderline majority set. Second, the minority set near the borderline majority set is considered to be the borderline minority set. Then, the denser the majority set and the sparser the minority set, the more important the borderline examples. Among the latest approaches, an attribute-weighted kNN hub on SMOTE (AWH_SMOTE) [37] applies the kNN hub to find informative examples. Examples with rare occurrences in the kNN hub are considered more dangerous and thus more important.
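The danger test of Han et al. described above amounts to counting majority class members among the kNNs of a minority example. A minimal sketch, assuming labels 0 for the majority class and 1 for the minority class (the function name and signature are ours):

```python
import numpy as np

def is_dangerous(m, X, y, k=5):
    """Borderline-SMOTE-style danger test: a minority example m is
    'dangerous' when at least half of its k nearest neighbors in the
    whole dataset (X, y) belong to the majority class (label 0).
    Assumes m itself is not a row of X."""
    d = np.linalg.norm(X - m, axis=1)
    knn = np.argsort(d)[:k]
    n_majority = int((y[knn] == 0).sum())
    return n_majority >= k / 2
```

Only examples passing this test would then be fed to the SMOTE interpolation in B1_SMOTE.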
Conversely, other oversamplers consider safe minority class examples to be more important and thus create new samples based on them. Safe-level-SMOTE (SL_SMOTE) [38] attempts to generate synthetic samples in safe regions. It defines the safe level of m by the number of minority class examples among its kNNs. Let sl_m be the safe level of m and sl_p be the safe level of p. If sl_m is larger than sl_p, then the s generated between m and p is positioned near m, and vice versa. However, when m and p come from two different subgroups, s may fall into a majority class group and become noise. The local neighborhood extension of SMOTE (LN_SMOTE) [39] fixes this problem by selecting oversampled examples from their kNNs rather than from their kNPNs. Instead of using kNNs, cluster-SMOTE (C_SMOTE), proposed by Cieslak et al. [40], clusters the minority class first and then selects oversampled examples within the same clusters. In addition, synthetic oversampling of instances by clustering and jittering (SOI_CJ) [41] utilizes a jittering process within the same clusters so that only one example is selected to be oversampled each time. Douzas et al. [42] proposed k-means SMOTE (km_SMOTE), which uses the k-means algorithm to cluster the entire dataset; then, only the clusters dominated by the minority class are used for oversampling, and the sparser a cluster is, the more synthetic samples it generates.
To be brief, when a minority class example is surrounded mostly by majority class examples, it is called a dangerous minority class example; on the contrary, it is called a safe minority class example, as shown in Figure 1.

Motivation
From the overall review of the oversampling techniques in Section 2, we find that some oversamplers suggest that the important minority class examples are those that are dangerous, borderline, hard to learn, or misclassified, whereas others suggest that safe examples are the most important. Oversamplers that create samples based on dangerous minority class examples are called danger-information-based oversamplers (DIBOs) here. Conversely, those based on safe examples are called safe-information-based oversamplers (SIBOs). The two classes of oversamplers are summarized in Table 1.

We find that DIBOs generate synthetic samples biased toward majority class areas, which can strengthen the decision boundaries so that borderline positive examples are correctly classified as minority class. SIBOs generate synthetic samples biased toward minority class areas, which can protect safe regions from being misclassified as majority class. However, the extra samples generated by DIBOs become noise that affects the classification of the majority class, while SIBOs tend to ignore the borderline minority class examples. To understand this visually, the made-up imbalanced dataset shown in Figure 2a is resampled to balance the classes using the different methods, and the results are then classified using SVM classifiers with the same parameter settings. As can be seen in Figure 2e-h, the synthetic samples generated by DIBOs are more radical, so the predicted decision boundaries are biased toward the majority class. As shown in Figure 2i-l, the synthetic samples generated by SIBOs are more conservative, so fewer majority class examples are misclassified, while at the same time more borderline minority class examples are not correctly classified.
Based on the respective advantages and disadvantages of DIBOs and SIBOs, this paper proposes a new concept of boundary information, such that samples created based on it lie close to the decision boundaries, rather than close to the more dangerous areas (as with DIBOs) or the safer areas (as with SIBOs).

Methodology
In this section, this paper first defines boundary information; then, it introduces the procedure for the boundary-information-based oversampler (BIBO), after which an analysis of its strengths is provided.

Boundary Information
In MWMOTE [36], the Euclidean distance is used to compute an information weight for minority class examples; it is also a common basis for computing the similarity of two points. In the same spirit, this paper defines the information weight (IW) of b on a, denoted IW(a ← b), to decrease with the Euclidean distance ‖b − a‖ from b to a, decaying exponentially as in Equation (1):

IW(a ← b) = e^(−‖b − a‖).    (1)

For a minority class example m, the IW contributed by an example far away from it is very small; therefore, we only consider the IWs between m and its kNNs. Suppose minkNNs = {p_1, p_2, ..., p_i} are the kNPNs among the kNNs of m; then IW(m ← minkNNs) = Σ_{t=1}^{i} e^(−‖p_t − m‖). Likewise, majkNNs = {n_1, n_2, ..., n_j} are the kNNNs among the kNNs of m, and IW(m ← majkNNs) = Σ_{t=1}^{j} e^(−‖n_t − m‖), where i + j = k. Obviously, compared with IW(m ← majkNNs), the larger IW(m ← minkNNs) is, the safer m is, and vice versa. Thus, this paper calls IW(m ← minkNNs) the safe information weight (SIW) of m and calls IW(m ← majkNNs) the danger information weight (DIW). It can be said that the synthetic samples generated by DIBOs are biased towards dangerous examples, while SIBOs generate new samples biased towards safe examples. As discussed in Section 3, this paper suggests that created samples should instead be biased towards decision boundaries.
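The SIW and DIW of Equation (1) can be computed directly from the kNNs of m. A minimal sketch, under our conventions: labels 0 for the majority class and 1 for the minority class, and m assumed not to be a row of X (the function name is ours):

```python
import numpy as np

def siw_diw(m, X, y, k=5):
    """Safe and danger information weights of a minority example m,
    following Equation (1): each neighbor b contributes e^(-||b - m||).
    SIW sums the contributions of the minority-class members of m's
    kNNs, DIW those of the majority-class members."""
    d = np.linalg.norm(X - m, axis=1)
    knn = np.argsort(d)[:k]          # indices of the k nearest neighbors
    w = np.exp(-d[knn])              # exponential decay in distance
    siw = float(w[y[knn] == 1].sum())
    diw = float(w[y[knn] == 0].sum())
    return siw, diw
```

Since each contribution is at most 1, both weights are bounded by k, and they partition the total IW of the kNNs between the two classes.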
As a rule of thumb, examples having comparably large DIW and SIW are more likely to lie on decision boundaries. To find desirable decision boundaries, this paper defines a new concept of boundary information (BI), whose weight (BIW) is computed using Equation (2). From Equation (2), it is known that an example has zero BIW when all of its kNNs are minority class, which this paper calls a redundancy; an example also has zero BIW when all of its kNNs are majority class, which this paper calls noise; and an example has a great BIW only when its DIW and SIW are both large. This paper proposes that synthetic samples should be biased towards examples with larger BIW, and Figure 3 is used to demonstrate the expected effects of this assumption. For example, the samples ab and bc are placed far away from b since b is a safe example with very low BIW; bc and cd are near c since c is a boundary point with great BIW; cd is far away from d since d is noise; ef is closer to f than to e since the BIW of f is larger than the BIW of e; and in the case of the small disjunct examples h and i, their created sample hi lies between them, so it is easier for them to be recognized.
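Equation (2) itself is not reproduced in this excerpt, but the stated properties constrain it: BIW must be zero when either SIW or DIW is zero, and large only when both are large. One simple instantiation consistent with those properties is the product of the two weights; the sketch below uses it purely as an assumption, and the paper's exact formula may differ.

```python
def biw(siw, diw):
    """Boundary information weight of an example, given its safe and
    danger information weights. The product used here is one function
    satisfying the description of Equation (2): zero for redundancies
    (diw == 0) and noise (siw == 0), large only when both weights are
    large. It is an assumption, not the paper's verbatim formula."""
    return siw * diw
```

A harmonic or geometric mean of SIW and DIW would satisfy the same constraints and could be substituted without changing the ranking behavior qualitatively.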

Procedure for the Boundary-Information-Based Oversampler
This paper calls the oversampler that generates synthetic samples near the examples having larger BIW, the boundary-information-based oversampler (BIBO). The procedure for the BIBO is proposed in Table 2.

Computational Complexity of BIBO
The computational complexity of the proposed BIBO algorithm depends on the number of majority class examples N, the size n of the original imbData, and the number of minority class examples P. In Table 2, the for loop (p in P) indicates that we perform P calculations of BIW and r for each pass of synthetic sample generation. In our algorithm, the number of synthetic samples is set to 2N − n; namely, the proposed BIBO algorithm stops when the size of imbData grows from n to 2N. Therefore, the computational complexity of the BIBO algorithm can be calculated by Equation (3).
Table 2. The algorithm of the proposed boundary-information-based oversampler.

Input:
imbData: An imbalanced dataset. K: The number of kNPNs for oversampling. k: The number of kNNs for computing BIW.

Output:
resData: The imbData that had been resampled by this procedure

Procedure Begin
1.  P ← minority class set from imbData
2.  N ← majority class set from imbData
3.  resData = imbData
4.  while the length of resData < twice the length of N:
5.      for p in P:
6.          KNNp ← the K nearest neighbors of p from P
7.          kNNp ← the k nearest neighbors of p from imbData
8.          pp ← randomly select an example from KNNp
11.         kNNpp ← the k nearest neighbors of pp from imbData
12.         if BIW(pp) ≥ BIW(p): …
            if the length of resData >= twice the length of N:
21.             break # the numbers of the two classes are balanced
22. return the resData
Procedure End.
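The procedure of Table 2 can be sketched in Python as follows. This is our reading of the algorithm, not a verbatim transcription: the sample-generation step (interpolating between p and a random kNPN pp, placing s on the near half of the segment toward whichever of the two has the larger BIW) and the product form of BIW are assumptions based on the descriptions in Section 4, and the noise/redundancy filter mentioned in the strengths analysis is omitted for brevity. The minority label is assumed to be 1.

```python
import numpy as np

def _biw(m, X, y, k):
    # Boundary information weight of m w.r.t. dataset (X, y); the
    # product of safe and danger weights is our assumption for Eq. (2).
    d = np.linalg.norm(X - m, axis=1)
    knn = np.argsort(d)[:k]
    w = np.exp(-d[knn])
    return float(w[y[knn] == 1].sum()) * float(w[y[knn] == 0].sum())

def bibo(X, y, K=5, k=15, rng=None):
    """Sketch of the BIBO procedure of Table 2 (minority label = 1).
    Synthetic samples are added until the two classes are balanced."""
    if rng is None:
        rng = np.random.default_rng(0)
    P = X[y == 1]
    n_neg = int((y == 0).sum())
    res_X, res_y = list(X), list(y)
    n_pos = len(P)
    while n_pos < n_neg:
        for i, p in enumerate(P):
            if n_pos >= n_neg:
                break                         # the two classes are balanced
            d = np.linalg.norm(P - p, axis=1)
            d[i] = np.inf                     # exclude p itself
            KNNp = np.argsort(d)[:K]          # K nearest positive neighbors
            pp = P[rng.choice(KNNp)]
            r = rng.random() * 0.5            # stay on the near half (assumed)
            if _biw(pp, X, y, k) >= _biw(p, X, y, k):
                s = pp + r * (p - pp)         # place s nearer pp
            else:
                s = p + r * (pp - p)          # place s nearer p
            res_X.append(s)
            res_y.append(1)
            n_pos += 1
    return np.array(res_X), np.array(res_y)
```

Each pass over P adds at most |P| samples, so the loop terminates once the minority count reaches the majority count, matching the 2N stopping size in the text.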

Strengths Analysis
The proposed BIBO selects minority class examples whose BIWs are not zero to be oversampled, in order to filter out noisy and redundant examples. The created samples are then far away from both safe examples and dangerous examples and closer to boundary examples. The method is also easy to understand, as only two parameters are considered: the capital letter K determines the kNPNs used for oversampling, and the small letter k determines the kNNs used for computing BIWs.
To illustrate the strengths of the proposed BIBO, the imbalanced dataset shown in Figure 2a is oversampled using BIBO with different values of K and k. Then, the resampled sets are trained and classified using the same SVM classifiers. The classification results are shown in Figure 4, where from top to bottom the K values increase from 5 to 15, and from left to right the k values increase from 5 to 30. From the figures, this paper finds that when both K and k are small (see Figure 4a), the BIBO behaves like SIBOs; when K increases from 5 to 15, the areas predicted to be positive (PPAs) become larger, and some separated PPAs merge, as the kNPNs grow and synthetic samples are generated inside them; when k increases from 5 to 30, the PPAs become larger and start to intrude into the majority class areas, as fewer examples in the overlapped areas are treated as noise. However, even when both K and k are large, as shown in Figure 4l, the synthetic samples remain near the decision boundaries. Therefore, the BIBO has a great tolerance for parameter value settings.

Experiment
The experimental designs for weighing the comparative strengths of the various oversamplers and for verifying the performance of BIBO are introduced in this section. Before introducing them, this section describes the classification evaluation metrics and the oversampler evaluation procedure, respectively.

Evaluation Metrics
Accuracy rate (acc) is the most common metric for evaluating classifications, with Equation (4) as its formula. However, using acc is one cause of imbalanced data learning problems because it is biased toward the majority class, as mentioned in Section 1. To balance the effects of the two classes, the confusion matrix shown in Table 3 is used to formulate the imbalanced data classification evaluation metrics. The recall (rec), calculated using Equation (5), and the specificity (spec), calculated using Equation (6), are the true positive rate and the true negative rate, respectively, that is to say, the percentages of each class that are correctly classified. The positive class precision (pre_P), calculated using Equation (7), and the negative class precision (pre_N), calculated using Equation (8), are the positive predictive value and the negative predictive value, respectively, in other words, the correct rates of the predicted values. This section calls these the five basic metrics, and metrics integrated from two or more basic metrics are called integrated metrics.
Considering rec and spec, the G-mean (Gmean) is defined as the geometric mean of rec and spec and is calculated using Equation (9). Instead of considering only the proportions being correctly classified, the F-measure (Fmeas) also takes pre_P into account and is calculated using Equation (10). Another well-known measure is the area under the ROC curve (AUC) [43]. In the ROC chart, the x-axis is 1 − spec and the y-axis is rec, and the curve shows their tradeoff over decision cut-offs. These are integrated metrics, in which the β in Fmeas is set to 1 in our experiments:
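Equations (4) through (10) are not reproduced in this excerpt; the snippet below implements the standard definitions the text describes, computed from the entries of the confusion matrix in Table 3 (the function and key names are ours):

```python
import math

def imbalance_metrics(tp, fn, fp, tn, beta=1.0):
    """Basic and integrated metrics from a binary confusion matrix,
    following the standard definitions referenced as Eqs. (4)-(10)."""
    acc   = (tp + tn) / (tp + fn + fp + tn)   # Eq. (4), accuracy
    rec   = tp / (tp + fn)                    # Eq. (5), true positive rate
    spec  = tn / (tn + fp)                    # Eq. (6), true negative rate
    pre_p = tp / (tp + fp)                    # Eq. (7), positive precision
    pre_n = tn / (tn + fn)                    # Eq. (8), negative precision
    gmean = math.sqrt(rec * spec)             # Eq. (9), G-mean
    fmeas = ((1 + beta ** 2) * pre_p * rec /  # Eq. (10), F-measure
             (beta ** 2 * pre_p + rec))
    return dict(acc=acc, rec=rec, spec=spec, pre_p=pre_p, pre_n=pre_n,
                gmean=gmean, fmeas=fmeas)
```

With β = 1, Fmeas reduces to the harmonic mean of pre_P and rec, which is the setting used in the experiments.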

Dataset Description
To verify the universality of the oversamplers, this paper tests some datasets that are downloaded from the KEEL-dataset repository [44]. Because the differences in the MM-metrics on different datasets are not commensurate, this paper uses the rankings of the oversamplers on each dataset to obtain their mean. Then, the mean rankings can be regarded as the performance measures of the oversamplers on the classifier.

The Real-World Datasets
Oversamplers using the kNN concept are not applicable to highly imbalanced datasets, because most minority class examples in them would be recognized as noise and lead to wrong results. Therefore, the real-world datasets with imbalance ratios between 1.5 and 9 used in Fernández et al. [46] are used in this experiment. In addition, the ionosphere dataset downloaded from the UCI machine learning repository [47] is used; it has 17 pulse numbers, each described by two attributes, and one output indicating the return of electromagnetic signals. Moreover, one big dataset named Swarm Behaviour Aligned, with 2400 attributes and 24,017 samples, downloaded from the UCI machine learning repository, is used in our experiments. Thus, a total of 23 datasets are used to implement our experiments. They are shown in Table 5, where Att. is the number of attributes.

Oversampler Performance Evaluation
This paper uses the k-fold cross-validation procedure to obtain the performance measures of the oversamplers on every metric. For an imbalanced dataset and one of its cross-validation processes, first, the dataset is partitioned into a training set and a testing set. Second, the training set is oversampled using an oversampler. Third, a classifier is trained using the oversampled set. Fourth, the testing set is applied to the trained classifier to obtain the evaluation metrics. This process is repeated k times to obtain the mean of the metrics. Since the oversampler can increase the variance of the classifier, this paper further repeats the k-fold cross-validation process K times to obtain the mean of the mean metrics (MM-metrics). The MM-metrics can then be regarded as the performance measures of the oversampler on both the dataset and the classifier.
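The four steps above can be sketched as a small evaluation skeleton. The callables `oversample` and `train_and_score` are hypothetical placeholders for any oversampler and any classifier/metric pair; the key point the code illustrates is that only the training split is resampled, never the held-out fold.

```python
import numpy as np

def mm_metric(X, y, oversample, train_and_score, k=5, K=3, seed=0):
    """K repetitions of k-fold cross-validation: in each fold the
    TRAINING split alone is oversampled, a classifier is fit on it,
    and the held-out fold is scored; the K*k fold scores are averaged
    into the mean-of-mean metric (MM-metric)."""
    rng = np.random.default_rng(seed)
    scores = []
    n = len(y)
    for _ in range(K):
        idx = rng.permutation(n)               # fresh shuffle per repetition
        for fold in np.array_split(idx, k):
            test = np.zeros(n, dtype=bool)
            test[fold] = True
            Xtr, ytr = oversample(X[~test], y[~test])   # resample training only
            scores.append(train_and_score(Xtr, ytr, X[test], y[test]))
    return float(np.mean(scores))
```

Resampling inside the fold loop avoids the leakage that would occur if synthetic samples derived from test examples appeared in the training set.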
Based on the above process, this paper uses different oversamplers, namely RAW (no oversampler), SMOTE, B1_SMOTE, B2_SMOTE, ADASYN, MWMOTE, SL_SMOTE, LN_SMOTE, SOI_CJ, km_SMOTE, and BIBO, to obtain their respective MM-metrics. Among them, RAW means the original training set is used without being oversampled; B1_SMOTE, B2_SMOTE, ADASYN, and MWMOTE are DIBOs; SL_SMOTE, LN_SMOTE, SOI_CJ, and km_SMOTE are SIBOs. These programs are imported from the smote_variants package [48] with their default parameter settings. BIBO is programmed as shown in Table 2, with K and k set to 5 and 15, respectively. Then, these oversamplers are ranked based on their MM-metrics on every metric.

Results and Discussion
In this section, the two experiment results are introduced and discussed.

Comparative Strengths Results
As is known, some evaluation metrics contradict each other, such as rec and spec, or pre_P and pre_N. However, most studies use only a subset of the metrics to verify the effectiveness of their proposed oversamplers. In this experiment, we attempt to apply all of the metrics to determine the comparative strengths of the different oversamplers. Since most oversamplers employ the concept of kNN, the kNN classifier is applied in this experiment. The experimental results are shown in Table 6, and the findings are summarized as follows: (1) All the oversamplers outperform RAW on the rec and pre_N basic metrics, but they do not on the acc, spec, and pre_P metrics. For the spec and pre_P metrics, BIBO performs similarly to the SIBOs, and it outperforms all the DIBOs. On the contrary, for the rec and pre_N metrics, BIBO outperforms all the SIBOs, similarly to the DIBOs. These findings confirm that BIBO is better than the SIBOs and DIBOs in general, owing to moving synthetic samples toward decision boundaries. (5) BIBO has better performance results on the Fmeas, AUC, and ave metrics. Hence, BIBO is a better oversampler for improving imbalanced dataset learning problems.

Performance Results
In this experiment, the most representative metrics, acc, Gmean, Fmeas, AUC, and their average (ave), are used to measure the performance of the oversamplers on four classifiers: kNN, C4.5, SVC_L, and SVC_S. This paper uses SVMs with linear and sigmoid kernel functions; the C4.5 program [49] is a "DecisionTreeClassifier" with the "entropy" criterion imported from "sklearn.tree", and the SVM programs are "sklearn.svm.SVC" with "linear" and "sigmoid" kernels, denoted as SVC_L and SVC_S, respectively. The results are shown in Table 7, and the findings are summarized as follows: (1) When C4.5 is used as the classifier, BIBO obtains better results on all of the metrics, even acc. It can be deduced that the virtual samples created by BIBO lie near the real decision nodes of the decision tree. (2) Of the 20 metrics in total (five metrics on each of the four classifiers), half indicate that BIBO has the best performance results (the values in bold and underlined). This further confirms that BIBO is a better technique for improving imbalanced data learning problems. (3) Some oversampler performance results are not better than those of RAW, especially in the case of SVC_L and SVC_S. This may be caused by (a) contradictory metrics, (b) overlapping blurriness, (c) the noise of virtual samples, or (d) the limited effectiveness of some classifiers on some imbalanced datasets.
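The classifier configurations described above can be sketched in scikit-learn as follows. The `n_neighbors=5` setting for kNN and the toy data are assumptions for illustration; the tree is sklearn's CART with the entropy criterion, used as a stand-in for C4.5 as in the text:

```python
# The four classifiers described in the text, configured with scikit-learn.
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

classifiers = {
    "kNN": KNeighborsClassifier(n_neighbors=5),           # k=5 is an assumption
    "C4.5": DecisionTreeClassifier(criterion="entropy"),  # CART stand-in, as in the text
    "SVC_L": SVC(kernel="linear"),
    "SVC_S": SVC(kernel="sigmoid"),
}

# Tiny illustrative dataset (not from the paper).
X = [[0, 0], [0, 1], [1, 0], [1, 1], [2, 2], [2, 3]]
y = [0, 0, 0, 1, 1, 1]
for name, clf in classifiers.items():
    clf.fit(X, y)
    print(name, clf.predict([[2, 2]]))
```

In the full experiment, each oversampled training set would be fitted with all four classifiers and scored on the held-out test set with the metrics above.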

Comparative Results of Computational Complexity
In this paper, the test PC is equipped with an Intel® Core™ i7-10700 CPU @ 2.90 GHz and 32 GB of RAM, and the operating system is Ubuntu 20.04.2 LTS. A total of 23 datasets were used to compare the computational complexity of the proposed BIBO with that of nine other algorithms. We sampled 80 percent of the data in each dataset and ran 50 experiments in the above-mentioned environment with Python 3.8.10. The average computational times of the algorithms are shown in Table 8. The SOI_CJ algorithm has the longest running time among them because it performs additional clustering computation on one large dataset, namely Swarm Behaviour Aligned, as shown in Table 5. The BIBO algorithm outperforms five of the algorithms in computational time.
In this section, we also randomly draw 80 percent of the data from the ecoli-0_vs_1 dataset as an example to explain the proposed BIBO method in detail. The drawn data are used as the training dataset listed in Table 9.
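The timing protocol (50 runs, each on a random 80 percent sample) can be sketched as follows; a dummy function stands in for the oversamplers, so the times are illustrative only:

```python
# Timing protocol sketch: 50 runs, each on a random 80% sample of the data.
# dummy_oversampler is a stand-in workload, not any of the compared algorithms.
import random
import time

def dummy_oversampler(data):
    return data + data[: len(data) // 2]  # placeholder work

data = list(range(1000))
times = []
for _ in range(50):
    sample = random.sample(data, int(0.8 * len(data)))
    t0 = time.perf_counter()
    dummy_oversampler(sample)
    times.append(time.perf_counter() - t0)

avg_time = sum(times) / len(times)
print(f"average time: {avg_time:.6f} s")
```

Averaging over repeated random subsamples reduces the variance introduced by any single draw, which is why 50 repetitions are used.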
The implementation procedure of the BIBO method is as follows: Step 1. The training dataset has 115 majority class (negative) examples and 61 minority class (positive) examples, as shown in Table 9. Step 2. Set K = 10 and k = 10 to compute the values of BIW and r, as shown in Table 2; the values of BIW(pp), BIW(p), and r are briefly listed in Table 10. Step 3. Generate synthetic minority class examples as shown in Table 10.
Step 4. Stop repeating Steps 1-3 when the total number of training samples is twice the number of majority class samples.
Step 5. Add the generated synthetic examples to the original dataset to build a balanced training dataset.
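The steps above can be sketched in numpy as follows. This is a minimal sketch, assuming SMOTE-style interpolation between minority examples and selection probability proportional to BIW; the exact procedure is given in Table 2, and `biw_weights`/`bibo_oversample` are hypothetical names:

```python
# Minimal sketch of BIBO: IW = exp(-distance), SIW/DIW summed over kNN by
# class, BIW = SIW * DIW, and SMOTE-style interpolation near high-BIW points.
# The sampling details are assumptions; Table 2 in the paper is authoritative.
import numpy as np

rng = np.random.default_rng(0)

def biw_weights(X, y, minority=1, k=5):
    """BIW = SIW * DIW for each minority example."""
    Xm = X[y == minority]
    biw = []
    for p in Xm:
        d = np.linalg.norm(X - p, axis=1)
        nn = np.argsort(d)[1:k + 1]        # k nearest neighbors, excluding p itself
        iw = np.exp(-d[nn])                # information weights
        siw = iw[y[nn] == minority].sum()  # safe information weight
        diw = iw[y[nn] != minority].sum()  # danger information weight
        biw.append(siw * diw)
    return Xm, np.asarray(biw)

def bibo_oversample(X, y, minority=1, k=5):
    Xm, biw = biw_weights(X, y, minority, k)
    n_new = (y != minority).sum() - (y == minority).sum()
    prob = biw / biw.sum() if biw.sum() > 0 else np.full(len(Xm), 1 / len(Xm))
    synthetic = []
    for _ in range(n_new):
        i = rng.choice(len(Xm), p=prob)    # favor boundary (high-BIW) examples
        j = rng.integers(len(Xm))          # random minority partner
        lam = rng.random()
        synthetic.append(Xm[i] + lam * (Xm[j] - Xm[i]))
    return np.vstack([X, synthetic]), np.concatenate([y, np.ones(n_new, int)])

X = rng.normal(size=(40, 2))
y = np.array([0] * 30 + [1] * 10)
X_bal, y_bal = bibo_oversample(X, y)
print((y_bal == 0).sum(), (y_bal == 1).sum())  # 30 30
```

Because BIW is the product of SIW and DIW, it vanishes whenever all k neighbors share one class, so only examples with mixed neighborhoods, i.e. those near the boundary, attract synthetic samples.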

Conclusions
This paper defined the information weight (IW) between two points as the reciprocal of a natural exponential function with the Euclidean distance as its exponent, where the total IW of the minority (or majority) class examples among an example's kNNs is that example's safe (or danger) information weight (SIW, or DIW). Examples with larger SIWs (or DIWs) can then be considered safe (or dangerous). The comparison experiment proved that SIBOs, which generate synthetic samples near safe areas, improve the performance of spec and pre_P, and that DIBOs, which generate synthetic samples near dangerous areas, improve the performance of rec and pre_N.
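Written out, with $d(x, x')$ the Euclidean distance and $kNN(x)$ the k nearest neighbors of $x$, these quantities are:

```latex
\mathrm{IW}(x, x') = \frac{1}{e^{d(x, x')}} = e^{-d(x, x')}, \qquad d(x, x') = \lVert x - x' \rVert_2
\mathrm{SIW}(x) = \sum_{\substack{x_i \in kNN(x) \\ x_i \in \text{minority}}} \mathrm{IW}(x, x_i), \qquad
\mathrm{DIW}(x) = \sum_{\substack{x_i \in kNN(x) \\ x_i \in \text{majority}}} \mathrm{IW}(x, x_i)
```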
In the proposed oversampler (BIBO), the product of SIW and DIW is defined as the boundary information weight (BIW), and synthetic samples are generated near examples with larger BIWs. The rationale is that examples with both large SIWs and large DIWs are more likely to be decision points, so synthetic samples should be generated near them. The comparison experiment proved that BIBO has the advantages of both SIBOs and DIBOs, and the performance verification experiment confirmed that BIBO is a better approach on the whole for handling imbalanced data learning problems. However, BIBO did not achieve the best performance in all cases; a BIBO customized to different datasets or different classifiers could be proposed in the future. In future research, other real datasets downloaded from the UCI machine learning repository can be used to verify the effectiveness of the customized BIBO, and another direction is to undertake verification using popular artificial neural networks as learning models.
The real-world datasets presented in this study are openly available at https://doi.org/10.1016/j.fss.2007.12.023 (accessed on 22 February 2022), reference number [46]. The UCI datasets presented in this study are openly available at reference number [47].