Oversampling Algorithm Based on Improved K-Means and Gaussian Distribution
Abstract
1. Introduction
- The K-means algorithm is improved by introducing a new inter-cluster separateness index and combining it with the cluster compactness index into a cluster validity index, which is used to determine the optimal number of clusters k and to obtain better clustering results.
- To make the newly generated samples more consistent with the distribution of the original minority class samples, the improved K-means algorithm is used to cluster the minority class, preserving the local distribution characteristics of the minority class samples as far as possible.
- To improve sample quality and reduce the influence of noise points, the cluster compactness index is used to allocate a sampling ratio to each cluster, so that clusters with a higher compactness index receive a larger share of the new samples and the generated samples are more representative.
- Experiments are conducted on 24 public datasets from the University of California Irvine (UCI) Repository, and the results show that the proposed algorithm effectively improves classification performance on imbalanced datasets.
2. Related Work
3. Proposed Algorithm
3.1. K-Means Clustering Algorithm Combining Compactness and Separateness
3.1.1. Related Definitions of Clustering
- (1) Cluster compactness index
- (2) Inter-cluster separateness index
3.1.2. Implementation Steps of the K-Means Clustering Algorithm Combining Compactness and Separateness
| Algorithm 1 CSK-means clustering algorithm |
|---|
| Input: the minority class samples of the dataset |
| Output: the optimal number of clusters and the clustering results |
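Equations (1)–(6) are not reproduced in this section, so the Python/NumPy sketch below only illustrates the overall shape of Algorithm 1 under stated assumptions. It runs K-means for each candidate k, scores each partition with a validity index that grows when centroids are far apart and clusters are tight, and keeps the best-scoring k. Several pieces are hypothetical stand-ins rather than the paper's definitions: the compactness measure (mean member-to-centroid distance), the separateness measure (smallest centroid-to-centroid distance), their ratio as the validity index, k-means++ seeding, the search range k = 2, ..., ⌊√n⌋, and every function name.

```python
import numpy as np


def _assign(X, centroids):
    """Index of the nearest centroid for every row of X."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def _kmeans_pp_init(X, k, rng):
    """k-means++ seeding: later seeds favour points far from the existing seeds."""
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centroids], axis=0)
        if d2.sum() == 0:  # all points coincide with seeds: fall back to uniform
            idx = rng.integers(len(X))
        else:
            idx = rng.choice(len(X), p=d2 / d2.sum())
        centroids.append(X[idx])
    return np.array(centroids, dtype=float)


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's K-means; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    centroids = _kmeans_pp_init(X, k, rng)
    for _ in range(n_iter):
        labels = _assign(X, centroids)
        # Move each centroid to the mean of its members (keep it if the cluster emptied).
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return _assign(X, centroids), centroids


def dispersion(X, labels, centroids):
    """Hypothetical per-cluster spread: mean distance of members to their centroid."""
    return np.array([np.linalg.norm(X[labels == j] - c, axis=1).mean() if np.any(labels == j) else 0.0
                     for j, c in enumerate(centroids)])


def separateness(centroids):
    """Hypothetical inter-cluster separateness: smallest distance between two centroids."""
    d = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=2)
    return d[np.triu_indices(len(centroids), k=1)].min()


def validity(X, labels, centroids):
    """Hypothetical validity index: larger when clusters are far apart and tight."""
    return separateness(centroids) / (dispersion(X, labels, centroids).mean() + 1e-12)


def choose_k(X, k_max=None, seed=0):
    """Try k = 2..k_max (default floor(sqrt(n))) and keep the partition with the best index."""
    X = np.asarray(X, dtype=float)
    k_max = k_max or max(2, int(np.sqrt(len(X))))
    best = None
    for k in range(2, k_max + 1):
        labels, centroids = kmeans(X, k, seed=seed)
        score = validity(X, labels, centroids)
        if best is None or score > best[0]:
            best = (score, k, labels, centroids)
    return best[1], best[2], best[3]
```

To use the paper's Equations (1)–(6) instead, you would only need to replace `dispersion`, `separateness`, and `validity`. The outer search over k stays the same.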
3.2. Gaussian Distribution Oversampling Algorithm
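The following is a minimal sketch of Gaussian-distribution oversampling inside one cluster, assuming the simplest variant: fit a diagonal Gaussian (per-feature mean and standard deviation) to the cluster's minority samples and draw new points from it. The diagonal covariance, the `scale` factor, and the name `gaussian_oversample` are illustrative assumptions. This section does not give the paper's actual generation rule.

```python
import numpy as np


def gaussian_oversample(cluster, n_new, scale=1.0, seed=0):
    """Draw n_new synthetic points from a diagonal Gaussian fitted to one cluster.

    cluster : (m, d) array of minority samples assigned to one cluster.
    scale   : hypothetical factor applied to the per-feature standard deviation.
    """
    cluster = np.asarray(cluster, dtype=float)
    rng = np.random.default_rng(seed)
    mean = cluster.mean(axis=0)
    # Population std (ddof=0): a single-point cluster gets std 0, i.e. exact copies.
    std = cluster.std(axis=0) * scale
    # loc and scale broadcast across the (n_new, d) output shape.
    return rng.normal(loc=mean, scale=std, size=(n_new, cluster.shape[1]))
```

A diagonal covariance sidesteps the singular covariance matrices that small clusters (fewer samples than features) would otherwise produce. The trade-off is that correlations between features are ignored.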
3.3. Introduction of the Proposed Algorithm
| Algorithm 2 Oversampling algorithm based on CSK-means algorithm and Gaussian distribution (CSKGO) |
|---|
| Input: imbalanced dataset |
| Output: balanced dataset |
| // Divide the dataset into the majority class and the minority class |
| Initialize variables, including the numbers of majority class and minority class samples |
| while … do |
| // Perform K-means clustering to get the clustering result |
| // Calculate the cluster compactness index by Equations (1) and (2) |
| // Calculate the cluster separateness index by Equations (3)–(5) |
| // Calculate the cluster validity index by Equation (6) |
| end while |
| // Final clustering result |
| // Assign sampling ratios by the compactness index |
| for each cluster do |
| // Get the sample points in the cluster |
| // Gaussian distribution oversampling |
| end for |
| return the balanced dataset |
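The step "assign sampling ratios by the compactness index" could be implemented as follows. Each cluster's share of the synthetic samples is proportional to its compactness index, and the total equals the gap between the majority and minority class sizes, so the output is balanced. Proportional allocation, largest-remainder rounding, and the name `allocate_samples` are assumptions. Algorithm 2 does not specify these details.

```python
import numpy as np


def allocate_samples(compactness, n_to_generate):
    """Split n_to_generate synthetic samples across clusters in proportion to compactness.

    compactness   : one non-negative score per cluster (higher = more compact).
    n_to_generate : total number of synthetic samples, e.g. N_majority - N_minority.
    """
    weights = np.asarray(compactness, dtype=float)
    raw = weights / weights.sum() * n_to_generate
    counts = np.floor(raw).astype(int)
    # Largest-remainder rounding: give the leftover samples to the clusters
    # with the largest fractional parts so the counts sum exactly to the target.
    leftover = int(n_to_generate - counts.sum())
    order = np.argsort(-(raw - counts), kind="stable")
    counts[order[:leftover]] += 1
    return counts
```

For example, `allocate_samples([0.5, 0.3, 0.2], 7)` yields `[4, 2, 1]`. With 259 majority and 77 minority samples (the ecoli1 counts in the dataset table), 182 synthetic samples would be split across the clusters.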
4. Experiments
4.1. Dataset
4.2. Evaluation Measures
4.3. Experiments and Results
4.3.1. Validating the Effectiveness and Efficiency of the CSK-Means Clustering Module
- (1) Threshold Selection of β
- (2) Comparison of clustering results of each dataset before oversampling
- (3) Computational Complexity Analysis
  - (a) Analysis of Computational Complexity
  - (b) Actual Operation Time Measurement and Analysis
- (4) Visualizing the Effect of Clustering on Minority Samples in CSK-means
4.3.2. Comparison Analysis of Recall Before and After Oversampling
4.3.3. Comparison of Classification Results of Each Dataset After Oversampling
- (1) Visualization comparison of oversampling results of CSKGO algorithm and other algorithms on low-dimensional datasets
- (2) Comparison of classification results of CSKGO algorithm and other algorithms on high-dimensional datasets after oversampling
4.3.4. Non-Parametric Statistical Test
4.3.5. Ablation Experiment
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Zhang, J.; Chen, L.; Tian, J.-X.; Abid, F.; Yang, W.; Tang, X.-F. Breast Cancer Diagnosis Using Cluster-based Undersampling and Boosted C5.0 Algorithm. Int. J. Control. Autom. Syst. 2021, 19, 1998–2008.
- Sulaiman, S.; Ibraheem, I.; Hameed, S. Credit Card Fraud Detection Using Improved Deep Learning Models. Comput. Mater. Contin. 2024, 78, 1049–1069.
- Jian, C.; Ao, Y.H. Imbalanced fault diagnosis based on semi-supervised ensemble learning. J. Intell. Manuf. 2022, 34, 3143–3158.
- Wang, X.L.; Jin, Y.C.; Liu, W.W.; Wang, X.Y. KPCA-Based Under-Sampling Algorithm for Imbalanced Data. Adv. Appl. Math. 2024, 13, 4108–4118.
- Zheng, H.Y. A New Cost-sensitive SVM Algorithm for Imbalanced Dataset. In Proceedings of the 2021 IEEE International Conference on Consumer Electronics and Computer Engineering (ICCECE), Guangzhou, China, 15–17 January 2021; pp. 402–407.
- Boonchuay, K.; Sinapiromsaran, K.; Lursinsap, C. Decision tree induction based on minority entropy for the class imbalance problem. Pattern Anal. Appl. 2016, 20, 769–782.
- Zhang, X.; He, Z.Q.; Yang, Y.Y. A fuzzy rough set-based undersampling approach for imbalanced data. Int. J. Mach. Learn. Cybern. 2024, 15, 2799–2810.
- Bhattacharya, R.; De, R.; Chakraborty, A.; Sarkar, R. Clustering Based Undersampling for Effective Learning from Imbalanced Data: An Iterative Approach. SN Comput. Sci. 2024, 5, 386.
- Chawla, N.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic Minority Over-sampling Technique. J. Artif. Intell. Res. 2002, 16, 321–357.
- Dong, Y.J. The Study on Random-SMOTE for the Classification of Imbalanced Data Sets. MA Thesis, Dalian University of Technology, Dalian, China, 2009.
- Han, H.; Wang, W.Y.; Mao, B.H. Borderline-SMOTE: A New Over-Sampling Method in Imbalanced Data Sets Learning. Adv. Intell. Comput. 2005, 3644, 878–887.
- He, H.; Bai, Y.; Garcia, E.A.; Li, S. ADASYN: Adaptive Synthetic Sampling Approach for Imbalanced Learning. In Proceedings of the 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), Hong Kong, China, 1–8 June 2008; pp. 1322–1328.
- López, V.; Fernández, A.; Moreno-Torres, J.G.; Herrera, F. Analysis of preprocessing vs. cost-sensitive learning for imbalanced classification. Open problems on intrinsic data characteristics. Expert Syst. Appl. 2012, 39, 6585–6608.
- Sun, Y.; Pahlavan, H.A.; Chattopadhyay, A.; Hassanzadeh, P.; Lubis, S.W.; Alexander, M.J.; Gerber, E.P.; Sheshadri, A.; Guan, Y. Data Imbalance, Uncertainty Quantification, and Transfer Learning in Data-Driven Parameterizations: Lessons from the Emulation of Gravity Wave Momentum Transport in WACCM. J. Adv. Model. Earth Syst. 2024, 16, e2023MS004145.
- Wang, W.; Liu, F. ADDPC-SMOTE: An Oversampling Algorithm Based on Density Difference Peak Clustering and Spatial Distribution Entropy. IEEE Access 2023, 11, 108152–108166.
- Bunkhumpornpat, C.; Sinapiromsaran, K.; Lursinsap, C. DBSMOTE: Density-based synthetic minority over-sampling technique. Appl. Intell. 2011, 36, 664–684.
- Yan, Y.; Zheng, L.; Han, S.; Yu, C.; Zhou, P. Synthetic oversampling with Mahalanobis distance and local information for highly imbalanced class-overlapped data. Expert Syst. Appl. 2025, 260, 125422.
- Tang, Y.; Zhou, Y.; Yang, C.; Du, Y.; Yang, M. Instance gravity oversampling method for software defect prediction. Inf. Softw. Technol. 2025, 179, 107657.
- Yang, X.; Xue, Z.; Zhang, L.; Wu, J. An oversampling algorithm for high-dimensional imbalanced learning with class overlapping. Knowl. Inf. Syst. 2024, 67, 1915–1943.
- Liu, B.; Zhou, A.; Wei, B.; Wang, J.; Tsoumakas, G. Oversampling multi-label data based on natural neighbor and label correlation. Expert Syst. Appl. 2025, 259, 125257.
- Zhang, Y.; Deng, L.; Wei, B. Imbalanced Data Classification Based on Improved Random-SMOTE and Feature Standard Deviation. Mathematics 2024, 12, 1709.
- Lv, Z.Z.; Liu, Q.C. Imbalanced Data Over-Sampling Method Based on ISODATA Clustering. IEICE Trans. Inf. Syst. 2023, E106.D, 1528–1536.
- Zhao, X.Y.; Guan, S.; Xue, Y.; Pan, H. HS-CGK: A Hybrid Sampling Method for Imbalance Data Based on Conditional Tabular Generative Adversarial Network and K-Nearest Neighbor Algorithm. Comput. Inform. 2024, 43, 213–239.
- Qin, Q.; Yang, Y.; Chen, M.; Wang, X. Improved SMOTE for Oversampling. J. Guilin Univ. Electron. Technol. 2022, 42, 53–59.
- Chen, J.F.; Zheng, Z.T. Over-Sampling Method on Imbalanced Data Based on WKMeans and SMOTE. Comput. Eng. Appl. 2021, 57, 106–112.
- Dong, H.C.; Wen, Z.; Wan, Y.; Yan, F. An imbalanced data classification algorithm based on DPC clustering resampling combined with ELM. Comput. Eng. Sci. 2021, 43, 1856–1863.
- Steinley, D. K-Means Clustering: A Half-Century Synthesis. Br. J. Math. Stat. Psychol. 2006, 59, 1–34.
- Yu, H.; Mao, C.K. Automatic Three-way Decision Clustering Approach Based on K-means. J. Comput. Appl. 2016, 36, 2061–2065+2091.
- Hassan, M.M.; Eesa, A.S.; Mohammed, A.J.; Arabo, W.K. Oversampling Method Based on Gaussian Distribution and K-Means Clustering. Comput. Mater. Contin. 2021, 69, 451–469.
- Bezdek, J.C.; Pal, N.R. Some new indexes of cluster validity. IEEE Trans. Syst. Man Cybern. Part B (Cybern.) 1998, 28, 301–315.
- Rousseeuw, P. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 1987, 20, 53–65.
- Davies, D.L.; Bouldin, D.W. A Cluster Separation Measure. IEEE Trans. Pattern Anal. Mach. Intell. 1979, PAMI-1, 224–227.
- Caliński, T.; Harabasz, J.A. A Dendrite Method for Cluster Analysis. Commun. Stat. Theory Methods 1974, 3, 1–27.
- Choudhary, R.; Shukla, S. SMOTE Based Weighted Kernel Extreme Learning Machine for Imbalanced Classification Problems. In Internet of Things and Connected Technologies; Springer International Publishing: Cham, Switzerland, 2021; pp. 193–200.
- Zhou, H.; Tong, J.; Liu, Y.; Zheng, K.; Cao, C. An oversampling FCM-KSMOTE algorithm for imbalanced data classification. J. King Saud Univ. Comput. Inf. Sci. 2024, 36, 102248.
- Dutta, D.; Sil, J.; Dutta, P. A bi-phased multi-objective genetic algorithm based classifier. Expert Syst. Appl. 2020, 146, 113163.
| ID | Dataset | Attributes (Real/Integer/Nominal) | Examples | Minority | IR (majority/minority) | NOR |
|---|---|---|---|---|---|---|
| fla | Flare-F | 11 (0/0/11) | 1066 | 43 | 23.79 | 19.23 |
| eco2 | ecoli2 | 7 (7/0/0) | 336 | 52 | 5.46 | 6.85 |
| eco1 | ecoli1 | 7 (7/0/0) | 336 | 77 | 3.36 | 6.85 |
| car | car-good | 6 (0/0/6) | 1728 | 69 | 24.04 | 0 |
| gla0 | glass0 | 9 (0/9/0) | 214 | 70 | 2.06 | 21.96 |
| har | haberman | 3 (0/3/0) | 306 | 81 | 2.78 | 1.96 |
| ion | ionosphere | 34 (34/0/0) | 351 | 126 | 1.79 | 25.36 |
| led | led7digit-0-2-4-5-6-7-8-9_vs_1 | 7 (7/0/0) | 443 | 37 | 10.97 | 19.64 |
| new | new-thyroid2 | 5 (4/1/0) | 215 | 35 | 5.14 | 26.51 |
| pag0 | page-blocks0 | 10 (4/6/0) | 5472 | 559 | 8.79 | 4.84 |
| pim | pima | 8 (8/0/0) | 768 | 268 | 1.87 | 4.17 |
| pok9 | poker-9_vs_7 | 10 (0/10/0) | 244 | 8 | 29.5 | 0 |
| seg0 | segment0 | 19 (19/0/0) | 2308 | 329 | 6.02 | 5.11 |
| veh | vehicle2 | 18 (0/18/0) | 846 | 218 | 2.88 | 1.42 |
| vow0 | vowel0 | 13 (10/3/0) | 988 | 90 | 9.98 | 0 |
| win | winequality-red-4 | 11 (11/0/0) | 1599 | 53 | 29.17 | 3.75 |
| wis | wisconsin | 9 (0/9/0) | 683 | 239 | 1.86 | 16.54 |
| yea1 | yeast1 | 8 (8/0/0) | 1484 | 429 | 2.46 | 5.46 |
| yea3 | yeast3 | 8 (8/0/0) | 1484 | 163 | 8.1 | 5.46 |
| yea4 | yeast4 | 8 (8/0/0) | 1484 | 51 | 28.1 | 5.46 |
| zoo | zoo-3 | 16 (0/0/16) | 101 | 5 | 19.2 | 0 |
| kdd | kddcup-rootkit-imap_vs_back | 41 (26/0/15) | 2225 | 22 | 100.14 | 20.9 |
| kr | kr-vs-k-zero_vs_fifteen | 6 (0/0/6) | 2193 | 27 | 80.22 | 1.23 |
| pok8 | poker-8-9_vs_6 | 10 (0/10/0) | 1485 | 25 | 58.4 | 0 |
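The IR column matches IR = (number of majority samples) / (number of minority samples), where the majority count is Examples minus Minority. For Flare-F, (1066 - 43) / 43 ≈ 23.79. The small helper below reproduces the column. It also returns how many synthetic minority samples an oversampler would need to reach a 1:1 class ratio, which is an assumed target used here only for illustration.

```python
def imbalance_ratio(n_examples: int, n_minority: int) -> float:
    """IR = majority count / minority count."""
    n_majority = n_examples - n_minority
    return n_majority / n_minority


def samples_to_balance(n_examples: int, n_minority: int) -> int:
    """Synthetic minority samples needed for a 1:1 class ratio (assumed target)."""
    n_majority = n_examples - n_minority
    return n_majority - n_minority


for name, n, m in [("fla", 1066, 43), ("eco1", 336, 77), ("kdd", 2225, 22)]:
    print(name, round(imbalance_ratio(n, m), 2), samples_to_balance(n, m))
```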
| | Predicted Positive | Predicted Negative |
|---|---|---|
| Actual Positive | TP | FN |
| Actual Negative | FP | TN |
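For reference, the sketch below computes the count-based measures reported in the result tables from the confusion-matrix entries above. It uses the commonly used definitions, which are not a transcription of the paper's own equations (those are not reproduced in this section): recall = TP/(TP+FN), specificity = TN/(TN+FP), F1 as the harmonic mean of precision and recall, and G-mean = √(recall × specificity). Following the usual convention, the minority class is taken as positive. AUC needs classifier scores rather than counts, so it is omitted.

```python
import math


def imbalance_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Accuracy, precision, recall, specificity, F1 and G-mean from confusion-matrix counts."""
    recall = tp / (tp + fn) if tp + fn else 0.0        # true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0   # true negative rate
    precision = tp / (tp + fp) if tp + fp else 0.0
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    g_mean = math.sqrt(recall * specificity)           # balances both class accuracies
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1, "g_mean": g_mean}


print(imbalance_metrics(tp=8, fn=2, fp=5, tn=85))
```

In this example, accuracy is 0.93 even though 2 of the 10 minority samples are missed. F1 and G-mean expose that gap, which is why the evaluation relies on them for imbalanced data.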
| ID | DI | SC | CH | DB | CVIN |
|---|---|---|---|---|---|
| fla | 5 | 4 | 4 | 3 | 4 |
| eco2 | 3 | 2 | 2 | 6 | 6 |
| eco1 | 3 | 3 | 3 | 7 | 6 |
| car | 7 | 4 | 4 | 4 | 7 |
| gla0 | 2 | 2 | 2 | 2 | 3 |
| har | 5 | 3 | 3 | 3 | 7 |
| ion | 10 | 2 | 2 | 9 | 2 |
| led | 2 | 2 | 3 | 3 | 2 |
| new | 5 | 2 | 4 | 2 | 4 |
| pag0 | 2 | 2 | 20 | 3 | 5 |
| pim | 12 | 3 | 3 | 4 | 2 |
| pok9 | 2 | 2 | 2 | 2 | 2 |
| seg0 | 13 | 6 | 11 | 7 | 15 |
| veh | 9 | 2 | 3 | 2 | 13 |
| vow0 | 2 | 8 | 7 | 8 | 7 |
| win | 3 | 3 | 4 | 5 | 4 |
| wis | 5 | 2 | 2 | 13 | 13 |
| yea1 | 6 | 2 | 3 | 16 | 18 |
| yea3 | 5 | 3 | 2 | 11 | 9 |
| yea4 | 6 | 4 | 3 | 4 | 3 |
| zoo | 2 | 2 | 2 | 2 | 2 |
| kdd | 2 | 2 | 4 | 4 | 2 |
| kr | 4 | 2 | 4 | 4 | 4 |
| pok8 | 2 | 2 | 2 | 2 | 4 |
| Algorithms | Single-k Calculation Complexity | Total Complexity of the Complete Algorithm (Traversal of k) |
|---|---|---|
| (CVIN)CSK-means | | |
| (DI)K-means | | |
| (SC)K-means | | |
| (CH)K-means | | |
| (DB)K-means | | |
| Dataset | Examples | (DI)K-Means | (SC)K-Means | (CH)K-Means | (DB)K-Means | CSK-Means |
|---|---|---|---|---|---|---|
| new | 215 | 0.0621 | 0.1108 | 0.0387 | 0.0682 | 0.0519 |
| wis | 683 | 0.7653 | 0.4219 | 0.247 | 0.3087 | 0.29 |
| vow0 | 988 | 2.4929 | 0.9647 | 0.5788 | 0.6638 | 0.6322 |
| seg0 | 2308 | 29.5484 | 9.7842 | 6.2487 | 7.6397 | 6.3592 |
| pag0 | 5472 | 284.8947 | 188.7142 | 171.197 | 176.0794 | 171.0506 |
| ID | Before Oversampling | After Oversampling |
|---|---|---|
| RF | 62.45 (±9.37) | 71.92 (±10.06) |
| KNN | 57.02 (±8.88) | 76.31 (±10.53) |
| SVM | 60.91 (±13.61) | 73.23 (±11.10) |
| ID | Accuracy | ± | F1 | ± | G-Mean | ± | AUC | ± |
|---|---|---|---|---|---|---|---|---|
| RF | 93.49 | (±1.41) | 64.96 | (±8.44) | 65.65 | (±8.66) | 92.78 | (±2.65) |
| SMOTE | 93.07 | (±1.45) | 71.71 | (±7.72) | 72.09 | (±7.74) | 92.43 | (±2.95) |
| RSMOTE | 93.27 | (±1.68) | 70.42 | (±9.27) | 70.88 | (±9.07) | 92.83 | (±2.26) |
| BSMOTE | 93.07 | (±1.59) | 68.9 | (±8.65) | 69.34 | (±8.86) | 92.28 | (±2.96) |
| ASY | 92.58 | (±1.51) | 70.92 | (±7.48) | 71.33 | (±7.52) | 91.95 | (±3.09) |
| GOS | 93.12 | (±1.70) | 70.35 | (±8.74) | 70.86 | (±8.64) | 93.30 | (±2.48) |
| WKS | 93.39 | (±1.63) | 70.53 | (±8.53) | 70.91 | (±7.52) | 92.94 | (±2.13) |
| FCMS | 93.55 | (±1.47) | 69.11 | (±8.62) | 69.8 | (±7.52) | 92.99 | (±2.31) |
| MOSIG | 93.85 | (±1.49) | 66.88 | (±9.42) | 67.95 | (±9.81) | 93.06 | (±2.11) |
| MLONC | 93.44 | (±1.54) | 70.09 | (±9.46) | 70.64 | (±9.24) | 93.11 | (±2.46) |
| MLOS | 93.65 | (±1.31) | 64.82 | (±8.40) | 65.58 | (±8.59) | 92.93 | (±2.54) |
| CSKGO | 93.75 | (±1.43) | 68.36 | (±8.53) | 72.54 | (±8.44) | 93.70 | (±1.71) |
| ID | Accuracy | ± | F1 | ± | G-Mean | ± | AUC | ± |
|---|---|---|---|---|---|---|---|---|
| KNN | 92.03 | (±1.73) | 59.99 | (±8.47) | 60.77 | (±8.39) | 86.15 | (±5.15) |
| SMOTE | 88.91 | (±2.07) | 62.91 | (±7.14) | 64.43 | (±7.38) | 86.32 | (±4.85) |
| RSMOTE | 88.38 | (±2.05) | 63.94 | (±7.27) | 65.91 | (±7.65) | 87.87 | (±5.03) |
| BSMOTE | 89.60 | (±2.18) | 62.90 | (±8.27) | 64.05 | (±8.40) | 85.57 | (±4.90) |
| ASY | 88.73 | (±2.09) | 63.00 | (±7.31) | 64.51 | (±7.53) | 86.69 | (±4.82) |
| GOS | 89.35 | (±2.01) | 64.34 | (±7.02) | 65.93 | (±7.10) | 88.55 | (±3.98) |
| WKS | 89.21 | (±1.75) | 63.69 | (±6.37) | 65.51 | (±6.59) | 87.72 | (±4.57) |
| FCMS | 90.21 | (±1.78) | 64.32 | (±7.66) | 64.80 | (±7.80) | 85.22 | (±5.61) |
| MOSIG | 91.19 | (±1.67) | 61.63 | (±7.24) | 61.94 | (±7.36) | 86.38 | (±5.06) |
| MLONC | 90.70 | (±1.68) | 65.50 | (±7.72) | 66.37 | (±7.91) | 86.75 | (±5.10) |
| MLOS | 92.08 | (±1.65) | 58.50 | (±8.31) | 59.60 | (±8.18) | 86.18 | (±5.09) |
| CSKGO | 90.19 | (±1.81) | 66.11 | (±6.31) | 67.53 | (±6.42) | 88.83 | (±3.12) |
| ID | Accuracy | ± | F1 | ± | G-Mean | ± | AUC | ± |
|---|---|---|---|---|---|---|---|---|
| SVM | 82.01 | (±6.67) | 49.49 | (±9.40) | 51.12 | (±9.59) | 77.19 | (±8.78) |
| SMOTE | 86.85 | (±4.60) | 59.45 | (±5.65) | 61.97 | (±6.01) | 83.34 | (±6.01) |
| RSMOTE | 85.03 | (±3.85) | 58.28 | (±5.67) | 61.27 | (±5.93) | 83.96 | (±5.45) |
| BSMOTE | 86.58 | (±4.26) | 59.76 | (±6.44) | 62.22 | (±6.67) | 83.02 | (±6.38) |
| ASY | 85.79 | (±5.03) | 58.57 | (±5.94) | 61.52 | (±6.35) | 83.91 | (±6.11) |
| GOS | 83.49 | (±4.48) | 56.18 | (±5.22) | 59.33 | (±5.69) | 81.62 | (±5.24) |
| WKS | 84.71 | (±3.56) | 57.00 | (±6.33) | 60.01 | (±5.82) | 83.78 | (±6.08) |
| FCMS | 90.08 | (±3.03) | 60.54 | (±5.41) | 61.72 | (±7.91) | 82.40 | (±6.74) |
| MOSIG | 86.29 | (±4.33) | 57.30 | (±5.92) | 59.39 | (±6.41) | 84.85 | (±5.42) |
| MLONC | 90.66 | (±1.68) | 57.87 | (±5.12) | 58.66 | (±5.14) | 84.26 | (±5.66) |
| MLOS | 91.92 | (±1.18) | 51.36 | (±5.15) | 51.25 | (±5.17) | 82.60 | (±5.53) |
| CSKGO | 88.28 | (±4.11) | 59.35 | (±6.39) | 61.48 | (±6.70) | 84.54 | (±6.21) |
| | SMOTE | RSMOTE | BSMOTE | ASY | GOS | WKS | FCMS | MOSIG | MLONC | MLOS | CSKGO |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SMOTE | | 0.1379 | 0.2776 | 0.0035 | 0.0764 | 0.1446 | 0.1131 | 0.1003 | 0.0722 | 0.0026 | 0.1094 |
| RSMOTE | NS− | | 0.3409 | 0.3483 | 0.3409 | 0.4247 | 0.4801 | 0.4091 | 0.3192 | 0.0392 | 0.0025 |
| BSMOTE | NS− | NS+ | | 0.0681 | 0.2514 | 0.4091 | 0.3745 | 0.1020 | 0.3336 | 0.0048 | 0.0427 |
| ASY | S− | NS− | NS− | | 0.3372 | 0.4721 | 0.4841 | 0.2266 | 0.3446 | 0.0166 | 0.0294 |
| GOS | NS− | NS− | NS− | NS− | | 0.3192 | 0.4286 | 0.4364 | 0.4364 | 0.0262 | 0.0212 |
| WKS | NS− | NS− | NS− | NS+ | NS+ | | 0.4364 | 0.1685 | 0.4052 | 0.0139 | 0.0202 |
| FCMS | NS− | NS+ | NS− | NS− | NS+ | NS− | | 0.1867 | 0.3783 | 0.0294 | 0.1170 |
| MOSIG | NS− | NS− | NS− | NS− | NS− | NS− | NS− | | 0.2483 | 0.0183 | 0.0268 |
| MLONC | NS− | NS− | NS− | NS− | NS+ | NS− | NS− | NS+ | | 0.0026 | 0.0143 |
| MLOS | S− | S− | S− | S− | S− | S− | S− | S− | S− | | 0.00307 |
| CSKGO | NS+ | S+ | S+ | S+ | S+ | S+ | NS+ | S+ | S+ | S+ | |
| | SMOTE | RSMOTE | BSMOTE | ASY | GOS | WKS | FCMS | MOSIG | MLONC | MLOS | CSKGO |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SMOTE | | 0.1292 | 0.3483 | 0.0042 | 0.0571 | 0.1611 | 0.1660 | 0.119 | 0.07636 | 0.0028 | 0.0301 |
| RSMOTE | NS− | | 0.3557 | 0.3192 | 0.2946 | 0.3897 | 0.4801 | 0.46414 | 0.32997 | 0.04272 | 0.0002 |
| BSMOTE | NS− | NS+ | | 0.0778 | 0.2514 | 0.4247 | 0.3745 | 0.12924 | 0.30153 | 0.0048 | 0.0076 |
| ASY | S− | NS− | NS− | | 0.2843 | 0.4920 | 0.4681 | 0.25143 | 0.3707 | 0.01539 | 0.0076 |
| GOS | NS− | NS− | NS− | NS− | | 0.3409 | 0.3336 | 0.40905 | 0.43644 | 0.07493 | 0.0054 |
| WKS | NS− | NS− | NS− | NS+ | NS+ | | 0.4364 | 0.2177 | 0.36393 | 0.02938 | 0.0026 |
| FCMS | NS− | NS+ | NS− | NS− | NS+ | NS− | | 0.20327 | 0.44038 | 0.02938 | 0.0918 |
| MOSIG | NS− | NS− | NS− | NS− | NS+ | NS− | NS− | | 0.32636 | 0.01659 | 0.00639 |
| MLONC | NS− | NS− | NS− | NS− | NS+ | NS− | NS− | NS+ | | 0.00391 | 0.00272 |
| MLOS | S− | S− | S− | S− | NS− | S− | S− | S− | S− | | 0.00045 |
| CSKGO | S+ | S+ | S+ | S+ | S+ | S+ | NS+ | S+ | S+ | S+ | |
| | SMOTE | RSMOTE | BSMOTE | ASY | GOS | WKS | FCMS | MOSIG | MLONC | MLOS | CSKGO |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SMOTE | | 0.2946 | 0.0681 | 0.0015 | 0.2946 | 0.4404 | 0.2389 | 0.40517 | 0.19766 | 0.48006 | 0.0032 |
| RSMOTE | NS− | | 0.1379 | 0.0571 | 0.0233 | 0.3707 | 0.3821 | 0.22065 | 0.38209 | 0.46414 | 0.0005 |
| BSMOTE | NS− | NS− | | 0.2946 | 0.0250 | 0.0951 | 0.1446 | 0.07215 | 0.2327 | 0.22663 | 0.0015 |
| ASY | S− | NS− | NS− | | 0.0113 | 0.0571 | 0.0495 | 0.02118 | 0.15625 | 0.17361 | 0.0006 |
| GOS | NS+ | S+ | S+ | S+ | | 0.0793 | 0.3015 | 0.22363 | 0.26109 | 0.25463 | 0.0104 |
| WKS | NS+ | NS+ | NS+ | NS+ | NS− | | 0.3821 | 0.26763 | 0.34458 | 0.2946 | 0.0010 |
| FCMS | NS+ | NS+ | NS+ | S+ | NS− | NS− | | 0.39743 | 0.44828 | 0.37828 | 0.0099 |
| MOSIG | NS+ | NS+ | NS+ | S+ | NS− | NS− | NS− | | 0.09012 | 0.21186 | 0.00402 |
| MLONC | NS− | NS+ | NS+ | NS+ | NS− | NS− | NS− | NS− | | 0.3409 | 0.00695 |
| MLOS | NS− | NS+ | NS+ | NS+ | NS− | NS− | NS− | NS− | NS− | | 0.00695 |
| CSKGO | S+ | S+ | S+ | S+ | S+ | S+ | S+ | S+ | S+ | S+ | |
| | SMOTE | RSMOTE | BSMOTE | ASY | GOS | WKS | FCMS | MOSIG | MLONC | MLOS | CSKGO |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SMOTE | | 0.4801 | 0.4920 | 0.4841 | 0.2389 | 0.4562 | 0.2061 | 0.3336 | 0.02938 | 0.01426 | 0.0016 |
| RSMOTE | NS− | | 0.1057 | 0.1867 | 0.2119 | 0.4641 | 0.4052 | 0.18141 | 0.06178 | 0.00539 | 0.0023 |
| BSMOTE | NS+ | NS− | | 0.3639 | 0.1736 | 0.1736 | 0.3707 | 0.4562 | 0.01321 | 0.01618 | 0.0008 |
| ASY | NS+ | NS− | NS+ | | 0.1057 | 0.3783 | 0.3264 | 0.38209 | 0.02169 | 0.01426 | 0.0054 |
| GOS | NS+ | NS+ | NS+ | NS+ | | 0.0951 | 0.3446 | 0.17361 | 0.15151 | 0.00466 | 0.0154 |
| WKS | NS+ | NS+ | NS+ | NS+ | NS− | | 0.3015 | 0.37828 | 0.04006 | 0.00139 | 0.0014 |
| FCMS | NS+ | NS+ | NS+ | NS+ | NS− | NS+ | | 0.14457 | 0.12302 | 0.00368 | 0.0409 |
| MOSIG | NS− | NS− | NS− | NS− | NS− | NS− | NS− | | 0.00714 | 0.00714 | 0.00494 |
| MLONC | S+ | NS+ | S+ | S+ | NS+ | S+ | NS+ | S+ | | 0.00062 | 0.16109 |
| MLOS | S− | S− | S− | S− | S− | S− | S− | S− | S− | | 0.00014 |
| CSKGO | S+ | S+ | S+ | S+ | S+ | S+ | S+ | S+ | NS+ | S+ | |
| | SMOTE | RSMOTE | BSMOTE | ASY | GOS | WKS | FCMS | MOSIG | MLONC | MLOS | CSKGO |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SMOTE | | 0.2090 | 0.4522 | 0.4325 | 0.3409 | 0.2676 | 0.3821 | 0.08534 | 0.05705 | 0.00842 | 0.0029 |
| RSMOTE | NS+ | | 0.0475 | 0.0749 | 0.4404 | 0.4522 | 0.1762 | 0.01659 | 0.19489 | 0.00368 | 0.0132 |
| BSMOTE | NS+ | S− | | 0.2514 | 0.2546 | 0.1660 | 0.4801 | 0.22363 | 0.0268 | 0.01786 | 0.0011 |
| ASY | NS+ | NS− | NS+ | | 0.2810 | 0.2743 | 0.3015 | 0.119 | 0.02938 | 0.00939 | 0.0037 |
| GOS | NS+ | NS+ | NS+ | NS+ | | 0.2389 | 0.3228 | 0.06057 | 0.32276 | 0.00695 | 0.0239 |
| WKS | NS+ | NS− | NS+ | NS+ | NS− | | 0.4286 | 0.10565 | 0.13136 | 0.00159 | 0.0034 |
| FCMS | NS− | NS− | NS+ | NS− | NS− | NS− | | 0.14457 | 0.05262 | 0.00539 | 0.0212 |
| MOSIG | NS− | S− | NS− | NS− | NS− | NS− | NS− | | 0.00317 | 0.01321 | 0.00164 |
| MLONC | NS+ | NS+ | S+ | S+ | NS+ | NS+ | NS+ | S+ | | 0.00039 | 0.05155 |
| MLOS | S− | S− | S− | S− | S− | S− | S− | S− | S− | | 0.00019 |
| CSKGO | S+ | S+ | S+ | S+ | S+ | S+ | S+ | S+ | NS+ | S+ | |
| | SMOTE | RSMOTE | BSMOTE | ASY | GOS | WKS | FCMS | MOSIG | MLONC | MLOS | CSKGO |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SMOTE | | 0.0427 | 0.0150 | 0.4247 | 0.0427 | 0.0183 | 0.0023 | 0.18673 | 0.12924 | 0.22663 | 0.0029 |
| RSMOTE | S+ | | 0.0021 | 0.0071 | 0.1230 | 0.4841 | 0.0001 | 0.00657 | 0.03288 | 0.017 | 0.0314 |
| BSMOTE | S− | S− | | 0.0287 | 0.0055 | 0.0010 | 0.0183 | 0.26109 | 0.07927 | 0.11507 | 0.0006 |
| ASY | NS− | S− | S+ | | 0.0202 | 0.0062 | 0.0016 | 0.16853 | 0.2177 | 0.26109 | 0.0048 |
| GOS | S+ | NS+ | S+ | S+ | | 0.1867 | 0.0001 | 0.00866 | 0.02872 | 0.02169 | 0.4920 |
| WKS | S+ | NS− | S+ | S+ | NS− | | 0.0000 | 0.0113 | 0.015 | 0.04457 | 0.0136 |
| FCMS | S− | S− | S− | S− | S− | S− | | 0.00587 | 0.00052 | 0.01923 | 0.0001 |
| MOSIG | NS− | S− | NS+ | NS− | S− | S− | S+ | | 0.05821 | 0.38974 | 0.00031 |
| MLONC | NS− | S− | NS− | NS− | S− | S− | S+ | NS+ | | 0.23885 | 0.00233 |
| MLOS | NS− | S− | NS+ | NS− | S− | S− | S+ | NS− | NS− | | 0.00131 |
| CSKGO | S+ | S+ | S+ | S+ | NS− | S+ | S+ | S+ | S+ | S+ | |
| | SMOTE | RSMOTE | BSMOTE | ASY | GOS | WKS | FCMS | MOSIG | MLONC | MLOS | CSKGO |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SMOTE | | 0.1314 | 0.3192 | 0.0202 | 0.0016 | 0.0122 | 0.3974 | 0.2327 | 0.40905 | 0.00181 | 0.4168 |
| RSMOTE | NS− | | 0.2709 | 0.2743 | 0.1335 | 0.0217 | 0.1762 | 0.48006 | 0.4562 | 0.00695 | 0.2644 |
| BSMOTE | NS+ | NS+ | | 0.0336 | 0.0143 | 0.0104 | 0.2207 | 0.36393 | 0.46414 | 0.00539 | 0.1762 |
| ASY | S− | NS− | S− | | 0.0475 | 0.1210 | 0.0722 | 0.42858 | 0.26109 | 0.01539 | 0.1539 |
| GOS | S− | NS− | S− | S− | | 0.4168 | 0.0268 | 0.08851 | 0.17619 | 0.04093 | 0.0036 |
| WKS | S− | S− | S− | NS− | NS+ | | 0.0162 | 0.05821 | 0.26109 | 0.00939 | 0.0048 |
| FCMS | NS+ | NS+ | NS+ | NS+ | S+ | S+ | | 0.10935 | 0.32636 | 0.00097 | 0.1539 |
| MOSIG | NS− | NS− | NS− | NS+ | NS+ | NS+ | NS− | | 0.40905 | 0.01578 | 0.19489 |
| MLONC | NS− | NS− | NS+ | NS+ | NS+ | NS+ | NS− | NS+ | | 0.01743 | 0.46812 |
| MLOS | S− | S− | S− | S− | S− | S− | S− | S− | S− | | 0.00164 |
| CSKGO | NS− | NS+ | NS− | NS+ | S+ | S+ | NS− | NS+ | NS+ | S+ | |
| | SMOTE | RSMOTE | BSMOTE | ASY | GOS | WKS | FCMS | MOSIG | MLONC | MLOS | CSKGO |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SMOTE | | 0.3897 | 0.2743 | 0.1379 | 0.0126 | 0.0162 | 0.4052 | 0.119 | 0.27425 | 0.00149 | 0.3085 |
| RSMOTE | NS− | | 0.5000 | 0.3557 | 0.0793 | 0.0192 | 0.4562 | 0.14007 | 0.18673 | 0.0044 | 0.3594 |
| BSMOTE | NS+ | NS− | | 0.0722 | 0.0183 | 0.0146 | 0.4247 | 0.21186 | 0.22663 | 0.00368 | 0.1949 |
| ASY | NS− | NS− | NS− | | 0.0446 | 0.0853 | 0.3974 | 0.3409 | 0.43644 | 0.00842 | 0.3783 |
| GOS | S− | NS− | S− | S− | | 0.4286 | 0.1251 | 0.17619 | 0.36393 | 0.01539 | 0.0329 |
| WKS | S− | S− | S− | NS− | NS+ | | 0.1057 | 0.18673 | 0.5 | 0.00554 | 0.0505 |
| FCMS | NS− | NS− | NS− | NS+ | NS+ | NS+ | | 0.25143 | 0.27425 | 0.00131 | 0.3557 |
| MOSIG | NS− | NS− | NS− | NS− | NS+ | NS+ | NS− | | 0.40517 | 0.00289 | 0.25143 |
| MLONC | NS− | NS− | NS− | NS− | NS+ | NS+ | NS− | NS− | | 0.00776 | 0.35569 |
| MLOS | S− | S− | S− | S− | S− | S− | S− | S− | S− | | 0.00149 |
| CSKGO | NS− | NS− | NS− | NS+ | S+ | NS+ | NS− | NS+ | NS+ | S+ | |
| | SMOTE | RSMOTE | BSMOTE | ASY | GOS | WKS | FCMS | MOSIG | MLONC | MLOS | CSKGO |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SMOTE | | 0.1660 | 0.0174 | 0.3520 | 0.3409 | 0.3974 | 0.2946 | 0.38209 | 0.14007 | 0.1563 | 0.1515 |
| RSMOTE | NS+ | | 0.0031 | 0.0630 | 0.2776 | 0.2420 | 0.1401 | 0.13786 | 0.02938 | 0.12302 | 0.2358 |
| BSMOTE | S− | S− | | 0.1094 | 0.0537 | 0.0244 | 0.4052 | 0.18406 | 0.2177 | 0.48405 | 0.0044 |
| ASY | NS− | NS− | NS+ | | 0.2119 | 0.2877 | 0.2327 | 0.41294 | 0.42465 | 0.32636 | 0.0455 |
| GOS | NS+ | NS− | NS− | NS+ | | 0.2451 | 0.2514 | 0.46017 | 0.18141 | 0.28434 | 0.1446 |
| WKS | NS+ | NS− | S+ | NS+ | NS− | | 0.2546 | 0.31207 | 0.20045 | 0.24825 | 0.0735 |
| FCMS | NS− | NS− | NS− | NS− | NS− | NS− | | 0.28434 | 0.49202 | 0.46017 | 0.1131 |
| MOSIG | NS− | NS− | NS+ | NS+ | NS+ | NS− | NS+ | | 0.23885 | 0.26763 | 0.16853 |
| MLONC | NS− | S− | NS+ | NS+ | NS− | NS− | NS+ | NS+ | | 0.48405 | 0.0778 |
| MLOS | NS− | NS− | NS− | NS− | NS− | NS− | NS+ | NS− | NS− | | 0.2177 |
| CSKGO | NS+ | NS+ | S+ | S+ | NS+ | NS+ | NS+ | NS+ | NS+ | NS+ | |
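The pairwise tables above give p-values in the upper triangle and marks in the lower triangle. Judging from the values, S/NS appear to denote significant/non-significant, and +/− the direction of the difference for the row method. Below is a minimal pure-Python sketch of a one-sided Wilcoxon signed-rank test on per-dataset scores. It uses the large-sample normal approximation without tie or continuity correction. The 0.05 level, the one-sided alternative, the ASCII "+"/"-" marks, and the function names are assumptions, not the paper's stated settings.

```python
import math


def wilcoxon_one_sided(a, b):
    """One-sided Wilcoxon signed-rank test of H1: a tends to exceed b (normal approximation).

    a, b: paired scores of two methods on the same datasets.
    Zero differences are dropped; tied |differences| share their average rank.
    Returns (W_plus, p_value).
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:  # assign average 1-based ranks to runs of equal |difference|
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    return w_plus, 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z)


def significance_mark(a, b, alpha=0.05):
    """Hypothetical S+/S-/NS+/NS- label for method a against method b."""
    _, p_greater = wilcoxon_one_sided(a, b)
    _, p_less = wilcoxon_one_sided(b, a)
    sign = "+" if p_greater <= p_less else "-"
    p = min(p_greater, p_less)
    return ("S" if p < alpha else "NS") + sign, p
```

With 24 datasets per comparison, the normal approximation is usually adequate. For exact small-sample p-values, a library routine such as SciPy's `scipy.stats.wilcoxon` with `alternative="greater"` is the safer choice.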
| ID | Baseline | Ablation-I | Ablation-II | Ablation-III |
|---|---|---|---|---|
| fla | 19.43 | 20.66 | 18.40 | 21.37 |
| eco2 | 80.58 | 81.68 | 82.67 | 83.75 |
| eco1 | 79.26 | 80.85 | 78.67 | 81.87 |
| car | 34.64 | 33.97 | 50.91 | 54.38 |
| gla0 | 69.77 | 69.36 | 71.34 | 71.65 |
| har | 42.10 | 42.51 | 36.21 | 40.09 |
| ion | 73.18 | 73.82 | 75.19 | 75.69 |
| led | 72.51 | 71.93 | 73.03 | 74.94 |
| new | 89.77 | 91.10 | 91.62 | 91.21 |
| pag0 | 80.94 | 80.09 | 80.07 | 80.39 |
| pim | 57.07 | 56.33 | 57.78 | 58.45 |
| pok9 | 33.33 | 34.67 | 36.00 | 37.33 |
| seg0 | 91.65 | 92.57 | 95.47 | 95.74 |
| veh | 84.62 | 84.14 | 81.39 | 82.96 |
| vow0 | 100.00 | 100.00 | 100.00 | 100.00 |
| win | 7.08 | 6.39 | 12.11 | 14.37 |
| wis | 96.27 | 95.83 | 95.86 | 96.48 |
| yea1 | 57.97 | 55.20 | 58.06 | 59.50 |
| yea3 | 72.50 | 72.97 | 74.54 | 75.16 |
| yea4 | 29.58 | 33.86 | 33.81 | 34.12 |
| zoo | 38.00 | 40.00 | 30.00 | 43.33 |
| kdd | 94.29 | 97.14 | 100.00 | 97.14 |
| kr | 100.00 | 100.00 | 100.00 | 100.00 |
| pok8 | 9.36 | 19.31 | 14.09 | 18.89 |
| Ave. | 63.08 | 63.93 | 64.47 | 66.20 |
| ID | Baseline | Ablation-I | Ablation-II | Ablation-III |
|---|---|---|---|---|
| fla | 20.73 | 21.71 | 19.12 | 22.27 |
| eco2 | 81.70 | 82.53 | 83.45 | 84.56 |
| eco1 | 79.73 | 81.18 | 79.01 | 82.36 |
| car | 45.76 | 45.23 | 58.16 | 60.57 |
| gla0 | 70.70 | 70.27 | 72.16 | 72.01 |
| har | 42.39 | 42.94 | 36.49 | 40.72 |
| ion | 75.37 | 75.91 | 77.06 | 77.50 |
| led | 73.44 | 72.95 | 73.71 | 75.53 |
| new | 90.30 | 91.60 | 91.70 | 91.32 |
| pag0 | 80.96 | 80.15 | 80.11 | 80.43 |
| pim | 57.13 | 56.35 | 57.81 | 58.50 |
| pok9 | 35.69 | 37.07 | 39.42 | 40.47 |
| seg0 | 91.94 | 92.82 | 95.54 | 95.80 |
| veh | 84.72 | 84.27 | 81.47 | 83.13 |
| vow0 | 100.00 | 100.00 | 100.00 | 100.00 |
| win | 9.30 | 8.46 | 16.92 | 19.43 |
| wis | 96.28 | 95.85 | 95.88 | 96.48 |
| yea1 | 58.09 | 55.27 | 58.28 | 59.73 |
| yea3 | 73.65 | 73.81 | 75.18 | 75.95 |
| yea4 | 34.41 | 39.17 | 39.15 | 37.19 |
| zoo | 41.55 | 43.09 | 31.55 | 45.69 |
| kdd | 94.64 | 97.32 | 100.00 | 97.32 |
| kr | 100.00 | 100.00 | 100.00 | 100.00 |
| pok8 | 28.92 | 28.25 | 21.22 | 24.35 |
| Ave. | 65.31 | 65.68 | 65.97 | 67.55 |
| | F-Measure | G-Mean |
|---|---|---|
| Baseline | 63.08 | 65.31 |
| +K-means | 63.93 | 65.68 |
| +CSK-means | 64.47 | 65.97 |
| +SRA | 66.20 | 67.55 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Xie, W.; Huang, X. Oversampling Algorithm Based on Improved K-Means and Gaussian Distribution. Information 2026, 17, 28. https://doi.org/10.3390/info17010028
