Enhanced Feature Engineering Symmetry Model Based on Novel Dolphin Swarm Algorithm
Abstract
1. Introduction
1.1. Research Background and Challenges
1.2. Gaps in Existing Research
- (1) Insufficient robustness of feature selection algorithms;
- (2) Insufficient utilization of redundant features.
1.3. Motivation and Contributions
1.4. Structure and Organization
2. Background
2.1. Feature Engineering
2.2. Regression and Prediction
3. Research Methodology
3.1. Principle of Standard Dolphin Swarm Algorithm
Algorithm 1. Original Dolphin Swarm Algorithm

Input: a regression dataset, parameters, population size, maximum iterations, and an objective function. Output: the optimal feature subset.
Step 1: Each dolphin in the population is represented by a binary vector X_i = (x_{i1}, …, x_{id}), where d is the total number of features; x_{ij} = 1 indicates that the j-th feature is selected, and x_{ij} = 0 indicates that it is not.
Step 2: Randomly generate the initial individuals of the population.
Step 3: Use each binary vector to extract the corresponding feature subset from the dataset.
Step 4: Train a regression model (e.g., linear regression, random forest) on the selected feature subset and the target y.
Step 5: Use the objective function to compute the fitness value.
Step 6: Update individuals stochastically, guided by the global best solution.
Step 7: Simulate local search behavior by refining individual solutions, e.g., flipping specific feature-selection bits.
Step 8: Update the global best feature subset based on the fitness values.
Step 9: Record the best feature subset found so far.
Step 10: Terminate when the maximum number of iterations is reached or the fitness value stabilizes.
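As a concrete illustration of Steps 1–10, the sketch below runs a binary swarm search over synthetic data. The population size, the bit-copy probability toward the global best, and the use of cross-validated linear regression as the wrapped objective are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic regression data: only the first 3 of 10 features are informative.
n, d = 200, 10
X = rng.normal(size=(n, d))
y = X[:, 0] + 2 * X[:, 1] - X[:, 2] + 0.1 * rng.normal(size=n)

def fitness(mask):
    """Higher is better: CV R^2 of a model trained on the selected subset (Steps 3-5)."""
    if mask.sum() == 0:
        return -np.inf
    model = LinearRegression()
    return cross_val_score(model, X[:, mask.astype(bool)], y, cv=3, scoring="r2").mean()

pop_size, max_iter = 12, 20
pop = rng.integers(0, 2, size=(pop_size, d))   # Step 2: random binary vectors
best = max(pop, key=fitness).copy()            # initial global best

for _ in range(max_iter):
    for i in range(pop_size):
        # Step 6: move each dolphin toward the global best (copy bits with prob. 0.5)
        follow = rng.random(d) < 0.5
        pop[i, follow] = best[follow]
        # Step 7: local search -- flip one random feature-selection bit
        j = rng.integers(d)
        pop[i, j] ^= 1
    # Steps 8-9: update and record the global best subset
    cand = max(pop, key=fitness)
    if fitness(cand) > fitness(best):
        best = cand.copy()

print("selected features:", np.flatnonzero(best))
```

In practice the fitness values would be cached per mask; they are recomputed here to keep the sketch short.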
3.2. Maximum Relevance and Minimal Redundancy (mRMR) for Regression
- (1) Relevance metric: replace mutual information with the F-statistic, which quantifies the linear dependence between a feature and the continuous target y.
- (2) Redundancy metric: retain the Pearson correlation to evaluate pairwise feature redundancy [45], ensuring computational efficiency and interpretability.
- (1) Feature relevance via the F-statistic: F(x_j, y) = (n − 2) · r(x_j, y)^2 / (1 − r(x_j, y)^2), where r(x_j, y) is the Pearson correlation between feature x_j and the target y, and n is the number of samples.
- (2) Feature redundancy via Pearson correlation: r(x_i, x_j) = cov(x_i, x_j) / (σ(x_i) · σ(x_j)).
- (3) mRMR optimization criterion: at each step, select the feature x_j ∉ S that maximizes F(x_j, y) − (1/|S|) · Σ_{x_i ∈ S} |r(x_i, x_j)|.
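The F-statistic relevance, Pearson redundancy, and greedy mRMR selection described above can be sketched as follows. The toy dataset, the rescaling of F-scores so the redundancy penalty is on a comparable scale, and the additive (relevance-minus-redundancy) combination are illustrative assumptions.

```python
import numpy as np
from sklearn.feature_selection import f_regression

rng = np.random.default_rng(1)
n, d = 300, 8
X = rng.normal(size=(n, d))
X[:, 3] = X[:, 0] + 0.05 * rng.normal(size=n)   # feature 3 is a noisy copy of 0
y = X[:, 0] + X[:, 1] + 0.1 * rng.normal(size=n)

f_stat, _ = f_regression(X, y)                   # relevance: F-statistic of each feature vs. y
relevance = f_stat / f_stat.max()                # rescaled so the penalty term is comparable
redundancy = np.abs(np.corrcoef(X, rowvar=False))  # pairwise |Pearson| correlation

def mrmr_select(k):
    """Greedy forward mRMR: relevance minus mean redundancy to the chosen set."""
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        scores = {j: relevance[j] - redundancy[j, selected].mean()
                  for j in range(d) if j not in selected}
        selected.append(max(scores, key=scores.get))
    return selected

picked = mrmr_select(3)
print("mRMR picks:", picked)
```

The redundancy penalty defers the noisy copy (feature 3): the two genuinely informative features are chosen before the redundant twin, which is exactly the behavior DSA–mRMR exploits when it routes redundant features to the RFA stage.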
3.3. Fitness Function
3.4. DSA–mRMR Algorithm for Feature Selection
Algorithm 2. A novel feature selection algorithm (DSA–mRMR)

Input: an information system and the initial values of the various parameters. Output: a feature subset S.
1. Initialize a population of dolphins, where each dolphin is a binary vector X_i = (x_{i1}, …, x_{id}); x_{ij} = 1 if the j-th feature is selected, and 0 otherwise.
2. Compute the fitness of each dolphin using Formula (4).
3. Calculate the adaptive mutation probability for the current iteration.
4. Individual exploration (mutation): for each dolphin, mutate each bit of its vector with the current adaptive probability.
5. Move toward the current best solution: for each dolphin and each bit, update the bit according to the position of the global best.
6. Update the best solution.
7. Stop when the maximum number of iterations is reached or the fitness improvement falls below ε (ε = 10⁻⁵), and output the best subset.
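Steps 3–5 of Algorithm 2 can be sketched as below. Since the exact formulas are stated in the paper's equations, the linearly decaying mutation schedule (with assumed bounds p_max = 0.3, p_min = 0.01) and the 0.5 bit-copy probability toward the best solution are illustrative stand-ins, not the authors' definitions.

```python
import numpy as np

rng = np.random.default_rng(2)
d, pop_size, max_iter = 6, 4, 10
p_max, p_min = 0.3, 0.01            # assumed bounds for the mutation schedule

def adaptive_p(t):
    """Mutation probability decaying linearly from p_max to p_min (assumed schedule)."""
    return p_max - (p_max - p_min) * t / max_iter

pop = rng.integers(0, 2, size=(pop_size, d))
best = np.array([1, 1, 0, 0, 0, 0])  # stand-in global best subset

for t in range(max_iter):
    p_m = adaptive_p(t)
    # Step 4: individual exploration -- flip each bit with probability p_m
    flips = rng.random(pop.shape) < p_m
    pop[flips] ^= 1
    # Step 5: move toward the best -- copy the best's bit with probability 0.5
    follow = rng.random(pop.shape) < 0.5
    pop = np.where(follow, best, pop)

print("final population:\n", pop)
```

Because the mutation probability decays while bits keep being copied from the best vector, the population contracts toward the current best subset, trading early exploration for late exploitation.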
3.5. Ensemble Learning for Redundant Features
Algorithm 3. Redundant feature aggregation (RFA)

Input: a redundant feature subset R. Output: a composite feature f* constructed from R, or ∅ if no valid feature is generated.
1. Select n machine learning models (e.g., linear regression, random forest) suitable for regression tasks to perform modeling and training.
2. Input the sub-information systems built from the redundant features returned by DSA–mRMR. For each redundant feature subset, train the n models independently to generate n reconstructed features.
3. Calculate Pearson's correlation coefficient between each reconstructed feature and the target variable y. If every coefficient falls below the threshold, return ∅ and exit; otherwise, proceed to Step 4.
4. Select the model with the highest correlation (or lowest RMSE) as the optimal model M*.
5. Retrain M* on the entire redundant subset to produce the final composite feature f*.
6. Return f* if valid, otherwise ∅.
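The aggregation idea in Algorithm 3 can be sketched as follows: each candidate model compresses a redundant subset into one reconstructed feature (its prediction), and the model whose feature correlates best with the target is kept. The two models, the correlation threshold of 0.3, and the toy redundant subset are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)
n = 250
base = rng.normal(size=n)
# A redundant subset: three noisy copies of the same underlying signal.
X_red = np.column_stack([base + 0.1 * rng.normal(size=n) for _ in range(3)])
y = 2 * base + 0.2 * rng.normal(size=n)

models = {"LinearRegression": LinearRegression(),
          "RandomForest": RandomForestRegressor(n_estimators=50, random_state=0)}
threshold = 0.3  # assumed minimum |correlation| for a valid composite feature

best_name, best_r, best_feat = None, -np.inf, None
for name, m in models.items():
    f_new = m.fit(X_red, y).predict(X_red)    # reconstructed (composite) feature
    r = abs(np.corrcoef(f_new, y)[0, 1])      # relevance of the feature to the target
    if r > best_r:
        best_name, best_r, best_feat = name, r, f_new

composite = best_feat if best_r >= threshold else None   # return the feature, else ∅
print(best_name, round(best_r, 3))
```

A production version would score the candidates on held-out data rather than in-sample, so the random forest's near-perfect training fit does not automatically win.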
3.6. The Overall Algorithm of Feature Engineering
Algorithm 4. Enhanced feature engineering model (EFEM)

Input: an information system. Output: the final feature subset and its evaluation results.
1. Using the DSA–mRMR algorithm, obtain the feature subset S and its redundant feature subset R.
2. Evaluate whether the correlation coefficient of the redundant feature subset exceeds the threshold. If it does, proceed to the next step; otherwise, jump to Step 5.
3. Input the redundant feature subset R as an information system to algorithm RFA, generating a new feature f*.
4. Merge the feature subset selected by DSA–mRMR with the feature constructed in Step 3 to form a new feature subset.
5. Select the best machine learning model from algorithm RFA to train the extracted feature subset.
6. Output the evaluation indicators, including R², RMSE, and MAPE.
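The three evaluation indicators reported in Step 6 can be computed with scikit-learn as below; the toy prediction vectors are illustrative, not results from the paper.

```python
import numpy as np
from sklearn.metrics import (r2_score, mean_squared_error,
                             mean_absolute_percentage_error)

y_true = np.array([3.0, 5.0, 7.5, 10.0])
y_pred = np.array([2.8, 5.2, 7.0, 10.4])

r2 = r2_score(y_true, y_pred)                         # coefficient of determination
rmse = np.sqrt(mean_squared_error(y_true, y_pred))    # root mean squared error
mape = mean_absolute_percentage_error(y_true, y_pred) # mean absolute percentage error

print(round(r2, 3), round(rmse, 3), round(mape, 3))
```

Note that MAPE is undefined when a true value is zero (as with order demands of 0 in Section 4.7), so zero-demand rows must be excluded or the metric replaced in such cases.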
4. Experimental Analysis
4.1. Algorithm Comparison
4.2. Statistical Analysis
4.3. Algorithm Convergence and Runtime
4.4. Ablation Study
4.5. Parameter Sensitivity Analysis
4.6. Robustness Analysis
4.7. Application on Order Demand Prediction
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- An, J.; Kim, I.S.; Kim, K.-J.; Park, J.H.; Kang, H.; Kim, H.J.; Kim, Y.S.; Ahn, J.H. Efficacy of automated machine learning models and feature engineering for diagnosis of equivocal appendicitis using clinical and computed tomography findings. Sci. Rep. 2024, 14, 22658. [Google Scholar] [CrossRef] [PubMed]
- Zhao, Y.; Zhang, W.; Yang, T.; Jiang, Y.; Huang, F.; Lim, W.Y.B. STORM: A Spatio-Temporal Factor Model Based on Dual Vector Quantized Variational Autoencoders for Financial Trading. arXiv 2024, arXiv:2412.09468. [Google Scholar] [CrossRef]
- Siemens Senseye. The Transformative Role of Generative AI in Predictive Maintenance [White Paper]; Siemens Digital Industries: Plano, TX, USA, 2024. [Google Scholar]
- Kraev, E.; Koseoglu, B.; Traverso, L.; Topiwalla, M. Shap-Select: Lightweight feature selection using SHAP values and regression. arXiv 2024, arXiv:2410.06815. [Google Scholar] [CrossRef]
- Benítez-Peña, S.; Blanquero, R.; Carrizosa, E.; Ramírez-Cobo, P. Cost-sensitive feature selection for support vector machines. arXiv 2024, arXiv:2401.07627. [Google Scholar] [CrossRef]
- Abhyankar, N.; Shojaee, P.; Reddy, C.K. LLM-FE: Automated feature engineering for tabular data with LLMs as evolutionary optimizers. arXiv 2025, arXiv:2503.14434. [Google Scholar] [CrossRef]
- Wang, K.; Wang, P.; Xu, C. Toward efficient automated feature engineering. arXiv 2022, arXiv:2212.13152. [Google Scholar] [CrossRef]
- Verdonck, T.; Baesens, B.; Oskarsdottir, M.; van den Broucke, S. Special Issue on Advances in Feature Engineering. Mach. Learn. 2021, 113, 3917–3928. [Google Scholar] [CrossRef]
- Duan, Y.; Zhang, G.; Wang, S.; Peng, X.; Ziqi, W.; Mao, J.; Wu, H.; Jiang, X.; Wang, K. CaT-GNN: Enhancing credit card fraud detection via causal temporal graph neural networks. arXiv 2024, arXiv:2402.14708. [Google Scholar] [CrossRef]
- Radford, A.; Kim, J.W.; Hallacy, C. Learning transferable visual models from natural language supervision. In Proceedings of the 38th International Conference on Machine Learning, Virtual, 18–24 July 2021; Volume 139, pp. 8748–8763. [Google Scholar]
- Yu, W.; Liu, Y.; Dillon, T.; Rahayu, W. Edge computing-assisted IIoT framework with an autoencoder for fault detection in manufacturing predictive maintenance. IEEE Trans. Ind. Inform. 2022, 19, 5701–5710. [Google Scholar] [CrossRef]
- da Silva, F.R.; Camacho, R.; Tavares, J.M.R. Federated learning in medical image analysis: A systematic survey. Electronics 2023, 13, 47. [Google Scholar] [CrossRef]
- Rieke, N.; Hancox, J.; Li, W.; Milletarì, F.; Roth, H.R.; Albarqouni, S.; Bakas, S.; Galtier, M.N.; Landman, B.A.; Maier-Hein, K.; et al. The future of digital health with federated learning. NPJ Digit. Med. 2020, 3, 119. [Google Scholar] [CrossRef]
- Achiam, J.; Adler, S.; Agarwal, S.; Ahmad, L.; Akkaya, I.; Aleman, F.L.; Almeida, D.; Altenschmidt, J.; Altman, S.; Anadkat, S.; et al. GPT-4 technical report. arXiv 2023, arXiv:2303.08774v6. [Google Scholar] [CrossRef]
- Yang, H.; Yuan, J.; Li, C.; Zhao, G.; Sun, Z.; Yao, Q.; Bao, B.; Vasilakos, A.V.; Zhang, J. BrainIoT: Brain-like productive services provisioning with federated learning in industrial IoT. IEEE Internet Things J. 2021, 9, 2014–2024. [Google Scholar] [CrossRef]
- Yang, H.; Yu, T.; Liu, W.; Yao, Q.; Meng, D.; Vasilakos, A.V.; Cheriet, M. PAINet: An integrated passive and active intent network for digital twins in automatic driving. IEEE Commun. Mag. 2024, 63, 32–38. [Google Scholar] [CrossRef]
- Yang, H.; Zhao, X.; Yao, Q.; Yu, A.; Zhang, J.; Ji, Y. Accurate fault location using deep neural evolution network in cloud data center interconnection. IEEE Trans. Cloud Comput. 2020, 10, 1402–1412. [Google Scholar] [CrossRef]
- Zhang, C.; Yang, H.; Zhang, C.; Zhang, J.; Yao, Q.; Wang, Z.; Vasilakos, A.V. Federated cross-chain trust training for distributed smart grid in Web 3.0. Appl. Soft Comput. 2025, 180, 113313. [Google Scholar] [CrossRef]
- Yao, Q.; Yang, H.; Li, C.; Bao, B.; Zhang, J.; Cheriet, M. Federated transfer learning framework for heterogeneous edge IoT networks. China Commun. 2023. [Google Scholar] [CrossRef]
- Gulati, A.; Felahatpisheh, A.; Valderrama, C.E. Feature engineering through two-level genetic algorithm. Mach. Learn. Appl. 2025, 21, 100696. [Google Scholar] [CrossRef]
- Song, X.; Zhang, Y.; Gong, D.; Liu, H.; Zhang, W. Surrogate sample-assisted particle swarm optimization for feature selection on high-dimensional data. IEEE Trans. Evol. Comput. 2022, 27, 595–609. [Google Scholar] [CrossRef]
- Ma, W.; Zhou, X.; Zhu, H.; Li, L.; Jiao, L. A two-stage hybrid ant colony optimization for high-dimensional feature selection. Pattern Recognit. 2021, 116, 107933. [Google Scholar] [CrossRef]
- Saheed, Y.K. A binary firefly algorithm based feature selection method on high dimensional intrusion detection data. In Illumination of Artificial Intelligence in Cybersecurity and Forensics; Springer International Publishing: Cham, Switzerland, 2022; pp. 273–288. [Google Scholar]
- Pethe, Y.S.; Gourisaria, M.K.; Singh, P.K.; Das, H. FSBOA: Feature selection using bat optimization algorithm for software fault detection. Discov. Internet Things 2024, 4, 6. [Google Scholar] [CrossRef]
- Arroba, P.; Risco-Martín, J.L.; Zapater, M.; Moya, J.M.; Ayala, J.L. Enhancing regression models for complex systems using evolutionary techniques for feature engineering. arXiv 2024, arXiv:2407.00001. [Google Scholar] [CrossRef]
- Prokhorenkova, L.; Gusev, G.; Vorobev, A.; Dorogush, A.V.; Gulin, A. CatBoost: Unbiased boosting with categorical features. Adv. Neural Inf. Process. Syst. 2018, 31, 6639–6649. [Google Scholar]
- Lim, B.; Arık, S.Ö.; Loeff, N.; Pfister, T. Temporal fusion transformers for interpretable multi-horizon forecasting. Int. J. Forecast. 2021, 37, 1748–1764. [Google Scholar] [CrossRef]
- Khatir, A.; Capozucca, R.; Khatir, S.; Magagnini, E.; Le Thanh, C.; Riahi, M.K. Advancements and emerging trends in integrating machine learning and deep learning for SHM in mechanical and civil engineering: A comprehensive review. J. Braz. Soc. Mech. Sci. Eng. 2025, 47, 419. [Google Scholar] [CrossRef]
- Mansouri, A.; Tiachacht, S.; Ait-Aider, H.; Khatir, S.; Khatir, A.; Cuong-Le, T. A novel Optimization-Based Damage Detection in Beam Systems Using Advanced Algorithms for Joint-Induced Structural Vibrations. J. Vib. Eng. Technol. 2025, 13, 486. [Google Scholar] [CrossRef]
- Khatir, A.; Capozucca, R.; Khatir, S.; Magagnini, E.; Cuong-Le, T. Enhancing damage detection using reptile search algorithm-optimized neural network and frequency response function. J. Vib. Eng. Technol. 2025, 13, 88. [Google Scholar] [CrossRef]
- Khatir, A.; Capozucca, R.; Khatir, S.; Magagnini, E.; Benaissa, B.; Le Thanh, C.; Wahab, M.A. A new hybrid PSO-YUKI for double cracks identification in CFRP cantilever beam. Compos. Struct. 2023, 311, 116803. [Google Scholar] [CrossRef]
- Khatir, A.; Capozucca, R.; Khatir, S.; Magagnini, E. Vibration-based crack prediction on a beam model using hybrid butterfly optimization algorithm with artificial neural network. Front. Struct. Civ. Eng. 2022, 16, 976–989. [Google Scholar] [CrossRef]
- Khatir, A.; Brahim, A.O.; Magagnini, E. An efficient computational system for defect prediction through neural network and bio-inspired algorithms. HCMCOU J. Sci. Adv. Comput. Struct. 2024, 14, 66–80. [Google Scholar] [CrossRef]
- Liu, Y.; Wang, J.; Zhang, Y. A survey on automated feature engineering for machine learning. Comput. Appl. Softw. 2025, 42, 1–10,40. [Google Scholar]
- Tu, T.; Su, Y.; Tang, Y.; Tan, W.; Ren, S. A more flexible and robust feature selection algorithm. IEEE Access 2023, 11, 141512–141522. [Google Scholar] [CrossRef]
- Pau, S.; Perniciano, A.; Pes, B.; Rubattu, D. An evaluation of feature selection robustness on class noisy data. Information 2023, 14, 438. [Google Scholar] [CrossRef]
- Yi, S.; Liang, Y.; Lu, J.; Liu, W.; Hu, T.; Zhenyu, H.E. Robust feature selection method via joint low-rank reconstruction and projection reconstruction. Tongxin Xuebao 2023, 44, 209–219. [Google Scholar]
- Theng, D.; Bhoyar, K.K. Feature selection techniques for machine learning: A survey of more than two decades of research. Knowl. Inf. Syst. 2024, 66, 1575–1637. [Google Scholar] [CrossRef]
- Patankar, A.; Patil, P.; Brahmane, M. Feature Forgetting: A Novel Approach to Redundant Feature Pruning in Automated Feature Engineering. 2025. Available online: https://www.researchsquare.com/article/rs-7130210/v1 (accessed on 26 July 2025).
- Li, J.; Wen, Y.; He, L. Scconv: Spatial and channel reconstruction convolution for feature redundancy. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 6153–6162. [Google Scholar]
- Kuhn, M.; Johnson, K. Feature Engineering and Selection: A Practical Approach for Predictive Models; CRC Press: Boca Raton, FL, USA, 2019. [Google Scholar]
- Batista, J.E. Embedding domain-specific knowledge from LLMs into the feature engineering pipeline. arXiv 2025, arXiv:2503.21155. [Google Scholar] [CrossRef]
- Stewart, L.; Bach, F.; Berthet, Q. Building Bridges between Regression, Clustering, and Classification. arXiv 2025, arXiv:2502.02996. [Google Scholar] [CrossRef]
- Avelino, J.G.; Cavalcanti, G.D.C.; Cruz, R.M.O. Resampling strategies for imbalanced regression: A survey and empirical analysis. Artif. Intell. Rev. 2024, 57, 82. [Google Scholar] [CrossRef]
- Bennasar, M.; Sayadi, M.K.; Caiado, J.; Figueira, R.; Oliveira, E.; Suárez, J. Feature selection using joint mutual information maximization and correlation-based redundancy control. Expert Syst. Appl. 2021, 183, 115408. [Google Scholar] [CrossRef]
- Faletto, G.; Bien, J. Cluster Stability Selection for Feature Selection. arXiv 2022, arXiv:2201.00494. [Google Scholar] [CrossRef]
- UCI Machine Learning Repository. Available online: https://archive.ics.uci.edu/datasets (accessed on 26 April 2025).
- Guo, W.; Liu, T.; Dai, F.; Xu, P. An Improved Whale Optimization Algorithm for Feature Selection. Comput. Mater. Contin. 2020, 62, 337–354. [Google Scholar] [CrossRef]
- Ren, L.; Zhang, W.; Ye, Y.; Li, X. Hybrid Strategy to Improve the High-Dimensional Multi-Target Sparrow Search Algorithm and Its Application. Appl. Sci. 2023, 13, 3589. [Google Scholar] [CrossRef]
- Aghelpour, P.; Mohammadi, B.; Mehdizadeh, S.; Bahrami-Pichaghchi, H.; Duan, Z. A novel hybrid dragonfly optimization algorithm for agricultural drought prediction. Stoch. Environ. Res. Risk Assess. 2021, 35, 2459–2477. [Google Scholar] [CrossRef]
- Rostami, M.; Forouzandeh, S.; Berahmand, K.; Soltani, M. Integration of multi-objective PSO based feature selection and node centrality for medical datasets. Genomics 2020, 112, 4370–4384. [Google Scholar] [CrossRef]
- Wang, X.; Wang, Y.; Wong, K.-C.; Li, X. A self-adaptive weighted differential evolution approach for large-scale feature selection. Knowl. Based Syst. 2022, 235, 107633. [Google Scholar] [CrossRef]
- Gao, F. Data for Enhanced Feature Engineering Symmetry Model Based on a Novel Dolphin Swarm Algorithm [Order Dataset]; Baidu Netdisk (an informal resource hosted on a personal cloud storage service), 2025. Available online: https://pan.baidu.com/s/1vm6bv8sw0kyX0ATRsDTgkw?pwd=575z (accessed on 26 August 2025).







| Symbol | Definition |
|---|---|
|  | Number of subgroups for h-binning |
|  | Sample size of subgroup |
|  | Global mean of feature |
|  | Redundancy penalty coefficient |
| ID | DataSet | Abbreviation | Instance | Feature | Source |
|---|---|---|---|---|---|
| 1 | Energydata | ED | 19,735 | 28 | UCI |
| 2 | WEC_Perth_49 | WECP | 36,043 | 149 | UCI |
| 3 | ValidationData | VD | 1111 | 529 | UCI |
| 4 | ProcessedDJI | PDJI | 1984 | 82 | UCI |
| 5 | CommViolPredUnnormalizedData | CVPUD | 2214 | 145 | UCI |
| 6 | Default_features_1059_tracks | DFT | 1059 | 70 | UCI |
| 7 | Slice_localization_data | SLD | 53,500 | 386 | UCI |
| 8 | Superconductivty | SD | 21,263 | 82 | UCI |
| 9 | UJIndoorLoc-trainingData | UJILD | 19,937 | 529 | UCI |
| Dataset | IWOA-FS | HDMT-SSA | HDA-ADP | MOPSO-NC | SWDE-FS | EFEM |
|---|---|---|---|---|---|---|
| ED | 7 | 8 | 4 | 5 | 5 | 3 |
| WECP | 52 | 68 | 74 | 79 | 72 | 34 |
| VD | 187 | 247 | 279 | 263 | 250 | 101 |
| PDJI | 29 | 24 | 19 | 30 | 23 | 26 |
| CVPUD | 57 | 78 | 76 | 72 | 77 | 38 |
| DFT | 31 | 23 | 17 | 23 | 25 | 19 |
| SLD | 198 | 196 | 202 | 217 | 196 | 109 |
| SD | 33 | 35 | 24 | 25 | 24 | 13 |
| UJILD | 189 | 275 | 245 | 268 | 275 | 88 |
| Average | 87.00 | 106.00 | 104.44 | 109.11 | 105.22 | 47.89 |
| Dataset | IWOA-FS | HDMT-SSA | HDA-ADP | MOPSO-NC | SWDE-FS | EFEM |
|---|---|---|---|---|---|---|
| ED | 71.43% | 67.86% | 82.14% | 78.57% | 78.57% | 85.71% |
| WECP | 64.19% | 53.38% | 49.32% | 45.95% | 50.68% | 76.35% |
| VD | 64.46% | 53.12% | 47.07% | 50.09% | 52.55% | 80.72% |
| PDJI | 63.41% | 69.51% | 75.61% | 62.20% | 70.73% | 67.07% |
| CVPUD | 60.00% | 45.52% | 46.90% | 49.66% | 46.21% | 73.10% |
| DFT | 54.29% | 65.71% | 74.29% | 65.71% | 62.86% | 71.43% |
| SLD | 48.45% | 48.96% | 47.41% | 43.52% | 48.96% | 71.50% |
| SD | 58.54% | 56.10% | 69.51% | 68.29% | 69.51% | 82.93% |
| UJILD | 64.08% | 47.83% | 53.50% | 49.15% | 47.83% | 83.18% |
| Average | 60.98% | 56.44% | 60.64% | 57.02% | 58.66% | 76.89% |
| Dataset | RAW | IWOA-FS | HDMT-SSA | HDA-ADP | MOPSO-NC | SWDE-FS | EFEM |
|---|---|---|---|---|---|---|---|
| ED | 1 | 0.998 (6) | 0.9998 (3) | 0.9998 (3) | 0.9998 (3) | 0.9998 (3) | 0.9998 (3) |
| WECP | 0.9974 | 0.9932 (4.5) | 0.9934 (2) | 0.9933 (3) | 0.9931 (6) | 0.9932 (4.5) | 0.9972 (1) |
| VD | 0.8816 | 0.6994 (6) | 0.8397 (3) | 0.8412 (2) | 0.8048 (4) | 0.7124 (5) | 0.863 (1) |
| PDJI | 0.7813 | 0.7726 (1) | 0.7668 (4) | 0.7693 (2) | 0.6629 (6) | 0.6662 (5) | 0.7658 (3) |
| CVPUD | 0.955 | 0.7178 (5) | 0.6979 (6) | 0.9413 (2) | 0.7511 (3) | 0.7454 (4) | 0.9321 (1) |
| DFT | 0.7756 | 0.7208 (3) | 0.7164 (5) | 0.7142 (6) | 0.7267 (2) | 0.7206 (4) | 0.8569 (1) |
| SLD | 0.9915 | 0.949 (2) | 0.9474 (3) | 0.945 (4) | 0.9445 (5) | 0.9432 (6) | 0.997 (1) |
| SD | 0.9217 | 0.8751 (5) | 0.8702 (6) | 0.8765 (4) | 0.8874 (2) | 0.8767 (3) | 0.9256 (1) |
| UJILD | 0.9999 | 0.9682 (6) | 0.998 (4) | 0.9997 (1.5) | 0.9997 (1.5) | 0.9861 (5) | 0.9996 (3) |
| Average | 0.9227 | 0.8549 (5) | 0.8700 (3) | 0.8978 (2) | 0.8633 (4) | 0.8493 (6) | 0.9263 (1) |
| Dataset | RAW | IWOA-FS | HDMT-SSA | HDA-ADP | MOPSO-NC | SWDE-FS | EFEM |
|---|---|---|---|---|---|---|---|
| ED | 0.0767 | 0.2203 (5) | 0.21 (1) | 0.2232 (6) | 0.2126 (3) | 0.2121 (2) | 0.2168 (4) |
| WECP | 6296.08 | 10,114.9622 (5) | 9952.0156 (2) | 10,048.0822 (3) | 10,200.03 (6) | 10,108.1293 (4) | 6450.2137 (1) |
| VD | 170,092.4411 | 272,350.3021 (6) | 198,663.4525 (3) | 197,137.7314 (2) | 218,482.4 (4) | 266,203.1996 (5) | 180,742.3109 (1) |
| PDJI | 0.4715 | 0.4807 (1) | 0.8266 (6) | 0.4861 (2) | 0.5875 (5) | 0.5855 (4) | 0.4877 (3) |
| CVPUD | 519.0804 | 1409.5336 (5) | 1460.5124 (6) | 617.5007 (1) | 1324.887 (3) | 1339.1683 (4) | 670.6632 (2) |
| DFT | 23.5548 | 26.3888 (4) | 26.5613 (5) | 26.7005 (6) | 26.0769 (2) | 26.3060 (3) | 18.7478 (1) |
| SLD | 2.0597 | 5.0445 (2) | 5.1216 (3) | 5.2383 (4) | 5.264 (5) | 5.3258 (6) | 1.2119 (1) |
| SD | 9.5736 | 12.0863 (6) | 12.0021 (3) | 12.0231 (5) | 11.9754 (2) | 12.0116 (4) | 9.3235 (1) |
| UJILD | 4575.9941 | 24,735.8124 (4) | 24,680.8435 (3) | 9414.0578 (6) | 28,421.17 (5) | 65,147.41 (1) | 10,312.0809 (2) |
| Average | 20,168.8147 | 34,294.9812 (5) | 26,089.0606 (3) | 24,140.2270 (2) | 28,719.1782 (4) | 38,093.5942 (6) | 22,022.8063 (1) |
| Dataset | RAW | IWOA-FS | HDMT-SSA | HDA-ADP | MOPSO-NC | SWDE-FS | EFEM |
|---|---|---|---|---|---|---|---|
| ED | 0.0169 | 0.0312 (1) | 0.0313 (2) | 0.0337 (6) | 0.0315 (3) | 0.0317 (4) | 0.0325 (5) |
| WECP | 0.0009 | 0.0020 (3.5) | 0.0099 (6) | 0.0020 (3.5) | 0.0020 (3.5) | 0.0020 (3.5) | 0.0009 (1) |
| VD | 0.0001 | 0.0001 (3.5) | 0.0001 (3.5) | 0.0001 (3.5) | 0.0001 (3.5) | 0.0001 (3.5) | 0.0001 (3.5) |
| PDJI | 3.04978 × 1012 | 3.247 × 1012 (3) | 4.052 × 1012 (4) | 2.256 × 1012 (1) | 5.13 × 1012 (6) | 4.395 × 1012 (5) | 2.673 × 1012 (2) |
| CVPUD | 0.0316 | 0.2366 (6) | 0.2347 (5) | 0.0603 (1) | 0.2198 (3) | 0.2241 (4) | 0.0654 (2) |
| DFT | 3.9544 | 4.98 (4) | 5.1666 (6) | 4.9009 (3) | 4.9874 (5) | 4.6674 (2) | 2.4589 (1) |
| SLD | 0.0375 | 0.0902 (2) | 0.0903 (4) | 0.0914 (5) | 0.0902 (2) | 0.093 (6) | 0.0209 (1) |
| SD | 6.4343 | 8.7032 (4) | 8.7154 (5) | 8.4673 (2) | 8.4825 (3) | 8.7652 (6) | 5.5687 (1) |
| UJILD | 0 | 0 (3.5) | 0 (3.5) | 0 (3.5) | 0 (3.5) | 0 (3.5) | 0 (3.5) |
| Average |  | 1.9211 (3) | 2.0334 (4) | 1.7568 (2) | 2.1048 (6) | 2.0198 (5) | 1.2023 (1) |
| Paired Variables | Median ± Standard Deviation | z | df | p | Cohen’s d | ||
|---|---|---|---|---|---|---|---|
| Paired 1 | Paired 2 | Paired Difference (Paired 1 − Paired 2) | |||||
| EFEM vs. IWOA-FS | 0.932 ± 0.083 | 0.875 ± 0.127 | 0.048 ± 0.08 | 2.31 | 8 | 0.01953 | 0.668 |
| EFEM vs. HDMT-SSA | 0.932 ± 0.083 | 0.87 ± 0.122 | 0.023 ± 0.081 | 2.521 | 8 | 0.01172 | 0.543 |
| EFEM vs. HDA-ADP | 0.932 ± 0.083 | 0.931 ± 0.104 | 0.004 ± 0.047 | 1.96 | 8 | 0.04995 | 0.317 |
| EFEM vs. MOPSO-NC | 0.932 ± 0.083 | 0.887 ± 0.13 | 0.052 ± 0.064 | 2.38 | 8 | 0.01729 | 0.579 |
| EFEM vs. SWDE-FS | 0.932 ± 0.083 | 0.877 ± 0.138 | 0.054 ± 0.069 | 2.521 | 8 | 0.01172 | 0.681 |
| Dataset | IWOA-FS | HDMT-SSA | HDA-ADP | MOPSO-NC | SWDE-FS | EFEM |
|---|---|---|---|---|---|---|
| DFT | 1149.98 (4) | 1019.76 (3) | 1287.56 (5) | 1319.82 (6) | 890.34 (1) | 1017.76 (2) |
| VD | 4.98 (4) | 3.19 (2) | 5.76 (5) | 7.1 (6) | 4.56 (3) | 2.27 (1) |
| PDJI | 2567.83 (3) | 2141.09 (1) | 8987.48 (6) | 3000.76 (5) | 2382.37 (2) | 2863.43 (4) |
| CVPUD | 57.21 (6) | 14.94 (2) | 20.12 (3) | 21.09 (4) | 36.37 (5) | 13.81 (1) |
| Average time | 945 (3) | 794.745 (1) | 2575.23 (6) | 1087.1925 (5) | 828.41 (2) | 974.318 (4) |
| Dataset | DSA | DSA–mRMR | RFA | AE | SSL | EFEM |
|---|---|---|---|---|---|---|
| ED | 0.9521 (6) | 0.9816 (4) | 0.9698 (5) | 0.9951 (2) | 0.9901 (3) | 0.9998 (1) |
| WECP | 0.9327 (6) | 0.9656 (5) | 0.9671 (4) | 0.9785 (3) | 0.9904 (2) | 0.9972 (1) |
| VD | 0.8115 (6) | 0.8471 (4) | 0.8237 (5) | 0.8527 (3) | 0.859 (2) | 0.863 (1) |
| PDJI | 0.6911 (6) | 0.7518 (4) | 0.7358 (5) | 0.7602 (3) | 0.7675 (1) | 0.7658 (2) |
| CVPUD | 0.8845 (6) | 0.9109 (4) | 0.9069 (5) | 0.9189 (3) | 0.9287 (2) | 0.9321 (1) |
| DFT | 0.7961 (6) | 0.8382 (4) | 0.8195 (5) | 0.8414 (3) | 0.8537 (2) | 0.8569 (1) |
| SLD | 0.9566 (6) | 0.9821 (4) | 0.9752 (5) | 0.9863 (3) | 0.9891 (2) | 0.997 (1) |
| SD | 0.8764 (6) | 0.9024 (4) | 0.8956 (5) | 0.9145 (3) | 0.9214 (2) | 0.9256 (1) |
| UJILD | 0.9435 (6) | 0.9877 (4) | 0.9702 (5) | 0.9927 (3) | 0.9986 (2) | 0.9996 (1) |
| Average | 0.8716 (6) | 0.9075 (4) | 0.8960 (5) | 0.9156 (3) | 0.9221 (2) | 0.9263 (1) |
| Threshold | 0.6 | 0.6 | 0.7 | 0.7 | 0.8 | 0.8 | 0.7 |
|---|---|---|---|---|---|---|---|
| Dataset | BLR | RF | BLR | RF | BLR | RF | CatBoost |
| ED | 0.9107 | 0.9395 | 0.9209 | 0.9439 | 0.9054 | 0.9239 | 0.9618 |
| WECP | 0.8876 | 0.9401 | 0.9165 | 0.9488 | 0.8947 | 0.9498 | 0.9521 |
| VD | 0.779 | 0.8001 | 0.7809 | 0.8131 | 0.7612 | 0.7823 | 0.8176 |
| PDJI | 0.6478 | 0.6976 | 0.6598 | 0.7002 | 0.6295 | 0.6809 | 0.7189 |
| Dataset | IWOA-FS | HDMT-SSA | HDA-ADP | MOPSO-NC | SWDE-FS | EFEM |
|---|---|---|---|---|---|---|
| WECP | 0.9785 (4) | 0.9859 (2) | 0.9841 (3) | 0.9527 (6) | 0.9672 (5) | 0.9933 (1) |
| VD | 0.5892 (6) | 0.7988 (2) | 0.7230 (3) | 0.6558 (4) | 0.6014 (5) | 0.8006 (1) |
| DFT | 0.7249 (3) | 0.6325 (5) | 0.5891 (6) | 0.7513 (2) | 0.6987 (4) | 0.7764 (1) |
| CVPUD | 0.6419 (5) | 0.5537 (6) | 0.7891 (2) | 0.7258 (4) | 0.7322 (3) | 0.7986 (1) |
| Average | 0.7336 (6) | 0.7427 (5) | 0.7713 (4) | 0.7714 (3) | 0.7499 (2) | 0.8422 (1) |
| Dataset | IWOA-FS | HDMT-SSA | HDA-ADP | MOPSO-NC | SWDE-FS | EFEM |
|---|---|---|---|---|---|---|
| WECP | 20,172.8539 (3) | 23,821.3829 (5) | 26,865.1093 (6) | 18,920.9453 (2) | 23,018.8279 (4) | 10,045.2812 (1) |
| VD | 386,459.2094 (6) | 240,875.0185 (2) | 246,910.7593 (3) | 309,285.7915 (4) | 372,391.8407 (5) | 220,817 (1) |
| DFT | 40.6071 (4) | 46.6051 (3) | 58.2379 (6) | 38.0857 (2) | 45.9170 (5) | 29.6055 (1) |
| CVPUD | 2891.1892 (5) | 3403.3978 (6) | 967.9319 (1) | 2098.3655 (3) | 2181.5067 (4) | 1187.0132 (2) |
| Average | 102,390.9649 (6) | 67,036.6011 (2) | 68,700.5096 (3) | 82,585.797 (4) | 99,409.5231 (5) | 58,019.725 (1) |
| Dataset | IWOA-FS | HDMT-SSA | HDA-ADP | MOPSO-NC | SWDE-FS | EFEM |
|---|---|---|---|---|---|---|
| WECP | 0.0040 (2) | 0.0258 (6) | 0.0090 (3) | 0.0180 (5) | 0.0120 (4) | 0.002 (1) |
| VD | 0.0001 (3.5) | 0.0001 (3.5) | 0.0001 (3.5) | 0.0001 (3.5) | 0.0001 (3.5) | 0.0001 (3.5) |
| DFT | 12.4891 (5) | 15.8349 (6) | 12.1374 (3) | 12.5109 (4) | 9.0215 (2) | 5.436 (1) |
| CVPUD | 0.8961 (6) | 0.8752 (5) | 0.1970 (1) | 0.5898 (3) | 0.6544 (4) | 0.1588 (2) |
| Average | 3.3473 (5) | 4.1838 (6) | 3.0858 (3) | 3.279475 (4) | 2.422 (2) | 1.3992 (1) |
| Sales_Region | Item_Code | Jan 2019 | Feb 2019 | Mar 2019 |
|---|---|---|---|---|
| 101 | 20002 | 82.55 | 67.367 | 55.05 |
| 101 | 20003 | 181.40 | 152.09 | 140.16 |
| 101 | 20006 | 71.06 | 50.05 | 66.04 |
| … | … | … | … | … |
| 105 | 22066 | 1273.51 | 828.63 | 549.08 |
| 105 | 22072 | 380.45 | 225.94 | 200.52 |
| Model | Month | Week | Day | |||
|---|---|---|---|---|---|---|
| Training Set | Test Set | Training Set | Test Set | Training Set | Test Set | |
| RF | 1172 | 729 | 429 | 291 | 159 | 131 |
| BLR | 1139 | 662 | 386 | 278 | 138 | 120 |
| SVR | 1119 | 569 | 447 | 246 | 134 | 122 |
| CatBoost | 1228 | 652 | 427 | 284 | 148 | 121 |
| Fusion | 295 | 91 | 44 | |||
| Model | Month | Week | Day | |||
|---|---|---|---|---|---|---|
| Training Set | Test Set | Training Set | Test Set | Training Set | Test Set | |
| RF | 1460 | 1917 | 753 | 1206 | 451 | 699 |
| BLR | 1740 | 2328 | 709 | 1127 | 346 | 445 |
| SVR | 316 | 458 | 215 | 352 | 324 | 408 |
| CatBoost | 1772 | 2370 | 789 | 1261 | 357 | 457 |
| Fusion | 335 | 247 | 203 | |||
| Item_Code | Forecast Results |
|---|---|
| 20001 | 0 |
| 20002 | 32.035 |
| 20003 | 354.009 |
| … | … |
| 22081 | 682.225 |
| 22084 | 2.857 |
| = 505.143 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Gao, F.; Abisado, M. Enhanced Feature Engineering Symmetry Model Based on Novel Dolphin Swarm Algorithm. Symmetry 2025, 17, 1736. https://doi.org/10.3390/sym17101736

