Hybrid Gene Selection Algorithm for Cancer Classification Using Nuclear Reaction Optimization (NRO)
Abstract
1. Introduction
2. Materials and Methods
2.1. Dataset and Preprocessing
2.2. Filter-Based Dimensionality Reduction
2.2.1. Information Gain
2.2.2. F-Score
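- $\bar{x}_i^{(+)}$ be the mean of the $i$-th feature in the positive class;
- $\bar{x}_i^{(-)}$ be the mean of the $i$-th feature in the negative class;
- $\bar{x}_i$ be the global mean of the $i$-th feature across all samples;
- $n_+$ and $n_-$ be the number of samples in the positive and negative classes, respectively.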
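As a minimal sketch, assuming the standard two-class F-score of Chen and Lin, the quantities defined above combine as the squared distances of the two class means from the global mean, divided by the sum of the two within-class sample variances. The function name `f_score` and the toy expression values are illustrative assumptions, not taken from the study.

```python
from statistics import mean


def f_score(pos, neg):
    """Two-class F-score of one gene, given its expression values per class."""
    n_pos, n_neg = len(pos), len(neg)
    mean_pos, mean_neg = mean(pos), mean(neg)
    mean_all = mean(list(pos) + list(neg))
    # Numerator: squared distance of each class mean from the global mean.
    between = (mean_pos - mean_all) ** 2 + (mean_neg - mean_all) ** 2
    # Denominator: sum of the unbiased sample variances of the two classes.
    var_pos = sum((x - mean_pos) ** 2 for x in pos) / (n_pos - 1)
    var_neg = sum((x - mean_neg) ** 2 for x in neg) / (n_neg - 1)
    within = var_pos + var_neg
    if within == 0:
        # Assumption: a gene with no within-class spread is scored as
        # perfectly discriminative only if the class means differ.
        return float("inf") if between > 0 else 0.0
    return between / within


# Toy example (made-up values): the first gene separates the classes,
# the second has identical class means.
informative = f_score([5.0, 5.2, 4.8], [1.0, 1.1, 0.9])
noisy = f_score([1.0, 5.0, 3.0], [2.0, 4.0, 3.0])
```

Genes would then be ranked by this score and the top-scoring ones passed to the wrapper stage.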
2.2.3. ReliefF
2.2.4. Minimum Redundancy Maximal Relevancy (mRMR)
2.3. Applying the Nuclear Reaction Optimization (NRO) Algorithm
2.3.1. Nuclear Fission Phase
- $X_i^{Fi}$: the novel solution produced during fission;
- $X_{best}$: the optimal solution identified to date;
- $\mathrm{Gaussian}(X_{best}, \sigma)$: a random variable produced around $X_{best}$, with $\sigma$ determining the dispersion;
- $\sigma_1, \sigma_2$: parameters regulating the extent of exploration (see Equations (7) and (8) below);
- $randn$: a random variable introducing variability in the solution;
- $P_{ne}^{s}, P_{ne}^{e}$: mutation factors determining the scale of adjustments for subaltern and essential fission products, respectively;
- $Ne_i$: heated neutron, calculated as $(X_i + X_j)/2$, where $X_i$ and $X_j$ are two random solutions;
- $P_\beta$: the probability governing whether subaltern or essential fission products are produced.
- $g$: current generation number; the term $\log(g)/g$ guarantees a reduction in step sizes as iterations advance;
- $|X_i - X_{best}|$: the distance between the current solution and the best-known solution;
- $|X_r - X_{best}|$: the distance between a random solution and the best-known solution.
- $rand$: a random number uniformly distributed between 0 and 1;
- The integer rounding guarantees discrete adjustment levels for the mutation process.
2.3.2. Nuclear Fusion Phase
- $X_{r1,d}^{Fi}, X_{r2,d}^{Fi}$: components of two randomly selected fission solutions;
- $X_{i,d}^{Fi}$: current solution;
- $rand$: random value for diversity.
- $\alpha$: a scaling factor controlling the magnitude of jumps;
- $\mathrm{Levy}(\beta)$: heavy-tailed random step size, introducing both small and large adjustments;
- $\otimes$: indicates element-wise multiplication;
- $X_{best,d}^{Fi}$: best-known solution in the $d$-th dimension.
- $X_i^{Fu}$: refined solution after fusion;
- $X_{best}^{Ion}$: best-known solution guiding the search;
- $X_{r1}^{Ion}, X_{r2}^{Ion}$: ionized solutions selected for comparison;
- $rand$: random value for diversity.
Algorithm 1. Nuclear Reaction Optimization (NRO) Algorithm for Gene Selection
Require: Dataset D, population size N = 500, max generations T = 30, early stopping patience P = 5
Ensure: Optimized subset of genes
▷ Preprocessing Phase
1: Handle missing values (mean imputation)
2: Normalize features using Z-score
3: Encode categorical labels
▷ Filter Evaluation Phase
4: Apply filter methods: F-score, Information Gain, ReliefF, and mRMR
5: for each filter method do
6:     for each subset size s ∈ {50, 100, …, 500} do
7:         Select top s genes using the filter
8:         Evaluate subset using SVM classifier with LOOCV
9:     end for
10: end for
11: Select best filter method with its best-performing gene subset
▷ NRO Optimization Phase
12: for each subset size k ∈ {2, …, 25} do
13:     Initialize population of binary vectors of size k over selected genes
14:     Set bounds [0, 1], initialize global best solution
15:     Compute initial fitness using SVM + LOOCV
16:     Initialize no_improve ← 0
▷ Fission Phase: Exploration via perturbation
17:     for g = 1 to T do
18:         for each solution do
19:             Generate new solutions as per Equation (6)
20:             Adjust step sizes using Equations (7) and (8)
21:             Apply mutation using Equations (9) and (10)
22:         end for
▷ Fusion Phase: Exploitation with embedded crossover
23:         for each solution do
24:             Adjust using ionization (Equation (11))
25:             if solutions are similar then
26:                 Apply Lévy flight adjustment (Equation (12))
27:             end if
28:             Fuse solutions (Equation (13))
29:             if solutions are still similar then
30:                 Apply Lévy flight adjustment (Equation (14))
31:             end if
32:         end for
▷ Fitness Evaluation and Best Solution Update
33:         for each solution do
34:             Compute LOOCV classification accuracy
35:             if accuracy of the solution > accuracy of the global best solution then
36:                 Update global best solution ← solution, reset no_improve ← 0
37:             else
38:                 Increment no_improve ← no_improve + 1
39:             end if
40:         end for
41:         if no_improve ≥ P then
42:             break
43:         end if
44:     end for
45: end for
▷ Final Output
46: return best gene subset
2.4. Classification and Fitness Evaluation
3. Results and Discussion
3.1. Dimensionality Reduction
3.2. F-Score-Based Nuclear Reaction Optimization (F-NRO) Algorithm
3.3. Comparative Analysis
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Fitzgerald, R.C.; Antoniou, A.C.; Fruk, L.; Rosenfeld, N. The future of early cancer detection. Nat. Med. 2022, 28, 666–677.
2. Krzyszczyk, P.; Acevedo, A.; Davidoff, E.J.; Timmins, L.M.; Marrero-Berrios, I.; Patel, M.; White, C.; Lowe, C.; Sherba, J.J.; Hartmanshenn, C.; et al. The growing role of precision and personalized medicine for cancer treatment. Technology 2018, 6, 79–100.
3. Lu, Y.; Han, J. Cancer classification using gene expression data. Inf. Syst. 2003, 28, 243–268.
4. Alkamli, S.S.; Alshamlan, H.M. Performance Evaluation of Hybrid Bio-Inspired and Deep Learning Algorithms in Gene Selection and Cancer Classification. IEEE Access 2025, 13, 59977–59990.
5. Wei, Z.; Huang, C.; Wang, X.; Han, T.; Li, Y. Nuclear Reaction Optimization: A Novel and Powerful Physics-Based Algorithm for Global Optimization. IEEE Access 2019, 7, 66084–66109.
6. Alkamli, S.; Alshamlan, H. Evaluating the Nuclear Reaction Optimization (NRO) Algorithm for Gene Selection in Cancer Classification. Diagnostics 2025, 15, 927.
7. Khan, J.; Wei, J.S.; Ringnér, M.; Saal, L.H.; Ladanyi, M.; Westermann, F.; Berthold, F.; Schwab, M.; Antonescu, C.R.; Peterson, C.; et al. Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nat. Med. 2001, 7, 673–679.
8. Alizadeh, A.A.; Eisen, M.B.; Davis, R.E.; Ma, C.; Lossos, I.S.; Rosenwald, A.; Boldrick, J.C.; Sabet, H.; Tran, T.; Yu, X.; et al. Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 2000, 403, 503–511.
9. Armstrong, S.A.; Staunton, J.E.; Silverman, L.B.; Pieters, R.; den Boer, M.L.; Minden, M.D.; Sallan, S.E.; Lander, E.S.; Golub, T.R.; Korsmeyer, S.J. MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia. Nat. Genet. 2002, 30, 41–47.
10. Beer, D.G.; Kardia, S.L.R.; Huang, C.-C.; Giordano, T.J.; Levin, A.M.; Misek, D.E.; Lin, L.; Chen, G.; Gharib, T.G.; Thomas, D.G.; et al. Gene-expression profiles predict survival of patients with lung adenocarcinoma. Nat. Med. 2002, 8, 816–824.
11. Alon, U.; Barkai, N.; Notterman, D.A.; Gish, K.; Ybarra, S.; Mack, D.; Levine, A.J. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc. Natl. Acad. Sci. USA 1999, 96, 6745–6750.
12. Golub, T.R.; Slonim, D.K.; Tamayo, P.; Huard, C.; Gaasenbeek, M.; Mesirov, J.P.; Coller, H.; Loh, M.L.; Downing, J.R.; Caligiuri, M.A.; et al. Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring. Science 1999, 286, 531–537.
13. Han, J.; Kamber, M.; Pei, J. 8-Classification: Basic Concepts. In Data Mining, 3rd ed.; Han, J., Kamber, M., Pei, J., Eds.; The Morgan Kaufmann Series in Data Management Systems; Morgan Kaufmann: Boston, MA, USA, 2012; pp. 327–391. ISBN 978-0-12-381479-1.
14. Chen, Y.-W.; Lin, C.-J. Combining SVMs with Various Feature Selection Strategies. In Feature Extraction: Foundations and Applications; Guyon, I., Nikravesh, M., Gunn, S., Zadeh, L.A., Eds.; Springer: Berlin/Heidelberg, Germany, 2006; pp. 315–324. ISBN 978-3-540-35488-8.
15. Liu, H.; Yu, L. Toward integrating feature selection algorithms for classification and clustering. IEEE Trans. Knowl. Data Eng. 2005, 17, 491–502.
16. Kononenko, I. Estimating attributes: Analysis and extensions of RELIEF. In Machine Learning, Proceedings of the ECML-94, Catania, Italy, 6–8 April 1994; Bergadano, F., De Raedt, L., Eds.; Springer: Berlin/Heidelberg, Germany, 1994; pp. 171–182.
17. Peng, H.; Long, F.; Ding, C. Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 1226–1238.
18. Zhuoran, Z.; Changqiang, H.; Hanqiao, H.; Shangqin, T.; Kangsheng, D. An optimization method: Hummingbirds optimization algorithm. J. Syst. Eng. Electron. 2018, 29, 386–404.
19. AlShamlan, H.; AlMazrua, H. Enhancing Cancer Classification through a Hybrid Bio-Inspired Evolutionary Algorithm for Biomarker Gene Selection. Comput. Mater. Contin. 2024, 79, 675–694.
20. Nssibi, M.; Manita, G.; Chhabra, A.; Mirjalili, S.; Korbaa, O. Gene selection for high dimensional biological datasets using hybrid island binary artificial bee colony with chaos game optimization. Artif. Intell. Rev. 2024, 57, 51.
21. Lumumba, V.; Sang, D.; Mpaine, M.; Makena, N.; Musyimi, D. Kavita. Comparative Analysis of Cross-Validation Techniques: LOOCV, K-folds Cross-Validation, and Repeated K-folds Cross-Validation in Machine Learning Models. Am. J. Theor. Appl. Stat. 2024, 13, 127–137.
22. Li, Z.-Z.; Wang, F.-L.; Qin, F.; Yusoff, Y.B.; Zain, A.M. Feature Selection of Gene Expression Data Using a Modified Artificial Fish Swarm Algorithm With Population Variation. IEEE Access 2024, 12, 72688–72706.
23. Almugren, N.; Alshamlan, H.M. New Bio-Marker Gene Discovery Algorithms for Cancer Gene Expression Profile. IEEE Access 2019, 7, 136907–136913.
24. Parhi, P.; Bisoi, R.; Dash, P.K. Influential Gene Selection From High-Dimensional Genomic Data Using a Bio-Inspired Algorithm Wrapped Broad Learning System. IEEE Access 2022, 10, 49219–49232.
25. Alshamlan, H.; Badr, G.; Alohali, Y. mRMR-ABC: A Hybrid Gene Selection Algorithm for Cancer Classification Using Microarray Gene Expression Profiling. BioMed Res. Int. 2015, 2015, 604910.
26. Abdi, M.J.; Hosseini, S.M.; Rezghi, M. A novel weighted support vector machine based on particle swarm optimization for gene selection and tumor classification. Comput. Math. Methods Med. 2012, 2012, 320698.
27. El Akadi, A.; Amine, A.; El Ouardighi, A.; Aboutajdine, D. A New gene selection approach based on Minimum Redundancy-Maximum Relevance (MRMR) and Genetic Algorithm (GA). In Proceedings of the 2009 IEEE/ACS International Conference on Computer Systems and Applications, Rabat, Morocco, 10–13 May 2009; pp. 69–75.
28. Alshamlan, H.M. Co-ABC: Correlation artificial bee colony algorithm for biomarker gene discovery using gene expression profile. Saudi J. Biol. Sci. 2018, 25, 895–903.
29. Pirgazi, J.; Kallehbasti, M.M.P.; Sorkhi, A.G.; Kermani, A. An efficient hybrid filter-wrapper method based on improved Harris Hawks optimization for feature selection. BioImpacts 2024, 15, 30340.
30. Hameed, S.S.; Muhammad, F.F.; Hassan, R.; Saeed, F. Gene Selection and Classification in Microarray Datasets using a Hybrid Approach of PCC-BPSO/GA with Multi Classifiers. J. Comput. Sci. 2018, 14, 868–880.
31. Alkamli, S.; Alshamlan, H. GNR: Genetic-Embedded Nuclear Reaction Optimization with F-Score Filter for Gene Selection in Cancer Classification. Int. J. Mol. Sci. 2025, 26, 7587.
Microarray Dataset | Classes | Samples | Total Genes |
---|---|---|---|
Colon [11] | 2 | 62 | 2000 |
Leukemia1 [12] | 2 | 72 | 7129 |
Leukemia2 [9] | 3 | 72 | 7129 |
Lung [10] | 2 | 96 | 7129 |
Lymphoma [8] | 3 | 62 | 4026 |
SRBCT [7] | 4 | 83 | 2308 |
Dataset | Total Genes | Filtered Genes | Selected Genes | Best Accuracy | Average Accuracy | Worst Accuracy | Precision | Recall | F1-Score | CI (95%) |
---|---|---|---|---|---|---|---|---|---|---|
Colon | 2000 | 500 | 2 | 91.94% | 88.98% | 83.87% | 92.58% | 91.94% | 92.04% | [88.38%, 89.57%] |
Colon | 2000 | 500 | 3 | 91.94% | 90.59% | 87.10% | 92.83% | 91.94% | 91.97% | [90.20%, 90.98%] |
Colon | 2000 | 500 | 4 | 93.55% | 91.51% | 87.10% | 93.62% | 93.55% | 93.55% | [91.12%, 91.89%] |
Colon | 2000 | 500 | 5 | 95.16% | 92.53% | 87.10% | 95.26% | 95.16% | 95.18% | [92.04%, 93.01%] |
Colon | 2000 | 500 | 9 | 96.77% | 94.35% | 90.32% | 97.04% | 96.77% | 96.80% | [93.94%, 94.77%] |
Colon | 2000 | 500 | 22 | 98.39% | 94.84% | 91.94% | 98.43% | 98.39% | 98.38% | [94.38%, 95.30%] |
Dataset | Total Genes | Filtered Genes | Selected Genes | Best Accuracy | Average Accuracy | Worst Accuracy | Precision | Recall | F1-Score | CI (95%) |
---|---|---|---|---|---|---|---|---|---|---|
Leukemia1 | 7129 | 500 | 2 | 98.61% | 98.29% | 94.44% | 98.64% | 98.61% | 98.60% | [98.06%, 98.51%] |
Leukemia1 | 7129 | 500 | 3 | 100% | 98.94% | 95.83% | 100% | 100% | 100% | [98.71%, 99.16%] |
Leukemia1 | 7129 | 500 | 4 | 100% | 99.40% | 95.83% | 100% | 100% | 100% | [99.14%, 99.66%] |
Dataset | Total Genes | Filtered Genes | Selected Genes | Best Accuracy | Average Accuracy | Worst Accuracy | Precision | Recall | F1-Score | CI (95%) |
---|---|---|---|---|---|---|---|---|---|---|
Leukemia2 | 7129 | 500 | 2 | 95.83% | 92.36% | 84.72% | 95.93% | 95.83% | 95.84% | [91.89%, 92.83%] |
Leukemia2 | 7129 | 500 | 3 | 95.83% | 94.49% | 88.89% | 96.28% | 95.83% | 95.93% | [94.09%, 94.89%] |
Leukemia2 | 7129 | 500 | 4 | 98.61% | 96.06% | 91.67% | 98.75% | 98.61% | 98.64% | [95.59%, 96.54%] |
Leukemia2 | 7129 | 500 | 5 | 98.61% | 96.90% | 91.67% | 98.75% | 98.61% | 98.64% | [96.52%, 97.28%] |
Leukemia2 | 7129 | 500 | 6 | 98.61% | 97.45% | 93.06% | 98.75% | 98.61% | 98.64% | [97.18%, 97.73%] |
Leukemia2 | 7129 | 500 | 7 | 100% | 98.19% | 93.06% | 100% | 100% | 100% | [97.86%, 98.53%] |
Dataset | Total Genes | Filtered Genes | Selected Genes | Best Accuracy | Average Accuracy | Worst Accuracy | Precision | Recall | F1-Score | CI (95%) |
---|---|---|---|---|---|---|---|---|---|---|
Lung | 7129 | 500 | 2 | 100% | 100% | 98.96% | 100% | 100% | 100% | 100% (constant) |
Lung | 7129 | 500 | 3 | 100% | 100% | 100% | 100% | 100% | 100% | 100% (constant) |
Dataset | Total Genes | Filtered Genes | Selected Genes | Best Accuracy | Average Accuracy | Worst Accuracy | Precision | Recall | F1-Score | CI (95%) |
---|---|---|---|---|---|---|---|---|---|---|
Lymphoma | 4026 | 500 | 2 | 100% | 98.94% | 95.45% | 100% | 100% | 100% | [98.68%, 99.20%] |
Lymphoma | 4026 | 500 | 3 | 100% | 99.95% | 96.97% | 100% | 100% | 100% | [99.85%, 100.05%] |
Dataset | Total Genes | Filtered Genes | Selected Genes | Best Accuracy | Average Accuracy | Worst Accuracy | Precision | Recall | F1-Score | CI (95%) |
---|---|---|---|---|---|---|---|---|---|---|
SRBCT | 2308 | 500 | 2 | 86.75% | 81.29% | 73.49% | 87.42% | 86.75% | 86.84% | [80.66%, 81.91%] |
SRBCT | 2308 | 500 | 3 | 93.98% | 90.00% | 80.72% | 94.21% | 93.98% | 93.99% | [89.37%, 90.63%] |
SRBCT | 2308 | 500 | 4 | 96.39% | 93.21% | 83.13% | 96.77% | 96.39% | 96.39% | [92.64%, 93.79%] |
SRBCT | 2308 | 500 | 5 | 97.59% | 94.98% | 87.95% | 97.77% | 97.59% | 97.61% | [94.39%, 95.57%] |
SRBCT | 2308 | 500 | 6 | 98.80% | 95.86% | 89.16% | 98.90% | 98.80% | 98.81% | [95.40%, 96.33%] |
SRBCT | 2308 | 500 | 7 | 100% | 96.71% | 90.36% | 100% | 100% | 100% | [96.18%, 97.23%] |
Algorithm | Colon | Leukemia1 | Leukemia2 | Lung | Lymphoma | SRBCT |
---|---|---|---|---|---|---|
F-NRO | 98.39% (22) | 100% (3) | 100% (7) | 100% (2) | 100% (2) | 100% (7) |
F-FSAPV [22] | 96.9% (7) | 100% (3) | - | - | - | 100% (5) |
F-FF [23] | 94.3% (15) | 100% (5) | 97.8% (10) | 100% (2) | - | 100% (8) |
Relief-MBO [24] | 98.2% (3) | 99.45% (5) | - | - | 99.64% (3) | 99.87% (6) |
mRMR-ABC [25] | 96.77% (15) | 100% (14) | 100% (20) | 100% (8) | 100% (5) | 100% (20) |
mRMR-PSO [26] | 90.32% (10) | 100% (18) | - | - | - | - |
mRMR-GA [27] | - | - | - | 100% (15) | 95% (5) | - |
Co-ABC [28] | 96.77% (9) | 100% (3) | 100% (6) | 100% (2) | 100% (2) | 100% (4) |
HHO-GRASP [29] | 93.88% (7) | - | - | - | - | - |
PCC-GA [30] | 91.94% (29) | - | 100% (35) | 97.54% (42) | 100% (39) | 100% (20) |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Alkamli, S.; Alshamlan, H. Hybrid Gene Selection Algorithm for Cancer Classification Using Nuclear Reaction Optimization (NRO). Curr. Issues Mol. Biol. 2025, 47, 683. https://doi.org/10.3390/cimb47090683