Binary or Integer Chromosome: Which Is the Best Structure for Supervised Machine Learning Using Genetic Algorithms?
Abstract
1. Introduction
2. GAs for Classification Task
2.1. Computational Evolutionary Environment (CEE)
- True Positive (tp): The predicted class matches C, and the case belongs to C;
- True Negative (tn): The predicted class does not match C, and the case does not belong to C;
- False Positive (fp): The predicted class matches C, but the case does not belong to C;
- False Negative (fn): The predicted class does not match C, but the case belongs to C.
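The four outcomes above can be counted directly from paired labels. The following is a minimal sketch, not the authors' implementation: the function name `confusion_counts`, the argument names, and the toy labels are illustrative assumptions.

```python
def confusion_counts(y_true, y_pred, target):
    """Count tp, tn, fp, fn for one target class C (here: `target`)."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == target and actual == target:
            tp += 1  # predicted C, case belongs to C
        elif predicted != target and actual != target:
            tn += 1  # predicted not-C, case does not belong to C
        elif predicted == target:
            fp += 1  # predicted C, case does not belong to C
        else:
            fn += 1  # predicted not-C, case belongs to C
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn}


# Hypothetical toy labels, only to exercise the function.
y_true = ["M", "B", "M", "B", "B"]
y_pred = ["M", "M", "B", "B", "B"]
counts = confusion_counts(y_true, y_pred, target="M")
print(counts)
```

Because the counts are defined relative to a single class C, the same predictions give different counts when `target` is switched to the other class.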
2.2. Nonlinear Computational Evolutionary Environment (NLCEE)
3. Datasets
4. Binary Nonlinear Computational Evolutionary Environment (BIN-NLCEE)
4.1. Introduction
4.2. Individual Representation
4.3. Genetic Operators and Parameters
- Population size: 50 and 100;
- Generations: 50 and 100;
- Mutation rate (%): 40 and 50;
- Selection method: Roulette and Stochastic tournament;
- Weight field: 6, 7, and 8;
- Number of bits: 5, 6, 7, and 8.
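If every listed value were combined with every other (a full factorial design, which is an assumption here and not something the list itself states), the settings above would yield 2 × 2 × 2 × 2 × 3 × 4 = 192 configurations. A minimal sketch of enumerating them, with hypothetical dictionary keys:

```python
from itertools import product

# Values taken from the list above; the key names are illustrative.
grid = {
    "population_size": [50, 100],
    "generations": [50, 100],
    "mutation_rate_pct": [40, 50],
    "selection": ["roulette", "stochastic_tournament"],
    "weight_field": [6, 7, 8],
    "number_of_bits": [5, 6, 7, 8],
}

# One dict per configuration, assuming a full factorial combination.
configs = [dict(zip(grid, values)) for values in product(*grid.values())]

print(len(configs))  # 192
print(configs[0])
```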
5. Results
6. Final Remarks
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Dataset | Attributes | Samples | Classes |
---|---|---|---|
Breast-W | 32 | 569 | Malignant (M), Benign (B) |
Pima-Indians-diabetes | 9 | 768 | tested_negative, tested_positive |
Parkinson’s | 23 | 195 | Healthy, PD (Parkinson’s Disease) |
Mammography | 5 | 961 | Benign, Malignant |
COVID-19 | 7 | 3158 | tested_negative, tested_positive |

Class distribution, Breast-W:

Class | # of Samples | % of Samples |
---|---|---|
M | 212 | 37.25% |
B | 357 | 62.75% |
Total | 569 | 100% |

Class distribution, Pima-Indians-diabetes:

Class | # of Samples | % of Samples |
---|---|---|
tested_negative | 500 | 65.11% |
tested_positive | 268 | 34.89% |
Total | 768 | 100% |

Class distribution, Parkinson’s:

Class | # of Samples | % of Samples |
---|---|---|
healthy | 48 | 24.62% |
PD | 147 | 75.38% |
Total | 195 | 100% |

Class distribution, Mammography:

Class | # of Samples | % of Samples |
---|---|---|
benign | 516 | 53.69% |
malignant | 445 | 46.31% |
Total | 961 | 100% |

Class distribution, COVID-19:

Class | # of Samples | % of Samples |
---|---|---|
tested_negative | 2229 | 70.58% |
tested_positive | 929 | 29.41% |
Total | 3158 | 100% |

Dataset (class) | BIN-NLCEE | NLCEE | CEE |
---|---|---|---|
breast (B) | 98.45 ± 0.28 | 98.09 ± 0.33 | 97.46 ± 0.33 |
diabetes (tested_negative) | 83.34 ± 0.65 | 82.00 ± 0.65 | 80.92 ± 0.66 |
Parkinson’s (healthy) | 99.27 ± 0.33 | 98.53 ± 0.42 | 97.92 ± 0.51 |
mammography (benign) | 85.86 ± 0.65 | 85.64 ± 0.64 | 85.23 ± 0.63 |
COVID-19 (tested_negative) | 64.56 ± 0.39 | 64.22 ± 0.41 | 63.52 ± 0.39 |

Dataset (class) | BIN-NLCEE | NLCEE | CEE |
---|---|---|---|
breast (M) | 97.89 ± 0.35 | 97.18 ± 0.37 | 96.45 ± 0.42 |
diabetes (tested_positive) | 82.34 ± 0.64 | 80.93 ± 0.66 | 80.21 ± 0.72 |
Parkinson’s (PD) | 97.34 ± 0.55 | 95.78 ± 1.12 | 96.13 ± 0.72 |
mammography (malignant) | 84.23 ± 0.59 | 84.28 ± 0.64 | 83.78 ± 0.62 |
COVID-19 (tested_positive) | 65.06 ± 0.39 | 63.62 ± 0.34 | 62.91 ± 0.37 |

Dataset | J48 | IBK | Naive Bayes | SVM |
---|---|---|---|---|
breast | 92.94 ± 0.17 | 95.07 ± 0.08 | 92.07 ± 0.06 | 97.02 ± 0.06 |
diabetes | 70.66 ± 0.25 | 66.65 ± 0.16 | 72.11 ± 0.11 | 71.38 ± 0.10 |
Parkinson’s | 79.70 ± 0.61 | 95.56 ± 0.22 | 76.80 ± 0.21 | 74.94 ± 0.21 |
mammography | 81.57 ± 0.11 | 74.68 ± 0.15 | 77.80 ± 0.10 | 79.65 ± 0.09 |
COVID-19 | 56.39 ± 0.13 | 54.89 ± 0.11 | 57.69 ± 0.09 | 50.15 ± 0.07 |

Dataset | J48 (S) | J48 (B) | IBK (S) | IBK (B) | NB (S) | NB (B) | SVM (S) | SVM (B) |
---|---|---|---|---|---|---|---|---|
breast | 4.43 | 5.47 | 2.39 | 3.25 | 5.41 | 6.23 | 0.46 | 1.28 |
diabetes | 10.79 | 12.57 | 14.89 | 16.49 | 9.48 | 10.98 | 10.22 | 11.70 |
Parkinson’s | 16.48 | 18.80 | 1.01 | 2.55 | 19.78 | 21.30 | 21.64 | 23.16 |
mammography | 1.96 | 3.36 | 8.81 | 10.29 | 5.74 | 7.12 | 3.90 | 5.26 |
COVID-19 | 8.15 | 9.19 | 9.67 | 10.67 | 6.89 | 7.85 | 14.45 | 15.37 |
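
The S and B columns are not defined in this excerpt. Their values are, however, consistent with reading S as the smallest and B as the biggest gap between two "mean ± error" intervals: the BIN-NLCEE result for the second class of each dataset (M, tested_positive, PD, malignant) and the corresponding J48, IBK, Naive Bayes, or SVM result. Under that assumed reading, a minimal sketch of the arithmetic, with a hypothetical helper name `gap_bounds`:

```python
def gap_bounds(ga, baseline):
    """Smallest and biggest difference between two (mean, error) intervals.

    Assumed reading of the S and B columns; not stated in the source.
    """
    ga_mean, ga_err = ga
    base_mean, base_err = baseline
    # Smallest gap: lower end of the GA interval minus upper end of the baseline.
    smallest = (ga_mean - ga_err) - (base_mean + base_err)
    # Biggest gap: upper end of the GA interval minus lower end of the baseline.
    biggest = (ga_mean + ga_err) - (base_mean - base_err)
    return round(smallest, 2), round(biggest, 2)


# Values copied from the tables above: BIN-NLCEE on breast (M) versus J48 on breast.
bin_nlcee_breast_m = (97.89, 0.35)
j48_breast = (92.94, 0.17)
print(gap_bounds(bin_nlcee_breast_m, j48_breast))  # (4.43, 5.47), matching the breast row for J48
```

The same calculation gives 10.79 and 12.57 for diabetes against J48 (82.34 ± 0.64 versus 70.66 ± 0.25), which also agrees with the table.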
Dataset | Rule |
---|---|
Breast-w | IF ((A1 < 0.0625 OR A1 ≥ 0.4219) AND (A2 ≥ 0.2657) AND (A3 ≥ 0.1719) AND (A7 ≥ 0.0625) AND (A8 ≥ 0.1563) AND (A18 < 0.5938) AND (A20 < 0.6875) AND (A21 < 0.7188) AND (A30 < 0.7813)) THEN M |
Breast-w | IF ((A15 < 0.8438) AND (A16 < 0.3125) AND (A17 < 0.9375) AND (A20 ≥ 0.1875 OR A20 < 0.1719) AND (A21 < 0.1875) AND (A23 < 0.5938) AND (A26 < 0.5157) AND (A29 < 0.4688) AND (A30 < 0.75) AND (A31 < 0.7188 OR A31 ≥ 0.8282)) THEN B |
Diabetes | IF ((V1 < 0.25 OR V1 ≥ 0.5) AND (V2 < 0.8125) AND (V3 ≥ 0.0313) AND (V4 < 0.4219) AND (V8 < 0.25)) THEN tested_negative |
Diabetes | IF ((V1 ≥ 0.0625) AND (V5 < 0.2969 OR V5 ≥ 0.625) AND (V6 < 0.625) AND (V8 ≥ 0.1094)) THEN tested_positive |
Parkinson’s | IF ((V1 < 0.5625) AND (V2 ≥ 0.375) AND (V3 < 0.375 OR V3 ≥ 0.875) AND (V16 ≥ 0.3125) AND (V20 < 0.4063)) THEN healthy |
Parkinson’s | IF ((V1 < 0.6875 OR V1 ≥ 0.7969) AND (V18 ≥ 0.4375)) THEN PD |
Mammography | IF ((V2 < 0.7032 OR V2 ≥ 0.8438) AND (V3 < 0.9063) AND (V5 ≥ 0.1875)) THEN benign |
Mammography | IF ((V3 ≥ 0.8282) AND (V4 < 0.0782 OR V4 ≥ 0.4844) AND (V5 < 0.9219)) THEN malignant |
COVID-19 | IF (A3 ≥ 0.25 AND A5 ≥ 0.5313) THEN 1 |
COVID-19 | IF (A3 < 0.3594 AND A5 < 0.7813 AND A6 < 0.4375) THEN 2 |
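
Each rule in the table is a conjunction of per-attribute conditions, where a single condition may be a disjunction of two ranges (for example, `V2 < 0.7032 OR V2 ≥ 0.8438`). As a minimal sketch, the two Mammography rules can be written as Python predicates. The thresholds are copied from the table; the assumption that attribute values are scaled to [0, 1], the dictionary-based sample format, and the sample values themselves are illustrative and not taken from the source.

```python
def is_benign(v):
    """Mammography rule with consequent 'benign' (thresholds from the table)."""
    return (
        (v["V2"] < 0.7032 or v["V2"] >= 0.8438)
        and v["V3"] < 0.9063
        and v["V5"] >= 0.1875
    )


def is_malignant(v):
    """Mammography rule with consequent 'malignant' (thresholds from the table)."""
    return (
        v["V3"] >= 0.8282
        and (v["V4"] < 0.0782 or v["V4"] >= 0.4844)
        and v["V5"] < 0.9219
    )


# Hypothetical sample, assumed to be normalized to [0, 1].
sample = {"V2": 0.50, "V3": 0.40, "V4": 0.20, "V5": 0.30}
print(is_benign(sample), is_malignant(sample))  # True False
```

The two rules are evaluated independently, so a sample can satisfy both or neither; how such cases are resolved is not described in this excerpt.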
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Alves, A.H.d.S.; Neto, G.A.C.; Gomes, M.d.S.; Santos, L.D.; Bertarini, P.L.L.; do Amaral, L.R. Binary or Integer Chromosome: Which Is the Best Structure for Supervised Machine Learning Using Genetic Algorithms? Appl. Sci. 2025, 15, 2608. https://doi.org/10.3390/app15052608