A Preliminary Machine Learning Assessment of Oxidation-Reduction Potential and Classical Sperm Parameters as Predictors of Sperm DNA Fragmentation Index
Abstract
1. Introduction
2. Related Work
- Logistic Regression (LR) is an ML method that uses the logistic (sigmoid) function (Equation (1)) to predict the probability of an event occurring. In essence, it models the relationship between one or more independent variables and the dependent variable y.
- Support Vector Machine (SVM) is a technique that constructs a separating hyperplane in a high-dimensional space [29]. The best hyperplane is the one with the largest separation, called the margin, from the nearest data points. These nearest points are called support vectors, and training maximizes their distance to the hyperplane. A mathematical function, the kernel, determines the space in which this separation is computed. The most popular kernel functions are the linear, polynomial, radial basis function (RBF), and sigmoid kernels (Equation (2)).
- The naive Bayes classifier is based on Bayes’ theorem (Equation (3)), with the ‘naive’ assumption that features are conditionally independent given the value of the class variable. Several variants exist depending on the assumed data distribution: Gaussian Naive Bayes (Gaussian (normal) distribution), Multinomial Naive Bayes (multinomial distribution), Bernoulli Naive Bayes (Bernoulli distribution), and Complement Naive Bayes, a Multinomial Naive Bayes variant designed specifically for imbalanced datasets [30].
- Random forests (RF) combine many decision trees and aggregate their predictions into a single outcome [31]. Decision trees are non-parametric supervised learning methods structured like trees, consisting of a root node, branches, internal nodes, and leaf nodes. A tree is built top-down by repeatedly splitting the data at the optimal split point for each node. The splitting criterion can be expressed as entropy or the Gini index (Equation (4)).
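For orientation, the quantities named in the list above can be written in their standard textbook forms, as sketched below. The notation is an assumption made for illustration and may differ from the article's numbered equations: $\beta_i$ are regression coefficients, $\gamma$, $r$, and $d$ are kernel hyperparameters, and $p_k$ is the proportion of class $k$ among the $K$ classes at a tree node.

```latex
% Logistic (sigmoid) function used by logistic regression
\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad z = \beta_0 + \sum_{i=1}^{n} \beta_i x_i

% Common SVM kernel functions
K_{\text{linear}}(x, x') = x^{\top} x' \qquad
K_{\text{poly}}(x, x') = \left(\gamma\, x^{\top} x' + r\right)^{d}
K_{\text{RBF}}(x, x') = \exp\!\left(-\gamma \lVert x - x' \rVert^{2}\right) \qquad
K_{\text{sigmoid}}(x, x') = \tanh\!\left(\gamma\, x^{\top} x' + r\right)

% Bayes' theorem under the naive conditional-independence assumption
P(y \mid x_1, \dots, x_n) = \frac{P(y) \prod_{i=1}^{n} P(x_i \mid y)}{P(x_1, \dots, x_n)}

% Decision-tree splitting criteria at a node
H = -\sum_{k=1}^{K} p_k \log_2 p_k \qquad
G = 1 - \sum_{k=1}^{K} p_k^{2}
```

For a pure node (one class only), both $H$ and $G$ equal 0; for a balanced binary node ($p_1 = p_2 = 0.5$), $H = 1$ and $G = 0.5$.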
3. Materials and Methods
3.1. Study Population and Sample Collection
3.2. Conventional Semen Analysis
3.3. ORP Measurement
3.4. DNA Fragmentation Assessment
3.5. Dataset Development
3.6. Machine Learning Workflow
- Robust scaler: Scales the data by subtracting the median and dividing by the interquartile range (IQR). It is particularly useful for datasets with outliers because it relies on the central portion of the data, which makes it robust to extreme values.
- MinMax scaler: Scales the data to a fixed range of [0, 1], by subtracting the minimum value and dividing by the range (max − min). It is sensitive to outliers, as extreme values can skew the scaling, making it less effective when outliers are present.
- Standard scaler: Scales the data by removing the mean and dividing by the standard deviation, resulting in a distribution with a mean of 0 and a standard deviation of 1. It is commonly used when the data is assumed to be normally distributed or when algorithms require standardized data.
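The three scalers described above can be sketched for a single feature using only the Python standard library. This is an illustrative sketch, not the study's actual pipeline: the function names and the sample values are hypothetical, the quartiles use linear interpolation, and the standard scaler uses the population standard deviation. Each function assumes the feature is not constant (non-zero IQR, range, and standard deviation).

```python
from statistics import mean, median, pstdev, quantiles


def robust_scale(values):
    """Subtract the median and divide by the interquartile range (IQR)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    med = median(values)
    iqr = q3 - q1  # assumed non-zero
    return [(v - med) / iqr for v in values]


def minmax_scale(values):
    """Map the values onto [0, 1] using the observed minimum and maximum."""
    lo, hi = min(values), max(values)  # assumes hi > lo
    return [(v - lo) / (hi - lo) for v in values]


def standard_scale(values):
    """Subtract the mean and divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)  # assumes sigma > 0
    return [(v - mu) / sigma for v in values]


# Hypothetical feature with one extreme value (100.0) acting as an outlier.
sample = [1.0, 2.0, 3.0, 4.0, 100.0]

print(robust_scale(sample))    # central values stay within [-1, 0.5]
print(minmax_scale(sample))    # central values are squeezed close to 0
print(standard_scale(sample))  # mean 0 and standard deviation 1
```

On this sample, the outlier compresses the four ordinary values into roughly the bottom 3% of the MinMax range, whereas the robust scaler keeps them evenly spaced around 0, which illustrates the sensitivity to outliers noted in the list.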
4. Results
4.1. Phase 1: Reference Dataset
4.2. Phase 2: ORP Dataset
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| ART | Assisted Reproductive Technology |
| ML | Machine Learning |
| ORP | Oxidation-Reduction Potential |
| DFI | DNA Fragmentation Index |
| IVF | In Vitro Fertilization |
| OS | Oxidative Stress |
| ROS | Reactive Oxygen Species |
| MI | Male Infertility |
| IUI | Intrauterine Insemination Treatment |
| ICSI | Intracytoplasmic Sperm Injection |
| LR | Logistic Regression |
| SVM | Support Vector Machine |
| NB | Naive Bayes |
| RF | Random Forest |
| XGBoost | Extreme Gradient Boosting |
| MiOXSYS | Male Infertility Oxidative System |
References
- Mazzilli, R.; Rucci, C.; Vaiarelli, A.; Cimadomo, D.; Ubaldi, F.M.; Foresta, C.; Ferlin, A. Male factor infertility and assisted reproductive technologies: Indications, minimum access criteria and outcomes. J. Endocrinol. Investig. 2023, 46, 1079–1085.
- Tanga, B.M.; Qamar, A.Y.; Raza, S.; Bang, S.; Fang, X.; Yoon, K.; Cho, J. Semen evaluation: Methodological advancements in sperm quality-specific fertility assessment. Animal Bioscience 2021, 34, 1253–1270.
- Li, T.K. The Glutathione and Thiol Content of Mammalian Spermatozoa and Seminal Plasma. Biol. Reprod. 1975, 12, 641–646.
- Antinozzi, C.; Di Luigi, L.; Sireno, L.; Caporossi, D.; Dimauro, I.; Sgrò, P. Protective Role of Physical Activity and Antioxidant Systems During Spermatogenesis. Biomolecules 2025, 15, 478.
- Henkel, R.; Sandhu, I.S.; Agarwal, A. The excessive use of antioxidant therapy: A possible cause of male infertility? Andrologia 2018, 51, e13162.
- Bouhadana, D.; Godin Pagé, M.H.; Montjean, D.; Bélanger, M.C.; Benkhalifa, M.; Miron, P.; Petrella, F. The Role of Antioxidants in Male Fertility: A Comprehensive Review of Mechanisms and Clinical Applications. Antioxidants 2025, 14, 1013.
- Moustakli, E.; Zikopoulos, A.; Sakaloglou, P.; Bouba, I.; Sofikitis, N.; Georgiou, I. Functional association between telomeres, oxidation and mitochondria. Front. Reprod. Health 2023, 5, 1107215.
- Takeshima, T.; Usui, K.; Mori, K.; Asai, T.; Yasuda, K.; Kuroda, S.; Yumura, Y. Oxidative stress and male infertility. Reprod. Med. Biol. 2020, 20, 41–52.
- Walke, G.; Gaurkar, S.S.; Prasad, R.; Lohakare, T.; Wanjari, M. The Impact of Oxidative Stress on Male Reproductive Function: Exploring the Role of Antioxidant Supplementation. Cureus 2023, 15, e42583.
- Yang, H.; Li, G.; Jin, H.; Guo, Y.; Sun, Y. The effect of sperm DNA fragmentation index on assisted reproductive technology outcomes and its relationship with semen parameters and lifestyle. Transl. Androl. Urol. 2019, 8, 356–365.
- Li, F.; Duan, X.; Li, M.; Ma, X. Sperm DNA fragmentation index affect pregnancy outcomes and offspring safety in assisted reproductive technology. Sci. Rep. 2024, 14, 356.
- Solanki, M.; Joseph, T.; Muthukumar, K.; Samuel, P.; Aleyamma, T.K.; Kamath, M.S. Impact of sperm DNA fragmentation in couples with unexplained recurrent pregnancy loss: A cross-sectional study. J. Obstet. Gynaecol. Res. 2024, 50, 1687–1696.
- Agarwal, A.; Henkel, R.; Sharma, R.; Tadros, N.N.; Sabanegh, E. Determination of seminal oxidation-reduction potential (ORP) as an easy and cost-effective clinical marker of male infertility. Andrologia 2017, 50, e12914.
- Panner Selvam, M.K.; Moharana, A.K.; Baskaran, S.; Finelli, R.; Hudnall, M.C.; Sikka, S.C. Current Updates on Involvement of Artificial Intelligence and Machine Learning in Semen Analysis. Medicina 2024, 60, 279.
- Chu, K.Y.; Nassau, D.E.; Arora, H.; Lokeshwar, S.D.; Madhusoodanan, V.; Ramasamy, R. Artificial Intelligence in Reproductive Urology. Curr. Urol. Rep. 2019, 20, 52.
- Mehrjerd, A.; Dehghani, T.; Jajroudi, M.; Eslami, S.; Rezaei, H.; Ghaebi, N.K. Ensemble machine learning models for sperm quality evaluation concerning success rate of clinical pregnancy in assisted reproductive techniques. Sci. Rep. 2024, 14, 24283.
- Peng, T.; Liao, C.; Ye, X.; Chen, Z.; Li, X.; Lan, Y.; Fu, X.; An, G. Machine learning-based clustering to identify the combined effect of the DNA fragmentation index and conventional semen parameters on in vitro fertilization outcomes. Reprod. Biol. Endocrinol. 2023, 21, 26.
- Sene, A.A.; Zandieh, Z.; Soflaei, M.; Torshizi, H.M.; Sheibani, K. Using artificial intelligence to predict the intrauterine insemination success rate among infertile couples. Middle East Fertil. Soc. J. 2021, 26, 46.
- Shemshaki, G.; Murthy, A.S.N.; Malini, S.S. Assessment and Establishment of Correlation between Reactive Oxidation Species, Citric Acid, and Fructose Level in Infertile Male Individuals: A Machine-Learning Approach. J. Hum. Reprod. Sci. 2021, 14, 129–136.
- Santi, D.; Spaggiari, G.; Casonati, A.; Casarini, L.; Grassi, R.; Vecchi, B.; Roli, L.; De Santis, M.C.; Orlando, G.; Gravotta, E.; et al. Multilevel approach to male fertility by machine learning highlights a hidden link between haematological and spermatogenetic cells. Andrology 2020, 8, 1021–1029.
- Zhou, M.; Yao, T.; Li, J.; Hui, H.; Fan, W.; Guan, Y.; Zhang, A.; Xu, B. Preliminary prediction of semen quality based on modifiable lifestyle factors by using the XGBoost algorithm. Front. Med. 2022, 9, 811890.
- Bachelot, G.; Lamaziere, A.; Czernichow, S.; Faure, C.; Racine, C.; Levy, R.; Dupont, C. Machine learning approach to assess the association between anthropometric, metabolic, and nutritional status and semen parameters. Asian J. Androl. 2024, 26, 349–355.
- Li, J.; Cheng, K.; Wang, S.; Morstatter, F.; Trevino, R.P.; Tang, J.; Liu, H. Feature Selection: A Data Perspective. ACM Comput. Surv. 2017, 50, 1–45.
- Kuo, F.; Sloan, I. Lifting the Curse of Dimensionality. Not. AMS 2005, 52, 1320–1328.
- Stavros, S.; Potiris, A.; Molopodi, E.; Mavrogianni, D.; Zikopoulos, A.; Louis, K.; Karampitsakos, T.; Nazou, E.; Sioutis, D.; Christodoulaki, C.; et al. Sperm DNA Fragmentation: Unraveling Its Imperative Impact on Male Infertility Based on Recent Evidence. Int. J. Mol. Sci. 2024, 25, 10167.
- Sakkas, K.; Dimitriou, E.G.; Ntagka, N.E.; Giannakeas, N.; Kalafatakis, K.; Tzallas, A.T.; Glavas, E. Personalized Visualization of the Gestures of Parkinson’s Disease Patients with Virtual Reality. Future Internet 2024, 16, 305.
- Oikonomou, E.D.; Karvelis, P.; Giannakeas, N.; Vrachatis, A.; Glavas, E.; Tzallas, A.T. How natural language processing derived techniques are used on biological data: A systematic review. Netw. Model. Anal. Health Informatics Bioinform. 2024, 13, 23.
- Sarker, I.H. Machine Learning: Algorithms, Real-World Applications and Research Directions. SN Comput. Sci. 2021, 2, 160.
- Patle, A.; Chouhan, D.S. SVM kernel functions for classification. In Proceedings of the 2013 International Conference on Advances in Technology and Engineering (ICATE), Mumbai, India, 23–25 January 2013; pp. 1–9.
- Yang, F.J. An Implementation of Naive Bayes Classifier. In Proceedings of the 2018 International Conference on Computational Science and Computational Intelligence (CSCI), Las Vegas, NV, USA, 12–14 December 2018; pp. 301–306.
- Cutler, A.; Cutler, D.R.; Stevens, J.R. Random Forests. In Ensemble Machine Learning; Springer: New York, NY, USA, 2012; pp. 157–175.
- Sun, T.C.; Zhang, Y.; Li, H.T.; Liu, X.M.; Yi, D.X.; Tian, L.; Liu, Y.X. Sperm DNA fragmentation index, as measured by sperm chromatin dispersion, might not predict assisted reproductive outcome. Taiwan. J. Obstet. Gynecol. 2018, 57, 493–498.
- Caliskan, Z.; Kucukgergin, C.; Aktan, G.; Kadioglu, A.; Ozdemirler, G. Evaluation of sperm DNA fragmentation in male infertility. Andrologia 2022, 54, e14587.
- Zandbagleh, A.; Miltiadous, A.; Sanei, S.; Azami, H. Beta-to-Theta Entropy Ratio of EEG in Aging, Frontotemporal Dementia, and Alzheimer’s Dementia. Am. J. Geriatr. Psychiatry 2024, 32, 1361–1382.
- de Amorim, L.B.; Cavalcanti, G.D.; Cruz, R.M. The choice of scaling technique matters for classification performance. Appl. Soft Comput. 2023, 133, 109924.
- Agarwal, A.; Parekh, N.; Panner Selvam, M.K.; Henkel, R.; Shah, R.; Homa, S.T.; Ramasamy, R.; Ko, E.; Tremellen, K.; Esteves, S.; et al. Male Oxidative Stress Infertility (MOSI): Proposed Terminology and Clinical Practice Guidelines for Management of Idiopathic Male Infertility. World J. Men’s Health 2019, 37, 296.
- Ewald, F.K.; Bothmann, L.; Wright, M.N.; Bischl, B.; Casalicchio, G.; König, G. A Guide to Feature Importance Methods for Scientific Inference. arXiv 2024, arXiv:2404.12862.



| Data | Stat Metrics | Test_Accuracy | Train_Accuracy | Test_Precision | Train_Precision | Test_Recall |
|---|---|---|---|---|---|---|
| ORP | Mean | 0.84 | 0.86 | 0.87 | 0.89 | 0.92 |
| | SD | 0.11 | 0.02 | 0.13 | 0.02 | 0.11 |
| NoORP | Mean | 0.84 | 0.84 | 0.87 | 0.86 | 0.92 |
| | SD | 0.11 | 0.02 | 0.13 | 0.03 | 0.11 |
| Reference | Mean | 0.77 | 0.77 | 0.77 | 0.77 | 0.94 |
| | SD | 0.08 | 0.02 | 0.05 | 0.01 | 0.07 |

| Data | Stat Metrics | Train_Recall | Test_F1 | Train_F1 | Test_ROC_AUC | Train_ROC_AUC |
|---|---|---|---|---|---|---|
| ORP | Mean | 0.92 | 0.89 | 0.90 | 0.77 | 0.84 |
| | SD | 0.03 | 0.08 | 0.02 | 0.21 | 0.03 |
| NoORP | Mean | 0.92 | 0.89 | 0.89 | 0.76 | 0.80 |
| | SD | 0.03 | 0.08 | 0.01 | 0.19 | 0.03 |
| Reference | Mean | 0.94 | 0.85 | 0.85 | 0.66 | 0.66 |
| | SD | 0.02 | 0.06 | 0.01 | 0.10 | 0.02 |

| Model | Dataset Subset | Scaler | Test_Accuracy | Train_Accuracy | Test_Precision | Train_Precision | Test_Recall |
|---|---|---|---|---|---|---|---|
| LR | X_all | Robust-MinMax | 0.70 | 0.70 | 0.87 | 0.85 | 0.69 |
| SVM | X_mot_morph | Robust-Standard | 0.72 | 0.75 | 0.77 | 0.78 | 0.85 |
| BNB | X_mot | Robust-MinMax | 0.76 | 0.76 | 0.77 | 0.77 | 0.94 |
| RF | X_all | No scaling | 0.73 | 0.77 | 0.78 | 0.81 | 0.84 |

| Model | Dataset Subset | Scaler | Train_Recall | Test_F1 | Train_F1 | Test_AUC | Train_AUC |
|---|---|---|---|---|---|---|---|
| LR | X_all | Robust-MinMax | 0.70 | 0.75 | 0.77 | 0.75 | 0.78 |
| SVM | X_mot_morph | Robust-Standard | 0.87 | 0.81 | 0.83 | 0.77 | 0.81 |
| BNB | X_mot | Robust-MinMax | 0.94 | 0.85 | 0.85 | 0.66 | 0.66 |
| RF | X_all | No scaling | 0.87 | 0.81 | 0.84 | 0.78 | 0.82 |

| Model | ORP | Dataset Subset | Scaler | Test_Accuracy | Train_Accuracy | Test_Precision | Train_Precision | Test_Recall |
|---|---|---|---|---|---|---|---|---|
| LR | ORP | X_all | Robust-MinMax | 0.78 | 0.82 | 0.85 | 0.86 | 0.88 |
| | NoORP | X_mot_morph | Robust-MinMax | 0.78 | 0.79 | 0.84 | 0.84 | 0.88 |
| SVM | ORP | X_all | No scaling | 0.84 | 0.87 | 0.89 | 0.94 | 0.88 |
| | NoORP | X_all | No scaling | 0.84 | 0.87 | 0.89 | 0.94 | 0.88 |
| BNB | ORP | X_mot | Robust-MinMax | 0.84 | 0.86 | 0.87 | 0.89 | 0.92 |
| | NoORP | X_all | Robust-MinMax | 0.84 | 0.84 | 0.84 | 0.86 | 0.92 |
| RF | ORP | X_all | No scaling | 0.76 | 0.83 | 0.82 | 0.87 | 0.88 |
| | NoORP | X_all | No scaling | 0.79 | 0.82 | 0.84 | 0.86 | 0.88 |

| Model | ORP | Dataset Subset | Scaler | Train_Recall | Test_F1 | Train_F1 | Test_AUC | Train_AUC |
|---|---|---|---|---|---|---|---|---|
| LR | ORP | X_all | Robust-MinMax | 0.87 | 0.85 | 0.87 | 0.85 | 0.86 |
| | NoORP | X_mot_morph | Robust-MinMax | 0.85 | 0.85 | 0.84 | 0.85 | 0.87 |
| SVM | ORP | X_all | No scaling | 0.86 | 0.88 | 0.90 | 0.86 | 0.89 |
| | NoORP | X_all | No scaling | 0.87 | 0.88 | 0.90 | 0.86 | 0.89 |
| BNB | ORP | X_mot | Robust-MinMax | 0.92 | 0.89 | 0.90 | 0.77 | 0.84 |
| | NoORP | X_all | Robust-MinMax | 0.92 | 0.89 | 0.89 | 0.76 | 0.80 |
| RF | ORP | X_all | No scaling | 0.88 | 0.84 | 0.88 | 0.85 | 0.91 |
| | NoORP | X_all | No scaling | 0.88 | 0.85 | 0.87 | 0.88 | 0.90 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Oikonomou, E.D.; Moustakli, E.; Zikopoulos, A.; Dafopoulos, S.; Prapa, E.; Gkountis, A.-M.; Zachariou, A.; Pantou, A.; Giannakeas, N.; Pantos, K.; et al. A Preliminary Machine Learning Assessment of Oxidation-Reduction Potential and Classical Sperm Parameters as Predictors of Sperm DNA Fragmentation Index. DNA 2026, 6, 3. https://doi.org/10.3390/dna6010003
APA Style: Oikonomou, E. D., Moustakli, E., Zikopoulos, A., Dafopoulos, S., Prapa, E., Gkountis, A.-M., Zachariou, A., Pantou, A., Giannakeas, N., Pantos, K., Tzallas, A. T., & Dafopoulos, K. (2026). A Preliminary Machine Learning Assessment of Oxidation-Reduction Potential and Classical Sperm Parameters as Predictors of Sperm DNA Fragmentation Index. DNA, 6(1), 3. https://doi.org/10.3390/dna6010003

