Data Mining Techniques for Endometriosis Detection in a Data-Scarce Medical Dataset
Abstract
1. Introduction
2. Materials and Methods
2.1. Data Sources
- Address each sample type independently:
  - Dataset for oral samples.
  - Dataset for fecal samples.
  - Dataset for endometrial fluid.
  - Dataset for endometrial biopsies.
  - Dataset for vaginal samples.
- Group some sample types:
  - The dataset for the GIT merges the oral and fecal datasets.
  - The dataset for the FRT merges the endometrial fluid, endometrial biopsy, and vaginal microbiomes.
  - The dataset for FRT2 merges the endometrial biopsy and vaginal microbiomes.
2.1.1. Data Source for Oral Region
2.1.2. Data Source for Fecal Region
2.1.3. Data Source for Endometrial Fluid Region
2.1.4. Data Source for Endometrial Biopsy Region
2.1.5. Data Source for Vaginal Region
2.2. Preprocessing Data
2.2.1. Filtering
2.2.2. Scaling
2.2.3. Class Imbalance
2.3. Dataset from One Area
2.4. Machine Learning Models
Algorithm 1: classify(X, Y, k, classifier)
Algorithm 2: filter_non_zero_division(precision_zd0, precision_zd1, recall_zd0, recall_zd1)
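The body of Algorithm 2 is not reproduced here, so the following is only a minimal sketch of what its signature suggests. It assumes that the `_zd0` and `_zd1` arguments hold the per-fold scores computed with the undefined (zero-division) case mapped to 0 and to 1, respectively, in the style of the `zero_division` parameter of scikit-learn's `precision_score` and `recall_score`. Under that assumption, a fold where the two values differ is a fold where the metric was undefined, and it is dropped.

```python
def filter_non_zero_division(precision_zd0, precision_zd1, recall_zd0, recall_zd1):
    """Keep only the folds whose precision and recall were well defined.

    Assumption (not stated in the source): *_zd0 and *_zd1 are per-fold
    scores computed with the undefined case mapped to 0 and to 1. If both
    variants agree, no division by zero occurred in that fold.
    """
    precision, recall = [], []
    for p0, p1, r0, r1 in zip(precision_zd0, precision_zd1, recall_zd0, recall_zd1):
        if p0 == p1 and r0 == r1:
            precision.append(p0)
            recall.append(r0)
    return precision, recall
```

With very small test folds, a fold can easily contain no positive patients or no positive predictions, which would explain why the number of usable scores varies between sample types.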
Algorithm 3: get_f1(precision, recall)
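As a hedged illustration of Algorithm 3, the sketch below computes F1 as the harmonic mean of precision and recall. Returning `None` when both inputs are zero is an assumed convention, not something the source confirms; it would be consistent with the results tables reporting a separate count (#) for F1.

```python
from typing import Optional


def get_f1(precision: float, recall: float) -> Optional[float]:
    """Harmonic mean of precision and recall.

    Returns None when precision + recall is zero, because F1 is undefined
    in that case (assumed convention for this sketch).
    """
    if precision + recall == 0:
        return None
    return 2 * precision * recall / (precision + recall)
```

For example, a fold with precision 0.5 and recall 1.0 gives F1 = 2 · 0.5 · 1.0 / 1.5 = 2/3.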
2.4.1. Logistic Regression Classification
Algorithm 4: classify_by_logistic_regression(X, Y, k)
- X: Input. X matrix.
- Y: Input. Y matrix.
- k: Input. The number of folds for cross-validation.
- Result: This method returns a dictionary with the tuples of the mean, standard deviation, and length of the cross-validation scores for each metric. The keys of the dictionary are “accuracy”, “precision”, “recall”, and “f1”.
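The interface described for Algorithm 4 can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the authors' implementation: the hyperparameters are taken from the logistic regression settings table (solver `liblinear`, `max_iter` 1,000,000, `C` 1.0, 100 repetitions of `RepeatedKFold`), while the `seed` argument, the `summarize` helper, and the simplified handling of undefined scores are hypothetical additions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import RepeatedKFold


def summarize(scores):
    """Return (mean, standard deviation, number of scores)."""
    if not scores:
        return float("nan"), float("nan"), 0
    return float(np.mean(scores)), float(np.std(scores)), len(scores)


def classify_by_logistic_regression(X, Y, k, n_repeats=100, seed=0):
    # `seed` is an illustrative addition to make the folds reproducible.
    cv = RepeatedKFold(n_splits=k, n_repeats=n_repeats, random_state=seed)
    accuracy, precision, recall, f1 = [], [], [], []
    for train, test in cv.split(X):
        if len(np.unique(Y[train])) < 2:
            continue  # a classifier cannot be fitted on a single class
        model = LogisticRegression(solver="liblinear", max_iter=1_000_000, C=1.0)
        model.fit(X[train], Y[train])
        predicted = model.predict(X[test])
        p = precision_score(Y[test], predicted, zero_division=0)
        r = recall_score(Y[test], predicted, zero_division=0)
        accuracy.append(accuracy_score(Y[test], predicted))
        precision.append(p)
        recall.append(r)
        if p + r > 0:  # F1 is only defined when precision + recall > 0
            f1.append(2 * p * r / (p + r))
    return {
        "accuracy": summarize(accuracy),
        "precision": summarize(precision),
        "recall": summarize(recall),
        "f1": summarize(f1),
    }
```

Unlike the pipeline outlined by Algorithms 1 to 3, this sketch maps undefined precision and recall to 0 instead of filtering those folds out, so its counts would not match the # rows of the results tables.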
2.4.2. Decision Tree Classification
Algorithm 5: classify_by_decision_tree(X, Y, k)
- X: Input. X matrix.
- Y: Input. Y matrix.
- k: Input. The number of folds for cross-validation.
- Result: This method returns a dictionary with the tuples of the mean, standard deviation, and length of the cross-validation scores for each metric. The keys of the dictionary are “accuracy”, “precision”, “recall”, and “f1”.
2.4.3. Support Vector Classification
Algorithm 6: classify_by_svc(X, Y, k, kernel, C)
2.5. Dataset from a Single Sample Type
2.6. Dataset from Multiple Sample Types
Algorithm 7: merge(dataset, features, sample_types=["EF", "EB", "Vaginal"])
Algorithm 8: classify_frt(dataset, features, k, classifier)
- dataset: Input. A dataset where the rows are the relative abundances, and each column represents a feature. At least the columns “IsEndometriosis”, “PatientID”, “SampleType”, and all the features should be present.
- features: Input. A list of column names that represent the features of the input data.
- k: Input. The number of folds for cross-validation.
- classifier: Input. The specific classifier for the chosen means of classification.
- Result: This method returns a dictionary with the tuples of the mean and standard deviation of each cross-validation score of each metric. The keys of the dictionary are “accuracy”, “precision”, “recall”, and “f1”.
// Merge features
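The merge step named in Algorithms 7 and 8 is not spelled out in this section, so the sketch below shows only one plausible reading. It assumes the dataset is a list of dictionaries (the original may well use a data frame), that merged feature columns are named `<SampleType>_<feature>`, and that only patients with a sample of every requested type are kept. The last assumption is consistent with the GIT dataset having 8 patients while the oral and fecal datasets have 12 and 13.

```python
def merge(dataset, features, sample_types=("EF", "EB", "Vaginal")):
    """Build one row per patient by joining features across sample types.

    Hypothetical layout: each input row is a dict with the keys
    "PatientID", "SampleType", "IsEndometriosis", and one key per feature.
    """
    by_patient = {}
    for row in dataset:
        if row["SampleType"] in sample_types:
            by_patient.setdefault(row["PatientID"], {})[row["SampleType"]] = row

    merged = []
    for patient_id, samples in by_patient.items():
        if len(samples) != len(sample_types):
            continue  # assumed: drop patients missing any sample type
        label = samples[sample_types[0]]["IsEndometriosis"]
        out = {"PatientID": patient_id, "IsEndometriosis": label}
        for sample_type in sample_types:
            for feature in features:
                out[f"{sample_type}_{feature}"] = samples[sample_type][feature]
        merged.append(out)
    return merged
```

Under this reading, merging multiplies the number of feature columns by the number of sample types while keeping one row per patient, so the classifier sees a wider matrix but no additional patients.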
2.7. Hyperparameter Optimization
3. Results
3.1. Logistic Regression Classification
3.2. Decision Tree Classification
3.3. Support Vector Classification with the Linear Kernel
3.4. Support Vector Classification with the RBF Kernel and C = 100
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
ASVs | Amplicon Sequence Variants |
CSV | Comma-Separated Values |
DADA2 | Divisive Amplicon Denoising Algorithm 2 |
EB | Endometrial biopsy |
EF | Endometrial fluid |
EM | Endometriosis |
FRT | Female reproductive tract |
GIT | Gastrointestinal tract |
MRI | Magnetic Resonance Imaging |
OTUs | Operational Taxonomic Units |
RBF | Radial basis function |
ROC | Receiver operating characteristic curve |
rRNA | Ribosomal Ribonucleic Acid |
SDK | Software Development Kit |
SVC | Support vector classifier |
SVM | Support vector machine |
Bacteria Name | Relative Abundance (%) |
---|---|
Firmicutes Streptococcus | 49.14% |
Proteobacteria Neisseria | 13.14% |
Proteobacteria Haemophilus | 4.58% |
Actinobacteria Actinomyces | 3.63% |
Firmicutes Veillonella | 3.51% |
Bacteria Name | Relative Abundance (%) |
---|---|
Firmicutes Streptococcus | 45.36% |
Proteobacteria Haemophilus | 22.53% |
Fusobacteria Fusobacterium | 3.28% |
Proteobacteria Pasteurellaceae | 3.06% |
Proteobacteria Neisseria | 2.69% |
Bacteria Name | Relative Abundance (%) |
---|---|
Firmicutes Lachnospiraceae | 35.52% |
Proteobacteria Enterobacteriaceae | 21.14% |
Firmicutes Streptococcus | 13.89% |
Firmicutes Ruminococcaceae | 6.23% |
Firmicutes Ruminococcus | 3.98% |
Firmicutes Blautia | 3.59% |
Firmicutes Faecalibacterium | 3.27% |
Bacteria Name | Relative Abundance (%) |
---|---|
Firmicutes Lachnospiraceae | 24.78% |
Firmicutes Ruminococcus | 17.33% |
Firmicutes Faecalibacterium | 8.47% |
Proteobacteria Enterobacteriaceae | 8.46% |
Firmicutes Ruminococcaceae | 7.27% |
Firmicutes Blautia | 6.57% |
Firmicutes Bacillus | 3.91% |
Firmicutes Streptococcus | 3.47% |
Firmicutes Erysipelotrichaceae | 3.34% |
Firmicutes Enterococcus | 3.17% |
Bacteria Name | Relative Abundance (%) |
---|---|
Firmicutes Lactobacillus | 67.61% |
Actinobacteria Gardnerella | 10.83% |
Proteobacteria Rhizobiaceae | 3.05% |
Firmicutes Lachnospiraceae | 2.43% |
Firmicutes Megasphaera | 1.83% |
Bacteria Name | Relative Abundance (%) |
---|---|
Firmicutes Lactobacillus | 61.20% |
Actinobacteria Gardnerella | 14.89% |
Proteobacteria Enterobacteriaceae | 6.78% |
Proteobacteria Rhizobiaceae | 2.22% |
Proteobacteria Vibrio | 1.62% |
Bacteria Name | Relative Abundance (%) |
---|---|
Firmicutes Lactobacillus | 82.63% |
Proteobacteria Rhizobiaceae | 3.41% |
Actinobacteria Gardnerella | 2.62% |
Firmicutes Mogibacteriaceae | 2.44% |
Firmicutes Lachnospiraceae | 1.94% |
Bacteria Name | Relative Abundance (%) |
---|---|
Firmicutes Lactobacillus | 49.62% |
Firmicutes Lachnospiraceae | 8.65% |
Proteobacteria Enterobacteriaceae | 5.49% |
Proteobacteria Rhizobiaceae | 3.51% |
Firmicutes Streptococcus | 3.05% |
Bacteroidetes Bacteroides | 2.89% |
Actinobacteria Gardnerella | 2.79% |
Bacteria Name | Relative Abundance (%) |
---|---|
Firmicutes Lactobacillus | 82.93% |
Actinobacteria Gardnerella | 7.43% |
Firmicutes Mogibacteriaceae | 2.38% |
Bacteroidetes Prevotella | 1.94% |
Bacteria Name | Relative Abundance (%) |
---|---|
Firmicutes Lactobacillus | 70.54% |
Actinobacteria Gardnerella | 9.92% |
Proteobacteria Enterobacteriaceae | 7.35% |
Firmicutes Enterococcus | 2.93% |
Firmicutes Lactobacillaceae | 1.68% |
Region | Number of Retained Columns | Number of Discarded Columns |
---|---|---|
EB | 256 (58.45%) | 182 (41.55%) |
EF | 230 (52.51%) | 208 (47.49%) |
Vaginal | 230 (52.51%) | 208 (47.49%) |
Oral | 144 (32.88%) | 294 (67.12%) |
Fecal | 124 (28.31%) | 314 (71.69%) |
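Every row of the table above adds up to 438 columns, which suggests that each sample type starts from the same feature set and drops a different subset. The filtering criterion itself is not stated in this section, so the sketch below assumes the simplest one: a column is discarded when its relative abundance is zero in every sample of that type. The function names are hypothetical.

```python
def drop_all_zero_columns(rows, features):
    """Split `features` into (kept, discarded) for one sample type.

    Assumption: a feature is discarded when it is zero in every row.
    The source does not state the actual filtering criterion.
    """
    kept = [f for f in features if any(row[f] != 0 for row in rows)]
    discarded = [f for f in features if f not in kept]
    return kept, discarded


def share(count, total):
    """Percentage of `total` represented by `count`, rounded to two decimals."""
    return round(100 * count / total, 2)
```

As a check on the percentages, `share(256, 438)` reproduces the 58.45% reported for the EB columns that remain after filtering.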
Predicted Classification | Actual Positive | Actual Negative |
---|---|---|
Positive | True positive (TP) | False positive (FP) |
Negative | False negative (FN) | True negative (TN) |
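The metrics reported later follow directly from the four cells of this confusion matrix. The sketch below uses the standard definitions; returning `None` for an undefined ratio is an assumed convention for illustration.

```python
def metrics_from_confusion(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) from confusion-matrix counts.

    Precision or recall is None when its denominator is zero, for example
    when a test fold contains no positive predictions or no positive cases.
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if (tp + fp) else None
    recall = tp / (tp + fn) if (tp + fn) else None
    return accuracy, precision, recall
```

With only a handful of patients per test fold, the undefined cases are not rare, which is why they need explicit handling before averaging across folds.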
Region | Number of Patients | Patients with EM |
---|---|---|
EB | 21 | 7 (33.33%) |
EF | 21 | 7 (33.33%) |
Vaginal | 21 | 7 (33.33%) |
Oral | 12 | 3 (25.00%) |
Fecal | 13 | 4 (30.77%) |
Region | Number of Patients | Patients with EM |
---|---|---|
GIT (oral + fecal) | 8 | 3 (37.50%) |
FRT (EB + EF + vaginal) | 21 | 7 (33.33%) |
FRT2 (EB + vaginal) | 21 | 7 (33.33%) |
Name | Value | Description |
---|---|---|
Classifier | LogisticRegression | Python classifier from sklearn.linear_model |
solver | liblinear | Algorithm to use in optimization problem |
max_iter | 1,000,000 | Maximum number of iterations |
C | 1.0 | Inverse of regularization strength |
n_repeats | 100 | Number of repetitions in RepeatedKFold |
Metric | Statistic | EB | EF | Vaginal | Oral | Fecal | FRT | FRT2 |
---|---|---|---|---|---|---|---|---|
Cross-validation scores | # | 536 | 469 | 625 | 179 | 335 | 610 | 608 |
Accuracy | Mean (%) | 38.43 | 37.67 | 29.97 | 80.07 | 59.38 | 39.78 | 46.38 |
Accuracy | Std. dev. | 0.25 | 0.26 | 0.25 | 0.29 | 0.25 | 0.25 | 0.26 |
Precision | Mean (%) | 23.54 | 22.28 | 20.27 | 71.97 | 40.72 | 30.16 | 35.80 |
Precision | Std. dev. | 0.33 | 0.32 | 0.26 | 0.40 | 0.33 | 0.30 | 0.32 |
Recall | Mean (%) | 36.29 | 33.48 | 38.80 | 81.01 | 69.25 | 54.75 | 61.84 |
Recall | Std. dev. | 0.46 | 0.45 | 0.47 | 0.39 | 0.46 | 0.48 | 0.47 |
F1 | # | 215 | 175 | 267 | 145 | 232 | 360 | 398 |
F1 | Mean (%) | 26.46 | 25.04 | 25.31 | 74.77 | 49.72 | 36.54 | 42.87 |
F1 | Std. dev. | 0.34 | 0.34 | 0.30 | 0.39 | 0.36 | 0.32 | 0.34 |
Name | Value | Description |
---|---|---|
Classifier | DecisionTreeClassifier | The Python classifier from sklearn.tree. |
max_features | sqrt | The number of features to consider when looking for the best split: this parameter affected the performance and complexity of the decision tree. |
n_repeats | 100 | The number of repetitions in RepeatedKFold. |
Metric | Statistic | EB | EF | Vaginal | Oral | Fecal | FRT | FRT2 |
---|---|---|---|---|---|---|---|---|
Cross-validation scores | # | 456 | 510 | 452 | 178 | 270 | 534 | 547 |
Accuracy | Mean (%) | 40.86 | 29.61 | 38.20 | 73.60 | 58.73 | 51.56 | 46.86 |
Accuracy | Std. dev. | 0.29 | 0.25 | 0.25 | 0.26 | 0.31 | 0.28 | 0.28 |
Precision | Mean (%) | 27.89 | 13.01 | 21.20 | 61.42 | 39.85 | 39.61 | 33.24 |
Precision | Std. dev. | 0.36 | 0.28 | 0.32 | 0.38 | 0.41 | 0.34 | 0.36 |
Recall | Mean (%) | 41.67 | 20.69 | 32.30 | 80.34 | 54.81 | 63.39 | 50.18 |
Recall | Std. dev. | 0.48 | 0.40 | 0.45 | 0.40 | 0.50 | 0.46 | 0.48 |
F1 | # | 205 | 112 | 160 | 143 | 148 | 356 | 294 |
F1 | Mean (%) | 31.06 | 14.93 | 23.81 | 67.60 | 44.52 | 46.49 | 37.68 |
F1 | Std. dev. | 0.36 | 0.30 | 0.33 | 0.37 | 0.43 | 0.36 | 0.37 |
Name | Value | Description |
---|---|---|
Classifier | SVC | Python classifier from sklearn.svm |
kernel | linear | Kernel type to be used |
C | 1.0 | Regularization parameter |
n_repeats | 100 | Number of repetitions in RepeatedKFold |
Metric | Statistic | EB | EF | Vaginal | Oral | Fecal | FRT | FRT2 |
---|---|---|---|---|---|---|---|---|
Cross-validation scores | # | 476 | 491 | 667 | 154 | 278 | 572 | 642 |
Accuracy | Mean (%) | 38.31 | 35.17 | 25.69 | 79.44 | 69.90 | 47.03 | 44.29 |
Accuracy | Std. dev. | 0.25 | 0.25 | 0.24 | 0.29 | 0.24 | 0.27 | 0.26 |
Precision | Mean (%) | 21.46 | 20.03 | 18.04 | 70.78 | 53.99 | 35.81 | 33.52 |
Precision | Std. dev. | 0.31 | 0.31 | 0.24 | 0.41 | 0.34 | 0.33 | 0.30 |
Recall | Mean (%) | 34.77 | 30.86 | 36.21 | 79.87 | 81.65 | 59.97 | 60.44 |
Recall | Std. dev. | 0.46 | 0.44 | 0.46 | 0.40 | 0.39 | 0.47 | 0.47 |
F1 | # | 181 | 169 | 264 | 123 | 227 | 363 | 412 |
F1 | Mean (%) | 25.18 | 22.52 | 22.88 | 73.59 | 62.91 | 42.37 | 40.93 |
F1 | Std. dev. | 0.34 | 0.32 | 0.29 | 0.39 | 0.34 | 0.35 | 0.33 |
Name | Value | Description |
---|---|---|
Classifier | SVC | Python classifier from sklearn.svm |
kernel | rbf | Kernel type to be used |
gamma | scale | Kernel coefficient |
C | 100 | Regularization parameter |
Metric | Statistic | EB | EF | Vaginal | Oral | Fecal | FRT | FRT2 |
---|---|---|---|---|---|---|---|---|
Cross-validation scores | # | 623 | 349 | 574 | 124 | 261 | 575 | 613 |
Accuracy | Mean (%) | 67.58 | 25.02 | 54.59 | 76.34 | 66.03 | 62.72 | 63.40 |
Accuracy | Std. dev. | 0.27 | 0.14 | 0.28 | 0.32 | 0.26 | 0.27 | 0.26 |
Precision | Mean (%) | 60.54 | 1.72 | 44.37 | 68.82 | 47.19 | 54.84 | 55.85 |
Precision | Std. dev. | 0.34 | 0.07 | 0.37 | 0.40 | 0.39 | 0.36 | 0.32 |
Recall | Mean (%) | 84.27 | 5.16 | 64.55 | 80.65 | 68.20 | 76.17 | 85.24 |
Recall | Std. dev. | 0.34 | 0.22 | 0.45 | 0.40 | 0.47 | 0.40 | 0.33 |
F1 | # | 547 | 18 | 397 | 100 | 178 | 466 | 540 |
F1 | Mean (%) | 67.05 | 2.58 | 49.56 | 72.45 | 53.96 | 60.23 | 64.13 |
F1 | Std. dev. | 0.31 | 0.11 | 0.36 | 0.39 | 0.40 | 0.33 | 0.29 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Caballero, P.; Gonzalez-Abril, L.; Ortega, J.A.; Simon-Soro, Á. Data Mining Techniques for Endometriosis Detection in a Data-Scarce Medical Dataset. Algorithms 2024, 17, 108. https://doi.org/10.3390/a17030108