Predictor Selection for Bacterial Vaginosis Diagnosis Using Decision Tree and Relief Algorithms
Abstract
1. Introduction
- (1) The determination of the best fifteen predictors for bacterial vaginosis diagnosis using feature selection algorithms.
- (2) A comparison of the results obtained in this research with those obtained by Beck and Foster [7].
- (3) The identification of a highly promising combination for bacterial vaginosis diagnosis: SVM as the classification algorithm and decision trees as the feature selector.
2. Materials and Methods
2.1. Dataset
2.2. Feature Selection Algorithms
2.2.1. Decision Trees
2.2.2. Relief
Algorithm 1. Relief pseudocode

Input: a vector of attribute values and the class value for each training instance

Output: the vector W of estimations of the qualities of attributes

1. set all weights W[A] := 0.0
2. for i := 1 to m do begin
3. randomly select an instance Ri;
4. find nearest hit H and nearest miss M;
5. for A := 1 to a do
6. W[A] := W[A] − diff(A, Ri, H)/m + diff(A, Ri, M)/m;
7. end;
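
The steps of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name `relief`, the range scaling inside diff, the Manhattan distance used to find the nearest hit and miss, and the requirement of at least two instances per class are all assumptions made for the sketch.

```python
import numpy as np

def relief(X, y, m, seed=None):
    """Basic Relief weight estimation following Algorithm 1 (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, a = X.shape

    # Assumption: numeric attributes, scaled by range so diff() lies in [0, 1].
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0

    W = np.zeros(a)                          # step 1: W[A] := 0.0
    for _ in range(m):                       # step 2: repeat m times
        i = rng.integers(n)                  # step 3: random instance R_i
        diffs = np.abs(X - X[i]) / span      # diff(A, R_i, .) for every instance
        dist = diffs.sum(axis=1)             # assumed Manhattan distance
        dist[i] = np.inf                     # R_i cannot be its own nearest hit
        same = (y == y[i])
        hit = np.argmin(np.where(same, dist, np.inf))    # step 4: nearest hit H
        miss = np.argmin(np.where(~same, dist, np.inf))  # step 4: nearest miss M
        W += (diffs[miss] - diffs[hit]) / m  # steps 5-6, vectorised over A
    return W

# Toy data: attribute 0 separates the classes, attribute 1 does not.
X_toy = [[0, 0.1], [0, 0.9], [0, 0.5], [1, 0.2], [1, 0.8], [1, 0.4]]
y_toy = [0, 0, 0, 1, 1, 1]
W_toy = relief(X_toy, y_toy, m=20, seed=0)
```

On the toy data the class-separating attribute receives the larger weight, which is the behaviour the ranking step relies on.
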
2.3. Feature Ranking and Cutoff
2.4. Classification Algorithms
2.4.1. Support Vector Machine
2.4.2. Logistic Regression
2.5. K-Folds Cross-Validation
2.6. Performance Measures
2.6.1. Balanced Accuracy
2.6.2. Sensitivity
2.6.3. Specificity
2.7. Experimental Studies
3. Results
4. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Lannon, S.M.R.; Waldorf, K.A.; Fiedler, T.; Kapur, R.P.; Agnew, K.; Rajagopal, L.; Gravett, M.G.; Fredricks, D. Parallel detection of lactobacillus and bacterial vaginosis-associated bacterial DNA in the chorioamnion and vagina of pregnant women at term. JMFNM 2019, 32, 2702–2710.
- Jones, A. Bacterial Vaginosis: A review of treatment, recurrence, and disparities. JNP 2019, 15, 420–423.
- Hilbert, D.W.; Smith, W.L.; Chadwick, S.G.; Toner, G.; Mordechai, E.; Adelson, M.E.; Gygax, S.E. Development and validation of a highly accurate quantitative real-time PCR assay for diagnosis of bacterial vaginosis. JCMB 2016, 54, 1017–1024.
- Liang, H.; Tsui, B.Y.; Ni, H.; Valentim, C.C.S.; Baxter, S.; Liu, G.; Cai, W.; Kermany, D.S.; Sun, X.; Chen, J.; et al. Evaluation and accurate diagnoses of pediatric diseases using artificial intelligence. Nat. Med. 2019, 25, 433.
- Bramer, M. Principles of Data Mining; Springer: London, UK, 2007; Volume 180.
- Guyon, I.; Elisseeff, A. An introduction to variable and feature selection. JMLR 2003, 3, 1157–1182.
- Beck, D.; Foster, J.A. Machine learning classifiers provide insight into the relationship between microbial communities and bacterial vaginosis. BioData Min. 2015, 8, 23.
- Baker, Y.S.; Beck, D.; Agrawal, R.; Dozier, G.; Foster, J.A. Detecting Bacterial Vaginosis using machine learning. In Proceedings of the 2014 ACM Southeast Regional Conference, Kennesaw, GA, USA, 28–29 March 2014.
- Hall, M.A. Correlation-Based Feature Selection for Machine Learning. Ph.D. Thesis, The University of Waikato, Hamilton, New Zealand, 1999.
- Ravel, J.; Gajer, P.; Abdo, Z.; Schneider, G.M.; Koenig, S.S.K.; McCulle, S.L.; Karlebach, S.; Gorle, R.; Russell, J.; Tacket, C.O.; et al. Vaginal microbiome of reproductive-age women. Proc. Natl. Acad. Sci. USA 2010, 108 (Suppl. 1), 4680–4687.
- Lee, M.Y.; Yang, C.S. Entropy-based feature extraction and decision tree induction for breast cancer diagnosis with standardized thermograph images. CMPBM 2010, 100, 269–282.
- Kuhn, M. Building predictive models in R using the caret package. JSS 2008, 28, 1–26.
- Robnik-Šikonja, M.; Kononenko, I. Theoretical and empirical analysis of ReliefF and RReliefF. Mach. Learn. 2003, 53, 23–69.
- Romanski, P.; Kotthoff, L.; Kotthoff, M.L. FSelector: Selecting Attributes. R Package Version 0.31. 2018. Available online: https://CRAN.R-project.org/package=FSelector (accessed on 13 January 2020).
- Urbanowicz, R.J.; Meeker, M.; La Cava, W.; Olson, R.S.; Moore, J.H. Relief-based feature selection: Introduction and review. JBMI 2018, 85, 189–203.
- Han, J.; Pei, J.; Kamber, M. Data Mining: Concepts and Techniques, 3rd ed.; University of Illinois at Urbana-Champaign: Champaign, IL, USA; Simon Fraser University: Burnaby, BC, Canada; Elsevier: Amsterdam, The Netherlands, 2011; ISBN 978-0-12-381479-1.
- Aggarwal, C.C. Data Classification: Algorithms and Applications; eBook; Chapman & Hall/CRC: Boca Raton, FL, USA, 2014; ISBN 978-1-4665-8675-8.
- Wang, H.; Zheng, B.; Yoon, S.W.; Ko, H.S. A support vector machine-based ensemble algorithm for breast cancer diagnosis. EJOR 2018, 267, 687–699.
- Chang, C.C.; Lin, C.J. LIBSVM: A library for support vector machines. ACM Trans. Intell. Syst. Technol. (TIST) 2011, 2, 27.
- Simon, N.; Friedman, J.; Hastie, T.; Tibshirani, R. Regularization paths for Cox’s proportional hazards model via coordinate descent. JSS 2011, 39, 1–13.
- Torgo, L. Data Mining with R: Learning with Case Studies; eBook; Chapman and Hall/CRC: New York, NY, USA, 2010; ISBN 9780429292859.
- Witten, I.H.; Frank, E. Data mining: Practical machine learning tools and techniques with Java implementations. ACM SIGMOD Record 2002, 31, 76–77.


| Features | Represents | 
|---|---|
| BV | 1 = positive or 2 = negative for bacterial vaginosis. | 
| EthnicGroup | Ethnic group to which the test subject belongs. It can take on the values of 1 = Asian, 2 = African American, 3 = Hispanic, and 4 = White. | 
| pH | Degree of alkalinity or acidity of a sample. | 
| NugentScore | Scoring system for vaginal swabs to diagnose bacterial vaginosis (BV): 7 to 10 is consistent with BV+. | 
| NugentScore_Cat | Nugent score grouping according to their values. | 
| CommunityGroup | Microbial community to which the test subject belongs. | 
| Megasphaera, Eggerthella, Lactobacillus crispatus, and others (247 features) | Count of microorganisms in the vaginal analysis obtained using the qPCR technique on the 16S rRNA gene. | 

| Features | Run1 | Run2 | Run3 | … | Run300 | MIV |
|---|---|---|---|---|---|---|
| Var1 | Importance | Importance | Importance | … | Importance | MIV_Var1 | 
| Var2 | Importance | Importance | Importance | … | Importance | MIV_Var2 | 
| Var3 | Importance | Importance | Importance | … | Importance | MIV_Var3 | 
| … | … | … | … | … | … | … | 
| Var252 | Importance | Importance | Importance | … | Importance | MIV_Var252 | 
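
The layout of the table above (one importance value per feature per run, summarised in an MIV column) can be illustrated with a short Python sketch. It assumes that MIV is the mean of a feature's importance values across the runs and that the fifteen highest MIVs form the ranking; the function name `mean_importance` and the toy numbers are hypothetical.

```python
import numpy as np

def mean_importance(importances, names, top_k=15):
    """Average a (features x runs) importance matrix and rank features by the mean.

    Assumption: MIV is the arithmetic mean over runs; ties keep the input order.
    """
    scores = np.asarray(importances, dtype=float)
    miv = scores.mean(axis=1)                       # one MIV per feature
    order = np.argsort(-miv, kind="stable")[:top_k]  # highest MIV first
    return [(names[i], float(miv[i])) for i in order]

# Toy example: 3 features, 2 runs (the study layout has 252 features and 300 runs).
toy_scores = [[1.0, 3.0], [5.0, 5.0], [0.0, 2.0]]
toy_names = ["Var1", "Var2", "Var3"]
toy_ranking = mean_importance(toy_scores, toy_names, top_k=2)
```

Averaging over many runs smooths out the run-to-run variation that random sampling and resampling introduce into any single importance estimate.
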

| Relief: Features | Relief: MIV | Beck and Foster: Features | Decision Trees: Features | Decision Trees: MIV |
|---|---|---|---|---|
| Nugent_score_cat b b | 0.8113 | Prevotella a | Nugent_score b | 100 |
| Nugent_score b | 0.5565 | Dialister a | Nugent_score_cat b b | 100 |
| Prevotella a | 0.2277 | Gardnerella b | Prevotella a | 86.12 |
| Megasphaera a | 0.1899 | pH a | Dialister a | 86.09 |
| CommunityGroup c b | 0.145 | Megasphaera a | Gardnerella b | 78.53 |
| pH a | 0.1414 | Atopobium a | Megasphaera a | 75.4 |
| Sneathia a | 0.1045 | Eggerthella a | pH a | 74.62 |
| Dialister a | 0.1 | Sneathia a | Atopobium a | 74.47 |
| Eggerthella a | 0.0977 | Peptoniphilus a | Eggerthella a | 73.41 |
| Ruminococcaceae3 a | 0.0925 | Parvimonas b | Sneathia a | 72.3 |
| Lachnospiraceae_8 | 0.0673 | Ruminococcaceae3 a | CommunityGroup c b | 69.79 |
| Atopobium a | 0.0565 | Lactobacillus crispatus | Parvimonas | 67.76 |
| Peptoniphilus a | 0.0501 | Aerococcus | Ruminococcaceae3 a | 66.92 |
| Bulleidia | 0.044 | Ruminococcaceae sedis | Peptoniphilus a | 63.85 |
| Coriobacteriaceae_2 | 0.0401 | Lactobacillus iners | Prevotellaceae_2 | 60.68 |

| Classifier | 252 Features: Acc | 252 Features: Sens | 252 Features: Spec | 15 Best (Relief): Acc | 15 Best (Relief): Sens | 15 Best (Relief): Spec | 15 Best (DTs): Acc | 15 Best (DTs): Sens | 15 Best (DTs): Spec | 15 Best (Beck and Foster): Acc | 15 Best (Beck and Foster): Sens | 15 Best (Beck and Foster): Spec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SVM | 0.958 | 0.988 | 0.928 | 1 | 1 | 1 | 1 | 1 | 1 | 0.881 | 0.957 | 0.805 |
| LR | 0.713 | 0.803 | 0.623 | 0.999 | 0.999 | 1 | 1 | 1 | 1 | 0.878 | 0.955 | 0.801 |

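
The relationship between the three measures of Section 2.6 can be checked against the results table with a small Python sketch. It assumes that the Acc column reports balanced accuracy, computed as the mean of sensitivity and specificity; the names `balanced_accuracy`, `rates_from_counts`, and `RESULTS` are illustrative, and the values are copied from the table as printed (three decimals).

```python
def rates_from_counts(tp, fn, tn, fp):
    """Sensitivity and specificity from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    return sensitivity, specificity

def balanced_accuracy(sensitivity, specificity):
    """Mean of sensitivity and specificity (assumed definition of Acc)."""
    return (sensitivity + specificity) / 2

# (Acc, Sens, Spec) as printed in the results table.
RESULTS = {
    ("SVM", "252 features"): (0.958, 0.988, 0.928),
    ("LR", "252 features"): (0.713, 0.803, 0.623),
    ("SVM", "Relief"): (1.0, 1.0, 1.0),
    ("LR", "Relief"): (0.999, 0.999, 1.0),
    ("SVM", "DTs"): (1.0, 1.0, 1.0),
    ("LR", "DTs"): (1.0, 1.0, 1.0),
    ("SVM", "Beck and Foster"): (0.881, 0.957, 0.805),
    ("LR", "Beck and Foster"): (0.878, 0.955, 0.801),
}

# Largest gap between the printed Acc and the mean of Sens and Spec.
max_gap = max(
    abs(acc - balanced_accuracy(sens, spec))
    for acc, sens, spec in RESULTS.values()
)
```

Under that assumption every printed Acc value agrees with the mean of its Sens and Spec to within the rounding of three decimals.
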
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Pérez-Gómez, J.F.; Canul-Reich, J.; Hernández-Torruco, J.; Hernández-Ocaña, B. Predictor Selection for Bacterial Vaginosis Diagnosis Using Decision Tree and Relief Algorithms. Appl. Sci. 2020, 10, 3291. https://doi.org/10.3390/app10093291