A Survey on Machine Learning Approaches for Personalized Coaching with Human Digital Twins
Abstract
1. Introduction
2. Motivation
2.1. The Virtual Fitness Coach, an Example of a Personalized Application
2.2. Preliminary Results
2.3. Research Goals
3. Context, Research Questions, and Search Strategy
3.1. Context
3.2. Research Questions
- What is the general approach used when selecting and evaluating ML algorithms?
- Which ML algorithms are relevant in personalized HDTs?
- How is the performance of ML implementations evaluated in HDTs?
- What do we know about the influence of the characteristics of human (behavior) datasets on the performance of different ML algorithms?
3.3. Search Strategy
1. Defining search terms and inclusion criteria by evaluating the research questions.
2. Collecting articles by selecting the right sources and performing search queries.
3. Screening and selecting relevant articles by applying the inclusion criteria.
4. Extending the search by including relevant references, then repeating step 3.
5. Analyzing the findings to answer the research questions and presenting the results (Section 5).
6. Discussing and interpreting the results (Section 6).
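The iterative part of this strategy (screening, extending through references, and screening again) can be sketched as a loop that runs until no new article is included. The following is a minimal illustration only, not the tooling used for this survey; `Article`, `meets_criteria`, and the toy corpus are hypothetical names and data introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    """Hypothetical record for one candidate article."""
    key: str
    keywords: frozenset
    references: tuple = ()


def meets_criteria(article, required_terms):
    """Toy inclusion criterion: the article shares at least one required term."""
    return bool(article.keywords & required_terms)


def screen_and_extend(initial_hits, corpus, required_terms):
    """Screen candidates, follow the references of included articles, and
    repeat until the queue of unscreened candidates is empty."""
    included = {}
    seen = set()
    queue = list(initial_hits)
    while queue:
        key = queue.pop()
        if key in seen or key not in corpus:
            continue
        seen.add(key)
        article = corpus[key]
        if meets_criteria(article, required_terms):
            included[key] = article
            # Extension step: referenced articles become new candidates.
            queue.extend(article.references)
    return sorted(included)


# Illustrative corpus; keys, keywords, and reference links are made up.
corpus = {
    "A": Article("A", frozenset({"human digital twin", "machine learning"}), ("B", "C")),
    "B": Article("B", frozenset({"digital twin", "manufacturing"}), ("D",)),
    "C": Article("C", frozenset({"machine learning", "coaching"}), ("D",)),
    "D": Article("D", frozenset({"human digital twin"}), ()),
}
terms = frozenset({"human digital twin", "machine learning"})
selected = screen_and_extend(["A"], corpus, terms)
print(selected)
```

In this sketch, references are only followed for articles that pass screening, so an excluded article does not pull its own references into the candidate set.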
3.3.1. Definition
3.3.2. Collection, Screening, and Extension
4. Related Works
4.1. Surveys on Machine Learning Applications
4.2. Surveys on Digital Twins
5. Principal Results
5.1. Findings on Machine Learning
A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
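To make the roles of E, T, and P in this definition concrete, the following minimal sketch trains a nearest-centroid classifier on growing amounts of synthetic data. Here, T is a two-class classification task, E is the number of training examples per class, and P is the accuracy on a held-out test set. The classifier, the data distribution, and all names are assumptions chosen for brevity; they are not taken from the surveyed studies.

```python
import random


def make_data(n_per_class, rng):
    """Synthetic 1-D task T: class 0 ~ N(0, 1), class 1 ~ N(4, 1)."""
    data = [(rng.gauss(0.0, 1.0), 0) for _ in range(n_per_class)]
    data += [(rng.gauss(4.0, 1.0), 1) for _ in range(n_per_class)]
    return data


def fit_centroids(train):
    """Learn one centroid (mean) per class from the experience E."""
    centroids = {}
    for label in (0, 1):
        values = [x for x, y in train if y == label]
        centroids[label] = sum(values) / len(values)
    return centroids


def accuracy(centroids, test):
    """Performance measure P: fraction of correctly classified test points."""
    correct = 0
    for x, y in test:
        predicted = min(centroids, key=lambda label: abs(x - centroids[label]))
        correct += predicted == y
    return correct / len(test)


rng = random.Random(0)
test_set = make_data(500, rng)
train_pool = make_data(200, rng)  # first 200 items are class 0, last 200 are class 1

learning_curve = []
for n in (1, 5, 20, 200):
    # Experience E grows from 1 to 200 examples per class.
    train = train_pool[:n] + train_pool[200:200 + n]
    learning_curve.append((n, accuracy(fit_centroids(train), test_set)))

for n, acc in learning_curve:
    print(f"E = {n:3d} examples per class -> P = {acc:.3f}")
```

With very little experience, the centroids depend on a handful of samples and P can vary widely; with more experience, the estimate stabilizes near the best accuracy this simple model can reach on the task.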
5.1.1. Algorithm Selection and Classification
5.1.2. Evaluation and Improvement of Performance
5.2. Findings on Machine Learning Algorithm Selection in Human Digital Twins
5.3. Findings on Human Digital Twin Performance Evaluation
5.4. Findings on Human Digital Twin Dataset Considerations
6. Conclusions, Perspectives, and Discussion
Further Research
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
DT | Digital Twin |
HDT | Human Digital Twin |
AI | Artificial Intelligence |
ML | Machine Learning |
IoT | Internet of Things |
VFC | Virtual Fitness Coach |
SL | Supervised Learning |
USL | Unsupervised Learning |
SSL | Semi-Supervised Learning |
RL | Reinforcement Learning |
AUC | Area Under the Curve |
CCI | Correctly Classified Instances |
ICI | Incorrectly Classified Instances |
RMSE | Root Mean Square Error |
MSE | Mean Squared Error |
MAE | Mean Absolute Error |
ROC | Receiver Operating Characteristics |
R2 | Coefficient of Determination |
ADA | AdaBoost Classifier |
ANN | Artificial Neural Network |
CNN | Convolutional Neural Network |
DNN | Deep Neural Network |
DTC | Decision Tree Classifier |
DTR | Decision Tree Regression |
ESN | Echo State Network |
FCN | Fully Convolutional Neural Networks |
GBA | Gradient Boost Algorithm |
GMM | Gaussian Mixture Model |
GRU | Gated Recurrent Units |
KNN | k-Neighbors Classifier |
LR | Logistic Regression Classifier |
LSPI | Least-Squares Policy Iteration |
LSTM | Long Short-Term Memory networks |
MLP | Multi Layer Perceptron |
NB | Naive Bayes |
NLR | Non-Linear Regression |
NN | Neural Network Classifier |
RBM | Restricted Boltzmann Machine |
RF | Random Forest Classifier |
RFR | Random Forest Regression |
RIDOR | Ripple Down Rule Learner |
RIPPER | Repeated Incremental Pruning to Produce Error Reduction |
RNN | Recurrent Neural Network |
SARIMAX | Seasonal Auto-Regressive Integrated Moving Average with eXogenous factors |
SGD | Stochastic Gradient Descent Classifier |
SMOTE | Synthetic Minority Over-sampling Technique |
SVC | Support Vector Classifier |
WNN | Wavelet Neural Networks |
Appendix A. List of Features
Ref. | Publication Date | Dataset Type | Dataset Size | Algorithms | Evaluation Method/Metric | Results |
---|---|---|---|---|---|---|
[21] | 2018-02 | Step data collected over 12 weeks, resulting in 349,920 measurements. | 48 participants. | ADA, DTC, KNN, LR, NN, RF, SGD, SVC. | Accuracy and F1-score of classification using generalized or personal models. | On average, the personalized models outperform the general models, with the RF algorithm achieving the highest average accuracy of 93%. However, the optimal model varies considerably at the individual level. |
[71] | 2018-07 | N/A |  | N/A |  | This survey focuses on the different wearable devices available for self-health tracking. |
[77] | 2018-07 | Per patient, 275 feature maps based on MRI scans were created, of which 52 were selected. | 257 patients. | CNN, RF. | Confusion matrix. | In this study, an RF-based segmentation method is compared to more commonly used CNN models and shown to provide good results. |
[72] | 2018-10 | Simulated activity data with 3 different user profiles. | Simulated data for 100 users. | Q-learning, LSPI, combined with K-medoids clustering. | Cumulative reward. | This study compares batch learning to online learning and additionally shows that a clustering approach can achieve good results compared to individual and non-personalized learning approaches. |
[48] | 2019-03 | 25 features describing blood values. | 400 instances. | NB, KNN, RF. | Accuracy, precision, recall, F-measures, and execution time. | The RF classifier performs best on the given dataset, but there was only a single point of evaluation and no investigation of development over time. |
[53] | 2019-07 | Ten phone log datasets, with 55,105 phone call activities and metadata. | 10 individuals. | ZeroR, NB, DTC, RF, SVM, KNN, ADA, LR, RIPPER, RIDOR, ANN. | Precision, recall, F-measures, kappa, CCI, ICI, ROC, MAE and RMSE. | The results of this study show that tree-based models yield higher prediction results for context-aware smartphone usage models. In comparison, neural network-based models do not achieve the same prediction accuracy. One reason for this could be the limited number of samples in the individual phone usage data. |
[64] | 2019-08 | ECG data stream. | Data from 200 patients. | CNN. | Accuracy. | In this article, a proof of concept is examined. Although the authors discuss alternatives to the CNN model used, no evaluation of other models is performed. |
[42] | 2019-12 | Physical information such as gender, blood pressure, and cholesterol was used with the RF classifier. Facial image data was used with K-Means and 2 CNN classifiers. | 10 participants. | RF, CNN, K-Means. | Accuracy. | No evaluation or justification of the selected algorithms is given. |
[70] | 2020-01 | Various datasets concerning anomalous behavior detection for elderly care. | Various. | Wide range of classification techniques. | Accuracy, precision, recall and F-scores. | This survey presents a list of pros and cons for the investigated methods but no explicit comparison. |
[11] | 2020-02 | 22 features, containing athletes’ activity, mood, and energy intake data. | 11 participants, 10 measurements consisting of data over 3 days per participant, resulting in 110 data vectors. | KNN, SVM. | Classification loss. | SVM classifiers are more robust but achieve worse performance than KNN classifiers. |
[28] | 2020-04 | N/A | N/A | RL algorithms. |  | This survey provides an overview of work that employs RL for personalization. |
[69] | 2020-04 | MRI scan. | N/A | N/A |  | The article proposes an architecture for a digital twin of the behavior of lung cancer in patients but provides no technical details. |
[37] | 2020-10 | N/A | N/A | SVM, CNN, KNN, Trees, RNN, LR, GMM, NN, LSTM, ESN, WNN, non-linear regression, FCN, NB. |  | This survey gives an overview of the different models used in current studies and of the types of wearable devices used. |
[67] | 2020-10 | N/A |  | N/A |  | This study describes the architecture for a DT aimed at providing precision medicine for MS patients. |
[56] | 2021-12 | Electroencephalography, one channel vertical electro-oculogram and one channel chin electro-myogram. | Data from 48 stroke survivors. | RF, LR, SVM, C5.0. | Accuracy, sensitivity, specificity, precision, AUC. | The study validated the proof of concept of a Digital Twin using an EEG headset. The SVM classifier was found to achieve the highest accuracy, but no in-depth evaluation of the different algorithms was provided. |
[76] | 2021-12 | MIT-BIH Arrhythmia Database. | 48 half-hour excerpts of two-channel ambulatory ECG recordings. | CNN, LSTM, MLP, SVC, LR. | Most metrics applicable to classification. | The metrics of the different classifiers are compared with each other, showing that NN-based classifiers outperformed the others for some metrics, but the LSTM classifier had the best results for macro and weighted average for precision, recall, and F1-score. |
[49] | 2022-02 | Electrocardiogram data. | Ranging between 2 and 500. | Wide range of unsupervised methods. | Accuracy. | Several clustering techniques are compared, and different applications of the clustering results are discussed. The different algorithms are compared, but only general use cases are suggested. No specific selection is proposed. |
[73] | 2022-03 | Age and 7 risk factors consisting of a combination of physiological data and behavior data. | N/A | LR. | F1-Score, AUC and accuracy. | The article focuses on the implementation of combining mechanical models with ML models and does not provide detailed information on the ML models used. |
[74] | 2022-03 | 16 biomarkers collected from serum and urine. Extra information, including assessment of radiographs of knees and hands, MRIs, and CT scans of the knees, and outcomes of physical examinations and questionnaires. | 297 patients. | K-Means, followed by RF. | Cluster stability by expanding the number of clusters from 3 to 5. | The combination of clustering followed by RF classification makes it possible to determine which variables determine the cluster membership. |
[15] | 2022-05 | N/A |  | N/A |  | In this survey, the distinction between DT and HDT is identified, and several additional design requirements are introduced. |
[23] | 2022-07 | N/A |  | N/A |  | This study provides key design features and an architectural framework to implement HDTs. It also highlights some technical challenges. |
[75] | 2023-01 | 51 plasma EV-miRNAs. | 656 patients. | K-Means. | N/A | In this study, clustering is used to identify lung health. The focus lies on the usability of the given dataset to achieve this goal, but it does not expand on the ML techniques used. |
[68] | 2023-02 | Weight, activity and diet. | 10 participants, 100 days. | SARIMAX, LSTM, GRU, Transformer. | RMSE and computational time. | Based on computational time and the RMSE results, this study concludes that GRU or LSTM is best suited for use in a production environment. |
[66] | 2023-03 | Blood samples, demographic, anthropometric, and clinical data. | 116 participants, 52 healthy women and 64 with breast cancer. | LR, DTC, RFR, GBA. | MSE, RMSE, MAE, and R2. | Although the authors present the results for the individual models, the approach in this article is to combine the results of the four different models rather than to select one optimal algorithm. |
[65] | 2023-06 | Facial imaging and body movement data. | 17 participants, performing 6 emotional states. | Bagged trees, KNN, SVM (linear and cubic). | Accuracy, precision, recall, and F1-score. | The authors perform an in-depth analysis of the data structure and qualities of the models and conclude that, due to the nature of the data, the non-linear algorithms produced consistent findings. All other classification methods consistently performed worse than Bagged Trees and KNN. |
[78] | 2024-09 | Step data. | 43 | Manual clustering and RF. | Computational time, accuracy, and F1-score. | This study demonstrates that a clustering approach to personalization is a viable concept that can reduce computational time without significant loss of accuracy. |
[50] | 2024-09 | N/A |  | Several clustering, dimensionality reduction, and anomaly detection techniques. | Accuracy. | This review demonstrates the applicability of USL techniques to improve precision medicine applications. |
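Several of the studies listed above compare generalized and personalized models using accuracy and F1-score (for example, [21]). As a minimal sketch of how such a comparison can be set up, the following code fits one shared model on pooled data and one model per participant, then evaluates both on the same test samples. The threshold "model", the participant names, and the step counts are hypothetical and far simpler than the algorithms used in the surveyed studies.

```python
def accuracy_and_f1(y_true, y_pred):
    """Return (accuracy, F1-score) for binary labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


def fit_threshold(samples):
    """Toy 'model': the step-count threshold with the highest training accuracy."""
    candidates = sorted({steps for steps, _ in samples})
    return max(candidates, key=lambda th: sum((s >= th) == bool(y) for s, y in samples))


def predict(threshold, samples):
    """Predict 1 (daily goal reached) when the step count is at or above the threshold."""
    return [int(steps >= threshold) for steps, _ in samples]


# Made-up data: (steps so far, goal reached at end of day) per participant.
train = {
    "p1": [(2000, 0), (4000, 0), (5000, 0), (6000, 1), (8000, 1), (9000, 1)],
    "p2": [(1000, 0), (2000, 0), (2500, 0), (3000, 1), (4000, 1), (5000, 1)],
}
test = {
    "p1": [(3000, 0), (4500, 0), (5500, 0), (6500, 1), (7000, 1)],
    "p2": [(1500, 0), (2200, 0), (3500, 1), (4500, 1), (5200, 1)],
}

# General model: one threshold fitted on the pooled training data.
pooled_train = [sample for samples in train.values() for sample in samples]
general_threshold = fit_threshold(pooled_train)

y_true, y_general, y_personal = [], [], []
for participant, samples in test.items():
    # Personalized model: one threshold fitted on this participant's data only.
    personal_threshold = fit_threshold(train[participant])
    y_true += [label for _, label in samples]
    y_general += predict(general_threshold, samples)
    y_personal += predict(personal_threshold, samples)

general_acc, general_f1 = accuracy_and_f1(y_true, y_general)
personal_acc, personal_f1 = accuracy_and_f1(y_true, y_personal)
print(f"general:  accuracy={general_acc:.2f}, F1={general_f1:.2f}")
print(f"personal: accuracy={personal_acc:.2f}, F1={personal_f1:.2f}")
```

Because the two hypothetical participants reach their goals at different step counts, a single pooled threshold misclassifies one of them, while the per-participant thresholds do not. Real behavior data are noisier, so the gap between the two approaches would need to be measured rather than assumed.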
References
1. Rosen, R.; von Wichert, G.; Lo, G.; Bettenhausen, K.D. About The Importance of Autonomy and Digital Twins for the Future of Manufacturing. IFAC-PapersOnLine 2015, 48, 567–572.
2. Fuller, A.; Fan, Z.; Day, C.; Barlow, C. Digital Twin: Enabling Technologies, Challenges and Open Research. IEEE Access 2020, 8, 108952–108971.
3. Barricelli, B.R.; Casiraghi, E.; Fogli, D. A survey on digital twin: Definitions, characteristics, applications, and design implications. IEEE Access 2019, 7, 167653–167671.
4. Wang, B.; Zhou, H.; Yang, G.; Li, X.; Yang, H. Human Digital Twin (HDT) Driven Human-Cyber-Physical Systems: Key Technologies and Applications. Chin. J. Mech. Eng. (Engl. Ed.) 2022, 35, 11.
5. Mihai, S.; Yaqoob, M.; Hung, D.V.; Davis, W.; Towakel, P.; Raza, M.; Karamanoglu, M.; Barn, B.; Shetve, D.; Prasad, R.V.; et al. Digital Twins: A Survey on Enabling Technologies, Challenges, Trends and Future Prospects. IEEE Commun. Surv. Tutor. 2022, 24, 2255–2291.
6. Guo, J.; Lv, Z. Application of Digital Twins in multiple fields. Multimed. Tools Appl. 2022, 81, 26941–26967.
7. Adeniyi, A.O.; Arowoogun, J.O.; Okolo, C.A.; Chidi, R.; Babawarun, O. Ethical considerations in healthcare IT: A review of data privacy and patient consent issues. World J. Adv. Res. Rev. 2024, 21, 1660–1668.
8. Rajput, D.; Wang, W.J.; Chen, C.C. Evaluation of a decided sample size in machine learning applications. BMC Bioinform. 2023, 24, 48.
9. Ding, X.; Gan, Q.; Bahrami, S. A systematic survey of data mining and big data in human behavior analysis: Current datasets and models. Trans. Emerg. Telecommun. Technol. 2022, 33, e4574.
10. Tucker, A.; Wang, Z.; Rotalinti, Y.; Myles, P. Generating high-fidelity synthetic patient data for assessing machine learning healthcare software. npj Digit. Med. 2020, 3, 147.
11. Barricelli, B.R.; Casiraghi, E.; Gliozzo, J.; Petrini, A.; Valtolina, S. Human Digital Twin for Fitness Management. IEEE Access 2020, 8, 26637–26664.
12. Lin, Y.; Chen, L.; Ali, A.; Nugent, C.; Ian, C.; Li, R.; Gao, D.; Wang, H.; Wang, Y.; Ning, H. Human Digital Twin: A Survey. arXiv 2022, arXiv:2212.05937.
13. Shengli, W. Is Human Digital Twin possible? Comput. Methods Programs Biomed. Update 2021, 1, 100014.
14. Alazab, M.; Khan, L.U.; Koppu, S.; Ramu, S.P.; M, I.; Boobalan, P.; Baker, T.; Maddikunta, P.K.R.; Gadekallu, T.R.; Aljuhani, A. Digital Twins for Healthcare 4.0—Recent Advances, Architecture, and Open Challenges. IEEE Consum. Electron. Mag. 2022, 12, 29–37.
15. Lauer-Schmaltz, M.W.; Cash, P.; Hansen, J.P.; Maier, A. Designing Human Digital Twins for Behaviour-Changing Therapy and Rehabilitation: A Systematic Review. Proc. Des. Soc. 2022, 2, 1303–1312.
16. Ahmadi-Assalemi, G.; Al-Khateeb, H.; Maple, C.; Epiphaniou, G.; Alhaboby, Z.A.; Alkaabi, S.; Alhaboby, D. Digital Twins for Precision Healthcare. In Cyber Defence in the Age of AI, Smart Societies and Augmented Humanity; Jahankhani, H., Kendzierskyj, S., Chelvachandran, N., Ibarra, J., Eds.; Springer International Publishing: Cham, Switzerland, 2020; pp. 133–158.
17. Blok, J.; Dol, A.; Dijkhuis, T. Toward a Generic Personalized Virtual Coach for Self-management: A Proposal for an Architecture. In Proceedings of the 9th International Conference on eHealth, Telemedicine, and Social Medicine 2017, Hanze University of Applied Sciences, Nice, France, 19–23 March 2017.
18. Dijkhuis, T.; Blok, J.; Velthuijsen, H. Virtual Coach: Predict Physical Activity Using A Machine Learning Approach. In Proceedings of the eTELEMED 2018: The Tenth International Conference on eHealth, Telemedicine, and Social Medicine, Hanze University of Applied Sciences, Rome, Italy, 25–29 March 2018.
19. Schoeppe, S.; Alley, S.; Van Lippevelde, W.; Bray, N.A.; Williams, S.L.; Duncan, M.J.; Vandelanotte, C. Efficacy of interventions that use apps to improve diet, physical activity and sedentary behaviour: A systematic review. Int. J. Behav. Nutr. Phys. Act. 2016, 13, 1–26.
20. Hardeman, W.; Houghton, J.; Lane, K.; Jones, A.; Naughton, F. A systematic review of just-in-time adaptive interventions (JITAIs) to promote physical activity. Int. J. Behav. Nutr. Phys. Act. 2019, 16, 31.
21. Dijkhuis, T.B.; Blaauw, F.J.; van Ittersum, M.W.; Velthuijsen, H.; Aiello, M. Personalized physical activity coaching: A machine learning approach. Sensors 2018, 18, 623.
22. Chen, J.; Yi, C.; Okegbile, S.D.; Cai, J.; Shen, X. Networking Architecture and Key Supporting Technologies for Human Digital Twin in Personalized Healthcare: A Comprehensive Survey. IEEE Commun. Surv. Tutor. 2023, 26, 706–746.
23. Okegbile, S.D.; Cai, J.; Yi, C.; Niyato, D. Human Digital Twin for Personalized Healthcare: Vision, Architecture and Future Directions. IEEE Netw. 2022, 37, 262–269.
24. Liu, Y.; Zhang, L.; Yang, Y.; Zhou, L.; Ren, L.; Wang, F.; Liu, R.; Pang, Z.; Deen, M.J. A Novel Cloud-Based Framework for the Elderly Healthcare Services Using Digital Twin. IEEE Access 2019, 7, 49088–49101.
25. Alzubi, J.; Nayyar, A.; Kumar, A. Machine Learning from Theory to Algorithms: An Overview. J. Phys. Conf. Ser. 2018, 1142, 012012.
26. Sarker, I.H. Machine Learning: Algorithms, Real-World Applications and Research Directions. SN Comput. Sci. 2021, 2, 160.
27. Mahesh, B. Machine learning algorithms—A review. Int. J. Sci. Res. (IJSR) 2020, 9, 381–386.
28. den Hengst, F.; Grua, E.M.; el Hassouni, A.; Hoogendoorn, M. Reinforcement learning for personalization: A systematic literature review. Data Sci. 2020, 3, 107–147.
29. Yoo, I.; Alafaireet, P.; Marinov, M.; Pena-Hernandez, K.; Gopidi, R.; Chang, J.F.; Hua, L. Data Mining in Healthcare and Biomedicine: A Survey of the Literature. J. Med. Syst. 2012, 36, 2431–2448.
30. Semeraro, C.; Lezoche, M.; Panetto, H.; Dassisti, M. Digital twin paradigm: A systematic literature review. Comput. Ind. 2021, 130, 103469.
31. Miller, M.E.; Spatz, E. A unified view of a human digital twin. Hum.-Intell. Syst. Integr. 2022, 4, 23–33.
32. El Saddik, A. Digital Twins: The Convergence of Multimedia Technologies. IEEE Multimed. 2018, 25, 87–92.
33. Kamali, M.E.; Angelini, L.; Caon, M.; Carrino, F.; Röcke, C.; Guye, S.; Rizzo, G.; Mastropietro, A.; Sykora, M.; Elayan, S.; et al. Virtual Coaches for Older Adults’ Wellbeing: A Systematic Review. IEEE Access 2020, 8, 101884–101902.
34. Bruynseels, K.; de Sio, F.S.; van den Hoven, J. Digital Twins in health care: Ethical implications of an emerging engineering paradigm. Front. Genet. 2018, 9, 31.
35. Chatterjee, A.; Prinz, A.; Gerdes, M.; Martinez, S. Digital Interventions on Healthy Lifestyle Management: Systematic Review. J. Med. Internet Res. 2021, 23, e26931.
36. De Maeyer, C.; Markopoulos, P. Are Digital Twins Becoming Our Personal (Predictive) Advisors?: ‘Our Digital Mirror of Who We Were, Who We Are and Who We Will Become’. In Proceedings of the Lecture Notes in Computer Science (Including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Copenhagen, Denmark, 19–24 July 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 250–268.
37. Gámez Díaz, R.; Yu, Q.; Ding, Y.; Laamarti, F.; El Saddik, A. Digital Twin Coaching for Physical Activities: A Survey. Sensors 2020, 20, 5936.
38. Minerva, R.; Lee, G.M.; Crespi, N. Digital Twin in the IoT Context: A Survey on Technical Features, Scenarios, and Architectural Models. Proc. IEEE 2020, 108, 1785–1824.
39. Samuel, A.L. Some Studies in Machine Learning Using the Game of Checkers. IBM J. Res. Dev. 1959, 3, 210–229.
40. Mitchell, T.M. Machine Learning; McGraw-Hill Science: New York, NY, USA, 1997; pp. 1–432.
41. Huijzer, R.; Blaauw, F.; den Hartigh, R.J.R. SIRUS.jl: Interpretable Machine Learning via Rule Extraction. J. Open Source Softw. 2023, 8, 5786.
42. Abeydeera, S.S.; Bandaranayake, M.; Karunarathna, H.U.; Pallewatta, S.; Dharmasiri, P.; Gunathilake, B.; Saparamadu, S.; Senanayake, B.; Jayawardena, C. Smart Mirror with Virtual Twin. In Proceedings of the 2019 International Conference on Advancements in Computing, ICAC 2019, Malabe, Sri Lanka, 5–7 December 2019; pp. 238–243.
43. Bouchlaghem, Y.; Akhiat, Y.; Amjad, S. Feature Selection: A Review and Comparative Study. In Proceedings of the E3S Web of Conferences, Istanbul, Turkey, 12–14 May 2022; EDP Sciences: Paris, France, 2022; Volume 351.
44. Lötsch, J.; Ultsch, A. Enhancing Explainable Machine Learning by Reconsidering Initially Unselected Items in Feature Selection for Classification. BioMedInformatics 2022, 2, 701–714.
45. Han, J.; Kamber, M.; Pei, J. Cluster Analysis: Basic Concepts and Methods. Data Min. 2012, 443–495.
46. Min, Q.; Lu, Y.; Liu, Z.; Su, C.; Wang, B. Machine Learning based Digital Twin Framework for Production Optimization in Petrochemical Industry. Int. J. Inf. Manag. 2019, 49, 502–519.
47. Sheykhmousa, M.; Mahdianpari, M.; Ghanbari, H.; Mohammadimanesh, F.; Ghamisi, P.; Homayouni, S. Support Vector Machine Versus Random Forest for Remote Sensing Image Classification: A Meta-Analysis and Systematic Review. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 6308–6325.
48. Devika, R.; Avilala, S.V.; Subramaniyaswamy, V. Comparative Study of Classifier for Chronic Kidney Disease prediction using Naive Bayes, KNN and Random Forest. In Proceedings of the 2019 3rd International Conference on Computing Methodologies and Communication (ICCMC), Erode, India, 27–29 March 2019; pp. 679–684.
49. Nezamabadi, K.; Sardaripour, N.; Haghi, B.; Forouzanfar, M. Unsupervised ECG Analysis: A Review. IEEE Rev. Biomed. Eng. 2022, 16, 208–224.
50. Trezza, A.; Visibelli, A.; Roncaglia, B.; Spiga, O.; Santucci, A. Unsupervised Learning in Precision Medicine: Unlocking Personalized Healthcare through AI. Appl. Sci. 2024, 14, 9305.
51. Chatterjee, A.; Pahari, N.; Prinz, A.; Riegler, M. Machine learning and ontology in eCoaching for personalized activity level monitoring and recommendation generation. Sci. Rep. 2022, 12, 19825.
52. Wu, N.; Green, B.; Ben, X.; O’Banion, S. Deep Transformer Models for Time Series Forecasting: The Influenza Prevalence Case. arXiv 2020.
53. Sarker, I.H.; Kayes, A.S.; Watters, P. Effectiveness analysis of machine learning classification models for predicting personalized context-aware smartphone usage. J. Big Data 2019, 6, 57.
54. Sun, J.; Tian, Z.; Fu, Y.; Geng, J.; Liu, C. Digital twins in human understanding: A deep learning-based method to recognize personality traits. Int. J. Comput. Integr. Manuf. 2021, 34, 860–873.
55. Lee, M.C.; Lin, J.C.; Gan, E.G. ReRe: A Lightweight Real-Time Ready-to-Go Anomaly Detection Approach for Time Series. In Proceedings of the 2020 IEEE 44th Annual Computers, Software, and Applications Conference (COMPSAC), Madrid, Spain, 13–17 July 2020; pp. 322–327.
56. Hussain, I.; Hossain, M.A.; Park, S.J. A Healthcare Digital Twin for Diagnosis of Stroke. In Proceedings of the 2021 IEEE International Conference on Biomedical Engineering, Computer and Information Technology for Health (BECITHCON), Dhaka, Bangladesh, 4–5 December 2021; pp. 18–21.
57. Villamizar, H.; Kalinowski, M.; Lopes, H.; Mendez, D. Identifying concerns when specifying machine learning-enabled systems: A perspective-based approach. J. Syst. Softw. 2024, 213, 112053.
58. Hancer, E.; Xue, B.; Zhang, M. A survey on feature selection approaches for clustering. Artif. Intell. Rev. 2020, 53, 4519–4545.
59. Yakovyna, V.; Shakhovska, N.; Szpakowska, A. A novel hybrid supervised and unsupervised hierarchical ensemble for COVID-19 cases and mortality prediction. Sci. Rep. 2024, 14, 9782.
60. Singh, A.; Thakur, N.; Sharma, A. A review of supervised machine learning algorithms. In Proceedings of the 2016 3rd International Conference on Computing for Sustainable Global Development (INDIACom), New Delhi, India, 16–18 March 2016; pp. 1310–1315.
61. Monteiro, J.P.; Ramos, D.; Carneiro, D.; Duarte, F.; Fernandes, J.M.; Novais, P. Meta-learning and the new challenges of machine learning. Int. J. Intell. Syst. 2021, 36, 6240–6272.
62. Brazdil, P.; Gama, J.; Henery, B. Characterizing the applicability of classification algorithms using meta-level learning. Lect. Notes Comput. Sci. (Incl. Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinform.) 1994, 784, 83–102.
63. Ikotun, A.M.; Ezugwu, A.E.; Abualigah, L.; Abuhaija, B.; Heming, J. K-means clustering algorithms: A comprehensive review, variants analysis, and advances in the era of big data. Inf. Sci. 2023, 622, 178–210.
64. Martinez-Velazquez, R.; Gamez, R.; El Saddik, A. Cardio Twin: A Digital Twin of the human heart running on the edge. In Proceedings of the 2019 IEEE International Symposium on Medical Measurements and Applications (MeMeA), Istanbul, Turkey, 26–28 June 2019; pp. 1–6.
65. Amara, K.; Kerdjidj, O.; Ramzan, N. Emotion Recognition for Affective Human Digital Twin by Means of Virtual Reality Enabling Technologies. IEEE Access 2023, 11, 74216–74227.
66. Moztarzadeh, O.; Jamshidi, M.B.; Sargolzaei, S.; Jamshidi, A.; Baghalipour, N.; Malekzadeh Moghani, M.; Hauer, L. Metaverse and Healthcare: Machine Learning-Enabled Digital Twins of Cancer. Bioengineering 2023, 10, 455.
67. Petrova-Antonova, D.; Spasov, I.; Krasteva, I.; Manova, I.; Ilieva, S. A digital twin platform for diagnostics and rehabilitation of multiple sclerosis. In Proceedings of the Computational Science and Its Applications–ICCSA 2020: 20th International Conference, Cagliari, Italy, 1–4 July 2020; pp. 503–518.
68. Abeltino, A.; Bianchetti, G.; Serantoni, C.; Riente, A.; De Spirito, M.; Maulucci, G. Putting the Personalized Metabolic Avatar into Production: A Comparison between Deep-Learning and Statistical Models for Weight Prediction. Nutrients 2023, 15, 1199.
69. Angulo, C.; Gonzalez-Abril, L.; Raya, C.; Ortega, J.A. A Proposal to Evolving Towards Digital Twins in Healthcare. In Proceedings of the Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Granada, Spain, 6–8 May 2020; Springer: Berlin/Heidelberg, Germany, 2020; Volume 12108 LNBI, pp. 418–426.
70. Deep, S.; Zheng, X.; Karmakar, C.; Yu, D.; Hamey, L.G.C.; Jin, J. A Survey on Anomalous Behavior Detection for Elderly Care Using Dense-Sensing Networks. IEEE Commun. Surv. Tutor. 2020, 22, 352–370.
71. Dias, D.; Cunha, J.P.S. Wearable Health Devices—Vital Sign Monitoring, Systems and Technologies. Sensors 2018, 18, 2414.
72. el Hassouni, A.; Hoogendoorn, M.; van Otterlo, M.; Barbaro, E. Personalization of Health Interventions using Cluster-Based Reinforcement Learning. In Proceedings of the PRIMA 2018: Principles and Practice of Multi-Agent Systems: 21st International Conference, Tokyo, Japan, 29 October–2 November 2018; pp. 467–475.
73. Herrgårdh, T.; Hunter, E.; Tunedal, K.; Örman, H.; Amann, J.; Navarro, F.A.; Martinez-Costa, C.; Kelleher, J.D.; Cedersund, G. Digital Twins and Hybrid Modelling for Simulation of Physiological Variables and Stroke Risk; Cold Spring Harbor Laboratory: Laurel Hollow, NY, USA, 2022.
74. Angelini, F.; Widera, P.; Mobasheri, A.; Blair, J.; Struglics, A.; Uebelhoer, M.; Henrotin, Y.; Marijnissen, A.C.; Kloppenburg, M.; Blanco, F.J.; et al. Osteoarthritis endotype discovery via clustering of biochemical marker data. Ann. Rheum. Dis. 2022, 81, 666–675.
75. Eckhardt, C.M.; Gambazza, S.; Bloomquist, T.R.; De Hoff, P.; Vuppala, A.; Vokonas, P.S.; Litonjua, A.A.; Sparrow, D.; Parvez, F.; Laurent, L.C.; et al. Extracellular Vesicle-Encapsulated microRNAs as Novel Biomarkers of Lung Health. Am. J. Respir. Crit. Care Med. 2023, 207, 50–59.
76. Elayan, H.; Aloqaily, M.; Guizani, M. Digital Twin for Intelligent Context-Aware IoT Healthcare Systems. IEEE Internet Things J. 2021, 8, 16749–16757.
77. Bonte, S.; Goethals, I.; Van Holen, R. Machine learning based brain tumour segmentation on limited data using local texture and abnormality. Comput. Biol. Med. 2018, 98, 39–47.
78. Van Buren, A.; Kwan, A.; Rietdijk, H.H.; Dijkhuis, T.B.; Conde-Cespedes, P.; Oldenhuis, H.; Trocan, M. A Clustering Approach for Personalized Coaching Applications. In Proceedings of the Advances in Computational Collective Intelligence, Leipzig, Germany, 9–11 September 2024; Nguyen, N.-T., Franczyk, B., Ludwig, A., Treur, J., Vossen, G., Kozierkiewicz, A., Eds.; pp. 351–363.
79. Konsolakis, K.; Banos, O.; Cabrita, M.; Hermens, H. COVID-BEHAVE dataset: Measuring human behaviour during the COVID-19 pandemic. Sci. Data 2022, 9, 754.
80. Tatti, N. Distances between Data Sets Based on Summary Statistics. J. Mach. Learn. Res. 2007, 8, 131–154.
81. Banaee, H.; Ahmed, M.U.; Loutfi, A. Data mining for wearable sensors in health monitoring systems: A review of recent trends and challenges. Sensors 2013, 13, 17472–17500.
82. Park, Y.; Ho, J.C. Tackling Overfitting in Boosting for Noisy Healthcare Data. IEEE Trans. Knowl. Data Eng. 2021, 33, 2995–3006.
83. Liu, F.; Demosthenes, P. Real-world data: A brief review of the methods, applications, challenges and opportunities. BMC Med. Res. Methodol. 2022, 22, 287.
84. Singh, A.; Halgamuge, M.N.; Lakshmiganthan, R. Impact of Different Data Types on Classifier Performance of Random Forest, Naïve Bayes, and K-Nearest Neighbors Algorithms. Int. J. Adv. Comput. Sci. Appl. 2017, 8.
85. Oh, S. A new dataset evaluation method based on category overlap. Comput. Biol. Med. 2011, 41, 115–122.
86. Parmezan, A.R.S.; Lee, H.D.; Spolaôr, N.; Wu, F.C. Automatic recommendation of feature selection algorithms based on dataset characteristics. Expert Syst. Appl. 2021, 185, 115589.
87. Rietdijk, H.H.; Strijbos, D.O.; Conde-Cespedes, P.; Dijkhuis, T.B.; Oldenhuis, H.K.E.; Trocan, M. Feature Selection with Small Data Sets: Identifying Feature Importance for Predictive Classification of Return-to-Work Date after Knee Arthroplasty. Appl. Sci. 2024, 14, 9389.
88. Oreski, D.; Oreski, S.; Klicek, B. Effects of dataset characteristics on the performance of feature selection techniques. Appl. Soft Comput. 2017, 52, 109–119.
89. Pudjihartono, N.; Fadason, T.; Kempa-Liehr, A.W.; O’Sullivan, J.M. A Review of Feature Selection Methods for Machine Learning-Based Disease Risk Prediction. Front. Bioinform. 2022, 2, 927312.
90. Remeseiro, B.; Bolon-Canedo, V. A review of feature selection methods in medical applications. Comput. Biol. Med. 2019, 112, 103375.
91. Wang, L.; Han, M.; Li, X.; Zhang, N.; Cheng, H. Review of Classification Methods on Unbalanced Data Sets. IEEE Access 2021, 9, 64606–64628.
92. Al Masud, A.; Hossain, S.; Rifa, M.; Akter, F.; Zaman, A.; Farid, D.M. Meta-Learning in Supervised Machine Learning. In Proceedings of the 2022 14th International Conference on Software, Knowledge, Information Management and Applications (SKIMA), Phnom Penh, Cambodia, 2–4 December 2022; pp. 222–227.
93. Song, Q.; Wang, G.; Wang, C. Automatic recommendation of classification algorithms based on data set characteristics. Pattern Recognit. 2012, 45, 2672–2689.
94. Pio, P.B.; Rivolli, A.; Carvalho, A.C.P.L.F.d.; Garcia, L.P.F. A review on preprocessing algorithm selection with meta-learning. Knowl. Inf. Syst. 2023, 66, 1–28.
Question | Search Terms |
---|---|
1 | machine learning algorithms followed by overview, selecting, optimizing, performance evaluation metrics. |
2 | Digital Twin healthcare, Human Digital Twin followed by algorithms, personalized healthcare, coaching, machine learning. |
3 | Human Digital Twin followed by performance evaluation, optimization. |
4 | human Digital Twin meta learning, machine learning followed by human behavior data sets, data set characteristics evaluation, data set meta learning. |
Category | Subcategory | Publications |
---|---|---|
Machine learning algorithms | general | [39,40,41,42,43,44,45] |
Machine learning algorithms | selection | [2,25,26,27,46,47,48,49,50] |
Machine learning algorithms | performance | [8,29,51,52,53,54,55,56,57,58,59] |
Machine learning algorithms | both | [28,60,61,62,63] |
Paradigm | Description |
---|---|
Supervised Learning | In Supervised Learning, a set of labeled training data is used to learn a mapping between the input and output variables. The predictions made by the algorithm are based on historical data. |
Unsupervised Learning | In Unsupervised Learning, the aim is to discover hidden structure in unlabeled data with minimal human supervision. It is used to identify existing patterns in the data that have not been previously identified and to create rules based on them. |
Semi-Supervised Learning | In Semi-Supervised Learning, both labeled and unlabeled data are utilized to learn from the structure present in the unlabeled data and use it to improve the model’s accuracy. It is used when a large amount of unlabeled data is available and only a small amount of labeled data is available. |
Reinforcement Learning | In Reinforcement Learning, rewards and punishments are used to teach an AI agent how to behave in an environment. Learning results from interactions with the environment, rather than from labeled datasets. By taking actions that lead to the highest reward, while avoiding actions that lead to punishment, the AI agent maximizes the reward it receives from its environment. |
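
The reinforcement learning row above can be illustrated with a minimal sketch. It assumes a hypothetical five-state corridor in which reaching the last state yields a reward of +1 and every other move incurs a small penalty of 0.1; the environment, the reward values, the function names, and the hyperparameters `alpha` and `gamma` are illustrative assumptions and do not come from the surveyed studies. To keep the result reproducible, the tabular Q-learning update is applied in deterministic sweeps over all state-action pairs rather than through sampled exploration.

```python
# Hypothetical toy environment: a corridor of 5 states; the last one is the goal.
N_STATES = 5
GOAL = N_STATES - 1
ACTIONS = (-1, +1)  # move left, move right


def step(state, action):
    """Deterministic transition: reward +1 at the goal, penalty -0.1 otherwise."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else -0.1
    return next_state, reward


def train(sweeps=200, alpha=0.5, gamma=0.9):
    """Apply the Q-learning update to every state-action pair in repeated sweeps."""
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(sweeps):
        for s in range(GOAL):  # the goal state is terminal, so it is never updated
            for a in ACTIONS:
                s2, r = step(s, a)
                # Value of the best follow-up action (zero once the goal is reached).
                best_next = 0.0 if s2 == GOAL else max(q[(s2, b)] for b in ACTIONS)
                q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
    return q


def greedy_policy(q):
    """Pick the highest-valued action in each non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)]


if __name__ == "__main__":
    q_table = train()
    print(greedy_policy(q_table))
```

Under these assumptions the greedy policy moves right in every non-terminal state, which is the behavior that maximizes the cumulative reward in this toy setting.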
Objective | Type |
---|---|
Data | Dimensionality Reduction and Feature Learning |
Data | Association Rule Learning |
Solution | Classification |
Solution | Regression |
Solution | Clustering |
Solution | Reinforcement Learning |
Solution | Anomaly Detection |
Mixed | Deep Learning |
Category | Subcategory | Publications |
---|---|---|
Machine Learning and Human Digital Twins | general | [4,5,6,12,13,14,16,22,24,35,36,38] |
Machine Learning and Human Digital Twins | single case example | [11,42,49,56,64,65,66,67] |
Machine Learning and Human Digital Twins | personalization | [21,23,28,37,50,53,68,69,70,71,72,73,74,75] |
Machine Learning and Human Digital Twins | performance | [15,28,48,53,68,76,77,78] |
Type | Example |
---|---|
Classification | Condition score in fitness management [11], emotion recognition in healthcare [42,65], user behavior modeling [53], disease classification [48,77], heart disease diagnosis and heart problem detection [56,73,76], coaching applications [21] |
Anomaly Detection | Ischemic heart diseases and stroke detection [64], anomalous behavior detection for elderly care [70] |
Regression | Diagnosis and progression of cancer [66], metabolism models [68] |
Reinforcement Learning | Personalized medicine prescription [28], personalized health interventions [72], coaching applications [21] |
Clustering | Osteoarthritis endotype discovery [74], identifying biomarkers of lung health [75], unsupervised ECG analysis [49], personalized coaching applications [78] |
Category | Reference | Algorithms | Model Source |
---|---|---|---|
Anomaly Detection | [64] | CNN. | Group |
Anomaly Detection | [70] | Wide range of classification techniques. | N/A |
Classification | [21] | ADA, DTC, KNN, LR, NN, RF, SGD, SVC. | Group and Individual |
Classification | [77] | CNN, RF. | Group |
Classification | [48] | NB, KNN, RF. | Group |
Classification | [53] | ZeroR, NB, DTC, RF, SVM, KNN, ADA, LR, RIPPER, RIDOR, ANN. | Individual |
Classification | [42] | RF, CNN, K-Means. | Group |
Classification | [11] | KNN, SVM. | Individual |
Classification | [56] | RF, LR, SVM, C5.0. | Group |
Classification | [76] | CNN, LSTM, MLP, SVC, LR. | Group |
Classification | [73] | LR. | 4 Age Groups |
Classification | [65] | Bagged trees, KNN, SVM (Linear and Cubic). | Group |
Clustering | [49] | Wide range of unsupervised methods. | N/A |
Clustering | [75] | K-Means. | Group |
Clustering and Classification | [74] | K-Means, followed by RF. | Group and 3 Clusters |
Clustering and Classification | [78] | Manual clustering and RF. | 3 Clusters |
Regression | [68] | SARIMAX, LSTM, GRU, Transformer. | Group |
Regression | [66] | LR, DTR, RFR, GBA. | Group |
Reinforcement Learning | [72] | Q-learning, LSPI, combined with K-medoids clustering. | Group, Clusters, and Individual |
Category | Reference | Evaluation Metric |
---|---|---|
Anomaly Detection | [64] | Accuracy. |
Anomaly Detection | [70] | Accuracy, precision, recall and F-scores. |
Classification | [21] | Accuracy and F1-score of classification using generalized or personal models. |
Classification | [77] | Confusion matrix. |
Classification | [48] | Accuracy, precision, recall, F-measures, and execution time. |
Classification | [53] | Precision, recall, F-measures, kappa, CCI, ICI, ROC, MAE, and RMSE. |
Classification | [42] | Accuracy. |
Classification | [11] | Classification loss. |
Classification | [56] | Accuracy, sensitivity, specificity, precision, AUC. |
Classification | [76] | Most metrics applicable to classification. |
Classification | [73] | F1-score, AUC and accuracy. |
Classification | [65] | Accuracy, precision, recall and F1-score. |
Clustering | [49] | Accuracy. |
Clustering | [75] | N/A |
Clustering and Classification | [74] | Cluster stability by expanding the number of clusters from 3 to 5. |
Clustering and Classification | [78] | Computational time, accuracy and F1-score. |
Regression | [68] | RMSE and computational time. |
Regression | [66] | MSE, RMSE, MAE and R². |
Reinforcement Learning | [72] | Cumulative reward. |
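
Several of the metrics listed above follow directly from a model's predictions. The following minimal sketch, in plain Python, shows how accuracy, precision, recall, and F1-score can be derived from binary confusion-matrix counts, and how MAE, MSE, and RMSE can be derived from regression errors. The function names and the toy labels are hypothetical and chosen only for illustration; none of the surveyed studies is claimed to compute its metrics this way.

```python
import math


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1-score for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def regression_metrics(y_true, y_pred):
    """Mean absolute error, mean squared error, and root mean squared error."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / len(errors)
    mse = sum(e * e for e in errors) / len(errors)
    return {"mae": mae, "mse": mse, "rmse": math.sqrt(mse)}


if __name__ == "__main__":
    # Toy labels, made up for illustration only.
    print(classification_metrics([1, 1, 1, 0, 0, 0, 1, 0],
                                 [1, 1, 0, 0, 0, 1, 1, 0]))
    print(regression_metrics([3.0, 5.0, 2.0, 7.0], [2.0, 5.0, 4.0, 8.0]))
```

When classes are imbalanced, as is common in health-related data, accuracy alone can be misleading, which is one reason to report precision, recall, and F1-score alongside it.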
Category | Subcategory | Publications |
---|---|---|
Dataset considerations | General | [2,9,11,79,80,81,82,83,84] |
Dataset considerations | Feature evaluation | [70,85,86,87,88,89,90,91] |
Dataset considerations | Meta-learning | [61,62,78,92,93,94] |
Category | Reference | Individuals | Features | Frequency | Data Type |
---|---|---|---|---|---|
Anomaly Detection | [64] | 200 | Signal | Stream | Physiological |
Anomaly Detection | [70] | N/A | N/A | Stream | Physiological+Behavior |
Classification | [21] | 48 | 4 | Periodical | Behavior |
Classification | [77] | 257 | 52 | Single | Physiological |
Classification | [48] | 400 | 25 | Single | Physiological |
Classification | [53] | 10 | N/A | Periodical | Behavior |
Classification | [42] | 10 | Image | Single | Physiological |
Classification | [11] | 11 | 22 | Periodical | Behavior+Context |
Classification | [56] | 48 | 3 | Stream | Physiological |
Classification | [76] | 48 | Signal | Stream | Physiological |
Classification | [73] | N/A | 8 | Single | Physiological+Behavior |
Classification | [65] | 17 | 94 | Stream | Behavior+Context |
Clustering | [49] | 2 to 500 | Signal | Stream | Physiological |
Clustering | [75] | 656 | 51 | Single | Physiological |
Clustering and Classification | [74] | 297 | 16 | Single | Physiological |
Clustering and Classification | [78] | 43 | 4 | Periodical | Behavior |
Regression | [68] | 10 | 5 | Periodical | Physiological+Behavior |
Regression | [66] | 116 | 11 | Single | Physiological+Behavior |
Reinforcement Learning | [72] | 100 | N/A | Periodical | Behavior |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Rietdijk, H.H.; Conde-Cespedes, P.; Dijkhuis, T.B.; Oldenhuis, H.K.E.; Trocan, M. A Survey on Machine Learning Approaches for Personalized Coaching with Human Digital Twins. Appl. Sci. 2025, 15, 7528. https://doi.org/10.3390/app15137528