2. Methodology
This study is a systematic review of the literature whose main aim is to analyze and synthesize the applications of ML algorithms in the field of geriatrics. The review follows a research process structured in several stages. Initially, the authors identify the ML algorithms used in the diagnosis, monitoring, prevention, and treatment of geriatric patients and analyze them in their applications. The analysis targets the performance indicators reported in each study. Subsequently, a search is conducted in the WOS database using a series of domain-specific keywords. The period analyzed spans from 1 January 2020 to 31 May 2025, so that the most recent contributions in the field are highlighted. Both original scientific contributions and reviews were included in the analysis, as they provide an overview of the applications used in the geriatric field that are integrated with ML components. The inclusion criteria for the study were as follows: (1) the application of ML techniques for classification or prediction in the geriatric context, (2) the reporting of the performance of the models used, (3) a detailed description of the datasets used, and (4) a description of the clinical characteristics analyzed. Theoretical works without practical application, as well as articles targeting age categories other than the geriatric population, were excluded. The selected works were analyzed both qualitatively, from the authors’ perspective and through performance indicators, and quantitatively. For the qualitative analysis, the types of algorithms used, the clinical applicability domain, the objective pursued, and the datasets utilized were identified in order to create a technical synthesis of the performance indicators.
The qualitative analysis also compared the metrics reported in the research articles, with a focus on accuracy, sensitivity, and F1-score.
The quantitative analysis is ensured through detailed searching in the WOS, PubMed, Scopus, and Institute of Electrical and Electronics Engineers (IEEE) Xplore databases, according to the schematically structured logic represented in Figure 1. This diagram progressively illustrates the manner in which the Boolean expressions used in thematic searches were constructed. The four directions included the following:
Generic terms, represented by “geriatr*”, which retrieves all research containing derivatives of this expression. This search aims to identify all studies in the literature on this topic within the 1 January 2020–31 May 2025 range and thus the total number of papers, which is later compared to the number specifically targeting ML components. This initial stage builds a general corpus related to GERIATRICS in the current scientific literature.
Subsequently, thematic filtering is applied to subdomains. The filters are DIET, NUTRITION, ELDERLY, and ML. These filters capture studies that address nutrition and geriatric health, the elderly and geriatrics, and ML and geriatrics.
The final search uses a single Boolean expression, the intersection of these components: geriatrics, diet, nutrition, elderly, and ML. This syntax identifies articles located at the intersection of these research directions. The methodology takes a comprehensive approach by combining the terms gradually, which serves two roles. The first is comprehensive coverage of the literature. The second is thematic focus through filtering based on technological terms, which allows for the extraction of works that explicitly address the application of ML in geriatrics and eliminates generalist literature. This query strategy yields scientific data on the distribution of ML algorithms across the subdomains of geriatrics.
The process of building the search equation was carried out using a progressive approach, employing the Boolean operators “AND” and “OR”, and the wildcard symbol “*”. In the first stage, generic terms that define the field of geriatrics were selected. In the second stage, specific terms were added to refine the search in the thematic areas of nutrition, elderly care, and ML algorithms. In the final stage, the resulting expressions were combined into a unified logical query that extracts the common articles of these domains.
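The progressive construction of the search expressions can be illustrated with a minimal Python sketch. The function names `and_query` and `build_queries` are hypothetical, and database-specific field tags, phrase quoting, and date filters are not given in the text and are therefore omitted. Only the “AND” operator and the “*” wildcard are used here, since these are the only elements visible in the nine reported queries.

```python
# Hypothetical helper names; the query strings follow the nine searches
# reported in this paper. Field tags and date limits are NOT modeled.
CORE = "geriatr*"          # "*" is the wildcard: any suffix matches
ML = "machine learning"


def and_query(*terms: str) -> str:
    """Join search terms with the Boolean operator AND."""
    return " AND ".join(terms)


def build_queries() -> list:
    """Return the nine search expressions in the order they are reported."""
    thematic = ["diet", "nutrition", "elderl*"]
    # Stage 1: generic term only.
    stage1 = [CORE]
    # Stage 2: generic term refined by one thematic filter.
    stage2 = [and_query(CORE, term) for term in thematic]
    # Stage 3: the same filters with the ML constraint, then ML alone.
    stage3 = [and_query(term, CORE, ML) for term in thematic]
    stage3.append(and_query(CORE, ML))
    # Final stage: one unified expression.
    final = [and_query(CORE, "elderl*", "nutrition", ML)]
    return stage1 + stage2 + stage3 + final


if __name__ == "__main__":
    for number, query in enumerate(build_queries(), start=1):
        print(number, query)
```

Keeping the core term and the filters as separate values makes it straightforward to rerun the same staged logic against each database.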
The quality assessment of the studies included in the analysis was conducted using the authors’ expertise. Each article was evaluated using the dataset characteristics, performance metrics, and clinical applicability. The studies were scored on a grid with scores from 0 to 2 for each criterion, resulting in a total score that serves as a guide in the comparative qualitative analysis.
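As an illustration only, the scoring grid described above can be sketched in Python. The criterion identifiers and the example scores are hypothetical; the sketch assumes that the total is the plain sum of the three criterion scores, each restricted to 0, 1, or 2.

```python
# Hypothetical identifiers for the three criteria named in the text.
CRITERIA = ("dataset_characteristics", "performance_metrics", "clinical_applicability")
ALLOWED_SCORES = (0, 1, 2)


def total_quality_score(scores: dict) -> int:
    """Validate a study's scores against the 0-2 grid and return their sum."""
    for criterion in CRITERIA:
        if criterion not in scores:
            raise ValueError(f"missing criterion: {criterion}")
        if scores[criterion] not in ALLOWED_SCORES:
            raise ValueError(f"score for {criterion} must be 0, 1, or 2")
    return sum(scores[criterion] for criterion in CRITERIA)


# Hypothetical study, not taken from the reviewed literature.
example_study = {
    "dataset_characteristics": 2,
    "performance_metrics": 1,
    "clinical_applicability": 2,
}

if __name__ == "__main__":
    print(total_quality_score(example_study))
```

With three criteria, the total under this assumption lies between 0 and 6.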
The bibliometric analysis compares these directions to highlight the following aspects:
The distribution of publication years illustrates academic interest in a general versus technological context;
The types of publications (articles, reviews, and proceedings) demonstrate the predominant nature of the research (empirical or theoretical);
WOS categories and research areas identify the interdisciplinarity of the field;
The countries and institutions involved identify regions with intense activity in geriatric research;
Scientific publishers and open-access status reflect the capacity for the broad dissemination of knowledge.
The overlap of the results from the two searches highlights the differences between traditional and modern approaches in geriatrics. The first query should reflect a mature, well-developed research area. The second highlights an emerging niche focused on the applicability of ML in the nutritional assessment of elderly individuals. These comparisons will lead to a series of recommendations in the discussion section regarding future research directions. Thus, this paper is addressed to PhD students, researchers, university faculty, and the industrial sector that wishes to invest in technological development.
In this paper, the methodology focuses on the selection of articles from the WOS database, which is recognized for its multidisciplinary nature and allows publications to be filtered using specific criteria for bibliometric analysis. Additionally, comparative bibliometric analyses were carried out using complementary databases: PubMed, which specializes in biomedical literature; Scopus, which covers technical fields; and IEEE, which focuses on the technological field. For each database, all the methodological searches presented in Figure 1 were conducted.
4. Bibliometric Analysis
4.1. Analysis of Publication-Based Metrics
From the analysis of the distribution of publications by year, an annual increase is noted, culminating in 2024 with 162 publications in the field of geriatrics that integrate ML techniques. Figure 3 shows an upward trend, driven by the digitalization of medical services with the help of AI technologies in elderly care.
The analysis of the type of publication is presented in Figure 4 and shows a dominance of original contributions: approximately 86.69% of all papers are original articles. These values show the researchers’ involvement in discovering new techniques that provide results through the application of ML algorithms within clinical studies. Review articles accounted for 48 of the total 819 articles, a small number of syntheses highlighting future research directions. Their presence nevertheless indicates a maturation of the field of geriatrics in the context of ML. The number of proceedings papers and conference abstracts is low, suggesting a greater focus on peer-reviewed journal publications than on conference presentations.
In the WOS categories, there are 174 articles in Geriatrics gerontology, 91 in General internal medicine, 86 in Gerontology, and 55 in Medical informatics, with the rest distributed in other categories, as shown in Figure 5.
The research areas overlap with the WOS categories, as can be seen in Figure 6: 177 articles were published in Geriatrics gerontology, 102 in General internal medicine, 55 in Medical informatics, 50 in Computer science, and 46 in Engineering. These values show that the two fields, geriatrics and gerontology, represent a quarter of the total publications, demonstrating the relevance of the central theme. The publications lie at the intersection of medicine and computer science because geriatrics is approached through the use of ML algorithms.
The geographical distribution shows that the leading countries are the USA with 177 articles, China with 142, Japan with 66, and Germany with 58. The geographical distribution by country of the 932 articles is presented in Figure 7. The fact that 40% of the total publications come from the USA and China indicates the major resources invested in medical research through a technological approach. The institutions from which these articles originate include the University of California System with 24 articles, Harvard University with 22, Harvard University Medical Affiliates with 18 articles, and the US Department of Veterans Affairs with 18 articles, as well as other research institutes and universities.
Figure 8 shows the distribution of the number of articles in relation to the institutions, highlighting the interest of prestigious universities in the studied field.
The publishers that disseminated the articles are Springer Nature with 162, Elsevier with 130, MDPI with 59, and Wiley with 55. Springer Nature and Elsevier have published over half of the articles, which demonstrates their position at the top of the academic publishers (Figure 9).
In Figure 10, it can be observed that most of the papers are freely accessible, which promotes the dissemination of knowledge in the medical and research communities. The predominance of open-access papers overall (550) and of the gold model (379 publications) suggests a preference for direct publication in journals, while the green model, with 333 articles, indicates researchers’ concern for depositing articles in institutional repositories or public archives.
For the 394 results from WOS, the co-occurrence map of keywords was generated using the VOSviewer 1.6.20 tool. This map shows the relationships between key terms in the field of geriatrics and ML, using a repetition threshold of 30 occurrences, and groups the terms into three distinct clusters (Figure 11).
The red cluster is associated with diagnosis and prediction using AI technologies. The blue cluster is associated with terms related to the management of elderly patients, while the green cluster corresponds to their health, frailty, dementia, and chronic conditions.
The red cluster is the central cluster of the map and is based on advanced technologies for diagnosis and prediction, validated by results in the context of geriatrics. Terms such as dementia, older adults, and aging indicate an interest in developing systems that can anticipate common conditions in elderly individuals at an early stage. The important correlations in this cluster are ML, artificial intelligence, and validation.
The blue cluster addresses the practical aspects of elderly patient care and includes surgeries, risk management, mortality, and therapy outcomes. Terms such as hip fracture and surgery indicate the attention given to medical issues frequently encountered in elderly individuals. The important correlations in this cluster are elderly patients, mortality, and outcomes, highlighting the importance of monitoring the results of surgical therapies.
The green cluster explores the overall health aspects of elderly individuals, including frailty, chronic conditions such as chronic kidney disease, and depression. The important correlations are frailty, chronic kidney disease, depression, and prevalence, suggesting that studies focus on the frequency of these diseases in the elderly.
At the center of the map is the term ML, which is connected with the other terms of the three clusters. This shows that ML is a dominant concept and acts as a key tool in geriatric research. The map reflects the current research priorities, which include automated diagnosis, risk management, the impact of chronic conditions, and surgical outcomes. This information is useful for researchers in identifying unexplored areas, so they can prioritize future research directions in interdisciplinary approaches that will be disseminated in upcoming papers.
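The counting step behind a keyword co-occurrence map can be sketched as follows. This is a simplified illustration and not the VOSviewer implementation, which additionally normalizes link strengths, lays out the map, and clusters the terms. The function name `cooccurrence`, the toy records, and the reading of the threshold as a minimum number of occurrences per keyword are all assumptions.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(keyword_lists, min_occurrences=1):
    """Count how often two keywords appear together in the same record.

    Only keywords found in at least `min_occurrences` records are kept.
    """
    # Number of records in which each keyword occurs (once per record).
    occurrences = Counter(kw for kws in keyword_lists for kw in set(kws))
    kept = {kw for kw, count in occurrences.items() if count >= min_occurrences}

    pairs = Counter()
    for kws in keyword_lists:
        # Sort so that each unordered pair has a single canonical form.
        selected = sorted(set(kws) & kept)
        pairs.update(combinations(selected, 2))
    return pairs


# Toy records (hypothetical), using terms mentioned in the cluster description.
records = [
    ["machine learning", "frailty", "older adults"],
    ["machine learning", "dementia", "older adults"],
    ["machine learning", "hip fracture", "mortality"],
]

if __name__ == "__main__":
    # A threshold of 2 is used for the toy data; the study reports 30.
    for pair, count in cooccurrence(records, min_occurrences=2).items():
        print(pair, count)
```

A term that co-occurs with many others across records, as ML does in the map, accumulates the largest number of links under this kind of counting.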
4.2. Comparative Analysis in WOS, PubMed, IEEE, and Scopus
To expand the research, a comparative analysis was conducted between the PubMed, IEEE, Scopus, and WOS databases. The nine search categories separate the field of geriatrics from the one where the field is approached from an ML perspective.
Table 8 presents the comparative analysis graph across all nine search types.
The first search included a simple search for the “geriatr*” query, which yielded 116,692 articles in PubMed, 51,911 in Scopus, 31,951 in WOS, and 1381 in IEEE. Thus, PubMed has the highest number of articles associated with the field of geriatrics. This can be explained by the nature of the database, which primarily includes articles in the medical field. At the opposite end, IEEE has a relatively small number of articles because it is focused on the technological field.
The “geriatr* AND diet” query yielded 2967 articles in PubMed, 1562 in Scopus, 403 in WOS, and 5 in IEEE. Thus, PubMed continues to offer the most results, demonstrating the variety of the medical literature, whereas the topic is poorly represented in the technical literature indexed by IEEE.
Similarly, the “geriatr* AND nutrition” query offered the highest values in PubMed, with 10,322 articles, followed by 3346 in Scopus, 1501 in WOS, and 16 in IEEE.
Surprisingly, for the “geriatr* AND elderl*” query, Scopus returned the largest number of results (26,806), followed by PubMed with 17,397, WOS with 9469, and IEEE with only 427. The surprise stems from the large volume of results in Scopus, which exceeds PubMed in this category. Including “geriatr*” and “elderl*” in the same query was intended to narrow the search to studies that use both the specific medical terminology “geriatr*” and the more general term “elderl*”, focusing on those that explicitly target older people.
The “diet AND geriatr* AND machine learning” query yielded 20 articles in PubMed, 15 in WOS, 9 in Scopus, and 1 in IEEE. The category combines three fields, which explains the small number of articles. Although PubMed once again yields the most results, the number is considerably smaller than in the previous searches and suggests the need for intensified research.
The “nutrition AND geriatr* AND machine learning” query yielded 113 results in PubMed, 34 in Scopus, 34 in WOS, and 6 in IEEE. PubMed dominates this field due to the large number of articles related to elderly nutrition addressed with ML. However, the number of articles is very small.
The “elderl* AND geriatr* AND machine learning” query generated 327 results in Scopus, 282 in WOS, 234 in PubMed, and 65 in IEEE. This time, too, Scopus surprises with a larger number of results than the other databases. The “geriatr* AND machine learning” query yielded 1800 results in PubMed, 788 in WOS, 764 in Scopus, and 234 in IEEE. Thus, PubMed has the most articles related to geriatrics and ML, followed by WOS and Scopus with almost equal numbers, while IEEE has the fewest results.
The “geriatr* AND elderl* AND nutrition AND machine learning” query yielded 29 studies in PubMed, 18 in Scopus, 12 in WOS, and 2 in IEEE. This category is limited, generating only a few dozen papers. PubMed has the most results, but the number is very small.
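To relate the ML-specific results to the general corpus, the share of “geriatr* AND machine learning” results within all “geriatr*” results can be computed per database. The sketch below uses only the counts reported above; the names `COUNTS` and `ml_share` are hypothetical, and the ratio is a rough indicator, since it assumes that every result of the narrower query is also counted in the broader one.

```python
# Result counts as reported in this section: (geriatr*, geriatr* AND machine learning).
COUNTS = {
    "PubMed": (116_692, 1_800),
    "Scopus": (51_911, 764),
    "WOS": (31_951, 788),
    "IEEE": (1_381, 234),
}


def ml_share(database: str) -> float:
    """Percentage of 'geriatr*' results that also match 'machine learning'."""
    total, with_ml = COUNTS[database]
    return 100.0 * with_ml / total


if __name__ == "__main__":
    for name in COUNTS:
        print(f"{name}: {ml_share(name):.2f}%")
```

Under these counts, IEEE has by far the largest relative share of ML-related papers despite having the fewest results in absolute terms, which is consistent with its technological focus.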
Following a systematic search process conducted across four major databases (PubMed, Scopus, WOS, and IEEE), a total of 60 articles were initially identified, using the search terms geriatrics, elderly, nutrition, and machine learning, limited to the period 2020–2025. After removing duplicates (n = 10), 50 unique articles remained and were subjected to the title and abstract screening process (Figure 12).
Of these, 15 articles were excluded because they did not meet the inclusion criteria, particularly due to the lack of an application of machine-learning techniques or explicit correlation with the field of nutrition in the elderly population. The remaining 35 articles were evaluated, and 10 of them were excluded because they did not provide sufficient data, had a poor design, had insufficiently described datasets, or did not directly address the relationship between nutrition, the elderly, and machine learning.
Ultimately, a total of 25 studies were included in the qualitative analysis. This approach represents the basis for the systematic evaluation of how ML methods are used to investigate nutrition and health-related aspects among the geriatric population.
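The selection flow described above reduces to a short chain of subtractions, which the following sketch makes explicit. The function name `screening_flow` and the dictionary keys are hypothetical; the numbers are those reported in the text.

```python
def screening_flow(identified, duplicates, excluded_screening, excluded_eligibility):
    """Return the number of records remaining after each selection stage."""
    after_deduplication = identified - duplicates
    after_screening = after_deduplication - excluded_screening
    included = after_screening - excluded_eligibility
    if min(after_deduplication, after_screening, included) < 0:
        raise ValueError("more records excluded than available")
    return {
        "identified": identified,
        "after_deduplication": after_deduplication,
        "after_screening": after_screening,
        "included": included,
    }


if __name__ == "__main__":
    # 60 identified, 10 duplicates, 15 excluded at title/abstract screening,
    # 10 excluded at full evaluation.
    for stage, count in screening_flow(60, 10, 15, 10).items():
        print(stage, count)
```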
4.3. Overview of ML Studies in Geriatrics
Table 9 presents a summary of scientific studies that use ML algorithms in the field of geriatrics. For each study, information was extracted regarding the dataset size and type, the validation method applied, the performance metrics obtained during the training and testing phases, and comments made on the possibility of overfitting. The synthesis provides an overview of how learning models are developed and validated in the context of an aging population.
The authors note that this comparison is difficult because the datasets are heterogeneous. Essentially, these include different data sources, with major variations in both volume and content. The clinical goals are also distinct because some studies aim to predict mortality, while others focus on malnutrition, sarcopenia, frailty, or depression. The fact that these studies target different predictions implies differences in the input variables and target labels. The lack of metric standardization also makes a direct comparison between studies impossible [74]. Not all studies report the same metrics, and some completely omit the intermediate results. Under these conditions, Table 9 provides a comparative overview, but interpreting the differences between the studies is not straightforward because the methodological and clinical context of each article varies.
5. Discussions
This section presents the results obtained from the systematic analysis, the practical and theoretical implications of the obtained values, future research directions, the authors’ observations regarding the results from the literature review, and the target audience of the research.
In this paper, articles published between 1 January 2020 and 31 May 2025 were analyzed to highlight the level of research on ML algorithms in both theoretical and applied descriptive areas. The paper emphasizes a series of performances obtained in scientific contribution articles, based on the values of the performance indicators of ML algorithms.
The comparative evaluation of existing research results included documentation on the WOS, PubMed, Scopus, and IEEE Xplore databases. A first search, for the “geriatr*” query, generated 116,692 papers in PubMed. Compared to this value, Scopus yielded 51,911 results, WOS 31,951 results, and IEEE 1381 results. The high value associated with PubMed is explained by the fact that this database predominantly contains publications from the medical field. The second search, based on the “geriatr* AND diet” query, yielded 2967 articles in PubMed, 1562 in Scopus, 403 in WOS, and 5 in IEEE. The third search was conducted on the “geriatr* AND nutrition” query. In PubMed, this search yielded 10,332 results, in Scopus 3346, in WOS 1501, and in IEEE 16. High values are identified in PubMed, Scopus, and WOS, while IEEE has the lowest values because its content consists of technical publications covering sensors, IoT technology, area-specific wearable device technologies, and technical applications that integrate ML. The fourth search, on the “geriatr* AND elderl*” query, yielded the following results: Scopus generated 26,806 results, PubMed 17,397, WOS 9469, and IEEE 427. These high values suggest increased interest from researchers in this field.
Furthermore, the following searches add specific ML constraints. Thus, the fifth search based on the “diet AND geriatr* AND machine learning” query provides 20 results in PubMed, 15 in WOS, 9 in Scopus, and 1 in IEEE. The sixth search on the “nutrition AND geriatr* AND machine learning” query generated 113 results in PubMed, 34 in Scopus, 34 in WOS, and 6 in IEEE. The seventh search on the “elderl* AND geriatr* AND machine learning” query yielded 327 papers in Scopus, 234 articles in PubMed, 282 in WOS, and 65 in IEEE. The eighth search on the “geriatr* AND machine learning” query showed 1800 results in PubMed, 788 in WOS, 764 in Scopus, and 234 in IEEE. The ninth search based on the “geriatr* AND elderl* AND nutrition AND machine learning” query generated 29 papers in PubMed, 18 in Scopus, 12 in WOS, and 2 in IEEE. These results suggest the need to intensify the research in the field of geriatrics, specifically regarding ML components, as the number of results remains limited regardless of the database used.
Table 10 presents a summary of the studies disseminated in the Multidisciplinary Digital Publishing Institute (MDPI). These studies focus on the field of elderly health through an approach using ML methods. These articles cover issues such as sarcopenia, frailty, delirium, mood disorders, and locomotor syndrome. These works come from open-access journals such as Sensors, Journal of Clinical Medicine, and International Journal of Environmental Research and Public Health. The fact that these articles are disseminated in such journals reflects the trend of combining sensor technology with ML models for assessment, prevention, prediction, and treatment suggestions. A common aspect of these works is the use of moderately sized datasets. Some studies use medical imaging, with an emphasis on non-invasive methods that can be easily implemented in clinical practice. The ML models are predominantly RF and LR. More complex architectures like MLP, CatBoost, or Stacking achieve accuracies exceeding 95%. The reported metrics include, in addition to accuracy, the F1-score, AUC, and specialized metrics such as Hamming Loss for classification or R² for regression.
The results presented in Table 10 show that ML algorithms are tools that enable continuous monitoring, early detection, the prevention of certain behaviors, and the personalization of investigations in geriatrics. However, the lack of detailed metrics in some studies indicates the need for standardization in evaluating these models. Moreover, the lack of dataset standardization and the very small number of results in Table 10 suggest the need to encourage researchers to explore this area. Overall, this research outlines a direction for integrating ML methods into prevention, assistance, and detection systems, as well as into the personalization of healthy aging.
The factors influencing the field of geriatrics include environmental factors that are also addressed through ML methods, such as water quality [86,87] or air quality [88]. On the other hand, digital systems that introduce data scaling methods at the sensory level [89], implemented with embedded microcontrollers [90], allow for the remote monitoring of the elderly. Thus, two research directions are defined. The first focuses on environmental factors that influence the quality of life of the elderly, while the second directly impacts how the elderly are monitored.
Thus, from the analysis of the papers, the following are observed:
The most frequently studied ML models are RF, XGBoost, and SVM. These are predominantly analyzed in the geriatric context due to their ability to operate with incomplete data, which allows them to provide accurate results in complex clinical scenarios.
The subdomains of geriatrics in which ML is applied include predicting postoperative mortality, classifying the level of frailty, identifying fall risk, monitoring nutritional status, analyzing pain, the early detection of dementia, managing patients with renal or cardiovascular insufficiency, and analyzing depression among the elderly.
The performance metrics of ML algorithms, predominantly analyzed in contribution articles, are accuracy, precision, F1-score, and AUC-ROC. In the analyzed literature, the reported values exceed 80% in most applications, which confirms the utility of the models in clinical practice. Additionally, these values also suggest the possibility of improving the metrics if specialists continue to investigate other supplementary algorithms, in addition to those preferred so far.
The geographical distribution of research indicates a concentration of scientific activity in developed countries such as the USA, China, Japan, and Germany, but also a gradual openness towards interdisciplinary international collaborations.
The type of publications shows an increased interest in empirical articles, as opposed to purely theoretical ones, an aspect highlighted by the maturation of the field and the focus on concrete clinical applications.
In relation to the research questions (RQs), the study responds as follows:
RQ1: ML algorithms are applied in various subfields of geriatrics. The specialized literature highlights postoperative mortality prediction, fall risk identification, hospitalization duration estimation, frailty level classification, nutritional status assessment, early dementia detection, and the analysis of depression and overall functional status in the elderly. These directions reflect the need for personalized care in a complex clinical context.
RQ2: The purposes of using ML algorithms are divided between classification, in directions such as differential diagnosis and functional status labeling, and prediction, in applications for disease progression, recurrence risk, or post-intervention survival. Of the analyzed works, 40% aimed at classification, while 60% focused on prediction, indicating a trend towards anticipatory clinical applicability.
RQ3: The most commonly used metrics for performance evaluation are accuracy, precision, sensitivity, the F1-score, and AUC-ROC. These are directly correlated with the potential for the practical integration of applications in real-world contexts. The reported performances show that the models can support clinical decision-making, provided they are used for assistance and not for automatic decision-making in existing healthcare systems.
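For reference, the metrics named in RQ3 can be computed as in the following minimal sketch, which uses the standard textbook definitions for a binary classifier. The function names and all numeric inputs are hypothetical and do not come from any of the reviewed studies. The AUC-ROC is computed in its rank-based form, as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, sensitivity (recall), and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + sensitivity
    f1 = 2 * precision * sensitivity / denominator if denominator else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": f1,
    }


def auc_roc(positive_scores, negative_scores):
    """Rank-based AUC: share of positive/negative pairs ranked correctly (ties count half)."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positive_scores
        for n in negative_scores
    )
    return wins / (len(positive_scores) * len(negative_scores))


if __name__ == "__main__":
    # Hypothetical confusion matrix and model scores, for illustration only.
    print(classification_metrics(tp=40, fp=10, fn=10, tn=40))
    print(auc_roc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]))
```

Because accuracy alone can be misleading under class imbalance, reporting sensitivity, F1-score, and AUC-ROC alongside it gives a fuller picture of a model's behavior.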
From the analyzed articles, 10 papers were selected that the authors consider representative of the field of geriatrics. These were qualitatively evaluated based on the dataset details, performance metrics, and clinical applicability. The scores ranged from 0 to 2, with 0 representing vague, 1 partially detailed, and 2 well-detailed (Table 11).
The qualitative analysis is supported by these 10 papers, which were evaluated against the three reference indicators for ML tools based on the authors’ expertise. Table 11 summarizes this evaluation based on the total score obtained for each criterion individually. This approach helps in interpreting the results regarding the need for the current systematic review.
The reported AUC values range from 0.57 to 0.98. These variations are explained by the type and quality of the datasets used, the class imbalance, the degree of preprocessing applied before model training, and the complexity of the tasks addressed. Models that handle simple tasks, such as binary classification with clinical variables, perform better. Tasks involving heterogeneous variables, incomplete data, insufficient data volume, or highly complex clinical scenarios, such as predicting rare or multidimensional events, often result in lower performance. Therefore, variations in AUC reflect both the quality of the model and the specific challenges inherent to each clinical application.
The paper is addressed to researchers in the field of ML and AI, who can identify, in this work, the future directions they should focus on in the study of algorithms in the geriatric field. They can also understand the technical challenges specific to this data. The paper is addressed to healthcare professionals and medical staff, who can study how these technologies can be integrated into their practice to improve clinical decisions. University faculty and PhD students can use this study as a systematic database for developing future research papers or interdisciplinary projects. In the private sector and the medical industry, ML solutions with the potential for scaling and implementation in portable devices, mobile applications, modern devices, or integrated digital health systems can be identified.
The authors believe that ML models are used in the field of geriatrics to improve diagnosis, prognosis, prevention, and the personalization of care for aging patients. Transparency regarding how these models provide suggestions is a major challenge at the moment. In the context of clinical decision-making, integrating tools like Explainable AI (XAI) into the prediction process represents a necessary future research direction to enable the understanding of the factors influencing model predictions. Implementing such mechanisms will increase clinicians’ confidence in using ML technologies and will facilitate their widespread adoption in medical practice.
The gaps identified in the literature represent starting points for future research. These address the following aspects:
The lack of standardization of datasets makes it impossible to replicate studies and objectively compare the performance between the models studied.
The explainability of models refers to the fact that, in most cases, the papers predominantly address quantitative metrics without providing an insight into the interpretability of the models for medical staff. In the context of geriatrics, where decisions often involve high risks, explainability should be a benchmark for future research.
Longitudinal validation is limited: models are mostly validated on cross-sectional datasets, and few studies follow patients in the long term, which limits the models’ ability to provide predictions over time. A future research direction should address the longitudinal dimension of research in relation to ML.
Integration into real clinical practice represents another future direction for research, given that, although studies report good performance on limited datasets, in practice, they may encounter situations that were not considered in the datasets used for research.
Ethics and data protection in geriatrics is an extremely sensitive topic, especially in the context of AI usage, which raises a series of ethical questions related to consent, data access, algorithmic bias, and equity.
Considering these gaps, the research directions recommended by the authors of this paper are as follows:
The creation of standardized open-access databases in which the collected and labeled data are documented, so that as many ML studies as possible can be carried out in the geriatric field.
Studying the explainability of algorithms through XAI methods such as SHapley Additive exPlanations (SHAP), Local Interpretable Model-agnostic Explanations (LIME), or other XAI techniques.
The implementation of prospective studies that validate the models over time and track the impact on patients’ quality of life, thus extending the studies generated by ML algorithms over a span of decades.
Interdisciplinary collaborations between engineers, programmers, geriatricians, psychologists, and medical ethics experts.
Designing integrated AI systems with interfaces that can be used by medical staff without requiring expertise in programming or data science.
Synthesizing the results analyzed in this paper, it is found that ML algorithms influence how health policies for the elderly are approached. ML models can triage patients in emergency units when the healthcare system is overloaded. With the help of these algorithms, patients with chronic diseases can be monitored through wearable devices and the sensors connected to them [91]. These algorithms can also support the allocation of medical resources (beds, personnel, and equipment) based on predictions of how patients’ health status will evolve, and they can personalize treatment by taking into account historical data, laboratory tests, lifestyle, and comorbidities. For these benefits to materialize, ML algorithms need to be integrated into national electronic health systems, and clinical guidelines that cover their use need to be developed. Medical personnel also require proper training in interpreting AI-generated results, and a legal and ethical framework for data protection must be ensured.
Some ML applications in the literature reviewed have demonstrated quantifiable clinical benefits, such as reducing the hospital stay duration. For example, in the study conducted by Tian et al. [27] on geriatric patients with hip fractures, the XGBoost model predicted the length of hospital stay with an accuracy of 92.4% and an AUC of 98.8%. Such predictions can help optimize the allocation of medical resources. Another example is provided by Früh et al. [42], who correlated ML scores with the probability of prolonged hospitalization. This approach supports early intervention, which can prevent complications from arising. Such results indicate that ML models can affect clinical workflows in geriatric care.
In the field of geriatrics, some patients suffer from cognitive impairments, which render them unable to provide informed consent regarding the use of AI. For this reason, the patient or their relatives need to be informed that personal data will be processed through ML models. On the other hand, algorithms do not replace human clinical decision-making. These models can amplify unintentional discrimination if they are trained on biased historical data, for example data reflecting treatment inequalities based on age or gender. The lack of transparency of models that operate as black boxes reduces the trust of the doctor or the patient in the final decision. For this reason, the professional must always be the one who makes the final decision when ML algorithms are used.
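As an illustration of how bias related to age or gender could be checked, the following minimal sketch compares a model's sensitivity across patient subgroups. The function name, the age bands, and the labels are hypothetical and invented for this example; they are not taken from any of the reviewed studies.

```python
from collections import defaultdict


def sensitivity_by_group(y_true, y_pred, groups):
    """Return sensitivity (recall) per subgroup; None if a group has no positives."""
    true_pos = defaultdict(int)
    false_neg = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            if pred == 1:
                true_pos[group] += 1
            else:
                false_neg[group] += 1
    result = {}
    for group in set(groups):
        positives = true_pos[group] + false_neg[group]
        result[group] = true_pos[group] / positives if positives else None
    return result


# Made-up labels and predictions, purely illustrative (1 = event, 0 = no event).
y_true = [1, 1, 1, 1, 0, 1, 1, 1, 1, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0, 1, 0]
groups = ["65-79"] * 5 + ["80+"] * 5  # hypothetical age bands

rates = sensitivity_by_group(y_true, y_pred, groups)
observed = [rate for rate in rates.values() if rate is not None]
gap = max(observed) - min(observed)  # difference between best and worst subgroup
print(rates, gap)
```

A large gap between subgroups does not by itself prove discrimination, but it may indicate where a closer clinical and ethical review is needed.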
Regarding patient personal data, in Europe, the General Data Protection Regulation (GDPR) restricts the processing of sensitive data, especially if it is of a medical nature. ML models that use such data must adhere to the principle of data minimization, provide explanations for algorithmic decisions in line with the right to an explanation associated with Article 22 of the GDPR, and keep the data confidential. In addition to the GDPR, the AI Act has been adopted, representing the first legal framework for AI that classifies medical applications as high-risk. According to this act, ML applications in geriatrics must be transparent, allow for auditability, and be subject to continuous clinical validation. Given these constraints, the authors’ ethical recommendations are as follows: explaining the algorithms to doctors; ensuring that decision support systems keep logs and a decision history in a format accessible to the doctor; accompanying each ML model with formal ethical evaluations, for example by Clinical Ethics Committees; and, last but not least, obtaining the consent of the patient or legal representative, who must be informed about how the ML algorithms will process their data.
ML applications intended for geriatrics that are designed for GDPR compliance should include technical measures that allow developers to pseudonymize patient data, so that patients cannot be identified from the analyzed data alone. Another provision should be limiting access to patient datasets: these should be accessed only by authorized researchers, as determined by permission control. Periodic auditing of algorithmic decisions should be another measure, used to identify potential forms of bias. A further norm should be ensuring the right to an explanation under the GDPR. Methods like SHAP or LIME give medical staff access to the reasoning behind an algorithm’s decision; this transparency should be mandatory in the development of ML algorithms. Finally, obtaining consent, accompanied by a description of how the algorithms process data, should be another mandatory norm.
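The pseudonymization measure mentioned above can be sketched as follows. This is a minimal example that assumes a keyed hash (HMAC-SHA256) is an acceptable pseudonymization technique for the dataset in question; the key, the record fields, and the identifiers are hypothetical. Whoever holds the key can still re-link tokens to identifiers, so the key would need to be stored separately from the data and under access control.

```python
import hashlib
import hmac


def pseudonymize(patient_id: str, secret_key: bytes) -> str:
    """Map a patient identifier to a stable keyed token (HMAC-SHA256, hex encoded)."""
    return hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical key: in practice it would be kept outside the research dataset.
SECRET_KEY = b"replace-with-a-key-kept-outside-the-dataset"

# Made-up records; field names are illustrative only.
records = [
    {"patient_id": "P-000123", "age": 81, "frailty_score": 4},
    {"patient_id": "P-000123", "age": 81, "frailty_score": 5},
    {"patient_id": "P-000456", "age": 77, "frailty_score": 2},
]

# Replace the direct identifier with its token and keep only the fields
# needed for analysis (data minimization).
pseudonymized = [
    {
        "pid": pseudonymize(record["patient_id"], SECRET_KEY),
        "age": record["age"],
        "frailty_score": record["frailty_score"],
    }
    for record in records
]

for row in pseudonymized:
    print(row)
```

Because the same identifier always yields the same token, repeated visits by one patient can still be linked for longitudinal analysis without exposing the original identifier.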
In this systematic review, both internal and external validity were considered to support the robustness of the conclusions. The paper used a study selection methodology and a content analysis conducted by the authors based on their own expertise. This section explicitly discusses the validity dimension, highlighting the limitations and generalizability of the results. Internal validity is associated with the logical coherence between the objectives of the paper, the inclusion criteria, the analysis method, and the conclusions drawn. In this study, internal validity is supported by a search strategy defined according to WOS rules, including terms, bulleted lists, and the temporal delimitation of 1 January 2020 to 31 May 2025. The selection of articles is based on eligibility criteria and on dual analytical methods, qualitative and quantitative. The analysis of performance metrics supports the objectivity of the evaluations of ML models used in the geriatric context. External validity refers to the extent to which conclusions generalize to other contexts; it is limited by factors such as the studies’ origins in developed countries with an advanced IT infrastructure and the frequently heterogeneous data, which hinder the reproducibility of models because of the lack of common standardization.
The implementation of ML algorithms in geriatrics faces an obstacle in the form of low trust from medical staff in black box models. Doctors are hesitant to use tools whose internal mechanism is not explained in clinical terms. Another problem is that the algorithms’ decisions cannot be justified to the patient. To overcome this obstacle, the authors recommend integrating explainability techniques such as SHAP or LIME, which interpret algorithmic decisions for clinicians. At the same time, ML models should not be seen as replacements for medical decision-making. The authors recommend using these tools as complementary methods that assist specialists in analyzing data. Human–machine collaboration, in which the final decision remains with the medical professional, aids clinical acceptance and adherence to ethical principles. The development of explainable interfaces to support clinical validation is a prerequisite for integrating AI components into geriatric practice.
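To make the idea behind SHAP more concrete, the sketch below computes exact Shapley values for a toy risk score by enumerating all feature subsets. It is not the API of the SHAP library, which relies on approximations to scale to real models; the scoring function, feature names, and patient values are hypothetical and have no clinical meaning.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, instance, baseline):
    """Exact Shapley attribution of model(instance) - model(baseline) to each feature."""
    names = list(instance)
    n = len(names)

    def value(subset):
        # Features in `subset` take the patient's values, the rest the baseline's.
        mixed = {k: (instance[k] if k in subset else baseline[k]) for k in names}
        return model(mixed)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                total += weight * (value(coalition | {name}) - value(coalition))
        phi[name] = total
    return phi


def toy_risk(x):
    # Hypothetical score with an interaction term; not a validated clinical model.
    return (0.02 * x["age"]
            + 0.3 * x["falls_last_year"]
            + 0.1 * x["falls_last_year"] * x["polypharmacy"])


instance = {"age": 85, "falls_last_year": 2, "polypharmacy": 1}  # made-up patient
baseline = {"age": 75, "falls_last_year": 0, "polypharmacy": 0}  # made-up reference

phi = shapley_values(toy_risk, instance, baseline)
for feature, contribution in phi.items():
    print(f"{feature}: {contribution:+.3f}")
```

The contributions sum to the difference between the patient's score and the baseline score, which is the property that lets a clinician read each value as that feature's share of the prediction.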
The gaps identified in the specialized literature can be summarized as follows: the lack of standardization of datasets, the lack of explanations for algorithmic decisions (models are treated as black boxes), the lack of longitudinal studies, the reduced integration into clinical practice, and the lack of regulations regarding ethics and data protection in AI-assisted geriatrics.
6. Conclusions
The paper is a systematic review of studies published between 1 January 2020 and 31 May 2025 that investigate ML algorithms in the field of geriatrics. The analysis reveals that the field is in an expansion phase, with an annual increase in publications of approximately 18%. Interest in this field is particularly strong among researchers from the USA, China, Japan, and Germany.
A comparative analysis of the four databases revealed that PubMed provides the most results for research related to the field of geriatrics. Scopus stands out with its academic research-based approach, offering competitive results in the Elderly category. WOS occupies an intermediate position, with fewer results than PubMed or Scopus but a roughly constant number across all categories. IEEE is the most restricted platform, returning a relatively small number of research papers because of its focus on the technological field. Overall, the comparative analysis highlighted the very small number of results from collaborations between geriatric specialists and those in the field of ML.
Among the analyzed algorithms, the most used is RF, which appears in over 52% of the studies analyzed, followed by XGBoost with 24%, SVM with 16%, and other models such as KNN and Naïve Bayes. The algorithms are used in various areas of geriatrics, such as predicting postoperative mortality, where they achieve an accuracy of up to 92.4%, detecting falls with an accuracy of up to 99%, analyzing nutritional status with a sensitivity of 91%, and classifying the level of frailty with an accuracy of 95%.
The evaluation of algorithm performance is carried out using standard metrics: accuracy, precision, recall (sensitivity), F1-score, and AUC-ROC. Regarding accuracy, studies have reported values between 57% and 99.4%, with an average of 81.1%. Sensitivity is reported between 68% and 97%, with an average of 91%. The F1-score is reported between 73% and 94%, while the AUC-ROC values range from 62.5% to 98.8%, suggesting good predictive capability of the models, especially in oncology and trauma. These results indicate the ability of ML algorithms to support the early identification, prediction, and classification of diseases in the field of geriatrics.
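For readers less familiar with these metrics, the following minimal sketch shows how they are computed for a binary classifier. The labels and scores are made up for illustration and do not come from any reviewed study; the 0.5 decision threshold is likewise an assumption.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall (sensitivity), and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def auc_roc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    # Ties between a positive and a negative score count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Made-up ground truth and model scores, purely illustrative.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.35, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

metrics = classification_metrics(y_true, y_pred)
auc = auc_roc(y_true, scores)
print(metrics, round(auc, 3))
```

Accuracy depends on the chosen threshold and on class balance, whereas AUC-ROC summarizes ranking quality across all thresholds, which is one reason the two can differ noticeably between studies.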
In addition to these results, five gaps in the current literature are noted, namely, the lack of standardization of the datasets, the reduced explainability of the models, the absence of longitudinal studies, the low implementation in clinical practice, and the lack of explicit treatment of ethical and privacy issues. Based on these findings, the authors recommend the creation of open databases labeled with metadata that can be studied in various research contexts, the explicit integration of algorithmic explanations into clinical decision-making processes, the conduct of prospective studies with temporal validation and long-term patient follow-up, interdisciplinary collaboration between engineers, bioethicists, and public policy specialists, and the adaptation of legal frameworks such as the GDPR and the AI Act to support the use of AI in geriatrics.
In conclusion, the paper shows that ML algorithms contribute to personalization, prediction, prevention, and the adoption of tailored means of elderly care. With an average accuracy of over 80% in most clinical scenarios, these algorithms can assist medical decision-making and help optimize resource allocation in order to improve patients’ quality of life. For the integration of these models into health systems, technical validation and responsible approaches from ethical, legal, and social perspectives are necessary, aspects that can only be addressed through continued research in this field.