Article

Machine Learning Applied to NHS Electronic Staff Records Identifies Key Areas of Focus for Staff Retention

1 Ashford and St. Peter’s Hospitals NHS Foundation Trust, Ashford TW15 3AA, UK
2 School of Health Sciences, Faculty of Health and Medical Sciences, University of Surrey, Guildford GU2 7XH, UK
* Author to whom correspondence should be addressed.
Adm. Sci. 2025, 15(8), 297; https://doi.org/10.3390/admsci15080297
Submission received: 19 June 2025 / Revised: 22 July 2025 / Accepted: 23 July 2025 / Published: 29 July 2025

Abstract

Background: In this work, we examine determinants of staff departure rates in the NHS, a critical issue for workforce stability and continuity of care. High turnover, particularly among clinical staff, undermines service delivery and incurs substantial replacement costs. Methods: We analyse a unique dataset derived from Electronic Staff Records at Ashford and St. Peter’s NHS Foundation Trust, using a machine learning approach that moves beyond traditional survey-based methods to assess propensity to leave. Results: In addition to established predictors such as salary and length of service, we identify drivers of increased risk of staff exit, including the distance between home and workplace and, especially for medical staff, cost centre vacancy rates. Conclusions: These findings highlight the multifactorial nature of staff retention and suggest that local administrative data could improve workforce planning, for example, through hyperlocal recruitment strategies. Whilst further work will be required to assess the generalisability of our findings beyond a single Trust, our analysis offers insights for NHS managers seeking to stabilise staffing levels and reduce attrition through targeted interventions beyond pay and tenure.

1. Introduction

Retention planning is a critical component of healthcare workforce management, recognised globally as a key prerequisite for high-quality patient care. In the UK, the NHS faces challenges in maintaining a stable workforce (Moscelli et al., 2024), with staff and resource shortages, job demands, work-related stress, the inability to deliver care according to professional standards, discrimination, and poor psychological health identified as key factors contributing to turnover (Leary et al., 2024; A. Weyman et al., 2023; A. K. Weyman et al., 2019). High turnover rates among healthcare professionals lead to disruptions in care delivery, increased workloads for remaining staff, and low staff morale. The NHS currently faces all of these challenges, exacerbated by an ageing workforce. Together, they risk the loss of experienced personnel, eroding institutional knowledge, compromising team dynamics, and weakening the culture of safety that is essential in healthcare environments (McHugh et al., 2021; Needleman et al., 2002).
Retention is a key focus of the NHS Long Term Workforce Plan (NHS England, 2023), which aims to drive improvements in organisational culture, leadership, and staff wellbeing, building on the principles set out in the NHS People Promise (NHS England, 2020). Effective retention strategies are essential for NHS organisations, both to reduce recruitment and onboarding costs and to support continuity of care, improve patient outcomes, and maintain organisational morale (Tikhonovsky et al., 2023). However, such strategies need to be monitored and adjusted, because some of the factors involved are national in nature while others are local and organisational (Kelly et al., 2022). In addition, understanding which factors act as early warning signals of individual staff members’ intention to leave can help local managers retain staff through more targeted interventions or support than a ‘one-size-fits-all’ retention strategy can deliver.
Two sets of organisational factors influencing employee job satisfaction have previously been proposed: extrinsic “hygiene factors” (salary, working conditions, etc.), which need to be adequately addressed to prevent discontent, and intrinsic “motivation factors” (development, recognition, etc.), which increase long-term satisfaction and motivation (Herzberg, 2005; Sachau, 2007). Much work has been done on understanding key factors relating to turnover and retention amongst healthcare staff (Ahmed et al., 2022; Bimpong et al., 2020; Leary et al., 2024; Moscelli et al., 2025; A. Weyman et al., 2023; A. K. Weyman et al., 2019), and this body of literature indicates an interplay of individual, job-related, interpersonal, and organisational factors, with specific findings differing by research methodology, the NHS staff groups included, and the causes considered. Many publications have been based on staff surveys and are therefore susceptible to subjectivity, survey fatigue, low completion rates, and recency bias (Bogner & Landrock, 2016; Byrne, 2022; McClendon, 1991), and they typically use declared intention to leave or stay in post as the outcome variable rather than actual departures. Other approaches to the analysis and prediction of staff departures have also been proposed (Bolt et al., 2022), for example, the use of fluctuation theory to understand and predict turnover. Such an approach can incorporate a full range of measures and indicators, including compensation, job satisfaction, external factors (including the job market), and life events, and model how these might feed into staff departures as a stochastic process. These more data-driven approaches can also allow for tipping-point thresholds, such as when initial departures lead to a later surge in turnover (Nyberg & Ployhart, 2013).
The potential linkage of turnover to data often already held in staff records is underpinned by extensive prior research in the field of electronic Human Resource Management (e-HRM) (Gardner et al., 2003; Strohmeier, 2007). In this work, we take a data-driven approach to identifying factors that influence retention, using Electronic Staff Record (ESR) data from Ashford and St. Peter’s NHS Foundation Trust (ASPH). Our aim is to explore how organisational-level data, covering all staff groups and including individual, job-related, and organisational factors together with whether each individual stayed in or left the organisation, could inform retention policies, and whether such data can identify which employees may be most at risk of departure. By demonstrating in a small pilot study that data-driven approaches can yield insight and contribute to an e-HRM-supported understanding of staff turnover, we seek to contribute to a framework for improving retention strategies in the NHS.

2. Materials and Methods

2.1. Dataset and Comparator Data

Staff data were retrieved from ASPH’s ESR and anonymised. ASPH is based in the southeast of England and operates across two main sites: St Peter’s Hospital in Chertsey and Ashford Hospital in Ashford. It is a medium-sized general trust serving a population of around 410,000 people. Records of all staff employed between 31 March 2018 and 31 March 2024 were included, except for (1) staff who ceased employment due to redundancy, death, or the end of non-executive contracts for board members, and (2) rotational resident doctors, as these cycle through placements every 6–12 months. Staff were categorised as Medical, Infrastructure (estates and facilities staff, admin and clerical, and managers), Nursing/Midwives, Qualified STT (scientific, therapeutic and technical, covering all clinical staff other than doctors or nurses) and Clinical Support (including inter alia healthcare assistants, therapy assistants, phlebotomists and similar roles). The dependent variable was whether or not a staff member left the Trust within the following 12 months, using a 31 March start date in each year. Only data that would have been available at the time of each 31 March snapshot were included in the model, to avoid contamination with future information, and the model was run once for each year. Because employee retention is a ‘repeated measures’ problem, cumulative years in service (effectively, the total number of years of prior successful retention) were included as a covariate. The variables captured are summarised in Table 1; the source for all data was the Trust’s ESR.
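The annual snapshot design can be illustrated with a short Python sketch. This is not the authors’ pipeline: the `StaffRecord` fields, the `annual_snapshots` helper and the example records are hypothetical, the exclusion rules above are assumed to have been applied beforehand, and only one feature (completed years in service) is derived. The point it shows is that each row uses only information known on its 31 March snapshot date, while the label looks forward 12 months.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class StaffRecord:
    # Hypothetical, simplified ESR extract; exclusions assumed already applied.
    staff_id: str             # anonymised identifier
    start_date: date          # start of employment with the Trust
    end_date: Optional[date]  # None if still in post


def annual_snapshots(records: List[StaffRecord], first_year: int = 2018,
                     last_year: int = 2023) -> List[dict]:
    """One row per person in post on each 31 March, labelled by departure in the next 12 months."""
    rows = []
    for year in range(first_year, last_year + 1):
        snapshot = date(year, 3, 31)
        horizon = date(year + 1, 3, 31)
        for r in records:
            in_post = r.start_date <= snapshot and (r.end_date is None or r.end_date > snapshot)
            if not in_post:
                continue
            rows.append({
                "staff_id": r.staff_id,
                "snapshot_year": year,
                # Feature computed only from information available on the snapshot date.
                "years_in_service": (snapshot - r.start_date).days // 365,
                # Label looks forward: did the person leave within the next 12 months?
                "left_within_12m": int(r.end_date is not None and r.end_date <= horizon),
            })
    return rows


# Illustrative (made-up) records: A leaves in September 2019, B is still in post.
example = [
    StaffRecord("A", date(2015, 6, 1), date(2019, 9, 30)),
    StaffRecord("B", date(2018, 1, 15), None),
]
rows = annual_snapshots(example)
```

In this toy example, person A contributes two rows (2018, labelled 0; 2019, labelled 1) and person B six rows labelled 0, which is why the same individual can appear in several annual models and why the number of observations exceeds the number of individuals.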
In total, 8753 individual records were retrieved for the dataset. Of these, 1330 were Medical staff, 2478 were Infrastructure staff, 2655 were Registered Nurses & Midwives, and 2290 were in other categories (including Qualified STT and Clinical Support). In aggregate, across the series of annual models, there were 48,293 observations.
Comparator data for Ashford and St. Peter’s Hospitals NHS Foundation Trust were taken from NHS Workforce Statistics, published monthly by NHS England (NHS England, 2024).

2.2. Machine Learning Methodology: Supervised Binary Classification

Following data extraction from ASPH Electronic Staff Records, a data matrix was constructed for machine learning, with anonymised staff-in-post identifiers as rows and predictor variables as columns. All data were de-identified before analysis to preserve anonymity. An XGBoost (eXtreme Gradient Boosting) binary classifier was chosen for its interpretability (for example, readily understood feature importance) and its ability to handle multicollinearity and non-linear relationships. An 80:20 training/testing split was used for model training and evaluation, with the random seed set to 42. Grid search cross-validation was employed to systematically search the hyperparameter space and identify the best combination of hyperparameters. Model performance was evaluated using overall accuracy, sensitivity, specificity, and the area under the receiver operating characteristic curve (AUC). Accuracy is the proportion of correct predictions across all instances, providing an overall measure of model correctness. Sensitivity (also known as recall) quantifies the model’s ability to correctly identify individuals who departed, reflecting its effectiveness in detecting true positive cases. Specificity measures the model’s ability to correctly identify individuals who remained employed, indicating its capacity to minimise false positives. The AUC summarises the model’s discriminative ability across all classification thresholds, with higher values indicating better overall performance in distinguishing between staff who left and those who did not. All measures were reported as proportions from 0 to 1. To analyse the impact of individual features on the likelihood of a staff member leaving, feature importance within the model was assessed using SHAP values.
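The three threshold-dependent measures can be made concrete with a minimal plain-Python sketch that computes them from a confusion matrix. It is illustrative only: leaving (1) is treated as the positive class, the 0.5 decision threshold and the toy labels are assumptions, and in practice scikit-learn’s `confusion_matrix` and `recall_score` would typically be used instead.

```python
from typing import Dict, List, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity, with 1 = left and 0 = stayed."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # leavers correctly flagged
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # stayers correctly cleared
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # stayers wrongly flagged
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # leavers missed
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


def to_labels(probabilities: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Convert predicted leaving probabilities to 0/1 predictions (0.5 is an assumed default)."""
    return [int(p >= threshold) for p in probabilities]


y_true = [1, 1, 0, 0, 0]
y_pred = to_labels([0.8, 0.3, 0.2, 0.4, 0.6])
metrics = classification_metrics(y_true, y_pred)
```

With only around one in nine staff leaving in a typical year (the Trust’s average leaver rate was 11%), a model that predicted almost no departures could still achieve high accuracy, which is why sensitivity, specificity and AUC are reported alongside it.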
SHAP (SHapley Additive exPlanations) values quantify each feature’s contribution to a specific model prediction by distributing the difference between that prediction and a baseline (the model’s expected output) among the features. A positive SHAP value indicates that the feature pushes the model output toward a higher risk of departure, whereas a negative SHAP value indicates that it reduces the predicted risk. The SHAP values were then visualised in a beeswarm plot, providing an assessment of both overall feature importance and the direction of each feature’s effect. Finally, partial dependence plots were generated. A partial dependence plot shows how the average predicted outcome changes as one input feature is varied, with the model’s predictions averaged over the observed values of all other features. This provides an interpretable curve showing the marginal effect of that feature on the model’s output.
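The additive property of SHAP values can be checked on a toy example. The sketch below computes exact Shapley values by enumerating every coalition of features, with features outside a coalition set to a single baseline point; this brute-force approach is only practical for a handful of features, and for tree ensembles such as XGBoost the shap library’s `TreeExplainer` uses algorithms specialised for trees instead. `toy_model` and its coefficients are hypothetical and are not the fitted model.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def exact_shapley(f: Callable[[List[float]], float], x: Sequence[float],
                  baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values by enumerating all feature coalitions (feasible only for a few features).

    Features outside a coalition are set to their baseline value.
    """
    n = len(x)

    def value(coalition: set) -> float:
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes 0..n-1, drawn from the other features
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                # Marginal contribution of feature i when added to coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Toy risk score with an interaction term (hypothetical, not the fitted model).
def toy_model(z: List[float]) -> float:
    return 0.10 + 0.30 * z[0] - 0.20 * z[1] + 0.05 * z[0] * z[2]


x = [2.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = exact_shapley(toy_model, x, baseline)
```

The three values sum to the difference between the predictions for `x` and for the baseline (0.5 here), and the interaction term’s contribution is split equally between the two features involved.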
All data were analysed in the Python programming language (version 3.14), with the XGBoost classifier and partial dependence plots implemented alongside the scikit-learn library (version 1.6.1) (Pedregosa et al., 2011).

3. Results

3.1. Overall Retention and Departure Data

Ashford and St. Peter’s NHS Foundation Trust had an average leaver rate of 11% over the period from 2018 to 2024. Whilst detailed comparator data were not available from NHS Workforce Statistics for the complete timeframe, the Trust’s leaving rates from 2021 to 2024 were close to the average of its ten closest ‘recommended peers’ in the NHS Model Health System, that is, the Trusts with the most similar attributes across variables and metrics covering, inter alia, finances, estates, workforce, population, and clinical factors, as summarised in Figure 1.

3.2. Machine Learning Classification Results

An XGBoost model was used to identify the most important factors and breakpoints in predicting the likelihood of a staff member leaving the Trust within the following 12 months. Key metrics for the XGBoost classification model, in predicting whether an individual staff member would leave over the next twelve months in the held-out test set of ESRs, are shown in Table 2.
Overall, the model showed only a modest ability to predict departures, with an area under the curve (AUC) score of 0.65, where a score of 1.00 would be a perfect prediction and a score of 0.50 would represent random chance.
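The AUC has a convenient interpretation: it is the probability that a randomly chosen leaver receives a higher predicted risk than a randomly chosen stayer, with ties counted as one half. An AUC of 0.65 therefore means the model ranks the leaver above the stayer in roughly 65% of such pairs. The pairwise sketch below, with toy scores, computes the statistic directly; it is quadratic in the number of staff and meant only for illustration, whereas scikit-learn’s `roc_auc_score` computes the same quantity efficiently.

```python
from typing import Sequence


def auc_by_pairs(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random leaver (label 1) outranks a random stayer (label 0)."""
    leavers = [s for s, y in zip(scores, labels) if y == 1]
    stayers = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in leavers:
        for q in stayers:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(leavers) * len(stayers))


toy_scores = [0.9, 0.4, 0.6, 0.2]
toy_labels = [1, 1, 0, 0]
auc = auc_by_pairs(toy_scores, toy_labels)  # 3 of 4 leaver-stayer pairs ranked correctly
```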

3.3. Key Predictors

The ten variables used as predictors of a staff member leaving the Trust are ranked by feature importance in Figure 2. The single most important predictor was length of service, followed by age. Distance from home to work was the third most important predictor.
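Since feature importance was assessed with SHAP values, one common way to turn them into a ranking, assumed in the sketch below because the exact aggregation behind Figure 2 is not stated, is the mean absolute SHAP value of each feature across observations. This measures the typical magnitude of a feature’s influence regardless of direction. The feature names and values are placeholders, not results from this study.

```python
from typing import List, Sequence, Tuple


def rank_by_mean_abs_shap(shap_values: Sequence[Sequence[float]],
                          feature_names: Sequence[str]) -> List[Tuple[str, float]]:
    """Rank features by mean absolute SHAP value across observations (largest first)."""
    n_rows = len(shap_values)
    importance = [
        sum(abs(row[j]) for row in shap_values) / n_rows
        for j in range(len(feature_names))
    ]
    return sorted(zip(feature_names, importance), key=lambda item: item[1], reverse=True)


# Toy SHAP matrix: rows are observations, columns are placeholder features.
toy_shap = [
    [0.20, -0.10, 0.00],
    [-0.40, 0.05, 0.01],
]
ranking = rank_by_mean_abs_shap(toy_shap, ["feature_a", "feature_b", "feature_c"])
```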
As feature importance does not fully capture the complexity of some relationships, partial dependence plots for each predictor variable are shown in Figure 3 for the overall population of ASPH staff. A partial dependence plot varies one variable while averaging the model’s predictions over the observed values of the other variables, showing how the measured variable on its own influences the predicted probability of departure over the next 12 months.
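Strictly, a partial dependence curve averages the model’s predictions over the observed values of the other features rather than fixing them at one value. The sketch below follows this brute-force definition (the same idea as `sklearn.inspection.partial_dependence` with `kind="average"`): for each grid value, the chosen feature is overwritten in every observation and the predictions are averaged. `toy_predict` and the data are hypothetical.

```python
from typing import Callable, List, Sequence


def partial_dependence_curve(predict: Callable[[List[float]], float],
                             X: Sequence[Sequence[float]],
                             feature: int,
                             grid: Sequence[float]) -> List[float]:
    """Average prediction when `feature` is set to each grid value for every observation.

    The other features keep their observed values, so the curve averages over
    their distribution rather than fixing them at a single point.
    """
    curve = []
    for value in grid:
        preds = []
        for row in X:
            modified = list(row)
            modified[feature] = value  # overwrite only the feature of interest
            preds.append(predict(modified))
        curve.append(sum(preds) / len(preds))
    return curve


def toy_predict(row: List[float]) -> float:
    # Hypothetical risk model, for illustration only.
    return 0.10 + 0.05 * row[0] + 0.01 * row[1]


X = [[0.0, 1.0], [4.0, 3.0]]
curve = partial_dependence_curve(toy_predict, X, feature=0, grid=[0.0, 1.0, 2.0])
```

Because the other features are averaged rather than held fixed, correlated predictors such as age and length of service can make individual curves harder to interpret, which is one reason to read them alongside the SHAP results.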
A number of clear relationships relating to individual staff characteristics emerged, which were largely consistent across staff types. Longer length of service was associated with a lower likelihood of departure. Age showed a U-shaped relationship with the probability of departure, with higher probabilities of leaving for both younger and older employees. The number of days off work due to sickness in the previous year also showed a relationship with leaving probability, albeit modest, with an increase noticeable at 8 days. Salary also played a role, with the probability of leaving declining as remuneration (and probably seniority) increased. Finally, the distance from home to work showed a notably strong relationship with the probability of leaving, with a step-change increase at distances over five miles.
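Step-changes such as the one at five miles are read from the partial dependence curve. Because tree ensembles split on thresholds, their partial dependence curves are piecewise constant, so locating the largest jump between adjacent grid points is a natural summary. The helper below is a hypothetical sketch, and the values are made up rather than taken from the study’s estimates.

```python
from typing import Sequence, Tuple


def largest_step(grid: Sequence[float], curve: Sequence[float]) -> Tuple[float, float, float]:
    """Return (left grid value, right grid value, jump) for the largest increase between neighbours."""
    best = (grid[0], grid[1], curve[1] - curve[0])
    for i in range(1, len(grid) - 1):
        jump = curve[i + 1] - curve[i]
        if jump > best[2]:
            best = (grid[i], grid[i + 1], jump)
    return best


# Made-up partial dependence values over distance in miles (not the study's estimates).
miles = [1, 2, 3, 4, 5, 6, 7]
pd_values = [0.10, 0.10, 0.11, 0.11, 0.11, 0.16, 0.16]
step = largest_step(miles, pd_values)
```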
Turning to environmental characteristics, cost centre vacancies exhibited a weak overall relationship with leaver probability. Line manager time in post also showed a weak relationship with leaver probability, with very short tenures linked to a slight increase in departures but little effect beyond this point. The year of departure had a modest impact, with reduced rates of leaving in 2019–2020 followed by an increase, likely reflecting pandemic-related issues. Finally, more days of training undertaken by line managers were associated with better retention (and, conversely, line managers not taking up training was associated with increased departure risk).
Data were also analysed for subsets of the population, specifically Medical staff (n = 1330), Nursing & Midwifery staff (n = 2655), Qualified STT/Clinical Support staff (n = 2290) and Infrastructure staff (n = 2478). For some variables, the trends were similar between employee groups. There were, however, more substantial differences between staff categories for (1) cost centre vacancies and (2) distance to work, shown in Figures 4 and 5. The weak overall relationship between vacancy rates and the probability of departure masked variation across staff groups. For Infrastructure and Nursing & Midwifery staff, vacancy rates had little impact on leaving probability, whereas Medical staff showed a marked rise in the probability of leaving as cost centre vacancies increased. For distance to work, the overall relationship again masked larger differences, with Medical staff being relatively less sensitive to distance.
Additional partial dependence plots showed less variation between staff groups but are provided for completeness in the Supplementary Materials.

4. Discussion

In this study, we show which factors contribute most to increased probabilities of staff departure. The work is not intended to suggest that machine learning can be used to predict individual reasons for staff leaving. In this sense, the algorithm is similar to car insurance models: it can identify factors that increase the likelihood or risk of an event but will not perform well in forecasting outcomes for specific individuals. The overall model showed modest predictive capability, with an AUC of 0.65. Nonetheless, although many individual-specific factors cannot practicably be included in a model, the machine learning approach used in this work did show which variables tended to increase the risk of departure.
Length of service showed the strongest feature importance with regard to staff retention, with newly qualified and new-in-post staff being particularly at risk. Age also played a significant role, showing a U-shaped relationship: younger staff are likely to be more mobile and less settled, whereas older staff may face challenges and/or issues relating to pension arrangements. These relationships are already well described in the literature (Moscelli et al., 2024; Raman et al., 2024; Taylor et al., 2024). The distance from home to work showed a notably strong relationship with the probability of leaving and was the third most important variable in the model, with a step-change increase for the Trust’s staff at five miles. This cut-off point is likely to be related to the Trust’s geographical location just outside the administrative boundaries of Greater London and to the available transport links and transport costs, and would therefore presumably differ for other Trusts. Medical staff were the only group that was relatively insensitive to this variable, possibly because their higher salaries reduce the relative burden of transport costs. Nonetheless, this result indicates that Trusts could give greater focus to distance as a retention factor by targeting ‘hyperlocal’ recruitment, especially for non-Medical staff groups. Some NHS Trusts in the UK are actively pursuing hyperlocal recruitment: Leeds Community Healthcare NHS Trust successfully filled vacancies through a local recruitment campaign, and although its impact on retention has not been reported, other benefits, such as exceptionally high-quality applications and a positive impact on the local community through offering good work, have been noted (NHS Employers, 2023). It is also well understood that commuting time is an important contributor to work–life balance, which in turn influences employees’ commitment to the organisation and their wellbeing (Moscelli et al., 2025).
Issues such as these can receive less attention than headline pay levels but can nonetheless contribute to cumulative dissatisfaction. Healthcare organisations based in rural areas are known to have particular difficulties with recruiting and retaining staff, and in these settings, recruiting locally increases retention (Abelsen et al., 2020). Conversely, Trusts operating in urban and suburban areas may be able to implement measures to address staff issues around commuting. The NHS 10 Year Health Plan (Department of Health and Social Care, 2025) recommends greater focus on recruiting staff in local communities, and our findings corroborate this as a key measure for improving retention.
The number of days off work due to sickness in the previous year also showed a relationship with leaving probability, albeit modest, with an initial increase with any sick leave and a further increase noticeable at 8 days. The relationship between sickness absence and the probability of leaving is well evidenced (Taylor et al., 2022); another study indicated that 3 days of absence within a month for mental health reasons increased the likelihood of nurses and midwives leaving the acute sector by 27%, and of consultants by 58%, compared with peers who did not take time off work (Kelly et al., 2022). Presenteeism may also mean that illness or stress is poorly captured by the number of days of sick leave actually taken (Taylor et al., 2022). ASPH uses 10 days of sickness absence over a rolling 12-month period as the trigger for formal sickness management, a threshold typical within NHS Trusts. The data analysed here indicate that the probability of departure increases before this point is reached, consistent with the fluctuation theory view that departures are not single decisions but an accumulation of micro-decisions driven by stress over time. This highlights the importance of supporting staff psychological health and suggests that HR managers should consider offering support to staff taking even a few days of sick leave, and that existing approaches based on a one-size-fits-all threshold should be adapted by staff type.
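The gap between the formal trigger and the earlier rise in departure risk can be expressed as two thresholds on the same rolling count. The sketch below is hypothetical: it assumes sickness absence is available as a list of calendar dates, counts distinct days in a rolling 365-day window, and uses 8 days (where the partial dependence plots show an increase) purely as an illustrative earlier prompt for offering support; it is not an existing ASPH or ESR rule.

```python
from datetime import date, timedelta
from typing import Iterable


def absence_days_in_window(absence_dates: Iterable[date], as_of: date,
                           window_days: int = 365) -> int:
    """Count distinct sickness-absence days in the rolling window ending on `as_of`."""
    start = as_of - timedelta(days=window_days)
    # The window covers the `window_days` days up to and including `as_of`.
    return sum(1 for d in set(absence_dates) if start < d <= as_of)


def reaches_threshold(absence_dates: Iterable[date], as_of: date, threshold: int) -> bool:
    """True if the rolling-window count has reached `threshold` days."""
    return absence_days_in_window(absence_dates, as_of) >= threshold


FORMAL_TRIGGER_DAYS = 10  # formal sickness-management trigger described in the text
EARLY_SUPPORT_DAYS = 8    # hypothetical earlier prompt for a supportive conversation

# Toy record: eight days off in January 2024, plus one day outside the window.
absences = [date(2024, 1, day) for day in range(8, 16)] + [date(2023, 1, 5)]
as_of = date(2024, 3, 31)
formal = reaches_threshold(absences, as_of, FORMAL_TRIGGER_DAYS)  # False: 8 < 10
early = reaches_threshold(absences, as_of, EARLY_SUPPORT_DAYS)    # True: 8 >= 8
```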
Unlike other professional groups, medical staff showed a marked rise in the probability of leaving as the proportion of vacancies in their clinical area increased, indicating that overall staffing levels within their area of practice have a greater impact on retention risks for doctors than for others, potentially because their overall clinical responsibility is more demanding. An association between senior doctor retention and the retention of co-worker nurses, particularly at pay bands 6–7, which are operational roles, has been reported previously (Moscelli et al., 2025). High vacancy levels are often remedied by temporary staffing, which can be disruptive for permanent staff and increase their propensity to leave (Bajorek & Guest, 2019; Oliveira et al., 2023). The association may also reflect tighter job markets for medical staff, which make it easier for doctors to find new roles and leave when cost centre vacancy rates are high (under-resourcing itself creates job opportunities as gaps need to be filled, a zero-sum game for NHS Trusts overall). Cost centre vacancy rates are, of course, already measured by NHS managers, but these results illustrate the potential for differential treatment of specific staff groups to improve retention, as well as the importance of not becoming reliant on temporary staffing.
The probability of leaving decreased with line manager tenure and with the number of days of management training that an employee’s line manager had undertaken in the preceding 36 months. Although the measured line manager effect is modest compared with other factors, it is a manageable variable. Leadership in acute healthcare settings is complex and requires a multitude of competencies; investing in management training and supporting staff to develop as leaders could improve working relationships and conditions. A UK-wide study of emergency department staff found that leaders did not have protected time to perform this role and were not supported to attend leadership training as part of their professional development (Daniels et al., 2024). The Messenger Review of Health and Care Leadership in England (Department of Health and Social Care, 2022) identified gaps in support for leaders and recommended more consistent and substantive career development pathways for them. This study’s findings support this as a strategy that could help improve staff retention.
The work described here has a number of limitations. First, data were provided on an anonymised and processed basis, with each year treated as an independent set of records, rather than being freely accessible within a Secure Data Environment, which limited the methods that could be applied. Second, whilst clear associations were shown, staff retention data are inherently noisy, and some staff will leave for reasons that cannot be measured or predicted through organisational datasets, such as sudden changes in personal circumstances or adverse team cultures. Multi-method approaches to data collection, such as integrating staff surveys or other sources, could improve model performance via a comprehensive e-HRM approach (for example, making greater use of fluctuation theory to model the contribution of environmental instability), but were not possible here. The model performed better than chance but tended to over-predict leaving rates, had a poor AUC overall, and would not be suitable for predictions about individual staff members (although some predicted leavers may have been persuaded to stay by management actions). The innate variability of staff retention reduces the statistical power of any analysis or algorithm and was likely made worse during the 2020–2022 COVID-19 pandemic period, in spite of efforts to manage these strains (Coster et al., 2022; Zasada et al., 2024). Third, because data were available from only a single Foundation Trust, it was not possible to construct well-powered datasets for smaller staff groups, such as therapists or technical staff; the use of single-trust data also limits the generalisability of the conclusions. Fourth, this work focused on easily accessible data from ESRs, and so the list of possible variables was constrained.
Whilst this choice was deliberate, in order to provide an analysis framework that could be replicated by any Foundation Trust without complex data integration, a broader analysis might have revealed other factors driving retention and allowed more extensive modelling of interaction effects. A larger study encompassing multiple Trusts would be well placed to address these limitations and to support further analyses, such as whether the influence of the independent variables (predictors) changed over time.

5. Conclusions

Here we have shown that analysis of electronic staff records can yield insight into predictors of staff retention and represents an important additional source of information alongside existing survey data, which by its nature tends to be backward-looking. Whilst this work is a pilot study of a single NHS Foundation Trust, the results are likely to reflect common issues and problems. However, future work across different NHS Foundation Trusts will be essential to ascertain the generalisability of these findings. An expanded study should also include longitudinal analysis of Trust staff retention using additional sources of data. For example, it may be possible to detect the influence of policy changes over time, achieve finer granularity of analysis for specific staff categories (paramedics, nurses, support staff, and others), or measure tipping points and interaction effects. Nonetheless, the findings presented here provide an initial demonstration of how data-driven methods, including machine learning, can yield insight for health planners.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/admsci15080297/s1: Figure S1: Partial dependence plots for length of service; Figure S2: Partial dependence plots for age (last birthday); Figure S3: Partial dependence plots for line manager (of staff member) training days; Figure S4: Partial dependence plots for absence (sickness); Figure S5: Partial dependence plots for supervisor time in role; Figure S6: Partial dependence plots for salary.

Author Contributions

Conceptualization, R.M. and M.S.; Data curation, R.M.; Funding acquisition, M.S.; Investigation, R.M., M.Z. and M.S.; Methodology, M.S.; Project administration, M.S.; Resources, R.M. and C.T.; Software, R.M.; Visualization, R.M. and M.S.; Writing—original draft, R.M., M.Z. and M.S.; Writing—review & editing, R.M., M.Z., C.T. and M.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Surrey Heartlands Workforce Alliance Innovation Fund.

Informed Consent Statement

Employee consent was not sought on the basis that the analysis undertaken was within the routine and expected operation of the organisation’s infrastructure, with no additional production or reporting of un-anonymised or unaggregated data.

Data Availability Statement

The dataset underpinning this work has not been made available to prevent individual data instances and members of staff being identified.

Conflicts of Interest

R.M. is an employee of Ashford and St. Peter’s NHS Foundation Trust.

Abbreviations

The following abbreviations are used in this manuscript:
ASPH: Ashford and St. Peter’s NHS Foundation Trust
AUC: Area under the curve
ESR: Electronic Staff Record
NHS: National Health Service
SHAP: SHapley Additive exPlanations
STT: Scientific, therapeutic and technical staff
XGBoost: eXtreme Gradient Boosting

References

  1. Abelsen, B., Strasser, R., Heaney, D., Berggren, P., Sigurðsson, S., Brandstorp, H., Wakegijig, J., Forsling, N., Moody-Corbett, P., Akearok, G. H., Mason, A., Savage, C., & Nicoll, P. (2020). Plan, recruit, retain: A framework for local healthcare organizations to achieve a stable remote rural workforce. Human Resources for Health, 18(1), 63.
  2. Ahmed, S., Hossain, M. A., & Shamszaman, Z. U. (2022, December 2–4). A statistical analysis of the staff data to evaluate the influence of the retention factors in the NHS England. 2022 14th International Conference on Software, Knowledge, Information Management and Applications (SKIMA) (pp. 251–260), Phnom Penh, Cambodia.
  3. Bajorek, Z., & Guest, D. (2019). The impact of temporary staff on permanent staff in accident and emergency departments. Journal of Organizational Effectiveness: People and Performance, 6(1), 2–18.
  4. Bimpong, K. A. A., Khan, A., Slight, R., Tolley, C. L., & Slight, S. P. (2020). Relationship between labour force satisfaction, wages and retention within the UK National Health Service: A systematic review of the literature. BMJ Open, 10(7), e034919.
  5. Bogner, K., & Landrock, U. (2016). Response biases in standardised surveys (GESIS survey guidelines) (Version 2.0). GESIS – Leibniz Institute for the Social Sciences.
  6. Bolt, E. E. T., Winterton, J., & Cafferkey, K. (2022). A century of labour turnover research: A systematic literature review. International Journal of Management Reviews, 24(4), 555–576.
  7. Byrne, Z. S. (2022). Understanding employee engagement: Theory, research, and practice (2nd ed.). Routledge.
  8. Coster, J., O’Hara, R., Glendinning, R., Nolan, P., Roy, D., & Weyman, A. (2022). PP38 impact of working through COVID-19 on ambulance staff resilience and intention to leave the NHS: A mixed methods study. Emergency Medicine Journal, 39(9), e5.
  9. Daniels, J., Robinson, E., Jenkinson, E., & Carlton, E. (2024). Perceived barriers and opportunities to improve working conditions and staff retention in emergency departments: A qualitative study. Emergency Medicine Journal: EMJ, 41(4), 257–265.
  10. Department of Health and Social Care. (2022). Health and social care review: Leadership for a collaborative and inclusive future. GOV.UK. Available online: https://www.gov.uk/government/publications/health-and-social-care-review-leadership-for-a-collaborative-and-inclusive-future (accessed on 11 June 2025).
  11. Department of Health and Social Care. (2025, July 15). 10 year health plan for England: Fit for the future. GOV.UK. Available online: https://www.gov.uk/government/publications/10-year-health-plan-for-england-fit-for-the-future (accessed on 15 July 2025).
  12. Gardner, S. D., Lepak, D. P., & Bartol, K. M. (2003). Virtual HR: The impact of information technology on the human resource professional. Journal of Vocational Behavior, 63(2), 159–179.
  13. Herzberg, F. (2005). Motivation-hygiene theory. In Organizational behavior 1. Routledge.
  14. Kelly, E., Stoye, G., & Warner, M. (2022). Factors associated with staff retention in the NHS acute sector. The IFS.
  15. Leary, A., Maxwell, E., Myers, R., & Punshon, G. (2024). Why are healthcare professionals leaving NHS roles? A secondary analysis of routinely collected data. Human Resources for Health, 22(1), 65.
  16. McClendon, M. (1991). Acquiescence and recency response-order effects in interview surveys. Sociological Methods & Research, 20(1), 60–103.
  17. McHugh, M. D., Aiken, L. H., Sloane, D. M., Windsor, C., Douglas, C., & Yates, P. (2021). Effects of nurse-to-patient ratio legislation on nurse staffing and patient mortality, readmissions, and length of stay: A prospective study in a panel of hospitals. The Lancet, 397(10288), 1905–1913.
  18. Moscelli, G., Nicodemo, C., Sayli, M., & Mello, M. (2024). Trends and determinants of clinical staff retention in the English NHS: A double retrospective cohort study. BMJ Open, 14(4), e078072.
  19. Moscelli, G., Sayli, M., Mello, M., & Vesperoni, A. (2025). Staff engagement, co-workers’ complementarity and employee retention: Evidence from English NHS hospitals. Economica, 92(365), 42–83.
  20. Needleman, J., Buerhaus, P., Mattke, S., Stewart, M., & Zelevinsky, K. (2002). Nurse-staffing levels and the quality of care in hospitals. New England Journal of Medicine, 346(22), 1715–1722.
  21. NHS Employers. (2023). Hyperlocal recruitment at leeds community healthcare NHS trust | NHS employers. Available online: https://www.nhsemployers.org/case-studies/hyperlocal-recruitment-leeds-community-healthcare-nhs-trust (accessed on 12 June 2025).
  22. NHS England. (2020). NHS England » our NHS people promise. Available online: https://www.england.nhs.uk/our-nhs-people/online-version/lfaop/our-nhs-people-promise/ (accessed on 12 June 2025).
  23. NHS England. (2023, June 30). NHS England » NHS long term workforce plan. Available online: https://www.england.nhs.uk/publication/nhs-long-term-workforce-plan/ (accessed on 12 June 2025).
  24. NHS England. (2024). NHS workforce statistics. NHS England Digital. Available online: https://digital.nhs.uk/data-and-information/publications/statistical/nhs-workforce-statistics (accessed on 14 June 2025).
  25. Nyberg, A. J., & Ployhart, R. E. (2013). Context-Emergent Turnover (CET) Theory: A theory of collective turnover. Academy of Management Review, 38(1), 109–131. [Google Scholar] [CrossRef]
  26. Oliveira, L., Gehri, B., & Simon, M. (2023). The deployment of temporary nurses and its association with permanently-employed nurses’ outcomes in psychiatric hospitals: A secondary analysis. PeerJ, 11, e15300. [Google Scholar] [CrossRef] [PubMed]
  27. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, É. (2011). Scikit-learn: Machine learning in python. Journal of Machine Learning Research, 12(85), 2825–2830. Available online: http://jmlr.org/papers/v12/pedregosa11a.html.
  28. Raman, S. S. S., McDonnell, A., & Beck, M. (2024). Hospital doctor turnover and retention: A systematic review and new research pathway. Journal of Health Organization and Management, 38(9), 45–71. [Google Scholar] [CrossRef] [PubMed]
  29. Sachau, D. A. (2007). Resurrecting the motivation-hygiene theory: Herzberg and the positive psychology movement. Human Resource Development Review, 6(4), 377–393. [Google Scholar] [CrossRef]
  30. Strohmeier, S. (2007). Research in e-HRM: Review and implications. Human Resource Management Review, 17(1), 19–37. [Google Scholar] [CrossRef]
  31. Taylor, C., Maben, J., Jagosh, J., Carrieri, D., Briscoe, S., Klepacz, N., & Mattick, K. (2024). Care Under Pressure 2: A realist synthesis of causes and interventions to mitigate psychological ill health in nurses, midwives and paramedics. BMJ Quality & Safety, 33(8), 523–538. [Google Scholar] [CrossRef]
  32. Taylor, C., Mattick, K., Carrieri, D., Cox, A., & Maben, J. (2022). ‘The WOW factors’: Comparing workforce organization and well-being for doctors, nurses, midwives and paramedics in England. British Medical Bulletin, 141(1), 60–79. [Google Scholar] [CrossRef] [PubMed]
  33. Tikhonovsky, N., Grasic, K., & Treharne, C. (2023). HPR92 Burnt out or something more? investigating drivers of and spatial variation in NHS staff turnover intention. Value in Health, 26(12), S269–S270. [Google Scholar] [CrossRef]
  34. Weyman, A., O’Hara, R., Nolan, P., Glendinning, R., Roy, D., & Coster, J. (2023). Determining the relative salience of recognised push variables on health professional decisions to leave the UK National Health Service (NHS) using the method of paired comparisons. BMJ Open, 13(8), e070016. [Google Scholar] [CrossRef]
  35. Weyman, A. K., Roy, D., & Nolan, P. (2019). One-way pendulum? Staff retention in the NHS: Determining the relative salience of recognised drivers of early exit. International Journal of Workplace Health Management, 13(1), 45–60. [Google Scholar] [CrossRef]
  36. Zasada, M., van Even, S., Maben, J., Oates, J., & Taylor, C. (2024). Team time as a wellbeing intervention for NHS staff: A qualitative evaluation of implementation during the COVID-19 pandemic. BMC Health Services Research, 24(1), 1622. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Annual headcount leaver rates for trusts in England, measured from the last day of March each year. Staff commencing or returning from maternity leave are not counted as joiners or leavers, and junior doctors are excluded. Source: NHS England Workforce Statistics.
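Figure 1 plots a headcount leaver rate. A minimal sketch of how such a rate might be computed from staff-level records is shown below, assuming the denominator is the headcount in post on 31 March and the numerator is the people from that headcount who left in the following 12 months. The `StaffRecord` type and its fields are hypothetical, and NHS England's published methodology may define the denominator differently; the sketch only mirrors the two exclusions stated in the caption.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class StaffRecord:
    # Hypothetical, simplified staff record; not the Electronic Staff Record schema.
    staff_id: str
    start_date: date
    leave_date: Optional[date] = None   # None while the person is still in post
    is_junior_doctor: bool = False
    maternity_movement: bool = False    # recorded movement is a maternity start/return


def headcount_leaver_rate(records: List[StaffRecord], year_start: date) -> float:
    """Leavers in the 12 months after year_start divided by headcount on year_start.

    Assumes year_start is 31 March (or any date that also exists a year later).
    """
    year_end = year_start.replace(year=year_start.year + 1)
    # Junior doctors are excluded from numerator and denominator alike.
    in_scope = [r for r in records if not r.is_junior_doctor]
    # Denominator: headcount in post on the opening date.
    in_post = [
        r for r in in_scope
        if r.start_date <= year_start
        and (r.leave_date is None or r.leave_date > year_start)
    ]
    # Numerator: those in post who left during the year; maternity movements are not exits.
    leavers = [
        r for r in in_post
        if r.leave_date is not None
        and r.leave_date <= year_end
        and not r.maternity_movement
    ]
    return len(leavers) / len(in_post) if in_post else 0.0


records = [
    StaffRecord("A", date(2015, 6, 1)),
    StaffRecord("B", date(2019, 1, 7), leave_date=date(2022, 9, 30)),
    StaffRecord("C", date(2020, 4, 1), leave_date=date(2022, 11, 15), maternity_movement=True),
    StaffRecord("D", date(2021, 8, 2), is_junior_doctor=True),
    StaffRecord("E", date(2018, 3, 1), leave_date=date(2023, 2, 28)),
]
print(f"{headcount_leaver_rate(records, date(2022, 3, 31)):.2f}")  # prints 0.50
```

Here four people are in scope on 31 March 2022 (D is a junior doctor), and two of them (B and E) leave during the year; C's movement is a maternity start and so is not counted as a departure.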
Figure 2. Beeswarm plot of SHAP values and feature importance for the predictor variables in the XGBoost model. Predictor variables are ranked from top (highest importance) to bottom (lowest importance). SHAP values indicate each variable's contribution to departure risk: positive values (right) increase the risk of departure, while negative values (left) decrease it. Colour denotes the feature value: for length of service, for example, longer tenure (red) corresponds to negative SHAP values (lower departure risk), whereas shorter tenure (blue) corresponds to positive SHAP values (higher departure risk).
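SHAP values such as those in Figure 2 are Shapley values: each feature's average marginal contribution to a prediction across all coalitions of the other features, and together they sum to the difference between the prediction and a reference value. The brute-force sketch below computes them exactly against a single baseline point for a tiny hypothetical model (`toy_risk`, with made-up coefficients). It is not the study's XGBoost model, and it is not how the `shap` library handles tree ensembles, which relies on much faster tree-specific algorithms; for a gradient-boosted classifier, SHAP values are also typically reported on the log-odds scale rather than as probabilities.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(model: Callable[[List[float]], float],
                   x: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of model(x) relative to one baseline (reference) point.

    A feature absent from a coalition is set to its baseline value. Every
    coalition is enumerated, so the cost grows as 2**n: illustration only.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                present = set(coalition)
                without_i = [x[j] if j in present else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi


def toy_risk(features: List[float]) -> float:
    # Hypothetical score: [length of service (years), distance (miles), vacancy rate].
    service, distance, vacancy = features
    long_commute = 1.0 if distance > 10 else 0.0
    return 0.4 - 0.02 * service + 0.005 * distance + 0.3 * vacancy * long_commute


x = [2.0, 25.0, 0.2]        # short tenure, long commute, 20% vacancies
baseline = [8.0, 8.0, 0.0]  # hypothetical reference employee
phi = shapley_values(toy_risk, x, baseline)
print([round(v, 3) for v in phi])
```

In this toy model the vacancy rate only matters for commutes over 10 miles, so that interaction term is split equally between distance and vacancy rate, and the three values add up to `toy_risk(x) - toy_risk(baseline)`.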
Figure 3. Partial dependence plots for key predictor variables in the XGB model, with y-axes showing the influence on the predicted probability of departure. Probabilities are not additive; each curve shows how one variable shifts the predicted probability of departure, with all other variables held at their observed values and predictions averaged across staff.
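Partial dependence, as plotted in Figures 3 to 5, is computed by overwriting one feature with each value on a grid for every row of the data, leaving the other features at their observed values, and averaging the model's predictions. The sketch below implements that definition directly; `toy_probability` is a hypothetical logistic model with invented coefficients standing in for the XGB model, and the rows are made up.

```python
import math
from typing import Callable, List, Sequence


def partial_dependence(predict: Callable[[List[float]], float],
                       rows: Sequence[Sequence[float]],
                       feature_index: int,
                       grid: Sequence[float]) -> List[float]:
    """Average prediction when one feature is overwritten with each grid value.

    All other features keep each row's observed values, and the predictions
    are averaged over rows, which is what a partial dependence curve plots.
    """
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)
            modified[feature_index] = value
            total += predict(modified)
        curve.append(total / len(rows))
    return curve


def toy_probability(features: List[float]) -> float:
    # Hypothetical logistic model: [length of service (years), distance (miles)].
    service, distance = features
    z = -1.0 - 0.15 * service + 0.03 * distance
    return 1.0 / (1.0 + math.exp(-z))


rows = [[1.0, 5.0], [4.0, 12.0], [10.0, 30.0], [20.0, 3.0]]
grid = [0.0, 10.0, 20.0, 30.0, 40.0]
for miles, p in zip(grid, partial_dependence(toy_probability, rows, 1, grid)):
    print(f"{miles:>4.0f} miles -> mean P(depart) = {p:.3f}")
```

Because each grid point is an average over the whole sample, a partial dependence curve describes an average effect; individual employees can respond quite differently when the model contains interactions.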
Figure 4. Partial dependence plots for cost centre vacancies in the XGB model, with y-axes showing the influence on the predicted probability of departure. Probabilities are not additive; each curve shows how the variable shifts the predicted probability of departure, with all other variables held at their observed values and predictions averaged across staff.
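Table 1 records cost centre vacancies as a proportion over or under budget, and Figure 4 relates that proportion to departure risk. The exact formula is not given in this section, so the sketch below assumes one plausible definition: budgeted whole-time equivalent (WTE) minus contracted WTE, divided by budgeted WTE, attached to each member of staff through their cost centre. The function name, cost centre names, WTE figures and staff identifiers are all hypothetical.

```python
from typing import Dict, Tuple


def cost_centre_vacancy_rates(establishment: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Map each cost centre to (budgeted WTE - contracted WTE) / budgeted WTE.

    Positive values: staffed below the funded establishment (vacancies).
    Negative values: staffed above it (over budget).
    """
    rates = {}
    for centre, (budgeted_wte, contracted_wte) in establishment.items():
        if budgeted_wte <= 0:
            raise ValueError(f"cost centre {centre!r} has no funded establishment")
        rates[centre] = (budgeted_wte - contracted_wte) / budgeted_wte
    return rates


# Hypothetical cost centres: (budgeted WTE, contracted WTE).
establishment = {
    "Emergency department nursing": (50.0, 42.5),
    "Pharmacy": (20.0, 21.0),
}
rates = cost_centre_vacancy_rates(establishment)

# Each staff record can then carry its own cost centre's rate as a model feature.
staff_cost_centres = {"S001": "Emergency department nursing", "S002": "Pharmacy"}
staff_vacancy_feature = {sid: rates[cc] for sid, cc in staff_cost_centres.items()}
print(staff_vacancy_feature)
```

Keeping the sign (rather than clipping at zero) lets a model distinguish under-staffed cost centres from those running above establishment, which matches the "over/under budget" wording in Table 1.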
Figure 5. Partial dependence plots for home-to-work distance in the XGB model, with y-axes showing the influence on the predicted probability of departure. Probabilities are not additive; each curve shows how the variable shifts the predicted probability of departure, with all other variables held at their observed values and predictions averaged across staff.
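Figure 5 uses home-to-work distance in miles. As an illustration, assuming distance is measured in a straight line between home and workplace coordinates (for example, postcode centroids), the haversine formula gives the great-circle distance. This is an assumption: the derivation used in the study may differ, and road or public-transport travel distance would usually be longer. The coordinates below are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle ('as the crow flies') distance in miles between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


# Hypothetical home and workplace coordinates (decimal degrees).
home = (51.50, -0.20)
work = (51.40, -0.50)
print(f"{haversine_miles(*home, *work):.1f} miles")
```

As a sanity check, one degree of latitude along a meridian comes out at roughly 69.1 miles.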
Table 1. Variables included in the dataset.
| Variable | Measure |
|---|---|
| Age (last birthday) | Years |
| Distance to work | Miles |
| Point of departure | 12-month period (April to March) |
| Contract type | Full-time or part-time |
| Salary | Pounds (£) |
| Line manager years in position | Years |
| Sickness/absence in last 12 months | Days |
| Cost centre vacancies | Proportion over/under budget |
| Length of service | Years |
| Line manager training | Days |
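Table 1 codes the point of departure as a 12-month period running from April to March. A small helper that assigns a departure date to such a period might look like the following; the function name and the `YYYY/YY` label format are assumptions chosen for readability.

```python
from datetime import date


def nhs_financial_year(d: date) -> str:
    """Label the April-to-March 12-month period containing d, e.g. '2022/23'."""
    # January to March belong to the period that started the previous April.
    start_year = d.year if d.month >= 4 else d.year - 1
    return f"{start_year}/{(start_year + 1) % 100:02d}"


for departure in (date(2022, 4, 1), date(2023, 3, 31), date(2023, 7, 14)):
    print(departure, "->", nhs_financial_year(departure))
```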
Table 2. Performance metrics for XGB model using the test set.
| Metric | Score |
|---|---|
| Accuracy | 0.70 |
| Specificity | 0.73 |
| Sensitivity | 0.50 |
| AUC | 0.65 |
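The four metrics in Table 2 can all be derived from a test-set confusion matrix and the model's predicted probabilities. The sketch below computes them from scratch for a binary "left/stayed" label, with the AUC calculated as the probability that a randomly chosen leaver is scored above a randomly chosen stayer (the Mann-Whitney formulation, ties counted as half). The labels, scores and 0.5 threshold in the example are hypothetical.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                           y_score: Sequence[float]) -> Dict[str, float]:
    """Accuracy, specificity, sensitivity and ROC AUC for a binary departure label.

    y_true and y_pred hold 1 for a departure and 0 otherwise; y_score holds the
    predicted probability of departure and is used only for the AUC.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    # AUC: share of (leaver, stayer) pairs where the leaver gets the higher score.
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)

    return {
        "accuracy": (tp + tn) / len(y_true),
        "specificity": tn / (tn + fp),
        "sensitivity": tp / (tp + fn),
        "auc": wins / (len(pos) * len(neg)),
    }


y_true = [1, 1, 0, 0, 0, 0]
y_score = [0.8, 0.3, 0.6, 0.2, 0.1, 0.4]
y_pred = [1 if s >= 0.5 else 0 for s in y_score]  # hypothetical 0.5 threshold
metrics = classification_metrics(y_true, y_pred, y_score)
print({k: round(v, 3) for k, v in metrics.items()})
```

Accuracy is a prevalence-weighted average of sensitivity and specificity: in the toy data two of six cases are leavers, so accuracy = (2/6)(0.50) + (4/6)(0.75) ≈ 0.667. When leavers are a minority, accuracy therefore tracks specificity more closely than sensitivity, which is one reason to report the metrics side by side.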
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Milsom, R.; Zasada, M.; Taylor, C.; Spick, M. Machine Learning Applied to NHS Electronic Staff Records Identifies Key Areas of Focus for Staff Retention. Adm. Sci. 2025, 15, 297. https://doi.org/10.3390/admsci15080297

AMA Style

Milsom R, Zasada M, Taylor C, Spick M. Machine Learning Applied to NHS Electronic Staff Records Identifies Key Areas of Focus for Staff Retention. Administrative Sciences. 2025; 15(8):297. https://doi.org/10.3390/admsci15080297

Chicago/Turabian Style

Milsom, Rupert, Magdalena Zasada, Cath Taylor, and Matt Spick. 2025. "Machine Learning Applied to NHS Electronic Staff Records Identifies Key Areas of Focus for Staff Retention" Administrative Sciences 15, no. 8: 297. https://doi.org/10.3390/admsci15080297

APA Style

Milsom, R., Zasada, M., Taylor, C., & Spick, M. (2025). Machine Learning Applied to NHS Electronic Staff Records Identifies Key Areas of Focus for Staff Retention. Administrative Sciences, 15(8), 297. https://doi.org/10.3390/admsci15080297

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
