2.1. Retirement
Following the OECD [33] and Dorn and Sousa-Poza [34], retirement is defined as “having a self-described status of retired, regardless of employment status and receipt of a pension”. This definition should be seen in contrast with others based on income or employment status [33]. Retirement is furthermore assumed to be an absorbing state or an event, as opposed to a process [35]. The decision to retire has been studied from several perspectives, using several statistical techniques. Here, the approaches are broadly divided along two strands: econometrics, and what is termed here “substantive determinants”, encompassing sociological, psychological, and health determinants.
In the first strand, econometrics and microeconomics have centered on the decision to retire from the perspective of the rational individual facing an optimization problem, by modeling concepts such as individual preferences, expected utility, and assets. Studies have followed the traditional economic debates [36] between structural models, which use concepts such as utility maximization [37,38,39,40,41,42,43,44,45], and reduced-form models, which address utility maximization functions indirectly by studying sociodemographic explanatory variables. Dahl et al. [
46] analyzed the impacts of family situation, the presence of children, age, educational level, being a civil servant, income, spouse characteristics, and industry. Reduced-form models also seem well suited to gauging the effects of specific policy interventions on behavior [46,47,48,49,50,51]. More recently, Hospido and Zamarro [
47] investigated couples’ retirement patterns using the partner’s retirement, age, age difference with partner, education, the presence of children and grandchildren, health, and the partner’s health as the main explanatory variables, using a bivariate probit technique. Other examples of reduced-form models focus on policy discontinuities to assess the influence of policy changes on retirement patterns [
49,
50], on the impact of systemic factors such as the 2009 financial crisis [
48], and on the relationship between financial incentives and retirement [
51].
Stock and Wise, in their seminal [
52] article outlining the option value model, contrasted the structural model with the reduced-form model and attempted to reconcile them. Their model combines both the structural and reduced-form approaches by adding the time variation (i.e., the individual re-evaluates their choices periodically); it can be used as purely structural (i.e., when all structural utility parameters are estimated) or in a reduced form (when those parameters are assumed).
The option value model has given rise to a sprawling literature. It is mostly used in its reduced form, although structural estimations have taken place as well: Lumsdaine, Stock, and Wise [53], as well as the contributors mentioned by Gruber and Wise [54], estimated the parameters either by regression or by a grid search. In this regard, it has been argued that the structural versions of the model “are rarely estimated successfully” [51]. Examples from the reduced form include Van Sonsbeek [55] and Mazzaferro and Morciano [56], who estimated models within the microsimulation literature. Hanappi et al. [57] and Börsch-Supan [58] combined their own estimations for some parameters with assumptions for others. Other reduced-form applications of the model include those of Belloni and Alessie [59], Samwick and Wise [60], Samwick [61], and Berkel and Börsch-Supan [62]. The reduced-form models mentioned above [49,51] are also inspired by Stock and Wise’s [52] option value framework.
In the second strand, situated within the fields of sociology, organizational psychology, and health studies, numerous studies have assessed the impact of several variables on retirement without using a rational choice framework [2,63]. The classification of the variables influencing the decision to retire in review studies from this strand seems to have remained stable throughout the years. Feldman [64] identified four types of factors: individual history, opportunity structures in the career path, organizational factors, and external environment. He further divided those variables into push and pull factors, which either coerce or motivate workers to retire, and which remain widely used [2,35,65,66].
More recently, a similar classification of the determinants of retirement into four levels has been proposed by Topa et al. [
63]: individual (income, financial security, and health), job (job satisfaction), work (organizational pressures, workplace timing for retirement, and job stress), and family (family pull); they conducted a meta-analysis of 151 articles dealing with these determinants. Fisher et al. [
1] proposed a comprehensive conceptual model in which variables are divided into antecedents (e.g., macroeconomic factors, family, work-related and person–job factors, individual features) and moderators (e.g., context of retirement transition, retirement planning, form of retirement). Scharn et al. [
2] defined 12 categories, giving more attention to personal characteristics: demographic factors, health, lifestyle, social factors, social participation, work characteristics, job demands, contextual factors, financial factors, retirement preferences, macro effects, and birth cohort. Beyond these factors, other issues such as retirement planning have also been researched [
67].
The empirical evidence for the existing models points in different directions. Topa et al.’s [
63] meta-analysis mentioned above concludes that the best predictor of early retirement is the timing of retirement at the workplace, followed by organizational pressures. Other aspects (e.g., health, financial security, family pull) play a smaller, albeit significant role, even though some individual studies have noted large effects in particular contexts. Sundstrup et al. [
68], for example, point in a recent publication to the importance of poor health and physical work demands in explaining early retirement in Denmark. This relates to Scharn et al.’s systematic review [
2], which underlines the disparate findings of studies across countries, which may be related to the institutional regime at stake. This is further echoed by Boissonneault et al. [
69], who conducted a systematic review of the factors explaining the recent increases in actual retirement ages across the OECD member states; they also pointed to institutional heterogeneity, and highlighted policy changes, education levels, and personal wealth as the factors driving the increase. The effects of policies have also been highlighted by Boot et al. [
70].
Until recently, most of the available studies were based on cross-sectional analysis [71,72,73,74,75]. In recent years, however, much progress has been made in the use of longitudinal data. Throughout the past decade, more complex statistical techniques have been applied. Böckerman and Ilmakunnas [
76] used probit models to study the impact of working conditions on early retirement. Trentini [
77] used logit regressions to analyze SHARE data, and concluded that low levels of education and a stable career drive retirement patterns in Italy; in addition, he identified health as a potential factor driving involuntary retirement. SHARE data and event history analysis were used by Macken et al. to conclude that education plays a role in explaining involuntary labor market exit across Europe [
78]. Likewise, Hagan et al. [
79] used event history analysis to disentangle the causality of health status and retirement, and found that acute health shocks have a large effect on retirement decisions, possibly mediated by institutional dynamics. Using a stratified Cox regression, a cause-specific Cox model, and the model of Fine and Gray, van der Mark-Reeuwijk demonstrated that workers with poor health were more likely to leave the labor force than workers with good health [
80]. By means of multilevel event history analysis, De Preter et al. confirmed the influence of bad health, as well as the presence of grandchildren, as factors speeding retirement [
66]. The role of grandparenting was further explored and confirmed by Van Bavel and De Winter—also by means of multilevel event history analysis [
81]. The same technique was used by De Breij et al. [
82] to highlight the role of low education in explaining early exit from the labor market, and by Radl and Himmelreicher to gauge the influence of spousal behavior on retirement; the latter authors concluded that a partner’s early retirement pushes both men and women to retire as well, whereas the loss of a partner or a divorce has the opposite effect [
83]. Bertogg et al. also used multilevel analysis to explore the interaction of gender norms and retirement patterns, and found that female main earners transition to retirement earlier than secondary earners [
84]. Other techniques have also been applied. Using sequence analysis, Hoven et al. produced clusters of workers’ careers, and found that poor socioeconomic circumstances are related to early retirement [
85]. Nonparametric estimations were used by Manoli and Weber, who concluded that financial incentives matter little to keep workers at work for longer [
50]. Radl further applied piecewise exponential functions to explore the way in which class membership shapes retirement, along with other factors, and found that workers at both ends of the class continuum retire the latest [
86]. Hofäcker et al. investigated the likelihood of voluntary vs. involuntary retirement by means of multinomial logistic regressions across three countries (England, Germany, and Japan), and found national differences in the ways in which some factors, such as education levels, influence retirement choices [
87].
2.2. Machine Learning
The main goal of the scientific field of machine learning, often referred to interchangeably as data mining, is to find patterns in data. Following the recent econometrics literature, we opted to focus on the term “machine learning” and leave aside the related terms “big data” and “data mining”. Big data is a concept lacking a uniform definition; it may, for instance, refer to large datasets [
3,
88], or to data fulfilling several criteria (e.g., volume, velocity, variety, exhaustiveness, resolution, relational nature, flexibility) [
89]; due to this murkiness, it will not be discussed here further. Data mining is referred to as the automated process of discovering patterns in data, with a focus on structural patterns, whereas machine learning points to the concrete techniques within data mining that guide the discovery [
3,
90]. Hassani et al. [
91] seem to see both terms as substitutes, and mention several machine learning techniques as applications of data mining. In the context of this paper, analyses are limited to supervised machine learning, understood as those techniques focusing “primarily on prediction problems: given a dataset with data on an outcome Y_i [such as retirement], which can be discrete or continuous, and some predictors X_i, the goal is to estimate a model on a subset of the data, given the values of the predictors X_i” [92]. Unsupervised learning, which attempts to find patterns in the data without any guidance from the researcher [93], is beyond the scope of this article.
In spite of the large variety of machine learning approaches, their modus operandi is often uniform; it starts with partitioning data randomly into a training set—on the basis of which a prediction model is estimated (learner)—and a testing set, on which the predictions are tested [
94]. Usually, several models are compared, and a choice is validated. Their performance is assessed not only by the accuracy with which they predict results in the test set, but also by more complex criteria. Varian [
3] highlights the fact that validation and testing are often combined. Moreover, regularization parameters can be used in order to improve the model’s performance. These parameters often look at one dimension of the machine learning model at stake and vary depending on the kind of algorithm used. Examples include the number of branches (depth) of a decision tree, the number of trees in a random forest, the number of levels in deep learning and neural networks, or the sum of squared beta coefficients in ridge regression (for a simplified overview of the technique and the different types of regularization functions, see Mullainathan and Spiess [
93]; for a technical overview, see Hastie et al. [
94]).
The assessment of the predictive performance and the regularization usually takes place empirically by applying k-fold cross-validation: the data are divided into k subsets labelled s = 1, …, k; for each s in turn, the model is fitted using the other k-1 subsets, predictions are made for subset s, and the associated performance is measured. The value of the regularization parameter with the best average performance is then chosen [3]. Regularization parameters may or may not be used at all, with some studies forgoing them [95].
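As an illustration of the procedure just described, the following minimal Python sketch selects a ridge penalty by k-fold cross-validation. It rests on several assumptions that are not part of the reviewed literature: a single predictor without intercept, mean squared error as the performance measure, simulated toy data, and hypothetical function names (`ridge_slope`, `k_fold_indices`, `cv_error`, `choose_penalty`).

```python
import random

def ridge_slope(xs, ys, lam):
    """Closed-form ridge estimate for a one-predictor model without intercept."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[s::k] for s in range(k)]

def cv_error(xs, ys, lam, folds):
    """Average held-out mean squared error over all folds for one penalty value."""
    fold_errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        # Fit on the k-1 training folds only.
        beta = ridge_slope([xs[i] for i in train], [ys[i] for i in train], lam)
        # Measure performance on the fold that was left out.
        mse = sum((ys[i] - beta * xs[i]) ** 2 for i in held_out) / len(held_out)
        fold_errors.append(mse)
    return sum(fold_errors) / len(fold_errors)

def choose_penalty(xs, ys, grid, k=5, seed=0):
    """Return the penalty with the lowest average held-out error, plus all scores."""
    folds = k_fold_indices(len(xs), k, seed)
    scores = {lam: cv_error(xs, ys, lam, folds) for lam in grid}
    best = min(scores, key=scores.get)
    return best, scores

# Simulated toy data (an assumption for illustration): y = 2x + noise.
rng = random.Random(42)
xs = [rng.uniform(-1, 1) for _ in range(40)]
ys = [2.0 * x + rng.gauss(0, 0.5) for x in xs]

best_lam, scores = choose_penalty(xs, ys, grid=[0.0, 0.1, 1.0, 10.0])
print("chosen penalty:", best_lam)
```

The same folds are reused for every candidate penalty, so that differences in the average error reflect the penalty rather than the random partition.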
Various metrics are also used to measure algorithm and model performance, e.g., the receiver operating characteristic (ROC) graph and the area under the curve (AUC) derived from it, the Brier score, classification errors, the confusion matrix, the Gini coefficient, the kappa statistic, log loss, several variations of the mean error (e.g., absolute, squared, root, log), the Matthews correlation coefficient, and sensitivity and specificity indicators.
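A few of these measures can be computed by hand for a binary outcome, as in the minimal Python sketch below. The six observations, the predicted probabilities, and the 0.5 classification threshold are invented for illustration, and the function names are hypothetical; the AUC is computed through its rank interpretation (the share of positive–negative pairs in which the positive case receives the higher score, with ties counted as one half).

```python
def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels coded 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def brier_score(y_true, probs):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, probs)) / len(y_true)

def auc(y_true, probs):
    """Share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [p for t, p in zip(y_true, probs) if t == 1]
    neg = [p for t, p in zip(y_true, probs) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

# Invented example: 1 = retired, 0 = still working.
y_true = [1, 0, 1, 1, 0, 0]
probs = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1]
y_pred = [1 if p >= 0.5 else 0 for p in probs]  # assumed threshold of 0.5

tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
sensitivity = tp / (tp + fn)  # share of actual positives that are detected
specificity = tn / (tn + fp)  # share of actual negatives that are detected
print(tp, fp, fn, tn, sensitivity, specificity)
print(brier_score(y_true, probs), auc(y_true, probs))
```

Sensitivity and specificity depend on the chosen threshold, whereas the Brier score and the AUC are computed from the predicted probabilities directly.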
Within supervised learning, there are a vast number of available algorithms. Basic forms include “Naïve Bayes” probabilistic classification, “divide-and-conquer”, “separate-and-conquer”, linear and logistic regressions, mistake-driven algorithms, instance-based learning, and clustering. Ensemble methods build upon the basic algorithms and encompass, among others, bagging and boosting techniques, random forests (which combine the two former), and meta-models (“stacking” techniques). Deep learning algorithms lie at the basis of several types of neural networks [
96].
Arguably, discussing all of the algorithms would prove impossible. Therefore, some heuristic criteria to define this review’s scope are defined here in light of the retirement literature and the use of longitudinal data referred to in the introduction.
First, the retirement-as-decision-making paradigm, with retirement being a puzzle that needs to be explained, makes it necessary to use algorithms equipped to deal with regression rather than classification problems (which excludes, for example, clustering). In addition, given the current research trends and the availability of longitudinal data, the machine learning algorithm used for the analysis should be adequate to process longitudinal data by means of event history analysis. Second, the use of the SHARE data imposes the further constraint of right censoring. Third, given the time-varying nature of the variables explaining retirement—such as health—the algorithm should be able to accommodate time-varying covariates. We deal with these issues in the following subsections.
2.3. Machine Learning and EHA
As has been pointed out before, the current literature on retirement makes extensive use of event history analysis (EHA), also called “survival analysis” or “hazard models”. Wang, Li, and Reddy [
97] provide a comprehensive overview of both “classical” and machine learning EHA: survival trees, Bayesian methods, neural networks, support-vector machines, ensemble learning (e.g., random survival forests, bagging survival trees, and boosting), active learning, transfer learning, and multi-task learning. The latter four fall into the category of advanced machine learning, and according to the authors would be better at dealing with censored information.
Even though econometrics increasingly makes use of machine learning to study phenomena such as companies’ financial distress [
98], bankruptcy [
99], the microsimulation of pension expenses [
100], and employee turnover [
101], the field of retirement remains underexplored from the machine learning perspective. As an illustration, a quick search with the Web of Science search engine delivered only six results linked to the economics domain, none of which dealt with the decision to retire. One study among them [
95], which used data from a longitudinal survey (SHARE’s US counterpart) to compare the performance of machine learning algorithms with that of traditional statistical methods, had a setup similar to this paper’s; the authors compared different machine learning methods with statistical regressions when explaining several health indicators with social determinants. By contrast, a large body of knowledge on machine learning, and more specifically on event history analysis, exists in other fields.
Using the bibliometrix R package and the biblioshiny interface, a short search for the main methods of analysis mentioned by Wang et al. [
97] within the query results mentioned above rendered the results laid out in 
Table 1. It should be noted that decision trees were not looked at, given the fact that random forests constitute a more advanced application of the same technique.
Given the relative abundance of random forest references over other techniques, the chances of mainstream solutions being created within this algorithm for the concrete problems presented by our data (time-varying covariates) are higher. Moreover, the classification techniques used by random forests (i.e., data splitting) are straightforward, and suited for the exploratory setup of this article.
The basic components of survival forests are survival trees [
102], based on the classification and regression trees (CART) technique [
103]. CART is a learning algorithm by which decision trees are created. Decision trees may be roughly defined as the construction of groups of observations (nodes) according to their value on a certain outcome variable; this is done by the successive partition of data along a set of explanatory variables, according to some splitting criterion, such as the sum of squared deviations from the mean. The partition results in a final classification [
102].
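The splitting step can be illustrated with a minimal Python sketch that searches for the single threshold on one explanatory variable that minimizes the summed squared deviations from the mean within the two resulting nodes. The variable names and the toy values (age and weekly working hours) are invented for illustration, and the sketch covers one split only, not a full CART implementation with recursion, stopping rules, or pruning.

```python
def sum_sq_dev(values):
    """Sum of squared deviations from the mean (0.0 for an empty node)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(x, y):
    """Return (threshold, impurity) minimising the total within-node
    sum of squared deviations; threshold is None if no split helps."""
    pairs = sorted(zip(x, y))
    best = (None, sum_sq_dev(y))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical x values cannot be separated
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        impurity = sum_sq_dev(left) + sum_sq_dev(right)
        if impurity < best[1]:
            best = (threshold, impurity)
    return best

# Invented toy data: age (explanatory) and weekly working hours (outcome).
age = [55, 58, 60, 62, 64, 66]
hours = [40, 38, 39, 10, 8, 12]

threshold, impurity = best_split(age, hours)
print(threshold, impurity)
```

Growing a tree amounts to applying such a search repeatedly, over all candidate variables, to each node produced by the previous split.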
Survival trees [
102] build on the general principles of decision trees and CART, but split the data according to the survival function. In other words, the observations grouped together share a similar survival function. The calculation of the survival functions may be carried out using known techniques, such as exponential or Cox regression models.
Survival and decision trees, in turn, serve as the basis for random forests, in which bootstrap samples are drawn from the data and a tree is grown on each sample. At each node, a random subset of the explanatory variables is selected as candidates for splitting. The final outcome is predicted by “averaging the predictions from each individual tree” [
102]. The random survival forest (RSF) algorithm is implemented in the randomForestSRC R package [
104].
As has been mentioned above for machine learning in general, survival forests have been only sparingly applied in the economic literature [
23]. There are, however, a growing number of articles in the medical literature that compare survival forests with mainstream statistical methods. Using the search query TOPIC: (“survival forests” “Cox regression”), a non-exhaustive search was performed on 29 June 2021 on the Web of Science Core Collection; it rendered 32 results, of which the subject of 7 was a comparison of Cox regressions with survival forests. One study was a systematic review comparing the results of machine learning and conventional statistical methods for the prediction of heart failure [
105]. A summary of the studies assessing individual models is provided below in 
Table 2.
Table 2 paints a mixed picture; the findings do not favor one model over the other. This coincides with the findings of the systematic review mentioned above [105]. Moreover, the studies mostly rely on the comparison of a single model, and none of them carries out simulations. It is also remarkable that all of the articles, with the exception of one, are from the field of medicine.
2.4. Random Forests and Retirement Data
The retirement conundrum poses two problems for the use of random forests: right censoring, and time-varying covariates. A censored observation implies that the event of interest might occur in the future (right censoring) or has occurred in the past (left censoring), but is not observed. In the case of retirement, right censoring means that a subject leaves the observation period without having gone into retirement.
A popular approach to right censoring is to estimate a distribution of censoring by applying the inverse probability of censoring weighting (IPCW) approach [
113], which estimates a separate risk function for censoring, defines censoring weights based on that function, and applies the weights to the subsequent analysis in the dataset. Other approaches have been developed [
114,
115], but their application and incorporation into existing random survival forest software remain limited [
116].
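The weighting logic can be sketched in a few lines of Python. The sketch assumes that the censoring distribution is estimated with a Kaplan-Meier estimator in which censoring plays the role of the event (the sources cited here may use other estimators), that no event time coincides with a censoring time, and that each observed event receives the weight 1/G(t-), where G(t-) is the estimated probability of remaining uncensored just before time t. Function names and data are hypothetical.

```python
def censoring_survival(times, events):
    """Kaplan-Meier estimate of the censoring distribution G as a list of
    (time, G(time)) steps. Censoring (event == 0) is treated as the 'event'.
    Assumes event and censoring times are never tied."""
    steps = []
    g = 1.0
    for u in sorted(set(t for t, e in zip(times, events) if e == 0)):
        at_risk = sum(1 for t in times if t >= u)
        censored = sum(1 for t, e in zip(times, events) if t == u and e == 0)
        g *= 1.0 - censored / at_risk
        steps.append((u, g))
    return steps

def g_left_limit(steps, t):
    """G(t-): value of the step function just before time t."""
    g = 1.0
    for u, value in steps:
        if u < t:
            g = value
        else:
            break
    return g

def ipcw_weights(times, events):
    """Weight 1/G(t-) for observed events, 0 for censored observations."""
    steps = censoring_survival(times, events)
    return [1.0 / g_left_limit(steps, t) if e == 1 else 0.0
            for t, e in zip(times, events)]

# Invented example: follow-up time in years; 1 = retired, 0 = right-censored.
times = [2, 3, 5, 7, 8]
events = [1, 0, 1, 0, 1]

weights = ipcw_weights(times, events)
print(weights)
```

Observed events that occur late, when many subjects have already been censored, receive larger weights, so that they also stand in for the censored subjects whose retirement was not observed.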
Time-varying covariates [
117,
118] are variables whose values may vary over time, as opposed to invariant or near-invariant variables such as gender. Examples include the health status of near-pensioners, or the number of children living at home. The analysis of time-varying covariates has been applied only partially to survival forests. Moradian [
119] compiled the existing approaches and focused on three methods: Bou-Hamad et al. [
99], which models discrete survival for time-invariant covariates; Bou-Hamad et al. [
120], which allows for time-varying covariates; and Schmid et al. [
121], which is similar to the second method. These methods are relatively simple to implement, as they involve the creation of “pseudo-subjects”, by which single observations are split along the lines of time periods or, in the case of Schmid et al. [
121], the period itself is added as a covariate. The separate datasets are then used to build random forests without any survival elements. This approach entails that a single subject may end up “split” across two different nodes of a tree. Moradian [
119] introduced variations on the three methods and carried out simulations using the randomForestSRC package.
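The creation of pseudo-subjects can be illustrated with a generic person-period expansion in Python. This is a minimal sketch and not a reproduction of any of the cited methods: the record layout, the field names (`id`, `health`, `retired`, `period`), and the two example subjects are invented. Each subject contributes one row per observed period; the time-varying covariate takes its value for that period, the period index is kept as an additional column, and the outcome equals 1 only in the last period of a subject who retired.

```python
def to_person_periods(subjects):
    """Split each subject into one pseudo-subject (row) per observed period."""
    rows = []
    for s in subjects:
        last = len(s["health"])  # number of periods in which s was observed
        for period, health in enumerate(s["health"], start=1):
            rows.append({
                "id": s["id"],
                "period": period,   # period index kept as a covariate
                "health": health,   # value of the time-varying covariate
                # Event indicator: 1 only in the final period of a retiree;
                # right-censored subjects contribute zeros throughout.
                "retired": int(s["retired"] and period == last),
            })
    return rows

# Invented example: subject A retires in period 3, subject B is censored
# after period 2 without having retired.
subjects = [
    {"id": "A", "health": ["good", "good", "poor"], "retired": True},
    {"id": "B", "health": ["good", "good"], "retired": False},
]

rows = to_person_periods(subjects)
for row in rows:
    print(row)
```

Because the rows of one subject are treated as separate observations, a tree grown on such data can send them to different nodes, which is the “split” of a single subject described above.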