The Classification of Profiles of Financial Catastrophe Caused by Out-of-Pocket Payments: A Methodological Approach

Maria-Carmen García-Centeno; Román Mínguez-Salido; Raúl del Pozo-Rubio

doi:10.3390/math9111170

¹ Department of Applied Mathematics and Statistics, CEU San Pablo University, Julian Romea 23, 28003 Madrid, Spain

² Department of Public Economy, Statistics and Economic Policy, University of Castilla-La Mancha, Avenida Los Alfares 44, 16071 Cuenca, Spain

³ Department of Economics and Finance, University of Castilla-La Mancha, Avenida Los Alfares 44, 16071 Cuenca, Spain

* Author to whom correspondence should be addressed.

Mathematics 2021, 9(11), 1170; https://doi.org/10.3390/math9111170

This article belongs to the Special Issue Applications of Quantitative Methods in Business and Economics Research

Abstract

The financial catastrophe resulting from the out-of-pocket payments necessary to access and use healthcare systems has been widely studied in the literature. The aim of this work is to predict the impact of the financial catastrophe a household will face as a result of out-of-pocket payments in long-term care in Spain. These predictions were made using machine learning techniques such as LASSO (Least Absolute Shrinkage and Selection Operator) penalized regression and elastic-net, as well as algorithms like k-nearest neighbors (KNN), MARS (Multivariate Adaptive Regression Splines), random forest, boosted trees and SVM (Support Vector Machine). The results reveal that all the classification methods performed well, with the complex models performing better than the simpler ones and showing no evidence of overfitting. Detecting and defining the profiles of individuals and families most likely to suffer from financial catastrophe is crucial in enabling the design of financial policies aimed at protecting vulnerable groups.

Keywords: financial catastrophe; out-of-pocket payment; machine learning; classification; profiles

1. Introduction

Sustainable Development Goals (SDG) are the goals established by the United Nations to be achieved in the decade 2020–2030. Goal 3 includes the necessity to “ensure healthy lives and promote well-being for all at all ages”, while subgoal 3.8 explicitly states the following: “Achieve universal health coverage, including financial risk protection, access to quality essential health-care services and access to safe, effective, quality and affordable essential medicines and vaccines for all” [1]. One of the indicators used to measure the degree to which this goal has been achieved is indicator 3.8.2, defined as “the proportion of population with large household expenditure on health as a share of total household expenditure or income” [2].

It is well known that a person’s access to healthcare in most countries requires expenditures by their families through fees, co-payments or charges [3], called out-of-pocket (OOP) payments [4]. These OOP payments can put a great financial burden on families [5,6] and even make it impossible to receive healthcare services due to a lack of financial resources [7].

A large number of studies have used indicator 3.8.2 to analyze the magnitude and extent of financial catastrophe [8,9], although with different nuances in their calculations [10]. A household is defined as catastrophic when the economic resources used to pay for healthcare exceed a specific percentage of the equivalent household income [11]. These thresholds are conventions and can vary depending on the healthcare system, illness, country or moment in time. The thresholds that indicator 3.8.2 uses to make its evaluations are 10% and 25% [2]. The most frequently used thresholds in the literature are 10%, 20%, 30% and 40% [9,12].

A complementary intrinsic analysis of the financial catastrophe measure is the analysis of sociodemographic and health variables of households associated with this condition [8,9]. The econometric methodologies traditionally used to make this analysis are the binary probit and logit models [8], and to a lesser extent, the ordered and multinomial probit and logit models.

Therefore, the main objective of this study is to predict the rate of financial catastrophe in households in Spain as a result of OOP payments in long-term care (LTC) using different statistical techniques and automatic classification algorithms.

The rest of the paper is structured as follows. Section 2 reviews the literature on both the financial catastrophe and the methodologies used to estimate factors associated with financial catastrophes. Section 3 describes the main materials and methods used in the work (i.e., the characteristics of the database for Spain, the variables used and the algorithms applied). Section 4 contains an analysis of the main results obtained and, finally, after the discussion in Section 5, the main conclusions are highlighted in Section 6.

2. Review of the Literature

2.1. Financial Catastrophe Associated with Out-of-Pocket Healthcare Payments

Undoubtedly, the type of health system existing in each country conditions the relevance and impact of OOPs. The key aspects of health system design are the financing and regulation of the systems themselves (who pays to support health systems and the people working in health services?), together with the provision and organization of health services [13].

The combination of different options offers four models of health systems: the Beveridge model (based on taxation and with many public providers); the Bismarck model (funded by a social insurance system and with a mixture of public and private providers); the private insurance model; and the absence of a defined model (especially in Asian and African countries) [14].

The vast extant literature has demonstrated a greater vulnerability and risk of facing financial catastrophe as a result of healthcare expenses in low and middle-income countries [8], in areas of lower income per capita [9], in low-income families [8] or in households with unemployed members [15]. Sociodemographic profiles which increase the probability of financial catastrophe have also been identified, such as those families with elderly members [12,16], with members suffering from chronic diseases [17], with elderly members with chronic diseases [12,18], with disabled people [19,20] or with those with severe disabilities which make them dependent [21].

Given the vulnerability of less developed and developing countries, the literature has focused on these countries. This is because their healthcare systems are newly established and households are required to contribute an important amount to access this care, which results in the exclusion of many. Among examples of this are studies of Asian countries like Vietnam [9], Nepal [22], Thailand [23], Bangladesh [24] and systematic reviews of the Asian continent [25,26]. In Africa, different studies have been carried out in Nigeria [27], Zambia [28] or Kenya [29], as well as systematic reviews [30]. There have also been studies centered on South America and other Latin American countries [31,32].

Studies have also been carried out in Europe, but a review of these studies on this subject has shown them to be scarce and obsolete [33]. Some of the countries that have been studied in terms of OOP payments associated with healthcare systems are Portugal [34], Poland, Germany and Denmark [35], Italy [36], or in the area of access to private healthcare services, Greece [37]. However, despite the fact that the social protection models in European countries are extensive and provide generous coverage, there are some OOPs on healthcare expenditures that a significant percentage of households with financial restrictions have to pay.

In the little analyzed field of LTC, it has recently been demonstrated that the cost of LTC is always high with respect to household income, implying that LTC is often unaffordable in the absence of social protection [10], and not all countries have a consolidated LTC system.

2.2. Traditional Methodology vs. Innovative Methodology

Special attention has been paid to the different methods used to estimate financially catastrophic health spending. In this sense, the budget share method defined in the SDGs overestimates financial hardship among rich households and underestimates hardship among poor households [38], which makes it difficult to detect financially burdened households.

Apart from the study of this specific component, the traditional methodology used to develop sociodemographic and clinical profiles of financial catastrophe victims has been through OLS, binary, multinomial and ordered logit models, and binary, multinomial and ordered probit models. These models are able to capture the influences of different profiles on the basis of a functional ex-ante relationship among the potentially influential variables (usually sociodemographic or clinical) involved in a financial catastrophe. In fact, these models are called parametric since the dependent relationship among the variables is known, with the exception of the parameters that can be estimated from the data.

However, there is an emerging methodology that has been tentatively applied to the general field of healthcare economy or the specific area of financial catastrophe. This is the application of machine learning techniques and algorithms, with the main advantage or characteristic being that they do not impose functional relationships among variables a priori, thus permitting the modelling (and capturing) of more complex dependencies among the data. This mainly nonparametric modelling implies assuming additional efforts both in the availability of the data and the use of intensive computational techniques.

There is extensive literature that explores the application of this methodology in different fields of science. In the area of health sciences, [39] stands out for providing a panoramic view as well as a perspective on advances to come, while [40] compiles recent applications of machine learning related to medicine. In the field of biocomputing and biotechnology, [41] compiles the potential that different techniques could have when applied in fields such as proteomics, genomics and similar areas. The textbooks [42,43] offer an updated state-of-the-art view of this field as well as foreseeable future perspectives. In economics and finance, [44] provides an assessment of the early contributions of machine learning to economics and predictions about its future possibilities, while [45] is a textbook on the application of different techniques in the area of finance. However, as far as we know, the application of these techniques has been relatively limited in the health economics literature, and for this reason, this study aims to investigate the possible uses of these new methodologies to implement automatic classification systems that can assist in the decision-making process. Correctly predicting the rate of catastrophe has intrinsic value in establishing a decision-making system that could detect the profiles of catastrophe and support action to remedy them.

To our knowledge, only one study has included machine learning algorithms to predict the financial hardship associated with OOP medical expenditures in Rwanda. Although 96% of the population in Rwanda is covered by health insurance from the community health service, and around 74% of the population has health insurance, there are relevant OOP medical expenditures which severely limit the access to and utilization of health services [46]. One of the possible solutions is to predict OOP medical expenditures with accuracy, and machine learning techniques and algorithms allow this to be done.

3. Materials and Methods

3.1. Database Characteristics

In Spain, the 2006 Act for the Promotion of Personal Autonomy and Care of Dependent Persons [47], commonly known as the Dependency Act (DA), is a national law designed to provide services to people who are permanently dependent on others to help them with the basic activities of daily living [48]. DA funding was theoretically established with about a third of the total cost of care paid for by the beneficiaries (depending on the economic resources of the household) and the remaining two-thirds by the Public Administration. The economic capacity of the household is made up of income received from employment, capital income and wealth [49].

The Spanish Disability and Dependency Survey (SDDS) conducted by the Spanish National Statistics Institute [50] was used to estimate, firstly, the OOP payments associated with LTC (detailed information can be found elsewhere [51]) and, consequently, to estimate the catastrophe rate resulting from these OOP payments using the measure of catastrophe defined by [9] (detailed information can be found elsewhere [21]). The next step was to classify the households into the above categories of catastrophe for the Spanish case. Specifically, five categories were defined in accordance with the thresholds established in the literature [9,52]: less than 10% if OOP payments for dependent care do not exceed 10% of equivalent household income; the 10–20% interval if OOP payments exceed 10% of equivalent household income and do not exceed 20%; the 20–30% interval if OOP payments exceed 20% of equivalent household income and do not exceed 30%; the 30–40% interval if OOP payments exceed 30% of equivalent household income and do not exceed 40%; more than 40% if OOP payments exceed 40% of equivalent household income.
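The five-category rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code (the paper reports using R); the function name `catastrophe_category` and the category labels are hypothetical choices, and the boundaries follow the "exceed / do not exceed" wording of the text.

```python
def catastrophe_category(oop_payment, equivalent_income):
    """Map a household's OOP share of equivalent income to one of five categories.

    Hypothetical helper: boundaries follow the text, so a share of exactly
    10% still falls in the lowest category ("do not exceed 10%").
    """
    if equivalent_income <= 0:
        raise ValueError("equivalent household income must be positive")
    share = oop_payment / equivalent_income
    if share <= 0.10:
        return "<10%"
    if share <= 0.20:
        return "10-20%"
    if share <= 0.30:
        return "20-30%"
    if share <= 0.40:
        return "30-40%"
    return ">40%"


# Example: a household paying 150 out of an equivalent income of 1000
print(catastrophe_category(150, 1000))
```

The returned label is the response variable that the classification techniques in Section 3.3 try to predict.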

3.2. Predictor Variables

The explanatory variables were selected from an extensive review of the literature, essentially comprising sociodemographic characteristics [8,12,18,21,25,52,53,54,55,56,57]. The sociodemographic characteristics are: gender (male, female); age; marital status (married, single, widowed, separated/divorced); educational level (very low: illiterate/primary school incomplete, low: primary or equivalent, medium: secondary school/vocational training, high: university degree or equivalent); activity status (receiving earnings-related pension, employed, unemployed, other situations [housewife, student, etc.]); household income (less than 500€, 500–1000€, 1000–1500€, 1500–2000€, more than 2000€); household members; level of dependence (level I [25–49 points], level II [50–74 points], level III [75–100 points]); regional Gross Domestic Product (GDP) per capita (low per capita GDP, medium per capita GDP, high per capita GDP); regions of Spain (Andalusia, Aragon, Asturias, Balearic Islands, Basque Country, Canary Islands, Cantabria, Castile-La Mancha, Castile-Leon, Catalonia, Extremadura, Galicia, Madrid, Murcia, Navarra, La Rioja, Valencia, Ceuta and Melilla); ideology of the government (left-wing, right-wing); number of informal care hours received and members with intellectual disabilities and mental illnesses.

3.3. Statistical Analysis

The techniques and algorithms used in this study include traditional classification techniques (multinomial logistic regression, LASSO penalized regression and elastic-net) as well as other algorithms associated with machine learning and artificial intelligence, such as k-nearest neighbors (KNN), MARS, random forest, boosted trees and SVM. As is well known in the literature (see the textbooks [58,59,60,61]), classic parametric methods (e.g., logistic regression) are adequate if the function specified a priori approximates reality; however, important biases could occur if this is not the case. On the other hand, these methods tend to be stable, and the estimates do not usually fluctuate much among different samples (except in the presence of important outliers or other anomalies in the data). The algorithms normally used in machine learning (KNN, random forest, SVM, etc.) are mostly nonparametric or semiparametric and tend to have much less specification bias. Nevertheless, their estimates are likely to change considerably among different samples. This is known in the literature as the bias–variance tradeoff. As a consequence, these algorithms tend towards overfitting; that is, the error rate obtained in the training sample used to fit the model is much lower than that obtained in the test sample. A general outline of the entire process is presented in Figure 1.

Figure 1. General outline of the process.

3.3.1. Algorithms

The techniques and algorithms used in this work to predict the rates of catastrophe are the following ones [61]:

Multinomial logistic regression. This parametric technique assumes that a logistic relation exists between the independent variables and the catastrophe rate and estimates the coefficients of said regression for each catastrophe rate category.
Penalized multinomial logistic regression. This is a variation of the above multinomial logistic regression which applies an elastic-net type penalty to the coefficients [62]; that is, a combination of penalties on the absolute values and on the squares of the estimated coefficients. The function to optimize is as follows:

$\underset{\beta}{\arg\min} \left\{ \frac{1}{n} \sum_{i=1}^{n} \left[ -l_{i}(\beta) \right] + \lambda \left[ (1 - \alpha) \|\beta\|_{2}^{2} / 2 + \alpha \|\beta\|_{1} \right] \right\} \qquad (1)$

where $l_{i}(\beta)$ is the log-likelihood of the i-th observation and the penalty terms are the squared L₂ norm and the L₁ norm of the coefficient vector β, respectively. As particular cases of the elastic-net penalty, α = 1 gives the LASSO regression [63] (which is the default in the glmnet package), while α = 0 corresponds to ridge regression [64]. Any value 0 < α < 1 provides a combination of LASSO and ridge regression. The parameter λ controls the overall weight of the penalty term in the optimization problem. Both parameters are determined using the training data.
k-nearest neighbors and weighted KNN [65,66]. This fully nonparametric method determines the value of an observation based on the weight of the closest observations. In the tuning process, it is necessary to choose the type of distance used as well as the maximum number of neighbors considered and their kernel of weight. The kernel function sets the rule of weighting the neighboring observations by underweighting the most distant neighbors.
MARS. This algorithm, named multivariate adaptive regression splines [67], creates piecewise linear functions as hinges to approximate nonlinear relations. It can also allow interactions among the functions of different variables. Among the tuning parameters are the degree of interaction and the pruning process.
Random forest. This algorithm is based on the aggregation of classification trees through bootstrapping [68]. The singular characteristic of this algorithm is that, during the division process of each tree, only a subgroup is randomly chosen from all the available predictor variables to mitigate the effects of the multicollinearity present in large databases, in other words, the aggregated trees are decorrelated. The number of selected variables in each partition is determined using a tuning process.
Support vector machines (SVM). This algorithm [58,69] performs a division of the space of the predictor variables where the boundaries can be nonlinear and a cost is assigned to the observations that are incorrectly classified. The tuning parameters are usually the global permitted cost, the type of kernel used to establish the boundaries and the sigma associated with the kernel.
Boosted trees. The technique of boosting for trees [70,71,72] is based on constructing trees iteratively in such a way that the data for each tree are weighted differently from the residuals obtained from the previous trees. Among the tuning parameters, the maximum number of iterations, the maximum depth of each tree in each iteration and the learning rate of aggregation among trees are usually used.
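As a small illustration of the penalty term in Equation (1), the following Python sketch evaluates λ[(1 − α)‖β‖₂²/2 + α‖β‖₁] for a given coefficient vector. It is not the glmnet implementation; the function name `elastic_net_penalty` and the example coefficients are assumptions made purely for illustration.

```python
def elastic_net_penalty(beta, lam, alpha):
    """Elastic-net penalty: lam * [(1 - alpha) * ||beta||_2^2 / 2 + alpha * ||beta||_1].

    Hypothetical helper for illustration; beta is a plain list of coefficients.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    l1_norm = sum(abs(b) for b in beta)        # LASSO part
    l2_norm_sq = sum(b * b for b in beta)      # ridge part (squared norm)
    return lam * ((1.0 - alpha) * l2_norm_sq / 2.0 + alpha * l1_norm)


beta = [1.0, -2.0, 0.0]  # made-up coefficients
print(elastic_net_penalty(beta, lam=1.0, alpha=1.0))  # pure LASSO penalty
print(elastic_net_penalty(beta, lam=1.0, alpha=0.0))  # pure ridge penalty
print(elastic_net_penalty(beta, lam=1.0, alpha=0.5))  # mixture of both
```

With α = 1 only the absolute values contribute, which is what allows LASSO to shrink some coefficients exactly to zero; with α = 0 only the squared coefficients contribute.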

Table 1 gives a summary of the classification techniques considered in the study as well as the relevant tuning parameters of each technique. All the techniques and algorithms were implemented in the R language (version 4.0.5) [73] through the caret package (version 6.0-86) [61,74]. In fact, many algorithms are implemented in different packages [75,76,77,78,79,80,81] that are called by caret (information about which particular packages were used for each technique is given in Table 1). All the computations were made using R 4.0.5 for Windows 10 on a workstation with 8 cores and 16 GB of RAM. The R code is available upon request.

Table 1. Algorithms and statistical techniques used for classification of the catastrophic rate.

It is important to point out that the opinion of the authors about the suitability of the models has been eclectic from the beginning, and the usefulness of the models in this context should be considered solely on their predictive abilities in terms of catastrophe rates. Logically, all the techniques should be evaluated using the same a priori predictor variables (although some of the techniques can carry out predictive variable selection processes during their training phase) as well as the same group of observations.

3.3.2. Partition of Training and Test Data

Given that the techniques of complex classification are prone to overfitting, the database was first divided into two groups:

The first group is called the training group and includes 80% of the data (5021 observations). This group was used to estimate the parameters of the models as well as to perform the tuning processes inherent in the majority of the techniques. It is important to point out that, although the training group was randomly selected, it should always be the same for all the techniques used.
The second group is called the test group and includes the remaining 20% of the data (1253 observations). This group of data was not used at any point to estimate or train the models and statistical algorithms. Therefore, the test group consisted of new data, which permitted the different techniques to be evaluated and compared.

Figure 2 shows the frequencies of the catastrophe rates in each category for the training group and the test group. As can be seen, the frequencies of each catastrophe category are, in percentages, quite similar in the training and test groups. This demonstrates that the results of the predictive evaluation in the test group are valid with respect to the entire database.

Figure 2. Frequencies of the catastrophic rate in the training and test groups.
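The 80/20 partition can be sketched as follows. The text does not state whether the split was stratified by catastrophe category, so the stratification here is an assumption (it is one simple way to obtain similar category frequencies in both groups); the function name `stratified_split` and the seed value are also hypothetical. This is plain Python, not the authors' R code.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.8, seed=2021):
    """Split observation indices into training and test groups, class by class.

    Assumption: stratifying by category keeps the category frequencies
    similar in both groups. A fixed seed makes the partition reproducible,
    so every technique can be trained and tested on the same observations.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        cut = round(train_fraction * len(indices))
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)


# Made-up labels: 50 households in each of two categories
labels = ["<10%"] * 50 + [">40%"] * 50
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Reusing the same seed for every technique mirrors the requirement above that the training group be identical across all models.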

3.3.3. Metrics for Measuring Performance

Once the training process was carried out with the first group, the predictive performance of each technique was evaluated on the test group. The performance measure chosen was the simplest, most consistent and easiest to understand: the percentage of correct classifications obtained for each category. Furthermore, we evaluated general accuracy; that is, the percentage of correct classifications over the whole test data set for each algorithm. Finally, if no statistically significant differences existed among the predictive performance of the different models in the test group, the simplest models were chosen, following the parsimony principle.
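Both measures can be computed directly from the observed and predicted categories. The sketch below is a minimal Python illustration under that reading of the text; the function name `classification_rates` and the toy data are hypothetical.

```python
def classification_rates(observed, predicted):
    """Return (per-category % correctly classified, overall accuracy in %)."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("at least one observation is required")

    totals, hits = {}, {}
    for obs, pred in zip(observed, predicted):
        totals[obs] = totals.get(obs, 0) + 1
        if obs == pred:
            hits[obs] = hits.get(obs, 0) + 1

    # Share of each observed category that was predicted correctly
    per_category = {c: 100.0 * hits.get(c, 0) / totals[c] for c in totals}
    # Share of all test observations that were predicted correctly
    accuracy = 100.0 * sum(hits.values()) / len(observed)
    return per_category, accuracy


observed = ["<10%", "<10%", "10-20%", ">40%"]    # made-up test labels
predicted = ["<10%", "10-20%", "10-20%", ">40%"]  # made-up predictions
print(classification_rates(observed, predicted))
```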

3.3.4. Tuning Process

It is important to point out that the tuning process of every algorithm was performed in the same manner, using cross-validation: the training group was divided into five parts. Four of them were used to fit the models for each combination of tuning parameters, and the remaining part was used to evaluate the predictive quality. The process was repeated five times, changing each time the four parts used to fit the models and the part used to evaluate the predictions. In this way, for each combination of tuning parameters, all the available data were used in the training while the overfitting effect common to complex techniques was mitigated. Finally, this whole process was repeated three times, changing the random cross-validation partition. The tuning parameters finally selected were those which maximized accuracy. In the specialized literature, this method is known as repeated cross-validation [61,82]. For the process to be homogeneous and reproducible across all the techniques and algorithms compared, the same random seeds were used throughout, so that all the algorithms used the same datasets.
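The resampling scheme (five folds, repeated three times, with a shared seed) can be sketched generically in Python. This is an illustration of the procedure rather than what caret does internally; `repeated_kfold`, `tune`, the seed, and the user-supplied `fit_and_score` callable (which would fit a model on the training indices and return accuracy on the held-out indices) are all hypothetical names.

```python
import random


def repeated_kfold(n, k=5, repeats=3, seed=2021):
    """Yield (train_indices, validation_indices) for repeated k-fold CV."""
    rng = random.Random(seed)
    for _ in range(repeats):
        order = list(range(n))
        rng.shuffle(order)                        # new random partition per repeat
        folds = [order[i::k] for i in range(k)]
        for held_out in range(k):
            valid = folds[held_out]
            train = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
            yield train, valid


def tune(grid, n, fit_and_score, k=5, repeats=3, seed=2021):
    """Return the parameter setting with the highest mean resampled accuracy.

    Every candidate in `grid` (hashable values) sees exactly the same splits.
    """
    splits = list(repeated_kfold(n, k, repeats, seed))
    mean_accuracy = {}
    for params in grid:
        scores = [fit_and_score(params, train, valid) for train, valid in splits]
        mean_accuracy[params] = sum(scores) / len(scores)
    best = max(mean_accuracy, key=mean_accuracy.get)
    return best, mean_accuracy


# Toy scoring function standing in for a real model fit (assumption)
best, table = tune([1, 2, 3], n=10, fit_and_score=lambda p, tr, va: 1.0 - 0.1 * abs(p - 2))
print(best)
```

With k = 5 and three repeats, each parameter combination is evaluated on 15 held-out parts, and generating the splits once from a fixed seed ensures that all candidates are compared on identical data.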

4. Results

Descriptive information about the sociodemographic characteristics of the sample for different regions of Spain is included in Table 2. We can see that two out of every three dependent people are women (67.85%), and the mean age is 72.86 years (SD: 18.92). The most common marital status types are widowed (42.06%) and married (39.74%), the predominant educational level is basic (primary or equivalent and lower, 90.72%) and the number of equivalent members of the household is 1.92 (SD: 0.66). The majority receive a pension (84.08%). Level II and level III are recognized in 34.63% and 38.94% of the sample, respectively; two out of every three people have severe difficulties performing the basic activities of daily living (65.80%), while 27.66% have moderate difficulties.

Table 2. Sociodemographic data of the sample, divided by values of catastrophe measures.

Almost a third of the population live in low, a third in middle and a third in high-income per capita GDP regions, while two out of three people live in communities governed by left-wing parties. The mean dependency score obtained is 61.28 points (SD: 18.29), and two out of three people suffer from mental diseases (65.73%). A total of 18.98% of people receive professional care financed by their families, and the mean number of hours of informal care received is 36.33 (SD: 49.37). We can observe a similar behavioral pattern for all the variables in the different thresholds analyzed.

The most significant effect is in educational levels and levels of per capita income in the regions of residence. Levels of dependence also demonstrate disparate behavior. In the under 40% thresholds, level I is predominant (41.63% for the under 10% threshold, 40.51% for the under 20% threshold, 61.14% for the under 30% threshold and 50.67% for the under 40% threshold), while for the 40% threshold, the higher levels of dependence (levels II and III) have a greater weight in the overall sample (43.30% and 38.87%, respectively).

Figure 3 includes plots of the tuning process, which show the accuracy values for each possible set of tuning parameters. As we can observe, the parameter values selected in the tuning process are those which maximize accuracy in each case. In addition, for most algorithms, the combination of parameters that maximizes accuracy corresponds to relatively complex models (high values of the tuning parameters), indicating the possible existence of highly nonlinear and complex relations between the rate of catastrophe and the predictor variables.

Figure 3. Tuning process. The y-axis measures the accuracy (percentage of correct classifications) and the x-axis measures the tuning parameters of the corresponding algorithm.

Table 3 shows the classification tables with the correct percentages of classifications for each technique using the test data. While the columns represent the data on observed catastrophe rates in the test group, the rows show the predicted category for each algorithm on the same group of data. Each square represents the percentage with respect to the observed category in such a way that the columns always add up to 100.

Table 3. Classification tables for each technique in the test group.
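The layout of these classification tables (predicted categories in rows, observed categories in columns, each column summing to 100) can be reproduced with a short Python sketch. The function name `column_percentage_table` and the toy labels are hypothetical; the actual values are those reported in Table 3.

```python
def column_percentage_table(observed, predicted, categories):
    """Build table[predicted][observed] as a percentage of each observed category."""
    counts = {p: {o: 0 for o in categories} for p in categories}
    for obs, pred in zip(observed, predicted):
        counts[pred][obs] += 1

    column_totals = {o: sum(counts[p][o] for p in categories) for o in categories}
    return {
        p: {
            o: (100.0 * counts[p][o] / column_totals[o]) if column_totals[o] else 0.0
            for o in categories
        }
        for p in categories
    }


categories = ["A", "B"]            # stand-ins for the five catastrophe categories
observed = ["A", "A", "A", "B"]    # made-up test labels
predicted = ["A", "A", "B", "B"]   # made-up predictions
table = column_percentage_table(observed, predicted, categories)
for row in categories:
    print(row, [round(table[row][col], 2) for col in categories])
```

The diagonal entries of such a table are the percentages of correct classifications per category discussed in the text.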

In general, we can see that all the classification methods perform well. The more complex classification models perform much better than the simpler models, and there is no evidence of overfitting in the types of models such as k-nearest neighbors (classification percentages: 82.09%, 86.99%, 80.08%, 73.10% and 58.91% for the categories <10%, 10–20%, 20–30%, 30–40%, >40% of catastrophe, respectively) and boosted trees (gbm) (90.67%, 85.97%, 77.64%, 69.66% and 62.38% for the five categories analyzed, respectively). The best classification techniques are SVM (classification percentages: 90.30%, 92.09%, 88.62%, 96.55% and 71.78%) and random forest (91.04%, 93.37%, 91.06%, 94.48% and 72.28%). In contrast, the parametric models show lower values for the classification percentages (for logistic regression: 72.39%, 83.42%, 56.10%, 46.21% and 48.51%; and penalized logistic regression: 71.27%, 83.42%, 56.10%, 46.21% and 49.01% for the five analyzed categories, respectively).

Table 4 summarizes the general accuracy (percentage of correct classifications) of each algorithm using the test data. This table again shows the better performance of the random forest and SVM algorithms over the rest of the techniques. Furthermore, in general, the semiparametric and nonparametric algorithms perform much better than the parametric models (logistic and penalized logistic models in our case).

Table 4. General accuracy (% of correct classifications in test data).

Finally, Table 5 and Table 6 demonstrate the importance of the variables in the classification results for the models that perform the best (SVM and random forest). With SVM, the ranking and degree of dependence, monthly household income and per capita household income are the four variables that have the greatest weight in the classification of catastrophe risk in the different thresholds analyzed. Random forest includes being married as the most important predictive variable, followed by the dependence ranking and regional per capita income (highlighting Castile-La Mancha), and then the degree of dependence.

Table 5. Importance of variables in SVM.

Table 6. Importance of variables in Random forest.

5. Discussion

The first global studies showed that the percentage of catastrophic households due to healthcare OOP payments varied from 0.01% in the Czech Republic to 10.45% in Vietnam (in a study of 59 countries), demonstrating that those countries with advanced social security structures or healthcare systems financed by taxes protected their population financially [8]. In a review of 89 countries representing 89% of the world’s population, a catastrophe rate of 1.47% was obtained, showing that 18 countries have a rate of catastrophe that exceeds 4% [52]. A recent systematic review carried out in 133 countries revealed that in 2010, the global rate of catastrophic spending in PDB on healthcare for the 10% threshold was 11.7%, revealing that 808 million people had catastrophic healthcare expenses [83].

A specific analysis of less developed countries and continents showed that in Asia, where a study was carried out in eleven low to middle-income countries (which represent 79% of the Asian population and 48% of the world’s population), the incidence of poverty increased by 14% when the analysis considered OOP healthcare payments [25]. A review of Sub-Saharan African countries found large variations depending on the country under study, reflecting an average rate of 17% for the 40% threshold, which especially worsened when the person suffered from HIV/ART and malaria, causing the catastrophe rate to reach 100% of households [30]. When twelve Latin American and Caribbean countries were analyzed, in the 30% threshold, the average catastrophe rate was 8.23%; this was with important heterogeneity since this rate was quite high in countries such as Nicaragua (19.9%), Guatemala (16.3%) and Ecuador (15.8%) [31].

To the best of our knowledge, the literature has presented the profile or profiles of the types of families most at risk of financial catastrophe caused by OOP payments in different areas of expenditure. These profiles include, for example, being male [34,84], being married [85], having a lower level of education [14], being unemployed [15], having a lower household income [38], suffering from diseases such as cancer [5], diabetes or cardiovascular diseases [18] or chronic diseases [17], being elderly [21,52], being elderly and suffering from chronic diseases [12,18], and being disabled [19,20] or dependent [21]. Specifically singled out risk factors are belonging to a poor household and suffering from a chronic disease [86], living in the city [12,34], living in regions with medium and high levels of GDP per capita [18,52] and living in low and middle-income countries [8].

Among the recommendations based on the explanatory approach, one study found that direct OOP payments for healthcare expenditures increased poverty. The researchers stressed the need for more effective policies, focusing on the Asian population who live on less than a dollar a day, including 2.7% of the 78 million people studied [87]. In the same vein, it was recommended that governments consider additional measures to increase financial protection for poor households faced with payments for medical treatment, specifically, in that study, for the treatment of cancer [88]. Another study found that in Africa, social protection assistance based on subsidies for medicines, free medical care or the extension of social security has not sufficiently protected households financially, because of nonmedical costs that are nevertheless intrinsically related, such as transport and food [30]. A proposal to reduce financial catastrophe could be implemented progressively in the healthcare financing system by replacing OOP payments with funds from indirect financial sources [7].

In this work, through a design intended to obtain a correct and precise classification of families at greater risk of catastrophe, we used machine learning methods which, as far as we know, have not been used in this field of study before. To do so, we selected the sets of parameters that maximized the accuracy of each of the techniques studied. Therefore, if the result for the training set was representative and there was no overfitting, the classification obtained with the test set should be optimal. It is important to point out that, although the parameters chosen in the tuning process were those which maximized accuracy in each case, other options presented in the literature have also been considered, for example, selecting the set of parameters with the lowest value whose accuracy was within one standard deviation of the optimum accuracy [58,59,60,61].
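The alternative tuning rule mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in this study: the function name `select_parameters`, the toy cross-validated accuracies and the convention that a lower parameter value means a simpler model are all hypothetical, and the tolerance is taken here as the standard error of the fold accuracies, which is one common reading of the rule.

```python
from math import sqrt
from statistics import mean, stdev


def select_parameters(cv_scores):
    """Return (accuracy-maximizing value, lowest value within tolerance).

    cv_scores maps a tuning-parameter value to its per-fold accuracies.
    Assumption: a lower parameter value corresponds to a simpler model.
    """
    # Mean accuracy and standard error of the mean for every candidate.
    summary = {
        value: (mean(folds), stdev(folds) / sqrt(len(folds)))
        for value, folds in cv_scores.items()
    }
    # Rule 1: the candidate with the highest mean accuracy.
    best = max(summary, key=lambda value: summary[value][0])
    best_mean, best_se = summary[best]
    # Rule 2: the lowest candidate whose mean accuracy is within the tolerance.
    threshold = best_mean - best_se
    within = min(v for v, (m, _) in summary.items() if m >= threshold)
    return best, within


# Hypothetical 5-fold accuracies for four candidate parameter values.
cv_scores = {
    1: [0.80, 0.82, 0.81, 0.79, 0.83],
    3: [0.86, 0.86, 0.85, 0.87, 0.86],
    5: [0.86, 0.88, 0.85, 0.87, 0.86],
    7: [0.86, 0.87, 0.85, 0.88, 0.85],
}
best, one_se = select_parameters(cv_scores)
print(best, one_se)  # 5 3
```

With these toy numbers the two rules disagree: maximizing accuracy picks the value 5, while the tolerance rule settles for the value 3 because its mean accuracy is statistically indistinguishable from the optimum.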

To our knowledge, only one study has used machine learning techniques to detect financial catastrophe derived from OOP medical expenditures, in Rwanda [46]. That study considered the following algorithms: random forest, decision tree models, gradient boosting and regression tree models. Most of these algorithms are based on tree models and, therefore, represent only part of the set of algorithms available in machine learning (although all of them are very useful tools for making predictions). In the present study, the algorithms included were the following: multinomial logistic regression and penalized multinomial logistic regression (with elastic-net penalties), k-nearest neighbors, MARS, random forest, support vector machines (SVM) and boosted trees. Our choice of algorithms was intended to cover as wide a range of prediction approaches as possible and thus make the comparison more extensive. Furthermore, although in our case random forest achieved very good results, the SVM algorithm (not based on trees) achieved similar results. To sum up, in our opinion, the two papers focus on different algorithms: the previous study [47] restricts the comparison mainly to tree-based algorithms, while our paper uses a more extensive set of prediction tools by adding algorithms with different foundations.

Continuing with the results of this study, we found that the models with greater complexity performed better than the simpler models, with no evidence of overfitting. This demonstrates that the most frequently used parametric models (logistic regression) can have specification bias and that the instability associated with nonparametric models (bias–variance trade-off) can be reduced to a great extent by performing model aggregation through simulation. In fact, in this work, an improvement in predictive terms can be discerned, which compensates for having to implement more computationally complex systems. In addition, once the machine learning algorithms (random forest, SVM, etc.) have been trained, they work automatically and can be used to obtain predictions in real time at a cost similar to that of the other techniques. Logically, this predictive gain by the more complex machine learning techniques is a result of applying these algorithms (with high computational cost) to a database that is heterogeneous and sufficiently extensive to allow the most complex relations among the data to be learned.

Among the most complex models, those which are fundamentally nonparametric performed the best (specifically, random forest and SVM). There is evidence that there are nonlinearities in the data, and that they probably cannot be captured using functional relations. In fact, as is well known in the literature [60], if nonlinearities do not exist in the data, multinomial logistic regression should show results similar to SVM, whereas in this case the performance of the latter was clearly superior. This could be explained by the nonlinear partition of the space generated by the predictor variables that the SVM algorithm makes when it uses a nonlinear kernel (as in this case). This partition cannot be replicated by parametric models and, for this reason, this type of model has greater difficulty in adequately classifying the categories of catastrophe with less frequent observations (especially those over 20%). The success of random forest could be explained by the fact that the selected value of mtry was quite high, so that quite a few predictors were used in each partition. This induced, in each individual tree, a tendency toward overfitting and instability in the results obtained; however, the bootstrapping across the different trees that is inherent in this technique reduced the instability (or variance) of the algorithm while maintaining its capacity to distinguish the areas with the best classification ability inside the space generated by the predictors. The complex model that obtained the worst results was boosting. This could be because this technique performs its tuning process on the residuals of each stage instead of performing random bootstrapping (like random forest), so it could be more affected by overfitting than the other techniques.
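The variance-reduction argument for bootstrapping can be illustrated with a small simulation. The sketch below is not a random forest and does not use the data of this study: as a simplifying assumption, a 1-nearest-neighbor rule stands in for a single deep, overfitted tree, the target is pure noise so that any spread in the predictions is variance, and all function names and settings (`compare_variance`, 30 training points, 25 bootstrap resamples, 200 repetitions) are hypothetical.

```python
import random
from statistics import mean, pvariance


def one_nn_predict(sample, x0):
    """Unstable learner: return the response of the training point closest to x0."""
    return min(sample, key=lambda point: abs(point[0] - x0))[1]


def bagged_predict(sample, x0, n_boot, rng):
    """Average the unstable learner over bootstrap resamples of the training data."""
    predictions = []
    for _ in range(n_boot):
        boot = rng.choices(sample, k=len(sample))  # resample with replacement
        predictions.append(one_nn_predict(boot, x0))
    return mean(predictions)


def compare_variance(n_train=30, n_boot=25, n_repeats=200, seed=0):
    """Variance of the prediction at x0 = 0.5 across independent training sets."""
    rng = random.Random(seed)
    single, bagged = [], []
    for _ in range(n_repeats):
        # Pure-noise response: the true signal is zero everywhere.
        sample = [(rng.random(), rng.gauss(0.0, 1.0)) for _ in range(n_train)]
        single.append(one_nn_predict(sample, 0.5))
        bagged.append(bagged_predict(sample, 0.5, n_boot, rng))
    return pvariance(single), pvariance(bagged)


single_var, bagged_var = compare_variance()
print(f"single learner: {single_var:.2f}, bagged learner: {bagged_var:.2f}")
```

Under these assumptions the averaged predictor varies noticeably less from one training set to the next than the single learner, which is the mechanism invoked above to explain why individually unstable trees can still yield a stable ensemble.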

A problem detected in the analysis concerns the category “Upper 40%”, which systematically had the worst predictive performance; more seriously, a large part of its classification errors fall in the farthest categories. This is the category of greatest concern in the literature, since the families included in it are in situations of absolute financial vulnerability [21]. The problems with this category could be due to the great heterogeneity that exists among its observations [15], which implies that any method would have problems in categorizing it adequately. However, classification in this category also improved with the more complex techniques (random forest and SVM), so it could be expected that, if the dataset contained more observations allowing this greater heterogeneity to be captured, the difference in predictive performance between this category and the others would be reduced (at least for the models with the best predictive ability).

This work has the following limitations. The first is that the estimation was carried out on a specific database of one population at a particular moment in time. It is necessary to extend it with more recent databases from other countries in order to analyze whether the present empirical case of success using these techniques is maintained in other international micro databases. The second limitation is that some machine learning algorithms, such as neural networks and deep learning, were not considered in this study. They were not used because of the good results obtained with SVM and random forest (since these are highly complex models, we considered this area to be covered in this case). Nevertheless, it could be relevant to include them in a future comparison, especially if larger databases are available and there is greater heterogeneity.

6. Conclusions

In summary, to the best of our knowledge, this is the first study in the financial catastrophe literature that has developed a classification of financial catastrophe risk caused by OOP payments; in this concrete example, payments associated with LTC expenses.

While no methodology allows for an instantaneous classification of an applicant’s profile, the methodology presented here permits the risk of financial catastrophe to be classified, which can direct the recommendations made throughout the literature in a more specific way and with better results. All the classification methods performed well, with the complex models performing better than the simpler ones and showing no evidence of overfitting.

In the specific case of LTC in Spain, the subject of study in this work, 68.07% of families have to use more than 40% of their income for OOP payments, with an average monthly addition over that amount of 341.66€ for the greatest degree of dependence (level III) [51]. Specifically, the sociodemographic factors that increase the probability of becoming a victim of financial catastrophe are: being elderly, being single, widowed or separated, having lower levels of household income and education, having greater levels of dependence [21], being unemployed, living in a region with lower per capita income and living in a region governed by right-wing parties [15]. The four essential classifying variables of catastrophe obtained in this study are the ranking and degree of dependence, being married (which was the most relevant category in the studies mentioned above) and regional per capita income.

It is claimed that “there is no universal formula that can be used to help poor countries design ways to increase reliance on prepayment and reduce out-of-pocket payments” [52]. In response to this need for an alternative perspective based on classificatory analysis, the use of these types of methodologies is proposed for a problem that worries health authorities globally, as subgoal SDG 3.8 [1] exemplifies. The main objective of this study is to propose this methodology as a third indicator of the degree to which this subgoal has been achieved, together with the existing instruments 3.8.1 and 3.8.2 [89].

The comparison between the two approaches used, parametric versus semiparametric and non-parametric algorithms, demonstrates an almost perfect correlation between the variables that are statistically significant in the first approach and those with the greatest predictive weight in the second. The reason for this is that, although it is very difficult to evaluate the functional dependency between the catastrophe rate and the set of independent predictor variables with many of the techniques used in machine learning, it is possible to measure the importance of each predictor variable in average predictive performance [59,61]. Nevertheless, it is necessary to keep in mind that these types of techniques (in some cases black-box) are designed with a predictive approach, not an explanatory one, so their ability to measure influences among variables is lower than that of parametric techniques such as logistic regression.
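One widely used way of measuring the importance of a predictor in average predictive performance is permutation importance: the values of one predictor are shuffled and the resulting drop in accuracy is recorded. The sketch below assumes this measure for illustration only; it is not claimed to be the exact measure used in this study. The toy data, the stand-in `black_box` classifier and the function names are hypothetical.

```python
import random


def accuracy(predict, X, y):
    """Share of rows for which the model prediction equals the observed label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy when each predictor column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between predictor j and the label
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy data: only the first predictor determines the label.
data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [int(row[0] > 0.5) for row in X]


def black_box(row):
    """Stand-in for any trained classifier; it happens to use only predictor 0."""
    return int(row[0] > 0.5)


importances = permutation_importance(black_box, X, y)
print(importances)
```

The measure only needs predictions, not coefficients, which is why it can be applied to black-box models; it ranks predictors by predictive weight but, as noted above, says nothing about the functional form of their influence.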

Given the complexity of the dependencies in the data, MARS predicted better than multinomial logistic regression, although its performance was worse than that of random forest and SVM. This is because, although intermediate models of this kind are more complex than linear or logistic techniques (for example, MARS induces non-linearity by allowing thresholds in the linear relationships, and includes products of predictor functions to capture the interactions among them), they are less adaptable than more complex, fully non-parametric models. Since in this particular case there appear to be very complex dependencies, the intermediate models (MARS) did not obtain the best results.
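The two MARS ingredients mentioned above, thresholds in linear relationships and products of predictor functions, can be made concrete with hinge basis functions. The following is a minimal sketch with hypothetical knots, coefficients and variable names (`dependence`, `income`); it does not reproduce the model fitted in this study.

```python
def hinge(x, knot):
    """Right hinge: zero up to the knot, then linear in x."""
    return max(0.0, x - knot)


def mirrored_hinge(x, knot):
    """Left hinge: linear below the knot, zero afterwards."""
    return max(0.0, knot - x)


def mars_like_score(income, dependence):
    """Additive score built from hinge terms and one interaction term.

    All knots (2.0 and 1.0) and coefficients are hypothetical.
    """
    high_dependence = hinge(dependence, 2.0)  # active only above the threshold 2.0
    low_income = mirrored_hinge(income, 1.0)  # active only below the threshold 1.0
    return (
        0.5
        + 0.8 * high_dependence
        - 0.3 * low_income
        + 0.4 * high_dependence * low_income  # product term captures the interaction
    )


print(mars_like_score(income=2.0, dependence=1.0))  # 0.5
```

When neither threshold is crossed every hinge is zero and the score reduces to the intercept; the model remains a sum of a small number of piecewise-linear terms, which is why it is more flexible than a logistic specification but less adaptable than fully non-parametric methods.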

This leads us to conclude that the penalized methods barely improve the predictions of the nonpenalized methods. As has been previously mentioned, the existence of strong nonlinearities in the dependencies of the variables means that none of these techniques is able to capture them if the parametric specification is not suitable. Penalizing the coefficients or performing variable selection will not improve the approximation.

Future lines of research applying the group of methodologies presented here in large databases related to financial catastrophe caused by healthcare payments are necessary to corroborate our results. This would permit the establishment or design of procedures for legislators and authorities who implement social and healthcare policies to detect those individuals or families at greater risk of financial catastrophe, and attempt to protect them financially with exemptions or alternative OOP payment designs. In addition, given the results obtained in this research, it would be very interesting to compare the best algorithms (random forest and SVM) with the performance of other nonparametric algorithms, such as neural networks, to attempt to discover if the complexity of the relationships can be fully captured using different alternatives. Another important future line of research is the consideration of n-dimensional groups (combining two, three, or more one-dimensional groups) as explanatory variables, in order to set up a multi-perspective analysis of data which offers a complete design of the financial catastrophe profiles.

Author Contributions

Each of the authors indicated in this manuscript has contributed equally to the conceptualization, methodology, software, validation and formal analysis, as well as to the investigation and provision of resources and data management. All the authors also took part in the writing, providing new perspectives and improving the text. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Cátedra Mutua Madrileña-USPCEU, grant number 060516-USPMM-01/17, AEI-Ministry of Science and Innovation (PID2019-107800GB-I00 and PID2019-104901RB-I00) and the Spanish State Programme of R+D+I (ECO2017-83771-C3-1-R).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement