Considering the differences between healthcare systems in the EU28 countries, we set out to analyze self-perceived health status across the EU28 member states and to determine whether and how the digitalization of health services affects the self-rated health of European populations. For this analysis, 10 years of data from the 28 member states of the European Union were collected. The variables selected refer to the self-rated health status of the population ($Y_{it}$), as the dependent variable, and to two health-related digitalization indicators, namely the level of internet use for seeking health-related information ($X_{it}$) and health-related online purchases or downloads ($Z_{it}$), as independent variables. The analysis covers the EU28 area from 2009 to 2018 and is based on Eurostat, OECD, and WHO data. A limitation of the research is its reliance on self-reported data, such as the self-perceived health status, which may contain sources of bias beyond the authors' control [
18,
32,
33].
Because the data cover 28 states over a 10-year period, a panel data regression model, specifically a dynamic panel regression, was used for the analysis and model creation. The authors applied the Arellano–Bond one-step difference generalized method of moments (GMM) estimator, as this technique addresses the econometric issues associated with the working dataset. EViews 11 was used for data processing.
3.1. Data
The first step was to select the variables involved in the analysis. The dependent variable is the share of the population that perceives its health status as very good or good.
According to Eurostat statistics, self-perceived health status is reported by the EU population on a four-item scale: very good, good, fair, and bad. Our study uses the cumulative share of respondents rating their health as good or very good, which is provided in the same Eurostat database, because we intended to explain how people who are concerned about their own health behave in order to maintain and improve it via new technologies. Additionally, we aimed to make the results available to academia and governments for use in further research and in development policies for sustainable health systems across Europe.
Eurostat statistics show that the share of the population concerned with personal health status increases with the level of education and income [
30], reflecting differences in the affordability of medical care, income-dependent lifestyles, and fewer problems in meeting medical needs among people with higher incomes compared with low-income earners. Likewise, education is linked to income, so people with more education are more likely to meet their medical needs and to be aware of the benefits of a healthy lifestyle. In other words, the more educated and wealthier people are, the more interested they tend to be in monitoring their health status, because they better understand the consequences of neglecting healthcare and have the financial resources to invest in health-related technology and apps. This population segment usually owns high-quality ICT devices and can easily use mobile technologies and the internet to search for information; therefore, the authors decided to focus on analyzing its health-related online behavior. As one can see from
Figure 3, the self-perceived health status of the population differs across the EU28 member states throughout the 10-year period considered [
14].
Inside the EU, between 2009 and 2018, 66% of the population having completed tertiary education perceived their health as good or very good [
34]. For most Eastern European countries, the share of the population reporting very good or good health is similar to the EU28 level, with Romania registering an even higher score from 2013 onward. As shown in the figure above, Ireland is the highest-ranked country in terms of the population perceiving its health as good or very good, followed by Cyprus, Greece, Spain, and Belgium, whereas the Baltic countries are those whose populations rate their health least positively (only 46% of Latvians and 45% of Lithuanians perceive their health as good or very good). Moreover, self-perceived health status is an indicator of well-being, as well as a measure of quality of life [
35,
36], not only a predictor of healthy years of life, mortality, or life expectancy [
37]; therefore, the differences between countries in self-rated health are likely to reflect differences in health systems, i.e., the gaps between Western and Eastern European countries in terms of quality of life, well-being, social inequalities, and so on.
The second variable considered for the analysis is “information seeking on health issues on the Internet”.
Figure 4 shows the rates of internet use for seeking health-related information in the EU28 states [32]. As one can see, the populations of northern Europe use the internet for seeking health-related information the most (Finland, Denmark, Sweden, but also the Netherlands, Luxembourg, and Germany). This is consistent with these countries ranking highest in terms of digital skills, according to the latest Digital Economy and Society Index [
31]. Romania and Bulgaria, on the other hand, are similar to each other in that fewer individuals in these two countries use the internet for health-related information, compared with the EU28 level.
The third variable considered for the analysis concerns health-related online purchases and downloads, including medicines purchased online and mobile health app downloads or purchases. In terms of the percentage of individuals who made health-related online purchases or downloads, Germany ranks highest, with 28% of its population buying or downloading health-related goods (Figure 5). It is followed by Denmark and, surprisingly, by Romania and Greece.
The differences between countries in the European Union, which highlight the gaps between Eastern and Western Europe, stem mostly from disparities in digital skills, openness to new technologies, educational attainment, and progress toward what is called the “digitized economy”. The fact that a fairly large percentage of Romanians purchase or download health-related goods and/or services online therefore seems unexpected, especially considering that Romania ranks among the lowest in Europe in terms of digital skills [
31].
3.2. Methodology and Model
The analysis was based on a general model taking the following form:

$$Y_{it} = X_{it}\beta + Z_{it}\delta + \varepsilon_{it} \qquad (1)$$

where $Y_{it}$ is the “self-perceived health status” as the dependent variable, which indicates the percentage of the population reporting a good or very good health status, $X_{it}$ is an independent variable referring to “Internet use for seeking health-related information”, and $Z_{it}$ is another independent variable related to “health-related online purchases and downloads” (where $i$ stands for the country and $t$ for the time). While $\beta$ and $\delta$ are two column vectors of coefficients, $\varepsilon_{it}$ refers to the disturbance term, which is composed of random variables. As shown in Equation (2), the disturbance term $\varepsilon_{it}$ has two orthogonal components:

$$\varepsilon_{it} = u_i + v_{it} \qquad (2)$$

where $u_i$ are the fixed effects (country-specific factors that do not change over time), and $v_{it}$ are the idiosyncratic shocks (unobserved factors that affect the dependent variable and vary both over time and across countries).
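To make the role of the two error components concrete, the following derivation sketches why first differencing is useful. It assumes, beyond what the general model states, that a lagged dependent variable enters the equation with a hypothetical scalar coefficient $\alpha$, as in a standard dynamic panel; $\alpha$ and the $\Delta$ notation are introduced here for illustration only.

```latex
\begin{aligned}
Y_{it}     &= \alpha Y_{i,t-1} + X_{it}\beta + Z_{it}\delta + u_i + v_{it}, \\
Y_{i,t-1}  &= \alpha Y_{i,t-2} + X_{i,t-1}\beta + Z_{i,t-1}\delta + u_i + v_{i,t-1}, \\
\Delta Y_{it} &= \alpha\,\Delta Y_{i,t-1} + \Delta X_{it}\,\beta + \Delta Z_{it}\,\delta + \Delta v_{it},
\qquad \Delta W_{it} \equiv W_{it} - W_{i,t-1}.
\end{aligned}
```

Subtracting the second line from the first cancels $u_i$, so the time-invariant country effects drop out. The differenced shock $\Delta v_{it}$ still contains $v_{i,t-1}$, which is correlated with $\Delta Y_{i,t-1}$; this is why lagged levels such as $Y_{i,t-2}$ are typically used as instruments for the differenced equation.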
Before running the regression analysis, the authors had to decide which estimator to use. The choice of the Arellano–Bond one-step difference Generalized Method of Moments (GMM) estimator is based on an examination of problems specific to panel data: endogeneity (any of the independent variables, in our case “Internet use for seeking health-related information” and/or “health-related online purchases and downloads”, is correlated with the disturbance term), heteroscedasticity (the variance of the disturbance term is not constant across observations), and serial correlation, which often occurs in time series when a variable and its lagged version, i.e., $Y_{it}$ and $Y_{i,t-1}$, are correlated with one another over time. This means that the level of a variable affects its future level. In our case, the self-perceived health status observed at $t-1$ affects its level at $t$.
These problems often arise in panel data analysis, and if they are not taken into account when choosing the estimation method, classical methods such as ordinary least squares (OLS) would produce biased estimates. Three features of the dataset motivated the choice: the process of self-rating health status is dynamic, meaning that current observations of self-perceived health depend on past ones; the time-invariant country effects could be correlated with the independent variables; and the time dimension is short relative to the cross-sectional dimension (10 years and 28 countries). The authors therefore used a dynamic panel regression estimated with the Arellano–Bond one-step difference Generalized Method of Moments, which is designed to address these econometric issues.
The Arellano–Bond one-step difference GMM transforms the regressors by first differencing, which removes the time-invariant country effects. This approach has been reported to yield low bias and variance in parameter estimation [
38].
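The estimation idea can be illustrated with a minimal NumPy sketch of a one-step difference GMM estimator. This is not the authors' EViews procedure or data: the panel is simulated, the single regressor `x` is assumed to be strictly exogenous, and the names `simulate_panel` and `difference_gmm` are hypothetical helpers introduced for illustration.

```python
import numpy as np


def simulate_panel(n=500, t=7, alpha=0.5, beta=1.0, seed=0):
    """Simulated data (assumption): y_it = alpha*y_i,t-1 + beta*x_it + u_i + v_it."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)                       # time-invariant unit effects
    x = rng.normal(size=(n, t)) + u[:, None]     # regressor correlated with u
    y = np.zeros((n, t))
    y[:, 0] = u / (1.0 - alpha) + rng.normal(size=n)
    for s in range(1, t):
        y[:, s] = alpha * y[:, s - 1] + beta * x[:, s] + u + rng.normal(size=n)
    return y, x


def difference_gmm(y, x):
    """One-step difference GMM: returns estimates of (alpha, beta)."""
    n, t = y.shape
    m = t - 2                                    # differenced equations per unit
    # Covariance pattern of first-differenced i.i.d. shocks: 2 on the
    # diagonal, -1 on the first off-diagonals.
    h = 2 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)
    n_inst = m * (m + 1) // 2 + 1                # lagged levels plus delta-x
    zhz = np.zeros((n_inst, n_inst))
    zw = np.zeros((n_inst, 2))
    zdy = np.zeros(n_inst)
    for i in range(n):
        dy = np.diff(y[i])
        dx = np.diff(x[i])
        dep = dy[1:]                             # delta-y for periods 2..t-1
        w = np.column_stack([dy[:-1], dx[1:]])   # lagged delta-y, delta-x
        z = np.zeros((m, n_inst))
        col = 0
        for r in range(m):                       # equation for period r + 2
            z[r, col:col + r + 1] = y[i, :r + 1] # levels dated t-2 and earlier
            col += r + 1
        z[:, -1] = dx[1:]                        # exogenous delta-x instruments itself
        zhz += z.T @ h @ z
        zw += z.T @ w
        zdy += z.T @ dep
    a = np.linalg.pinv(zhz)                      # one-step weighting matrix
    return np.linalg.solve(zw.T @ a @ zw, zw.T @ a @ zdy)


if __name__ == "__main__":
    y_sim, x_sim = simulate_panel()
    print(difference_gmm(y_sim, x_sim))          # estimates of (alpha, beta)
```

With a short time dimension the instrument count stays small; in a 28-country, 10-year panel the full set of lagged levels would exceed the number of countries, which is one reason applied work often restricts the lags used as instruments.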
Taking all this into consideration, and to justify the choice of the above-mentioned method, the variables were first tested for endogeneity using EViews 11. The results show that the $Z_{it}$ variable (health-related online purchases and downloads) is correlated with the error term (see Table 1).
This simultaneity leads to biased estimates, as it violates the Gauss–Markov assumption that the independent variables are uncorrelated with the error term [39]. As endogeneity is confirmed in this case ($p < 0.05$ for the $Z_{it}$ independent variable), the authors concluded that OLS estimation would produce an inconsistent and biased estimator, so they used a method based on instrumental variables, which allows consistent estimates to be derived.
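The text does not state which endogeneity test EViews ran, so the sketch below shows one common option, a regression-based Durbin–Wu–Hausman check, on simulated data. The data-generating process, the parameter `rho`, and the helper names `ols`, `dwh_t_stat`, and `simulate` are assumptions made for illustration, not the authors' setup.

```python
import numpy as np


def ols(y, X):
    """OLS coefficients, conventional standard errors, and residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, se, resid


def dwh_t_stat(y, x, Z):
    """t-statistic on the first-stage residual in the augmented regression."""
    _, _, v_hat = ols(x, Z)                        # first stage: x on instruments
    beta, se, _ = ols(y, np.column_stack([x, v_hat]))
    return beta[1] / se[1]                         # far from 0 suggests endogeneity


def simulate(n=5000, rho=0.8, seed=1):
    """rho controls how strongly x is correlated with the error term e."""
    rng = np.random.default_rng(seed)
    Z = rng.normal(size=(n, 3))                    # three valid instruments
    e = rng.normal(size=n)
    x = Z.sum(axis=1) + rho * e + rng.normal(size=n)
    y = 2.0 * x + e
    return y, x, Z


if __name__ == "__main__":
    print("endogenous x:", dwh_t_stat(*simulate(rho=0.8)))
    print("exogenous x: ", dwh_t_stat(*simulate(rho=0.0)))
```

A large absolute t-statistic on the first-stage residual indicates that the regressor is correlated with the error term, which is the situation in which OLS is inconsistent and an instrumental-variable method is needed.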
Secondly, the likelihood ratio test output confirmed that the error variance is non-constant, meaning that heteroscedasticity is present in our working dataset ($p = 0.000 < 0.05$; see Table 2).
Serial correlation was tested by applying the Breusch–Pagan Lagrange Multiplier (LM) test, which revealed that the error terms of the classical regression model are correlated (the $p$-value is lower than 0.05, confirming the presence of serial correlation, as shown in Table 3).
To address the endogeneity of the variable $Z_{it}$, a set of instrumental variables was included in the analysis. These variables refer to the inequality of income distribution, the share of the population with tertiary education attainment, the digital inclusion of individuals, the Gini coefficient, the at-risk-of-poverty rate, the population distribution by gender, the median age of the population, the employment rate, and the ability to make ends meet. The Sargan and Arellano–Bond tests (see Table 4 and Table 5) were used to check the over-identifying restrictions and the serial correlation of the residuals, respectively.
The null hypothesis of the Sargan test is that the instruments, as a group, are uncorrelated with the residuals. Failing to reject this hypothesis supports the validity of the instrumental variables chosen for the analysis. The Arellano–Bond test for first-order (AR(1)) and second-order (AR(2)) serial correlation, in turn, was applied to the differenced residuals under the null hypothesis of no serial correlation. In our case, the test for the AR(1) process in first differences rejects the null hypothesis, while the test for AR(2) in first differences, which detects serial correlation in levels, does not reject the null hypothesis of no serial correlation. According to the regression output, the R-squared value of 0.78 indicates that 78% of the variation in self-perceived health can be explained by the independent variables (“Internet use for seeking health-related information” and “health-related online purchases and downloads”).
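The logic of the Sargan test can be sketched on a simple cross-sectional two-stage least squares example rather than on the panel GMM used in the study. The simulated data, the `direct_effect` parameter that makes one instrument invalid, and the helper names are assumptions for illustration. The statistic is the sample size times the uncentered R-squared from regressing the IV residuals on the instruments; with three instruments and one regressor it is compared with a chi-square distribution with 2 degrees of freedom (5% critical value 5.991).

```python
import numpy as np

CHI2_2DF_5PCT = 5.991  # 5% critical value of chi-square with 2 degrees of freedom


def two_sls(y, X, Z):
    """Two-stage least squares: returns coefficients and structural residuals."""
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]   # first-stage fitted values
    beta = np.linalg.lstsq(X_hat, y, rcond=None)[0]
    return beta, y - X @ beta


def sargan_statistic(resid, Z):
    """n times the uncentered R-squared of the residuals regressed on Z."""
    fitted = Z @ np.linalg.lstsq(Z, resid, rcond=None)[0]
    return len(resid) * (fitted @ fitted) / (resid @ resid)


def simulate(n=5000, direct_effect=0.0, seed=1):
    """direct_effect != 0 lets the first instrument affect y directly (invalid)."""
    rng = np.random.default_rng(seed)
    Z = rng.normal(size=(n, 3))
    e = rng.normal(size=n)
    x = Z.sum(axis=1) + 0.8 * e + rng.normal(size=n)   # endogenous regressor
    y = 2.0 * x + direct_effect * Z[:, 0] + e
    return y, x[:, None], Z


if __name__ == "__main__":
    for effect in (0.0, 1.0):
        y, X, Z = simulate(direct_effect=effect)
        beta, resid = two_sls(y, X, Z)
        stat = sargan_statistic(resid, Z)
        print(f"direct_effect={effect}: beta={beta[0]:.3f}, "
              f"Sargan={stat:.2f}, reject={stat > CHI2_2DF_5PCT}")
```

A statistic below the critical value means the over-identifying restrictions are not rejected, which is the outcome reported for the instruments in this study.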
Considering the $p$-values, the following can be asserted: the model is valid, and all the variables included in the analysis significantly influence the self-perceived health of the population, as their $p$-values are lower than 0.05. The Sargan test was used to check whether the instruments are valid. The J-statistic (Table 4) has a $p$-value of 0.357, so the null hypothesis of valid instruments cannot be rejected at the 5% significance level. The Arellano–Bond test (Table 5) revealed no second-order autocorrelation (AR(2), $p$-value = 0.5118), which means that no serial correlation is present in levels. For AR(1) in first differences, the test rejects the null hypothesis, as expected ($p$-value = 0.00 < 0.05).
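The reason a rejection for AR(1) in first differences is expected can be checked numerically. If the shocks in levels are serially uncorrelated, their first differences share one shock with the adjacent difference, giving a first-order autocorrelation of about −0.5 and a second-order autocorrelation of about 0. The sketch below uses simulated noise; the function names are hypothetical.

```python
import numpy as np


def lag_correlation(series, lag):
    """Sample correlation between a series and its own lag."""
    return np.corrcoef(series[lag:], series[:-lag])[0, 1]


def differenced_noise_autocorr(n=200_000, seed=42):
    """First- and second-order autocorrelation of differenced white noise."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=n)          # serially uncorrelated shocks in levels
    dv = np.diff(v)                 # first differences, as in difference GMM
    return lag_correlation(dv, 1), lag_correlation(dv, 2)


if __name__ == "__main__":
    r1, r2 = differenced_noise_autocorr()
    print(f"AR(1) in differences: {r1:.3f}, AR(2) in differences: {r2:.3f}")
```

This is why only the AR(2) result is informative about serial correlation in the level errors.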
Thus, the explanatory model will take the following expression:
where:
$Y_{it}$, $Y_{i,t-1}$: the percentage of the population in country $i$ that perceived its health status as good or very good during year $t$ and during the previous year ($t-1$), respectively.
$X_{it}$: the percentage of the population in country $i$ that used the internet for seeking health-related information during year $t$.
$Z_{it}$: the percentage of the population in country $i$ that purchased or downloaded health-related goods or services online during year $t$.