Subsequently, longitudinal regression models were applied for both dependent variables—EI and TEA.
2.2.1. Feature Extraction
In the age of big data, an appropriate selection of attributes is essential for several reasons, some of which are enumerated next. The knowledge induced by data analysis algorithms on a smaller number of attributes is often more understandable. Moreover, several algorithms perform worse when using a large number of variables, and attribute selection can improve their performance. Thus, when large multivariate data sets are analysed, it is often desirable to reduce their dimensionality. In addition, some domains have a high cost of data collection, in which case attribute selection can reduce costs. For a review of recent developments in feature extraction methods such as PCA and FA, see [
18].
In statistics, machine learning and information systems, dimensionality reduction (or dimension reduction) is the process of reducing the number of random variables under consideration in order to obtain a set of main variables. The main approaches to dimension reduction can be divided into feature selection and feature extraction. Selection methods try to choose the most representative of the original variables to keep. Extraction methods use the information contained in the data to create another data set with a smaller dimension, while retaining as much information from the original data as possible.
PCA is an extraction technique. It replaces the $p$ original variables, $X_1, X_2, \ldots, X_p$, by a smaller number, $q$, of derived variables, $Z_1, Z_2, \ldots, Z_q$, the principal components (PC). The PC are non-correlated linear combinations of the original variables, i.e.,

$$Z_j = a_{j1} X_1 + a_{j2} X_2 + \cdots + a_{jp} X_p, \quad j = 1, \ldots, q \qquad (1)$$

PCA aims to represent (or describe) an initial number of variables by a smaller number of hypothetical variables. Thus, it identifies new variables (in smaller numbers) without a significant loss of information. The components are calculated in descending order of importance, with the first explaining as much of the data variability as possible, i.e., $\mathrm{Var}(Z_1) \geq \mathrm{Var}(Z_2) \geq \cdots \geq \mathrm{Var}(Z_q)$. Often, it is possible to retain most of the variability in the original variables, with $q$ much smaller than $p$ [19].
PCA is based on the analysis of linear correlations between variables and tries to identify sets of variables that are highly correlated. It allows one to conclude whether it is possible to explain the pattern of correlations through a smaller number of variables. As each linear combination explains as much as possible of the unexplained variance and has to be orthogonal to any other combination already defined, the set of all combinations found constitutes a unique solution.
Therefore, PCA is an exploratory analysis technique used to reduce the dimensionality of the data and to identify latent factors.
The representation by PC does not require assumptions on the probability distribution of the variables. However, PCA is highly sensitive to variables measured on different scales. Therefore, variables must be standardised prior to applying PCA.
PCA implies the absence of error; the latent variables are linear combinations of the initial variables.
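As a minimal illustration, the sketch below (Python with scikit-learn) standardises a set of observed variables before extracting the principal components; the file name and column contents are assumptions made only for this example.

```python
# Minimal PCA sketch (scikit-learn); the file name and columns are hypothetical.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

efc = pd.read_csv("efc_indicators.csv")        # hypothetical table of observed EFC variables

# Standardise the variables, since PCA is sensitive to differences in scale
X = StandardScaler().fit_transform(efc)

pca = PCA()                                    # extract all p components
scores = pca.fit_transform(X)                  # the principal components Z_1, ..., Z_p
print(pca.explained_variance_ratio_.cumsum())  # cumulative variance, used to choose q << p
```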
In Factor Analysis (FA), each observed variable is described as a function of the common factors retained, and the error is presented, because usually a smaller number of factors is retained when compared with the initial number of variables.
In FA, only the common variation, shared by all the variables, is retained in each factor, while in PCA, the total variation is present in the set of original variables.
Thus, a rotation process is considered in FA. A rotation is a linear transformation performed on the component solution to make it easier to interpret. Interpreting a rotated solution means identifying the concept measured by each of the retained components, also designated as a construct or latent variable.
The number of factors to retain can be defined using several criteria, for example, the factors with an eigenvalue larger than 1 and the percentage of variation of the original variables retained, among others.
PCs, $Z_1, Z_2, \ldots, Z_q$, are expressed as linear combinations of the original variables, $X_1, X_2, \ldots, X_p$ (see Equation (1)). Thus, the error is absent.
In contrast, in FA, each observed variable, $X_i$, is described as a function of the common factors, $F_1, F_2, \ldots, F_q$, with $q < p$:

$$X_i = \lambda_{i1} F_1 + \lambda_{i2} F_2 + \cdots + \lambda_{iq} F_q + \varepsilon_i, \quad i = 1, \ldots, p \qquad (2)$$
However, the factors are easier to interpret and, therefore, facilitate the definition of latent variables in the data.
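A corresponding FA sketch, again only illustrative, could use the factor_analyzer package; the number of factors and the DataFrame `efc` are the same hypothetical choices as above.

```python
# Minimal FA sketch with the factor_analyzer package (same hypothetical DataFrame `efc`).
from factor_analyzer import FactorAnalyzer

fa = FactorAnalyzer(n_factors=3, rotation="varimax")  # 3 factors is purely illustrative
fa.fit(efc)

print(fa.loadings_)            # loadings of each observed variable on the common factors
print(fa.get_communalities())  # share of each variable's variance retained by the factors
```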
Data Adequacy
It is not always appropriate to use FA. In fact, exploratory FA is only useful if the population correlation matrix is statistically different from the identity matrix [20]. It is, then, necessary that the variables are correlated. The Bartlett sphericity test can be used to test the equality of the correlation matrix to the identity matrix. Only if there is a statistical difference between them is FA useful for the estimation of common factors.
The sample adequacy for FA application can be measured by the Kaiser–Meyer–Olkin (KMO) coefficient. This coefficient is based on the partial correlations between the variables and provides information on whether variable selection and sample size are suitable for FA. Sample adequacy can be classified as: incompatible, for KMO $< 0.5$; bad, for $0.5 \leq$ KMO $< 0.6$; standard, for $0.6 \leq$ KMO $< 0.7$; medium, for $0.7 \leq$ KMO $< 0.8$; good, for $0.8 \leq$ KMO $< 0.9$; and very good, for KMO $\geq 0.9$ [21].
The Measure of Sampling Adequacy (MSA) of each item/variable must be >0.50. If a variable has an MSA lower than this value, it must be removed, as it is not sufficiently correlated with the others and is not suitable for the analysis.
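Under the same assumptions as the earlier sketches, both adequacy checks can be illustrated as follows (factor_analyzer exposes Bartlett's sphericity test and the KMO/MSA statistics):

```python
# Adequacy checks before FA (factor_analyzer; same hypothetical DataFrame `efc`).
from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity, calculate_kmo

chi_square, p_value = calculate_bartlett_sphericity(efc)  # H0: correlation matrix = identity
kmo_per_item, kmo_total = calculate_kmo(efc)              # MSA per item and overall KMO

print("Bartlett p-value:", p_value)                       # FA is only useful if H0 is rejected
print("Items with MSA < 0.50:", list(efc.columns[kmo_per_item < 0.50]))
print("Overall KMO:", kmo_total)
```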
The communalities, computed in both procedures, PCA and FA, refer to the percentage of the variance of each observed variable that is accounted for by the retained components (or factors). A given variable will display a large communality if it loads heavily on at least one of the retained components.
Retaining Factors
There are several methods that can be used to select the appropriate number of components to retain in FA [22]. The most commonly used are: the Kaiser criterion, which proposes retaining the factors with an eigenvalue larger than 1; the observation of the scree plot, evaluating where there is a substantial decline in the magnitude of the eigenvalues; or specifying a limit value for the cumulative percentage of variance explained by the factors (usually above 70%). In several research areas, interpretability is perhaps the most important criterion for determining the number of components.
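The first two criteria can be sketched numerically on the correlation matrix of the (hypothetical) data; the 70% threshold below simply mirrors the cut-off mentioned above:

```python
# Sketch of the retention criteria applied to the correlation matrix of the hypothetical data.
import numpy as np

eigenvalues = np.sort(np.linalg.eigvalsh(np.corrcoef(efc, rowvar=False)))[::-1]

n_kaiser = int((eigenvalues > 1).sum())                   # Kaiser criterion: eigenvalues > 1
cumulative = np.cumsum(eigenvalues) / eigenvalues.sum()   # cumulative proportion of variance
n_variance = int(np.argmax(cumulative >= 0.70)) + 1       # smallest q reaching 70%

print(n_kaiser, n_variance)  # a scree plot of `eigenvalues` would complement these criteria
```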
Reliability
To measure the reliability of a factor, Cronbach’s alpha can be used. This is the most widely used objective measure of reliability. Lee Cronbach introduced the alpha coefficient in 1951 [
23] to provide a measure of a test or scale’s internal consistency, and it takes a value between 0 and 1. Internal consistency refers to the extent to which all items measure the same concept or construct. The value of Cronbach’s alpha rises as the variables become more correlated. A high Cronbach’s alpha coefficient, on the other hand, does not automatically imply a high level of internal consistency. Adding more comparable items that measure the same concept raises Cronbach’s alpha.
Typically, the acceptable values of Cronbach’s alpha range from 0.70 to 0.95 [
24].
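A minimal sketch, computing Cronbach's alpha directly from its definition (the item names in the usage line are hypothetical):

```python
# Cronbach's alpha computed from its definition (items = columns of one scale/factor).
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: observations in rows, the k items of a single scale/factor in columns."""
    k = items.shape[1]
    sum_item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - sum_item_variances / total_variance)

# e.g., cronbach_alpha(efc[["item_1", "item_2", "item_3"]].to_numpy())  # hypothetical items
```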
2.2.2. System GMM
This study analyses repeated measures of EFCs through time for each country, hence longitudinal models (also known as panel models) are applied.
Contrary to cross-sectional methods that aggregate time series, which may conceal underlying dynamics (macroeconomic or of another nature), longitudinal models make it possible to account for, and investigate, the heterogeneity that may occur between economies.
The standard Fixed Effects Model (FEM) and the Random Effects Model (REM) control for the existence of bias related to heterogeneity across economies and time. FEM considers individual, unobserved characteristics (i.e., unobserved heterogeneity). The REM specification assumes that group effects follow a normal distribution over all the groups. However, these models do not overcome the endogeneity problem due to the potential correlation between one or several explanatory variables and the residuals. One strategy to overcome the endogeneity concern could be the instrumental variable (IV) estimator; however, the main challenge is obtaining valid instruments applicable to panel analyses [
25] and theoretically validating them.
In addition to the aforementioned problem, the FEM and REM models have another disadvantage: they are static and do not allow unbiased parameter estimation when past (lagged) values of the dependent variable are included in the model.
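For reference only, such static FEM and REM specifications could be estimated with the linearmodels package; the country-year panel `df` and the regressor names in the sketch are assumptions for illustration, not the estimation actually performed here.

```python
# Static panel baselines with linearmodels (hypothetical country-year panel `df`).
import statsmodels.api as sm
from linearmodels.panel import PanelOLS, RandomEffects

# `df` is assumed to be indexed by (country, year), with "EI" and the regressors as columns.
y = df["EI"]
X = sm.add_constant(df[["gdp_pc", "population"]])   # illustrative regressors

fem = PanelOLS(y, X, entity_effects=True).fit()     # fixed effects (within) estimator
rem = RandomEffects(y, X).fit()                     # random effects (GLS) estimator
print(fem.params)
print(rem.params)
```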
In fact, recently, studies that analyse EI and TEA (e.g., [
26,
27,
28,
29]) have taken into account the dynamic effect of past values of the variables in their estimations, finding a statistically significant effect. Thus, the present analysis considers the effect of the lagged values of EI and TEA in the estimations.
Because the lagged dependent variable is correlated with the error term, a dynamic longitudinal model should be considered. In fact, using the pooled OLS estimation technique yields inconsistent estimates. Furthermore, using the FEM estimator to transform the data and remove the fixed effects does not completely eliminate the inconsistency, because the transformed lagged dependent variable still depends on the error term [
30,
31].
Moreover, the data under analysis are highly unbalanced, as there is no information on the variables of interest for all countries in all periods (years). This unbalanced nature of the data requires that the longitudinal dynamics of the expected values of EI and TEA are analysed through a model that overcomes this issue.
Thus, in this analysis, the Generalised Method of Moments (GMM) estimator, proposed by [
32,
33], is used. This model controls for possible endogeneity and unobserved heterogeneity. The technique uses lags of the dependent variable as explanatory variables to capture the effect of its past values. Lagged values of the dependent variable are also used as (internal) instruments to control for this endogenous relationship. The GMM model eliminates endogeneity by “internally transforming the data”. This transformation is a statistical process that subtracts a variable’s past value from its present value [
17]. As a result, the number of observations is reduced, and this process (internal transformation) improves the GMM model’s efficiency [
34].
However, to prevent potential data loss, as we deal with unbalanced data, the second-order transformation (two-step GMM) was considered, as recommended by [
32]. This transformation applies “forward orthogonal deviations”, subtracting the average of all future available observations of a particular variable from its current value [
17], instead of only subtracting the previous observation (as in the first-difference transformation).
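To make the two transformations concrete, the sketch below contrasts first differencing with forward orthogonal deviations for a single (hypothetical) country series containing a gap; the scaling factor is the standard one that keeps the transformed disturbances homoskedastic.

```python
# First differences vs. forward orthogonal deviations for one country's (hypothetical) series.
import numpy as np

def first_difference(x: np.ndarray) -> np.ndarray:
    return x[1:] - x[:-1]                            # subtract the previous observation

def forward_orthogonal_deviations(x: np.ndarray) -> np.ndarray:
    """Subtract from each value the mean of all *future* available observations,
    scaled so that the transformed disturbances keep a constant variance."""
    x = np.asarray(x, dtype=float)
    out = []
    for t in range(len(x) - 1):
        future = x[t + 1:]
        future = future[~np.isnan(future)]           # unbalanced data: skip gaps
        if future.size == 0 or np.isnan(x[t]):
            out.append(np.nan)
        else:
            scale = np.sqrt(future.size / (future.size + 1))
            out.append(scale * (x[t] - future.mean()))
    return np.array(out)

series = np.array([1.0, 2.0, np.nan, 4.0, 5.0])      # a gap in the third year
print(first_difference(series))                      # the gap propagates into two differences
print(forward_orthogonal_deviations(series))         # only the gap year itself is lost
```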
The GMM model considered can be parameterised as follows:

$$y_{it} = \alpha \, y_{i,t-1} + \boldsymbol{\beta}' \mathbf{x}_{it} + \mu_i + \varepsilon_{it}, \quad i = 1, \ldots, N; \ t = 1, \ldots, T \qquad (3)$$

where $y_{it}$ is the dependent variable (in this case, EI or TEA), $\mathbf{x}_{it}$ is the vector of independent variables, $\alpha$ is the autoregressive parameter, $\mu_i$ is the unobserved fixed effect and $\varepsilon_{it}$ is the idiosyncratic shock, normally distributed with zero mean and constant variance. The error term follows the error component structure, in which $E(\mu_i) = 0$; $E(\varepsilon_{it}) = 0$ and $E(\mu_i \varepsilon_{it}) = 0$, $i = 1, \ldots, N$, $t = 1, \ldots, T$.
Additionally, to capture the overall economic and social context, we control for per-capita GDP (following the works of [
35,
36,
37]), and for the size of the country’s population, as this is likely to affect the number of people available to work in the labour force, as well as the country’s entrepreneurship rates [
38]. Furthermore, year dummy variables (time-specific effects) were included to reduce the influence of cross-sectional error dependence in the dynamic panel model.
System GMM uses moment conditions that are functions of the model parameters and the data, such that their expectation is zero at the parameters’ true values. It controls for the endogeneity of the lagged dependent variable in a dynamic panel model (i.e., correlation between an explanatory variable and the error term), omitted variable bias, unobserved heterogeneity, and measurement errors.
This method consists of a system of two equations: the original equation, expressed in levels, with first differences as instruments, and a transformed equation, expressed in first differences, with levels as instruments. Differencing all regressors removes the fixed effects (which do not vary over time). To clarify:

$$E\!\left[\,y_{i,t-2}\,\Delta\varepsilon_{it}\right] = 0 \quad \text{and} \quad E\!\left[\,\Delta y_{i,t-1}\,(\mu_i + \varepsilon_{it})\right] = 0,$$

where $y_{i,t-2}$ is the instrument of $\Delta y_{i,t-1}$ in the first-differenced equation and $\Delta y_{i,t-1}$ is the instrument of $y_{i,t-1}$ in the levels equation.
The calculation of the system GMM estimator is based on a stacked system comprising moment conditions for which instruments are observed (where T represents the total number of years, in this specific case). The Arellano–Bond test for serial correlation is used to test for second-order serial correlation in the first-differenced residuals for model diagnostics. The null hypothesis states that the residuals are not serially correlated. If the null hypothesis cannot be rejected, there is no evidence of second-order serial correlation and the GMM estimator is reliable. In addition, the Hansen J-test is used to assess the null hypothesis of instrument validity, as well as the validity of the additional moment restrictions required for system GMM. The instruments are valid if the null hypothesis is not rejected. The variance inflation factor (VIF), an indication of how much the standard error has been inflated, was used to compute collinearity diagnostics for the estimated regression equation (results are not presented due to space restrictions, but are available upon request).
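As an illustration of the collinearity diagnostics mentioned above, the VIF can be computed with statsmodels, reusing the hypothetical panel `df` and regressor names from the earlier sketch:

```python
# Collinearity diagnostics via the variance inflation factor (statsmodels).
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

exog = sm.add_constant(df[["gdp_pc", "population"]])   # same illustrative regressors as above
vif = [variance_inflation_factor(exog.values, i) for i in range(exog.shape[1])]
print(dict(zip(exog.columns, vif)))
```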