## 1. Introduction

A seemingly unrelated regression (SUR) system, originally proposed by Zellner (1962), comprises multiple individual regression equations whose error terms are correlated with each other. Zellner's idea was to improve estimation efficiency by combining several equations into a single system. Unlike the SUR estimator, the ordinary least squares (OLS) estimator loses efficiency and no longer produces best linear unbiased estimates (BLUE) when the error terms of the equations in the system are correlated. This method has a wide range of applications to economic and financial data and in other similar areas (Shukur 2002; Srivastava and Giles 1987; Zellner 1962). For example, Dincer and Wang (2011) investigated the effects of ethnic diversity on economic growth. Williams (2013) studied the effects of financial crises on banks. Because the SUR framework considers multiple related equations simultaneously, a generalized least squares (GLS) estimator is used to take the cross-equation correlation of the errors into account.

Barari and Kundu (2019) reexamined the role of the Federal Reserve in triggering the recent housing crisis with a vector autoregression (VAR) model, which is a special case of the SUR model with lagged variables and deterministic terms as common regressors. One might also consider the correlations of explanatory variables in SUR models. Alkhamisi and Shukur (2008) and Zeebari et al. (2012, 2018) considered a modified version of the ridge estimation proposed by Hoerl and Kennard (1970) for these models. Alkhamisi (2010) proposed two SUR-type estimators by combining the SUR ridge regression and the restricted least squares methods. These recent studies demonstrated that ridge SUR estimation is superior to classical estimation methods in the presence of multicollinearity.

Srivastava and Wan (2002) considered the Stein-rule estimators of James and Stein (1961) in SUR models with two equations.

In our study, we consider preliminary test and shrinkage estimation, on which more information can be found in Ahmed (2014), in ridge-type SUR models when the explanatory variables are affected by multicollinearity. In a previous paper, we combined penalized estimations in an optimal way to define shrinkage estimation (Ahmed and Yüzbaşı 2016). Gao et al. (2017) suggested the use of the weighted ridge regression model for post-selection shrinkage estimation. Yüzbaşı et al. (2020) gave detailed information about generalized ridge regression for a number of shrinkage estimation methods. Srivastava and Wan (2002) and Arashi and Roozbeh (2015) considered Stein-rule estimation for SUR models. Erdugan and Akdeniz (2016) proposed a restricted feasible SUR estimate of the regression coefficients.

The organization of this paper is as follows: In Section 2, we briefly review the SUR model and some estimation techniques, including the ridge type. In Section 3, we introduce our new estimation methodology. A Monte Carlo simulation is conducted in Section 4, and our economic data are analysed in Section 5. Finally, some concluding remarks are given in Section 6.

## 2. Methodology

Consider the following model, the ${i}^{\mathrm{th}}$ equation of a seemingly unrelated regression system of M equations with T observations per equation:

$${\mathbf{Y}}_{i}={\mathbf{X}}_{i}{\mathbf{\beta}}_{i}+{\mathbf{\epsilon}}_{i},\qquad i=1,2,\cdots ,M, \tag{1}$$

where ${\mathbf{Y}}_{i}$ is a $T\times 1$ vector of T observations; ${\mathbf{X}}_{i}$ is a $T\times {p}_{i}$ full column rank matrix of T observations on ${p}_{i}$ regressors; and ${\mathbf{\beta}}_{i}$ is a ${p}_{i}\times 1$ vector of unknown parameters.

Equation (1) can be rewritten as follows:

$$\mathbf{Y}=\mathbf{X}\mathbf{\beta}+\mathbf{\epsilon}, \tag{2}$$

where $\mathbf{Y}={\left({\mathbf{Y}}_{1}^{\prime},{\mathbf{Y}}_{2}^{\prime},\cdots ,{\mathbf{Y}}_{M}^{\prime}\right)}^{\prime}$ is the vector of responses and $\mathbf{\epsilon}={\left({\mathbf{\epsilon}}_{1}^{\prime},{\mathbf{\epsilon}}_{2}^{\prime},\cdots ,{\mathbf{\epsilon}}_{M}^{\prime}\right)}^{\prime}$ is the vector of disturbances with dimension $TM\times 1$, $\mathbf{X}=diag\left({\mathbf{X}}_{1},{\mathbf{X}}_{2},\cdots ,{\mathbf{X}}_{M}\right)$ is of dimension $TM\times p$, and $\mathbf{\beta}={\left({\mathbf{\beta}}_{1}^{\prime},{\mathbf{\beta}}_{2}^{\prime},\cdots ,{\mathbf{\beta}}_{M}^{\prime}\right)}^{\prime}$ is of dimension $p\times 1$, for $p={\sum}_{i=1}^{M}{p}_{i}$.

The disturbances vector $\mathbf{\epsilon}$ satisfies the properties:

$$E\left[\mathbf{\epsilon}\right]=\mathbf{0}$$

and:

$$E\left[\mathbf{\epsilon}{\mathbf{\epsilon}}^{\prime}\right]=\mathsf{\Omega}=\mathsf{\Sigma}\otimes \mathbf{I},$$

where $\mathsf{\Sigma}=\left[{\sigma}_{ij}\right],i,j=1,2,\cdots ,M$ is an $M\times M$ positive definite symmetric matrix, ⊗ stands for the Kronecker product, and $\mathbf{I}$ is an identity matrix of order $T\times T$. Following Greene (2019), we assume strict exogeneity of ${\mathbf{X}}_{i}$,

$$E\left[{\mathbf{\epsilon}}_{i}\mid {\mathbf{X}}_{1},{\mathbf{X}}_{2},\cdots ,{\mathbf{X}}_{M}\right]=\mathbf{0},$$

and homoscedasticity:

$$E\left[{\mathbf{\epsilon}}_{i}{\mathbf{\epsilon}}_{i}^{\prime}\mid {\mathbf{X}}_{1},{\mathbf{X}}_{2},\cdots ,{\mathbf{X}}_{M}\right]={\sigma}_{ii}\mathbf{I}.$$

Therefore, it is assumed that disturbances are uncorrelated across observations, that is,

$$E\left[{\epsilon}_{it}{\epsilon}_{js}\right]=0\quad \text{for } t\ne s,$$

and it is assumed that disturbances are correlated across equations, that is,

$$E\left[{\epsilon}_{it}{\epsilon}_{jt}\right]={\sigma}_{ij}.$$

The OLS and GLS estimators of model (2) are thus given as:

$${\widehat{\mathbf{\beta}}}^{\mathrm{OLS}}={\left({\mathbf{X}}^{\prime}\mathbf{X}\right)}^{-1}{\mathbf{X}}^{\prime}\mathbf{Y}$$

and:

$${\widehat{\mathbf{\beta}}}^{\mathrm{GLS}}={\left({\mathbf{X}}^{\prime}{\mathsf{\Omega}}^{-1}\mathbf{X}\right)}^{-1}{\mathbf{X}}^{\prime}{\mathsf{\Omega}}^{-1}\mathbf{Y}.$$

${\widehat{\mathbf{\beta}}}^{\mathrm{OLS}}$ simply consists of the OLS estimators computed separately from each equation and omits the correlations between equations, as can be seen in Kuan (2004). Hence, the GLS estimator should be used when correlations exist among equations. However, the true covariance matrix $\mathsf{\Sigma}$ is generally unknown. The solution to this problem is the feasible generalized least squares (FGLS) estimation, which uses an estimate $\widehat{\mathsf{\Sigma}}$ of $\mathsf{\Sigma}$ in the GLS estimator. In many cases, the residual covariance matrix is calculated by:

$${\widehat{\sigma}}_{ij}=\frac{{\widehat{\mathbf{\epsilon}}}_{i}^{\prime}{\widehat{\mathbf{\epsilon}}}_{j}}{T},$$

where ${\widehat{\mathbf{\epsilon}}}_{i}={\mathbf{Y}}_{i}-{\mathbf{X}}_{i}{\widehat{\mathbf{\beta}}}_{i}$ represents the residuals from the ${i}^{\mathrm{th}}$ equation and ${\widehat{\mathbf{\beta}}}_{i}$ may be the OLS or the ridge regression (RR) estimation ${({\mathbf{X}}_{i}^{\prime}{\mathbf{X}}_{i}+\lambda \mathbf{I})}^{-1}{\mathbf{X}}_{i}^{\prime}{\mathbf{Y}}_{i}$ with tuning parameter $\lambda \ge 0$. Note that we use the RR solution to estimate $\mathsf{\Sigma}$ in our numerical studies because we assume that two or more explanatory variables in each equation are linearly related. Therefore, with $\widehat{\mathsf{\Omega}}=\widehat{\mathsf{\Sigma}}\otimes \mathbf{I}$, the FGLS estimator of the SUR system is:

$${\widehat{\mathbf{\beta}}}^{\mathrm{FGLS}}={\left({\mathbf{X}}^{\prime}{\widehat{\mathsf{\Omega}}}^{-1}\mathbf{X}\right)}^{-1}{\mathbf{X}}^{\prime}{\widehat{\mathsf{\Omega}}}^{-1}\mathbf{Y}.$$
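The two-stage FGLS procedure can be sketched in a few lines of linear algebra. The sketch below is illustrative only (function and variable names are ours): it estimates $\mathsf{\Sigma}$ from equation-by-equation OLS residuals with divisor T — whereas the numerical studies in this paper use ridge residuals instead — builds $\widehat{\mathsf{\Omega}}=\widehat{\mathsf{\Sigma}}\otimes \mathbf{I}$, and solves the GLS normal equations:

```python
import numpy as np

def fgls_sur(X_blocks, Y_blocks):
    """Feasible GLS for a SUR system (illustrative sketch).

    X_blocks: list of M design matrices, each T x p_i.
    Y_blocks: list of M response vectors, each of length T.
    Sigma is estimated here from per-equation OLS residuals; the paper
    uses ridge residuals for this step instead.
    """
    M = len(X_blocks)
    T = X_blocks[0].shape[0]
    # Stage 1: per-equation OLS residuals.
    resid = []
    for Xi, Yi in zip(X_blocks, Y_blocks):
        bi = np.linalg.solve(Xi.T @ Xi, Xi.T @ Yi)
        resid.append(Yi - Xi @ bi)
    E = np.column_stack(resid)              # T x M matrix of residuals
    Sigma_hat = (E.T @ E) / T               # sigma_ij = e_i' e_j / T
    # Stage 2: stack the system and apply GLS with Omega = Sigma kron I_T.
    p = sum(Xi.shape[1] for Xi in X_blocks)
    X = np.zeros((T * M, p))
    col = 0
    for i, Xi in enumerate(X_blocks):
        X[i * T:(i + 1) * T, col:col + Xi.shape[1]] = Xi
        col += Xi.shape[1]
    Y = np.concatenate(Y_blocks)
    Omega_inv = np.kron(np.linalg.inv(Sigma_hat), np.eye(T))
    A = X.T @ Omega_inv @ X
    return np.linalg.solve(A, X.T @ Omega_inv @ Y)
```

For large $TM$, forming the $TM\times TM$ matrix $\widehat{\mathsf{\Omega}}^{-1}$ explicitly is wasteful; production code would exploit the Kronecker structure block by block.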

By following Srivastava and Giles (1987) and Zeebari et al. (2012), we first transform Equation (2), in order to retain the information included in the correlation matrix of the cross-equation errors, using the following transformations:

$${\mathbf{Y}}_{\ast}=\left({\widehat{\mathsf{\Sigma}}}^{-1/2}\otimes \mathbf{I}\right)\mathbf{Y},\qquad {\mathbf{X}}_{\ast}=\left({\widehat{\mathsf{\Sigma}}}^{-1/2}\otimes \mathbf{I}\right)\mathbf{X},\qquad {\mathbf{\epsilon}}_{\ast}=\left({\widehat{\mathsf{\Sigma}}}^{-1/2}\otimes \mathbf{I}\right)\mathbf{\epsilon}.$$

Hence, Model (2) turns into:

$${\mathbf{Y}}_{\ast}={\mathbf{X}}_{\ast}\mathbf{\beta}+{\mathbf{\epsilon}}_{\ast}. \tag{3}$$

The spectral decomposition of the symmetric matrix ${\mathbf{X}}_{\ast}^{\prime}{\mathbf{X}}_{\ast}$ is ${\mathbf{X}}_{\ast}^{\prime}{\mathbf{X}}_{\ast}=\mathbf{P}\mathsf{\Lambda}{\mathbf{P}}^{\prime}$ with $\mathbf{P}{\mathbf{P}}^{\prime}=\mathbf{I}$. Model (3) can then be written as:

$${\mathbf{Y}}_{\ast}=\mathbf{Z}\mathbf{\alpha}+{\mathbf{\epsilon}}_{\ast}, \tag{4}$$

with $\mathbf{Z}={\mathbf{X}}_{\ast}\mathbf{P}$, $\mathbf{\alpha}={\mathbf{P}}^{\prime}\mathbf{\beta}$ and ${\mathbf{Z}}^{\prime}\mathbf{Z}={\mathbf{P}}^{\prime}{\mathbf{X}}_{\ast}^{\prime}{\mathbf{X}}_{\ast}\mathbf{P}=\mathsf{\Lambda}$, so that $\mathsf{\Lambda}$ is a diagonal matrix of eigenvalues and $\mathbf{P}$ is a matrix whose columns are the eigenvectors of ${\mathbf{X}}_{\ast}^{\prime}{\mathbf{X}}_{\ast}$.
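As a quick numerical sanity check of this canonical reduction (with a synthetic matrix generated at random, standing in for ${\mathbf{X}}_{\ast}$ purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
X_star = rng.normal(size=(30, 4))     # stands in for the transformed design X_*
G = X_star.T @ X_star                 # symmetric, positive definite for full-rank X_*
eigvals, P = np.linalg.eigh(G)        # spectral decomposition: G = P Lambda P'
Z = X_star @ P                        # canonical regressors
# Z'Z equals the diagonal eigenvalue matrix Lambda, and P is orthogonal.
assert np.allclose(Z.T @ Z, np.diag(eigvals))
assert np.allclose(P @ P.T, np.eye(4))
```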

The OLS estimator of model (4) is:

$${\widehat{\mathbf{\alpha}}}^{\mathrm{OLS}}={\mathsf{\Lambda}}^{-1}{\mathbf{Z}}^{\prime}{\mathbf{Y}}_{\ast}.$$

The least squares estimates of $\mathbf{\beta}$ in model (2) can then be obtained by the inverse linear transformation:

$$\widehat{\mathbf{\beta}}=\mathbf{P}{\widehat{\mathbf{\alpha}}}^{\mathrm{OLS}}.$$

Furthermore, by following Alkhamisi and Shukur (2008), the full model ridge SUR regression parameter estimation is:

$${\widehat{\mathbf{\alpha}}}^{\mathrm{RR}}={\left(\mathsf{\Lambda}+\mathbf{K}\right)}^{-1}{\mathbf{Z}}^{\prime}{\mathbf{Y}}_{\ast},$$

where $\mathbf{K}=diag({\mathbf{K}}_{1},{\mathbf{K}}_{2},\cdots ,{\mathbf{K}}_{M})$, ${\mathbf{K}}_{i}=diag({k}_{i1},{k}_{i2},\cdots ,{k}_{i{p}_{i}})$ and ${k}_{ij}=\frac{1}{{\left({\widehat{\alpha}}^{\mathrm{OLS}}\right)}_{ij}^{2}}>0$ for $i=1,2,\cdots ,M$ and $j=1,2,\cdots ,{p}_{i}$.
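A minimal sketch of this ridge SUR estimation in the canonical model, using the reciprocal-squared rule ${k}_{ij}=1/{\left({\widehat{\alpha}}^{\mathrm{OLS}}\right)}_{ij}^{2}$ for the ridge constants (function and variable names are ours, for illustration only):

```python
import numpy as np

def ridge_sur_canonical(Z, y_star):
    """Ridge estimate in the canonical model y* = Z alpha + e*.

    Z'Z is the (diagonal) matrix Lambda; the ridge constants follow the
    rule k_j = 1 / (alpha_j^OLS)^2 used for K in the text.
    """
    Lam = Z.T @ Z                                    # Lambda
    alpha_ols = np.linalg.solve(Lam, Z.T @ y_star)   # Lambda^{-1} Z'y*
    K = np.diag(1.0 / alpha_ols ** 2)                # k_j = 1 / alpha_j^2
    return np.linalg.solve(Lam + K, Z.T @ y_star)    # (Lambda + K)^{-1} Z'y*
```

Since $\mathsf{\Lambda}$ and $\mathbf{K}$ are diagonal, each ridge coordinate is the OLS coordinate scaled by ${\lambda}_{j}/({\lambda}_{j}+{k}_{j})<1$, i.e., shrunk toward zero without changing sign.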

Now, let us assume that uncertain non-sample prior information (UNPI) on the parameter vector $\mathbf{\beta}$ is available, either from previous studies, expert knowledge, or a researcher's experience. This information might be of use in the estimation of the parameters, in order to improve the quality of the estimators when the sample data are of low quality or unreliable (Ahmed 2014). It is assumed that the UNPI on the vector of parameters is expressed as the following restriction for Model (2):

$$\mathbf{R}\mathbf{\beta}=\mathbf{r}, \tag{7}$$

where $\mathbf{R}=diag({\mathbf{R}}_{1},{\mathbf{R}}_{2},\cdots ,{\mathbf{R}}_{M})$, each ${\mathbf{R}}_{i},i=1,\cdots ,M$ is a known ${m}_{i}\times {p}_{i}$ matrix of rank ${m}_{i}<{p}_{i}$, and $\mathbf{r}$ is a known ${\sum}_{i=1}^{M}{m}_{i}\times 1$ vector. In order to use restriction (7) in Equation (2), we transform it as follows:

$$\mathbf{H}\mathbf{\alpha}=\mathbf{r},$$

where $\mathbf{H}=\mathbf{R}\mathbf{P}$ and $\mathbf{\alpha}={\mathbf{P}}^{\prime}\mathbf{\beta}$, as defined above. Hence, the restricted ridge SUR regression estimation is obtained from the following objective function:

where

${\mathbf{Z}}_{\mathbf{K}}=({\mathbf{Z}}^{\prime}\mathbf{Z}+\mathbf{K})$.
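The projection that underlies the $\mathbf{\delta}$ term of Theorem 1 below can be sketched as follows: an illustrative restricted least squares step in the canonical model (not the paper's ridge-restricted estimator itself), in which the unrestricted estimate is corrected by ${\mathsf{\Lambda}}^{-1}{\mathbf{H}}^{\prime}{\left(\mathbf{H}{\mathsf{\Lambda}}^{-1}{\mathbf{H}}^{\prime}\right)}^{-1}\left(\mathbf{H}\widehat{\mathbf{\alpha}}-\mathbf{r}\right)$ so that the restriction holds exactly:

```python
import numpy as np

def restricted_ls(Z, y_star, H, r):
    """Restricted least squares in the canonical model y* = Z alpha + e*.

    The correction term mirrors the delta of Theorem 1:
    alpha_hat - Lambda^{-1} H'(H Lambda^{-1} H')^{-1}(H alpha_hat - r),
    so the returned estimate satisfies H alpha = r exactly.
    """
    Lam = Z.T @ Z                                    # Lambda
    alpha_hat = np.linalg.solve(Lam, Z.T @ y_star)   # unrestricted OLS
    Lam_inv_Ht = np.linalg.solve(Lam, H.T)           # Lambda^{-1} H'
    correction = Lam_inv_Ht @ np.linalg.solve(H @ Lam_inv_Ht, H @ alpha_hat - r)
    return alpha_hat - correction
```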

**Theorem 1.** The risks of ${\widehat{\mathbf{\alpha}}}^{\mathrm{RR}}$ and ${\tilde{\mathbf{\alpha}}}^{\mathrm{RR}}$ are given by:

where $\mathbf{\delta}={\mathsf{\Lambda}}^{-1}{\mathbf{H}}^{\prime}{\left(\mathbf{H}{\mathsf{\Lambda}}^{-1}{\mathbf{H}}^{\prime}\right)}^{-1}\left(\mathbf{H}\mathbf{\alpha}-\mathbf{r}\right)$.

**Proof.** For the risks of the estimators ${\widehat{\mathbf{\alpha}}}^{\mathrm{RR}}$ and ${\tilde{\mathbf{\alpha}}}^{\mathrm{RR}}$, we consider:

where ${\mathbf{\alpha}}^{\ast}$ is one of the estimators ${\widehat{\mathbf{\alpha}}}^{\mathrm{RR}}$ and ${\tilde{\mathbf{\alpha}}}^{\mathrm{RR}}$, and $\mathbf{M}\left({\mathbf{\alpha}}^{\ast}\right)=E\left[\left({\mathbf{\alpha}}^{\ast}-\mathbf{\alpha}\right){\left({\mathbf{\alpha}}^{\ast}-\mathbf{\alpha}\right)}^{\prime}\right]$. Since:

where $\mathsf{\Lambda}={\mathbf{Z}}^{\prime}\mathbf{Z}$.

Using $\mathsf{\Lambda}(\mathbf{K})={\left[\mathbf{I}+{\mathsf{\Lambda}}^{-1}\mathbf{K}\right]}^{-1}$, ${k}_{ij}\ge 0$, we get:

Hence,

Therefore, the risk of ${\widehat{\mathbf{\alpha}}}^{\mathrm{RR}}$ is directly obtained by definition. Similarly,

and,

Thus, the risk of ${\tilde{\mathbf{\alpha}}}^{\mathrm{RR}}$ is directly obtained by definition. □

## 5. Application

In this section, we apply the proposed estimation strategies to a financial dataset to examine the relative performance of the listed estimators. To illustrate and compare them, we study the effect of several economic and financial variables on the performance of the "Fragile Five" countries (coined by Stanley 2013) in terms of their attraction of foreign direct investment (FDI) over the period between 1983 and 2018. The "Fragile Five" comprise Turkey (TUR), South Africa (ZAF), Brazil (BRA), India (IND), and Indonesia (IDN). Agiomirgianakis et al. (2003), Hubert et al. (2017), and Akın (2019) used FDI as the dependent variable across countries. With five countries, we have $M=5$ blocks in our SUR model, with measurements over $T=36$ years per equation. Table 2 provides information about the prediction variables, and the raw data are available from the World Bank^1.

We suggest the following model:

where i denotes the countries $(i=\mathrm{TUR},\mathrm{ZAF},\mathrm{BRA},\mathrm{IND},\mathrm{IDN})$ and t is time $(t=1,2,\cdots ,T)$. Following Salman (2011), the errors of each equation are assumed to be normally distributed with mean zero, homoscedastic, and serially uncorrelated. Furthermore, there is contemporaneous correlation between corresponding errors in different equations. We test these assumptions along with the assumptions in Section 2. We first check the following assumptions for each equation:

Nonautocorrelation of errors: There are a number of viable tests in the literature for autocorrelation. For example, the Ljung–Box test is widely used in applications of time series analysis, and a similar assessment may be obtained via the Breusch–Godfrey test and the Durbin–Watson test. We apply the Ljung–Box test (Ljung and Box 1978). The null hypothesis of the Ljung–Box test, H_{0}, is that the errors are random and independent. A significant p-value in this test rejects this null hypothesis, indicating that the series is autocorrelated. The results reported in Table 3 suggest a rejection of H_{0} for the equations of both TUR and IND at any conventional significance level. Thus, the estimation results would clearly be unsatisfactory for these two equations. To tackle this problem, we applied a first-differences transformation to the variables. After the transformation, the test statistics and p-values of the TUR and IND equations were ${\chi}_{(1)}^{2}=1.379$, $p=0.240$ and ${\chi}_{(1)}^{2}=0.067$, $p=0.794$, respectively. Hence, each equation satisfied the assumption of nonautocorrelation. We confirmed this result using the Durbin–Watson test.
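To make the quantity being tested concrete, here is a minimal implementation of the Ljung–Box Q statistic (the function name and pure-numpy approach are ours; in practice one would use a library routine such as statsmodels' `acorr_ljungbox`):

```python
import numpy as np

def ljung_box_q(x, h):
    """Ljung-Box statistic: Q = n(n+2) * sum_{k=1..h} r_k^2 / (n-k).

    Under H0 (independent errors), Q is asymptotically chi-square with h
    degrees of freedom; e.g., Q > 3.84 rejects H0 at the 5% level for h = 1.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    x = x - x.mean()
    denom = np.sum(x * x)
    q = 0.0
    for k in range(1, h + 1):
        r_k = np.sum(x[:-k] * x[k:]) / denom   # lag-k sample autocorrelation
        q += r_k * r_k / (n - k)
    return n * (n + 2) * q
```

A strongly autocorrelated series, such as an AR(1) with coefficient 0.8, gives a Q far above the 5% critical value, while white noise typically does not.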

Homoscedasticity of errors: To test for heteroscedasticity, we used the Breusch–Pagan test (Breusch and Pagan 1979). The results in Table 4 failed to reject the null hypothesis for each equation. The assumption of homoscedasticity was thus met in each equation.

Normality of errors: Various tests for normality are available, such as the Shapiro–Wilk, Anderson–Darling, Cramér–von Mises, Kolmogorov–Smirnov, and Jarque–Bera tests. In this study, we performed the Jarque–Bera goodness-of-fit test (Jarque and Bera 1980). The null hypothesis of the test is that the data are normally distributed. The results reported in Table 5 suggested a rejection of H_{0} only for ZAF. We also performed the Kolmogorov–Smirnov test for ZAF, and its results indicated that the errors were normally distributed. Thus, each equation satisfied the assumption of normality.

Cross-sectional dependence: To test whether the estimated correlation between the sections was statistically significant, we applied the Breusch and Pagan (1980) Lagrange multiplier (LM) statistic and the Pesaran (2004) cross-section dependence (CD) tests. The null hypothesis of these tests is that there is no cross-section dependence. Both tests in Table 6 suggested a rejection of the null hypothesis, indicating that the residuals from the equations were significantly correlated with each other. Consequently, the SUR model is the preferred technique, since it assumes contemporaneous correlation across equations. Therefore, the joint estimation of all parameters, rather than OLS on each equation, is more efficient (Kleiber and Zeileis 2008).

Specification test: The regression equation specification error test (RESET) designed by Ramsey (1969) is a general specification test for the linear regression model. It tests the exogeneity of the independent variables; that is, the null hypothesis is $E\left[{\epsilon}_{i}|{\mathbf{X}}_{i}\right]=0$. Thus, rejecting the null hypothesis indicates that there is a correlation between the error term and the regressors or that nonlinearities exist in the functional form of the regression. The results reported in Table 7 suggested a rejection of H_{0} only for IDN.

Multicollinearity: We calculated the variance inflation factor (VIF) values among the predictors. A VIF value measures how many times larger Var$({\beta}_{j})$ is for multicollinear data than for orthogonal data. Multicollinearity is usually not a problem when the VIFs are not substantially larger than one (Mansfield and Helms 1982). In the literature, VIF values that exceed 10 are often regarded as indicating multicollinearity, but in weaker models, values above 2.5 may be a cause for concern. Another measure of multicollinearity is the condition number (CN) of ${\mathbf{X}}_{i}^{\prime}{\mathbf{X}}_{i}$, which is the square root of the ratio of the largest characteristic root of ${\mathbf{X}}_{i}^{\prime}{\mathbf{X}}_{i}$ to the smallest. Belsley et al. (2005) suggested that a CN greater than fifteen poses a concern, a CN in excess of 20 is indicative of a problem, and a CN close to 30 represents a severe problem. Table 8 displays the results from a series of multicollinearity diagnostics. In general, EXPORTS, IMPORTS, and BALANCE were found to be problematic with regard to VIF values, while the others may be mildly concerning. On the other hand, the results from the CN diagnostic suggested a very serious concern about multicollinearity for the equations of ZAF, BRA, and IDN. In light of these results, it was clear that the problem of multicollinearity existed in the equations. According to Greene (2019), the SUR estimation is more efficient when there is less correlation among the covariates. Therefore, ridge-type SUR estimation is a good solution to this problem.
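Both diagnostics are easy to compute directly. The following sketch (illustrative helper names, numpy only) mirrors the definitions above: ${\mathrm{VIF}}_{j}=1/(1-{R}_{j}^{2})$ from regressing the j-th predictor on the remaining ones, and the CN as the square root of the eigenvalue ratio of ${\mathbf{X}}^{\prime}\mathbf{X}$:

```python
import numpy as np

def vif(X):
    """VIF_j = 1/(1 - R_j^2), with R_j^2 from regressing column j on the rest."""
    n, p = X.shape
    out = []
    for j in range(p):
        y = X[:, j]
        Xo = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        b, *_ = np.linalg.lstsq(Xo, y, rcond=None)
        resid = y - Xo @ b
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

def condition_number(X):
    """Square root of the ratio of the largest to smallest eigenvalue of X'X."""
    ev = np.linalg.eigvalsh(X.T @ X)
    return np.sqrt(ev[-1] / ev[0])
```

A column that is (nearly) a linear combination of the others drives both measures up sharply, which is exactly the pattern reported in Table 8.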

Structural change: To investigate the stability of the coefficients in each equation, we used the CUSUM (cumulative sum) test of Brown et al. (1975), which checks for structural changes. The null hypothesis is that of coefficient constancy, while the alternative suggests structural change in the model over time. The results in Table 9 suggested the stability of the coefficients over time.

Following Lawal et al. (2019), we selected important variables in each equation of the SUR model and implemented stepwise AIC forward regression using the function **ols_step_forward_aic** from the **olsrr** package in the R project. The statistically significant variables are shown in Table 10. After that, the sub-models were constituted by using these variables per equation.

In light of the selected variables in Table 10, we construct the matrices of restrictions as follows:

thus, the reduced models are given by:

Next, we combined Model (14) and Models (15)–(19) using the shrinkage and preliminary test strategies outlined in Section 3. Before we performed our analysis, the response was centred and the predictors were standardized for each equation, so that the intercept term was omitted. We then split the data using the time series cross-validation technique of Hyndman and Athanasopoulos (2018) into a series of training sets and a series of testing sets. Each test set consisted of a single observation, for models producing one-step-ahead forecasts. In this procedure, the observations in each training set occur prior to the observation of the corresponding test set; hence, it is ensured that no future observations are used in constructing the forecast. We used the function **createTimeSlices** from the **caret** package in the R project. The listed models were applied to the data, and predictions were made based on the divided training and test sets. The process was repeated 15 times, and for each subset's prediction, the mean squared error (MSE) and the mean absolute error (MAE) were calculated. The means of the 15 MSEs and MAEs were then used to evaluate the performance of each method. We also report the relative performances (RMSE and RMAE) with respect to the full model estimator for easier comparison: if the relative value of an estimator is larger than one, it is superior to the full model estimator.
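The splitting scheme can be written down compactly. The sketch below is our own helper, loosely mirroring the growing-window, horizon-one behaviour of caret's **createTimeSlices**; the initial window of 21 in the usage example is an assumed value, chosen only because it yields 15 one-step-ahead splits, matching the number of repetitions reported above:

```python
def time_slices(n_obs, initial_window):
    """Expanding-window splits for one-step-ahead forecasting.

    Each test set is the single observation immediately after its training
    window, so no future observation ever enters a training set.
    """
    splits = []
    for end in range(initial_window, n_obs):
        splits.append((list(range(end)), [end]))
    return splits
```

For example, `time_slices(36, 21)` produces 15 (train, test) index pairs, the first training on observations 0–20 and testing on observation 21.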

In

Table 11, we report the MSE and MAE values and their standard errors to see the stability of the algorithm. Based on this table, as expected, the RE had the smallest measurement values since the insignificant variables were selected as close to correct as possible. We saw that the performance of the PSE after the RE was best by following the SE and the PTE. Moreover, the performance of the OLS was the worst due to the problem of multicollinearity.

In order to test whether two competing models had the same forecasting accuracy, we used the two-sided Diebold–Mariano (DM) test (Diebold and Mariano 1995), with the forecasting horizon set to one year and with both squared-error and absolute-error loss functions. A significant p-value in this test rejects the null hypothesis that the two models have the same forecasting accuracy. The results based on the absolute-error loss in Table 12 suggested that the FME had different prediction accuracy from all methods except the RE. Additionally, the forecasting accuracy of the OLS differed from that of the listed estimators. On the other hand, the results of the DM test based on the squared-error loss suggested that the observed differences between the RE and the shrinkage estimators were significant.

Finally, the estimates of the coefficients for all countries are given in Table 13.