Forecasting Energy-Related Carbon Dioxide Emissions in Thailand’s Construction Sector by Enriching the LS-ARIMAXi-ECM Model

The Thailand Development Policy focuses on the simultaneous growth of the economy, society, and environment. Long-term goals have been set to improve economic and social well-being while reducing future CO2 emissions, especially in the construction sector, which is important for national development and is a major generator of greenhouse gases. Achieving national sustainable development requires policy formulation and planning, and these in turn require a tool: a forecast of CO2 emissions from long-term energy consumption, so that policies can be formulated completely and accurately. This research aims to study and forecast energy-related carbon dioxide emissions in Thailand’s construction sector using a model that combines long- and short-term auto-regressive (AR), integrated (I), and moving average (MA) components with exogenous variables (Xi) and an error correction mechanism (LS-ARIMAXi-ECM). The model is designed to fill the gaps left by older models. It is constructed from factors that are causal and influential for changes in CO2 emissions, and both the independent and dependent variables must be stationary at the same level. In addition, the LS-ARIMAXi-ECM model uses co-integration analysis and an error correction mechanism (ECM) in its modeling. The findings reveal that the LS-ARIMAXi (2, 1, 1, Xt−1)-ECM model has an appropriate time period (t − i), as justified by the Q-test statistic, and is not a spurious model. It is therefore used to forecast CO2 emissions for the next 20 years (2019 to 2038). The results show that CO2 emissions in the construction sector will increase by 37.88% (61.09 Mt CO2 Eq.) in 2038.
The performance of the LS-ARIMAXi (2, 1, 1, Xt−1)-ECM model was also evaluated: it produces a mean absolute percentage error (MAPE) of 1.01% and a root mean square error (RMSE) of 0.93%, lower than those of the older models. Overall, the results indicate that setting future national sustainable development policies requires an appropriate forecasting model, built upon causal and contextual factors for the relevant sectors, to serve as an important tool for sustainable planning.


Introduction
Over the past few years and up to the present, Thailand has continuously made a firm effort to enhance its economic development. As a result, the national economy has continued to grow, and the gross domestic product (GDP) has grown at the same time [1]. In fact, Thailand has been …

1. We test the causal variables that influence changes in CO2 emissions for stationarity using the augmented Dickey-Fuller test [5], and select the variables that are stationary at the same level under the Sustainable Development Framework, using data from 1990 to 2017.
2. We use those stationary causal variables at the same level to analyze long-term relationships with the co-integration approach of Johansen and Juselius [6].
3. We use the co-integrated variables at the same level to construct the long- and short-term auto-regressive (AR), integrated (I), moving average (MA) with exogenous variables (Xi) and error correction mechanism (LS-ARIMAXi-ECM) model, which comprises endogenous and exogenous variables.
4. We examine the time period (t − i) for the appropriateness of the LS-ARIMAXi (p, d, q, Xt−i)-ECM model with Q-testing, and check for spurious-regression issues, namely heteroscedasticity, multicollinearity, and autocorrelation.
5. We compare the efficiency of the LS-ARIMAXi (p, d, q, Xt−i)-ECM model with other existing models, including multiple regression, the grey model (GM (1,1)), the grey model-autoregressive integrated moving average (GM-ARIMA) model, the artificial neural network (ANN) model, the autoregressive moving average (ARMA) model, and the autoregressive integrated moving average (ARIMA) model, using MAPE and RMSE as performance measures.
6. We forecast CO2 emissions with the LS-ARIMAXi (p, d, q, Xt−i)-ECM model for 2019 to 2038, a 20-year horizon.

The flowchart of the LS-ARIMAXi (p, d, q, Xt−i)-ECM model is shown in Figure 1.

The remainder of this paper is organized as follows: Section 2 reviews the literature. Section 3 discusses the materials and methods. Section 4 presents the results. Section 5 provides the discussion. Section 6 concludes.

Literature Review
Developing an energy-forecasting model is a key step in supporting a country's national policy. An efficient and effective model allows policy makers to make better decisions, and many studies have highlighted the importance of forecasting energy consumption and related quantities. Ardakani and Ardehali [7] developed optimized regression and ANN models for long-term forecasting of electrical energy consumption (EEC) from 2010 to 2030 in both developing and developed economies, based on different optimized models and historical data types. They found that using historical data on socio-economic indicators produces more accurate EEC forecasts. Azadeh, Ghaderi, Sheikhalishahi and Nokhandan [8] applied two different seasonal ANNs to predict short-term load in Iran's electricity market. Their predictions showed a significant correlation between the actual data and the ANN outcomes, and the ANN models outperformed the regression models in terms of MAPE in most cases. Zhao, Zhao and Guo [9] estimated electricity consumption in Inner Mongolia using a grey model enhanced with a moth-flame optimization (MFO) algorithm and a rolling mechanism (Rolling-MFO-GM (1,1)), and showed that this hybrid model can greatly improve forecasting performance for annual electricity consumption. Meng, Niu and Sun [10] estimated monthly electric energy consumption in China using feature extraction and found that this method performed better than traditional approaches in terms of expected risk and forecasting precision. Hasanov, Hunt and Mikayilov [11] built a model to forecast Azerbaijan's electricity demand in 2025 using co-integration and error correction approaches.
In their study, Azerbaijan's electricity demand in 2025 was forecast to be between 19.50 and 21 TWh. Khairalla, Ning, AL-Jallad and El-Faroug [12] investigated a stacking multi-learning ensemble (SMLE) model for short-term energy consumption forecasting; their results showed that it performed better and more accurately than the other methods they considered.
Other studies use a variety of methods in different contexts. Chang, Sun and Gu [13] presented a combination model based on a novel quantum harmony search (QHS) algorithm and the discounted mean square forecast error (DMSFE) to forecast energy-related CO2 emissions. Their findings confirmed the validity of the approach and showed that forecasting precision can be improved to a certain degree. Zeng, Xu, Wang, Chen and Li [14] examined and forecast the allocative efficiency of China's carbon emission allowance financial assets at the provincial level for 2020 using a zero sum gains data envelopment analysis (ZSG-DEA) model. Based on this model, they obtained an efficient allocation scheme for all provinces and identified which provinces have to cut their CO2 emissions. Liang, Niu, Wang and Chen [15] evaluated the security early warning of energy consumption carbon emissions (ECCE) in Hebei Province, China. They constructed an assessment index system based on the pressure-state-response (P-S-R) model and used the variance method and the linear weighting method to compute an ECCE early warning index. Their findings showed a potential improving trend in the security index during 2015 to 2020, while the security degree and the corresponding alarm were found to be negative. Prakash, Xu, Rajagopal and Noh [16] presented a forecasting technique based on Gaussian process regression (GPR) to estimate energy load, and their method was more precise than the other forecasting models they compared. Mehedintu, Sterpu and Soava [17] estimated and predicted the share of renewable energy consumption in final energy use within the European Union by 2020.
Their analysis used three macroeconomic indicators and five regression models (including polynomial and ARIMA models), and the results showed a growing trend in this share. Liang, Niu, Cao and Hong [18] built a model to forecast China's electricity demand in the context of carbon emissions. They integrated the grey relation degree (GRD) and the induced ordered weighted harmonic averaging operator (IOWHA) to construct an optimal hybrid forecasting model based on multiple regression and an extreme learning machine. They concluded that the proposed model performs better than other forecasting models, especially in terms of overall stability. The study also revealed that low-carbon economic development will increase the demand for electricity and will affect the adjustment of the electricity demand structure. Zhai and Wang [19] predicted carbon emissions demand in India from 2009 to 2050 under a balanced economic growth path, using an economy-carbon dynamic model. They projected cumulative energy demand and carbon emissions demand of 44.65 Gtoe and 36.16 Gt C, respectively. These two demands were projected to peak in 2045 at 1290.74 Mtoe and 1045.98 Mt C, respectively, with maximum values of 0.81 toe and 0.65 t C, respectively. In the case of China, Zeng and Chen [20] developed a low-carbon economy index evaluation system based on the entropy weight method to forecast the allocation ratio of carbon emissions in China for 2020 and 2030. They projected reasonable allocation ratios for carbon emission allowances over the forecast period, which could help China in many respects, including economic development, energy conservation, and emissions reduction.
Zhou, Yu, Guang and Li [21] analyzed and predicted CO2 emissions in China for 2000 to 2014 using the logarithmic mean Divisia index (LMDI) and a genetic algorithm-support vector machine (GA-SVM) model. They found that the proposed model forecasts CO2 emissions better than a back propagation neural network (BPNN) model and a single SVM model. Xu, Gua, Liu and Dai [22] forecast the final energy consumption of Guangdong Province, China, from 2013 to 2016 using a newly established GM-ARMA model based on the Hodrick-Prescott (HP) filter. The study shows that this model has excellent precision and high reliability. It also indicates that the region will face serious challenges in energy conservation and emission reduction in the next few years.
Other studies have developed different forecasting approaches. Zhao, Zhao, Liu, Su and An [23] forecast wind speed using the self-adaptive auto-regressive integrated moving average chaotic particle swarm optimization (SA-ARIMA-CPSO) approach, which couples a self-adaptive auto-regressive integrated moving average model with exogenous variables (ARIMAX) with optimization by the CPSO algorithm. In their experiments, the developed model outperformed the other models. Souza, Christo and Almeida [24] proposed a method using the ARIMA model to locate faults in power transmission lines by analyzing voltage oscillographic signals; their results compared satisfactorily with other techniques in the literature. Farias, Puig, Rangel and Flores [25] forecast demand in water distribution networks using a multi-model predictor, the qualitative multi-model predictor plus (QMMP+), and found that it improves forecasting precision. Chen, Xu and Zhou [26] proposed a hybrid approach that combines the variational mode decomposition (VMD) denoising technique with the autoregressive integrated moving average (ARIMA) and GM (1,1) models to predict the remaining useful life (RUL) of a battery. Their experimental results indicate that the proposed method is accurate for on-line RUL prediction of lithium-ion batteries. Yang, Park, Choi, Kim, Munkhdalai, Musa and Ryu [27] conducted a comparative study of state-of-the-art techniques, comparing four temporal outbreak detection algorithms: the cumulative sum (CUSUM), the early aberration reporting system (EARS), ARIMA, and the Holt-Winters algorithm. The results indicate that the EARS C3 method performs better than the other algorithms studied.
However, the Holt-Winters algorithm outperforms the others when the baseline frequency and the dispersion parameter are less than 1.5 and 2, respectively. Kahsai, Nondo, Schaeffer and Gebremedhin [28] investigated the relationship between energy consumption and economic growth in Sub-Saharan Africa using a panel co-integration approach. Their results explain the interdependence of energy consumption and economic growth in the region and carry important implications for formulating sustainable development policies that achieve an efficient allocation of resources.
Xin, Zhou, Yang, Li and Wang [29] proposed a new method that integrates the Kalman filter, ARIMA, and generalized autoregressive conditional heteroskedasticity (GARCH) to predict bridge structure deformation. The study offered a new way of predicting structural behavior based on data processing, laying a basis for a bridge health monitoring system that uses sensor data. Li, Yang and Li [30] … the next 10 years (2017-2026). The prediction results show an average annual growth rate of 5.26% over the forecast period, and the average annual newly added installed capacity for 2017-2026 is found to be 74 gigawatts. Kurecic and Kokotovic [31] examined the relevance of political stability to foreign direct investment (FDI) in three panels (small, developed, and instability-threatened economies) using a Granger causality test, a vector autoregressive (VAR) framework, and an ARDL model. They concluded that there is a long-term relationship between political stability and FDI in the panel of small economies, but not in the panels of larger and more developed economies. Li and Su [32] adopted a VAR model to study the dynamic effect of renewable energy consumption on carbon dioxide emissions in the US from 1990 to 2015. They found that the use of renewable energy would greatly help to reduce carbon emissions, while natural gas consumption would have a negative impact on CO2 emissions in the early stages; these findings could guide policy makers in developing energy-saving and emission-reduction policies. Dai, Niu and Han [33] adapted the MSFLA-LSSVM model to predict CO2 emissions in China from 2018 to 2025 and concluded that China's CO2 emissions would show a slow growth trend over the next few years.
This suggests that China's CO2 emissions could be effectively controlled in the future, which could begin to reduce the greenhouse effect.
Finally, Jiang, Yang and Li [34] carried out a comparative study forecasting energy demand in India using various methods, namely MGM, ARIMA, MGM-ARIMA, and the back propagation neural network (BP). Based on their predictions, India's energy demand will potentially increase by 4.75% from 2017 to 2030.
A review of previous research shows that existing works have presented different approaches, research methodologies, and analytical results. This research, however, has features that existing studies have not combined before, in its modeling, validation, spurious-regression testing, and the efficiency and effectiveness of the model for decision-making. In addition, a key feature of this study is that the model can potentially be applied to other sectors according to their particular contexts.

Co-Integration Testing and Error Correction Mechanism Model Based on Johansen and Juselius
A co-integration test based on the approach of Johansen and Juselius [35] is used to model the relationship among two or more variables. Single-equation (Engle-Granger) co-integration tests rely on large-sample properties, so their results may not be reliable in practice. Moreover, a regression with the variables in one order may indicate that they are co-integrated, while the same regression in reverse order indicates that they are not. Because a co-integration test should not depend on which variable is chosen for normalization, the Johansen approach is used instead [5,6,35]. It provides two test statistics. Equation (1), the trace statistic, tests the null hypothesis H0: r ≤ k against H1: r > k:
$$\lambda_{\text{trace}}(k) = -T \sum_{i=k+1}^{n} \ln\left(1 - \hat{\lambda}_i\right) \tag{1}$$
Equation (2), the maximum eigenvalue statistic, tests H0: r = k against H1: r = k + 1:
$$\lambda_{\max}(k, k+1) = -T \ln\left(1 - \hat{\lambda}_{k+1}\right) \tag{2}$$
where $\hat{\lambda}_i$ is the estimated value of the i-th characteristic root (eigenvalue) of the estimated matrix π, and T is the number of observations. The characteristic roots are obtained from
$$\left|\lambda S_{pp} - S_{p0} S_{00}^{-1} S_{0p}\right| = 0, \qquad S_{ij} = \frac{1}{T}\sum_{t=1}^{T} R_{it} R_{jt}', \quad i, j \in \{0, p\}$$
The residuals $R_{0t}$ and $R_{pt}$ are obtained by regressing $\Delta u_t$ and $u_{t-p}$, respectively, on $\Delta u_{t-1}, \ldots, \Delta u_{t-p+1}$. Here, $x_t$ and $y_t$ denote time series that are stationary at first differences, I(1); A is a constant; and $u_t$ is I(0).
The likelihood ratio statistic tests the null hypothesis that the rank of π is less than or equal to k, that is, H0: r ≤ k. In Equation (4), α and β are (n × r) matrices, where r is the rank of π, and they satisfy
$$\pi = \alpha \beta'$$
where β is the parameter matrix of co-integrating vectors and α is the parameter matrix of speed-of-adjustment coefficients.
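The statistics in Equations (1) and (2) can be illustrated with a short NumPy sketch. It assumes the standard Johansen construction (first differences and levels lagged p periods, both regressed on the lagged differences) and an unrestricted constant in those auxiliary regressions; `johansen_stats` is a hypothetical helper, not the authors' code, and its output still has to be compared with tabulated critical values.

```python
import numpy as np


def johansen_stats(y, p):
    """Eigenvalues, trace and maximum-eigenvalue statistics (Johansen).

    y : (T, n) array of I(1) series in levels.
    p : lag order of the VAR in levels (p >= 1).
    Assumption: an unrestricted constant enters the auxiliary regressions.
    Returns lam (descending), trace[r] for H0: rank <= r,
    and max_eig[r] for H0: rank = r against rank = r + 1.
    """
    y = np.asarray(y, dtype=float)
    n_obs, n = y.shape
    dy = np.diff(y, axis=0)              # dy[t - 1] holds Δy_t
    rows = np.arange(p, n_obs)           # usable time indices t
    z0 = dy[rows - 1]                    # Δy_t
    zp = y[rows - p]                     # y_{t-p} (levels)
    # Short-run regressors: constant plus Δy_{t-1}, ..., Δy_{t-p+1}
    z1 = np.column_stack([np.ones(len(rows))] + [dy[rows - 1 - k] for k in range(1, p)])

    def partial_out(a):
        coef, *_ = np.linalg.lstsq(z1, a, rcond=None)
        return a - z1 @ coef

    r0, rp = partial_out(z0), partial_out(zp)   # residuals R_0t and R_pt
    T = len(rows)
    s00, s0p, spp = r0.T @ r0 / T, r0.T @ rp / T, rp.T @ rp / T
    # Solve |λ S_pp - S_p0 S_00^{-1} S_0p| = 0 as an ordinary eigenproblem
    m = np.linalg.solve(spp, s0p.T @ np.linalg.solve(s00, s0p))
    lam = np.sort(np.linalg.eigvals(m).real)[::-1]
    trace = np.array([-T * np.log(1.0 - lam[r:]).sum() for r in range(n)])
    max_eig = -T * np.log(1.0 - lam)
    return lam, trace, max_eig
```

Each entry of `trace` and `max_eig` is compared with the critical value for its null hypothesis; the last trace and maximum-eigenvalue statistics coincide by construction.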
In estimating the parameters of the ECM for a co-integrated series, the maximum likelihood (ML) procedure assumes that the n-dimensional error sequence $\varepsilon_t$ is NID(0, Λ).
The Johansen co-integration procedure consists of the following steps. Step 1 determines the order of integration of all variables by testing each of them and by plotting the data to see whether the data-generating process contains a linear time trend; the variables must be integrated of the same order (at the same level).
The lag length can be found by testing a VAR in undifferenced (level) data. The procedure starts from the longest lag length that is deemed reasonable and checks whether the lag length can be shortened. For instance, to test the significance of lags 2 to 5, we estimate the following VARs [36]:
$$y_t = A_0 + A_1 y_{t-1} + A_2 y_{t-2} + \cdots + A_5 y_{t-5} + u_{1t} \tag{6}$$
$$y_t = A_0 + A_1 y_{t-1} + u_{2t} \tag{7}$$
where $y_t$ is an n × 1 vector of variables, $A_0$ is an n × 1 vector of intercept terms, $A_i$ are n × n coefficient matrices, and $u_{1t}$ and $u_{2t}$ are n × 1 vectors of error terms. In practice, we estimate Equation (6) with five lags of each variable in each equation and let $\Sigma_5$ be the variance-covariance matrix of its residuals. We then estimate Equation (7) with only one lag of each variable in each equation and let $\Sigma_1$ be the variance-covariance matrix of its residuals.
For testing, we use the likelihood ratio statistic proposed by Sims [37], even though the variables under study are non-stationary:
$$LR = (T - c)\left(\ln\lvert\Sigma_1\rvert - \ln\lvert\Sigma_5\rvert\right)$$
where T is the number of observations, c is the number of parameters estimated in each equation of the unrestricted system, and $\ln\lvert\Sigma_1\rvert$ and $\ln\lvert\Sigma_5\rvert$ are the natural logarithms of the determinants of $\Sigma_1$ and $\Sigma_5$. The statistic has a χ² distribution with degrees of freedom equal to the number of restrictions. Each $A_i$ has n² coefficients, and Equation (7) imposes $A_2 = A_3 = A_4 = A_5 = 0$, so the number of restrictions is 4n². Alternatively, Enders (2010) suggests choosing the lag length p using the AIC or SBC.
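The lag-length test based on Equations (6) and (7) can be sketched as follows. This is a minimal NumPy version, assuming both VARs include an intercept and are estimated on the same sample (the first `p_long` observations are dropped from both); `var_resid_cov` and `sims_lr` are hypothetical names, and the returned statistic must be compared with a χ² critical value for the returned degrees of freedom.

```python
import numpy as np


def var_resid_cov(y, p, start):
    """Residual covariance of an OLS-estimated VAR(p) with intercept.

    Rows start, ..., T-1 are used so that nested models share one sample.
    Returns the covariance matrix and c, the parameters per equation.
    """
    rows = np.arange(start, y.shape[0])
    X = np.column_stack([np.ones(len(rows))] + [y[rows - k] for k in range(1, p + 1)])
    Y = y[rows]
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    U = Y - X @ B
    return U.T @ U / len(rows), X.shape[1]


def sims_lr(y, p_long=5, p_short=1):
    """Sims' LR statistic (T - c)(ln|Σ_short| - ln|Σ_long|) and its degrees of freedom."""
    y = np.asarray(y, dtype=float)
    n = y.shape[1]
    sigma_long, c = var_resid_cov(y, p_long, start=p_long)
    sigma_short, _ = var_resid_cov(y, p_short, start=p_long)
    T = y.shape[0] - p_long
    _, logdet_long = np.linalg.slogdet(sigma_long)
    _, logdet_short = np.linalg.slogdet(sigma_short)
    stat = (T - c) * (logdet_short - logdet_long)
    df = (p_long - p_short) * n * n      # restrictions A_{p_short+1} = ... = A_{p_long} = 0
    return stat, df
```

With the defaults (5 lags against 1 lag), the degrees of freedom are 4n², matching the count of restrictions given above.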
Step 2 estimates the model and the rank of π. Ordinary least squares (OLS) is not appropriate here, because cross-equation restrictions must be imposed on matrix π. The model can be estimated in three forms: (a) with all elements of $A_0$ equal to zero, (b) with a drift, or (c) with a constant term in the co-integrating vector, as shown below:
$$\Delta x_t = \pi^{*} x^{*}_{t-1} + \pi_1 \Delta x_{t-1} + \cdots + \pi_{p-1} \Delta x_{t-p+1} + \varepsilon_t \tag{9}$$
where $x^{*}_{t-1} = (x_{1,t-1}, \ldots, x_{n,t-1}, 1)'$ and $\pi^{*}$ is π augmented with a column of intercepts, so that the drift term $A_0$ is restricted to appear only as an intercept in the co-integrating vector. The residuals of the model must then be analyzed; if the errors are not white noise, the lag length is too short. Two conditions apply to the residuals: the residuals of the long-run equilibrium relationship must be stationary, and the short-term deviations ($\varepsilon_t$ in Equation (9)) must be white noise. Next, the characteristic roots of matrix π are estimated, and $\lambda_{\max}$ and $\lambda_{\text{trace}}$ are computed. To test the hypothesis that the variables are not co-integrated (rank π = 0), two test statistics are available, depending on the alternative hypothesis. To test H0: r = 0 against H1: r > 0 (at least one co-integrating vector), we use $\lambda_{\text{trace}}(0)$. For example, with n = 3 variables and characteristic roots $\lambda_1, \lambda_2, \lambda_3$ of π:
$$\lambda_{\text{trace}}(0) = -T\left[\ln(1 - \hat{\lambda}_1) + \ln(1 - \hat{\lambda}_2) + \ln(1 - \hat{\lambda}_3)\right]$$
where $\hat{\lambda}_i$ are the estimated characteristic roots (eigenvalues) of the estimated matrix π, ordered $\hat{\lambda}_1 > \hat{\lambda}_2 > \hat{\lambda}_3 > \cdots > \hat{\lambda}_n$, and T is the number of usable observations. The statistic is compared with the critical value of $\lambda_{\text{trace}}$.
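The sequential use of λ_trace in Step 2 can be sketched as below: starting from r = 0, the null is rejected while the statistic exceeds its critical value, and the first non-rejected r is taken as the co-integrating rank. The function names are hypothetical, and the eigenvalues, T, and critical values in the example are placeholders; real critical values depend on the deterministic terms and must come from published tables (for example, those of Osterwald-Lenum or of MacKinnon, Haug and Michelis).

```python
import numpy as np


def trace_from_eigs(eigs, T):
    """λ_trace(r) = -T Σ_{i=r+1}^{n} ln(1 - λ_i) for r = 0, ..., n-1."""
    lam = np.sort(np.asarray(eigs, dtype=float))[::-1]   # λ_1 > λ_2 > ... > λ_n
    return [-T * np.log(1.0 - lam[r:]).sum() for r in range(len(lam))]


def select_rank(trace_stats, critical_values):
    """Return the first r whose null H0: rank <= r is not rejected."""
    for r, (stat, cv) in enumerate(zip(trace_stats, critical_values)):
        if stat < cv:
            return r
    return len(trace_stats)                 # every null rejected: full rank


# Hypothetical n = 3 example: eigenvalues, T and 5% critical values are placeholders
stats = trace_from_eigs([0.60, 0.20, 0.05], T=50)
rank = select_rank(stats, [29.8, 15.5, 3.8])
```

Here λ_trace(0) ≈ 59.5 exceeds its placeholder critical value, while λ_trace(1) ≈ 13.7 does not, so the sketch returns r = 1.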
Step 3 analyzes the coefficients of the normalized co-integrating vectors and the speed-of-adjustment coefficients, as follows:
1. To test whether β0 = 0, we impose one restriction on the co-integrating vector and use the likelihood ratio test, which is distributed as χ² with one degree of freedom. If H0: β0 = 0 cannot be rejected, the model may be re-estimated without a constant in the co-integrating vector;
2. To restrict the normalized co-integrating vector to β2 = −1 and β3 = 1, we impose two restrictions, and the likelihood ratio test is distributed as χ² with two degrees of freedom;
3. To test whether β = (0, −1, −1, 1), we impose three restrictions, β0 = 0, β2 = −1, and β3 = 1 (β1 is normalized to −1). The likelihood ratio test is then distributed as χ² with three degrees of freedom. This is known as a joint restriction test.
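For the restriction tests in Step 3, a common form of the likelihood ratio statistic (as given, for example, in Enders' textbook treatment) compares the r largest eigenvalues of the unrestricted and restricted models: T Σ_{i=1}^{r} [ln(1 − λ*_i) − ln(1 − λ̂_i)], which is χ² with as many degrees of freedom as restrictions. The sketch below assumes the restricted eigenvalues λ*_i are already available; `restriction_lr` is a hypothetical name and the numbers in the example are invented for illustration.

```python
import numpy as np


def restriction_lr(eig_unrestricted, eig_restricted, T, r):
    """LR = T Σ_{i=1}^{r} [ln(1 - λ*_i) - ln(1 - λ_i)], chi-square under H0."""
    lam = np.sort(np.asarray(eig_unrestricted, dtype=float))[::-1][:r]
    lam_star = np.sort(np.asarray(eig_restricted, dtype=float))[::-1][:r]
    return T * np.sum(np.log(1.0 - lam_star) - np.log(1.0 - lam))


# Hypothetical case 1 (test β0 = 0, one restriction) with r = 1 and T = 50
lr = restriction_lr([0.60, 0.20], [0.58, 0.10], T=50, r=1)
reject = lr > 3.841          # 5% critical value of χ² with 1 degree of freedom
```

In this invented case LR ≈ 2.44 < 3.841, so the restriction β0 = 0 would not be rejected at the 5% level.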
Step 4, called "innovation accounting" (impulse response analysis and variance decomposition), is a useful tool for evaluating the relationships among variables. If the correlation among the innovations is very low, the identification problem (the choice of variable ordering) no longer matters: ordering the variables differently yields similar impulse responses and variance decompositions. Applying innovation accounting and causality analysis to the error-correction model helps to identify the structural model and to judge whether the estimated model is reasonable.
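To make the ordering issue in Step 4 concrete, the sketch below computes orthogonalized impulse responses Θ_s = A^s P for a VAR(1), where P is the Cholesky factor of the innovation covariance Σ. It is an illustration under assumed inputs (a VAR(1) with hypothetical A and Σ), not the paper's estimated system, and `orth_irf` and `responses_both_orders` are hypothetical names. With uncorrelated innovations (diagonal Σ), reversing the variable order only relabels the responses; with correlated innovations, the responses themselves change.

```python
import numpy as np


def orth_irf(A, sigma, horizons):
    """Orthogonalized IRFs of a VAR(1): out[s, i, j] is the response of variable i
    at horizon s to a one-standard-deviation orthogonal shock in variable j."""
    P = np.linalg.cholesky(sigma)        # lower-triangular, so ordering matters
    out, power = [], np.eye(A.shape[0])
    for _ in range(horizons + 1):
        out.append(power @ P)
        power = A @ power
    return np.array(out)


A = np.array([[0.5, 0.2], [0.1, 0.4]])     # hypothetical VAR(1) coefficients
swap = np.array([[0.0, 1.0], [1.0, 0.0]])  # permutation that reverses the order


def responses_both_orders(sigma, horizons=8):
    irf = orth_irf(A, sigma, horizons)
    irf_rev = orth_irf(swap @ A @ swap.T, swap @ sigma @ swap.T, horizons)
    # Map the reversed-order responses back to the original variable labels
    return irf, swap.T @ irf_rev @ swap
```

Calling `responses_both_orders(np.diag([1.0, 0.5]))` returns two identical arrays, whereas a correlated Σ such as `[[1.0, 0.8], [0.8, 1.0]]` gives different responses for the two orderings.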

Long-and Short-Term Auto-Regressive Integrated Moving Average with Exogenous Variables and the Error Correction Mechanism (LS-ARIMAXi-ECM) Model
The LS-ARIMAXi-ECM model is a newly developed model built upon the ARIMA model, subject to the following conditions: (1) the factors used in modeling are both endogenous and exogenous variables, and they must all be stationary at the same level; (2) once the first condition is fulfilled, the factors undergo a co-integration test to investigate the long-term relationship among all factors at the same level; (3) the LS-ARIMAXi-ECM forecasting model is then built from autoregressive (AR), integrated (I), moving average (MA), error correction (ECM t−i), and exogenous variable (Xi) components, as explained in the following subsections.

Autoregressive Moving Average (ARMA) Model
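The ARMA(p, q) model is written as [38,39]:
$$X_t = \alpha_0 + \alpha_1 X_{t-1} + \cdots + \alpha_p X_{t-p} + \varepsilon_t - \beta_1 \varepsilon_{t-1} - \cdots - \beta_q \varepsilon_{t-q}, \qquad t = 1, 2, \ldots, T$$
or, in lag-operator form, $\alpha(L) X_t = \alpha_0 + \beta(L)\varepsilon_t$, where $\alpha(L) = 1 - \alpha_1 L - \alpha_2 L^2 - \cdots - \alpha_p L^p$ and $\beta(L) = 1 - \beta_1 L - \beta_2 L^2 - \cdots - \beta_q L^q$. The information available at time T is $I_T = \{X_1, \ldots, X_T, \varepsilon_1, \ldots, \varepsilon_T\}$. Evaluating the model at times T + 1 and T + 2 and replacing future shocks by their expected value of zero gives the one- and two-step-ahead forecasts:
$$\hat{X}_{T+1|T} = \alpha_0 + \alpha_1 X_T + \cdots + \alpha_p X_{T-p+1} - \beta_1 \varepsilon_T - \cdots - \beta_q \varepsilon_{T-q+1}$$
$$\hat{X}_{T+2|T} = \alpha_0 + \alpha_1 \hat{X}_{T+1|T} + \alpha_2 X_T + \cdots + \alpha_p X_{T-p+2} - \beta_2 \varepsilon_T - \cdots - \beta_q \varepsilon_{T-q+2}$$
In general, the j-step-ahead forecast is
$$\hat{X}_{T+j|T} = \alpha_0 + \sum_{i=1}^{p} \alpha_i \hat{X}_{T+j-i|T} - \sum_{i=j}^{q} \beta_i \varepsilon_{T+j-i}$$
where $\hat{X}_{T+j-i|T} = X_{T+j-i}$ whenever $j \le i$, and the last sum is zero when j > q. For ARMA(1, 1), for example,
$$\hat{X}_{T+j|T} = \alpha_0\left(1 + \alpha_1 + \cdots + \alpha_1^{j-1}\right) + \alpha_1^{j} X_T - \alpha_1^{j-1}\beta_1 \varepsilon_T$$
As j → ∞, the forecast approaches $\alpha_0/(1 - \alpha_1 - \cdots - \alpha_p) = E(X_t)$, the mean of the time series $X_t$ in the ARMA(p, q) model. The j-step-ahead forecast error and its variance are most easily obtained by rewriting the ARMA(p, q) model in MA(∞) form, as explained below.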
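The forecast recursion for the ARMA(p, q) model can be checked numerically. The function below is a minimal sketch that assumes known coefficients and residuals (no estimation step) and the sign convention β(L) = 1 − β_1 L − ... − β_q L^q used in this section; `arma_forecast` is a hypothetical helper.

```python
def arma_forecast(x, eps, alpha0, alpha, beta, steps):
    """j-step-ahead forecasts for X_t = α0 + Σ α_i X_{t-i} + ε_t - Σ β_i ε_{t-i}.

    x, eps : observed values and residuals up to time T (most recent last);
             they must contain at least p and q entries, respectively.
    Future shocks are replaced by their expected value, zero.
    """
    hist_x, hist_e = list(x), list(eps)
    forecasts = []
    for _ in range(steps):
        ar = sum(a * hist_x[-i] for i, a in enumerate(alpha, start=1))
        ma = sum(b * hist_e[-i] for i, b in enumerate(beta, start=1))
        f = alpha0 + ar - ma
        forecasts.append(f)
        hist_x.append(f)      # forecast stands in for the unknown future value
        hist_e.append(0.0)    # E[ε_{T+j}] = 0
    return forecasts


# ARMA(1, 1) with α0 = 1, α1 = 0.5, β1 = 0.3, X_T = 4, ε_T = 1
path = arma_forecast([4.0], [1.0], 1.0, [0.5], [0.3], steps=60)
```

The first two forecasts are 2.7 and 2.35; the second matches α0(1 + α1) + α1²X_T − α1β1ε_T, and the path converges to α0/(1 − α1) = 2.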
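Since the time series $X_t$ is stationary, it can be rewritten as
$$X_t = \mu + \frac{\beta(L)}{\alpha(L)} \varepsilon_t, \qquad \mu = \frac{\alpha_0}{\alpha(1)} = \frac{\alpha_0}{1 - \alpha_1 - \cdots - \alpha_p}$$
where μ is the mean. Expanding $\beta(L)/\alpha(L)$ as a power series in L expresses $X_t$ in terms of current and past shocks, so the ARMA(p, q) model can be written in MA(∞) form as
$$X_t = \mu + \varepsilon_t + \varphi_1 \varepsilon_{t-1} + \varphi_2 \varepsilon_{t-2} + \cdots$$
The coefficients $\varphi_i$ (i = 1, 2, ...) are called the impulse response function of the ARMA model. When $X_t$ is stationary, $\varphi_1, \varphi_2, \varphi_3, \ldots$ decrease rapidly (exponentially). This MA(∞) form can be used to compute the j-step-ahead forecast error and its variance:
$$X_{T+j} - \hat{X}_{T+j|T} = \varepsilon_{T+j} + \varphi_1 \varepsilon_{T+j-1} + \cdots + \varphi_{j-1}\varepsilon_{T+1}, \qquad \operatorname{Var} = \sigma^2\left(1 + \varphi_1^2 + \cdots + \varphi_{j-1}^2\right)$$
where σ² is the variance of $\varepsilon_t$.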
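The impulse response weights φ_i and the forecast error variance can be computed recursively from α(L)φ(L) = β(L), which gives φ_0 = 1 and φ_j = Σ_{i=1}^{min(j,p)} α_i φ_{j−i} − β_j (with β_j = 0 for j > q). The sketch below assumes this section's sign convention for β(L); `psi_weights` and `forecast_error_variance` are hypothetical names.

```python
def psi_weights(alpha, beta, n):
    """First n MA(∞) weights φ_0, ..., φ_{n-1} of an ARMA(p, q) model,
    using α(L) = 1 - Σ α_i L^i and β(L) = 1 - Σ β_i L^i."""
    psi = [1.0]
    for j in range(1, n):
        value = -beta[j - 1] if j <= len(beta) else 0.0
        value += sum(alpha[i - 1] * psi[j - i] for i in range(1, min(j, len(alpha)) + 1))
        psi.append(value)
    return psi


def forecast_error_variance(alpha, beta, sigma2, j):
    """Var(X_{T+j} - X̂_{T+j|T}) = σ² (1 + φ_1² + ... + φ_{j-1}²)."""
    return sigma2 * sum(w * w for w in psi_weights(alpha, beta, j))


weights = psi_weights([0.5], [0.3], 4)                  # ARMA(1, 1): ≈ 1, 0.2, 0.1, 0.05
var2 = forecast_error_variance([0.5], [0.3], 1.0, 2)    # ≈ 1 + 0.2² = 1.04
```

The weights halve at each step after φ_1, illustrating the exponential decay of a stationary ARMA model's impulse responses.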

Autoregressive Integrated Moving Average (ARIMA(p, d, q)) Model
Non-stationary variables must be converted into stationary variables by differencing before they are used in modeling. The resulting model is called ARIMA(p, d, q), where d is the number of differences [38,39]:
$$\alpha(L)(1 - L)^d X_t = \alpha_0 + \beta(L)\varepsilon_t, \qquad t = 1, 2, \ldots, T$$
When the values $X_1, X_2, \ldots, X_T$ (denoted $I_T$) are known, the model can be used for forecasting. For example, the ARIMA(1, 1, 0) model is
$$\Delta X_t = X_t - X_{t-1} = \alpha_0 + \alpha_1 \Delta X_{t-1} + \varepsilon_t, \qquad t = 1, 2, \ldots, T$$
For an ARIMA(p, 1, q) model, the ARMA(p, q) forecasts of the differenced series are accumulated onto the last observation:
$$\hat{X}_{T+j|T} = X_T + \sum_{k=1}^{j} \widehat{\Delta X}_{T+k|T}$$
Once the ARIMA model is obtained, it is used to establish the LS-ARIMAXi-ECM model, as described in the following subsection.
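For ARIMA(p, 1, q), the level forecasts come from forecasting ΔX_t with the ARMA recursion and accumulating the forecast changes onto X_T. A minimal sketch, assuming known coefficients and residuals of the differenced model; `arima_p1q_forecast` is a hypothetical name.

```python
def arima_p1q_forecast(x, eps, alpha0, alpha, beta, steps):
    """Level forecasts from an ARMA(p, q) fitted to ΔX_t = X_t - X_{t-1}.

    x   : observed levels X_1, ..., X_T (most recent last)
    eps : residuals of the differenced model up to time T
    """
    diffs = [b - a for a, b in zip(x[:-1], x[1:])]    # ΔX_2, ..., ΔX_T
    hist_e = list(eps)
    level, forecasts = x[-1], []
    for _ in range(steps):
        ar = sum(a * diffs[-i] for i, a in enumerate(alpha, start=1))
        ma = sum(b * hist_e[-i] for i, b in enumerate(beta, start=1))
        d_hat = alpha0 + ar - ma       # forecast of the next difference
        diffs.append(d_hat)
        hist_e.append(0.0)             # future shocks have expectation zero
        level += d_hat                 # X̂_{T+j} = X̂_{T+j-1} + ΔX̂_{T+j}
        forecasts.append(level)
    return forecasts


# ARIMA(1, 1, 0) with α0 = 0, α1 = 0.5 and last two levels 10, 12 (ΔX_T = 2)
levels = arima_p1q_forecast([10.0, 12.0], [], 0.0, [0.5], [], steps=3)
```

The forecast changes shrink geometrically (1, 0.5, 0.25), so the levels are 13.0, 13.5, and 13.75 and flatten out rather than reverting to a fixed mean.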

LS-ARIMAXi-ECM Model
In the LS-ARIMAXi-ECM model, Y t−i denotes the exogenous variables, which are stationary at the same level, and ECM t−i denotes the error correction term. The appropriateness of the model's time period must be tested with Q-test statistics, and the model must also be checked for heteroskedasticity, multicollinearity, and autocorrelation to ensure that it is not spurious. Once the best model is derived, its performance is measured with MAPE and RMSE, and these values are compared with those of the other models studied to assess the model's effectiveness for future use.
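The Q-test used to check the model's time period can be sketched with the Ljung-Box form, Q = T(T + 2) Σ_{k=1}^{h} ρ̂_k²/(T − k), computed on the model residuals. Using the Ljung-Box variant is an assumption here; under the null of no residual autocorrelation, Q is approximately χ² with h minus the number of estimated ARMA parameters degrees of freedom. `ljung_box_q` is a hypothetical name.

```python
import numpy as np


def ljung_box_q(resid, h):
    """Ljung-Box Q = T(T + 2) Σ_{k=1}^{h} ρ_k² / (T - k) for a residual series."""
    e = np.asarray(resid, dtype=float)
    e = e - e.mean()
    T = len(e)
    denom = e @ e
    q = 0.0
    for k in range(1, h + 1):
        rho_k = (e[k:] @ e[:-k]) / denom     # lag-k sample autocorrelation
        q += rho_k ** 2 / (T - k)
    return T * (T + 2) * q


# Strongly autocorrelated residuals: 1, -1, 1, -1, ... (ρ_1 = -0.9 when T = 10)
q1 = ljung_box_q([1, -1] * 5, h=1)
```

Treating this toy series as raw data (no fitted parameters), Q ≈ 10.8 exceeds the 5% χ² critical value with 1 degree of freedom (3.841), so these residuals would fail the test; residuals of an adequate model should give insignificant Q values.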

Measurement of the Forecasting Performance
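Among the many available measures, we use MAPE and RMSE to compare the forecasting accuracy of each model. They are calculated as follows [38,39]:
$$\text{MAPE} = \frac{100\%}{n}\sum_{t=1}^{n}\left|\frac{y_t - \hat{y}_t}{y_t}\right|, \qquad \text{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^2}$$
where $y_t$ is the actual value, $\hat{y}_t$ is the forecast value, and n is the number of observations.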
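The two accuracy measures can be computed directly from actual and forecast values. The sketch below implements the standard definitions with hypothetical function names; RMSE computed this way is in the units of the series, so any normalization needed to express it as a percentage is an assumption not shown here.

```python
import numpy as np


def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    a, f = np.asarray(actual, float), np.asarray(forecast, float)
    return float(100.0 * np.mean(np.abs((a - f) / a)))


def rmse(actual, forecast):
    """Root mean square error, in the units of the series."""
    a, f = np.asarray(actual, float), np.asarray(forecast, float)
    return float(np.sqrt(np.mean((a - f) ** 2)))


actual, forecast = [100.0, 200.0], [110.0, 190.0]   # hypothetical values
scores = (mape(actual, forecast), rmse(actual, forecast))
```

Both forecasts miss by 10 units, so RMSE is 10, while MAPE averages the relative errors of 10% and 5% to 7.5%; MAPE weights misses on small actual values more heavily.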

Screening of Influencing Factors for Model Input
In this paper, we test the causal factors for stationarity under Thailand's Sustainable Development policy. The time series data cover 1990 to 2017 and include 8 factors: carbon dioxide emissions (CO2), per capita GDP (GDP), population growth (Population), urbanization rate (URT), industrial structure (IST), total coal consumption (CCT), oil price (OP), and total exports and imports (X − E).
The test was conducted with the augmented Dickey-Fuller test at level, I(0), and at first difference, I(1), as illustrated in Table 1. Table 1 shows that, in the unit root test, all factors are non-stationary at level I(0), that is, insignificant at the 1%, 5%, and 10% levels. A first-difference analysis is therefore required, after which all factors are stationary at I(1), that is, significant at the 1%, 5%, and 10% levels. Next, the factors are tested for co-integration using the approach of Johansen and Juselius.
Table 2. Co-integration testing using the approach of Johansen and Juselius.
The co-integration test results are shown in Table 2. The trace test statistics are 275.41 and 82.45, and the maximum eigenvalue test statistics are 141.25 and 96.05; these are higher than the MacKinnon critical values at the same significance levels. This indicates a long-term relationship among all variables and supports their use in structuring the LS-ARIMAXi-ECM model.
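The unit root step can be illustrated with a from-scratch ADF regression. The sketch estimates Δy_t = a + γ y_{t−1} + Σ c_i Δy_{t−i} + e_t by OLS and returns the t-statistic on γ; the lag count, the optional trend, and the name `adf_stat` are assumptions for illustration. The statistic must be compared with Dickey-Fuller (MacKinnon) critical values rather than standard normal ones, and a value below the critical value rejects the unit root.

```python
import numpy as np


def adf_stat(y, lags=1, trend=False):
    """t-statistic on γ in Δy_t = a (+ b·t) + γ y_{t-1} + Σ_{i=1}^{lags} c_i Δy_{t-i} + e_t."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                         # dy[t - 1] holds Δy_t
    rows = np.arange(lags + 1, len(y))      # t with all lagged differences available
    cols = [np.ones(len(rows)), y[rows - 1]]
    if trend:
        cols.append(rows.astype(float))     # deterministic time trend
    cols += [dy[rows - 1 - i] for i in range(1, lags + 1)]
    X, Y = np.column_stack(cols), dy[rows - 1]
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ coef
    s2 = resid @ resid / (len(Y) - X.shape[1])
    se_gamma = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    return coef[1] / se_gamma               # γ is the coefficient on y_{t-1}
```

Running it on each series in levels and again on its first differences mirrors the two-stage check summarized in Table 1, although exact values depend on the lag and deterministic-term choices.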

Formation of Analysis Modeling with the LS-ARIMAXi (p, d, q, Xt−i)-ECM Model
The LS-ARIMAXi (p, d, q, Xt−i)-ECM model is built with the aim of being applicable in different contexts and sectors. We therefore test for the appropriate time period (t − i) of p, d, q, and Xt−i using the Q-test. The best fit is the LS-ARIMAXi (2, 1, 1, Xt−1)-ECM model, as shown in Figure 2. Figure 2 shows that the LS-ARIMAXi (2, 1, 1, Xt−i)-ECM model is the best forecasting model, because all Q-test statistics at time (t − i) meet the criteria; that is, they are insignificant at α = 0.01, α = 0.05, and α = 0.1. This model can therefore be used to forecast CO2 emissions. The model also reveals the influence (elasticity) of changes in each independent variable on CO2 emissions at time (t − i), as illustrated in Table 3.
The findings show that when per capita GDP (∆ln(GDP)t−2) changes by about 1% at time (t − 2), CO2 emissions (∆ln(CO2)t) change in the same direction by 6.78%, at a confidence level of 99%. When population growth (∆ln(Population)t−1) changes by about 1%, CO2 emissions change in the same direction by 2.33%, at a confidence level of 95%. When the urbanization rate (∆ln(URT)t−1) changes by about 1%, CO2 emissions change in the same direction by 5.45%, at a confidence level of 99%. When the industrial structure (∆ln(IST)t−1) changes by about 1%, CO2 emissions change in the same direction by 4.62%, at a confidence level of 99%. When total coal consumption (∆ln(CCT)t−2) changes by about 1%, CO2 emissions change in the same direction by 3.15%, at a confidence level of 99%. When total exports and imports (∆ln(X − E)t−2) change by about 1%, CO2 emissions change in the same direction by 6.40%, at a confidence level of 99%. Likewise, when the oil price (∆ln(OP)t−3) changes by about 1%, CO2 emissions change in the same direction by 6.55%, at a confidence level of 99%. In the case of oil, energy consumption keeps increasing even as the oil price climbs, which results in higher CO2 emissions. This is because demand for oil does not follow the law of demand.
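Because the model is written in log differences, each coefficient in Table 3 is an elasticity. "A 1% change gives a β% change" is the first-order reading of that elasticity. The sketch below stores the reported coefficients and lags. It contrasts that linear reading with the exact change implied by the log form, (1.01)^β − 1. It also sums several lagged log changes into their combined contribution to ∆ln(CO2)t. This is a simplification: the intercept, AR/MA terms and ECM term are deliberately left out, and the helper names are hypothetical.

```python
import math

# Short-run elasticities reported in Table 3: name -> (coefficient, lag in periods).
# Intercept, AR/MA terms and the ECM term are deliberately omitted.
ELASTICITIES = {
    "GDP": (6.78, 2),
    "Population": (2.33, 1),
    "URT": (5.45, 1),
    "IST": (4.62, 1),
    "CCT": (3.15, 2),
    "X-E": (6.40, 2),
    "OP": (6.55, 3),
}


def co2_response_pct(variable, pct_change):
    """Percent change in CO2 implied by a pct_change (in percent) in one driver.

    Returns (linear approximation beta * pct, exact value under the log form)."""
    beta, _lag = ELASTICITIES[variable]
    approx = beta * pct_change
    exact = (math.exp(beta * math.log1p(pct_change / 100.0)) - 1.0) * 100.0
    return approx, exact


def combined_dlog_co2(dlog_drivers):
    """Sum of beta_j * dln(X_j) for log changes already taken at each driver's lag."""
    return sum(ELASTICITIES[name][0] * d for name, d in dlog_drivers.items())
```

For per capita GDP, `co2_response_pct("GDP", 1.0)` gives about 6.78% under the linear reading and about 6.98% under the exact log form. The gap grows with the size of the change.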
The coefficient of ECMt−1 is −3.87, which indicates that the LS-ARIMAXi (2, 1, 1, Xt−i)-ECM model adjusts toward equilibrium at a rate of 3.87%.
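An adjustment rate can be translated into the time needed to close a deviation from the long-run relationship. This assumes the common convention that the deviation shrinks by a factor (1 + γ) each period, which requires −1 < γ < 0. The sketch below therefore uses γ = −0.0387, taking the 3.87% rate stated above as a fraction. That reading is an assumption, and the helper names are illustrative.

```python
import math


def ecm_half_life(gamma):
    """Periods needed to close half of a deviation from long-run equilibrium.

    Assumes the deviation shrinks by the factor (1 + gamma) each period."""
    if not -1.0 < gamma < 0.0:
        raise ValueError("gamma must lie in (-1, 0) for monotone convergence")
    return math.log(0.5) / math.log(1.0 + gamma)


def remaining_gap(gamma, periods, initial_gap=1.0):
    """Deviation left after the given number of periods of error correction alone."""
    return initial_gap * (1.0 + gamma) ** periods
```

Under this assumption, `ecm_half_life(-0.0387)` is roughly 17.6 periods, or years if the data are annual. In that case, about half of any disequilibrium would still be present after a decade.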
We compared the efficiency of the LS-ARIMAXi (2, 1, 1, Xt−i)-ECM model with that of existing models, namely multiple regression, GM (1,1), ANN, ARMA, ARIMA and GM-ARIMA, using MAPE and RMSE. Table 4 shows that the LS-ARIMAXi (2, 1, 1, Xt−i)-ECM model has the lowest MAPE and RMSE, at 1.01% and 0.93%, respectively. The GM-ARIMA model has an MAPE of 4.27% and an RMSE of 3.45%. The ARIMA model has an MAPE of 5.38% and an RMSE of 5.85%. The ARMA model has an MAPE of 10.18% and an RMSE of 11.36%. The ANN model has an MAPE of 12.55% and an RMSE of 13.65%. The GM (1,1) model has an MAPE of 12.94% and an RMSE of 17.39%. Lastly, the multiple regression model has an MAPE of 20.05% and an RMSE of 19.49%. This comparison indicates that the LS-ARIMAXi (2, 1, 1, Xt−i)-ECM model is efficient and suitable for long-term forecasting.
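The two accuracy measures can be computed as sketched below. MAPE is the mean of |actual − forecast| / actual, in percent. RMSE is the square root of the mean squared error. The text reports RMSE as a percentage without stating how it was normalized, so the sketch returns RMSE in the units of the data. The `REPORTED` table simply copies the Table 4 values given above and ranks the models by MAPE. All names are illustrative.

```python
import math


def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean square error, in the units of the data."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


# (MAPE %, RMSE %) as reported in Table 4
REPORTED = {
    "LS-ARIMAXi-ECM": (1.01, 0.93),
    "GM-ARIMA": (4.27, 3.45),
    "ARIMA": (5.38, 5.85),
    "ARMA": (10.18, 11.36),
    "ANN": (12.55, 13.65),
    "GM (1,1)": (12.94, 17.39),
    "Multiple regression": (20.05, 19.49),
}

# models ordered from lowest to highest MAPE
ranking = sorted(REPORTED, key=lambda name: REPORTED[name][0])
```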
Having identified LS-ARIMAXi (2, 1, 1, Xt−1)-ECM as the most suitable forecasting model, we use it to forecast carbon dioxide emissions in Thailand's construction sector over 20 years (2019-2038), as shown in Figure 3.
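A multi-step forecast from a model of this kind is produced recursively. Each forecast change in log CO2 is added to the previous level, and future shocks are set to their expected value of zero. The sketch below is one hypothetical reading of a (2, 1, 1, Xt−i)-ECM difference equation, not the authors' estimated model. It has two AR terms on past ∆ln(CO2), one MA term, lagged exogenous log changes and an ECMt−1 term built from an assumed long-run relation. Every coefficient, the long-run relation and the future driver paths are user-supplied assumptions.

```python
def forecast_arimax_ecm(y, x, x_future, steps, const, phi, theta, beta, gamma,
                        lr_const, lr_slope, last_resid=0.0):
    """Recursive forecast of log CO2 under a hypothetical ARIMAX(2,1,1)-ECM form:

      dy_t = const + phi1*dy_{t-1} + phi2*dy_{t-2} + theta*e_{t-1}
             + sum_j beta_j * dx_{j,t-lag_j} + gamma * ECM_{t-1}
      ECM_{t-1} = y_{t-1} - (lr_const + sum_j lr_slope_j * x_{j,t-1})

    y        : past log CO2 levels (at least 3 observations)
    x        : dict name -> past log levels of each driver (aligned with y)
    x_future : dict name -> assumed future log levels (at least `steps` values)
    beta     : dict name -> (coefficient, lag)
    lr_slope : dict name -> long-run coefficient
    """
    y = list(y)
    x = {name: list(series) for name, series in x.items()}
    eps = last_resid
    for step in range(steps):
        # append the scenario value for period t, so x[name][-1] is x_t
        for name in x:
            x[name].append(x_future[name][step])
        dy1, dy2 = y[-1] - y[-2], y[-2] - y[-3]
        ecm = y[-1] - (lr_const + sum(b * x[n][-2] for n, b in lr_slope.items()))
        # dx_{t-lag} = x_{t-lag} - x_{t-lag-1}
        exo = sum(b * (x[n][-1 - lag] - x[n][-2 - lag]) for n, (b, lag) in beta.items())
        dy = const + phi[0] * dy1 + phi[1] * dy2 + theta * eps + exo + gamma * ecm
        y.append(y[-1] + dy)
        eps = 0.0  # future shocks have expectation zero
    return y[len(y) - steps:]
```

Emission levels in Mt CO2 Eq would then be recovered by exponentiating the returned log levels. The lag structure matters here. In a small example, a one-off 1% rise in a driver at lag 1 affects the forecast only one period later.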
Figure 3 shows that CO2 emissions in Thailand's construction sector will increase by 37.88% over the next 20 years (2019-2038). Emissions are projected at 43.31 Mt CO2 Eq in 2019 and rise continuously to 59.72 Mt CO2 Eq by 2038. These results indicate that the construction sector emits CO2 continuously, contributing to a continuous rise in greenhouse gas emissions.
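The projected endpoints can be turned into a total growth figure and an implied compound annual growth rate. The calculation below treats 2019 and 2038 as 19 annual intervals apart. It uses only the two values reported for Figure 3.

```python
def total_growth_pct(start, end):
    """Total percentage change from start to end."""
    return (end / start - 1.0) * 100.0


def cagr_pct(start, end, years):
    """Compound annual growth rate, in percent, between values `years` apart."""
    return ((end / start) ** (1.0 / years) - 1.0) * 100.0


start_2019, end_2038 = 43.31, 59.72          # Mt CO2 Eq, as reported for Figure 3
growth = total_growth_pct(start_2019, end_2038)
annual = cagr_pct(start_2019, end_2038, 2038 - 2019)
```

This gives a total increase of about 37.89%, in line with the 37.88% stated, and an average growth of roughly 1.7% per year.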

Discussion
The result of this study is the LS-ARIMAXi (2, 1, 1, Xt−i)-ECM model, which is used to forecast CO2 emissions in Thailand's construction sector over 20 years (2019-2038). Only causal factors that are stationary at the same level are selected, so the model is not spurious. The model's efficiency was evaluated against existing models: multiple regression, the grey model (GM (1,1)), ANN, ARMA, ARIMA and GM-ARIMA. The evaluation confirms that the LS-ARIMAXi (2, 1, 1, Xt−i)-ECM model is more efficient and better suited to long-term prediction than the other models. The model projects a continuous increase in CO2 emissions, from 43.31 Mt CO2 Eq in 2019 to 59.72 Mt CO2 Eq in 2038. Thailand therefore needs to take serious action in policy planning and in follow-up evaluation in the construction sector. It also needs to develop further sustainability policies in line with existing ones. This study differs from previous studies because it builds a new LS-ARIMAXi-ECM model that combines the ARIMA approach with co-integration testing and advanced statistical methods. Only causal, exogenous factors are included. An error correction mechanism is incorporated to determine the magnitude of equilibrium adjustment in both the short and long term. A distinctive feature of the LS-ARIMAXi (2, 1, 1, Xt−i)-ECM model is that it can be applied to other sectors and areas. The model is free from heteroskedasticity, multicollinearity and autocorrelation and is not spurious. This allows it to determine the magnitude of change in CO2 emissions more accurately than existing models.
Hence, it can support decision-making and long-term planning in Thailand.
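One of the diagnostics mentioned above, multicollinearity, is commonly checked with variance inflation factors. For each regressor j, VIFj = 1 / (1 − Rj²), where Rj² comes from regressing that column on all the others plus a constant. The sketch below is a plain NumPy version written for illustration; the function name is hypothetical. A common rule of thumb treats values above 10 as a warning sign. The sketch does not cover the heteroskedasticity or autocorrelation tests.

```python
import numpy as np


def variance_inflation_factors(X):
    """VIF for each column of X (n observations x k regressors, no constant column)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = []
    for j in range(k):
        target = X[:, j]
        # auxiliary regression of column j on a constant and the other columns
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        ss_res = resid @ resid
        ss_tot = ((target - target.mean()) ** 2).sum()
        r2 = 1.0 - ss_res / ss_tot
        vifs.append(1.0 / (1.0 - r2) if r2 < 1.0 else float("inf"))
    return vifs
```

Uncorrelated regressors give VIFs close to 1, while nearly collinear ones give very large values. Applied to the differenced drivers in Table 3, this check would support or qualify the statement that the model is free from multicollinearity.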
A review of the literature shows that this study relates to previous research applying models to CO2 emission and energy forecasting. Zhao, Zhao, and Guo [9] used GM (1,1) optimized by MFO with a rolling mechanism to forecast electricity consumption in Inner Mongolia. Chang, Sun, and Gu [13] forecast energy CO2 emissions using a DMSFE combination model based on a quantum harmony search algorithm. Zeng, Xu, Wang, Chen, and Li [14] forecast the allocative efficiency of carbon emission allowance financial assets in China at the provincial level in 2020. Liang, Niu, Wang, and Chen [15] carried out an assessment analysis and forecast for the secure early warning of energy consumption carbon emissions in Hebei Province, China. Finally, Li, Yang, and Li [30] forecast China's coal power installed capacity by comparing the MGM, ARIMA, GM-ARIMA and NMGM models.
However, this paper differs from all of the above in its modeling process, application capability, assessment of the appropriate time period, prediction quality and usage. It forecasts energy-related carbon dioxide emissions in Thailand's construction sector for 20 years (2019-2038) using advanced research methods, rigorous statistical testing and a detailed research process. Many past studies have focused on research findings rather than the research process, which has led to errors and potential risks. We therefore regard this study as more rigorous and efficient than previous studies in the field. This study also responds to a long-standing need for a model with improved capacity for application in different contexts.
We chose EViews 9.2 as the software tool for the statistical analyses in this research. A student version of EViews is available at no cost or license fee, and readers may also use other software as they see fit.
Regarding the limitations of this study, some factors related to sustainable development policy are not taken into account, notably diesel prices. Diesel prices are a major factor affecting energy consumption in Thailand, but the Thai government has a policy to control them. Because of this intervention, the price of diesel fuel does not fluctuate in line with market mechanisms. Including it would therefore not capture the real magnitude of the effect of diesel price changes on CO2 emissions, so this factor was excluded.

Conclusions
This paper has developed the LS-ARIMAXi (2, 1, 1, Xt−i) model to forecast future trends in CO2 emissions in Thailand's construction sector over the next 20 years (2019-2038). The model can effectively and efficiently support sustainable development policy planning in Thailand. Most importantly, it can reduce planning errors and so help avoid the mistakes of the past. The model is built through careful research methods and rigorous statistical use of the data. We selected eight variables from the causal factors: carbon dioxide emissions (CO2), per capita GDP (GDP), population growth (Population), urbanization rate (URT), industrial structure (IST), total coal consumption (CCT), oil price (OP), and total exports and imports (X − E). All variables were assessed with the unit root test at first difference and analyzed with the co-integration test, resulting in the LS-ARIMAXi (2, 1, 1, Xt−i) model. They were also tested for an appropriate time period (t − i). The model is free from heteroskedasticity, multicollinearity and autocorrelation and is therefore well suited to forecasting CO2 emissions in Thailand's construction sector. It can also be applied in other sectors and contexts, both in Thailand and in other countries.
Notably, the LS-ARIMAXi (2, 1, 1, Xt−i) model considers not only causal factors that are stationary at the same level but also their proportions and relationships. In addition, each factor in the model is analyzed on the basis of rationality and the structural equation model (SEM) to increase its usefulness for future research and policy planning.
As a recommendation for applying this research, the model should be adapted to the context of each sector and area. In particular, the factors must be stationary and must influence the dependent variable. They must also be integrated of the same order and co-integrated, to avoid a spurious model and to reduce errors. Finally, their time periods must be assessed for appropriateness to produce the most accurate prediction results.
Author Contributions: J.S. and K.K. were involved in data collection and preprocessing, model construction, empirical research, results analysis and discussion, and manuscript preparation. All authors have approved the submitted manuscript.
Funding: This research received no external funding.