The Forecasting of a Leading Country’s Government Expenditure Using a Recurrent Neural Network with a Gated Recurrent Unit

Abstract: Economic forecasting is crucial in determining a country's economic growth or decline. Productivity and the labor force must be increased to achieve economic growth, which leads to growth in gross domestic product (GDP) and income. Machine learning has been used to provide the accurate economic forecasts that are essential to sound economic policy. This study formulated a gated recurrent unit (GRU) neural network model to predict government expenditure, an essential component of GDP. The GRU model was evaluated against autoregressive integrated moving average, support vector regression, exponential smoothing, extreme gradient boosting, convolutional neural network, and long short-term memory models using World Bank data on government expenditure from 1990 to 2020. The mean absolute error (MAE), root mean square error (RMSE), and mean absolute percentage error (MAPE) were used as performance metrics. The GRU model outperformed all other models in terms of MAE, RMSE, and MAPE (with an average MAPE of 2.774%) when forecasting government spending in the world's 15 largest economies from 1990 to 2020. The results indicate that the GRU can be used to provide accurate economic forecasts.


Introduction
Government budgets for education, health care, defense, and welfare are large [1], and they cannot be accurately predicted using traditional econometric models because of the complex nonlinear relationships between variables [2]. Researchers have recently applied artificial intelligence to help governments better allocate resources, deliver services, and coordinate large-scale operations [3]. A country's economic conditions and priorities are reflected in the factors that affect government spending; population size, for example, significantly impacts spending [4]. A large population often requires increased investment in public services, health care, education, and social security programs. For example, in 2020 (World Development Indicators, 2021), countries with larger populations allocated a substantial portion of their budgets to social welfare, with such spending accounting for 30% to 45% of the total budget. The demand for social services varies across countries, encompassing areas such as poverty alleviation, health care, education, and unemployment benefits, thereby influencing government spending decisions. Gross domestic product (GDP) indicates a country's economic performance and total output. Government spending and GDP are closely related. During periods of GDP growth, governments often allocate more resources to infrastructure development and public services to support further economic growth. Conversely, during economic downturns or recessions, governments may implement austerity measures and reduce spending to manage budget deficits [5]. Government spending is also influenced by other macroeconomic variables, such as inflation, unemployment rates, and interest rates. Rising inflation rates can lead to increased spending on subsidies and welfare programs to mitigate the impact of price increases on citizens' purchasing power [6].
Unemployment rates can affect government spending through measures such as job creation programs or unemployment benefits. In addition, interest rates play a crucial role in determining the cost of government borrowing, which affects spending decisions. By taking these factors into account, policymakers can make informed decisions about budget allocation, revenue generation, and overall economic management to promote sustainable development and meet societal needs [7].
According to the Organisation for Economic Co-operation and Development (OECD), government investments in education and health care contribute to long-term economic growth; specifically, every 1% increase in education spending was reported to be correlated with a 0.5% increase in GDP per capita over the long term (source: OECD. Education at a Glance 2020: OECD Indicators). However, the relationship between government spending and GDP is complex; it depends on several factors, such as the state of the economy, the type and composition of government spending, and the effectiveness of government policies [8,9]. Therefore, accurate forecasts of government spending are crucial for improving resource allocation and economic planning. Accurate forecasts help governments budget more effectively [1] and with less risk [2]. Jeong et al. demonstrated that neural networks can be used to improve traditional time series models, such as seasonal autoregressive integrated moving average (ARIMA) models, when applied to budget-related prediction [10]. Neural networks are a powerful means of capturing complex, nonlinear relationships between variables [11]. However, neural networks can accurately predict government spending only if a suitable set of factors that affect spending is selected, which is difficult to achieve. For example, Palmer et al. found that neural networks accurately predict tourism demand only if the appropriate input variables and network architecture are selected [12]. Despite these challenges, neural networks are promising as a means to predict government spending, even with complex time series data.
Machine learning models have produced accurate forecasts on the basis of time series data. For example, Lago et al. demonstrated that deep learning models, including a gated recurrent unit (GRU) model, outperformed traditional methods in predicting spot electricity prices; the GRU model achieved a symmetric mean absolute percentage error (symmetric MAPE) of 13.04%, close to the best-performing deep neural network (symmetric MAPE of 12.34%) and ahead of a long short-term memory (LSTM) model, a convolutional neural network (CNN) model, and several hybrid models [13]. Another study formulated a hybrid recurrent neural network (RNN) model based on GRU and LSTM that predicts daily and hourly multilevel wind power in Germany from data on wind speed at different heights [14]. The proposed model outperformed ARIMA and support vector regression (SVR) models in terms of gradient stability, training speed, mean absolute error (MAE), root mean square error (RMSE), and t-test results on long sequences. Li et al. proposed a novel ensemble decision method based on deep reinforcement learning to predict GDP. In this method, predictions from GRU, temporal convolutional network, and deep belief network models were taken as input to train three GDP prediction models, and the deep Q network algorithm was used to optimize the integration weight coefficients. This method outperformed 18 competing methods in evaluation experiments, achieving MAPE values below 4.2% in all tests [15].
Government spending in the economy takes many forms. The composition of GDP among different types of spending is often of interest to economists. Government spending is one of the four components of GDP; the others are consumption, investment, and net exports. According to the International Monetary Fund (https://www.imf.org/en/Home), a 1% increase in government spending as a share of GDP is correlated with a 0.4 to 1.2 percentage point increase in GDP growth, depending on how government spending is allocated. Conversely, decreased government spending can exert contractionary effects and even lead to recession. Government spending is also related to other macroeconomic variables. For example, infrastructure spending boosts investment and productivity, and education and health care spending enhances the productivity and health of the workforce in particular and the population in general; these factors promote long-term economic growth.
In this study, we reviewed existing forecasting methods and developed a GRU model that predicts government spending on the basis of financial GDP indicators. We used a historical dataset to improve model generalizability. Our model outperformed six competing methods in terms of accuracy and reliability in an extensive battery of experiments involving time series datasets on the GDPs of 15 of the largest economies in the world. Our contributions are as follows.
1. Our GRU model accurately predicts government spending on the basis of financial GDP indicators.
2. Our GRU model outperformed ARIMA, exponential smoothing (ETS), extreme gradient boosting (XGBoost), SVR, CNN, and LSTM models in evaluation experiments.
3. The strengths and weaknesses of neural network models in predicting government expenditure were explored.
4. The aforementioned methods can capture complex nonlinear relationships between different economic factors to generate accurate predictions.
This paper is organized as follows. Section 2 reviews existing prediction approaches, namely, ARIMA, ETS, support vector regression (SVR), XGBoost, CNN, LSTM, and GRU, and presents our proposed approach. Section 3 describes the setup and results of our evaluation experiments. Finally, Section 4 concludes the paper.

Methods
Statistical models, especially ARIMA and exponential smoothing, are widely used in time series forecasting because they can capture patterns and dependencies in time series data [16]. Machine learning models are an improvement on such statistical methods and are becoming increasingly popular. These models are summarized as follows.

Autoregressive Integrated Moving Average
An ARIMA model describes the temporal dependence between observations using time lags. It can be used to model stationary time series data, wherein the mean and variance of the data remain constant over time. An ARIMA model combines autoregressive, differencing, and moving average components to model time series data [17] and has three main parameters: the number of lagged observations (p), the degree of differencing (d), and the size of the moving average window (q), specified as the tuple (p, d, q). In our experiment, the Python library pmdarima was used to implement ARIMA modeling. Its auto_arima function was used to automatically select the best parameters for the ARIMA model on the basis of the data, enabling the model to quickly generate predictions.
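The autoregressive component at the heart of ARIMA can be sketched in a few lines. The following minimal example (pure NumPy, not the pmdarima pipeline used in our experiments; the function names are illustrative) fits an AR(p) model by ordinary least squares and produces iterated one-step-ahead forecasts:

```python
import numpy as np

def fit_ar(series, p):
    """Least-squares fit of an AR(p) model: y_t = c + a1*y_{t-1} + ... + ap*y_{t-p}."""
    y = np.asarray(series, dtype=float)
    # Design matrix: an intercept column followed by the p lagged columns.
    X = np.column_stack([np.ones(len(y) - p)] +
                        [y[p - k:len(y) - k] for k in range(1, p + 1)])
    target = y[p:]
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef  # [c, a1, ..., ap]

def forecast_ar(series, coef, steps=1):
    """Iterated one-step-ahead forecasts from the fitted AR coefficients."""
    p = len(coef) - 1
    history = list(series)
    out = []
    for _ in range(steps):
        lags = history[-p:][::-1]  # y_{t-1}, ..., y_{t-p}
        nxt = coef[0] + sum(a * l for a, l in zip(coef[1:], lags))
        history.append(nxt)
        out.append(nxt)
    return out
```

On a noiseless autoregressive series, the recovered coefficients match the generating values; pmdarima's auto_arima additionally searches over (p, d, q) and applies differencing to handle nonstationarity.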

Exponential Smoothing
ETS is used to model patterns and dependencies over time, primarily in nonstationary time series data. ETS models output predictions by assigning exponentially decaying weights to past observations [18]. The Holt linear exponential smoothing method is commonly used; its predictions are obtained from a level equation, a trend equation, and a forecast equation:

l_t = αy_t + (1 − α)(l_(t−1) + b_(t−1)),
b_t = γ(l_t − l_(t−1)) + (1 − γ)b_(t−1),
ŷ_(t+h) = l_t + hb_t,

where α and γ are smoothing constants ranging from 0 to 1, l_t is the level, b_t is the trend, and ŷ_(t+h) is the h-step-ahead forecast. These standard equations are detailed in [19]. In ETS, these techniques are used to simulate fundamental trends and patterns, primarily for nonstationary time series data. ETS models are popular because they are easy to implement and can rapidly provide accurate predictions [20,21].
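The Holt recursions can be sketched directly in pure Python. The following minimal implementation (assuming the common initialization l_1 = y_1 and b_1 = y_2 − y_1; names are illustrative) updates the level and trend for each observation and then extrapolates:

```python
def holt_forecast(y, alpha, gamma, horizon):
    """Holt's linear exponential smoothing.
    level:  l_t = alpha*y_t + (1-alpha)*(l_{t-1} + b_{t-1})
    trend:  b_t = gamma*(l_t - l_{t-1}) + (1-gamma)*b_{t-1}
    h-step: yhat_{t+h} = l_t + h*b_t
    """
    level, trend = y[0], y[1] - y[0]  # common initialization choice
    for obs in y[1:]:
        prev_level = level
        level = alpha * obs + (1 - alpha) * (level + trend)
        trend = gamma * (level - prev_level) + (1 - gamma) * trend
    return [level + h * trend for h in range(1, horizon + 1)]
```

On a perfectly linear series the method reproduces the line exactly, which is a useful sanity check before applying it to noisy expenditure data.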

Support Vector Regression
Introduced by Vapnik et al., the SVM is a machine learning algorithm that can be used for classification and regression problems. SVR, a variant of SVM, is used to solve regression problems by making predictions on continuous target variables; it can be used when the relationship between the independent and dependent variables is nonlinear [22]. In regression, an SVR model identifies the best-fitting hyperplane, that is, the one containing the maximum number of points within its margin. For a hyperplane Y = wx + b, the decision boundaries are wx + b = +a and wx + b = −a; therefore, any point satisfying the SVR must lie within the bounds −a < Y − (wx + b) < +a. Thus, the goal is to identify a function that satisfies this decision boundary. SVR has been used to forecast, for example, the demand for wind and solar energy [23], the exchange rate of the Euro [24], the state of the climate [25], and the volume of airport freight [26]. In these studies, SVR or hybrid methods incorporating SVR performed the best in evaluation experiments.
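The ε-insensitive tube that defines SVR can be illustrated with a toy 1-D linear model. Our experiments used a library SVR implementation; the sketch below (function name, hyperparameters, and the subgradient training scheme are illustrative choices, not the library's algorithm) penalizes only points outside the tube of half-width eps:

```python
def train_linear_svr(xs, ys, eps=0.1, lam=1e-4, lr=0.01, epochs=200):
    """Tiny 1-D linear SVR fitted by subgradient descent on the
    epsilon-insensitive loss max(0, |y - (w*x + b)| - eps) + lam*w**2."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            err = y - (w * x + b)
            if abs(err) > eps:  # point outside the tube: take a subgradient step
                sign = 1.0 if err > 0 else -1.0
                w += lr * (sign * x - 2 * lam * w)
                b += lr * sign
            else:  # inside the tube: only the regularizer shrinks w slightly
                w -= lr * 2 * lam * w
    return w, b
```

Points already inside the tube contribute no loss, which is exactly the bound −a < Y − (wx + b) < +a described above with a = eps.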

Extreme Gradient Boosting
A versatile and increasingly popular method designed to balance between performance and computational cost, XGBoost is an optimized version of the gradient boosting algorithm, and uses ensemble learning with multiple decision trees for prediction [27]. XGBoost has been applied in various fields, such as health care [28], computer vision [29], and missing data imputation [30].
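The boosting idea behind XGBoost, in which each new tree fits the residuals of the current ensemble, can be sketched with depth-one regression trees (stumps). This is a bare-bones illustration of gradient boosting for squared loss, not the regularized algorithm in the XGBoost library; the function names are ours:

```python
import numpy as np

def fit_stump(x, residual):
    """Best single-split regression stump on 1-D inputs (minimizes squared error)."""
    best = (np.inf, None, residual.mean(), residual.mean())
    for s in np.unique(x)[:-1]:
        left, right = residual[x <= s], residual[x > s]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best[0]:
            best = (sse, s, left.mean(), right.mean())
    return best[1:]  # (split point, left value, right value)

def boost(x, y, rounds=50, lr=0.3):
    """Gradient boosting for squared loss: each stump fits the current residuals."""
    pred = np.zeros_like(y, dtype=float)
    stumps = []
    for _ in range(rounds):
        s, lv, rv = fit_stump(x, y - pred)
        pred += lr * np.where(x <= s, lv, rv)  # shrunken additive update
        stumps.append((s, lv, rv))
    return stumps, pred
```

With squared loss the residuals shrink geometrically by the factor (1 − lr) per round, which is why even this toy ensemble fits a step-shaped target closely.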

Convolutional Neural Network
CNNs consist of multiple layers and can be trained using a back-propagation algorithm. CNNs contain three types of layer: convolutional, pooling, and fully connected. The convolutional layer is responsible for learning the input's feature representation; it uses multiple convolutional kernels to compute different feature maps. In particular, each neuron in a feature map is connected to a set of neighboring neurons in the previous layer, and this set of neighboring neurons is called the neuron's receptive field in the previous layer. To obtain a new feature map, the input is convolved with the trained convolution kernel, and the result is passed through an element-wise nonlinear activation function. The activation function introduces nonlinear properties, which enable multilayer networks to detect nonlinear features. Commonly used activation functions include sigmoid, tanh [31], and ReLU [Ref]. The pooling layer, typically located between two convolutional layers, is responsible for achieving translation invariance by reducing the resolution of the feature maps; each feature map in the pooling layer is connected to the corresponding feature map in the previous layer. In the initial layer, the convolution kernels detect low-level features, such as edges and curves, whereas higher-level convolution kernels learn to encode more abstract features. By progressively stacking multiple convolutional and pooling layers, higher-level feature representations can be gradually extracted. The convolutional and pooling layers may be followed by one or more fully connected layers designed for higher-level inference [30]. A fully connected layer connects all neurons in the preceding layer to each neuron in the current layer, allowing for the generation of global semantic information.
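The three layer types can be illustrated with plain NumPy. The following toy functions (names are ours; real frameworks add batching, channels, and learned parameters) compute a valid 1-D convolution, a ReLU activation, and non-overlapping max pooling:

```python
import numpy as np

def conv1d(x, kernel, bias=0.0):
    """Valid 1-D convolution (strictly, cross-correlation, as in deep-learning layers)."""
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel) + bias
                     for i in range(len(x) - k + 1)])

def relu(x):
    """Element-wise rectified linear activation."""
    return np.maximum(x, 0.0)

def max_pool1d(x, size=2):
    """Non-overlapping max pooling; any trailing remainder is dropped."""
    n = len(x) // size
    return x[:n * size].reshape(n, size).max(axis=1)
```

Here the kernel [−1, 1] acts as a discrete-gradient detector: on a linearly increasing input the feature map is constant, and pooling halves its resolution, mirroring the edge-detecting behavior of low-level kernels described above.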

Long Short-Term Memory
LSTM is a type of RNN specifically designed to overcome the vanishing gradient problem in traditional RNNs. It was introduced by Hochreiter and Schmidhuber in 1997 and has since been widely applied in various fields, including time series forecasting [32]. In LSTM, time series data are formulated as a sequence, wherein each observation corresponds to a time step. LSTM neural networks are then trained using previous observations to predict the next value in the sequence. Long-term dependencies in the data can be captured when the model is configured to use multiple previous time steps for each prediction. In recent studies, LSTM has produced promising results in time series prediction. For example, LSTMs have produced more accurate forecasts of electricity load [33] and stock market trends [34] than competing methods. LSTM has also been used to predict wind power output [35], solar radiation [36], and agricultural production [37] from time series data. The LSTM architecture consists of a collection of recurrently connected memory blocks, which are subnetworks that maintain their state over time and regulate the flow of information through nonlinear gates. The structure of a single LSTM cell is shown in Figure 1A. These gates govern the interactions between different cells and regulate the flow of information. The input gate controls the process of updating the memory state. The output gate determines whether the output flow can influence the memory state of other cells. The forget gate determines whether the prior states should be remembered or forgotten. The LSTM is implemented using the following composite functions:

i_t = σ(W_xi x_t + W_hi h_(t−1) + W_ci c_(t−1) + b_i),
f_t = σ(W_xf x_t + W_hf h_(t−1) + W_cf c_(t−1) + b_f),
c_t = f_t c_(t−1) + i_t tanh(W_xc x_t + W_hc h_(t−1) + b_c),
o_t = σ(W_xo x_t + W_ho h_(t−1) + W_co c_t + b_o),
h_t = o_t tanh(c_t),

where σ represents the logistic sigmoid function and i, f, o, and c represent the input gate, forget gate, output gate, and cell activation vectors, respectively; h represents the hidden vector. The subscripts of the weight matrices have intuitive meanings; for example, W_hi represents the hidden-input gate matrix.
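A single step of the LSTM's composite functions can be sketched in NumPy as follows. Peephole connections from the cell state are omitted for brevity, as in many practical implementations, and the stacked-parameter packing is an illustrative choice:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step. W (4n x m), U (4n x n), and b (4n,) hold the stacked
    parameters for the input (i), forget (f), cell (g), and output (o) transforms."""
    z = W @ x + U @ h_prev + b
    n = len(h_prev)
    i = sigmoid(z[:n])          # input gate
    f = sigmoid(z[n:2 * n])     # forget gate
    g = np.tanh(z[2 * n:3 * n])  # candidate cell input
    o = sigmoid(z[3 * n:])      # output gate
    c = f * c_prev + i * g      # new memory state
    h = o * np.tanh(c)          # new hidden state
    return h, c
```

With all parameters at zero, every gate evaluates to 0.5 and the candidate input to 0, so the cell state is simply halved at each step, a quick way to verify the gating arithmetic.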

Gated Recurrent Unit
GRU models are efficient, train quickly, and can address the vanishing gradient problems of traditional RNNs, making them suited to applications involving large-scale time series prediction [38]. GRU methods have performed well in various applications, such as the prediction of gas concentrations in mines [39] and the prediction of COVID-19 mortality rates [40]. Cho et al. (2014) proposed an architecture wherein GRU gate mechanisms, comprising a reset gate r_t and an update gate z_t, are used to efficiently handle vanishing gradient problems (Figure 1A) [33]. The reset gate is essential for the detection of short-term dependencies within sequences, allowing a network to selectively forget information from the previous time step and focus on the information that is most relevant to the current task. The update gate is responsible for capturing long-term dependencies in sequential data and determining how much information from the previous time step should be passed to the current time step; these determinations allow the network to adapt to changes in the input data and learn complex patterns over a long period.
During training, a GRU model learns the optimal values of the gates and other parameters from the input data X_t and the expected output. A trained GRU model then outputs predictions for input sequences by using the learned parameters. The learning process is encapsulated in the following equations, which describe the units that capture the respective dependencies within the GRU network [41]:

z_t = σ(W_xz x_t + W_hz h_(t−1) + b_z),
r_t = σ(W_xr x_t + W_hr h_(t−1) + b_r),
h̃_t = tanh(W_xh x_t + W_hh (r_t ⊗ h_(t−1)) + b_h),
h_t = z_t ⊗ h_(t−1) ⊕ (1 − z_t) ⊗ h̃_t.

In these equations, two fully connected layers with a sigmoid activation function σ produce the outputs of the two gates; W_xz, W_hz, W_xr, and W_hr denote the weight parameters of the update gate and the reset gate. The ⊗ operator denotes element-wise multiplication, and ⊕ denotes element-wise addition. The candidate state h̃_t is computed with the hyperbolic tangent activation function tanh at time step t, with the reset gate modulating how much of the previous hidden state h_(t−1) is used. The update gate z_t then determines the final memory update: the new hidden state h_t blends the previous hidden state h_(t−1) with the candidate state h̃_t. Figure 2 summarizes the flow of the experiment in which models were evaluated in terms of their performance in predicting government expenditure. Our proposed GRU model was designed with a reset gate r_t and an update gate z_t (Figures 1A,B and 3; Algorithm 1) to capture dependencies between different periods. Specifically, the data were obtained and preprocessed (e.g., missing value evaluation and feature normalization), then divided into training and testing sets. Subsequently, models trained using the various methods were evaluated in terms of MAE, RMSE, and mean absolute percentage error (MAPE), and the results were visualized graphically. The framework of the GRU algorithm is detailed in Figure 3.

Algorithm 1 GRU for Government Expenditure Forecasting
Input: time series dataset X. Desired output: predictions of the best model.
1. Load the time series dataset; reconstruct, standardize, and preprocess the data.
2. Split the data into training and testing sets.
3. Define the GRU model structure and parameters.
4. Compute the update gate.
5. Compute the reset gate.
6. Compute the candidate memory state.
7. Initialize the weight variables.
8. Compute the final memory state.
9. Calculate the loss function, argmin_θ L(θ) = Σ_i loss(ŷ_i, y_i), where loss(ŷ_i, y_i) = |ŷ_i − y_i|.
10. Train the model on the training set.
11. Evaluate the trained model's predictions on the testing set.
12. Select the best model using the performance metrics MAE, RMSE, and MAPE.
13. Visualize and analyze the forecasts of the best model.
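The gate computations in the middle of Algorithm 1 amount to one GRU cell update per time step. A minimal NumPy sketch (weight names are illustrative; a framework layer such as Keras's GRU learns these parameters) is:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x, h_prev, Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh):
    """One GRU step following the update-gate/reset-gate formulation."""
    z = sigmoid(Wz @ x + Uz @ h_prev + bz)             # update gate
    r = sigmoid(Wr @ x + Ur @ h_prev + br)             # reset gate
    h_cand = np.tanh(Wh @ x + Uh @ (r * h_prev) + bh)  # candidate state
    return z * h_prev + (1.0 - z) * h_cand             # blended new hidden state
```

With all parameters at zero, both gates evaluate to 0.5 and the candidate state to 0, so the hidden state is halved at each step, a quick check that the blending arithmetic is wired correctly.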

Evaluation Criteria
MAE, MAPE, and RMSE are widely used metrics of rolling forecasting performance. They describe the deviation between actual and predicted values, and are defined as follows:

MAE = (1/N) Σ_(i=1)^N |y_i − ŷ_i|,
RMSE = sqrt((1/N) Σ_(i=1)^N (y_i − ŷ_i)²),
MAPE = (100%/N) Σ_(i=1)^N |(y_i − ŷ_i)/y_i|,

where y_i is the actual value, ŷ_i is the predicted value, and N is the sample size. The MAE is the mean of the absolute differences between the actual and predicted values, and ranges from 0 to positive infinity. The MAPE is a relative indicator that is invariant to the units or magnitude of the actual and predicted values. Finally, the RMSE is the square root of the mean of the squared deviations between the actual and predicted values. Because the deviations are squared, the RMSE is sensitive to large errors in model predictions and is thus an effective metric of accuracy. For all three metrics, lower values (i.e., values closer to 0) indicate lower error and higher prediction accuracy. Together, they comprehensively assess the accuracy, precision, and reliability of the forecasts. Because the MAE averages the absolute differences between predicted and actual values, it provides an unbiased average accuracy metric and an intuitive indication of the degree of deviation from the true values. The RMSE, by considering squared differences, emphasizes the effect of larger errors and measures the dispersion and magnitude of the error. Finally, the MAPE reflects the relative error between the predicted and actual values, providing insight into the proportional accuracy of the forecast. Each metric captures a different aspect of prediction performance, enabling a comprehensive understanding of a model's strengths and weaknesses; together, they form a robust framework for assessing the forecasting power of the model and the applicability of GDP-based government spending forecasts.
The choice among these metrics depends on the specific situation and the nature of the forecasting task; considering them together yields a comprehensive assessment of model performance and helps identify a model's strengths and weaknesses.
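The three metrics are straightforward to compute. A minimal NumPy sketch (note that the MAPE is undefined whenever an actual value is zero, which does not occur in our expenditure data) is:

```python
import numpy as np

def mae(y, yhat):
    """Mean absolute error."""
    return np.mean(np.abs(np.asarray(y, float) - np.asarray(yhat, float)))

def rmse(y, yhat):
    """Root mean square error; squaring emphasizes large errors."""
    return np.sqrt(np.mean((np.asarray(y, float) - np.asarray(yhat, float)) ** 2))

def mape(y, yhat):
    """Mean absolute percentage error, in percent; undefined if y contains zeros."""
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    return 100.0 * np.mean(np.abs((y - yhat) / y))
```

For example, with actual values [1, 2, 4] and predictions [1, 2, 5], the single unit error yields MAE = 1/3, RMSE = sqrt(1/3), and MAPE ≈ 8.33%, showing how the RMSE and MAPE weight the same error differently.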

Data Source
To ensure a comprehensive analysis, we focused on data from 1990 to 2020 sourced from the World Bank (https://data.worldbank.org/). This time horizon captures a period of significant economic fluctuations, including major events such as financial crises, recessions, and periods of economic growth. By considering a broad time horizon, we aimed to comprehensively understand government spending behavior over time. We selected countries on the basis of their GDP rankings; specifically, we selected the 15 countries with the largest GDPs to ensure representation of the major global economies: the United States, China, Japan, Germany, the United Kingdom, India, France, Italy, Canada, South Korea, Russia, Brazil, Australia, Spain, and Mexico. By including these countries, we aimed to capture the various economic systems, levels of development, and geopolitical factors that may influence government spending patterns. The dataset was downloaded as a .csv file from the World Bank website and then transformed into a data frame in Python. The dataset obtained from the World Bank was of high quality and did not contain any missing values. The descriptive statistics (specifically, mean, median, standard deviation, sample size, first quartile [Q1], third quartile [Q3], and interquartile range [IQR]) for each of the 15 countries are presented in Table 1. We standardized the data using the Python function min_max_scaler, which rescales a data point x to between 0 and 1 using the equation (x − xmin)/(xmax − xmin), where x is the value of the data point, xmin is the minimum value, and xmax is the maximum value.
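Min-max rescaling as described above can be sketched in a single expression (a constant series would cause division by zero and must be handled separately; the function name here is illustrative):

```python
import numpy as np

def min_max_scale(x):
    """Rescale a series to [0, 1]: (x - min) / (max - min)."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())
```

For example, the series [10, 20, 30] maps to [0.0, 0.5, 1.0], so all countries' expenditure series share a common scale before training.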

GRU Architecture Results and Sensitivity Analysis
We employed three optimizers, namely Adam, RMSprop, and Adagrad, to detect the optimal architecture for each neural network. The CNN model contains a 1D convolutional layer (CONV1D) with 200 filters and a kernel size of 1, followed by a MaxPooling layer. The LSTM model has two layers of 200 units each, followed by a dense layer of one cell. The GRU model has two layers of 200 units each, followed by a separate dense layer of 1 unit. The Adam optimizer and the mean absolute error loss function were used to create these neural network models. To capture the historical dependencies, the training data were prepared using a backward window of 1 to 1000 periods for each model. Table 2 presents a comparison of the parameter settings and the individual MAPE (%) results as a measure of precision; the adjusted period values, the number of neural network layers, and the activation function are shown. For the RMSprop optimizer, we increased the number of parameters while decreasing the number of periods; the overall MAPE value is slightly higher than the value in Table 2. For the Adagrad optimizer, we reduced the parameter and period values. Notably, the period values should not be excessively high, as this can lead to overfitting. The LSTM shows slightly better values with the parameter settings of the Adagrad optimizer. The CNN model shows a MAPE value of 86.798%, indicating much worse performance. After the optimizer-based training procedure, the Adam optimizer produced the optimal GRU architecture, detailed in Table 2. We also performed a sensitivity analysis, including experiments with parameter settings for different scenarios. The number of units in the GRU layer was slightly increased and decreased. In addition, we adjusted the number of epochs to various lower values to observe the model's performance over time, since too many epochs can lead to overfitting. Table 3 presents the results of our sensitivity analysis.
From Table 3, we observe that lower epoch values combined with a higher number of GRU layers tend to produce better prediction results for all three evaluated metrics. Notably, the differences between the accuracy results in the sensitivity analysis are not substantial, indicating that our GRU model compiles correctly and performs consistently under different assumptions and scenarios. The chosen optimizer is appropriate for our dataset and produces sufficiently accurate results, as shown in Table 3. The sensitivity analysis strengthens the robustness of our model and provides valuable insight into the effects of varying the parameters used, allowing us to identify the optimal parameter settings for our particular task of forecasting government spending. The consistent performance of our model across different parameter configurations gives us confidence in its accuracy and reliability.

Experimental System
Python was used for data mining. Our GRU model was evaluated against ARIMA, ETS, SVR, XGBoost, CNN, and LSTM models. The LSTM and GRU models were optimized using Adam [42]. Different learning rates α ranging from 0.1 to 0.0001 were tested, and the corresponding validation loss was observed. Finally, a learning rate of 0.01 was chosen, and β1 and β2 were set to their default values of 0.9 and 0.999, respectively. The training and testing sets comprised data from 1990-2014 and 2015-2020, respectively. Figure 4 and Table 4 present the MAE, RMSE, and MAPE results of the models for all and each of the 15 countries, respectively.

Comparison and Discussion
Artificial intelligence encompasses a range of techniques that aim to mimic the behavior of living things to improve decision-making and mitigate potential risks to economic stability and growth. One such technique is the GRU, which focuses on automatically identifying meaningful patterns in data. The GRU has become a valuable tool for extracting information from large datasets [43]. A departure from traditional computing methods is a common feature of GRU applications: owing to the complexity of the patterns identified, it is impractical for a human programmer to provide explicit and detailed instructions for performing these tasks. In addition, GRU techniques can learn from a wide variety of data types and recognize patterns that may be impossible for a human to detect, enabling the GRU to operate in resource-constrained environments [44]. As a result, the GRU could play a critical role in public administration and provide insights that can inform decision-makers.
In the discussion experiments, we additionally analyzed two further RNN variants, the bidirectional RNN (BiRNN) [48] and the attention-based [49] sequential RNN (ASRNN). The GRU model was thus evaluated against ARIMA, ETS, SVR, XGBoost, CNN, LSTM, BiRNN, and ASRNN models; the LSTM, BiRNN, ASRNN, and GRU models were optimized using Adam [42]. ARIMA had much lower performance than the other models. To improve ARIMA performance, the lag order (p), moving average order (q), and differencing degree (d) can be configured manually rather than automatically [45], and other diagnostic measures can be applied [46,47]. The remaining models produced reasonably accurate predictions. For the BiRNN model, we performed dataset preprocessing and used a BiRNN architecture with a dropout layer to prevent overfitting. Similarly, for the ASRNN model, the dataset underwent the same preprocessing, and the architecture employed an attention mechanism to capture the relevant information from the sequential data.
Our GRU model performed the best when used to forecast government expenditure (Table 4). The average MAE, RMSE, and MAPE values for country-level predictions were 0.5284, 0.7469, and 2.774%, respectively. ETS, XGBoost, CNN, BiRNN, and ASRNN also produced satisfactory country-level predictions. The country-level predictions of the models and the actual values are plotted in Figure 5. Upon visual inspection, the actual and predicted trends for all models except ARIMA had a close fit. Accurate predictions for China were difficult to achieve, although the predictions of ETS, CNN, BiRNN, ASRNN, and GRU were reasonably close. In general, the GRU captured outliers and changes in trends well, unlike ARIMA. Some methods, such as LSTM and SVR, follow an average trend starting from a particular value. Data spanning a longer period are required to more fully analyze the capabilities of these methods, especially the GRU.
Because the GRU uses only two gates instead of three and no separate memory cell, it is considered an efficient variant of the LSTM; despite this simplicity, it can still achieve better results than the LSTM [43]. The LSTM was introduced to address the problems of vanishing and exploding gradients in RNNs, using memory cells and gates to facilitate the flow of information over time. The GRU was subsequently introduced as a more efficient form of the LSTM, which combines two LSTM gates, omits the memory cell, and exposes the full hidden content without any control [41]. Compared to the LSTM, the GRU therefore exhibits higher computational efficiency. To take advantage of input information from both directions, bidirectional RNNs were introduced; a BiRNN uses information from both directions to predict output sequences by cascading and connecting backward and forward hidden states to each output node. The GRU addresses the problems of gradient explosion and long-term dependency in RNNs and requires fewer training parameters than the LSTM, which is also a variant of the RNN. Both the GRU and the LSTM have advantages and disadvantages in practical applications [41]. Determining effective hyperparameters, such as the number of neuron units and the learning rate, remains a challenging problem, despite the GRU's good performance in time series prediction and its wide applicability. Optimizers are powerful tools for the optimization of hyperparameters; in this study, the hyperparameters of the GRU were optimized using Adam, RMSprop, and Adagrad. Additionally, a sensitivity analysis improved the robustness of our model and provided valuable insight into how changing the parameters affected the model. As a result, we were able to identify the optimal parameter settings for the prediction of government expenditure, the specific task at hand. We are confident in the accuracy and reliability of our model because of its consistent performance across different parameter configurations.
The choice of input variables is critical to the accuracy of expenditure forecasting. Traditionally, these variables are selected using a priori expert knowledge, trial and error, or linear cross-correlation analysis. However, expert knowledge is often difficult and time-consuming to acquire, and relying on it alone can introduce bias. Trial-and-error input selection can be computationally intensive, especially for data-driven models with many candidate inputs. In addition, the commonly used linear correlation coefficients assess only linear relationships and do not capture the nonlinear dynamics often present in the data. Artificial intelligence encompasses a range of techniques that mimic the behavior of living things to improve decision-making and mitigate potential risks to economic stability and growth. One such technique is the GRU, which automatically identifies meaningful patterns in data and has become a valuable tool for extracting information from large datasets [43]. GRU applications typically depart from traditional computing methods: the patterns involved are so complex that a human programmer cannot provide explicit, detailed instructions for the task. Moreover, GRU-based techniques can learn from a wide variety of data types and recognize patterns that a human may be unable to detect, enabling them to operate in resource-constrained environments [45]. As a result, the GRU could play a critical role in public administration and provide insights that inform decision-makers.
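The limitation of linear cross-correlation screening noted above can be illustrated in a few lines. This is a synthetic sketch, not the study's selection procedure: the variable names are hypothetical, and the target is constructed to depend linearly on one candidate and quadratically on the other.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
linear_input = rng.standard_normal(n)
nonlinear_input = rng.standard_normal(n)
# Target depends linearly on one candidate and quadratically on the other
target = 2.0 * linear_input + nonlinear_input ** 2 + 0.1 * rng.standard_normal(n)

for name, cand in [("linear_input", linear_input),
                   ("nonlinear_input", nonlinear_input)]:
    r = np.corrcoef(cand, target)[0, 1]  # Pearson correlation coefficient
    print(f"{name}: r = {r:.2f}")
# The quadratic dependence produces a near-zero Pearson r, so pure
# correlation-based screening would wrongly discard nonlinear_input,
# even though it is genuinely predictive of the target.
```

This is precisely the failure mode that motivates models such as the GRU, which can learn such nonlinear dependencies directly from the data.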
In general, the GRU outperformed ARIMA, SVR, ETS, XGBoost, CNN, LSTM, BiRNN, and ASRNN in predicting government expenditure on the basis of data on prior government spending, household consumption, investment, imports, and exports, which are correlated with GDP. Furthermore, unlike ARIMA, the GRU does not require refitting to produce accurate predictions. ETS, CNN, LSTM, BiRNN, and ASRNN also provided reasonably accurate predictions.
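Forecasting from prior values of a multivariate series, as described above, requires recasting the series as supervised (input window, next value) pairs; a minimal sketch, in which the four-year window length is an assumption for illustration and the array contents are random placeholders rather than World Bank data:

```python
import numpy as np

def make_windows(series, lookback):
    """Turn a (T, F) multivariate series into (X, y) pairs:
    each X sample holds `lookback` consecutive years of all features,
    and y is the following year's government expenditure (feature 0)."""
    X, y = [], []
    for t in range(len(series) - lookback):
        X.append(series[t:t + lookback])
        y.append(series[t + lookback, 0])  # column 0: government expenditure
    return np.array(X), np.array(y)

# 31 annual observations (1990-2020) of 5 features: spending, consumption,
# investment, imports, exports -- random placeholders here
rng = np.random.default_rng(2)
series = rng.standard_normal((31, 5))
X, y = make_windows(series, lookback=4)
print(X.shape, y.shape)  # (27, 4, 5) (27,)
```

Each resulting `X` sample has the (timesteps, features) shape that recurrent models such as the GRU and LSTM consume directly, which is one reason they adapt naturally to this forecasting task.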

Conclusions
Algorithms such as the GRU are becoming increasingly capable of economic prediction as machine learning grows more sophisticated and datasets become larger. In our study, the proposed GRU method outperformed ARIMA, ETS, SVR, XGBoost, CNN, and LSTM in terms of MAE, RMSE, and MAPE (average MAPE = 2.774%) when used to predict government spending on the basis of 1990-2020 data from 15 of the largest economies in the world, namely Australia, Brazil, Canada, China, France, Germany, India, Italy, Japan, South Korea, Mexico, Russia, Spain, the United Kingdom, and the United States. Further studies should incorporate predictors pertaining to political events, social changes, and the environment, and apply the resultant models to emerging markets. In general, accurate forecasts of economic trends are indispensable for ensuring sustainable economic growth.

Data Availability Statement:
Data were obtained from the World Bank (https://data.worldbank.org/).

Conflicts of Interest:
The authors declare no conflict of interest.