Article

Hybrid Supply Chain Model for Wheat Market

1 Central Economic Mathematical Institute, Russian Academy of Sciences, Moscow 117418, Russia
2 Federal Research Center “Computer Science and Control”, Russian Academy of Sciences, Moscow 119333, Russia
3 School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China
4 MOE Social Science Laboratory of Digital Economic Forecasts and Policy Simulation, University of Chinese Academy of Sciences, Beijing 100190, China
* Author to whom correspondence should be addressed.
Systems 2025, 13(11), 1026; https://doi.org/10.3390/systems13111026
Submission received: 11 September 2025 / Revised: 6 November 2025 / Accepted: 14 November 2025 / Published: 17 November 2025
(This article belongs to the Section Supply Chain Management)

Abstract

Accurate modeling of wheat supply chains is of great importance. Methods for forecasting these chains can be utilized as strategic planning tools to manage sustainable and balanced supply chains, ensuring a high level of food security, economic growth, and social development. In this paper, we focus on international wheat trade indicators, for which a regression model is a crucial component of the chain modeling. Trade indicators in the wheat market are inherently complex and exhibit significant stochasticity and non-stationarity due to the intricate interplay of various trade flows and factors, which poses challenges for accurate market forecasting. We propose a novel hybrid recurrent and graph-transformer-based model to tackle these challenges. We collected and combined data from international providers such as UN FAOSTAT and UN Comtrade for all the world’s wheat exporters. The experiments show that the proposed model can accurately predict wheat export levels. We have also analyzed how the proposed model can be utilized to predict exports under pre-defined trade limitations. In the future, the proposed model could be naturally extended to various derivative products of wheat, supporting real-world grain chain models. Our forecasting methods could be used to create an analytical tool to support strategic decision-making in cognitive situation centers, taking into account the national interests and priorities of actors in the international wheat market.

1. Introduction

The global food market is a complex network of interacting factors that shape the balance of supply and demand through the integration of global supply chains. Food trade on the world market performs not only a commercial purpose but also solves the most important task of ensuring global food security. Wheat occupies a special place among the goods on the world market; it is one of the most traded commodities. In recent years, there has been a surge in research on the global trade system as a network. In the case of the food market, the reliance of diets on a small number of crops threatens global food security if the production or trade of one or more of these crops is curtailed for any reason. Wheat, maize, and rice comprise approximately half of all human diets and represent 86% of all cereal exports [1]. Global production of these staple crops is highly concentrated in regions known as breadbaskets. At least 72% of the global output of each of the four staple crops occurs in five countries [2]. Given the oversized role breadbaskets play in food production, understanding their stability under anticipated warming is critical for global food security. Applied to the study of food systems, network analysis indicates that crop trade networks, the volume of trade, and the number of links between countries have increased during the last decades [3,4,5]. Interdependence between countries leads to a high degree of complexity in international trade [6]. It not only increases the possibility for countries to be influenced by the global market [7] but also promotes the diversification of global supply risks [8].
At the same time, food supply chains should be considered within the larger scope of food systems, which involve not only the different (possibly international) actors from farmer to consumer but also all the processes and infrastructure involved in ensuring a population is fed, such as providing inputs, growing, harvesting, processing, transporting, selling, consuming, and producing waste, within environmental, social, economic, and political contexts, and dependent on human labor. Moreover, the spatial boundaries of these systems are not always clear, and they are under stress from an increasing scarcity of inputs and available land, soil degradation, and climate change [9,10,11]. The primary results of previous research [12] point to a highly concentrated cereal network; the dependency on a small number of exporters and the low diversification of imports leave the global cereal system vulnerable to shocks, such as weather extremes, economic shocks, pandemics, and political uproar.
Wheat supply chains are large-scale complex adaptive systems containing multiple countries/regions, multinational corporations, investors, and intermediaries [13]. The methods used to model and forecast these chains can be used as strategic planning tools to manage sustainable and balanced agri-food supply chains to ensure a high level of food security, economic growth, and social development. To model and simulate such large chains, one requires a framework that integrates advanced multi-agent modeling tools and accurate trade-forecasting models.
Agri-food international trade indicators are inherently complex and exhibit significant stochasticity and non-stationarity due to the intricate interplay of various trade flows and factors, which poses challenges for accurate market forecasting. In addition, trade wars and sanctions increase the relevance of modeling trade under certain pre-defined limitations, similar to a multi-agent setting [14]. The sustainability of food systems requires the diversification of both production centers and trade routes, including the development of re-export hubs in Asia and Africa. However, modern challenges are exacerbating imbalances in global supply chains, which can lead to food shortages and other problems.
Considering these complexities, our primary focus is on developing a solid theoretical framework for building forecasting models. Namely, we investigated various autoregressive forecasting techniques on fixed and limited global trade data to develop a reliable wheat export forecasting model. Other factors, such as geopolitical risks, macroeconomic shocks, or trade policy interventions, were not specifically considered, as they were deemed outside the scope of the research; they can be tested in the future. The main contributions of our paper are as follows (experimental code and data are available at https://github.com/masterdoors/agro_graph_transformer (accessed on 5 November 2025)):
  • We propose a hybrid model that combines graph transformer and recurrent network architectures to tackle the problems above. The model utilizes the transformer to capture the interdependence between wheat export quantities in different countries. Like a recurrent network, the proposed model also generates hidden embeddings for each country and export direction and utilizes the embeddings from the previous step to forecast exports. This way, the model summarizes the trading history via the hidden embeddings and uses them to perform accurate export predictions.
  • We show how the proposed model can be applied to implement if–then scenarios in a multi-agent-like setting.
This paper is structured as follows: Section 2 discusses the current state of the art in international food trade forecasting. Section 3 describes the dataset we utilized to perform the experiments. Section 4 contains the proposed hybrid wheat export forecasting model. Section 5 provides the experimental results. Finally, Section 6 and Section 7 contain the discussion and conclusions.

2. Related Work

Much of the work on forecasting food production and trade flows uses linear regression and autoregressive models such as seasonal ARIMA, which employ a limited set of indicators, including basic climatic features (temperature, precipitation) and trade indicators. This approach was used, for example, in [15]. Often, in addition to climatic and trade indicators, Earth remote sensing data is also used, which helps forecast production accurately. For example, in previous research [16], data from the Sentinel-2 database [17] showed that remote sensing-based indicators can improve the accuracy of forecasts compared to a basic model that considers only the planting area. One of the most established methods for incorporating trade network topology into market forecasting is the VAR (Vector Autoregression) model or its variants [18]. In VAR models, each output variable depends on the other variables, which reflect the modeled trade indicators of market players; treating all the variables as interdependent is the primary limitation of the model. The original VAR model assumes a stationary time series, though there are some ad hoc modifications, mostly based on pre-defined linear or polynomial trend terms [19] or even recurrent neural networks (LSTM) [20]. Another option here is the VECM (vector error correction model) if the variables are cointegrated. For example, in a previous paper [21], such a model is used to forecast price transmission.
In our opinion, the primary drawback of VAR models is that they capture only linear dependencies between variables. Moreover, as the trade network and time lag grow (which is very relevant for worldwide trade modeling), the model becomes very large and tends to overfit. SC-VAR [22] tackles that problem by introducing sparsity to the model parameters, though the explicit introduction of sparsity constraints might decrease modeling performance. It is worth noting that our recurrent graph-transformer model tackles the problem out of the box with an attention mechanism, which naturally makes the dependencies between variables sparser.
PanelVAR and PanelVECM approaches introduce a cross-sectional dimension to the modeling. A panel VAR is similar to large-scale VARs, allowing for both dynamic and static interdependencies. For example, in a previous paper [23], countries were grouped into four stages, and a panel VAR model was applied to study the effects of market shocks to domestic product and fish production on fish consumption at the global and cluster levels.
In the case of medium-term and long-term food market forecasting, it is necessary to consider a large number of interrelated characteristics, some of which are nonlinearly related to the predicted value [24]. Therefore, nonlinear models are actively used. The most common way to create such models is to use machine learning and neural network models. For example, previous papers [25,26] presented several methods to forecast various crop productions, testing methods such as Ridge Regression, Gradient Boosting Regression, and the Long Short-Term Memory (LSTM) recurrent network. They noted the relationship between the amount of accumulated historical data on crop production in a region and the accuracy of forecasts. Recent advances in modeling complex non-linear dependencies are related to Kolmogorov–Arnold networks (KANs) [27] and their modifications for time-series analysis (TKANs) [28]. KANs use spline-parametrized univariate functions in place of traditional linear weights, enabling them to dynamically learn activation patterns and significantly enhancing interpretability. An interesting approach is also proposed in [29], which combines KANs with the Transformer architecture, helping to consider long-term dependencies between indicators. Although this approach seems strong in catching such dependencies, the model size grows with the time-series length, which is a major drawback in the case of limited training data. Thus, in this paper, we employ recurrent models to catch those dependencies.
In another paper [30], a TKAN (EleKAN) was applied to forecast the Australian New South Wales electricity market using various features, including the total demand, demand delay, power consumption, and temporal factors such as weekdays, holidays, and time intervals. The experimental results demonstrated that EleKAN significantly improves the precision and dependability of power demand estimates. Le et al. employed a TKAN [31] to forecast the world gold market. The results indicated that the proposed model consistently outperforms other models, achieving the lowest forecasting errors across various feature combinations.
Considering the complex non-linear dependencies between the different features that affect food supply chains, much research focuses on utilizing Transformer-based neural networks. These networks were initially designed to capture complex dependencies between text features. Recently, different approaches have been developed to make Transformers applicable to modeling continuous time series. For example, a previous paper [32] proposed the time-series transformer to forecast crop production. Time-series transformers closely follow the original transformer architecture adapted for time-series data. Since the range of the time-series values is continuous, the time-series transformer uses a swappable distribution head as its final layer. Namely, it models the parameters of a continuous distribution of the target values and is trained by minimizing the corresponding negative log-likelihood loss. Another paper [33] introduced a framework that integrates trade forecasting with time-series decomposition to split trade value sequences into seasonal, trend, and residual elements, identifying a potential economic cycle. The framework utilizes a Transformer-based model with a spatiotemporal encoder to process these components. The experiments showed that the proposed approach outperforms other models in terms of accuracy.
One novel framework for forecasting highly stochastic trade flows is to enable transformers to model the Fourier spectrum of the target values. One paper [34] proposed a Fourier neural operator, which enables efficient learning of mappings between infinite-dimensional spaces using input–output pairs. The motivation here stems from the image-processing domain [35]: the signal-to-spectrum conversion can be used to suppress redundant or noisy frequency components and retain only those features that are beneficial for the modeling [36]. Another paper [37] introduced a modification of the method from [34] called the transformer-based neural operator. This model comprises several multi-head attention mechanisms with a Fourier-decomposition-based neural spectral regressor module. The results showed that the proposed model achieves better accuracy than the original spectrum regression models.
Although the models above provide a helpful tool for dealing with interconnected trade flows, they still lack a mechanism for utilizing information from past steps. A previous paper [38] proposed a graph transformer architecture for temporal data. First, they learned the structure of the graph that represents the spatial dependency between the data. Then, they sparsified the standard transformer based on that structure. This approach significantly outperformed other models on a taxi ride-hailing demand task. However, it leads to very large models, which easily overfit on large graphs. Moreover, the model topology is data-dependent; therefore, it cannot be utilized to process data with a different dependency structure. In another paper [39], researchers combined graph convolution and recurrent networks, which made it possible to model time series with a rather compact model, though the recurrent network may suffer from gradient-vanishing effects on long time series. To deal with long-term dependencies, another paper [40] proposed combining the graph convolution network with the self-attention mechanism of transformers. They used a sliding window to catch temporal dependencies; therefore, extending the window for long time series leads to proportionate growth of the model size. Later studies [41,42] developed that approach with the STGformer architecture. STGformer combines the attention mechanism with graph-convolution networks to build spatial-temporal attention layers and consider both global and local data patterns. Unlike other architectures that require multiple attention layers, this attention layer captures high-order spatiotemporal interactions in a single layer, reducing computational cost. However, the graph convolutions may lose important information about the state of particular countries and trade flows, and the size of the attention weight matrix still depends on the length of the time series; therefore, the sequence length is limited. In this paper, we mitigate this problem with recurrent transformers, which combine a graph transformer for spatial dependencies with a recurrent layer for temporal ones. Another paper [43] utilized a similar general idea, though they used a pre-defined function to explicitly estimate edge weights based on the previous ones, so that the model can handle the dynamics of the graph’s topology. In our case, edges correspond to trade (export/import) between countries, which makes that model inapplicable in our setting. In contrast to that model, we encode hidden temporal states for the vertices and edges, which are then used to predict wheat production and export, respectively.
Another related direction of research is the use of Transformer decoder networks to model temporal data. A previous paper [44] suggested using the Transformer decoder as a conditional quantile estimator to predict the quantiles of prediction residuals, which are used to estimate the prediction interval. They showed that the decoder benefits the estimation of the prediction interval by learning temporal dependencies across past prediction residuals. In this paper, we utilize the Transformer’s encoder–decoder architecture to model conditional scenarios, where the export volume between some countries is limited to a pre-defined value.
A separate strand of research evaluates the impact of trade policies. One approach is the use of log-linear gravity regressions, where the test is performed by adding a dummy variable representing the presence or absence of a particular policy [45]. Another approach has been to use computable and applied general equilibrium models (CGE, AGE). These are large-scale models commonly used to evaluate changes in trade policy [46]. CGE modeling is typically used when the key purpose of the analysis is to understand the impacts of policies (for instance, to understand how policy changes affect the market [47]). The most significant works in this area include the multi-commodity, partial equilibrium SWOPSIM framework [48], the Global Trade Analysis Project (GTAP) [49], the International Institute for Applied Systems Analysis (IIASA) Model [50], the GTAP-AGR model [51], the Common Agricultural Policy Regionalised Impact modelling system (CAPRI) [52], and the GARCH-MIDAS model used to study the influence of geopolitical risk on commodity markets [53]. Although those models are considered precise, they require a great deal of manual labor and thorough analysis to run. There are also many indicators and scenarios that are hard to analyze with such a framework, including small-scale impacts, social welfare indicators, non-market factors, etc.
In contrast, agent-based modeling (ABM) is a quantitative framework that does not need to compute optima, which makes such models simpler to run and more tractable, allowing a higher level of realism. In this framework, agents (countries, etc.) make decisions using heuristics, myopic reasoning, and/or learning algorithms. Recent research has developed quantitative agent-based models that make time-series predictions, modeling a specific economy at a specific point in time. For example, a previous paper [54] focused on modeling trade wars, considering four countries and the rest of the world as key participants. In [55], it was shown that ABMs can compete with VAR and DSGE models in out-of-sample trade forecasting. Unfortunately, ABMs are highly demanding of hardware resources, especially in the case of world-scale trade modeling, which is why those models remain simulation tools by nature.
Autoregressive models can also be applied to counterfactual intervention studies. For example, in [56], the authors proposed a VAR model and formulated the inference of a causal model as a joint regression task, using both data with and without interventions for inference. In this paper, we stick to a similar approach, also inspired by masked language modeling frameworks in natural language processing, in which the model learns to restore masked tokens based on the existing ones.
In summary, this study addresses the limitations of existing wheat trade forecasting research, including the trade-offs between Fourier modules’ frequency and time resolutions, the neglect of critical external factors (e.g., policies), and insufficient validation. Our framework addresses the drawbacks of spectral features through adaptive graph attention, integrates key external variables to capture real-world market complexity, and enhances validation, thereby directly resolving the gaps in prior work.

3. Dataset

In our data collection, we integrated open information about wheat trade flows and production levels from the UN FAOSTAT and UN Comtrade databases [57,58] (Table 1). We obtained trading and production data for N countries and G products over a time period t; in this study, G = 1 (wheat) and t = 1993–2023. That period is used because earlier indicators are not interpretable due to the large changes in the political world map from 1989 to 1993. Thus, an initial dataset was compiled on wheat exports from 245 countries over a period of 30 years. This study focuses on how models can utilize spatial and temporal relationships to achieve better prediction accuracy; therefore, we deliberately neglected complex features such as climate or global economy indicators. We filtered the data to keep only significant wheat-exporting countries, i.e., those whose export volumes averaged at least 500,000 kg per year over the 1993–2023 period. After filtering, the number of countries remaining for analysis decreased to 211 (N = 211). Thus, the final dataset contains 6330 records.
We filtered all the unions and other non-country trade subjects from the dataset. The collected data has a small number of gaps, especially for the 2022–2023 period. We utilized several imputation techniques, including ARIMA, forward fill, linear interpolation, and model-based imputation, to fill these gaps. During the model-based imputation, we trained the same model that we tested (with the same hyperparameters) on data that had been preprocessed with forward fill and then used the trained model to fill the gaps. Next, we split the data into five-year-long non-overlapping ranges to obtain more time-series samples for training and validation. Finally, we normalized all features and targets to the interval [0, 1] to prevent overflows during model training and inference.
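For reproducibility, the sketch below illustrates this preprocessing pipeline (gap imputation, windowing, and normalization) on a toy stand-in for the FAOSTAT/Comtrade extract; the flow names and random data are placeholders, not the released preprocessing scripts.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the trade extract: one column per trade flow, one row per
# year (1993-2023); NaNs mark the gaps to be imputed.
rng = np.random.default_rng(0)
years = pd.Index(range(1993, 2024), name="year")
flows = pd.DataFrame(rng.gamma(2.0, 1e8, size=(len(years), 3)),
                     index=years, columns=["AUS->CHN", "AUS->IDN", "AUS->VNM"])
flows.iloc[-2:, 0] = np.nan                  # gaps concentrated in 2022-2023

# Two of the imputation strategies compared in Appendix A.
filled_ffill = flows.ffill()                 # forward fill
filled = flows.interpolate(method="linear")  # linear interpolation

# Non-overlapping five-year windows to multiply the number of samples
# (the trailing incomplete window is dropped in this sketch).
windows = [filled.iloc[i:i + 5] for i in range(0, len(filled) - 4, 5)]

# Min-max normalization of features and targets into [0, 1].
lo, hi = filled.min().min(), filled.max().max()
normalized = [(w - lo) / (hi - lo) for w in windows]
```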
As preliminary studies with the Kolmogorov–Smirnov test showed that the target export values (except some extreme values close to zero) can be assumed to be Beta-distributed, we tested several losses (Figure 1). First, we applied the mean squared error (MSE) of the model output logit (presuming the logit is normally distributed) and binary cross-entropy (BCE) as the losses to train all the models. To motivate the BCE loss, one may imagine a binary output Z with a Bernoulli distribution, where the Beta-distributed model output Y defines the probability for Z. Thus, we have a special case of the Beta-Binomial distribution, which collapses back to Bernoulli (see [59], page 17). We also tried to employ the beta-regression loss in a straightforward manner [60], but the performance was very low. We believe this is due to optimization issues, because in our case the Beta values and most of the outputs and targets remain close to 0.
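As a rough illustration of the two losses, the following sketch (assuming PyTorch; the exact loss wiring in the released code may differ) applies BCE directly to continuous targets in [0, 1], which is valid because BCE only requires targets in that interval, and MSE on the logit scale:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(8, requires_grad=True)  # raw (pre-sigmoid) model outputs
targets = torch.rand(8)                      # normalized exports in [0, 1]

# BCE with continuous ("soft") targets.
bce = F.binary_cross_entropy_with_logits(logits, targets)

# MSE of the model output logit: compare raw logits against target logits.
eps = 1e-6
target_logits = torch.logit(targets.clamp(eps, 1 - eps))
mse_logit = F.mse_loss(logits, target_logits)
```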

4. Forecast Models

4.1. Baseline Regression Models

In this study, we consider wheat export forecasting as a time-series prediction problem. In a food market, past-time indicators like the production level may have a great effect on future values. A common approach to solving this problem is to utilize a regression model whereby the current output depends on the indicator values from previous time steps. The essential choice in this case is between an autoregressive model, such as ARIMA, and a recurrent network. We utilized two well-known recurrent network architectures, namely, Long Short-Term Memory (LSTM) and the Gated Recurrent Unit (GRU), as the baselines [61,62]. These architectures apply gating mechanisms to solve the gradient-vanishing problem, which may help in catching long-term relationships between indicators and output values.
Besides LSTM and GRU, we utilized the Temporal Kolmogorov–Arnold Networks (TKANs) [28]. TKANs are recurrent extensions of the Kolmogorov–Arnold Networks, which have been empirically shown to be more accurate for many problems [27].
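A minimal one-step-ahead recurrent baseline in PyTorch might look as follows (a sketch under our reading of the setup; the hidden size and feature layout are placeholders, not the tuned values from the experiments):

```python
import torch
import torch.nn as nn

class RecurrentBaseline(nn.Module):
    """One-step-ahead export forecaster: a gated RNN over yearly features."""
    def __init__(self, n_features: int, hidden: int = 32):
        super().__init__()
        self.rnn = nn.GRU(n_features, hidden, batch_first=True)  # or nn.LSTM
        self.head = nn.Sequential(nn.Linear(hidden, 1), nn.Sigmoid())  # [0, 1]

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, n_features); predict the value after the window.
        out, _ = self.rnn(x)
        return self.head(out[:, -1])

model = RecurrentBaseline(n_features=4)
window = torch.rand(16, 5, 4)   # 16 samples, five-year windows, 4 features
next_export = model(window)     # (16, 1) normalized export forecasts
```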

4.2. Recurrent Graph Transformer Model

The proposed hybrid regression model is based on a neural network with the Transformer architecture [63]. The transformer network allows one to consider the complex inter-relationships between the trade and production indicators and the predicted values for different countries. In this study, we extend the standard Graph Transformer architecture proposed in a previous paper [64]. The general idea of the Graph Transformer is to sequentially generate vector representations (embeddings) of graph vertices and edges, considering the embeddings of their neighbors in the graph. As in the previous paper [43], we use temporal states for edges to create a recurrent model, i.e., the current vector representations do not take the previous time steps as direct inputs; instead, we encode their history in a hidden trainable embedding via a standard recurrent layer. In addition to the edge embeddings, we also equip the vertex embeddings with such hidden temporal states, using recurrent blocks (LSTM or GRU) to do so. In Figure 2b, $h_i^t$ denotes the embedding of vertex i at step t, and $e_{ij}^t$ denotes the embedding of edge ij at step t. In our case, the graph transformer operates not on the observable trade indicators or output values but rather on the embeddings, which may be arbitrarily large to achieve the required model complexity. Another difference is that we use Fast Fourier Transformation (FFT) blocks to work with discrete spectral data instead of the original values, which is preferable for Transformers. We perform the FFT on time series with a length of 5 and a step value of 1; therefore, the frequency resolution is 1/5.
Figure 2a shows the architecture of the proposed model with N graph transformer blocks. The model takes the features of a pair of graph vertices i and j (production levels $Prod_i$ and $Prod_j$ in countries i and j) and the features of the edge between these vertices (export value $Export_{ij}$). First, we (optionally) apply the FFT blocks and add positional embeddings $\lambda_{1,2}$ to the obtained representations (Expressions (1) and (2)). As in the standard Graph Transformer, we use Laplacian eigenvectors to form the positional embeddings [38]. Then, we use recurrent blocks (RNN) to create embeddings of the vertex features $h_{i,t}^0, h_{j,t}^0$ of size h and of the edge features $e_{ij,t}^0$ of size e at each time step t. We use the output vertex and edge embeddings from the previous time step, $h_{i,t-1}^N, e_{ij,t-1}^N$, as the hidden state embeddings in the RNN (Expressions (3) and (4)):
$Export_{ij}^{fft} = \mathrm{FFT}(Export_{ij}) + \lambda_1$. (1)
$Prod_i^{fft} = \mathrm{FFT}(Prod_i) + \lambda_2$. (2)
$h_{i,t}^0 = \mathrm{RNN}(Prod_{i,t}^{fft}, h_{i,t-1}^N)$. (3)
$e_{ij,t}^0 = \mathrm{RNN}(Export_{ij,t}^{fft} + \lambda, e_{ij,t-1}^N)$. (4)
Next, we iteratively apply graph transformer blocks $\mathrm{GT}$ (Expression (5)) to form the output vertex and edge feature embeddings $h_{i,t}^N, h_{j,t}^N, e_{ij,t}^N$ [38]:
$h_{i,t}^l, h_{j,t}^l, e_{ij,t}^l = \mathrm{GT}(h_{i,t}^{l-1}, h_{j,t}^{l-1}, e_{ij,t}^{l-1})$. (5)
The model has N graph transformer blocks, each of which contains H heads to create linear projections of the inputs. Each transformer block works as follows: first, it applies a standard multi-head attention mechanism [63,64] to the edge embeddings; next, it uses the obtained edge attention weights to adjust the vertex embeddings; then, it processes the embeddings with a residual network. In the next step t + 1, the embeddings $h_{i,t}^N, h_{j,t}^N, e_{ij,t}^N$ are used to obtain $h_{i,t+1}^0, h_{j,t+1}^0, e_{ij,t+1}^0$ with the RNN blocks. This way, the graph transformer helps catch spatial dependencies, while the RNN learns temporal dependencies (Figure 2b).
Finally, we utilize a multilayer feed-forward network $FFN_{out}$ with hidden ReLU activations and a sigmoid output activation to predict the next export values $Export_{ij,t+1}$ (Expression (6)) based on the edge embeddings $e_{ij,t}^N$:
$Export_{ij,t+1} = FFN_{out}(e_{ij,t}^N)$. (6)
We use a dropout mechanism to regularize the networks. In the next sections, we will refer to this model as “encoder”.
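To make the data flow of Expressions (3)–(6) concrete, the sketch below traces one time step for a single edge (i, j). It is a deliberately simplified, single-head rendering of the described encoder (the FFT blocks, Laplacian positional embeddings, and multi-head graph attention are reduced or omitted), not the released implementation:

```python
import torch
import torch.nn as nn

class RecurrentGraphTransformerStep(nn.Module):
    """One time step: GRU cells build vertex/edge embeddings from the current
    features and the previous step's output embeddings (Expressions (3)-(4));
    a reduced transformer block then mixes them (Expression (5))."""
    def __init__(self, d: int = 32):
        super().__init__()
        self.vertex_rnn = nn.GRUCell(1, d)   # input: production level
        self.edge_rnn = nn.GRUCell(1, d)     # input: export value
        self.attn = nn.MultiheadAttention(d, num_heads=1, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, d))
        self.head = nn.Sequential(nn.Linear(d, 1), nn.Sigmoid())  # Expr. (6)

    def forward(self, prod_i, prod_j, export_ij, h_i_prev, h_j_prev, e_ij_prev):
        # Recurrent embeddings seeded with the previous step's outputs.
        h_i = self.vertex_rnn(prod_i, h_i_prev)
        h_j = self.vertex_rnn(prod_j, h_j_prev)
        e_ij = self.edge_rnn(export_ij, e_ij_prev)
        # The edge embedding attends over its two incident vertex embeddings,
        # followed by a residual feed-forward refinement.
        ctx = torch.stack([h_i, h_j], dim=1)             # (batch, 2, d)
        mixed, _ = self.attn(e_ij.unsqueeze(1), ctx, ctx)
        e_out = e_ij + self.ffn(mixed.squeeze(1))
        return h_i, h_j, e_out, self.head(e_out)         # next-year export

step = RecurrentGraphTransformerStep()
batch, d = 4, 32
h_i = h_j = e_ij = torch.zeros(batch, d)
for t in range(5):  # unroll over a five-year window
    prod_i, prod_j, exp_ij = (torch.rand(batch, 1) for _ in range(3))
    h_i, h_j, e_ij, export_next = step(prod_i, prod_j, exp_ij, h_i, h_j, e_ij)
```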

4.3. Recurrent Graph Transformer Encoder–Decoder Model

We enhance the proposed export forecast model with a decoder to provide conditional forecasting. It might be helpful when, for example, one needs to predict the next export values in a cluster of countries if the next year’s export between some of them is limited to 0. During training, the decoder takes the current production and future export values $Prod_{i,t}, Prod_{j,t}, Export_{ij,t+1}$, together with the current edge and vertex embeddings from the encoder, $e_{ij,t}^N, h_{i,t}^N, h_{j,t}^N$, as inputs (Figure 3). In contrast to the encoder, we use a simple linear feed-forward layer (Expressions (7) and (8)) instead of the RNN block to create the edge and vertex embeddings $h_{i,t}^d, h_{j,t}^d$, and $e_{ij,t}^d$ (in this description, we deliberately omit the FFT/inverse FFT blocks for simplicity):
$h_{i,t}^d = FFN_{in}(Prod_{i,t} + \lambda)$, (7)
$e_{ij,t}^d = FFN_{in}(Export_{ij,t+1} + \lambda)$, (8)
where d > N.
Then, we apply M blocks with the following structure: the block utilizes a standard multi-head attention $\mathrm{Attention}(Q, K, V)$ between the decoder vertex and edge embeddings $h_{i,t}^d, h_{j,t}^d$, and $e_{ij,t}^d$ (as the query Q) and the encoder embeddings $e_{ij,t}^N, h_{i,t}^N, h_{j,t}^N$ (as the keys K and values V) (Expressions (9) and (10)):
$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V$. (9)
$H_t^{d+1} = \mathrm{Attention}(H_t^d, H_t^N) + H_t^d, \quad E_t^{d+1} = \mathrm{Attention}(E_t^d, E_t^N) + E_t^d$, (10)
where $H_t^d$ denotes the decoder vertex embeddings of a country cluster, $H_t^N$ the encoder vertex embeddings of the cluster, $E_t^d$ the decoder edge embeddings of the cluster, and $E_t^N$ the encoder edge embeddings of the cluster. After that, the block applies graph attention, as in the encoder.
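Expression (10) is a familiar residual cross-attention update; a minimal single-head sketch (shapes are placeholders) is:

```python
import torch
import torch.nn as nn

d = 32
cross_attn = nn.MultiheadAttention(d, num_heads=1, batch_first=True)

H_dec = torch.rand(1, 6, d)  # decoder vertex embeddings of a 6-country cluster
H_enc = torch.rand(1, 6, d)  # encoder vertex embeddings of the same cluster

# H_t^{d+1} = Attention(H_t^d, H_t^N) + H_t^d: queries come from the decoder,
# keys and values from the encoder, with a residual connection; the same
# update is applied to the edge embeddings E.
attended, _ = cross_attn(H_dec, H_enc, H_enc)
H_next = attended + H_dec
```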
Finally, as in the encoder, we utilize a multilayer feed-forward network FFN with hidden ReLU activations and a sigmoid output activation to predict the next export values.
During training, we set some randomly picked decoder export inputs for countries k and l to $Export_{kl,t}$ (i.e., to the value of the export from k to l for the current period t, not for the future period t + 1). As in BERT [65], a mask is used to hide all the decoder outputs related to the future export inputs $Export_{ij,t+1}$ during loss estimation. The best results in our tests were obtained when we masked half (50%) of the decoder export inputs.
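A sketch of this BERT-style masking (tensor layouts are of our own choosing; the released code may organize this differently): half of the decoder’s future-export inputs are replaced with current-period values, and only those masked positions contribute to the loss.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n_edges = 10
export_t = torch.rand(n_edges)    # current-period exports (known)
export_t1 = torch.rand(n_edges)   # next-period exports (targets)

# Randomly mask half of the decoder's export inputs: masked positions receive
# the current-period value instead of the future one.
mask = torch.rand(n_edges) < 0.5
decoder_in = torch.where(mask, export_t, export_t1)

pred = torch.rand(n_edges)        # stand-in for the decoder's sigmoid output

# Only the masked (hidden) positions contribute to the loss, as in BERT.
loss = F.binary_cross_entropy(pred[mask], export_t1[mask])
```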

5. Experiment Results

First, we separate the dataset into training and test subsets. We use all the data from 1993 to 2017 as the training dataset and the data from 2018 to 2022 as the test dataset. It is worth noting that the test dataset does not contain any imputed target values. We use cross-validation on the training subset to find the hyperparameters (dropout level, number of layers, size of the hidden layers, and learning rate) using the Optuna library. All the networks were trained for a relatively large number of epochs (1000). For the ARIMA model (both for training and for data imputation), we used the AutoARIMA algorithm, which conducts the Kwiatkowski–Phillips–Schmidt–Shin (KPSS) test to define the order of first-differencing and then finds the model that optimizes the Akaike Information Criterion. We applied the mean squared error (MSE), mean absolute error (MAE), MAPE, and the coefficient of determination R² as the validation scores (Expressions (11)–(14)). We used two MAPE scores: MAPEall is evaluated on the whole test dataset, while MAPElarge covers only exports greater than 5% of the maximum wheat export between any two countries in 1993–2023. We repeated all the tests on random subsets of the training and testing datasets to obtain standard deviations of the scores.
$MSE = \frac{1}{N}\sum_{i=1}^{N}(Y_i - Y_i^*)^2$ (11)
$MAE = \frac{1}{N}\sum_{i=1}^{N}\left|Y_i - Y_i^*\right|$ (12)
$MAPE = \frac{1}{N}\sum_{i=1}^{N}\left|\frac{Y_i - Y_i^*}{Y_i}\right|$ (13)
$R^2 = 1 - \frac{\sum_{i=1}^{N}(Y_i - Y_i^*)^2}{\sum_{i=1}^{N}(Y_i - \bar{Y})^2}$ (14)
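Computed directly, with `y` as the observed exports and `y_hat` as the forecasts, the scores in Expressions (11)–(14) are:

```python
import numpy as np

def scores(y: np.ndarray, y_hat: np.ndarray) -> dict:
    """MSE, MAE, MAPE, and R^2 as defined in Expressions (11)-(14)."""
    err = y - y_hat
    return {
        "MSE": np.mean(err ** 2),
        "MAE": np.mean(np.abs(err)),
        "MAPE": np.mean(np.abs(err / y)),
        "R2": 1 - np.sum(err ** 2) / np.sum((y - y.mean()) ** 2),
    }
```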
Appendix A contains all the detailed scores and additional comments for different models and imputation parameters (ablation study). Table 2 shows the results for one-year-ahead export predictions with the baseline ARIMA model and recurrent networks on the test subset. The results confirm that recurrent neural network models significantly outperform ARIMA on that task. Namely, LSTM achieves the best R2 score of 0.80. We believe this is because LSTM has the most complex gating mechanism to deal with gradient vanishing. The novel TKAN model also shows a competitive R2 score of 0.79.
Representing all the countries in the dataset as a single graph would make the attention weight matrices too large, and the models would tend to overfit. Thus, we split all the data into clusters and associate each cluster with one or several large wheat exporters and a set of importing countries. Since the initial data for the clustering is a relatively small, weighted graph, we applied the Affinity Propagation algorithm, which can work with such data out of the box and provides stable results [66]. We set the 10 top wheat exporters as the preference parameter (“exemplars”, the representatives of clusters) and, as a result, obtained 10 clusters whose sizes vary from 4 to 15 countries. Figure 4 shows some examples of such clusters, which are related to the world’s top wheat exporters.
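A sketch of this clustering step with scikit-learn (a random symmetric matrix stands in for the trade-weighted similarity graph; index choices are illustrative): raising the `preference` of the top exporters nudges Affinity Propagation toward picking them as exemplars.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

rng = np.random.default_rng(0)
n = 30                                   # countries
sim = rng.random((n, n))
sim = (sim + sim.T) / 2                  # symmetric trade-based similarity

# Raise the preference of the 10 largest exporters (here: the first 10
# indices) so they are favored as cluster exemplars.
pref = np.full(n, np.median(sim))
pref[:10] = sim.max()

labels = AffinityPropagation(affinity="precomputed", preference=pref,
                             random_state=0).fit_predict(sim)
```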
Each cluster forms a graph to be processed with the Transformer-based models. Table 3 shows the scores of the proposed models on the test dataset. The recurrent graph transformer (in the single encoder configuration) achieved the best results in terms of the R2 score, significantly outperforming standard recurrent network-based models. The encoder–decoder model shows similar results. Introducing spectral features (with the FFT block) led to a very slight improvement in the modeling scores. We believe this might be related to the small block length; therefore, in the future, we will try FFT before splitting the dataset into time windows.
Then, we analyzed the ability of the encoder–decoder model to deal with an assumed limitation on the sale of grain to one of the importers. Trade flows between Australia and China serve as a special case to illustrate the features of the model and demonstrate the experimental results. We selected Australia, one of the largest players in the global wheat grain market, to perform the calculations with the encoder–decoder model. According to the Comtrade database, Australia’s share of global wheat trade in 2022 was 13%, and it has the potential to expand exports at a level of +4.8%, since for the 2017–2022 period it was classified as a fast-growing market segment. In many ways, Australia’s export potential was facilitated by the deregulation of the grain market in 2008 and the subsequent reform of the grain trading market. The key was the mandatory accreditation of exporters in accordance with the Wheat Export Marketing Act 2008, which establishes strict quality requirements, including compliance with international phytosanitary norms and product safety standards. Australia has intensified the modernization of logistics infrastructure and the implementation of innovations in logistics to optimize the operation of port terminals, introducing the Wheat Port Code of Conduct to standardize it. This has increased the transparency of capacity allocation and reduced downtime, which is critical to meeting delivery times to key Asian markets (Indonesia, Vietnam, etc.). Australia is focused on reforming its grain export supply chains and reviewing its approaches to selecting exporters to remain competitive in the global grain market.
We performed a retrospective forecast on the 2018–2022 interval to check two scenarios with the encoder–decoder model (a sketch of how such a constraint can be fed to the decoder follows the list):
  • Forecast of wheat grain exports without any limitations.
  • Forecast of wheat grain exports, providing the limitation on the sale of grain to one of the importers.
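The restricted scenario can be encoded by pinning the corresponding decoder export input and masking the rest, mirroring the training-time scheme. The sketch below uses a stub in place of the trained decoder; the edge index and the `decode_stub` function are hypothetical.

```python
import torch

def decode_stub(exports_next: torch.Tensor, known: torch.Tensor) -> torch.Tensor:
    """Stand-in for the trained decoder: known (imposed) values pass through
    unchanged; a dummy forecast fills the masked positions, where the real
    model would predict the redistributed flows."""
    return torch.where(known, exports_next, torch.rand_like(exports_next))

# A cluster with 5 outgoing Australian flows; flow 0 (say, Australia -> China)
# is pinned to zero to encode the trade restriction.
exports_next = torch.zeros(5)
known = torch.zeros(5, dtype=torch.bool)
known[0] = True            # only the embargoed flow is fixed...
exports_next[0] = 0.0      # ...at zero export volume

forecast = decode_stub(exports_next, known)   # restricted-scenario forecast
```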
We selected a wheat-importing country for the forecast calculations based on a combination of two parameters: (1) in the structure of Australia’s exports, the volume of wheat grain trade should be at least 5% of the total export volume (to exclude importers with insignificant volumes); (2) the country should have a growing demand for wheat grain. The conditions above are met by China’s imports of wheat grain. In 2018, China’s share of Australian wheat exports was 6.1%. At the same time, for the 2017–2022 period, according to Comtrade, China was classified as a fast-growing market segment with the potential to expand exports at a level of +16%. Figure 5 shows the modeling results for both scenarios.
The left column of the figure shows the forecast of exports without restrictions, and the right column shows the forecast of exports subject to restrictions on trade flows between Australia and China. For clarity of the trade flow assessment in the diagrams, the maximum volume of trade flows is colored red (more than 20%), medium yellow (15–20%), low green (5–15%), and very low blue (less than 5%).
As part of the study, we considered a hypothetical example: the Australia–China zero-export experiment to assess possible risks and forecast export destinations when trade restrictions (sanctions) are imposed. Given that this situation does not exist in the real world, it is impossible to validate scenario forecasting using real historical trade data. However, due to the volatility of international trade, any scenario cannot be completely ruled out.

6. Discussion

The structural and dynamic analysis shows the resilience of the Australian market to the introduction of sanctions, including the restoration of the market concentration volume over the forecast period. The ratio of the export volume from the model without limitations to the export volume from the model with the trade restrictions shows the possibilities for restoring export flows. We propose a coefficient of the export recovery potential, RecoveryExport (RE):
$RE = \frac{V_{export}^{TM}}{V_{export}^{TMs}}$,
where $V_{export}^{TM}$ is the export volume from the model without any limitations and $V_{export}^{TMs}$ is the export volume from the model that introduces the limitations.
The modeling shows that by the end of the forecast period (2022), the RE coefficient was 0.74, which allows us to discuss the restoration of Australian exports after the introduction of restrictions on trade with China. In addition, the model provides feasible results, redistributing flows among importers whose import volumes are significant and ignoring those that cannot meet the growing demand for wheat (island states, etc.). When adjusting trade flows according to the model with restrictions on trade with China, Australia’s export destinations could be redirected, with an increase in the volume of deliveries; the bulk of deliveries could go to Indonesia (28.1%), Thailand (14.8%), Vietnam (20.3%), and Yemen (11.15%).
Given the roughness of the tested models, this example should, of course, be considered a demonstration of the model’s potential. In the future, we plan to find appropriate retrospective trade shock scenarios to perform a valid counterfactual study.

7. Conclusions

This study tackles the critical challenges of modern agrifood supply chains, characterized by globally interconnected networks marked by escalating complexity and interdependence. The main actors of the agrifood market have to place greater emphasis on modern methods of forecasting supplier networks to streamline supply chain operations and enhance overall performance. The high market concentration of key commodities such as wheat amplifies vulnerabilities in trade-restricted environments, where supplier selection directly impacts global food security.
This paper introduces a novel recurrent transformer architecture, extending graph transformer frameworks to temporal data analysis in supply networks. Unlike existing models, the model includes hidden temporal states at the vertex level, explicitly encoding the dynamic production capacities of national actors of the wheat grain market. This approach enables key advancements: (1) capturing time-dependent patterns, such as seasonal production fluctuations and long-term yield trends, allows the model to enhance its predictive accuracy in volatile markets; and (2) considering interconnections and dependencies between agrifood importers and exporters via the graph attention mechanism.
Another feature is that the model follows an encoder–decoder architecture, which enables, for instance, the simulation of alternative supplier configurations under trade restrictions. This provides insights for policymakers and agribusinesses, particularly in mitigating cascading risks from supplier concentration in grain markets. In contrast to equilibrium or agent-based approaches, the proposed model is simple to build (as it infers the required relationships from data) and to run, because it does not need hardware-demanding numerical simulations.
In the future, we will test the model in multi-agent environments integrated with behavioral economic factors (e.g., supplier risk preferences, policy response biases) to refine its predictive power under market shocks and perform counterfactual studies on retrospective data. However, critical questions remain to guide further exploration: How might quantifying dynamic trust and collaboration patterns among suppliers alter the model’s ability to forecast the redistribution of export flows after trade restrictions? Can the framework be adapted to disentangle the overlapping impacts of climate extremes and geopolitical tensions, two major drivers of wheat supply chain volatility rarely modeled together? How might incorporating sub-national production data (e.g., regional yield variations within export countries) enhance the granularity of resilience assessments? Beyond these questions, the results of this study confirm the proposed model’s accuracy in forecasting resilient wheat supply chains amid global uncertainties. Its novel hybrid architecture also contributes to the discourse on sustainable agrifood systems by aligning advanced forecasting methods with supply chain governance needs.

Author Contributions

Conceptualization and methodology, Y.O. and H.Z.; investigation and validation, D.D.; writing—review and editing, D.D., H.Z., and Y.O. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by a grant from the Russian Science Foundation, project No. 25-28-00810. The research was also supported by the State assignment of the Ministry of Science and Higher Education of the Russian Federation, project “Software and analytical complexes for increasing the efficiency of management decisions in the Russian Federation” No. FMGF-2025-0001, No. 125031003392-3.

Data Availability Statement

The dataset is available at https://github.com/masterdoors/agro_graph_transformer (accessed on 5 November 2025).

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A

Table A1 provides the performance scores for all the baseline models. All of the runs for all the scenarios below were repeated five times on random subsets (70% of the original sets, without repeats) of the training and testing data; standard deviations are provided. There is no significant difference between the imputation strategies, though forward fill is slightly behind the others. We provide two MAPE scores: MAPEall is evaluated on the whole test dataset, while MAPElarge covers only exports greater than 5% of the maximum wheat export between any two countries in 1993–2023 (i.e., it ignores all of the blue lines in Figure 5). One can note that the MAPEall scores are poor for all the considered combinations (the best result is 2.28), while the MAPElarge scores are significantly better (the minimum error score is 0.31). This means the models cannot accurately predict small exports between countries but perform better for large importers/exporters. It is worth noting that the use of the BCE loss tends to improve the R² and MAPElarge scores, but MAPEall becomes arbitrarily large.
Table A1. Performance scores for baseline forecasting models with various hyperparameters and imputation.
| Model | Imputation | Loss | MSE, ×10¹⁷ kg² | MAE, ×10⁸ kg | MAPEall | MAPElarge | R² |
|---|---|---|---|---|---|---|---|
| TKAN | Forward fill | BCE | 1.38 ± 0.40 | 1.73 ± 0.58 | 41.80 ± 71.80 | 0.35 ± 0.09 | 0.76 ± 0.11 |
| TKAN | Forward fill | MSE(logit) | 3.41 ± 3.02 | 2.14 ± 1.10 | 6.47 ± 9.36 | 0.43 ± 0.23 | 0.58 ± 0.36 |
| LSTM | Forward fill | BCE | 1.76 ± 0.18 | 1.57 ± 0.06 | 10.28 ± 3.54 | 0.36 ± 0.02 | 0.73 ± 0.07 |
| LSTM | Forward fill | MSE(logit) | 1.97 ± 0.24 | 1.76 ± 0.17 | 2.56 ± 0.14 | 0.40 ± 0.03 | 0.74 ± 0.03 |
| GRU | Forward fill | BCE | 1.57 ± 0.57 | 1.53 ± 0.18 | 13.88 ± 7.52 | 0.33 ± 0.02 | 0.77 ± 0.04 |
| GRU | Forward fill | MSE(logit) | 2.33 ± 0.71 | 1.85 ± 0.24 | 2.64 ± 0.72 | 0.46 ± 0.02 | 0.65 ± 0.04 |
| TKAN | Interpolation | BCE | 1.61 ± 0.47 | 1.60 ± 0.16 | 7.06 ± 2.53 | 0.33 ± 0.02 | 0.78 ± 0.05 |
| TKAN | Interpolation | MSE(logit) | 1.85 ± 0.53 | 1.53 ± 0.19 | 2.28 ± 0.66 | 0.31 ± 0.02 | 0.74 ± 0.05 |
| LSTM | Interpolation | BCE | 1.37 ± 0.34 | 1.52 ± 0.15 | 5.97 ± 3.47 | 0.32 ± 0.02 | 0.80 ± 0.03 |
| LSTM | Interpolation | MSE(logit) | 2.31 ± 0.55 | 1.87 ± 0.19 | 3.02 ± 1.31 | 0.43 ± 0.06 | 0.67 ± 0.09 |
| GRU | Interpolation | BCE | 1.53 ± 0.45 | 1.53 ± 0.15 | 11.34 ± 5.31 | 0.33 ± 0.03 | 0.78 ± 0.05 |
| GRU | Interpolation | MSE(logit) | 2.54 ± 1.03 | 1.93 ± 0.32 | 3.19 ± 1.71 | 0.48 ± 0.08 | 0.65 ± 0.11 |
| TKAN | Model-based | BCE | 1.71 ± 0.29 | 1.60 ± 0.13 | 7.91 ± 2.32 | 0.33 ± 0.01 | 0.77 ± 0.04 |
| TKAN | Model-based | MSE(logit) | 3.23 ± 2.09 | 2.12 ± 0.93 | 10.59 ± 16.22 | 0.44 ± 0.21 | 0.52 ± 0.33 |
| LSTM | Model-based | BCE | 1.23 ± 0.39 | 1.54 ± 0.20 | 13.67 ± 6.05 | 0.34 ± 0.03 | 0.80 ± 0.03 |
| LSTM | Model-based | MSE(logit) | 2.48 ± 0.42 | 2.01 ± 0.16 | 2.29 ± 0.92 | 0.51 ± 0.03 | 0.64 ± 0.04 |
| GRU | Model-based | BCE | 1.94 ± 0.22 | 1.79 ± 0.12 | 13.39 ± 3.22 | 0.35 ± 0.02 | 0.76 ± 0.02 |
| GRU | Model-based | MSE(logit) | 2.14 ± 0.48 | 1.80 ± 0.21 | 3.27 ± 0.93 | 0.46 ± 0.10 | 0.70 ± 0.06 |
| TKAN | ARIMA | BCE | 1.47 ± 0.35 | 1.52 ± 0.13 | 5.55 ± 2.71 | 0.32 ± 0.02 | 0.79 ± 0.03 |
| TKAN | ARIMA | MSE(logit) | 2.32 ± 1.88 | 1.70 ± 0.76 | 3.58 ± 5.64 | 0.35 ± 0.12 | 0.68 ± 0.21 |
| LSTM | ARIMA | BCE | 1.39 ± 0.38 | 1.49 ± 0.13 | 7.61 ± 4.22 | 0.33 ± 0.03 | 0.79 ± 0.06 |
| LSTM | ARIMA | MSE(logit) | 2.67 ± 0.64 | 1.99 ± 0.24 | 2.66 ± 1.08 | 0.47 ± 0.05 | 0.63 ± 0.06 |
| GRU | ARIMA | BCE | 1.53 ± 0.41 | 1.58 ± 0.18 | 10.17 ± 3.56 | 0.34 ± 0.03 | 0.77 ± 0.05 |
| GRU | ARIMA | MSE(logit) | 2.50 ± 0.57 | 1.89 ± 0.20 | 2.63 ± 0.98 | 0.46 ± 0.05 | 0.65 ± 0.06 |
Table A2 shows the ablation study results for the proposed hybrid model. The primary conclusions regarding imputation strategies and losses are the same as for baselines. The best MAPEall score is 0.87, which is achieved for MSE loss; the best MAPElarge is 0.25 (BCE loss). The best R2 is 0.88 (BCE loss). Models with a large number of attention heads (2, 4) and a small number of layers (1, 2) tend to show better results. Replacing the recurrent network with a linear layer (i.e., ignoring hidden states) leads to total degradation of the model performance. Enabling the FFT block to encode inputs leads to a hardly noticeable performance improvement.
Table A2. Performance scores for the proposed recurrent graph transformer forecasting model with various hyperparameters and imputation.
| Imputation | Loss | Layers | Attn. Heads | FFT | Recurrent Embeddings | MSE, ×10¹⁷ kg² | MAE, ×10⁸ kg | MAPEall | MAPElarge | R² |
|---|---|---|---|---|---|---|---|---|---|---|
| Forward fill | MSE(logit) | 1 | 1 | + | + | 1.60 ± 0.37 | 1.48 ± 0.18 | 1.03 ± 0.13 | 0.34 ± 0.08 | 0.74 ± 0.08 |
| Forward fill | BCE | 1 | 1 | + | + | 1.03 ± 0.46 | 1.28 ± 0.22 | 12.94 ± 5.73 | 0.30 ± 0.05 | 0.84 ± 0.06 |
| Forward fill | MSE(logit) | 1 | 1 | + | - | 6.68 ± 1.46 | 3.61 ± 0.40 | 3.40 ± 1.22 | 0.69 ± 0.08 | 0.15 ± 0.13 |
| Forward fill | BCE | 1 | 1 | + | - | 6.01 ± 2.07 | 4.41 ± 0.42 | 93.55 ± 97.00 | 0.59 ± 0.07 | 0.20 ± 0.15 |
| Forward fill | MSE(logit) | 1 | 2 | + | + | 2.18 ± 0.62 | 1.73 ± 0.24 | 1.04 ± 0.18 | 0.38 ± 0.05 | 0.69 ± 0.08 |
| Forward fill | BCE | 1 | 2 | + | + | 1.01 ± 0.40 | 1.32 ± 0.15 | 13.37 ± 12.45 | 0.28 ± 0.02 | 0.86 ± 0.04 |
| Forward fill | MSE(logit) | 1 | 2 | + | - | 6.22 ± 1.83 | 3.50 ± 0.55 | 3.92 ± 1.41 | 0.70 ± 0.07 | 0.25 ± 0.13 |
| Forward fill | BCE | 1 | 2 | + | - | 7.66 ± 1.28 | 5.22 ± 0.13 | 158.10 ± 65.71 | 0.52 ± 0.05 | −0.02 ± 0.02 |
| Forward fill | MSE(logit) | 1 | 4 | + | + | 2.63 ± 1.00 | 1.70 ± 0.50 | 1.21 ± 0.26 | 0.44 ± 0.05 | 0.57 ± 0.10 |
| Forward fill | BCE | 1 | 4 | + | + | 0.79 ± 0.39 | 1.13 ± 0.26 | 6.32 ± 3.95 | 0.25 ± 0.05 | 0.88 ± 0.05 |
| Forward fill | MSE(logit) | 1 | 4 | + | - | 5.97 ± 1.61 | 3.31 ± 0.49 | 5.69 ± 2.05 | 0.71 ± 0.09 | 0.15 ± 0.16 |
| Forward fill | BCE | 1 | 4 | + | - | 5.33 ± 1.59 | 4.18 ± 0.48 | 82.81 ± 75.30 | 0.57 ± 0.07 | 0.26 ± 0.14 |
| Forward fill | MSE(logit) | 2 | 1 | + | + | 2.71 ± 1.11 | 1.69 ± 0.41 | 1.11 ± 0.18 | 0.37 ± 0.03 | 0.59 ± 0.12 |
| Forward fill | BCE | 2 | 1 | + | + | 1.25 ± 0.45 | 1.33 ± 0.32 | 8.34 ± 6.38 | 0.30 ± 0.03 | 0.82 ± 0.02 |
| Forward fill | MSE(logit) | 2 | 1 | + | - | 5.43 ± 1.59 | 3.20 ± 0.38 | 4.32 ± 1.71 | 0.67 ± 0.04 | 0.31 ± 0.15 |
| Forward fill | BCE | 2 | 1 | + | - | 6.53 ± 1.98 | 4.12 ± 0.54 | 84.29 ± 110.03 | 0.59 ± 0.07 | 0.24 ± 0.14 |
| Forward fill | MSE(logit) | 2 | 2 | + | + | 2.11 ± 1.13 | 1.69 ± 0.57 | 1.17 ± 0.32 | 0.37 ± 0.08 | 0.69 ± 0.21 |
| Forward fill | BCE | 2 | 2 | + | + | 1.83 ± 0.57 | 1.69 ± 0.23 | 8.14 ± 2.85 | 0.32 ± 0.04 | 0.76 ± 0.06 |
| Forward fill | MSE(logit) | 2 | 2 | + | - | 6.84 ± 3.34 | 3.54 ± 0.80 | 4.81 ± 2.68 | 0.68 ± 0.07 | 0.20 ± 0.14 |
| Forward fill | BCE | 2 | 2 | + | - | 5.17 ± 1.32 | 3.59 ± 0.68 | 102.70 ± 88.77 | 0.60 ± 0.02 | 0.25 ± 0.19 |
| Forward fill | MSE(logit) | 2 | 4 | + | + | 1.54 ± 0.86 | 1.43 ± 0.26 | 0.87 ± 0.06 | 0.32 ± 0.03 | 0.79 ± 0.12 |
| Forward fill | BCE | 2 | 4 | + | + | 0.84 ± 0.32 | 1.15 ± 0.21 | 6.48 ± 2.33 | 0.27 ± 0.05 | 0.86 ± 0.04 |
| Forward fill | MSE(logit) | 2 | 4 | + | - | 4.70 ± 1.50 | 2.81 ± 0.53 | 4.81 ± 1.39 | 0.67 ± 0.05 | 0.23 ± 0.12 |
| Forward fill | BCE | 2 | 4 | + | - | 6.13 ± 1.59 | 3.81 ± 0.56 | 47.83 ± 25.29 | 0.63 ± 0.05 | 0.11 ± 0.07 |
| Forward fill | MSE(logit) | 3 | 1 | + | + | 2.30 ± 0.79 | 1.68 ± 0.31 | 1.07 ± 0.08 | 0.34 ± 0.04 | 0.70 ± 0.05 |
| Forward fill | BCE | 3 | 1 | + | + | 1.90 ± 0.64 | 1.62 ± 0.29 | 5.52 ± 3.09 | 0.32 ± 0.02 | 0.78 ± 0.04 |
| Forward fill | MSE(logit) | 3 | 1 | + | - | 5.10 ± 0.47 | 3.09 ± 0.34 | 4.33 ± 1.30 | 0.67 ± 0.11 | 0.35 ± 0.02 |
| Forward fill | BCE | 3 | 1 | + | - | 5.15 ± 1.20 | 3.57 ± 0.51 | 29.56 ± 9.89 | 0.61 ± 0.07 | 0.26 ± 0.12 |
| Forward fill | MSE(logit) | 3 | 2 | + | + | 2.82 ± 0.47 | 1.75 ± 0.26 | 1.14 ± 0.31 | 0.38 ± 0.05 | 0.57 ± 0.05 |
| Forward fill | BCE | 3 | 2 | + | + | 0.95 ± 0.36 | 1.13 ± 0.20 | 6.73 ± 2.38 | 0.28 ± 0.05 | 0.85 ± 0.06 |
| Forward fill | MSE(logit) | 3 | 2 | + | - | 4.35 ± 0.63 | 2.98 ± 0.32 | 3.24 ± 0.71 | 0.66 ± 0.08 | 0.40 ± 0.05 |
| Forward fill | BCE | 3 | 2 | + | - | 4.45 ± 0.76 | 3.24 ± 0.19 | 28.82 ± 6.16 | 0.60 ± 0.06 | 0.29 ± 0.10 |
| Forward fill | MSE(logit) | 3 | 4 | + | + | 1.61 ± 0.99 | 1.48 ± 0.35 | 0.98 ± 0.15 | 0.31 ± 0.07 | 0.78 ± 0.14 |
| Forward fill | BCE | 3 | 4 | + | + | 1.55 ± 0.68 | 1.41 ± 0.36 | 12.02 ± 8.30 | 0.36 ± 0.12 | 0.77 ± 0.09 |
| Forward fill | MSE(logit) | 3 | 4 | + | - | 4.93 ± 1.37 | 3.19 ± 0.55 | 3.91 ± 1.08 | 0.67 ± 0.09 | 0.31 ± 0.09 |
| Forward fill | BCE | 3 | 4 | + | - | 6.69 ± 0.78 | 4.12 ± 0.28 | 40.27 ± 22.37 | 0.60 ± 0.06 | 0.11 ± 0.12 |
| Forward fill | MSE(logit) | 4 | 1 | + | + | 1.36 ± 0.42 | 1.35 ± 0.23 | 0.90 ± 0.05 | 0.33 ± 0.02 | 0.77 ± 0.06 |
| Forward fill | BCE | 4 | 1 | + | + | 1.15 ± 0.46 | 1.38 ± 0.29 | 5.12 ± 2.22 | 0.29 ± 0.03 | 0.85 ± 0.04 |
| Forward fill | MSE(logit) | 4 | 1 | + | - | 5.16 ± 1.71 | 3.24 ± 0.55 | 10.54 ± 11.46 | 0.62 ± 0.04 | 0.24 ± 0.06 |
| Forward fill | BCE | 4 | 1 | + | - | 3.78 ± 1.04 | 3.06 ± 0.46 | 27.06 ± 10.96 | 0.57 ± 0.05 | 0.37 ± 0.04 |
| Forward fill | MSE(logit) | 4 | 2 | + | + | 2.82 ± 0.97 | 1.68 ± 0.23 | 1.17 ± 0.10 | 0.35 ± 0.05 | 0.62 ± 0.09 |
| Forward fill | BCE | 4 | 2 | + | + | 1.03 ± 0.21 | 1.33 ± 0.14 | 7.41 ± 1.35 | 0.30 ± 0.02 | 0.84 ± 0.03 |
| Forward fill | MSE(logit) | 4 | 2 | + | - | 5.10 ± 1.73 | 2.94 ± 0.47 | 9.61 ± 6.62 | 0.66 ± 0.06 | 0.28 ± 0.18 |
| Forward fill | BCE | 4 | 2 | + | - | 5.52 ± 1.25 | 3.71 ± 0.45 | 46.39 ± 31.38 | 0.62 ± 0.07 | 0.20 ± 0.12 |
| Forward fill | MSE(logit) | 4 | 4 | + | + | 112.40 ± 3.00 | 32.89 ± 0.56 | 467.51 ± 67.11 | 3.38 ± 0.12 | −17.69 ± 4.11 |
| Forward fill | BCE | 4 | 4 | + | + | 1.96 ± 1.54 | 1.95 ± 1.06 | 19.35 ± 32.26 | 0.32 ± 0.12 | 0.71 ± 0.29 |
| Forward fill | MSE(logit) | 4 | 4 | + | - | 4.95 ± 1.32 | 3.01 ± 0.44 | 4.65 ± 1.97 | 0.66 ± 0.05 | 0.34 ± 0.06 |
| Forward fill | BCE | 4 | 4 | + | - | 5.52 ± 1.43 | 3.69 ± 0.44 | 37.92 ± 33.77 | 0.65 ± 0.10 | 0.25 ± 0.15 |
| Interpolation | MSE(logit) | 1 | 1 | + | + | 1.71 ± 0.64 | 1.51 ± 0.24 | 0.96 ± 0.13 | 0.33 ± 0.01 | 0.75 ± 0.11 |
| Interpolation | BCE | 1 | 1 | + | + | 1.27 ± 0.41 | 1.46 ± 0.30 | 13.12 ± 11.63 | 0.30 ± 0.03 | 0.84 ± 0.03 |
| Interpolation | MSE(logit) | 1 | 1 | + | - | 5.42 ± 1.41 | 3.09 ± 0.53 | 3.56 ± 0.47 | 0.65 ± 0.09 | 0.26 ± 0.15 |
| Interpolation | BCE | 1 | 1 | + | - | 6.69 ± 2.37 | 4.42 ± 0.82 | 55.59 ± 29.87 | 0.53 ± 0.04 | 0.18 ± 0.16 |
| Interpolation | MSE(logit) | 1 | 2 | + | + | 3.91 ± 1.35 | 2.17 ± 0.45 | 1.23 ± 0.21 | 0.38 ± 0.05 | 0.52 ± 0.15 |
| Interpolation | BCE | 1 | 2 | + | + | 1.07 ± 0.40 | 1.34 ± 0.23 | 9.12 ± 7.63 | 0.29 ± 0.05 | 0.85 ± 0.05 |
| Interpolation | MSE(logit) | 1 | 2 | + | - | 4.91 ± 1.22 | 3.08 ± 0.40 | 4.63 ± 2.56 | 0.68 ± 0.12 | 0.19 ± 0.11 |
| Interpolation | BCE | 1 | 2 | + | - | 5.82 ± 1.61 | 4.50 ± 0.40 | 73.48 ± 19.07 | 0.52 ± 0.04 | 0.12 ± 0.06 |
| Interpolation | MSE(logit) | 1 | 4 | + | + | 2.46 ± 0.48 | 1.68 ± 0.16 | 1.11 ± 0.06 | 0.38 ± 0.04 | 0.63 ± 0.04 |
| Interpolation | BCE | 1 | 4 | + | + | 1.12 ± 0.39 | 1.21 ± 0.22 | 10.15 ± 4.52 | 0.32 ± 0.04 | 0.81 ± 0.04 |
| Interpolation | MSE(logit) | 1 | 4 | + | - | 5.97 ± 0.90 | 3.14 ± 0.22 | 2.89 ± 1.02 | 0.66 ± 0.08 | 0.21 ± 0.13 |
| Interpolation | BCE | 1 | 4 | + | - | 7.18 ± 1.20 | 5.10 ± 0.28 | 157.29 ± 71.37 | 0.51 ± 0.01 | −0.00 ± 0.00 |
| Interpolation | MSE(logit) | 2 | 1 | + | + | 3.66 ± 1.14 | 2.11 ± 0.40 | 1.07 ± 0.11 | 0.39 ± 0.05 | 0.58 ± 0.08 |
| Interpolation | BCE | 2 | 1 | + | + | 1.21 ± 0.42 | 1.33 ± 0.29 | 7.55 ± 4.62 | 0.29 ± 0.03 | 0.82 ± 0.05 |
| Interpolation | MSE(logit) | 2 | 1 | + | - | 4.38 ± 1.04 | 2.89 ± 0.45 | 3.34 ± 0.71 | 0.62 ± 0.03 | 0.35 ± 0.07 |
| Interpolation | BCE | 2 | 1 | + | - | 4.22 ± 0.87 | 3.25 ± 0.59 | 24.27 ± 7.63 | 0.61 ± 0.08 | 0.31 ± 0.13 |
| Interpolation | MSE(logit) | 2 | 2 | + | + | 2.22 ± 1.08 | 1.59 ± 0.23 | 1.11 ± 0.11 | 0.34 ± 0.03 | 0.68 ± 0.16 |
| Interpolation | BCE | 2 | 2 | + | + | 1.20 ± 0.46 | 1.30 ± 0.25 | 3.81 ± 1.56 | 0.30 ± 0.05 | 0.83 ± 0.04 |
| Interpolation | MSE(logit) | 2 | 2 | + | - | 4.32 ± 1.44 | 2.90 ± 0.49 | 3.14 ± 0.73 | 0.60 ± 0.08 | 0.36 ± 0.09 |
| Interpolation | BCE | 2 | 2 | + | - | 4.37 ± 1.40 | 3.44 ± 0.47 | 31.59 ± 17.69 | 0.57 ± 0.03 | 0.35 ± 0.10 |
| Interpolation | MSE(logit) | 2 | 4 | + | + | 2.16 ± 0.90 | 1.56 ± 0.23 | 1.23 ± 0.13 | 0.37 ± 0.02 | 0.68 ± 0.13 |
| Interpolation | BCE | 2 | 4 | + | + | 1.43 ± 0.28 | 1.41 ± 0.06 | 5.24 ± 1.38 | 0.32 ± 0.02 | 0.81 ± 0.03 |
| Interpolation | MSE(logit) | 2 | 4 | + | - | 6.45 ± 1.49 | 3.46 ± 0.49 | 6.82 ± 2.65 | 0.81 ± 0.12 | 0.03 ± 0.23 |
| Interpolation | BCE | 2 | 4 | + | - | 4.20 ± 1.03 | 2.97 ± 0.46 | 25.45 ± 6.85 | 0.73 ± 0.09 | 0.32 ± 0.07 |
| Interpolation | MSE(logit) | 3 | 1 | + | + | 1.76 ± 0.89 | 1.53 ± 0.38 | 1.21 ± 0.34 | 0.36 ± 0.08 | 0.76 ± 0.07 |
| Interpolation | BCE | 3 | 1 | + | + | 0.91 ± 0.34 | 1.20 ± 0.16 | 3.97 ± 2.88 | 0.28 ± 0.03 | 0.86 ± 0.05 |
| Interpolation | MSE(logit) | 3 | 1 | + | - | 5.19 ± 0.95 | 3.03 ± 0.28 | 4.63 ± 1.72 | 0.68 ± 0.07 | 0.23 ± 0.14 |
| Interpolation | BCE | 3 | 1 | + | - | 4.80 ± 0.54 | 3.55 ± 0.23 | 50.16 ± 47.81 | 0.66 ± 0.05 | 0.25 ± 0.11 |
| Interpolation | MSE(logit) | 3 | 2 | + | + | 3.48 ± 1.64 | 2.02 ± 0.48 | 1.21 ± 0.08 | 0.39 ± 0.07 | 0.51 ± 0.20 |
| Interpolation | BCE | 3 | 2 | + | + | 0.94 ± 0.25 | 1.23 ± 0.14 | 7.29 ± 2.85 | 0.25 ± 0.01 | 0.87 ± 0.02 |
| Interpolation | MSE(logit) | 3 | 2 | + | - | 4.56 ± 1.10 | 2.92 ± 0.51 | 4.54 ± 2.57 | 0.65 ± 0.07 | 0.33 ± 0.13 |
| Interpolation | BCE | 3 | 2 | + | - | 5.80 ± 1.20 | 4.05 ± 0.45 | 35.55 ± 22.95 | 0.60 ± 0.05 | 0.25 ± 0.12 |
| Interpolation | MSE(logit) | 3 | 4 | + | + | 2.33 ± 0.79 | 1.68 ± 0.32 | 1.07 ± 0.18 | 0.32 ± 0.03 | 0.70 ± 0.04 |
| Interpolation | BCE | 3 | 4 | + | + | 0.91 ± 0.36 | 1.17 ± 0.23 | 4.46 ± 1.36 | 0.27 ± 0.03 | 0.87 ± 0.03 |
| Interpolation | MSE(logit) | 3 | 4 | + | - | 4.59 ± 1.38 | 2.98 ± 0.54 | 4.19 ± 1.45 | 0.65 ± 0.09 | 0.30 ± 0.08 |
| Interpolation | BCE | 3 | 4 | + | - | 5.80 ± 1.71 | 3.74 ± 0.82 | 55.00 ± 22.26 | 0.59 ± 0.07 | 0.19 ± 0.15 |
| Interpolation | MSE(logit) | 4 | 1 | + | + | 8.08 ± 2.56 | 3.98 ± 0.95 | 9.69 ± 2.58 | 0.89 ± 0.01 | −0.17 ± 0.06 |
| Interpolation | BCE | 4 | 1 | + | + | 0.91 ± 0.46 | 1.20 ± 0.24 | 4.20 ± 1.19 | 0.26 ± 0.04 | 0.87 ± 0.05 |
| Interpolation | MSE(logit) | 4 | 1 | + | - | 7.08 ± 2.78 | 3.75 ± 0.90 | 6.25 ± 2.09 | 0.81 ± 0.15 | 0.02 ± 0.28 |
| Interpolation | BCE | 4 | 1 | + | - | 5.93 ± 2.15 | 3.58 ± 0.78 | 37.12 ± 21.17 | 0.66 ± 0.05 | 0.25 ± 0.13 |
| Interpolation | MSE(logit) | 4 | 2 | + | + | 2.89 ± 0.89 | 1.83 ± 0.24 | 0.97 ± 0.07 | 0.36 ± 0.04 | 0.63 ± 0.11 |
| Interpolation | BCE | 4 | 2 | + | + | 1.03 ± 0.45 | 1.26 ± 0.25 | 6.66 ± 2.96 | 0.31 ± 0.05 | 0.85 ± 0.03 |
| Interpolation | MSE(logit) | 4 | 2 | + | - | 5.65 ± 1.66 | 3.22 ± 0.60 | 5.64 ± 2.72 | 0.67 ± 0.08 | 0.27 ± 0.07 |
| Interpolation | BCE | 4 | 2 | + | - | 6.04 ± 1.73 | 3.79 ± 0.52 | 26.44 ± 12.15 | 0.67 ± 0.09 | 0.14 ± 0.14 |
| Interpolation | MSE(logit) | 4 | 4 | + | + | 19.49 ± 28.13 | 7.68 ± 9.84 | 81.31 ± 138.85 | 0.92 ± 0.99 | −1.63 ± 3.83 |
| Model-based | MSE(logit) | 1 | 1 | + | + | 2.31 ± 0.49 | 1.73 ± 0.32 | 1.03 ± 0.18 | 0.40 ± 0.10 | 0.68 ± 0.04 |
| Model-based | BCE | 1 | 1 | + | + | 1.63 ± 0.89 | 1.59 ± 0.43 | 3.78 ± 1.43 | 0.31 ± 0.05 | 0.81 ± 0.08 |
| Model-based | MSE(logit) | 1 | 1 | + | - | 4.52 ± 1.69 | 2.87 ± 0.64 | 7.49 ± 5.39 | 0.74 ± 0.06 | 0.20 ± 0.15 |
| Model-based | BCE | 1 | 1 | + | - | 6.42 ± 0.68 | 4.50 ± 0.47 | 112.95 ± 81.72 | 0.53 ± 0.04 | 0.08 ± 0.18 |
| Model-based | MSE(logit) | 1 | 2 | + | + | 2.48 ± 0.89 | 1.60 ± 0.25 | 1.25 ± 0.33 | 0.37 ± 0.03 | 0.65 ± 0.11 |
| Model-based | BCE | 1 | 2 | + | + | 3.96 ± 2.98 | 3.10 ± 1.80 | 81.01 ± 76.31 | 0.37 ± 0.10 | 0.48 ± 0.39 |
| Model-based | MSE(logit) | 1 | 2 | + | - | 6.04 ± 1.76 | 3.44 ± 0.52 | 3.92 ± 0.83 | 0.68 ± 0.05 | 0.22 ± 0.08 |
| Model-based | BCE | 1 | 2 | + | - | 5.83 ± 0.60 | 4.47 ± 0.31 | 135.05 ± 31.64 | 0.55 ± 0.04 | 0.15 ± 0.05 |
| Model-based | MSE(logit) | 1 | 4 | + | + | 1.58 ± 0.58 | 1.43 ± 0.28 | 1.01 ± 0.20 | 0.38 ± 0.10 | 0.78 ± 0.08 |
| Model-based | BCE | 1 | 4 | + | + | 1.99 ± 1.50 | 1.72 ± 0.87 | 14.34 ± 10.81 | 0.33 ± 0.04 | 0.73 ± 0.18 |
| Model-based | MSE(logit) | 1 | 4 | + | - | 6.78 ± 1.93 | 3.44 ± 0.78 | 7.69 ± 4.01 | 0.74 ± 0.04 | 0.10 ± 0.10 |
| Model-based | BCE | 1 | 4 | + | - | 10.17 ± 8.31 | 5.30 ± 2.37 | 102.72 ± 130.70 | 0.65 ± 0.08 | −0.36 ± 1.05 |
| Model-based | MSE(logit) | 2 | 1 | + | + | 2.90 ± 0.27 | 1.86 ± 0.21 | 0.94 ± 0.07 | 0.37 ± 0.02 | 0.63 ± 0.02 |
| Model-based | BCE | 2 | 1 | + | + | 1.61 ± 0.68 | 1.50 ± 0.17 | 11.31 ± 7.10 | 0.30 ± 0.04 | 0.81 ± 0.09 |
| Model-based | MSE(logit) | 2 | 1 | + | - | 6.68 ± 1.43 | 3.51 ± 0.57 | 3.79 ± 1.21 | 0.76 ± 0.11 | 0.14 ± 0.23 |
| Model-based | BCE | 2 | 1 | + | - | 4.15 ± 0.52 | 3.67 ± 0.26 | 121.03 ± 38.99 | 0.56 ± 0.03 | 0.27 ± 0.07 |
| Model-based | MSE(logit) | 2 | 2 | + | + | 2.12 ± 1.24 | 1.64 ± 0.53 | 0.96 ± 0.08 | 0.35 ± 0.06 | 0.69 ± 0.17 |
| Model-based | BCE | 2 | 2 | + | + | 1.27 ± 0.53 | 1.45 ± 0.36 | 5.05 ± 3.46 | 0.32 ± 0.07 | 0.84 ± 0.04 |
| Model-based | MSE(logit) | 2 | 2 | + | - | 5.86 ± 1.01 | 3.47 ± 0.22 | 4.79 ± 2.55 | 0.75 ± 0.04 | 0.14 ± 0.09 |
| Model-based | BCE | 2 | 2 | + | - | 6.55 ± 1.28 | 4.39 ± 0.66 | 73.56 ± 77.17 | 0.61 ± 0.06 | 0.08 ± 0.07 |
| Model-based | MSE(logit) | 2 | 4 | + | + | 3.57 ± 2.66 | 2.17 ± 0.73 | 3.91 ± 5.18 | 0.47 ± 0.21 | 0.56 ± 0.33 |
| Model-based | BCE | 2 | 4 | + | + | 1.29 ± 0.49 | 1.44 ± 0.30 | 5.68 ± 3.96 | 0.31 ± 0.04 | 0.83 ± 0.04 |
| Model-based | MSE(logit) | 2 | 4 | + | - | 3.99 ± 0.68 | 2.77 ± 0.43 | 3.77 ± 1.89 | 0.67 ± 0.09 | 0.33 ± 0.13 |
| Model-based | BCE | 2 | 4 | + | - | 4.70 ± 1.15 | 3.48 ± 0.26 | 29.98 ± 9.99 | 0.60 ± 0.06 | 0.30 ± 0.12 |
| Model-based | MSE(logit) | 3 | 1 | + | + | 2.49 ± 1.39 | 1.70 ± 0.57 | 1.09 ± 0.16 | 0.36 ± 0.05 | 0.63 ± 0.11 |
| Model-based | BCE | 3 | 1 | + | + | 0.81 ± 0.30 | 1.20 ± 0.17 | 4.66 ± 2.20 | 0.27 ± 0.02 | 0.87 ± 0.03 |
| Model-based | MSE(logit) | 3 | 1 | + | - | 5.46 ± 0.85 | 2.89 ± 0.35 | 4.04 ± 1.48 | 0.70 ± 0.04 | 0.29 ± 0.06 |
| Model-based | BCE | 3 | 1 | + | - | 5.23 ± 1.25 | 3.47 ± 0.62 | 19.87 ± 2.74 | 0.66 ± 0.08 | 0.27 ± 0.13 |
| Model-based | MSE(logit) | 3 | 2 | + | + | 1.69 ± 0.30 | 1.58 ± 0.16 | 1.16 ± 0.14 | 0.37 ± 0.04 | 0.73 ± 0.05 |
| Model-based | BCE | 3 | 2 | + | + | 1.42 ± 0.56 | 1.40 ± 0.21 | 12.40 ± 4.94 | 0.32 ± 0.04 | 0.78 ± 0.07 |
| Model-based | MSE(logit) | 3 | 2 | + | - | 6.67 ± 1.20 | 3.69 ± 0.53 | 10.97 ± 8.26 | 0.80 ± 0.13 | −0.13 ± 0.44 |
| Model-based | BCE | 3 | 2 | + | - | 9.48 ± 9.17 | 4.23 ± 1.89 | 72.63 ± 106.58 | 0.78 ± 0.18 | −0.61 ± 1.83 |
| Model-based | MSE(logit) | 3 | 4 | + | + | 2.43 ± 0.69 | 1.77 ± 0.32 | 1.22 ± 0.22 | 0.38 ± 0.04 | 0.67 ± 0.08 |
| Model-based | BCE | 3 | 4 | + | + | 2.59 ± 1.92 | 2.18 ± 1.44 | 17.61 ± 6.42 | 0.37 ± 0.12 | 0.64 ± 0.26 |
| Model-based | MSE(logit) | 3 | 4 | + | - | 5.53 ± 1.56 | 3.51 ± 0.46 | 7.47 ± 5.62 | 0.71 ± 0.07 | 0.20 ± 0.12 |
| Model-based | BCE | 3 | 4 | + | - | 5.03 ± 1.33 | 3.68 ± 0.59 | 40.47 ± 11.99 | 0.61 ± 0.06 | 0.19 ± 0.14 |
| Model-based | MSE(logit) | 4 | 1 | + | + | 46.73 ± 55.32 | 14.17 ± 15.61 | 208.27 ± 254.44 | 1.60 ± 1.56 | −5.31 ± 7.45 |
| Model-based | BCE | 4 | 1 | + | + | 1.42 ± 0.63 | 1.41 ± 0.35 | 4.71 ± 1.91 | 0.31 ± 0.06 | 0.81 ± 0.08 |
| Model-based | MSE(logit) | 4 | 1 | + | - | 4.11 ± 1.29 | 3.00 ± 0.46 | 3.45 ± 0.91 | 0.65 ± 0.07 | 0.35 ± 0.15 |
| Model-based | BCE | 4 | 1 | + | - | 6.56 ± 1.93 | 4.26 ± 0.75 | 263.20 ± 446.90 | 0.67 ± 0.07 | −0.05 ± 0.19 |
| Model-based | MSE(logit) | 4 | 2 | + | + | 3.05 ± 1.33 | 2.09 ± 0.62 | 1.19 ± 0.14 | 0.41 ± 0.04 | 0.57 ± 0.18 |
Model-basedBCE42++1.10 ± 0.571.26 ± 0.327.37 ± 5.680.29 ± 0.070.84 ± 0.05
Model-basedMSE(logit)42+-4.74 ± 0.802.88 ± 0.465.14 ± 2.530.69 ± 0.090.22 ± 0.15
Model-basedBCE42+-5.12 ± 0.973.57 ± 0.4251.89 ± 32.710.66 ± 0.110.25 ± 0.08
Model-basedMSE(logit)44++2.13 ± 0.641.71 ± 0.411.04 ± 0.220.36 ± 0.030.69 ± 0.05
Model-basedBCE44++1.74 ± 1.221.71 ± 0.7810.18 ± 12.190.34 ± 0.150.78 ± 0.16
Model-basedMSE(logit)44+-6.02 ± 2.333.43 ± 1.066.31 ± 3.820.71 ± 0.040.22 ± 0.12
Model-basedBCE44+-5.37 ± 0.433.70 ± 0.3138.21 ± 32.850.68 ± 0.050.30 ± 0.05
ARIMAMSE(logit)11++2.82 ± 0.681.72 ± 0.211.11 ± 0.320.38 ± 0.030.61 ± 0.08
ARIMABCE11++1.43 ± 0.461.66 ± 0.364.93 ± 5.150.30 ± 0.030.84 ± 0.03
ARIMAMSE(logit)11+-7.46 ± 1.703.92 ± 0.647.10 ± 2.790.79 ± 0.060.09 ± 0.19
ARIMABCE11+-6.87 ± 1.964.31 ± 0.7537.82 ± 12.430.58 ± 0.030.19 ± 0.13
ARIMAMSE(logit)12++1.83 ± 0.351.67 ± 0.241.07 ± 0.100.33 ± 0.020.74 ± 0.04
ARIMABCE12++1.13 ± 0.511.25 ± 0.225.81 ± 2.840.28 ± 0.040.85 ± 0.04
ARIMAMSE(logit)12+-4.29 ± 0.732.77 ± 0.704.50 ± 1.300.65 ± 0.070.27 ± 0.08
ARIMABCE12+-4.48 ± 0.683.65 ± 0.3761.71 ± 18.070.58 ± 0.100.30 ± 0.13
ARIMAMSE(logit)14++2.13 ± 0.641.55 ± 0.271.19 ± 0.170.34 ± 0.030.68 ± 0.02
ARIMABCE14++1.36 ± 0.531.45 ± 0.305.02 ± 3.510.28 ± 0.030.83 ± 0.06
ARIMAMSE(logit)14+-5.11 ± 2.063.03 ± 0.784.60 ± 2.120.72 ± 0.080.18 ± 0.13
ARIMABCE14+-5.18 ± 0.374.06 ± 0.52103.04 ± 71.290.59 ± 0.100.23 ± 0.12
ARIMAMSE(logit)21++2.26 ± 0.551.64 ± 0.221.18 ± 0.270.36 ± 0.050.66 ± 0.06
ARIMABCE21++1.31 ± 0.351.54 ± 0.334.90 ± 2.210.31 ± 0.050.84 ± 0.02
ARIMAMSE(logit)21+-4.80 ± 0.653.08 ± 0.292.98 ± 0.560.71 ± 0.100.29 ± 0.11
ARIMABCE21+-4.37 ± 0.613.33 ± 0.2320.57 ± 12.200.62 ± 0.020.34 ± 0.08
ARIMAMSE(logit)22++1.60 ± 0.701.37 ± 0.301.09 ± 0.210.31 ± 0.050.74 ± 0.08
ARIMABCE22++1.18 ± 0.311.35 ± 0.283.71 ± 1.080.28 ± 0.030.83 ± 0.05
ARIMAMSE(logit)22+-5.69 ± 0.843.28 ± 0.204.15 ± 2.240.65 ± 0.040.22 ± 0.09
ARIMABCE22+-5.58 ± 3.103.85 ± 0.9633.48 ± 13.820.59 ± 0.040.32 ± 0.13
ARIMAMSE(logit)24++2.22 ± 0.651.75 ± 0.390.89 ± 0.130.36 ± 0.070.71 ± 0.07
ARIMABCE24++1.14 ± 0.471.33 ± 0.277.47 ± 4.630.29 ± 0.050.82 ± 0.07
ARIMAMSE(logit)24+-5.63 ± 1.243.26 ± 0.624.74 ± 1.950.66 ± 0.070.23 ± 0.15
ARIMABCE24+-6.03 ± 0.214.97 ± 0.12165.61 ± 19.610.52 ± 0.02−0.00 ± 0.00
ARIMAMSE(logit)31++1.88 ± 1.101.53 ± 0.400.91 ± 0.070.32 ± 0.040.75 ± 0.10
ARIMABCE31++1.10 ± 0.281.29 ± 0.122.83 ± 1.150.28 ± 0.020.85 ± 0.02
ARIMAMSE(logit)31+-5.09 ± 1.563.19 ± 0.576.22 ± 2.540.74 ± 0.070.27 ± 0.04
ARIMABCE31+-5.57 ± 1.874.11 ± 0.8045.67 ± 13.260.62 ± 0.100.04 ± 0.26
ARIMAMSE(logit)32++1.52 ± 0.401.34 ± 0.310.95 ± 0.070.33 ± 0.060.76 ± 0.05
ARIMABCE32++1.28 ± 0.341.40 ± 0.164.33 ± 3.430.29 ± 0.010.83 ± 0.02
ARIMAMSE(logit)32+-6.15 ± 1.713.38 ± 0.656.42 ± 4.260.74 ± 0.090.20 ± 0.17
ARIMABCE32+-5.71 ± 1.513.88 ± 0.4159.30 ± 27.340.64 ± 0.060.17 ± 0.12
ARIMAMSE(logit)34++2.47 ± 0.941.76 ± 0.371.06 ± 0.210.35 ± 0.050.68 ± 0.10
ARIMABCE34++1.18 ± 0.351.31 ± 0.208.88 ± 1.920.31 ± 0.020.83 ± 0.02
ARIMAMSE(logit)34+-6.67 ± 1.123.66 ± 0.455.23 ± 1.560.67 ± 0.080.17 ± 0.14
ARIMABCE34+-5.56 ± 1.263.96 ± 0.4642.09 ± 22.790.61 ± 0.080.17 ± 0.06
ARIMAMSE(logit)41++2.46 ± 1.101.75 ± 0.471.19 ± 0.220.34 ± 0.020.68 ± 0.06
ARIMABCE41++1.14 ± 0.351.36 ± 0.245.56 ± 2.550.31 ± 0.040.83 ± 0.04
ARIMAMSE(logit)41+-4.19 ± 1.122.77 ± 0.453.37 ± 0.640.68 ± 0.100.26 ± 0.14
ARIMABCE41+-5.69 ± 1.943.48 ± 0.6028.93 ± 8.540.71 ± 0.130.18 ± 0.17
ARIMAMSE(logit)42++1.96 ± 0.801.46 ± 0.321.08 ± 0.230.34 ± 0.070.68 ± 0.10
ARIMABCE42++1.34 ± 0.501.33 ± 0.239.42 ± 13.700.28 ± 0.030.82 ± 0.04
ARIMAMSE(logit)42+-4.32 ± 0.672.70 ± 0.214.91 ± 1.340.67 ± 0.080.30 ± 0.07
ARIMABCE42+-5.71 ± 1.483.65 ± 0.5739.23 ± 24.220.59 ± 0.060.22 ± 0.11
ARIMAMSE(logit)44++2.93 ± 1.472.02 ± 0.491.26 ± 0.280.39 ± 0.050.63 ± 0.12
ARIMABCE44++1.43 ± 0.651.45 ± 0.337.75 ± 4.610.31 ± 0.040.80 ± 0.05
ARIMAMSE(logit)44+-5.30 ± 0.722.96 ± 0.397.09 ± 4.870.65 ± 0.040.30 ± 0.09
ARIMABCE44+-5.02 ± 1.933.49 ± 0.6728.37 ± 15.940.61 ± 0.080.23 ± 0.21
ARIMAMSE(logit)11-+1.94 ± 0.501.59 ± 0.221.26 ± 0.190.36 ± 0.060.74 ± 0.09
ARIMABCE11-+1.15 ± 0.281.28 ± 0.197.44 ± 0.990.32 ± 0.020.82 ± 0.02
ARIMAMSE(logit)11--1.01 ± 0.751.26 ± 0.341.35 ± 0.100.31 ± 0.040.85 ± 0.09
ARIMABCE11--1.08 ± 0.421.35 ± 0.2610.39 ± 2.990.30 ± 0.020.83 ± 0.05
ARIMAMSE(logit)12-+2.23 ± 0.531.66 ± 0.241.19 ± 0.180.34 ± 0.020.69 ± 0.06
ARIMABCE12-+1.01 ± 0.591.20 ± 0.189.77 ± 2.420.29 ± 0.030.85 ± 0.06
ARIMAMSE(logit)12--1.17 ± 0.551.25 ± 0.191.37 ± 0.160.32 ± 0.040.81 ± 0.05
ARIMABCE12--1.34 ± 0.371.45 ± 0.117.49 ± 4.180.29 ± 0.020.84 ± 0.03
ARIMAMSE(logit)14-+2.08 ± 0.731.53 ± 0.241.14 ± 0.080.31 ± 0.040.74 ± 0.07
ARIMABCE14-+1.36 ± 0.411.44 ± 0.248.82 ± 5.090.30 ± 0.010.81 ± 0.03
ARIMAMSE(logit)14--1.59 ± 0.411.41 ± 0.161.21 ± 0.100.31 ± 0.030.79 ± 0.04
ARIMABCE14--1.35 ± 0.461.33 ± 0.1210.27 ± 2.820.31 ± 0.020.82 ± 0.05
ARIMAMSE(logit)21-+1.70 ± 0.531.54 ± 0.231.48 ± 0.380.33 ± 0.020.77 ± 0.05
ARIMABCE21-+1.33 ± 0.571.44 ± 0.215.52 ± 2.160.30 ± 0.030.83 ± 0.05
ARIMAMSE(logit)21--1.72 ± 0.991.39 ± 0.241.36 ± 0.160.33 ± 0.040.77 ± 0.07
ARIMABCE21--1.69 ± 0.671.42 ± 0.117.03 ± 2.510.31 ± 0.020.77 ± 0.08
ARIMAMSE(logit)22-+2.17 ± 0.421.59 ± 0.101.16 ± 0.070.35 ± 0.030.72 ± 0.05
ARIMABCE22-+1.17 ± 0.301.33 ± 0.156.51 ± 3.280.28 ± 0.020.83 ± 0.03
ARIMAMSE(logit)22--1.93 ± 1.171.61 ± 0.421.45 ± 0.190.31 ± 0.020.76 ± 0.07
ARIMABCE22--1.90 ± 0.901.50 ± 0.367.16 ± 2.520.32 ± 0.040.76 ± 0.10
ARIMAMSE(logit)24-+2.62 ± 0.691.65 ± 0.181.34 ± 0.150.37 ± 0.030.60 ± 0.07
ARIMABCE24-+1.91 ± 0.431.64 ± 0.288.38 ± 5.630.31 ± 0.030.75 ± 0.05
ARIMAMSE(logit)24--1.65 ± 0.421.56 ± 0.221.26 ± 0.230.35 ± 0.040.79 ± 0.04
ARIMABCE24--1.26 ± 0.331.43 ± 0.147.09 ± 3.060.28 ± 0.030.82 ± 0.03
ARIMAMSE(logit)31-+1.16 ± 0.361.25 ± 0.101.21 ± 0.130.32 ± 0.020.81 ± 0.04
ARIMABCE31-+1.81 ± 0.471.47 ± 0.246.81 ± 2.720.32 ± 0.030.76 ± 0.05
ARIMAMSE(logit)31--1.72 ± 0.811.62 ± 0.521.53 ± 0.480.37 ± 0.110.78 ± 0.09
ARIMABCE31--1.52 ± 0.691.53 ± 0.377.38 ± 2.110.29 ± 0.020.80 ± 0.05
ARIMAMSE(logit)32-+1.45 ± 0.401.39 ± 0.161.67 ± 0.330.32 ± 0.040.79 ± 0.04
ARIMABCE32-+1.56 ± 0.751.58 ± 0.419.16 ± 8.660.30 ± 0.030.80 ± 0.06
ARIMAMSE(logit)32--1.30 ± 0.001.14 ± 0.001.47 ± 0.000.30 ± 0.000.82 ± 0.00
Table A3 shows the performance scores for the encoder–decoder model with different hyperparameters. Most of the outcomes here match those of the encoder-only model. The "Masked ratio" column gives the share of exports from the current year t that we feed to the decoder as input (i.e., the share of export values to be restored); a minimal code sketch of this masking follows Table A3.
Table A3. Performance scores for the proposed encoder–decoder model with various hyperparameters and imputation.
Imputation | Loss | FFT | Recurrent Embeddings | Masked Ratio | MSE, ×10^17 kg² | MAE, ×10^8 kg | MAPE_all | MAPE_large | R²
Forward fill | MSE(logit) | + | - | 0.1 | 8.82 ± 2.53 | 4.22 ± 0.82 | 10.25 ± 0.99 | 0.88 ± 0.02 | −0.17 ± 0.04
Forward fill | BCE | + | - | 0.1 | 1.35 ± 0.67 | 1.76 ± 0.44 | 12.53 ± 9.51 | 0.35 ± 0.03 | 0.82 ± 0.05
Forward fill | MSE(logit) | + | + | 0.1 | 2.53 ± 0.63 | 1.92 ± 0.33 | 1.92 ± 0.41 | 0.37 ± 0.05 | 0.70 ± 0.04
Forward fill | BCE | + | + | 0.1 | 1.69 ± 0.22 | 1.89 ± 0.09 | 7.73 ± 1.56 | 0.38 ± 0.06 | 0.79 ± 0.04
Forward fill | MSE(logit) | + | + | 0.5 | 1.58 ± 0.18 | 1.46 ± 0.11 | 1.10 ± 0.15 | 0.32 ± 0.03 | 0.73 ± 0.04
Forward fill | BCE | + | + | 0.5 | 0.71 ± 0.16 | 1.13 ± 0.12 | 17.46 ± 2.06 | 0.26 ± 0.05 | 0.88 ± 0.04
Model-based | MSE(logit) | + | - | 0.1 | 8.78 ± 0.00 | 4.56 ± 0.00 | 6.88 ± 0.00 | 0.80 ± 0.00 | −0.13 ± 0.00
Model-based | MSE(logit) | + | + | 0.1 | 2.91 ± 1.39 | 1.92 ± 0.37 | 1.45 ± 0.16 | 0.37 ± 0.03 | 0.63 ± 0.06
ARIMA | MSE(logit) | + | + | 0.1 | 1.86 ± 1.04 | 1.77 ± 0.51 | 2.79 ± 0.75 | 0.35 ± 0.05 | 0.73 ± 0.13
ARIMA | BCE | + | + | 0.1 | 1.35 ± 0.38 | 1.64 ± 0.26 | 10.90 ± 4.18 | 0.35 ± 0.02 | 0.81 ± 0.03
Interpolation | MSE(logit) | - | + | 0.1 | 2.02 ± 0.96 | 1.71 ± 0.31 | 1.98 ± 0.50 | 0.37 ± 0.07 | 0.68 ± 0.16
Interpolation | BCE | - | + | 0.1 | 2.40 ± 0.41 | 2.21 ± 0.16 | 5.44 ± 2.36 | 0.40 ± 0.05 | 0.74 ± 0.03
Interpolation | MSE(logit) | - | - | 0.1 | 2.28 ± 1.28 | 2.12 ± 0.64 | 3.27 ± 1.20 | 0.62 ± 0.18 | 0.66 ± 0.13
Interpolation | BCE | - | - | 0.1 | 1.97 ± 0.39 | 1.64 ± 0.19 | 6.67 ± 2.65 | 0.34 ± 0.03 | 0.76 ± 0.03
Interpolation | MSE(logit) | - | + | 0.3 | 1.68 ± 0.43 | 1.50 ± 0.17 | 1.46 ± 0.38 | 0.34 ± 0.04 | 0.73 ± 0.08
Interpolation | BCE | - | + | 0.3 | 3.33 ± 1.82 | 2.82 ± 1.47 | 76.73 ± 90.06 | 0.38 ± 0.10 | 0.50 ± 0.36
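To make the masked-ratio mechanism concrete, the following minimal NumPy sketch (illustrative only; the array names, shapes, and uniform random masking scheme are our assumptions rather than the authors' code) builds a decoder input in which a fraction of current-year export flows is replaced by their year t + 1 values, together with the mask used to exclude those pairs from the training loss, as in Figure 3a.

```python
import numpy as np

def build_decoder_input(export_t, export_t1, masked_ratio, rng):
    """Replace a random fraction of year-t flows with year-(t+1) flows.

    export_t, export_t1: (n, n) arrays of exports between n countries
    masked_ratio: share of (k, l) pairs whose decoder input is the
                  future value; the training loss is masked for them.
    """
    mask = rng.random(export_t.shape) < masked_ratio
    decoder_in = np.where(mask, export_t1, export_t)
    return decoder_in, mask

rng = np.random.default_rng(seed=0)
export_t = rng.uniform(0.0, 1.0, size=(5, 5))   # toy normalized flows
export_t1 = rng.uniform(0.0, 1.0, size=(5, 5))
dec_in, loss_mask = build_decoder_input(export_t, export_t1, 0.5, rng)
# During training, the loss is computed only where loss_mask is False,
# which corresponds to the red dashed masking in Figure 3a.
```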

References

  1. Food and Agriculture Organisation. World Food and Agriculture; FAO: Rome, Italy, 2015. [Google Scholar]
  2. Caparas, M.; Zobel, Z.; Castanho, A.D.; Schwalm, C.R. Increasing risks of crop failure and water scarcity in global breadbaskets by 2030. Environ. Res. Lett. 2021, 16, 104013. [Google Scholar] [CrossRef]
  3. Zhang, Y.T.; Zhou, W.-X. Structural evolution of international crop trade networks. Front. Phys. 2022, 10, 926764. [Google Scholar] [CrossRef]
  4. Duan, J.; Nie, C.; Wang, Y.; Yan, D.; Xiong, W. Research on Global Grain Trade Network Pattern and Its Driving Factors. Sustainability 2022, 14, 245. [Google Scholar] [CrossRef]
  5. Food and Agriculture Organisation. The State of Agricultural Commodity Markets; FAO: Rome, Italy, 2022. [Google Scholar]
  6. Burkholz, R.; Schweitzer, F. International crop trade networks: The impact of shocks and cascades. Environ. Res. Lett. 2019, 14, 114013. [Google Scholar] [CrossRef]
  7. Seekell, D.; Carr, J.; Dell’Angelo, J.; D’Odorico, P.; Fader, M.; Gephart, J.; Kummu, M.; Magliocca, N.; Porkka, M.; Puma, M. Resilience in the global food system. Environ. Res. Lett. 2017, 12, 025010. [Google Scholar] [CrossRef] [PubMed]
  8. Burkholz, R.; Garas, A.; Schweitzer, F. How damage diversification can reduce systemic risk. Phys. Rev. E 2016, 93, 042313. [Google Scholar] [CrossRef]
  9. Godfray, H.C.J.; Beddington, J.R.; Crute, I.R.; Haddad, L.; Lawrence, D.; Muir, J.F.; Pretty, J.; Robinson, S.; Thomas, S.M.; Toulmin, C. Food security: The challenge of feeding 9 billion people. Science 2010, 327, 812–818. [Google Scholar] [CrossRef]
  10. Cabell, J.F.; Myles, O. An indicator framework for assessing agroecosystem resilience. Ecol. Soc. 2012, 17, 1–13. [Google Scholar] [CrossRef]
  11. Schipanski, M.E.; MacDonald, G.K.; Rosenzweig, S.; Chappell, M.J.; Bennett, E.M.; Kerr, R.B.; Blesh, J.; Crews, T.; Drinkwater, L.; Lundgren, J.G.; et al. Realizing resilient food systems. BioScience 2016, 66, 600–610. [Google Scholar] [CrossRef]
  12. Robu, R.G.; Alexoaei, A.P.; Cojanu, V.; Miron, D. The cereal network: A baseline approach to current configurations of trade communities. Agric. Food Econ. 2024, 12, 24. [Google Scholar] [CrossRef]
  13. Schoenherr, T.; Kanak, G.; Montalbano, A.; Patel, S.; Bourlakis, M.; Sawyerr, E.; Cong, W.F. Frontiers in Agri-Food Supply Chains: Frameworks and Case Studies; Burleigh Dodds Science Publishing: Cambridge, UK, 2024. [Google Scholar]
  14. Davide, M. S-MARL: An Algorithm for Single-To-Multi-Agent Reinforcement Learning: Case Study: Formula 1 Race Strategies. Available online: https://www.diva-portal.org/smash/get/diva2:1763095/FULLTEXT01.pdf (accessed on 2 August 2025).
  15. Ahmar, A.S.; Singh, P.K.; Ruliana, R.; Pandey, A.K.; Gupta, S. Comparison of ARIMA, SutteARIMA, and Holt-Winters, and NNAR Models to Predict Food Grain in India. Forecasting 2023, 5, 138–152. [Google Scholar] [CrossRef]
  16. Qader, S.H.; Utazi, C.E.; Priyatikanto, R.; Najmaddin, P.; Hama-Ali, E.O.; Khwarahm, N.R.; Dash, J. Exploring the use of Sentinel-2 datasets and environmental variables to model wheat crop yield in smallholder arid and semi-arid farming systems. Sci. Total Environ. 2023, 869, 161716. [Google Scholar] [CrossRef]
  17. Phiri, D.; Simwanda, M.; Salekin, S.; Nyirenda, V.R.; Murayama, Y.; Ranagalage, M. Sentinel-2 Data for Land Cover/Use Mapping: A Review. Remote Sens. 2020, 12, 2291. [Google Scholar] [CrossRef]
  18. Leucci, A.C.; Ghinoi, S.; Sgargi, D.; Wesz Junior, V.J. VAR models for dynamic analysis of prices in the agri-food system. In Agricultural Cooperative Management and Policy: New Robust, Reliable and Coherent Modelling Tools; Springer International Publishing: Cham, Switzerland, 2014; pp. 3–21. [Google Scholar]
  19. Lütkepohl, H. New Introduction to Multiple Time Series Analysis; Springer: Berlin/Heidelberg, Germany, 2005. [Google Scholar]
  20. Li, X.; Yuan, J. DeepVARwT: Deep learning for a VAR model with trend. J. Appl. Stat. 2025, 52, 1–27. [Google Scholar] [CrossRef]
  21. Rumánková, L. Evaluation of market relations in soft milling wheat agri-food chain. AGRIS On-Line Pap. Econ. Inform. 2016, 8, 133–141. [Google Scholar]
  22. Zhao, Y.; Ye, L.; Pinson, P.; Tang, Y.; Lu, P. Correlation-constrained and sparsity-controlled vector autoregressive model for spatio-temporal wind power forecasting. IEEE Trans. Power Syst. 2018, 33, 5029–5040. [Google Scholar] [CrossRef]
  23. Han, K.; Leem, K.; Choi, Y.R.; Chung, K. What drives a country’s fish consumption? Market growth phase and the causal relations among fish consumption, production and income growth. Fish. Res. 2022, 254, 106435. [Google Scholar] [CrossRef]
  24. Xiong, T.; Li, C.; Bao, Y.; Hu, Z.; Zhang, L. A combination method for interval forecasting of agricultural commodity futures prices. Knowl. Based Syst. 2015, 77, 92–102. [Google Scholar] [CrossRef]
  25. Iniyan, S.; Varma, V.A.; Naidu, C.T. Crop yield prediction using machine learning techniques. Adv. Eng. Softw. 2023, 175, 103326. [Google Scholar] [CrossRef]
  26. Panda, S.K.; Mohanty, S.N. Time series forecasting and modeling of food demand supply chain based on regressors analysis. IEEE Access 2023, 11, 42679–42700. [Google Scholar] [CrossRef]
  27. Liu, Z.; Wang, Y.; Vaidya, S.; Ruehle, F.; Halverson, J.; Soljačić, M.; Hou, T.Y.; Tegmark, M. KAN: Kolmogorov-Arnold Networks. In Proceedings of the Thirteenth International Conference on Learning Representations, Singapore, 24–28 April 2025. [Google Scholar]
  28. Xu, K.; Chen, L.; Wang, S. Kolmogorov-Arnold networks for time series: Bridging predictive power and interpretability. arXiv 2024, arXiv:2406.02496. [Google Scholar] [CrossRef]
  29. Genet, R.; Inzirillo, H. A Temporal Kolmogorov-Arnold Transformer for Time Series Forecasting. arXiv 2024, arXiv:2406.02486. [Google Scholar] [CrossRef]
  30. Bhattacharya, P.; Mukherjee, T. EleKAN: Temporal Kolmogorov-Arnold Networks for Price and Demand Forecasting Framework in Smart Cities. In Proceedings of the International Conference on Frontiers of Electronics, Information and Computation Technologies, Singapore, 22–24 June 2024; pp. 182–192. [Google Scholar]
  31. Le, D.; Rajasegarar, S.; Luo, W.; Nguyen, T.T.; Angelova, M. Navigating Uncertainty: Gold Price Forecasting with Kolmogorov-Arnold Networks in Volatile Markets. In Proceedings of the 2024 IEEE Conference on Engineering Informatics (ICEI), Melbourne, Australia, 20–21 November 2024; pp. 1–9. [Google Scholar]
  32. Ibañez, S.C.; Monterola, C.P. A Global Forecasting Approach to Large-Scale Crop Production Prediction with Time Series Transformers. Agriculture 2023, 13, 1855. [Google Scholar] [CrossRef]
  33. Ma, B.; Xue, Y.; Chen, J.; Sun, F. Meta-Learning Enhanced Trade Forecasting: A Neural Framework Leveraging Efficient Multicommodity STL Decomposition. Int. J. Intell. Syst. 2024, 2024, 6176898. [Google Scholar] [CrossRef]
  34. Li, Z.; Kovachki, N.; Azizzadenesheli, K.; Liu, B.; Bhattacharya, K.; Stuart, A.; Anandkumar, A. Fourier neural operator for parametric partial differential equations. arXiv 2020, arXiv:2010.08895. [Google Scholar]
  35. Buchholz, T.O.; Jug, F. Fourier image transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 1846–1854. [Google Scholar]
  36. He, S.; Lin, G.; Li, T.; Chen, Y. Frequency-Domain Fusion Transformer for Image Inpainting. arXiv 2025, arXiv:2506.18437. [Google Scholar] [CrossRef]
  37. Li, Z.; Liu, T.; Peng, W.; Yuan, Z.; Wang, J. A transformer-based neural operator for large-eddy simulation of turbulence. Phys. Fluids 2024, 36, 065167. [Google Scholar] [CrossRef]
  38. Woo, G.; Liu, C.; Sahoo, D.; Kumar, A.; Hoi, S. Learning deep time-index models for time series forecasting. In Proceedings of the 40th International Conference on Machine Learning, Honolulu, HI, USA, 23–29 July 2023. [Google Scholar]
  39. Zhao, L.; Song, Y.; Zhang, C.; Liu, Y.; Wang, P.; Lin, T.; Li, H. T-GCN: A temporal graph convolutional network for traffic prediction. IEEE Trans. Intell. Transp. Syst. 2019, 21, 3848–3858. [Google Scholar] [CrossRef]
  40. Li, Y.; Moura, J.M.F. Forecaster: A Graph Transformer for Forecasting Spatial and Time-Dependent Data. In Proceedings of the European Conference on Artificial Intelligence, Santiago de Compostela, Spain, 31 August–8 September 2020. [Google Scholar]
  41. Banerjee, S.; Dong, M.; Shi, W. Spatial–temporal synchronous graph transformer network (STSGT) for COVID-19 forecasting. Smart Health 2022, 26, 100348. [Google Scholar] [CrossRef]
  42. Liu, H.; Dong, Z.; Jiang, R.; Deng, J.; Chen, Q.; Song, X. Spatio-temporal adaptive embedding makes vanilla transformer SOTA for traffic forecasting. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, Birmingham, UK, 21–25 October 2023. [Google Scholar]
  43. Wang, H.; Chen, J.; Pan, T.; Dong, Z. STGformer: Efficient Spatiotemporal Graph Transformer for Traffic Forecasting. arXiv 2024, arXiv:2410.00385. [Google Scholar] [CrossRef]
  44. Hu, S.; Zou, G.; Lin, S.; Wu, L.; Zhou, C.; Zhang, B.; Chen, Y. Recurrent transformer for dynamic graph representation learning with edge temporal states. arXiv 2023, arXiv:2304.10079. [Google Scholar]
  45. Lee, J.; Xu, C.; Xie, Y. Transformer Conformal Prediction for Time Series. In Proceedings of the ICML 2024 Workshop on Structured Probabilistic Inference and Generative Modeling, Vienna, Austria, 21–27 July 2024. [Google Scholar]
  46. Caliendo, L.; Parro, F. Gains from Trade: A Model for Counterfactual Trade Policy Analysis; The University of Chicago Working Papers; The University of Chicago: Chicago, IL, USA, 2009. [Google Scholar]
  47. Yu, Z.; Han, J.; Shi, X.; Yang, Y. Estimates of the trade and global value chain (GVC) effects of China’s Pilot Free Trade Zones: A research based on the quantitative trade model. J. Asia Pac. Econ. 2025, 30, 1255–1302. [Google Scholar] [CrossRef]
  48. Roningen, V.O.; Dixit, P.M. Economic Implications of Agricultural Policy Reform in Industrial Market Economics; Rapport Techniques de I’USDA, Ages 80–36; United States Department of Agriculture, Economic Research Service: Washington, DC, USA, 1989.
  49. Hertel, T.W. Global Trade Analysis: Modeling and Applications; Cambridge University Press: Cambridge, UK, 1997. [Google Scholar]
  50. Parikh, K.S.; Fisher, G.; Frohberg, K.; Gulbrandsen, O. Towards Free Trade in Agriculture; Martinus Nijhoff Publishers: Laxenburg, Austria, 1988. [Google Scholar]
  51. Keeney, R.M.; Hertel, T. GTAP-AGR: A Framework for Assessing the Implications of Multilateral Changes in Agricultural Policies: GTAP Technical Paper; Purdue University: West Lafayette, IN, USA, 2005. [Google Scholar]
  52. Britz, W.; Leip, A. Development of marginal emission factors for N losses from agricultural soils with the DNDC-CAPRI meta-model. Agric. Ecosyst. Environ. 2009, 133, 267–279. [Google Scholar] [CrossRef]
  53. Gong, X.; Xu, J. Geopolitical risk and dynamic connectedness between commodity markets. Energy Econ. 2022, 110, 106028. [Google Scholar] [CrossRef]
  54. Mashkova, A.; Bakhtizin, A. Algorithm and data structures of the agent-based model of trade wars. In Proceedings of the 2021 IEEE 15th International Conference on Application of Information and Communication Technologies (AICT), Baku, Azerbaijan, 13–15 October 2021; pp. 1–6. [Google Scholar]
  55. Poledna, S.; Miess, M.G.; Hommes, C.; Rabitsch, K. Economic forecasting with an agent-based model. Eur. Econ. Rev. 2023, 151, 104306. [Google Scholar] [CrossRef]
  56. Butler, K.; Iloska, M.; Djurić, P.M. On counterfactual interventions in vector autoregressive models. In Proceedings of the 2024 32nd European Signal Processing Conference (EUSIPCO), Lyon, France, 26–30 August 2024; IEEE: Piscataway, NJ, USA; pp. 1987–1991. [Google Scholar]
  57. Food and Agriculture Organization of the United Nations. Available online: http://www.fao.org/faostat/en/ (accessed on 2 August 2025).
  58. UN Comtrade: International Trade Statistics. Available online: https://comtradeplus.un.org/TradeFlow (accessed on 2 August 2025).
  59. Hodel, F.; Booth, J. The Beta-Binomial Distribution; Cornell University: Ithaca, NY, USA, 2023. [Google Scholar]
  60. Ferrari, S.; Cribari-Neto, F. Beta regression for modelling rates and proportions. J. Appl. Stat. 2004, 31, 799–815. [Google Scholar] [CrossRef]
  61. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef]
  62. Chung, J.; Gulcehre, C.; Cho, K.; Bengio, Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv 2014, arXiv:1412.3555. [Google Scholar] [CrossRef]
  63. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Polosukhin, I. Attention is all you need. In Proceedings of the 30th Advances in Neural Information Processing Systems, Long Beach, CA, USA, 7–9 December 2017. [Google Scholar]
  64. Dwivedi, V.P.; Bresson, X. A generalization of transformer networks to graphs. arXiv 2020, arXiv:2012.09699. [Google Scholar]
  65. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, MN, USA, 2–7 June 2019; pp. 4171–4186. [Google Scholar]
  66. Dueck, D. Affinity Propagation: Clustering Data by Passing Messages. Ph.D. Thesis, University of Toronto, Toronto, ON, Canada, 2009. [Google Scholar]
Figure 1. Distribution of the normalized export values in the dataset.
Figure 2. The hybrid wheat export forecast model. (a) The model architecture. (b) Recurrent dependencies (dashed lines) between hidden (latent) embeddings of the vertices (countries) and edges (trade flows, colored lines) at steps t and t + 1.
Figure 3. Encoder–decoder approach to ensure predefined constraints on the predicted export values. (a) During training, the decoder takes export and production values from the current step t as input (for example, production in countries i, j, and export between them), but for some randomly picked countries k and l, it takes export for the future step t + 1. The loss for these k, l pairs is masked (red dashed line). (b) During inference, the decoder takes current export and production values, but for some countries k and l, it also takes our predefined limitations (for example, it takes the limit value for the export from k to l, green line).
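At inference, the same decoder interface lets us inject scenario constraints: for selected pairs (k, l), the current export value is replaced by a predefined limit, such as zero for the Australia-to-China flow examined in Figure 5. A hedged sketch of this step (the country indices and the limits dictionary are hypothetical, not taken from the paper):

```python
import numpy as np

def apply_trade_limits(decoder_in, limits):
    """Overwrite chosen (exporter, importer) entries with predefined limits.

    decoder_in: (n, n) array of current exports fed to the decoder
    limits: dict {(k, l): value}, e.g. {(AUS, CHN): 0.0} to forbid a flow
    """
    constrained = decoder_in.copy()
    for (k, l), value in limits.items():
        constrained[k, l] = value  # the green "limit" input in Figure 3b
    return constrained

AUS, CHN = 0, 3  # illustrative indices, not real country codes
exports_t = np.random.default_rng(1).uniform(size=(5, 5))
constrained_in = apply_trade_limits(exports_t, {(AUS, CHN): 0.0})
```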
Figure 4. Wheat trade clusters for some top exporters. Vertices are countries, and edges are sustained trade flows. Edges are colorized as follows: red denotes trade flows that are above average, blue denotes trade flows that are below average, and green denotes trade flows that are close to average. (a) Australia; (b) USA; (c) Russian Federation; (d) France.
Figure 5. Modeling results for the Australian wheat export dynamics in 2018–2022. The left column shows results without any limitations; the right column shows the scenario where exports to China are set to zero.
Table 1. Feature set for the wheat export forecast.
Feature Set | Frequency | For | Features | Unit
Trade flows | Annual | Pair of countries | Export quantity; Import quantity; Re-export quantity; Re-import quantity | Kilograms
Production | Annual | Country | Production quantity | Tonnes
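Because Table 1 mixes units (trade flows in kilograms from UN Comtrade, production in tonnes from FAOSTAT), building a joint feature table requires a unit-harmonization step. A minimal pandas sketch under assumed column names and toy values (the schema and figures here are our illustration, not the providers' API):

```python
import pandas as pd

# Hypothetical annual extracts: Comtrade reports kilograms, FAOSTAT tonnes.
trade = pd.DataFrame({
    "exporter": ["AUS", "AUS"], "importer": ["CHN", "IDN"],
    "year": [2021, 2021], "export_kg": [8.0e8, 4.5e9],
})
production = pd.DataFrame({
    "country": ["AUS"], "year": [2021], "production_t": [3.19e7],
})

# Convert production to kilograms so every quantity shares one unit.
production["production_kg"] = production["production_t"] * 1000.0

# Attach exporter-side production to each annual trade flow.
features = trade.merge(
    production.rename(columns={"country": "exporter"}),
    on=["exporter", "year"], how="left",
)
```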
Table 2. Average wheat export trade forecasting errors for the baseline methods.
Model | MSE, ×10^17 kg² | MAE, ×10^8 kg | MAPE_all | MAPE_large | R²
ARIMA | 1.99 ± 0.13 | 1.82 ± 0.09 | (3.03 ± 0.03) × 10^21 | 0.46 ± 0.02 | 0.72 ± 0.03
GRU | 1.57 ± 0.57 | 1.53 ± 0.18 | 13.88 ± 7.52 | 0.33 ± 0.02 | 0.77 ± 0.04
LSTM | 1.37 ± 0.34 | 1.52 ± 0.15 | 5.97 ± 3.47 | 0.32 ± 0.02 | 0.80 ± 0.03
TKAN | 1.47 ± 0.35 | 1.52 ± 0.13 | 5.55 ± 2.71 | 0.32 ± 0.02 | 0.79 ± 0.03
The most accurate results are highlighted with bold font.
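For reproducibility, the error measures reported in Tables 2 and 3 can be computed as in the sketch below. We assume MAPE_all averages relative errors over all nonzero flows, while MAPE_large restricts the average to flows above some size threshold; this reading of the subscripts, like the threshold value itself, is our assumption rather than a definition stated in the tables.

```python
import numpy as np

def forecast_scores(y_true, y_pred, large_threshold=1e8):
    """MSE, MAE, MAPE (all nonzero flows), MAPE (large flows), and R^2."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mse = np.mean(err ** 2)            # kg^2; divide by 1e17 to match the tables
    mae = np.mean(np.abs(err))         # kg;  divide by 1e8 to match the tables
    nonzero = y_true != 0
    mape_all = np.mean(np.abs(err[nonzero] / y_true[nonzero]))
    large = y_true > large_threshold   # the threshold is an assumed parameter
    mape_large = np.mean(np.abs(err[large] / y_true[large]))
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return mse, mae, mape_all, mape_large, r2
```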
Table 3. Average wheat export trade forecasting errors for the proposed model.
Model | MSE, ×10^17 kg² | MAE, ×10^8 kg | MAPE_all | MAPE_large | R²
Recurrent graph transformer | 1.01 ± 0.75 | 1.26 ± 0.34 | 1.35 ± 0.10 | 0.31 ± 0.04 | 0.85 ± 0.09
Recurrent graph transformer + spectral features | 0.79 ± 0.39 | 1.13 ± 0.26 | 6.32 ± 3.95 | 0.25 ± 0.05 | 0.88 ± 0.05
Recurrent graph transformer (encoder–decoder) | 0.71 ± 0.16 | 1.13 ± 0.12 | 17.46 ± 2.06 | 0.26 ± 0.05 | 0.88 ± 0.04
The most accurate results are highlighted with bold font.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
