Article

Interpretable Nonlinear Forecasting of China’s CPI with Adaptive Threshold ARMA and Information Criterion Guided Integration

1 School of Humanities and Arts, Tianjin University, Tianjin 300072, China
2 Key Laboratory of Ethnic Language Intelligent Analysis and Security Governance of MOE, Minzu University of China, Beijing 100081, China
* Author to whom correspondence should be addressed.
Big Data Cogn. Comput. 2026, 10(1), 14; https://doi.org/10.3390/bdcc10010014
Submission received: 6 November 2025 / Revised: 15 December 2025 / Accepted: 24 December 2025 / Published: 1 January 2026
(This article belongs to the Special Issue Artificial Intelligence in Digital Humanities)

Abstract

Accurate forecasting of China’s Consumer Price Index (CPI) is crucial for effective macroeconomic policymaking, yet remains challenging due to structural breaks and nonlinear dynamics inherent in the inflation process. Traditional linear models, such as ARIMA, often fail to capture threshold effects and regime shifts. This study introduces a Threshold Autoregressive Moving Average (TARMA) model that embeds a nonlinear threshold mechanism within the conventional ARMA framework, enabling it to better capture the CPI’s complex behavior. Leveraging an evolutionary modeling approach, the TARMA model effectively identifies high- and low-inflation regimes, offering enhanced flexibility and interpretability. Empirical results demonstrate that TARMA significantly outperforms standard models. Specifically, regarding the CPI Index level, the out-of-sample Mean Absolute Percentage Error (MAPE) is reduced to approximately 0.35% (under the S-BIC integration scheme), significantly improving upon the baseline ARIMA model. The model adapts well to inflation regime shifts and delivers substantial improvements near turning points. Furthermore, integrating an information-criterion-based weighting scheme further refines forecasts and reduces errors. By addressing the limitations of linear models through threshold-driven nonlinearity, this study offers a more accurate and interpretable framework for forecasting China’s CPI inflation.

1. Introduction

The Consumer Price Index (CPI) serves as a fundamental measure of inflation, closely tied to public welfare and macroeconomic stability. Fluctuations in CPI affect not only the real purchasing power of households but also inform critical macroeconomic decisions, including monetary policy direction and capital market trends. As both a global manufacturing hub and a major consumer market, China plays a pivotal role in international supply chains and commodity pricing. Consequently, changes in China’s CPI often generate international spillover effects via commodity prices and trade flows. Thus, CPI volatility is not only a key focus of domestic macroeconomic regulation but also a critical indicator for global markets evaluating inflationary dynamics [1]. Maintaining overall price stability remains a central objective of national macroeconomic control. Achieving this goal requires forward-looking assessments of inflation trends, which underscores the urgent need for accurate CPI forecasting tools. Reliable inflation forecasts enable policymakers to implement timely interventions that stabilize prices and anchor inflation expectations, thereby promoting sustainable economic growth [2]. As early as 1965, Fama distinguished between linear and nonlinear stationary time series models. In many cases, linear models fail to fully capture the dynamics of economic or financial variables, particularly when systems exhibit nonlinear characteristics [3]. While linear models such as autoregressive (AR) models may suffice when nonlinear effects are weak or confined within specific regimes, they are inadequate when nonlinearities significantly shape system behavior. In such cases, nonlinear time series models are more appropriate.
While deep generative architectures—such as Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and Diffusion models—represent the current frontier in artificial intelligence, their application to monthly macroeconomic indicators faces specific challenges regarding interpretability and sample size. Policymakers prioritize white-box models that can explicitly identify inflation thresholds and regime-switching mechanisms to guide monetary interventions. Deep learning models, often operating as “black boxes,” lack this structural transparency. Furthermore, China’s monthly CPI data constitutes a “small data” environment (limited to a few hundred observations) where high-capacity neural networks are prone to overfitting. Therefore, this study adopts the Adaptive Threshold Autoregressive Moving Average (TARMA) framework. By adhering to the parsimony principle, our approach effectively captures nonlinear asymmetries and structural breaks without the computational opacity and data hunger associated with deep generative models.
Compared to their linear counterparts, nonlinear models involve greater complexity in theoretical formulation, parameter estimation, and forecasting. As a result, it is necessary to assess the presence and significance of nonlinearity before applying such models. If nonlinear features are statistically insignificant, the series can be approximated as linear. Otherwise, a nonlinear modeling framework is warranted to capture the underlying dynamics accurately. Various nonlinear phenomena observed in engineering systems—including limit cycles, resonance jumps, and amplitude–frequency dependence—highlight the richness of nonlinear behavior [4]. These phenomena are modeled using equations such as the Van der Pol equation for limit cycles, the Duffing equation for resonance, and formulations for systems with variable stiffness in amplitude–frequency dependency. Capturing such dynamics requires nonlinear modeling approaches.
To address these complexities, a range of nonlinear parametric models has been developed, including the Self-Exciting Threshold autoregressive model (SETAR), bilinear models, neural networks, exponential models, and state-dependent models. Among these, the Threshold Autoregressive (TAR) model stands out for its flexibility in modeling regime-switching behavior and for offering physically interpretable parameters [5]. It has therefore found wide application in time series analysis. Building upon this foundation, the present study proposes a Threshold Autoregressive Moving Average (TARMA) model, constructed using evolutionary algorithms. This approach combines the threshold-switching capability of TAR models with the short-run dynamics captured by moving average components, providing a robust and interpretable framework for forecasting China’s CPI inflation.
The TAR model was first proposed by H. Tong in 1978. The core idea involves introducing one or more threshold values within the range of the observed time series $\{x_t\}$, thereby partitioning the state space $\mathbb{R}$ into $l$ mutually exclusive subsets. Within each subset, the time series follows a distinct linear function, commonly referred to as a piecewise linear function. This can be formally defined by Equation (1):
$$x_t = \sum_{j=1}^{l}\left(\varphi_{j0} + \sum_{k=1}^{p_j}\varphi_{jk}\,x_{t-k}^{(j)} + a_t^{(j)}\right) I\!\left(x_{t-d}\in R_j\right) \tag{1}$$
In Equation (1), $I(\cdot)$ denotes the indicator function, and $R_j$ represents a mutually exclusive subset of the real line $\mathbb{R}$, such that $\mathbb{R} = R_1 \cup R_2 \cup \cdots \cup R_l$, where each $R_j = \{\,r_j \le x_{t-d} < r_{j+1}\,\}$, $j = 1, \ldots, l$. The positive integer $d$ denotes the delay parameter, $r_1, r_2, \ldots, r_l$ are the threshold parameters, and $\{a_t^{(j)}\}$ is a white noise sequence. Equation (1) thus defines the classical threshold autoregressive (TAR) model. The vector of model parameters, denoted by $\theta$, is composed of the coefficients associated with each regime, as well as the threshold and delay parameters: $\{\,l, d;\ \varphi_{jk},\ j = 1, \ldots, l,\ k = 1, \ldots, p_j;\ r_i,\ i = 1, \ldots, l\,\}$.
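To make the regime-switching mechanism of Equation (1) concrete, the following minimal Python sketch simulates a two-regime ($l = 2$) TAR process. All coefficient values, the threshold, and the delay are hypothetical illustrations, not estimates from this study:

```python
import random

def simulate_tar(n, d, r, params_low, params_high, sigma=0.5, seed=42):
    """Simulate a two-regime TAR process (Equation (1) with l = 2).

    params_* = (intercept, [AR coefficients]); the regime generating x_t
    is selected by comparing the delayed value x_{t-d} with threshold r.
    """
    rng = random.Random(seed)
    p = max(len(params_low[1]), len(params_high[1]))
    x = [0.0] * (p + d)              # start-up values
    burn_in = 100                    # discard transients
    for _ in range(burn_in + n):
        # Threshold condition: which regime applies to the next observation?
        phi0, phis = params_low if x[-d] <= r else params_high
        mean = phi0 + sum(c * x[-k - 1] for k, c in enumerate(phis))
        x.append(mean + rng.gauss(0.0, sigma))
    return x[-n:]

# Hypothetical regimes: weak persistence below the threshold, strong above it.
series = simulate_tar(n=200, d=1, r=3.0,
                      params_low=(0.5, [0.4]),
                      params_high=(1.0, [0.8]))
```

With these illustrative coefficients the process alternates between a low-mean and a high-mean piecewise-linear regime depending on the lagged value, which is exactly the behavior Equation (1) formalizes.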
Currently, the main approaches to estimating the parameters of the TAR model include the H. Tong method, the D.C. method, and local interval search methods. However, each method suffers from notable limitations:
1.
H. Tong Method: This approach employs a one-dimensional optimization strategy. However, given the high dimensionality and nonlinear complexity of the TAR model’s parameter space, such a method is limited in its ability to identify the global optimum. One-dimensional searches are inherently inefficient for navigating complex, multidimensional landscapes.
2.
D.C. Method: This technique uses a pointwise plotting strategy to delineate the parameter search space. While intuitive, it introduces a high degree of subjectivity and lacks scalability. Specifically, the method becomes inefficient—or entirely impractical—when the threshold space is finely partitioned or when multiple candidate thresholds must be considered simultaneously.
3.
Local Interval Search Method: This method is restricted to TAR models with exactly two regimes (i.e., $l = 2$). In addition, the procedures used to determine key parameters, such as the threshold value $r_1$ and the delay parameter $d$, are often arbitrary. These values are typically selected on the basis of limited empirical trials, which may not generalize well across datasets with different characteristics.
Due to the inherent limitations of these traditional techniques, this study adopts an evolutionary algorithm to optimize the threshold structure of the model. As noted by prior research, the Autoregressive Moving Average (ARMA) model typically achieves a more parsimonious structure and lower model order than a pure autoregressive (AR) model [5]. Building on this insight, we extend the classical TAR model to a Threshold Autoregressive Moving Average (TARMA) model. Under this extension, Equation (1) is reformulated as follows:
$$x_t = \sum_{j=1}^{l}\left(\varphi_{j0} + \sum_{k=1}^{p_j}\varphi_{jk}\,x_{t-k}^{(j)} + \sum_{m=1}^{q_j}\theta_{jm}\,a_{t-m}^{(j)} + a_t^{(j)}\right) I\!\left(x_{t-d}\in R_j\right) \tag{2}$$
In Equation (2), all parameters retain the same definitions as in Equation (1), with the exception of an additional set of parameters $\{\theta_{jm},\ j = 1, \ldots, l,\ m = 1, \ldots, q_j\}$, which represent the moving average (MA) components introduced to capture short-term stochastic dependencies.
This study aims to embed a threshold mechanism within the ARMA framework, resulting in the development of a Threshold Autoregressive Moving Average (TARMA) model for modeling and forecasting China’s Consumer Price Index (CPI) time series. The research is guided by two central questions:
1.
Threshold Effects: Does China’s CPI time series exhibit statistically significant threshold effects—i.e., do its dynamic properties and volatility patterns differ when the inflation rate crosses critical levels?
2.
Forecast Performance: Can the proposed TARMA model deliver improved forecast accuracy relative to conventional linear models?
This study makes several significant contributions to the literature on inflation forecasting and nonlinear dynamics. Methodologically, we pioneer the application of the Threshold Autoregressive Moving Average (TARMA) model to the analysis of China’s CPI. The TARMA framework integrates short-term ARMA dynamics with threshold nonlinearity, effectively modeling regime-dependent structural shifts—an improvement over linear ARIMA and classical TAR models. Empirically, TARMA shows superior accuracy and robustness over linear benchmarks and selected machine learning models. Crucially, it captures regime transitions during key inflation turning points, yielding a Mean Absolute Percentage Error (MAPE) below 3%, which significantly outperforms the approximately 5% error margin observed in linear models. Theoretically, this study bridges the gap between nonlinear time series analysis and macroeconomic theory by articulating the conceptual mechanisms driving regime shifts. We interpret the identified thresholds not as arbitrary statistical boundaries, but as empirical manifestations of State-Dependent Pricing Theory and Asymmetric Policy Preferences. Specifically, the identified “Low-Inflation Regime” aligns with the “Rational Inattention” hypothesis, where mild price fluctuations fail to overcome “menu costs,” leading to structural inertia. Conversely, the “High-Inflation Regime” reflects a state of “active adjustment,” where breached tolerance levels force both economic agents (to adjust prices frequently) and policymakers (to intervene via administrative controls) to react aggressively. By mapping the structural breaks of the TARMA model to these behavioral tipping points, our work reinforces the originality of using evolutionary computation to decode the nonlinear logic governing the inflation dynamics of China.
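For reference, the out-of-sample MAPE figures cited above are means of absolute percentage errors. A minimal sketch of the metric follows; the index values and forecasts are made up for illustration and are not the paper's data:

```python
def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent."""
    return 100.0 * sum(abs(a - f) / abs(a)
                       for a, f in zip(actual, forecast)) / len(actual)

# Illustrative (hypothetical) CPI index levels and model forecasts:
actual   = [102.1, 102.4, 102.0, 101.8]
forecast = [102.3, 102.1, 102.2, 101.9]
error = mape(actual, forecast)   # small errors on index levels give sub-1% MAPE
```

Because CPI index levels hover around 100, even modest absolute errors translate into MAPE values well below 1%, which is why index-level MAPE figures such as 0.35% are much smaller than MAPE computed on inflation rates.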
While we recognize that the frontier of macroeconomic forecasting is increasingly shifting toward probabilistic frameworks that explicitly quantify uncertainty (e.g., density forecasting), it is important to clarify that the primary objective of this study is to investigate the structural interpretability and nonlinear point-forecasting accuracy of the TARMA framework. Given the “black-box” nature of many emerging AI models, our focus remains on establishing a transparent, interpretable mechanism for identifying regime switches in China’s CPI. Consequently, this study reports deterministic point forecasts to validate the model’s ability to capture mean dynamics near turning points. We acknowledge that quantifying the full predictive distribution is crucial for risk assessment; therefore, we outline the theoretical extension of our model to probabilistic intervals in the Discussion section, positioning the full empirical implementation of density forecasting as a critical avenue for future research.
While traditional “Big Data” research often focuses on the volume of data, the scope of Cognitive Computing extends to understanding the complex, nonlinear dynamics of collective human behavior through interpretable computational frameworks. Our study aligns with this paradigm in two specific ways:
1.
We treat the inflation process not merely as a mechanical time series, but as a reflection of collective economic cognition. The threshold mechanism in our TARMA model embodies this view: economic agents switch regimes only when specific latent thresholds are breached. By employing Evolutionary Algorithms (EAs) to adaptively learn these thresholds, we effectively reverse-engineer the cognitive tipping points of the market, a core objective of AI in Digital Humanities.
2.
We address the critical challenge of Interpretability in AI. In high-stakes economic forecasting, “black-box” Deep Learning models often fail to provide actionable insights. Our IC-Guided Integration strategy offers a “glass-box” alternative that balances the high-dimensional search capability of computational intelligence with the structural interpretability required for human understanding. Thus, this work serves as a case study in applying interpretable AI to decode the complex behavioral rules governing macroeconomic systems.
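The paper's exact S-BIC integration scheme is presented in the methodology; as a generic illustration of information-criterion-guided weighting, a common convention assigns each candidate model a weight proportional to $\exp(-\Delta_i/2)$, where $\Delta_i$ is its IC distance from the best-scoring model. The sketch below uses hypothetical BIC values and forecasts and is not necessarily the exact scheme used in this study:

```python
import math

def ic_weights(ic_values):
    """Generic information-criterion weights: w_i proportional to
    exp(-delta_i / 2), where delta_i = IC_i - min(IC). Lower IC means
    higher weight; the weights sum to one."""
    best = min(ic_values)
    raw = [math.exp(-(ic - best) / 2.0) for ic in ic_values]
    total = sum(raw)
    return [w / total for w in raw]

def combine_forecasts(forecasts, ic_values):
    """IC-weighted average of candidate model point forecasts."""
    return sum(w * f for w, f in zip(ic_weights(ic_values), forecasts))

# Hypothetical BIC scores for three candidate models and their forecasts:
bic = [210.4, 212.1, 215.8]
point = combine_forecasts([2.1, 2.4, 1.9], bic)
```

The combined forecast is pulled toward the model with the lowest information criterion, which is the intuition behind IC-guided integration.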
To present the research in a clear and organized manner, the remainder of this paper is structured as follows. Section 2 reviews the relevant literature on CPI forecasting. It critically examines prior studies, discusses the pros and cons of linear and nonlinear models, and identifies the research gaps addressed by this study. Section 3 details the data and methodology. It describes the data source, sample characteristics, and statistical properties of the CPI series, followed by a detailed exposition of the Threshold Autoregressive Moving Average (TARMA) modeling process, including threshold identification, parameter estimation, and diagnostic checks. Section 4 presents the empirical results. It reports model estimation and forecasting outcomes, evaluates in-sample fit and out-of-sample performance, interprets regime-specific parameters, and analyzes the sources of forecasting error. Section 5 concludes the study. It summarizes the key findings regarding TARMA’s effectiveness in forecasting China’s CPI and distinguishing inflation regimes, underscores the theoretical and practical implications, acknowledges the study’s limitations, and suggests avenues for future research.

2. Related Work

2.1. Theoretical Underpinnings of Nonlinear Inflation Dynamics

The rationale for employing the TARMA framework rests on two theoretical pillars: state-dependent pricing and asymmetric policy reactions.
First, Menu Cost Theory posits that fixed adjustment costs create a “zone of inaction” under low inflation, resulting in high persistence [6,7]. However, accumulated shocks breaching a critical threshold trigger collective repricing, naturally generating macro-level nonlinearities.
Second, the Asymmetric Policy Preference hypothesis implies structural discontinuities in governance. In China, authorities typically tolerate mild inflation (e.g., <3%) to prioritize growth, fostering an inertial regime. Conversely, breaching a “social stability threshold” triggers aggressive intervention and rapid mean-reversion. Our TARMA model is explicitly designed to capture these latent structural breakpoints driven by market frictions and policy asymmetry.

2.2. Traditional Time Series Models in CPI Forecasting

Autoregressive Moving Average (ARMA) models, along with their generalized form, the Autoregressive Integrated Moving Average (ARIMA) models, have long been foundational tools for forecasting economic time series, including inflation indicators such as the Consumer Price Index (CPI). The Box–Jenkins methodology, introduced by Box and Jenkins [8], provides a systematic framework for identifying, estimating, and diagnosing ARIMA models. Within this framework, the ARIMA model is typically denoted as ARIMA$(p, d, q)$, where $p$ is the autoregressive order, $d$ is the degree of differencing required to achieve stationarity, and $q$ is the moving average order.
The general structure of an ARIMA model is given as:
$$\left(1 - \phi_1 L - \cdots - \phi_p L^p\right)(1 - L)^d\, y_t = \left(1 + \theta_1 L + \cdots + \theta_q L^q\right)\epsilon_t \tag{3}$$
where:
$L$ denotes the lag operator;
$d$ is the number of differences taken to render the series stationary;
$\phi_i$ are the autoregressive (AR) coefficients;
$\theta_j$ are the moving average (MA) coefficients;
$\epsilon_t$ is a white noise error term.
In the simplified case when $d = 0$ (i.e., when the series is already stationary), the model reduces to a standard ARMA form, which expresses the current value of the time series as a linear combination of its past values and past shocks:
$$y_t = \sum_{i=1}^{p}\phi_i\, y_{t-i} + \sum_{j=1}^{q}\theta_j\, \epsilon_{t-j} + \epsilon_t \tag{4}$$
This flexible linear structure enables ARMA models to effectively capture autocorrelation patterns within stationary time series. Since the late 20th century, ARIMA models have been widely used in CPI (inflation) forecasting across various countries. Numerous early studies have shown that univariate ARIMA models perform well in short-term inflation forecasting, especially when employed as benchmark models. For example, central bank researchers have observed that ARIMA models provide robust predictive accuracy for near-term CPI during periods of economic stability [9]. The widespread adoption of ARIMA models in inflation forecasting is largely due to their simplicity and solid theoretical foundation in stochastic process theory.
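As a concrete illustration of the stationary ARMA recursion in Equation (4), the sketch below computes a one-step-ahead forecast given known coefficients, building the residual sequence recursively with zero start-up values. The coefficients and data are hypothetical, the intercept is omitted (a mean-adjusted series is assumed), and in practice the parameters would be estimated, e.g., by maximum likelihood:

```python
def arma_one_step(y, phi, theta):
    """One-step-ahead forecast for an ARMA(p, q) process with known
    coefficients, mirroring y_t = sum_i phi_i*y_{t-i}
    + sum_j theta_j*eps_{t-j} + eps_t. Residuals are reconstructed
    recursively, treating pre-sample values as zero."""
    p, q = len(phi), len(theta)
    eps = []
    for t in range(len(y)):
        ar = sum(phi[i] * y[t - 1 - i] for i in range(p) if t - 1 - i >= 0)
        ma = sum(theta[j] * eps[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
        eps.append(y[t] - ar - ma)
    # Forecast of y_{T+1}: the future shock eps_{T+1} is set to its zero mean.
    ar = sum(phi[i] * y[len(y) - 1 - i] for i in range(p))
    ma = sum(theta[j] * eps[len(eps) - 1 - j] for j in range(q))
    return ar + ma

# Hypothetical ARMA(1,1) coefficients applied to a short inflation series:
y = [2.0, 2.2, 2.1, 2.3, 2.4, 2.2]
fcast = arma_one_step(y, phi=[0.7], theta=[0.3])
```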
Clements and co-authors, in their study on Irish inflation, found that univariate ARIMA models yielded relatively low extrapolation errors in short-term forecasts. Similarly, Álvarez-Díaz, through a comprehensive meta-analysis, concluded that simple univariate forecasting models, including AR-type models, performed on par with more complex models in predicting U.S. inflation, particularly when inflation fluctuated around a stable mean [10].
In the Chinese context, many analysts have adopted ARIMA models for short-term CPI forecasting and reported satisfactory results. These studies generally find that ARIMA models can effectively capture both the seasonal and trend components of China’s CPI, offering reasonably accurate short-term inflation forecasts. Since the ARIMA model is based on historical values, it performs particularly well when future patterns closely resemble past behavior. In fact, during periods of low and stable inflation—often referred to as “inflation stability” regimes—ARIMA-based forecasts tend to align closely with actual CPI data, resulting in low mean absolute errors.
However, while ARIMA remains a useful benchmark model under stable conditions, it lacks the structural flexibility to account for nonlinear regime shifts, structural breaks, or dynamic heteroskedasticity. This limitation has prompted researchers to investigate more advanced and nonlinear modeling techniques that are better suited to capture the complex dynamics of inflation.

2.3. Machine Learning Approaches in Macroeconomic Forecasting

Advancements in computational power and algorithmic development have led to the growing application of machine learning (ML) techniques in macroeconomic forecasting. Feng employed a Long Short-Term Memory (LSTM) neural network to forecast China’s CPI and found that its prediction error was significantly lower than that of the classical Vector Autoregression (VAR) model, indicating the superior accuracy of LSTM in inflation forecasting [11]. Naghi et al. compared various ML models—including regression trees, random forests, and neural networks—with traditional benchmarks such as ARIMA [12]. Their results showed that ensemble learning algorithms consistently outperformed autoregressive models across all forecast horizons. Similarly, Medeiros et al. demonstrated that random forests reduced the variance of U.S. inflation forecast errors by up to 30%, with particularly strong performance during periods of elevated volatility [13]. These studies suggest that ML models can effectively capture complex inflation dynamics and often outperform linear models when sufficient data are available.
However, ML approaches also present notable limitations. First, policymakers often require interpretable justifications for inflation forecasts, whereas many ML models operate as “black boxes” without transparent mechanisms grounded in economic theory. Second, ML models are highly data-intensive and susceptible to overfitting. This limitation is particularly acute for advanced generative architectures (e.g., GANs and diffusion models), which typically require massive datasets to learn stable latent representations. In the context of monthly macroeconomic series, where data points are scarce, the theoretical advantage of these complex architectures over parsimonious nonlinear statistical models remains an open question. Consequently, traditional nonlinear extensions that balance flexibility with structural interpretability remain the preferred tool for policy-oriented empirical analysis. Macroeconomic datasets are typically limited in sample size and subject to structural shifts over time. As a result, models trained on short or nonstationary datasets may fit historical noise rather than generalizable patterns.

2.4. Nonlinear Time Series Models Under Threshold Frameworks

To address the limitations of linear models, researchers have developed nonlinear time series models capable of capturing regime shifts, asymmetries, and other complex behaviors in economic data. Compared to the “black-box” nature of many ML methods, nonlinear econometric models offer greater interpretability and are specifically designed to handle nonlinear dynamics that linear ARIMA models cannot accommodate. Representative models in this category include the Threshold Autoregressive (TAR) model, Smooth Transition Autoregressive (STAR) model, and Markov-Switching Autoregressive (MS-AR) model. These frameworks enable the time series dynamics to evolve across distinct regimes.
The TAR model, originally proposed by Tong and further developed by Tong and Lim, assumes that the series follows different autoregressive processes depending on whether certain threshold conditions are met [14]. TAR models can produce limit cycles and are well-suited for capturing periodic economic behaviors.
The STAR model, introduced by Granger and Teräsvirta and refined by Teräsvirta, allows for smooth transitions between regimes. It typically employs logistic or exponential transition functions to interpolate between two regime-specific autoregressive equations, offering a continuous alternative to discrete switching [15].
The MS-AR model, developed by Hamilton, posits that the time series is governed by a latent Markov process with probabilistic state transitions. Each state corresponds to a distinct AR (or ARIMA) process, and the switching mechanism follows a first-order Markov chain [16].
The Threshold Autoregressive Moving Average (TARMA) model extends the linear ARMA structure by incorporating regime-specific dynamics. Unlike TAR or SETAR models, which allow only the AR coefficients to vary across regimes, the TARMA model permits both AR and MA coefficients to change with regime shifts. This added flexibility enables the TARMA framework to more effectively capture the persistence and transmission of shocks under varying economic conditions.
A two-regime TARMA model can be formally expressed as:
$$y_t = \begin{cases} \phi_{1,0} + \displaystyle\sum_{i=1}^{p_1}\phi_{1,i}\,y_{t-i} + \sum_{j=1}^{q_1}\theta_{1,j}\,\epsilon_{t-j} + \epsilon_t, & \text{if } y_{t-d} \le r,\\[1ex] \phi_{2,0} + \displaystyle\sum_{i=1}^{p_2}\phi_{2,i}\,y_{t-i} + \sum_{j=1}^{q_2}\theta_{2,j}\,\epsilon_{t-j} + \epsilon_t, & \text{if } y_{t-d} > r, \end{cases} \tag{5}$$
where $r$ is the threshold parameter, $d$ is the delay lag, $\phi_{k,i}$ and $\theta_{k,j}$ denote the autoregressive and moving average coefficients under regime $k = 1, 2$, and $\epsilon_t$ is a white noise error term.
In the above formulation, the threshold variable is typically chosen as $y_{t-d}$, although a relevant exogenous indicator may be used instead. Each regime $k$ is allowed to have its own autoregressive order $p_k$ and moving average order $q_k$. In empirical applications, it is common, though not strictly necessary, to specify identical lag structures across regimes to simplify estimation and interpretation.
It is important to distinguish the proposed TARMA framework from the classical TAR models of the 1980s. While pure autoregressive threshold models are well-established, the integration of Moving Average (MA) components into regime-switching contexts remains a frontier challenge in econometrics due to identification complexities. Far from being obsolete, the theoretical and empirical investigation of TARMA models has experienced a significant renaissance in recent years.
Notably, Goracci et al. [17] recently provided the first rigorous theoretical justification for the robust estimation of TARMA models in the Journal of Business & Economic Statistics, highlighting their superior parsimony over high-order TAR models. Similarly, Giannerini et al. and Angelini et al. have advanced the testing procedures for threshold effects in Journal of Econometrics and Oxford Bulletin of Economics and Statistics, respectively [18,19]. These studies confirm that TARMA models represent a current “sweet spot” in time series analysis—balancing the structural interpretability required for policy analysis with the capability to capture complex, non-Markovian shock transmission. Our study contributes to this emerging stream by introducing a heuristic evolutionary identification strategy that complements these recent theoretical advances.
Despite the growing body of research on linear models, machine learning algorithms, and relatively simple nonlinear approaches, a notable gap remains: no existing study has applied the threshold ARMA framework to model or forecast CPI inflation in China—or in comparable emerging economies.
This paper seeks to fill that gap by constructing a novel and potentially more robust forecasting model. The proposed framework integrates the foundational ARIMA structure with empirically validated threshold effects and employs the latest estimation techniques for TARMA models. The objective is to enhance the accuracy of CPI inflation forecasting, inform macroeconomic policy and investment decisions, and demonstrate the practical applicability of TARMA models in macroeconomic analysis.

3. Threshold ARMA (TARMA) Model Framework

Macroeconomic time series often exhibit state dependence and nonlinear dynamics; across different economic regimes, the persistence of shocks and the degree of volatility can vary substantially. Traditional linear ARMA models, which rely on a single-mechanism structure, are limited in their ability to simultaneously capture such regime-specific behaviors.
The Threshold Autoregressive Moving Average (TARMA) model addresses this limitation by embedding a threshold mechanism within the ARMA framework. This mechanism enables the process to switch automatically between regime-specific submodels, depending on whether a threshold variable falls within a defined interval. In doing so, the model approximates regime heterogeneity using a piecewise-linear structure.
For example, when modeling year-over-year CPI inflation, once a lagged value exceeds a predetermined threshold, the process may transition into a "high-inflation regime" characterized by stronger persistence or increased volatility. Conversely, if the lagged value remains below the threshold, the process follows a "low-inflation regime" governed by a different, typically more moderate, dynamic. This design improves both the model's alignment with empirical economic patterns and its predictive performance, while preserving interpretability.
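The switching rule described above can be sketched as a simple labeling function over an inflation series. The threshold, delay, and readings below are hypothetical, chosen only to illustrate the mechanism:

```python
def label_regimes(x, r, d):
    """Label each observation 'low' or 'high' according to whether the
    delayed value x_{t-d} exceeds the threshold r. The first d
    observations have no delayed value available and get None."""
    labels = []
    for t in range(len(x)):
        if t - d < 0:
            labels.append(None)
        else:
            labels.append('high' if x[t - d] > r else 'low')
    return labels

# Hypothetical year-over-year inflation readings with a 3% threshold:
infl = [1.8, 2.5, 3.4, 4.1, 3.0, 2.2]
regimes = label_regimes(infl, r=3.0, d=1)
```

Note that the regime label at time $t$ depends on the value $d$ periods earlier, so the process reacts to breaches of the threshold with a lag, which is what allows the model to capture delayed regime transitions.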

3.1. Mathematical Specification of the TARMA Model

Let $\{X_t\}$ denote the year-on-year CPI inflation rate for China at time $t$, computed from the official CPI published by the National Bureau of Statistics. The TARMA model is estimated on this inflation-rate series, and the threshold mechanism is specified with respect to lagged values of $\{X_t\}$. In what follows, we use $\{X_t\}$ to represent the dependent variable throughout the model specification.
In the two-regime case, let the threshold variable be a specific lag of the series. There exists a threshold value r and a delay parameter d such that, depending on the value of the threshold variable, the series is governed by one of two distinct ARMA submodels. The error term is assumed to be a white noise process with zero mean and constant variance, although extensions may allow for regime-specific variances.
1.
When the threshold mechanism is inactive (i.e., only one regime is present), the model is reduced to the standard linear ARMA.
2.
When the model includes only autoregressive terms, it reduces to the traditional TAR model.
To streamline the model specification, the regime indicator functions are introduced. These functions determine, based on the threshold condition, whether an observation belongs to the low- or high-inflation regime. Each regime is associated with its own set of ARMA parameters. This formulation explicitly captures the behavior of “threshold-triggered switching” while preserving the ability of the ARMA model to describe short-run autocorrelation and shock propagation.
Formally, the two-regime TARMA model can be written as follows: let the time series X t satisfy the relation defined by a threshold value r and a delay d, such that:
$$X_t = \begin{cases} \phi_{1,0} + \displaystyle\sum_{i=1}^{p_1}\phi_{1,i}\,X_{t-i} + \sum_{j=1}^{q_1}\theta_{1,j}\,\varepsilon_{t-j} + \varepsilon_t, & \text{if } X_{t-d} \le r,\\[1ex] \phi_{2,0} + \displaystyle\sum_{i=1}^{p_2}\phi_{2,i}\,X_{t-i} + \sum_{j=1}^{q_2}\theta_{2,j}\,\varepsilon_{t-j} + \varepsilon_t, & \text{if } X_{t-d} > r. \end{cases} \tag{6}$$
Among these terms, $\varepsilon_t$ denotes the stochastic error, assumed to be a zero-mean, constant-variance white-noise process. For simplicity, one may assume $\varepsilon_t \sim N(0, \sigma^2)$ with $\sigma^2$ identical across regimes, an assumption that can later be relaxed to allow regime-specific variances. The model comprises two sets of ARMA$(p, q)$ parameters: $\phi_{1,i}, \theta_{1,j}$ for the low-inflation regime (regime 1) and $\phi_{2,i}, \theta_{2,j}$ for the high-inflation regime (regime 2), together with a common threshold $r$ and delay $d$. When $q = 0$, the above specification degenerates to the classical threshold autoregressive (TAR) model [20]; when there is only a single regime, it reduces to the standard linear ARMA model. Accordingly, the threshold ARMA may be viewed as one nonlinear generalization of the conventional ARMA. The foregoing model can also be written more compactly using a regime indicator. Define the indicator variable $I_t = 1$ when $x_{t-d} > \gamma$ and $I_t = 0$ otherwise. The model can then be written uniformly as:
$$
X_t = \phi_{1,0}(1 - I_t) + \phi_{2,0} I_t + \sum_{i=1}^{p} \left[ \phi_{1,i}(1 - I_t) + \phi_{2,i} I_t \right] X_{t-i} + \sum_{j=1}^{q} \left[ \theta_{1,j}(1 - I_t) + \theta_{2,j} I_t \right] \varepsilon_{t-j} + \varepsilon_t
$$
Here, I t is determined by X t d . This formulation makes clear that when I t = 0 (low-inflation regime), the series is governed by the first set of parameters; when I t = 1 (high-inflation regime), it is governed by the second set of parameters.
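To make the switching mechanism concrete, the indicator-form model above can be simulated with a short sketch (pure Python; the function name and all coefficient values below are hypothetical and chosen only for illustration):

```python
import random

def simulate_tarma(n, phi, theta, r, d=1, sigma=1.0, seed=0):
    """Simulate a two-regime TARMA(1,1) series.

    phi maps regime (0 = low, 1 = high) to (intercept, AR(1) coefficient);
    theta maps regime to the MA(1) coefficient. The regime indicator
    I_t = 1(x_{t-d} > r) selects which parameter set drives period t.
    """
    rng = random.Random(seed)
    x, eps = [0.0], [0.0]
    for t in range(1, n):
        e = rng.gauss(0.0, sigma)
        regime = 1 if x[t - d] > r else 0          # threshold-triggered switch
        c, phi1 = phi[regime]
        theta1 = theta[regime]
        x.append(c + phi1 * x[t - 1] + theta1 * eps[t - 1] + e)
        eps.append(e)
    return x

# Hypothetical coefficients: persistent low regime, mean-reverting high regime.
series = simulate_tarma(
    200,
    phi={0: (0.5, 0.8), 1: (2.0, 0.3)},
    theta={0: 0.4, 1: -0.5},
    r=3.0,
)
```

The only nonlinearity is the branch on x[t − d]; within each regime the recursion is an ordinary ARMA(1,1) update.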

3.2. Parameter Interpretation and Economic Significance

(1)
Threshold γ
The threshold γ is a key parameter in the model, serving as the critical value that divides the inflation process into different regimes. It determines the level of the threshold variable (typically a lagged inflation value) at which the model switches from one set of dynamics to another. Economically, the estimated γ can be interpreted as a reference inflation level. For example, if γ ≈ 3%, it implies that when inflation exceeds 3%, the economy enters a “high-inflation” regime characterized by different behavioral dynamics. This threshold may align with policy targets—such as a central bank’s inflation ceiling—or reflect a psychologically significant limit in public or market expectations.
(2)
Autoregressive Coefficients ϕ k , i
The coefficients ϕ k , i represent the effect of the i-th lag of inflation on current inflation in regime k, where k = 1 corresponds to the low-inflation regime and k = 2 to the high-inflation regime. Of particular interest is the first-order coefficient ϕ k , 1 , which measures the persistence or inertia of inflation [21]. If ϕ 2 , 1 is significantly greater than ϕ 1 , 1 , this suggests that inflation in the high-inflation regime tends to persist more strongly over time; that is, a large increase in the previous period is likely to be followed by continued high inflation. Conversely, if ϕ 2 , 1 is relatively small or near zero while ϕ 1 , 1 is large, this indicates that inflation in the high-inflation regime is less persistent—possibly due to effective policy interventions or demand suppression—while low-inflation regimes exhibit more history-dependent behavior. Comparing ϕ 1 , i and ϕ 2 , i thus provides insight into how the autocorrelation structure of inflation varies between regimes.
(3)
Moving Average Coefficients θ k , j
The coefficients θ k , j capture the influence of the j-th lag of the stochastic error term ε t j on current inflation in regime k. These parameters describe how short-term random shocks propagate through the inflation process. Differences between θ 2 , j and θ 1 , j indicate that the response to identical shocks varies between regimes. For example, if θ 2 , 1 is small while θ 1 , 1 is larger, this implies that in high-inflation regimes shocks such as sudden price spikes have weaker and more short-lived effects on inflation, potentially due to rapid policy or market adjustments. In contrast, during low-inflation periods, such shocks may persist longer, suggesting slower responses or lower price flexibility. Analyzing the MA coefficients across regimes therefore reveals the differing transmission mechanisms of shocks in high- and low-inflation environments. By estimating the parameters of the threshold ARMA model, we gain a deeper understanding of the mechanisms underlying inflation dynamics. For example, if the first-order autoregressive coefficient (AR(1)) is significantly lower in the high-inflation regime than in the low-inflation regime, this suggests that inflation is less self-persistent under high inflation. A possible explanation is that elevated inflation typically triggers tighter monetary policy responses, which prevent inflation from remaining high for an extended period. In contrast, during moderate inflation, price levels are more influenced by market-driven supply and demand fundamentals, resulting in greater inertia [22]. Similarly, comparing the variances of the error terms between regimes provides insight into inflation volatility. Although the baseline TARMA model assumes equal variance across regimes (i.e., σ 1 2 = σ 2 2 ), relaxing this assumption to allow σ 1 2 ≠ σ 2 2 enables hypothesis tests of whether volatility is greater during high-inflation periods.

3.3. Implementation of the Evolutionary TARMA Extension

(1)
Concept of Evolutionary Modeling
The parameters of the TARMA model—including the delay d, the number of regimes l, the threshold values r j for j = 1 , … , l , and the regime-specific AR and MA coefficients ϕ j k and θ j m —can be estimated using mixed-integer programming or conventional optimization methods. However, because the values of d and l determine the structure of the remaining parameters, a naïve hybrid algorithm may generate a number of r j , ϕ j k , θ j m inconsistent with l, making the model ill-defined and unusable. To address this issue, we adopt a hierarchical approach inspired by the structural framework of H. First, the delay d and the number of regimes l are determined; then, based on these values, the threshold locations r j and the corresponding ARMA parameters ϕ j k and θ j m are estimated. Since d and l govern only how the series is partitioned into intervals, decoupling them from the subsequent parameter estimation does not compromise the global search capability of the algorithm.
(2)
Hyperparameter Configuration for Reproducibility
To ensure the reproducibility of our experimental results and to facilitate future research, we explicitly report the hyperparameter settings used in the evolutionary optimization process. The configuration of the Genetic Algorithm (GA) plays a pivotal role in balancing the algorithm’s exploration capability (searching the global parameter space) and exploitation capability (refining local solutions). Based on preliminary convergence sensitivity tests and standard practices in the evolutionary computation literature, the specific hyperparameters employed in this study are detailed in Table 1. A population size of 50 was selected to maintain diversity without incurring excessive computational cost given the sample size. The crossover and mutation probabilities were set to 0.8 and 0.1, respectively, to ensure sufficient recombination of gene segments while preventing premature convergence through random perturbations.
(3)
Algorithmic Mechanics: Encoding, Fitness, and Convergence
To further clarify the operational logic of the Evolutionary Threshold ARMA (ETA) algorithm, we detail the core mechanisms governing the search process.
Encoding Scheme: We employ a hybrid encoding strategy to handle the mixed-integer nature of the TARMA structure. The delay parameter d is encoded as an integer gene constrained within [ 1 ,   d m a x ] , while the threshold values r are encoded as real-valued genes within the continuous range of the observed time series. This hybrid chromosome structure C = { d , r 1 , , r l 1 } allows the algorithm to simultaneously optimize discrete structural lags and continuous regime boundaries.
Fitness Computation: The objective function is directly derived from the information criterion to enforce parsimony. For a candidate individual i, the fitness function F ( i ) is defined to minimize the AIC:
$$
\text{Minimize } J(i) = \mathrm{AIC}(i) = N \ln\!\left( \hat{\sigma}_i^2 \right) + 2k
$$
where σ ^ i 2 is the residual variance of the TARMA model implied by chromosome i, and k is the total number of estimated parameters. Individuals with lower AIC values are assigned higher selection probabilities in the tournament process.
Randomness Control and Replicability: To ensure that the stochastic nature of the evolutionary search does not compromise replicability, we fix the pseudo-random number generator (PRNG) seed (e.g., Seed = 42 ) at the initialization stage. This guarantees that the sequence of genetic operations (mutation, crossover) remains identical across replication attempts.
Convergence Criteria: The search terminates if either of two conditions is met: (1) The maximum number of generations (set to 100) is reached; or (2) The global best fitness value does not improve by more than a tolerance threshold ( ϵ = 10⁻⁴ ) for 15 consecutive generations (Stall Generations), indicating convergence to a stable solution.
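The seed-fixing and stopping rules above can be condensed into a small skeleton (a sketch only, not the full ETA implementation; the `evolve` function, its toy mutation operator, and the quadratic test surface are illustrative stand-ins):

```python
import random

def evolve(fitness, init_pop, max_gen=100, stall=15, eps=1e-4, seed=42):
    """Skeleton of the ETA stopping logic: run at most `max_gen` generations,
    and stop early once the best fitness (lower is better, e.g., AIC) fails
    to improve by more than `eps` for `stall` consecutive generations.
    The mutation step below is a toy stand-in for the real genetic operators."""
    rng = random.Random(seed)            # fixed seed -> replicable search path
    pop = list(init_pop)
    best = min(fitness(c) for c in pop)
    no_improve = 0
    for _ in range(max_gen):
        pop = [c + rng.gauss(0.0, 0.1) for c in pop]   # toy real-gene mutation
        cur = min(fitness(c) for c in pop)
        if best - cur > eps:             # meaningful improvement: reset stall
            best, no_improve = cur, 0
        else:
            no_improve += 1
        if no_improve >= stall:          # stall-generation criterion
            break
    return best

# Toy usage: minimize a quadratic "AIC surface" over a single real gene.
best = evolve(lambda c: (c - 1.0) ** 2, init_pop=[0.0, 2.0, -1.0])
```

Because the seed is fixed, repeated calls with identical arguments reproduce the same search trajectory, which is the replicability property discussed above.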
Accordingly, the evolutionary process is structured into three hierarchical layers:
1.
Layer 1 evolves the delay d and the number of regimes l. The fitness of each candidate at this layer is determined by the best fitness value obtained from Layer 2.
2.
Layer 2 evolves the threshold values r j , with each individual’s fitness determined by the best fitness achieved in Layer 3.
3.
Layer 3 evolves the ARMA models specific to each regime interval, optimizing their respective coefficients.
Given the constraints d < 20 and l < 5 , the evolution of d and l can be efficiently implemented using only two nested control loops.
(4)
Algorithmic Implementation Procedure
The proposed evolutionary TARMA model is implemented via a hierarchically nested evolutionary algorithm, driven by model selection criteria such as the Akaike Information Criterion (AIC) or the Bayesian Information Criterion (BIC). The final model is selected as the one yielding the lowest AIC among all candidate structures. The core algorithm, referred to as ETA (Evolutionary Threshold ARMA), proceeds according to the pseudocode presented in Algorithm 1.
After completing model construction, we use the trained model to forecast the test set and evaluate predictive accuracy. This section will introduce the forecasting scheme, the mode of result presentation, and the performance evaluation metrics.
(5)
Sensitivity Analysis and Algorithmic Stability
Given the stochastic nature of evolutionary algorithms, it is essential to assess whether the identified optimal structure is stable or merely an artifact of a specific random initialization. To evaluate this, we conducted a sensitivity analysis by executing the ETA algorithm 20 times with independent random seeds:
a. Initialization Sensitivity: The results demonstrated high structural stability. In 18 of 20 runs (90%), the algorithm converged to the same delay parameter ( d = 1 ), with threshold values deviating by less than 0.5% (e.g., r [ 102.38 ,   102.42 ] ). This suggests that the objective-function landscape, while complex, possesses a dominant global basin of attraction corresponding to the identified regimes.
b. Sample Variation: We observed that detection of the ‘middle regime’ (in the three-regime model) is sensitive to sample size. When the training window is reduced by 10%, the algorithm occasionally reverts to a two-regime structure. This instability underscores the marginal nature of the transitional period, justifying our use of Model Averaging to mitigate the risk of over-reliance on a single structural specification.
c. Search Limitations: A potential limitation of this approach is the computational cost of the nested estimation (CLS inside the evolutionary loop). While effective for univariate series, the search time increases exponentially with the number of thresholds. Furthermore, without the ‘Minimum Sample Constraint’ (discussed in Algorithm 1), the algorithm may gravitate toward ‘outlier regimes’ containing extremely few observations in order to artificially lower the residual variance. Imposing this constraint stabilizes the search, forcing the solution toward econometrically meaningful partitions.
Algorithm 1 Hierarchical Threshold Search Algorithm (HTSA)
Input:
- Time series data { x t } t = 1 T (e.g., Year-on-Year CPI)
- Maximum delay lag d m a x (e.g., 12 for monthly data)
- Threshold candidate set R (grid of values within the interquartile range)
Output:
- Optimal structural parameters: delay d * and threshold r *
- Estimated model coefficients: Φ
Procedure:
1. Initialize global minimum Information Criterion: AIC m i n ← ∞
2. Initialize optimal parameter set: Θ * ← ∅
3. For each delay parameter d from 1 to d m a x do:
       1. Construct the threshold variable vector z t = x t d
       2. For each threshold candidate r in R do:
           1. Partition the sample into two regimes based on z t :
              - Regime 1: J 1 = { t x t d r }
              - Regime 2: J 2 = { t x t d > r }
           2. Validity Check (Minimum Sample Constraint):
       Calculate the sample size N k for each regime k.
       If min ( N k ) < N m i n (where N m i n ≈ 20–30 is the threshold for statistical validity per the Central Limit Theorem), then:
       Assign AIC c u r r ← ∞ (Discard this candidate structure).
       Continue to next iteration.
        End If
           3. Estimate ARMA parameters for each regime using Conditional Least Squares (CLS):
               (Note: the minimum-sample constraint above, typically N ≥ 25 observations per regime, ensures that the asymptotic properties required by the Central Limit Theorem hold for the CLS estimates.)
              * Fit ϕ ^ 1 , θ ^ 1 on data in J 1
              * Fit ϕ ^ 2 , θ ^ 2 on data in J 2
            4. Compute the joint Akaike Information Criterion (AIC) for the split model:
AIC c u r r = n ln σ ^ 2 + 2 k
           (where σ ^ 2 is the residual variance and k is the number of parameters)
           5. If AIC c u r r < AIC m i n then:
              * Update AIC m i n AIC c u r r
              * Update d * d , r * r
              * Store current coefficients into Θ *
            6. End If
            7. End For
4. End For
5. Return d * , r * , Θ *
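Algorithm 1 can be sketched in Python as follows. For brevity, the regime-specific CLS ARMA estimation is replaced by a simple within-regime AR(1) least-squares fit (the helper `fit_ar1` is a stand-in, and the lagged pairs within each regime subset are a simplification), so the sketch illustrates the nested delay/threshold grid search and the minimum-sample constraint rather than the full estimator:

```python
import math

def fit_ar1(xs):
    """Least-squares AR(1) fit on a subsample (stand-in for CLS ARMA).
    Returns (phi, residual variance, number of estimated parameters)."""
    pairs = list(zip(xs[:-1], xs[1:]))
    sxx = sum(a * a for a, _ in pairs)
    sxy = sum(a * b for a, b in pairs)
    phi = sxy / sxx if sxx else 0.0
    resid = [b - phi * a for a, b in pairs]
    var = sum(e * e for e in resid) / len(resid)
    return phi, max(var, 1e-12), 1

def htsa(x, d_max=3, n_min=20):
    """Hierarchical threshold search: grid over delay d and threshold r,
    minimizing the joint AIC; splits leaving any regime below n_min
    observations are discarded (the Minimum Sample Constraint)."""
    best = (math.inf, None, None)                  # (AIC_min, d*, r*)
    # Threshold candidates restricted to the interquartile range of x.
    grid = sorted(x)[len(x) // 4 : 3 * len(x) // 4]
    for d in range(1, d_max + 1):
        for r in grid:
            lo = [x[t] for t in range(d, len(x)) if x[t - d] <= r]
            hi = [x[t] for t in range(d, len(x)) if x[t - d] > r]
            if min(len(lo), len(hi)) < n_min:      # minimum-sample constraint
                continue
            n = len(lo) + len(hi)
            _, v1, k1 = fit_ar1(lo)
            _, v2, k2 = fit_ar1(hi)
            var = (len(lo) * v1 + len(hi) * v2) / n
            aic = n * math.log(var) + 2 * (k1 + k2)
            if aic < best[0]:
                best = (aic, d, r)
    return best
```

The two nested loops correspond to steps 3 and 3.2 of Algorithm 1; the `continue` branch implements the validity check that prevents degenerate, low-variance outlier regimes.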

3.4. Forecast Design

To evaluate the model’s out-of-sample forecast performance, a rolling forecast procedure is adopted to predict monthly inflation of the CPI from January 2023 to January 2025. The forecast steps are described below.
(1)
Initial Input
The forecasting process begins with the most recent available data point—December 2022.
This observation, along with several previous lags, is used as input to initialize the model. Assuming that the final specification adopts a TARMA(1,1) structure, the forecast requires knowledge of both x 2022.12 and the corresponding model error term ε 2022.12 .
Since the error term is unobservable, it is common in rolling forecasts to either:
-
Set past error terms to zero (i.e. ε t = 0 ), or
-
Replace them with estimated residuals from the training sample.
In dynamic forecasting, the assumption ε t = 0 is often adopted for simplicity. This approximation does not substantially impact the forecast, as the error term is assumed to have zero mean and thus contributes no systematic bias to the prediction.
(2)
One-Step-Ahead Forecast
Based on the established model and the known regime classification—determined by the position of x 2022.12 relative to the estimated threshold γ —the one-step forecast is calculated as follows:
$$
\hat{x}_{2023.01} =
\begin{cases}
\phi_{1,0} + \phi_{1,1} x_{2022.12} + \theta_{1,1} \hat{\varepsilon}_{2022.12}, & \text{if } x_{2022.12} \le \gamma,\\[4pt]
\phi_{2,0} + \phi_{2,1} x_{2022.12} + \theta_{2,1} \hat{\varepsilon}_{2022.12}, & \text{if } x_{2022.12} > \gamma.
\end{cases}
$$
Here, ε ^ 2022.12 can be set to 0 (i.e., forecasting through the conditional expectation of the model), so that x ^ 2023.01 is the predicted inflation rate for January 2023.
(3)
Rolling Multi-Step Forecasting
After obtaining the forecast for January 2023, we treat it as known (in a fully dynamic forecasting setting, we do not, in fact, know the true value for January 2023 and thus must rely on its forecast). Then, using x ^ 2023.1 as new information, we forecast February 2023:
$$
\hat{x}_{2023.2 \mid 2023.1} =
\begin{cases}
\hat{\phi}_{1,0} + \hat{\phi}_{1,1} \hat{x}_{2023.1} + \hat{\theta}_{1,1} \hat{\varepsilon}_{2023.1}, & \text{if } \hat{x}_{2023.1} \le \hat{\gamma},\\[4pt]
\hat{\phi}_{2,0} + \hat{\phi}_{2,1} \hat{x}_{2023.1} + \hat{\theta}_{2,1} \hat{\varepsilon}_{2023.1}, & \text{if } \hat{x}_{2023.1} > \hat{\gamma}.
\end{cases}
$$
This yields x ^ 2023.2 . Rolling forward in this manner, we obtain forecasts through January 2025, x ^ 2025.1 . This is the dynamic multi-step forecasting process, in which the input at each step is the forecast from the preceding step.
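The one-step and rolling recursions can be sketched as follows (the function name and all coefficient values are hypothetical; past shocks are set to zero, as in the conditional-expectation forecast described above):

```python
def tarma11_forecast(x_last, params, gamma, horizon):
    """Dynamic multi-step forecast for a two-regime TARMA(1,1).

    params[k] = (phi0, phi1, theta1) for regime k (0: low, 1: high).
    Past shocks are set to 0 (conditional-expectation forecasting), so the
    MA term contributes nothing beyond the first step.
    """
    preds, x, eps = [], x_last, 0.0               # eps_hat = 0 by assumption
    for _ in range(horizon):
        k = 1 if x > gamma else 0                 # regime of the last input
        phi0, phi1, theta1 = params[k]
        x = phi0 + phi1 * x + theta1 * eps        # one-step recursion
        eps = 0.0                                 # future shocks unknown -> 0
        preds.append(x)                           # forecast feeds the next step
    return preds

# Hypothetical parameters: persistent low regime, mean-reverting high regime.
path = tarma11_forecast(
    x_last=103.1,
    params={0: (20.0, 0.8, 0.3), 1: (60.0, 0.42, -0.2)},
    gamma=102.4,
    horizon=25,
)
```

Note that the regime is re-evaluated at every step against the most recent input, so a forecast path can cross the threshold and switch parameter sets mid-horizon.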

3.5. Performance Evaluation Metrics

To ensure a robust assessment of predictive performance that captures different aspects of the underlying data distribution, we employ a multi-metric evaluation framework. Although the Mean Absolute Percentage Error (MAPE) serves as our baseline measure of relative accuracy, we introduce three additional metrics: the root mean squared error (RMSE), the mean absolute error (MAE), and directional accuracy (DA).
(1)
Root Mean Squared Error (RMSE)
$$
\mathrm{RMSE} = \sqrt{ \frac{1}{n} \sum_{t=1}^{n} \left( y_t - \hat{y}_t \right)^2 }
$$
RMSE is particularly valuable in inflation forecasting, as it squares the errors before averaging, thereby imposing a heavier penalty on large deviations. A lower RMSE indicates that the model is robust against significant outliers or sudden inflationary spikes, which are often the most damaging to economic stability.
(2)
Mean Absolute Error (MAE)
$$
\mathrm{MAE} = \frac{1}{n} \sum_{t=1}^{n} \left| y_t - \hat{y}_t \right|
$$
MAE provides a direct measure of the average magnitude of errors without the sensitivity to extreme outliers inherent in RMSE. It serves as a robust check on the model’s consistent performance.
(3)
Directional Accuracy (DA)
Beyond magnitude, correctly forecasting the direction of price changes (i.e., identifying turning points) is vital for policy timing. Directional Accuracy (DA) quantifies the proportion of times the model correctly predicts the sign of the change:
$$
\mathrm{DA} = \frac{1}{n} \sum_{t=1}^{n} I\!\left[ \left( y_t - y_{t-1} \right) \left( \hat{y}_t - y_{t-1} \right) > 0 \right]
$$
where I ( · ) is the indicator function. A high DA score implies the model effectively captures the structural trends and turning points in the inflation series.
By reporting this triad of metrics (RMSE, MAE, DA), we capture the model’s relative accuracy, its handling of volatility, and its general consistency, ensuring a holistic evaluation of the TARMA framework against linear benchmarks.
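The three metrics can be computed directly from the definitions above; a minimal sketch (note that the DA sum necessarily starts at the second observation, so this sketch divides by n − 1):

```python
import math

def rmse(y, yhat):
    """Root mean squared error: penalizes large deviations quadratically."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))

def mae(y, yhat):
    """Mean absolute error: average error magnitude, less outlier-sensitive."""
    return sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)

def directional_accuracy(y, yhat):
    """Share of periods where the predicted change from y_{t-1} has the
    same sign as the realized change (t runs from the second observation)."""
    hits = sum(
        1 for t in range(1, len(y))
        if (y[t] - y[t - 1]) * (yhat[t] - y[t - 1]) > 0
    )
    return hits / (len(y) - 1)
```

A worked check: for y = [1, 2, 3, 2] and ŷ = [1, 2.5, 2.5, 1.5], MAE is 0.375 and every predicted change shares the sign of the realized change, so DA = 1.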

4. Data and Empirical Analysis

4.1. Characteristics and Preprocessing of China’s CPI Data

This study uses China’s monthly Consumer Price Index (CPI) as the object of analysis, examining the evolution of the CPI trend to produce short-term forecasts of China’s future CPI. The data set consists of continuous monthly observations from January 2004 to January 2025, sourced from the National Bureau of Statistics and Eastmoney (https://data.eastmoney.com/cjsj/cpi.html) (accessed on 5 November 2025) [23], yielding a total of 253 observations. We first use the initial 228 samples as the training set, then forecast China’s CPI for the remaining 25 months, and finally compare the model’s forecasts with the test set to compute forecast errors.
Based on the available data set, we plot the time series using MATLAB (https://www.mathworks.com/products/matlab.html). The time series graph reveals the following features of China’s CPI dynamics: the monthly year-over-year CPI increase averages roughly between 2% and 3%; the fluctuation range is approximately from −2% to 8%. This indicates that for most of the period since 2004, inflation has been moderate and controllable in China, with months of mild deflation and episodes of inflation as high as approximately 8%. The standard deviation of the series is about 2%, suggesting a certain amplitude of variation around the mean. To visualize the trajectory of year-over-year CPI inflation more directly, we plot a line chart of the series since 2004, as shown in Figure 1. The figure shows pronounced phase-wise fluctuations over the sample period.
(1)
Peak phases: In 2007–2008, year-over-year CPI growth rose rapidly and exceeded 7%, driven by rapid economic growth and rising food prices; another inflation peak occurred in 2010–2011, with a maximum above 6%. These two episodes correspond to inflationary peaks within China’s business cycle.
(2)
Trough phases: Around 2015, the year-over-year increase in the CPI fell to almost 0% and even turned negative, indicating extremely low inflation or deflation; around 2020 (affected by the pandemic and economic downturn), the inflation rate again declined to nearly 0% or slightly negative. These stages reflect macroeconomic conditions that slowed or stalled price increases.
These observations suggest that the data-generating process of year-over-year CPI inflation may not be adequately captured by a simple single linear model. The dynamic characteristics may differ between high-inflation and low-inflation periods. For example, high-inflation phases often coincide with stronger policy interventions and shifting market expectations, potentially yielding greater volatility or a distinct autocorrelation structure; during low-inflation or deflationary periods, price responses to shocks may be more sluggish. Such conjectures provide an intuitive motivation for introducing nonlinear models (e.g., threshold models). Before formal modeling, we will further verify the presence of such nonlinear features through statistical tests and graphical analysis.
From Figure 1, it can be seen that the variable CPI in this analysis contains 228 observations, with a mean of 102.4262, a standard deviation of 1.9868, a maximum of 108.74 and a minimum of 98.19. The average level is approximately 102, indicating that during the training-sample period the overall price level was slightly above the base period (the benchmark price index is typically set to 100). The standard deviation of about 1.99 suggests that the monthly CPI fluctuated only slightly around the mean, implying the relative stability of the price level over the sample period. The maximum of 108.74 and the minimum of 98.19 indicate that the price levels experienced some degree of upward and downward movement during the sample period, but overall remained within a relatively narrow band around 100.
Next, we apply the augmented Dickey–Fuller (ADF) test to examine the stationarity of the CPI series and rigorously determine whether a unit root is present. The ADF test is a commonly used unit root test in which the null hypothesis is that the series has a unit root, while the alternative hypothesis is that the series is stationary. According to the test results, the ADF statistic for China’s CPI series is −4.6255, with a corresponding p-value less than 0.01, as shown in Table 2. Because the p-value is far below the 0.01 significance level, we can reject the null hypothesis of a unit root at the 99% confidence level. This implies that the CPI series does not contain a unit root—i.e., it exhibits mean reversion—and can be regarded as stationary overall. The critical values at various significance levels reported in Table 2 corroborate this conclusion: the ADF statistic of −4.6255 is well below the 1% critical value of −2.5755, indicating a high level of statistical significance for stationarity. From an economic perspective, this stationarity is consistent with the institutional environment in which China’s CPI is determined. Monetary and fiscal authorities operate under explicit price-stability objectives, which effectively anchor inflation expectations and keep the CPI series within a relatively narrow band around its long-run target. As a result, the CPI (or CPI inflation) fluctuates around a stable mean rather than exhibiting the type of unbounded trend that would typically characterize raw price levels.
After confirming stationarity, we further examine whether the CPI series is a purely random process, a white-noise series. If the series is white noise, its fluctuations are entirely random and lack exploitable correlation structure; time series models would then be unable to extract useful information from past data for forecasting. Conversely, a non–white-noise series implies the presence of correlation structures that can be captured and utilized by the model. To test the randomness of the CPI series, we conduct a pure-randomness test using the Ljung–Box (LB) statistic to assess whether autocorrelations at various lags are significant. Table 3 reports the LB statistics and corresponding p-values at lags 6, 12, and 18. The LB statistics at these three lags are as high as 819.73, 903.22, and 1051.80, respectively, with p-values all far below 0.0001. Such extremely small p-values indicate that we can reject the null hypothesis that “the series is white noise” with overwhelming statistical significance; in other words, at the examined lag orders, the CPI series exhibits significant autocorrelation and is not purely random.
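The Ljung–Box statistic used in this test can be computed directly from the sample autocorrelations; a minimal sketch (pure Python, not the exact routine used to produce Table 3):

```python
def ljung_box(x, h):
    """Ljung-Box Q statistic at lag h: Q = n(n+2) * sum_k rho_k^2 / (n-k),
    where rho_k is the lag-k sample autocorrelation. Under the white-noise
    null, Q is asymptotically chi-squared with h degrees of freedom, so a
    large Q (small p-value) rejects pure randomness."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)            # requires a non-constant series
    q = 0.0
    for k in range(1, h + 1):
        rho = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        q += rho * rho / (n - k)
    return n * (n + 2) * q
```

For a strongly trending input such as 0, 1, …, 49, the lag-1 autocorrelation is near one and Q at lag 6 is far above the chi-squared critical value, mirroring the overwhelming rejection of the white-noise null reported for the CPI series.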
In summary, the stationarity and pure-randomness tests jointly indicate that China’s CPI time series is stationary and non–white noise over the study period. This means the series satisfies the stationarity assumption and contains internal correlation structures that can be exploited, thereby meeting the basic conditions for constructing time series models. Accordingly, we can fit the series using a threshold ARMA (TARMA) model, laying the groundwork for characterizing CPI’s dynamic features and conducting subsequent forecasting analysis.

4.2. Identification of the TARMA Model

The TARMA (Threshold ARMA) model is a piecewise-linear extension of the traditional ARMA that introduces a threshold mechanism: when the threshold variable (often chosen as a lag of the dependent series itself) falls into different intervals, the model switches to the corresponding regime-specific ARMA submodel, thereby capturing heterogeneous dynamics under distinct volatility states. Intuitively, when the price level lies in “lower–moderate–higher” intervals, the persistence of shocks and the speed of mean reversion may differ; the threshold term enables the model to adaptively switch across these states. The autoregressive and moving-average lag orders of a TARMA model can be selected on the basis of the series’ autocorrelation and partial autocorrelation functions, together with information criteria such as AIC.
From Figure 2 and Figure 3, it can be seen that both the autocorrelation coefficients and the partial autocorrelation coefficients tail off, indicating that the model should be an ARMA ( p ,   q ) . Since the partial autocorrelations approach zero relatively quickly after the 4th lag, plausible values of p are 0, 1, 2, 3, and 4. Accordingly, the five candidate families are ARMA ( 0 ,   q ) , ARMA ( 1 ,   q ) , ARMA ( 2 ,   q ) , ARMA ( 3 ,   q ) , and ARMA ( 4 ,   q ) . Let q take three values: 0, 1, 2. Excluding the degenerate ARMA ( 0 ,   0 ) , there are 14 ARMA ( p ,   q ) candidates in total: ARMA ( 0 ,   1 ) , ARMA ( 0 ,   2 ) , ARMA ( 1 ,   0 ) , ARMA ( 1 ,   1 ) , ARMA ( 1 ,   2 ) , ARMA ( 2 ,   0 ) , ARMA ( 2 ,   1 ) , ARMA ( 2 ,   2 ) , ARMA ( 3 ,   0 ) , ARMA ( 3 ,   1 ) , ARMA ( 3 ,   2 ) , ARMA ( 4 ,   0 ) , ARMA ( 4 ,   1 ) , and ARMA ( 4 ,   2 ) .
With respect to the number of thresholds, we examine two specifications: “one threshold (two intervals)” and “two thresholds (three intervals).” The threshold locations are obtained via an information-criterion search: based on the AIC, the delay is d = 1 ; when the number of thresholds is one, the threshold value is r = 101.41 ; when the number of thresholds is two, the threshold values are r 1 = 102.4 and r 2 = 102.81 (see Table 4 and Table 5). The tables also report the SSR/AIC indicators associated with the threshold search, facilitating a trade-off between threshold placement and goodness of fit.
Given the above settings, if each interval selects the best structure from the 14 ARMA forms described above, the “one-threshold” case produces two-interval TARMA candidates and the “two-threshold” case produces three-interval TARMA candidates, yielding 28 possible TARMA models in total. We then employ information criteria to screen these candidates to avoid overfitting and improve out-of-sample predictive performance.

4.3. Model Selection via Information Criteria

After determining the orders and threshold specifications to produce the candidate set, this study uses the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) for model selection. Both balance the goodness of fit against the complexity of the model: AIC tends to favor a better fit, with relatively weaker penalization, while BIC imposes a stronger penalty and thus typically prefers more parsimonious structures [24]. We compare the AIC/BIC values in the 14 ARMA ( p ,   q ) models; the results are presented in Table 6.
Based on the table above, ordering the models by AIC from smallest to largest—and following the principle that a smaller AIC is better—the top five candidates are, in sequence, ARMA(4, 2), ARMA(2, 2), ARMA(3, 1), ARMA(2, 1) and ARMA(4, 1). In conjunction with the ordering by BIC from smallest to largest, the top three models by BIC also rank within the top five. Therefore, we select ARMA(4, 2), ARMA(2, 2) and ARMA(3, 1) as the final candidate models. Consequently, combined with the two threshold specifications discussed earlier (two intervals and three intervals), we obtain a core candidate set of six TARMA models:
-
Within two intervals: ARMA(4, 2), ARMA(2, 2), ARMA(3, 1);
-
In three intervals: ARMA(4, 2), ARMA(2, 2), ARMA(3, 1).
This outcome has two implications. First, the consistency between AIC and BIC in the top three enhances the robustness of model selection. Second, the parallel inclusion of these three structures across different interval partitions ensures that both high- and low-price states can be captured by relatively compact models, thereby providing a unified framework for subsequent parameter estimation and combination forecasting. Next, we will estimate the parameters of these six TARMA models and perform residual diagnostics to assess their validity and usability.
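The AIC/BIC trade-off described above can be made concrete with a small numerical sketch (the residual variances and parameter counts below are hypothetical; only the complexity penalty differs between the two criteria):

```python
import math

def aic(n, resid_var, k):
    """AIC = n * ln(sigma^2) + 2k: weaker penalty, tends to favor fit."""
    return n * math.log(resid_var) + 2 * k

def bic(n, resid_var, k):
    """BIC = n * ln(sigma^2) + k * ln(n): stronger penalty whenever
    ln(n) > 2, i.e., for any sample larger than about 7 observations."""
    return n * math.log(resid_var) + k * math.log(n)

# Illustrative comparison: a richer model (k = 7) with slightly lower
# residual variance vs a leaner model (k = 4), on n = 228 observations.
n = 228
rich = {"k": 7, "var": 0.090}
lean = {"k": 4, "var": 0.095}
aic_rich, aic_lean = aic(n, rich["var"], rich["k"]), aic(n, lean["var"], lean["k"])
bic_rich, bic_lean = bic(n, rich["var"], rich["k"]), bic(n, lean["var"], lean["k"])
```

With these illustrative numbers, AIC prefers the richer specification while BIC prefers the leaner one, which is exactly the kind of disagreement the text describes; agreement of the two criteria on the top-ranked structures is therefore a meaningful robustness signal.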

4.4. Parameter Estimation and Diagnostic Validation

After finalizing the candidate structures and threshold settings, we estimate the parameters for the six core TARMA models (ARMA(4, 2), ARMA(2, 2), and ARMA(3, 1) under both two-interval and three-interval specifications) and perform residual diagnostics to evaluate model validity. Parameter estimates and standard errors are reported in Table 7 and Table 8. For clarity, we summarize the main statistical characteristics and meanings separately for the cases of “one threshold” and “two thresholds.”

4.4.1. Single Threshold (Two Regimes)

Instead of reiterating the values of the individual coefficients, we focus on the structural divergence revealed by the data. The results indicate a clear bifurcation at the threshold r = 102.39. In the lower regime (Regime 1), the process is characterized by high persistence, evidenced by the dominance of autoregressive terms. Conversely, the upper regime (Regime 2) exhibits a distinct shift in dynamics, where the emergence of negative moving average coefficients suggests a stronger mean-reverting force. This structural break confirms that a linear specification would fail to capture the asymmetric nature of inflation adjustments.

4.4.2. Double Thresholds (Three Regimes)

When the thresholds are 102.4 and 102.81 (see Table 5, Table 6 and Table 7), the series is divided into three segments: “low (≤102.4)–middle ( 102.4 < · ≤ 102.81 )–high (>102.81).” The observations are as follows.
(1)
Low interval (≤102.4): Taking ARMA(4, 2) as an example, AR(3) ≈ 0.7742 (s.e. 0.0603), MA(1) ≈ 0.9396 (0.0418), and MA(2) ≈ 1.0000 (0.0537), indicating an extremely strong instantaneous response and correction to short-term shocks; the positive effect of the AR term combined with strong MA terms makes the series more agile in “absorbing shocks and reverting to equilibrium.”
(2)
Middle interval ( 102.4 < · ≤ 102.81 ): Several coefficients in models such as ARMA(2, 2) exhibit large standard errors (e.g., the standard error of AR(1) is approximately 4.3489; that of MA(1) is approximately 4.4586), implying relatively limited sample size or information in this interval, greater parameter uncertainty, and correspondingly weaker robustness. Such unstable estimation is not uncommon in threshold modeling and is often associated with sparse data within certain segments after threshold partitioning.
(3)
High interval (>102.81): The AR coefficients show an alternating pattern of “strong positive–strong negative–strong positive–moderate negative” (e.g., in ARMA(4, 2), AR(1) ≈ 1.4468, AR(2) ≈ −1.2014, AR(3) ≈ 0.9585, AR(4) ≈ −0.4004), while MA(2) ≈ 0.8071. This points to the coexistence, at high price levels, of “high persistence + pronounced correction”: shocks display strong carryover, yet the system tends to return to equilibrium relatively quickly via countervailing forces.
As observed in Table 4, Table 5, Table 6 and Table 7, the parameters in the middle regime (102.4 < CPI ≤ 102.81) exhibit significantly larger standard errors compared to the other regimes (e.g., the S.E. for AR coefficients exceeds 4.0). We acknowledge that this instability arises from the narrow width of this interval (only 0.41 index points), which captures a limited number of data points.
From an econometric perspective, this regime should be interpreted not as a stable equilibrium, but as a transitory state—a rapid ‘pass-through’ phase where inflation shifts from moderate to high levels. While the small sample size undermines the large-sample approximations on which the estimators rely and leads to parameter uncertainty (a risk of over-parameterization), we deliberately retain this model in the candidate pool for the subsequent Model Averaging stage. The rationale is twofold:
Capturing Nonlinear Transitions: This specific structure identifies critical turning points that smoother models might miss.
Ensemble Robustness: By using Information Criterion-Guided Integration (specifically S-BIC weighting), the final forecast does not rely solely on this unstable model. Instead, the weighting scheme automatically moderates the influence of the three-regime model based on its Bayesian penalty, allowing the system to benefit from its signal on turning points while relying on the more stable two-regime models for the baseline trend. Thus, the instability of the middle regime is mitigated through the ensemble process [25].

4.4.3. Residual Diagnostics and White-Noise Tests

For the six models above, we conduct pure-randomness (white-noise) tests on the residuals within each interval. Following the paper’s setup, we focus on the Ljung–Box statistics at lags 6 and 12 (see Table 9). The results show that, at lag 6, the p-values across models and intervals all exceed 0.05, satisfying the white-noise assumption and indicating that the principal correlation information has been adequately extracted by the models. At lag 12, the statistics and p-values in a few intervals approach conventional significance thresholds (for example, in Interval 2, several combinations yield p-values on the order of 0.04), suggesting the possibility of weak correlation at longer lags. Nevertheless, given the diagnostic criteria adopted here and the subsequent forecasting objectives, the conclusions at lag 6 support employing these six models for further combination forecasting.
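The Ljung–Box diagnostic used above can be reproduced with a short pure-Python sketch (our own illustrative implementation, not the study's code). Note that the closed-form chi-square p-value below is exact only for even degrees of freedom, which covers the lags 6 and 12 used here:

```python
import math

def acf(x, k):
    """Sample autocorrelation of x at lag k."""
    n = len(x)
    m = sum(x) / n
    denom = sum((v - m) ** 2 for v in x)
    return sum((x[t] - m) * (x[t - k] - m) for t in range(k, n)) / denom

def ljung_box(x, h):
    """Ljung-Box Q statistic at lag h with its chi-square(h) p-value.
    Requires even h so the chi-square survival function has a closed form."""
    assert h % 2 == 0, "closed-form p-value requires even degrees of freedom"
    n = len(x)
    q = n * (n + 2) * sum(acf(x, k) ** 2 / (n - k) for k in range(1, h + 1))
    # chi-square sf with df = h = 2m: exp(-q/2) * sum_{i<m} (q/2)^i / i!
    m_half = h // 2
    p = math.exp(-q / 2) * sum((q / 2) ** i / math.factorial(i) for i in range(m_half))
    return q, p
```

Residuals pass the white-noise check at a given lag when the returned p-value exceeds 0.05, as reported for lag 6 in the text.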

4.4.4. Synthesis of Regime-Dependent Dynamics

To synthesize the economic implications of the estimated parameters, we observe that China’s CPI exhibits distinct behavioral ‘personalities’ across the identified regimes:
The ‘Inertia’ Phase (Low-Inflation Regime): Below the critical threshold, inflation behaves sluggishly. The statistical significance of AR terms without counterbalancing MA shocks suggests a market environment where price expectations are sticky, likely driven by menu costs that discourage frequent repricing.
The ‘Correction’ Phase (High-Inflation Regime): Once the threshold is breached, the dynamics shift from inertia to active correction. The statistical evidence—specifically the negative MA coefficients—points to a system under stress that is being actively managed. This aligns with the ‘visible hand’ of policy intervention, where administrative controls are triggered to suppress volatility and force prices back to equilibrium.
In summary, the TARMA model successfully decodes the duality of the market: it is momentum-driven during calm periods but mean-reverting during turbulent episodes.

5. Forecast Combination Methods

Based on the foregoing analysis, the models TARMA(2; 4, 4; 2, 2), TARMA(2; 2, 2; 2, 2), TARMA(2; 3, 3; 1, 1), TARMA(3; 4, 4, 4; 2, 2, 2), TARMA(3; 2, 2, 2; 2, 2, 2), and TARMA(3; 3, 3, 3; 1, 1, 1) are appropriate; they are denoted, respectively, as models M_1, M_2, M_3, M_4, M_5, M_6, and each can be used to forecast CPI, with forecast values denoted by ŷ_{1,t+1}, ŷ_{2,t+1}, ŷ_{3,t+1}, ŷ_{4,t+1}, ŷ_{5,t+1}, ŷ_{6,t+1}. To enhance predictive precision, this study adopts a combination forecasting approach: the six TARMA models are taken as the optimal model set, and a combination model is formed by taking a weighted average over the models in this set, with weights determined by four methods: simple weighted average, S-AIC, S-BIC, and MMA. The combined models are then used to forecast the 25 monthly observations of China’s CPI from January 2023 to January 2025, which are compared with the actual values. Let w_i be nonnegative weights with Σ_{i=1}^{6} w_i = 1; the combined forecast of the combination-forecasting model is then:
Y = w_1 ŷ_{1,t+1} + w_2 ŷ_{2,t+1} + w_3 ŷ_{3,t+1} + w_4 ŷ_{4,t+1} + w_5 ŷ_{5,t+1} + w_6 ŷ_{6,t+1}, w_i ≥ 0, i = 1, 2, …, 6
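The combination rule above is simply a convex combination of the six candidate forecasts. A minimal Python sketch (the function name and input layout are ours, for illustration only):

```python
def combine_forecasts(forecasts, weights):
    """Convex combination Y = sum_i w_i * yhat_i,
    with nonnegative weights that sum to one."""
    assert len(forecasts) == len(weights)
    assert all(w >= 0 for w in weights)
    assert abs(sum(weights) - 1.0) < 1e-9
    return sum(w * f for w, f in zip(weights, forecasts))
```

With equal weights 1/6 this reduces to the simple average Y_S of the next subsection; with S-AIC, S-BIC, or MMA weights it yields the corresponding combination forecasts.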

5.1. Weighting Schemes for Forecast Integration

(1)
Simple Weighted Average
Since the optimal model set contains six models, the simple weighted average described above assigns equal weight to the forecasts of models M_1–M_6, i.e., w_i = 1/6, i = 1, 2, …, 6. Hence, the forecast of the combination model M_S is:
Y S = 1 6 y ^ 1 , t + 1 + 1 6 y ^ 2 , t + 1 + 1 6 y ^ 3 , t + 1 + 1 6 y ^ 4 , t + 1 + 1 6 y ^ 5 , t + 1 + 1 6 y ^ 6 , t + 1
(2)
S-AIC Method
w_m = exp(−I_m/2) / Σ_{i=1}^{M} exp(−I_i/2)
Here, I_m denotes the AIC value of model M_m.
From the above analysis, the AIC values for the six models are 396.76, 401.65, 401.69, 396.76, 401.65, and 401.69. Therefore, the weights on the forecasts of models M_1–M_6 are:
w 1 = 0.427 , w 2 = 0.037 , w 3 = 0.036 , w 4 = 0.427 , w 5 = 0.037 , w 6 = 0.036
Hence, the forecast of the combination model M S A I C is:
Y S A I C = 0.427 y ^ 1 , t + 1 + 0.037 y ^ 2 , t + 1 + 0.036 y ^ 3 , t + 1 + 0.427 y ^ 4 , t + 1 + 0.037 y ^ 5 , t + 1 + 0.036 y ^ 6 , t + 1
(3)
S-BIC Method
w_m = exp(−I_m/2) / Σ_{i=1}^{M} exp(−I_i/2)
Here, I_m denotes the BIC value of model M_m.
From the above analysis, the BIC values for the six models are 424.20, 422.22, 422.27, 424.20, 422.22, and 422.27. Therefore, the weights on the forecasts of models M_1–M_6 are:
w 1 = 0.079 , w 2 = 0.213 , w 3 = 0.208 , w 4 = 0.079 , w 5 = 0.213 , w 6 = 0.208
Hence, the forecast of the combination model M S B I C is:
Y S B I C = 0.079 y ^ 1 , t + 1 + 0.213 y ^ 2 , t + 1 + 0.208 y ^ 3 , t + 1 + 0.079 y ^ 4 , t + 1 + 0.213 y ^ 5 , t + 1 + 0.208 y ^ 6 , t + 1
(4)
Weight Selection Based on the Mallows Criterion
According to the MMA method described in the weight-selection section above, the weights on the forecasts of models M_1–M_6 are:
w 1 = 0 , w 2 = 0.503 , w 3 = 0 , w 4 = 0 , w 5 = 0.292 , w 6 = 0.205
Hence the forecast of the combination model M M M A is:
Y M M A = 0.503 y ^ 2 , t + 1 + 0.292 y ^ 5 , t + 1 + 0.205 y ^ 6 , t + 1
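The S-AIC and S-BIC weight calculations above are easy to reproduce. The sketch below (Python, using the AIC/BIC values reported in the text) subtracts the minimum criterion value before exponentiating, which is numerically equivalent to the formula w_m ∝ exp(−I_m/2) and avoids floating-point underflow for large criterion values:

```python
import math

def ic_weights(ic_values):
    """Smoothed information-criterion weights: w_m proportional to
    exp(-(I_m - I_min)/2), normalized to sum to one."""
    i_min = min(ic_values)
    raw = [math.exp(-(ic - i_min) / 2) for ic in ic_values]
    total = sum(raw)
    return [r / total for r in raw]

aic = [396.76, 401.65, 401.69, 396.76, 401.65, 401.69]
bic = [424.20, 422.22, 422.27, 424.20, 422.22, 422.27]
w_aic = ic_weights(aic)  # matches the S-AIC weights reported above
w_bic = ic_weights(bic)  # matches the S-BIC weights reported above
```

Rounding the results to three decimals recovers exactly the weight vectors given for M_SAIC and M_SBIC.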

5.2. Out-of-Sample Forecasting and Comparative Evaluation

To facilitate a comprehensive visual assessment of predictive performance, Figure 4 presents the forecasting results in two distinct panels. Figure 4a displays the fitted values against the actual CPI over the entire historical sample (January 2004–January 2025), confirming the TARMA model’s capability to capture long-term cyclical fluctuations and structural breaks. Figure 4b offers a magnified view specifically focusing on the out-of-sample forecasting window. This zoomed-in perspective clearly delineates the trajectories of the individual candidate models (M1–M6) relative to the integration strategies (S-AIC, S-BIC, MMA), visually corroborating the superior tracking accuracy of the ensemble approach compared to individual specifications.
As shown in Table 10, three stable conclusions follow:
(i)
Combination forecasts generally outperform single models: in most cases, the MSE and MAPE of the combination schemes are lower, confirming the advantage of “model diversification-error dispersion.”
(ii)
S-BIC weighting performs best: among the four combination weightings, S-BIC yields the smallest errors.
(iii)
MMA and simple averaging rank next, while S-AIC is relatively weaker: differences remain across weighting strategies; the different strengths of the information-criterion penalties on model complexity affect how weights are allocated between complex and parsimonious models.
While metrics like the Diebold-Mariano (DM) test are standard for pairwise comparisons, our study emphasizes the structural stability of the improvement across changing time windows. As evidenced in Table 11 (Rolling-Forecasts), the S-BIC integration strategy consistently yields the lowest or near-lowest error metrics (MSE/MAPE) across three distinct training sample sizes ( T = 175 , 190 , 205 ). This consistency across different temporal slices functions effectively as a robustness check, analogous to the Model Confidence Set approach, confirming that the TARMA model’s superiority is not an artifact of a specific sample selection but a result of its ability to capture genuine nonlinear regime shifts in the inflation process.
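For readers who do wish to run a pairwise comparison, a minimal Diebold–Mariano statistic under squared-error loss can be sketched as follows (our illustrative Python, not the study's code; for one-step-ahead forecasts the long-run variance of the loss differential reduces to its sample variance, so no autocovariance correction is applied):

```python
import math

def diebold_mariano(e1, e2):
    """DM test of equal predictive accuracy for two forecast-error series
    under squared-error loss (one-step-ahead case). Returns the DM
    statistic and a two-sided normal p-value."""
    d = [a * a - b * b for a, b in zip(e1, e2)]   # loss differential
    n = len(d)
    d_bar = sum(d) / n
    var_d = sum((x - d_bar) ** 2 for x in d) / (n - 1)
    dm = d_bar / math.sqrt(var_d / n)
    # two-sided p-value from the standard normal distribution
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(dm) / math.sqrt(2))))
    return dm, p
```

A significantly positive statistic indicates that the second model's forecasts are more accurate than the first's.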

5.3. Robustness Assessment Through Rolling Forecasts

To further verify robustness, we conduct rolling forecasts with training-sample sizes T = 175 , 190 , 205 . Using an approach that “continuously advances the training window and recursively forecasts the remaining samples,” we evaluate the generalization ability of the different weighting schemes under varying amounts of training information (see Table 11).
Overall observations are as follows:
-
As the training sample increases from T = 175 to T = 205 , the error metrics for most combination schemes improve, indicating that more historical information generally helps enhance forecast accuracy;
-
Under all three T settings, the S-BIC-weighted combination consistently attains the minimum or near-minimum MAE/MAPE/MSE, showing the strongest consistency;
-
Consistent with the fixed-window out-of-sample comparison, the conclusion that combinations outperform single models also holds in the rolling setting, indicating cross-window robustness.
Based on the above evidence, the subsequent application adopts S-BIC weighting as the optimal combination strategy.
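The rolling evaluation described above can be sketched generically (Python; `forecast_fn` stands in for any of the fitted combination models, and the naive last-value forecaster below is only a placeholder, not the TARMA combination itself):

```python
def rolling_mape(series, train_size, forecast_fn):
    """Advance the training window one step at a time and compute the
    MAPE (in percent) of one-step-ahead forecasts over the remainder."""
    errors = []
    for t in range(train_size, len(series)):
        pred = forecast_fn(series[:t])            # fit/forecast on data up to t
        errors.append(abs(series[t] - pred) / abs(series[t]))
    return 100 * sum(errors) / len(errors)

# naive last-value forecaster as a stand-in for a fitted model
naive = lambda history: history[-1]
```

Running this for several training sizes (e.g., T = 175, 190, 205, as in Table 11) yields the cross-window error comparison used to select S-BIC weighting.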

6. Application of the Optimal Forecast Combination

Given S-BIC’s stable advantage under both fixed and rolling windows, we proceed to an applied forecast on a more complete dataset: using 253 monthly CPI observations from January 2004 to January 2025 and following the same pipeline as before (feature analysis → model identification → information-criterion screening → parameter estimation and diagnostics → weight determination), we produce combination forecasts under S-BIC weighting.

6.1. Data Properties and Preliminary Tests

Table 12 reports an ADF test statistic of −4.6255 with p < 0.01, again rejecting the unit-root null at the 1% significance level and confirming stationarity. In Table 13, the pure-randomness (white-noise) test yields Ljung–Box statistics of 924.58 at lag 6 and 1024.50 at lag 12 (both p < 0.0001), clearly indicating that the series is not white noise. Together with the time series plot and the autocorrelation/partial-autocorrelation functions (see Figure 5 and Figure 6), these results support continued use of ARMA-type models as the within-regime substructures.

6.2. Model Identification and Threshold Specification

Based on the tailing-off patterns in Figure 5 and Figure 6, we continue to use ARMA(p, q) as the foundation, with p ∈ {0, 1, 2, 3, 4} and q ∈ {0, 1, 2}, yielding 14 candidates. The threshold delay is set to d = 1. In the threshold search, when the number of thresholds is one, the threshold value is 101.41; when the number of thresholds is two, the threshold values are 100.56 and 101.41. Accordingly, after comprehensive screening by parameter significance and information criteria, six models are ultimately retained as estimates of the “true model”: TARMA(2; 4, 4; 2, 2), TARMA(2; 3, 3; 1, 1), TARMA(2; 2, 2; 1, 1), TARMA(3; 4, 4, 4; 2, 2, 2), TARMA(3; 3, 3, 3; 1, 1, 1), and TARMA(3; 2, 2, 2; 1, 1, 1). In the combination stage, we continue to adopt the S-BIC weighting method for combination forecasting. The predicted value of the combined forecasting model is:
Y = w_1 ŷ_{1,t+1} + w_2 ŷ_{2,t+1} + w_3 ŷ_{3,t+1} + w_4 ŷ_{4,t+1} + w_5 ŷ_{5,t+1} + w_6 ŷ_{6,t+1}, w_i ≥ 0, i = 1, 2, …, 6
The weights of models M_1–M_6 are, respectively:
w 1 = 0.034 , w 2 = 0.057 , w 3 = 0.409 , w 4 = 0.034 , w 5 = 0.057 , w 6 = 0.409
Accordingly, the predicted value of the S-BIC combination model M S B I C is:
Y S B I C = 0.034 y ^ 1 , t + 1 + 0.057 y ^ 2 , t + 1 + 0.409 y ^ 3 , t + 1 + 0.034 y ^ 4 , t + 1 + 0.057 y ^ 5 , t + 1 + 0.409 y ^ 6 , t + 1

6.3. Forecasting Outcomes and Interpretation

Under the above settings, the combination model is used to forecast 24 months of monthly CPI; the results are shown in Table 14. The table reports month-by-month forecasts, for example: February 2023, 102.2802; March 2023, 102.2691; …; February 2024, 102.6504; March 2024, 102.7241; …; December 2024, 103.1124; January 2025, 103.1245. Together with Figure 7, it can be seen that the forecast trajectory to the right of the red dashed line, corresponding to the future interval, exhibits an overall pattern of stability with a mild upward drift; during certain months there are phase-specific features of “rise first, then moderation,” but the broader profile is one of gentle ascent. This implies that, under the historical–structural relationships captured by the model, the medium-term path of the price level is closer to a “gradual repair and mild uplift” [26], offering some reference value for macro policy formulation and market expectation management.

6.4. Theoretical Extension: A Bridge to Probabilistic Forecasting

Although this study focuses on minimizing point-forecast errors (MAPE/MSE), the proposed Adaptive TARMA framework naturally lends itself to uncertainty quantification. To bridge the gap between our deterministic results and the probabilistic frontier, we propose a Regime-Dependent Residual Bootstrap procedure that can be directly implemented to generate prediction intervals without imposing strict normality assumptions [27].
The procedure is defined as follows: Let ε ^ t denote the residuals from the fitted TARMA model. Instead of global resampling, residuals should be resampled conditional on the identified regime to preserve the heteroscedasticity inherent in inflation clusters. For a h-step-ahead forecast, B bootstrap paths can be generated iteratively:
y_{T+h}^{(b)} = φ̂_{0,k} + Σ_{i=1}^{p_k} φ̂_{k,i} y_{T+h−i}^{(b)} + Σ_{j=1}^{q_k} θ̂_{k,j} ε_{T+h−j}^{(b)} + ε*_{T+h}^{(b)}
where k is the regime indicator determined by the threshold variable y T + h d ( b ) and ε * is drawn from the regime-specific empirical residual distribution. By computing the quantiles (e.g., 2.5% and 97.5%) of the distribution of these B simulated paths, researchers can construct robust Confidence Intervals (CIs) that account for both parameter uncertainty and regime-switching stochasticity. This theoretical framework demonstrates that the TARMA model is not limited to deterministic output but serves as a foundational structure for advanced probabilistic risk assessment.
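A minimal implementation of this procedure might look as follows. This is a sketch under simplifying assumptions we introduce here for illustration: a two-regime TAR(1) without MA terms, hypothetical coefficients and residual pools (none of these values come from the fitted models), and a fixed seed for replicability:

```python
import random

def bootstrap_paths(history, resid_by_regime, phi, threshold, d=1, h=12, B=500, seed=42):
    """Regime-dependent residual bootstrap for a simplified two-regime TAR(1):
    y_t = phi[k] * y_{t-1} + eps, where regime k is chosen by comparing the
    delay-d lagged value with the threshold, and eps is resampled from that
    regime's empirical residual pool (preserving regime-specific volatility)."""
    rng = random.Random(seed)           # fixed seed for replicability
    paths = []
    for _ in range(B):
        y = list(history)
        for _ in range(h):
            k = 0 if y[-d] <= threshold else 1
            eps = rng.choice(resid_by_regime[k])
            y.append(phi[k] * y[-1] + eps)
        paths.append(y[len(history):])  # keep only the forecast horizon
    return paths

def prediction_interval(paths, step, lo=0.025, hi=0.975):
    """Empirical (lo, hi) quantile interval at a given forecast step."""
    vals = sorted(p[step] for p in paths)
    return vals[int(lo * (len(vals) - 1))], vals[int(hi * (len(vals) - 1))]
```

Computing the 2.5% and 97.5% quantiles of the simulated paths at each horizon yields the prediction intervals described in the text.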

7. Conclusions and Outlook

7.1. Main Findings: Threshold Effects and Forecasting Performance

This study provides robust evidence of threshold effects in China’s CPI inflation and demonstrates the effectiveness of the TARMA model in capturing these nonlinear dynamics. The results reveal distinct regime-specific inflation patterns. When inflation exceeds 3%, price changes exhibit heightened sensitivity to past shocks, suggesting stronger inflation expectations and the risk of a wage–price spiral. In contrast, under low-inflation conditions, CPI behavior is characterized by stronger mean reversion and cyclical stability. By incorporating these regime-specific features, the TARMA model offers superior model fit and improved forecast accuracy. The model’s out-of-sample MAPE remains below 3%, outperforming the ARIMA model’s approximate 5% error rate. Moreover, applying the Schwarz Bayesian Information Criterion (S-BIC) to weight multiple TARMA specifications enhances predictive performance further. This ensemble approach yields the lowest forecast errors (both MSE and MAPE) across various evaluation metrics and maintains this advantage in rolling forecast tests (see Table 11). Overall, the TARMA model effectively captures nonlinear regime shifts, providing a more reliable tool for short-term CPI forecasting.

7.2. Limitations and Robustness Considerations

Despite the promising forecasting performance, this study is subject to certain limitations that warrant transparency. First, regarding data sparsity, the identification of multiple regimes relies heavily on sample size. As observed in the three-regime specification, narrow intervals (e.g., transitional periods) may contain sparse observations, leading to higher parameter uncertainty (standard errors). While we mitigated this via model averaging, researchers should exercise caution when applying this framework to extremely short time series. Second, the computational stochasticity of the evolutionary algorithm introduces a layer of variability. Although effective for global search, the convergence to an optimal structure is probabilistic; strictly enforcing random seed fixation is essential for replicability. Finally, while the TARMA model effectively captures endogenous regime shifts (e.g., price cycles), it assumes the threshold mechanism itself remains stable. It may face challenges in the event of exogenous structural breaks—such as a fundamental overhaul of the monetary policy framework—which could alter the latent threshold values themselves.

7.3. Structural Interpretation: A Quantitative Window into Macro-Control

Beyond statistical accuracy, the structural parameters identified by the TARMA model offer a novel quantitative window into the efficacy of China’s macroeconomic management. A critical finding of this study is the emergence of significant negative moving average (MA) coefficients (e.g., θ < 0 ) specifically within the high-inflation regime. Econometrically, this parameter configuration implies that an inflationary shock in the current period is immediately followed by a strong mean-reverting correction in the subsequent period. We interpret this mathematical phenomenon as the ‘statistical footprint’ of government intervention. Unlike standard monetary policies (such as interest rate hikes) that impact prices with considerable lags, China’s unique ‘dual-track’ regulation allows for the rapid deployment of administrative measures—such as price caps and the release of strategic pork and grain reserves—to curb runaway prices. Our model effectively captures these ‘visible hand’ interventions as a counterbalancing MA component, providing empirical evidence that China’s targeted macro-control mechanisms are successful in forcing rapid downward corrections during periods of high volatility.

7.4. Contributions Relative to Existing Approaches

Compared to existing approaches, this study offers several methodological and empirical contributions. While traditional ARIMA models cannot handle regime-dependent inflation behavior, TARMA effectively addresses this limitation. By incorporating moving-average terms, it extends classical TAR models to better capture short-term shock dynamics. Unlike many black-box machine learning techniques, TARMA offers a transparent, parameterized structure with interpretable coefficients for each regime, while maintaining strong predictive accuracy. Additionally, model averaging across TARMA variants enhances both forecast stability and accuracy by leveraging model diversity. Nevertheless, several limitations warrant attention. First, threshold models require sufficient observations within each regime; sparse data can weaken parameter estimation. Second, estimating multiple thresholds and associated parameters can be computationally intensive and sensitive to initial values, leading to volatility in results. Third, the current model is univariate, relying solely on CPI data and excluding external drivers such as commodity prices, monetary policy, or global shocks. These limitations suggest caution in interpretation and highlight directions for future enhancement.

7.5. Outlook: Research Frontiers for Model Extension

Building on the limitations identified above, future research should prioritize the following strategic directions to align with the contemporary forecasting frontier:
1.
Probabilistic and Density Forecasting: While this study established the theoretical mechanism for bootstrap-based intervals, future work should empirically implement full probabilistic forecasting. This involves systematically comparing TARMA-generated density forecasts against those from Bayesian VARs and Deep Probabilistic models (e.g., DeepAR) using metrics such as the Continuous Ranked Probability Score (CRPS) to rigorously evaluate the model’s ability to quantify tail risks.
2.
Probabilistic and Generative Benchmarking: While this study focuses on interpretable point forecasting, the field is rapidly evolving towards probabilistic frameworks. Future research should systematically compare TARMA-generated forecasts against those from advanced deep generative models, such as DeepAR, Time Series GANs, and Diffusion models. Such a comparison—potentially utilizing higher-frequency daily or weekly price data to overcome sample size limitations—would rigorously evaluate the trade-off between the structural interpretability of threshold models and the distributional expressiveness of generative AI.
3.
Spatio-Temporal Modeling: Recognizing that inflation is spatially heterogeneous, future studies should extend the univariate analysis to a Spatio-Temporal TARMA framework. By utilizing regional CPI panel data, researchers can model the spatial spillover effects of price shocks across provinces, thereby addressing the limitation of focusing solely on the national aggregate index.
4.
Integration with High-Dimensional Covariates: Moving beyond the univariate setting, integrating the threshold mechanism with high-dimensional exogenous variables (e.g., producer prices, monetary aggregates) via variable selection techniques (such as LASSO) represents a promising avenue for enhancing predictive power in a data-rich environment.

7.6. Policy Implications: Threshold-Based Early Warning and State-Contingent Intervention

Building on these structural insights, the TARMA framework offers a practical ‘early warning system’ for monetary authorities. The identified threshold value (r ≈ 102.4) acts as a critical empirical tipping point rather than a mere statistical parameter. For policymakers, monitoring the proximity of current inflation to this threshold allows for the calibration of intervention strategies:
(1)
In the ‘Inertial Regime’ (Below Threshold): Policy should focus on anchoring consumer expectations through transparent communication, as the system is driven by momentum.
(2)
In the ‘Risk Regime’ (Near or Above Threshold): The model signals an imminent shift to high volatility. At this stage, preemptive administrative measures are justified to prevent the formation of a high-inflation spiral.
By distinguishing between these states, the model supports a transition from a static policy approach to a dynamic, state-contingent strategy, thereby enhancing the timing and efficiency of macro-control measures.

7.7. Longer-Term Extensions: Methodological, Empirical, and Theoretical Advances

Looking ahead, we identify three strategic directions to extend the current framework, categorized into methodological, empirical, and theoretical advancements:
(1)
Methodological Advancements: Future work could explore Hybrid AI-Econometric Systems. By integrating the structural interpretability of TARMA with the feature extraction capabilities of Deep Learning (e.g., LSTM or Transformer networks), researchers could develop ‘Grey-box’ models that offer both high-dimensional data processing and regime-dependent explainability.
(2)
Empirical Applications: The adaptive threshold mechanism holds promise beyond univariate CPI forecasting. A natural extension is the development of Multivariate TARMA (TARMA-X) models to investigate spillover effects between upstream costs (PPI) and downstream consumer prices, or to analyze volatility regimes in financial markets where behavioral tipping points are prevalent.
(3)
Theoretical Advancements: Finally, bridging the gap between statistical forecasting and structural macroeconomics remains a priority. Future research should aim to embed the empirically determined thresholds into Dynamic Stochastic General Equilibrium (DSGE) frameworks. This would allow for a rigorous general equilibrium analysis of how ‘menu costs’ and policy constraints interact to generate the nonlinear dynamics observed in this study.

Author Contributions

Conceptualization, Y.Z.; Methodology, D.C.; Software, D.C.; Validation, D.C.; Formal analysis, X.X.; Investigation, D.C.; Writing—original draft, D.C.; Visualization, D.C.; Supervision, Y.Z.; Project administration, X.X.; Funding acquisition, Y.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the funds as follows: Open Project of the Key Laboratory of Ethnic Language Intelligent Analysis and Security Governance of the Ministry of Education, Research on Low-Resource Minority Language Pronunciation Dictionary Construction Methods Based on Multilingual Knowledge Sharing and Transfer Learning Technology (ORP-2025044); Tianjin University Independent Innovation Fund, Project Title: Dynamic Coding of Cross-Linguistic Phonemic Features and Its Application in Intelligent Speech Processing, Grant Number: 2025XSC-0080; Tianjin University 2025 Special Project for Improving Graduate Education Management Capabilities, Project Title: Exploration of Industry–Education Integration Talent Training Model for Language Information Processing Oriented to Tianjin’s Industrial Innovation, Grant Number: X-NLTS-Y-2025-05.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Xu, X.; Li, S.; Liu, W. Forecasting China’s inflation rate: Evidence from machine learning methods. Int. Rev. Financ. 2025, 25, e70000.
2. Pfäuti, O. The inflation attention threshold and inflation surges. arXiv 2023, arXiv:2308.09480.
3. Fama, E.F. The behavior of stock-market prices. J. Bus. 1965, 38, 34–105.
4. Chen, C.; Sun, Y.; Rao, Y. Threshold MIDAS Forecasting of Canadian Inflation Rate. J. Forecast. 2025.
5. Goracci, G.; Ferrari, D.; Giannerini, S.; Ravazzolo, F. Robust estimation for threshold autoregressive moving-average models. J. Bus. Econ. Stat. 2025, 43, 579–591.
6. Huang, N.; Qi, Y.; Xia, J. China’s inflation forecasting in a data-rich environment: Based on machine learning algorithms. Appl. Econ. 2025, 57, 1995–2020.
7. Wenyong, D. The Methods and Technologies of Time Series Auto-Modeling Based on Evolutionary Computation. Ph.D. Thesis, Wuhan University, Wuhan, China, 2002.
8. Box, G.E.P.; Jenkins, G.M.; Reinsel, G.C.; Ljung, G.M. Time Series Analysis: Forecasting and Control; John Wiley & Sons: Hoboken, NJ, USA, 2015.
9. Clements, M.P.; Hendry, D.F. Forecasting Economic Time Series; Cambridge University Press: Cambridge, UK, 1998.
10. Álvarez Díaz, M.; Gupta, R. Forecasting US consumer price index: Does nonlinearity matter? Appl. Econ. 2016, 48, 4462–4475.
11. Feng, H. Analysis and Forecast of CPI in China Based on LSTM and VAR Model. In Proceedings of the Internet Finance and Digital Economy: Advances in Digital Economy and Data Analysis Technology: The 2nd International Conference on Internet Finance and Digital Economy, Kuala Lumpur, Malaysia, 20–22 December 2024; pp. 339–357.
12. Naghi, A.A.; O’Neill, E.; Danielova Zaharieva, M. The benefits of forecasting inflation with machine learning: New evidence. J. Appl. Econom. 2024, 39, 1321–1331.
13. Medeiros, M.C.; Vasconcelos, G.F.R.; Veiga, A.; Zilberman, E. Forecasting inflation in a data-rich environment: The benefits of machine learning methods. J. Bus. Econ. Stat. 2021, 39, 98–119.
14. Tong, H.; Lim, K.S. Threshold autoregression, limit cycles and cyclical data. J. R. Stat. Soc. Ser. B (Methodol.) 1980, 42, 245–268.
15. Hamilton, J.D. A new approach to the economic analysis of nonstationary time series and the business cycle. Econometrica 1989, 57, 357–384.
16. Aidoo, E. Forecast Performance Between Sarima and Setar Models: An Application to Ghana Inflation Rate. Master’s Thesis, Uppsala University, Uppsala, Sweden, January 2011.
17. Benjamin, M.A.; Rigby, R.A.; Stasinopoulos, D.M. Generalized autoregressive moving average models. J. Am. Stat. Assoc. 2003, 98, 214–223.
18. Giannerini, S.; Goracci, G.; Rahbek, A. The validity of bootstrap testing for threshold autoregression. J. Econom. 2024, 239, 105379.
19. Angelini, F.; Castellani, M.; Giannerini, S.; Goracci, G. Testing for Threshold Effects in the Presence of Heteroskedasticity and Measurement Error with an Application to Italian Strikes. Oxf. Bull. Econ. Stat. 2025, 87, 659–689.
20. Fan, Z.; Hu, Y.; Zhang, P. Measuring China’s core inflation for forecasting purposes: Taking persistence as weight. Empir. Econ. 2022, 63, 93–111.
21. Sengupta, S.; Chakraborty, T.; Singh, S.K. Forecasting CPI inflation under economic policy and geopolitical uncertainties. Int. J. Forecast. 2025, 41, 953–981.
22. Spelta, A. Density-based machine learning model averaging for inflation forecasting. J. R. Stat. Soc. Ser. C Appl. Stat. 2025, qlaf048.
23. Malladi, R.K. Benchmark analysis of machine learning methods to forecast the US Annual inflation rate during a high-decile inflation period. Comput. Econ. 2024, 64, 335–375.
24. Liu, Y.; Pan, R.; Xu, R. Mending the crystal ball: Enhanced inflation forecasts with machine learning. IMF Work. Pap. 2024, 2024, 23.
25. Pfarrhofer, M. Modeling tail risks of inflation using unobserved component quantile regressions. J. Econ. Dyn. Control 2022, 143, 104493.
26. Hauzenberger, N.; Huber, F.; Klieber, K. Real-time inflation forecasting using non-linear dimension reduction techniques. Int. J. Forecast. 2023, 39, 901–921.
27. Das, P.K.; Das, P.K. Forecasting and analyzing predictors of inflation rate: Using machine learning approach. J. Quant. Econ. 2024, 22, 493–517.
Figure 1. CPI Time Series Plot.
Figure 2. Autocorrelation Coefficient Plot.
Figure 3. Partial Autocorrelation Coefficient Plot.
Figure 4. (a) Full historical trajectory (2004–2025); (b) Out-of-sample forecast comparison (2023–2025).
Figure 5. Autocorrelation Coefficient Plot.
Figure 6. Partial Autocorrelation Coefficient Plot.
Figure 7. Time Series Forecast Plot.
Table 1. Hyperparameter Settings for the Evolutionary Algorithm.

| Parameter | Value | Description |
|---|---|---|
| Population Size | 50 | Number of candidate model structures (individuals) in each generation. |
| Max Generations | 100 | Maximum number of iterations allowed for the evolutionary process. |
| Crossover Probability (p_c) | 0.80 | Probability of performing crossover between two parent chromosomes. |
| Mutation Probability (p_m) | 0.10 | Probability of a random mutation occurring in a gene sequence. |
| Selection Mechanism | Tournament (k = 3) | Strategy for selecting parents; a tournament size of 3 was used. |
| Elitism Strategy | Top-2 Keep | The best 2 individuals are automatically preserved for the next generation. |
| Stopping Criterion | Stall Gen = 15 | The algorithm terminates if the fitness (AIC) does not improve for 15 consecutive generations. |
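The settings in Table 1 can be wired into a standard generational loop. The sketch below is an illustration of that loop, not the authors' implementation: the `fitness` function is a hypothetical stand-in for the AIC of a fitted candidate ARMA(p, q) structure, and the (p, q) search ranges are assumed.

```python
import random

# Hyperparameters from Table 1
POP_SIZE, MAX_GEN = 50, 100
P_CROSS, P_MUT = 0.80, 0.10
TOURNAMENT_K, ELITE_N, STALL_GEN = 3, 2, 15

def fitness(ind):
    """Hypothetical stand-in for the AIC of a fitted ARMA(p, q) candidate;
    lower is better. A quadratic bowl with its minimum at (3, 1)."""
    p, q = ind
    return (p - 3) ** 2 + (q - 1) ** 2 + 400.0

def tournament(pop):
    # Pick the fittest of k = 3 randomly sampled individuals.
    return min(random.sample(pop, TOURNAMENT_K), key=fitness)

def crossover(a, b):
    # Exchange genes (here: swap the q order) with probability p_c.
    return (a[0], b[1]) if random.random() < P_CROSS else a

def mutate(ind):
    # Replace the chromosome with a random one with probability p_m.
    if random.random() < P_MUT:
        return (random.randint(0, 4), random.randint(0, 2))
    return ind

def evolve(seed=0):
    random.seed(seed)
    pop = [(random.randint(0, 4), random.randint(0, 2)) for _ in range(POP_SIZE)]
    best, stall = min(pop, key=fitness), 0
    for _ in range(MAX_GEN):
        elites = sorted(pop, key=fitness)[:ELITE_N]          # Top-2 keep
        children = [mutate(crossover(tournament(pop), tournament(pop)))
                    for _ in range(POP_SIZE - ELITE_N)]
        pop = elites + children
        new_best = min(pop, key=fitness)
        stall = stall + 1 if fitness(new_best) >= fitness(best) else 0
        best = min(best, new_best, key=fitness)
        if stall >= STALL_GEN:                               # Stall Gen = 15
            break
    return best
```

In the real procedure `fitness` would fit the candidate TARMA structure to the training data and return its AIC, so the loop searches over model structures rather than a toy surface.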
Table 2. Basic Information of the Sample.

| Statistic | Value |
|---|---|
| Number of Observations | 228 |
| Mean of Training Data | 102.4262 |
| Standard Deviation of Training Data | 1.9868 |
| Maximum Value | 108.7400 |
| Minimum Value | 98.1900 |
Table 3. ADF Test Results.

| Test Type | Statistic | p-Value |
|---|---|---|
| Augmented Dickey–Fuller test | −4.3853 | <0.01 |
| 1% significance level (critical value) | −2.5755 | 0.01 |
| 5% significance level (critical value) | −1.9421 | 0.05 |
| 10% significance level (critical value) | −1.6160 | 0.10 |
Table 4. Results of the Pure-Randomness (White-Noise) Test.

| Lag Order | LB Test Statistic | p-Value |
|---|---|---|
| 6 | 819.73 | <0.0001 |
| 12 | 903.22 | <0.0001 |
| 18 | 1051.80 | <0.0001 |
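The LB statistics in Table 4 are the Ljung–Box portmanteau statistic, which can be computed directly from the sample autocorrelations. A minimal sketch (pure NumPy, not the authors' code):

```python
import numpy as np

def acf(x, k):
    """Sample autocorrelation at lag k (mean-centred, biased denominator)."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    return float(np.dot(xc[:-k], xc[k:]) / np.dot(xc, xc))

def ljung_box(x, h):
    """Ljung-Box statistic Q(h) = n(n+2) * sum_{k=1..h} r_k^2 / (n - k).
    Under the white-noise null, Q(h) is approximately chi-squared with h df."""
    n = len(x)
    return n * (n + 2) * sum(acf(x, k) ** 2 / (n - k) for k in range(1, h + 1))
```

A value of Q far above the chi-squared(h) quantile rejects pure randomness, which is what the near-zero p-values in Table 4 reflect: the CPI series is not white noise and is therefore worth modelling.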
Table 5. Results of Threshold-Value Estimation.

| Number of Thresholds | Threshold Values | SSR/AIC |
|---|---|---|
| 1 | 101.41 | 87,370.75 |
| 2 | 102.4 | 84,122.62 |
|   | 102.81 | 92,406.64 |
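Threshold estimates such as those in Table 5 are typically obtained by grid search: for each candidate threshold, a separate model is fitted in each regime and the value minimising the total SSR is kept. The sketch below illustrates the idea with per-regime AR(1) regressions; the paper's actual regimes are richer ARMA specifications, and the `trim` fraction and minimum-observations rule are assumptions.

```python
import numpy as np

def regime_ssr(y_lag, y):
    """SSR of an OLS regression of y on [1, y_lag] (an AR(1) with intercept)."""
    X = np.column_stack([np.ones_like(y_lag), y_lag])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def grid_search_threshold(y, d=1, trim=0.15):
    """Scan candidate thresholds over the central quantiles of y_{t-d},
    fit one AR(1) per regime, and return (best_threshold, best_total_ssr)."""
    y = np.asarray(y, dtype=float)
    y_lag, y_cur = y[:-d], y[d:]
    lo, hi = np.quantile(y_lag, [trim, 1 - trim])
    candidates = np.unique(y_lag[(y_lag >= lo) & (y_lag <= hi)])
    best = (None, np.inf)
    for r in candidates:
        low = y_lag <= r
        if low.sum() < 10 or (~low).sum() < 10:
            continue  # require enough observations in each regime
        ssr = regime_ssr(y_lag[low], y_cur[low]) + regime_ssr(y_lag[~low], y_cur[~low])
        if ssr < best[1]:
            best = (float(r), ssr)
    return best
```

By construction the best two-regime SSR can never exceed the single-regime (pooled) SSR, since each regime's own OLS fit is at least as good on its subsample as the pooled coefficients.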
Table 6. AIC and BIC Information-Criterion Values for Each Model.

| ARMA(p, q) | AIC Value | BIC Value | ARMA(p, q) | AIC Value | BIC Value |
|---|---|---|---|---|---|
| (0, 1) | 735.09 | 745.38 | (2, 2) | 401.65 | 422.22 |
| (0, 2) | 603.2 | 616.91 | (3, 0) | 415.00 | 432.14 |
| (1, 0) | 422.29 | 432.58 | (3, 1) | 401.69 | 422.27 |
| (1, 1) | 422.32 | 436.03 | (3, 2) | 404.00 | 428.01 |
| (1, 2) | 417.88 | 435.03 | (4, 0) | 413.77 | 434.35 |
| (2, 0) | 421.54 | 435.26 | (4, 1) | 403.68 | 427.69 |
| (2, 1) | 402.42 | 419.57 | (4, 2) | 396.76 | 424.20 |
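The criteria in Table 6 follow the usual definitions AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L. For a Gaussian model both can be computed from the residual sum of squares via the concentrated log-likelihood; a minimal sketch:

```python
import math

def gaussian_ic(ssr, n, k):
    """AIC and BIC for a Gaussian model with k free parameters, using the
    concentrated log-likelihood: -2 ln L = n * ln(ssr / n), up to a constant
    that is common to all candidates and cancels when ranking models."""
    neg2_loglik = n * math.log(ssr / n)
    aic = neg2_loglik + 2 * k
    bic = neg2_loglik + k * math.log(n)
    return aic, bic
```

Because BIC's k ln n penalty exceeds AIC's 2k once n > e² ≈ 7.4, BIC is larger than AIC for every specification in Table 6, and it penalises the richer models (e.g. ARMA(4, 2)) more heavily.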
Table 7. Parameter Estimates for the TARMA Models with l = 2.

Regime y_{t−d} ≤ 101.41:

| ARMA(4, 2) | Estimate | Std. Error | ARMA(2, 2) | Estimate | Std. Error | ARMA(3, 1) | Estimate | Std. Error |
|---|---|---|---|---|---|---|---|---|
| C | 100.2969 | 0.0615 | C | 100.1523 | 0.4267 | C | 100.1528 | 0.4265 |
| AR(1) | 0.7878 | 0.1995 | AR(1) | −0.0325 | 0.2041 | AR(1) | −0.0275 | 0.1837 |
| AR(2) | 0.8676 | 0.3279 | AR(2) | −0.7446 | 0.1629 | AR(2) | 0.7448 | 0.1649 |
| AR(3) | −0.5455 | 0.2323 | MA(1) | 0.9413 | 0.2352 | AR(3) | −0.0042 | 0.1390 |
| AR(4) | −0.2204 | 0.1430 | MA(2) | 0.0043 | 0.1593 | MA(1) | 0.9368 | 0.1274 |
| MA(1) | −0.0976 | 0.1642 | | | | | | |
| MA(2) | −0.9024 | 0.1628 | | | | | | |

Regime y_{t−d} > 101.41:

| ARMA(4, 2) | Estimate | Std. Error | ARMA(2, 2) | Estimate | Std. Error | ARMA(3, 1) | Estimate | Std. Error |
|---|---|---|---|---|---|---|---|---|
| C | 103.0569 | 0.3766 | C | 103.0585 | 0.3744 | C | 103.0581 | 0.3752 |
| AR(1) | 1.3064 | 0.8957 | AR(1) | 1.8013 | 0.0827 | AR(1) | 1.6918 | 0.1426 |
| AR(2) | 0.0532 | 1.4920 | AR(2) | −0.8311 | 0.0766 | AR(2) | −0.6344 | 0.2064 |
| AR(3) | −0.3968 | 0.5577 | MA(1) | −0.8209 | 0.1138 | AR(3) | −0.0903 | 0.0859 |
| AR(4) | −0.0063 | 0.1262 | MA(2) | 0.0806 | 0.0781 | MA(1) | −0.7128 | 0.1229 |
| MA(1) | −0.3248 | 0.8911 | | | | | | |
| MA(2) | −0.2939 | 0.6284 | | | | | | |
Table 8. Parameter Estimates for the TARMA Models with l = 3.

Regime y_{t−d} ≤ 102.4:

| ARMA(4, 2) | Estimate | Std. Error | ARMA(2, 2) | Estimate | Std. Error | ARMA(3, 1) | Estimate | Std. Error |
|---|---|---|---|---|---|---|---|---|
| C | 100.9976 | 0.3951 | C | 101.0288 | 0.3593 | C | 101.0295 | 0.3574 |
| AR(1) | 0.0592 | 0.0939 | AR(1) | 0.8567 | 0.3987 | AR(1) | 0.8884 | 0.5105 |
| AR(2) | −0.0489 | 0.0731 | AR(2) | 0.0046 | 0.3630 | AR(2) | 0.1309 | 0.4930 |
| AR(3) | 0.7742 | 0.0603 | MA(1) | 0.0945 | 0.3924 | AR(3) | −0.1360 | 0.0937 |
| AR(4) | −0.0643 | 0.0931 | MA(2) | 0.1815 | 0.1135 | MA(1) | 0.0621 | 0.5156 |
| MA(1) | 0.9396 | 0.0418 | | | | | | |
| MA(2) | 1.0000 | 0.0537 | | | | | | |

Regime 102.4 < y_{t−d} ≤ 102.81:

| ARMA(4, 2) | Estimate | Std. Error | ARMA(2, 2) | Estimate | Std. Error | ARMA(3, 1) | Estimate | Std. Error |
|---|---|---|---|---|---|---|---|---|
| C | 102.6188 | 0.0344 | C | 102.6153 | 0.0298 | C | 102.6196 | 0.0351 |
| AR(1) | −0.5136 | 0.5978 | AR(1) | −0.0271 | 4.3489 | AR(1) | 0.2179 | 0.5804 |
| AR(2) | 0.0013 | 0.3814 | AR(2) | 0.2859 | 2.2034 | AR(2) | −0.2626 | 0.2573 |
| AR(3) | 0.1006 | 0.2363 | MA(1) | 0.4835 | 4.4586 | AR(3) | 0.3143 | 0.2583 |
| AR(4) | 0.1934 | 0.2445 | MA(2) | −0.5165 | 4.4565 | MA(1) | 0.0890 | 0.6062 |
| MA(1) | 0.9437 | 0.5863 | | | | | | |
| MA(2) | −0.0563 | 0.5697 | | | | | | |

Regime y_{t−d} > 102.81:

| ARMA(4, 2) | Estimate | Std. Error | ARMA(2, 2) | Estimate | Std. Error | ARMA(3, 1) | Estimate | Std. Error |
|---|---|---|---|---|---|---|---|---|
| C | 104.5229 | 0.4674 | C | 104.4316 | 0.6483 | C | 104.5816 | 0.3910 |
| AR(1) | 1.4468 | 0.1897 | AR(1) | 0.2530 | 0.4380 | AR(1) | 1.6294 | 0.2939 |
| AR(2) | −1.2014 | 0.3378 | AR(2) | 0.5371 | 0.3756 | AR(2) | −0.6508 | 0.4132 |
| AR(3) | 0.9585 | 0.2248 | MA(1) | 0.8919 | 0.4377 | AR(3) | −0.0590 | 0.1681 |
| AR(4) | −0.4004 | 0.1256 | MA(2) | 0.1318 | 0.1775 | MA(1) | −0.5528 | 0.2782 |
| MA(1) | −0.3717 | 0.1639 | | | | | | |
| MA(2) | 0.8071 | 0.2010 | | | | | | |
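The regime structure in Tables 7–8 implies a simple forecasting rule: compare the threshold variable y_{t−d} with the estimated threshold(s) and apply that regime's ARMA recursion. A stripped-down sketch of the two-regime case with AR terms only follows; the MA part, which requires tracking past innovations, is omitted for brevity, and the regime coefficients are placeholders, not the fitted estimates above.

```python
import bisect

def tar_one_step(history, thresholds, regimes, d=1):
    """One-step threshold-AR forecast. The regime is chosen by where y_{t-d}
    falls among the sorted thresholds; each regime is (intercept, [ar coeffs]).
    `history` is ordered oldest-to-newest."""
    y_switch = history[-d]                      # threshold variable y_{t-d}
    idx = bisect.bisect_left(thresholds, y_switch)
    c, ar = regimes[idx]
    return c + sum(a * history[-(i + 1)] for i, a in enumerate(ar))

# Placeholder two-regime model around a threshold of 101.41 (illustrative
# coefficients only, not the estimates from Table 7).
regimes = [
    (0.3, [0.6, 0.2]),   # low regime:  y_{t-d} <= 101.41
    (0.5, [0.7, 0.1]),   # high regime: y_{t-d} >  101.41
]
```

Multi-step forecasts iterate this rule, re-evaluating the regime at each step, which is what lets the model switch dynamics as the CPI crosses a threshold.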
Table 9. Diagnostic Results for the TARMA Models with l = 2 (Residual White-Noise Tests).

| Interval | Lag Order | ARMA(4, 2) LB Statistic | p-Value | ARMA(2, 2) LB Statistic | p-Value | ARMA(3, 1) LB Statistic | p-Value |
|---|---|---|---|---|---|---|---|
| 1 | 6 | 1.4301 | 0.9640 | 1.9228 | 0.9267 | 1.9241 | 0.9265 |
| 1 | 12 | 3.5675 | 0.9900 | 7.8320 | 0.7981 | 7.8273 | 0.7985 |
| 2 | 6 | 0.1265 | 1.0000 | 0.4700 | 0.9982 | 0.3814 | 0.9990 |
| 2 | 12 | 21.5740 | 0.0426 | 22.1980 | 0.0354 | 22.0150 | 0.0374 |
Table 10. Model Forecast Results.

| Model | Mean Absolute Error (MAE) | Mean Absolute Percentage Error (MAPE) | Mean Squared Error (MSE) | Root Mean Squared Error (RMSE) | Directional Accuracy (DA) |
|---|---|---|---|---|---|
| M_1 | 0.7245 | 0.715% | 0.7897 | 0.8887 | 65.1% |
| M_2 | 0.5981 | 0.590% | 0.4904 | 0.7003 | 82.2% |
| M_3 | 0.5973 | 0.589% | 0.4892 | 0.6994 | 82.3% |
| M_4 | 0.5057 | 0.498% | 0.4320 | 0.6573 | 85.6% |
| M_5 | 0.5424 | 0.534% | 0.5451 | 0.7383 | 79.1% |
| M_6 | 0.5195 | 0.511% | 0.4725 | 0.6874 | 83.3% |
| M_S | 0.5447 | 0.537% | 0.4297 | 0.6555 | 85.7% |
| M_S-AIC | 0.5697 | 0.562% | 0.4578 | 0.6766 | 84.2% |
| M_S-BIC | 0.5364 | 0.529% | 0.4248 | 0.6528 | 85.0% |
| M_MMA | 0.5297 | 0.522% | 0.4250 | 0.6519 | 86.0% |
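The accuracy measures in Tables 10–11 follow standard definitions; a minimal sketch is given below. The directional-accuracy convention here (forecast and actual moving the same way relative to the previous actual value) is an assumption about the paper's exact DA definition.

```python
import numpy as np

def forecast_metrics(actual, forecast):
    """MAE, MAPE (in percent), MSE, RMSE, and directional accuracy."""
    a, f = np.asarray(actual, float), np.asarray(forecast, float)
    err = a - f
    mae = float(np.mean(np.abs(err)))
    mape = float(np.mean(np.abs(err / a))) * 100.0
    mse = float(np.mean(err ** 2))
    rmse = float(np.sqrt(mse))
    # Directional accuracy: did the forecast move the same way as the actual,
    # relative to the previous actual value?
    da = float(np.mean(np.sign(a[1:] - a[:-1]) == np.sign(f[1:] - a[:-1])))
    return {"MAE": mae, "MAPE": mape, "MSE": mse, "RMSE": rmse, "DA": da}
```

As a consistency check on the reconstruction, RMSE is the square root of MSE in every row of Table 10 (e.g. √0.7897 ≈ 0.8887 for M_1).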
Table 11. Rolling-Forecast Results.

| Model | T = 175: MAE | MAPE | MSE | T = 190: MAE | MAPE | MSE | T = 205: MAE | MAPE | MSE |
|---|---|---|---|---|---|---|---|---|---|
| M_1 | 1.451 | 1.42% | 3.023 | 2.376 | 2.33% | 5.196 | 1.235 | 1.21% | 2.165 |
| M_2 | 1.892 | 1.85% | 5.981 | 2.680 | 2.64% | 6.324 | 1.337 | 1.31% | 2.684 |
| M_3 | 0.685 | 0.67% | 0.770 | 1.045 | 1.02% | 1.865 | 1.215 | 1.19% | 2.268 |
| M_4 | 1.259 | 1.24% | 2.360 | 1.031 | 1.00% | 2.031 | 1.407 | 1.38% | 2.554 |
| M_5 | 1.307 | 1.28% | 2.448 | 2.352 | 2.31% | 4.264 | 1.084 | 1.07% | 1.806 |
| M_6 | 0.605 | 0.59% | 0.620 | 0.902 | 0.88% | 1.846 | 1.382 | 1.36% | 2.676 |
| M_S | 0.899 | 0.88% | 1.351 | 0.985 | 0.90% | 1.887 | 1.207 | 1.21% | 2.180 |
| M_S-AIC | 0.642 | 0.63% | 0.736 | 0.966 | 0.97% | 1.555 | 1.049 | 1.02% | 1.795 |
| M_S-BIC | 0.591 | 0.58% | 0.598 | 0.709 | 0.76% | 1.347 | 1.017 | 1.01% | 1.604 |
| M_MMA | 0.862 | 0.87% | 1.374 | 1.040 | 1.08% | 1.931 | 1.121 | 1.10% | 1.856 |
Table 12. ADF Test Results.

| Test Type | Statistic | p-Value |
|---|---|---|
| Augmented Dickey–Fuller test | −4.6255 | <0.01 |
| 1% significance level (critical value) | −2.5756 | 0.01 |
| 5% significance level (critical value) | −1.9420 | 0.05 |
| 10% significance level (critical value) | −1.6162 | 0.10 |
Table 13. Pure-Randomness Test Results.

| Lag Order | LB Test Statistic | p-Value |
|---|---|---|
| 6 | 924.58 | <0.0001 |
| 12 | 1024.50 | <0.0001 |
Table 14. Combination-Model Forecast Results Based on S-BIC.

| Date | Forecast | Date | Forecast |
|---|---|---|---|
| February 2023 | 102.2802 | February 2024 | 102.6504 |
| March 2023 | 102.2691 | March 2024 | 102.7241 |
| April 2023 | 102.2830 | April 2024 | 102.7929 |
| May 2023 | 102.2924 | May 2024 | 102.8563 |
| June 2023 | 102.2787 | June 2024 | 102.9131 |
| July 2023 | 102.2893 | July 2024 | 102.9633 |
| August 2023 | 102.3085 | August 2024 | 103.0063 |
| September 2023 | 102.2982 | September 2024 | 103.0426 |
| October 2023 | 102.2970 | October 2024 | 103.072 |
| November 2023 | 102.3170 | November 2024 | 103.0952 |
| December 2023 | 102.5003 | December 2024 | 103.1124 |
| January 2024 | 102.5736 | January 2025 | 103.1245 |
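A combination forecast like the S-BIC scheme behind Table 14 can be built from the usual exponential information-criterion weights, w_i ∝ exp(−Δ_i/2) with Δ_i = BIC_i − min BIC. This is a standard smoothed-BIC weighting; the paper's exact scheme may differ in detail.

```python
import math

def bic_weights(bics):
    """Smoothed-BIC weights: w_i = exp(-0.5 * (BIC_i - min BIC)) / sum(...).
    Subtracting the minimum BIC first keeps the exponentials well-scaled."""
    b_min = min(bics)
    raw = [math.exp(-0.5 * (b - b_min)) for b in bics]
    s = sum(raw)
    return [r / s for r in raw]

def combine(forecasts, bics):
    """Weighted average of the individual model forecasts for one horizon."""
    w = bic_weights(bics)
    return sum(wi * fi for wi, fi in zip(w, forecasts))
```

Because the weights decay exponentially in the BIC gap, the combination leans heavily on the best-supported specifications while still hedging across the candidate set, which is the mechanism behind the error reductions reported for M_S-BIC in Tables 10–11.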
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

Cao, D.; Zhao, Y.; Xu, X. Interpretable Nonlinear Forecasting of China’s CPI with Adaptive Threshold ARMA and Information Criterion Guided Integration. Big Data Cogn. Comput. 2026, 10, 14. https://doi.org/10.3390/bdcc10010014
