A New Scheme of Adaptive Covariance Inflation for Ensemble Filtering Data Assimilation

Due to the model and sampling errors of the finite ensemble, the background ensemble spread becomes small and the error covariance is underestimated during filtering for data assimilation. Because of the constraint of computational resources, it is difficult to use a large ensemble size to reduce sampling errors in high-dimensional real atmospheric and ocean models. Here, based on Bayesian theory, we explore a new spatially and temporally varying adaptive covariance inflation algorithm. To increase the statistical presentation of a finite background ensemble, the prior probability of inflation obeys the inverse chi-square distribution, and the likelihood function obeys the t distribution, which are used to obtain prior or posterior covariance inflation schemes. Different ensemble sizes are used to compare the assimilation quality with other inflation schemes within both the perfect and biased model frameworks. With two simple coupled models, we examined the performance of the new scheme. The results show that the new inflation scheme performed better than existing schemes in some cases, with more stability and fewer assimilation errors, especially when a small ensemble size was used in the biased model. Due to better computing performance and relaxed demand for computational resources, the new scheme has more potential applications in more comprehensive models for prediction initialization and reanalysis. In a word, the new inflation scheme performs well for a small ensemble size, and it may be more suitable for large-scale models.


Introduction
Data assimilation (DA) incorporates observations into a climate model through background error covariances derived from model dynamics and then produces a continuous time series of climate states [1][2][3]. In the ensemble Kalman filter (EnKF) [4], covariance inflation [5] is often used to avoid underestimating the background error covariance caused by a finite size of ensembles. It increases the state's uncertainty by expanding its ensemble spread and increases the confidence in the observations.
The covariance inflation scheme is mainly divided into multiplicative [6,7], additive [8] and observation error variance [9] inflation. This paper focuses on multiplicative inflation, which is further divided into prior and posterior inflation by applying the inflation factor to the background ensemble and the analysis ensemble, respectively. Prior inflation was proposed earlier, and Anderson [10] used it in the ensemble adjusted Kalman filter (EAKF) assimilation method. The inflation factor here requires manual tuning for each assimilation. Consequently, it is often time consuming and computationally expensive, especially for complex geophysical models. Many studies have also pointed out that the EnKF assimilation method is sensitive to the choice of the inflation factor [11,12]. Therefore, Anderson [13] developed a time-adaptive covariance inflation algorithm based on hierarchical Bayesian estimation theory. He updated the inflation factor like a variable and got results as satisfactory as those from manual tuning. By extending the Bayesian approach, Anderson [14] proposed a spatial and temporal varying adaptive covariance inflation algorithm in 2009 (A09). In addition, an online inflation factor estimation algorithm in the ensemble transform Kalman filter (ETKF) framework was proposed by Wang et al. [15] and extended by Li et al. [16] to simultaneously estimate covariance inflation and observation errors online. Miyoshi [17] improved the ETKF framework by adaptively estimating the inflation factor at each grid point, and the method has been applied to several geophysical system studies [18][19][20][21]. Zheng [22] and Liang [9] used the maximum likelihood method to estimate the inflation factor from the update vector at each time step. Zhang [23] proposed a special posterior inflation (relaxation) scheme, the relaxation-to-prior-perturbation (RTPP) approach. Based on it, Whitaker and Hamill [12] proposed the relaxation-to-prior-spread (RTPS) method. The relaxation factors of these two methods are obtained by manual tuning. Ying and Zhang [24] proposed an adaptive RTPS method, and Kotsuki et al. [25] proposed an adaptive RTPP method to obtain varying optimal factors. Both methods are based on the innovation statistics [8,26] in the observation space.
All the above methods assume that the inflation innovation is Gaussian, but it can produce negative or minimal inflation values, and a long run of deflation may lead to filter divergence. So, scholars have tried many other schemes. For example, Brankart [27] made the initial prior obey the exponential distribution, but it is not suitable for small values. As a conjugate distribution to the variance parameter of the Gaussian distribution, the inverse chi-square (χ −2 ) (equivalent to the inverse-gamma) distribution may be a better choice. El Gharamti made the prior probabilities of inflation obey the inverse-gamma distribution [28] (E18) and applied it to the posterior inflation [29] (E19). Raanes [30] made the likelihood function obey the χ 2 distribution, and the prior and posterior probabilities obey the χ −2 distribution. However, all these advantages seem insufficient for small ensemble sizes.
In this paper, a new inflation scheme is proposed in the framework of Bayesian theory, in which the prior probability still obeys the χ −2 distribution and the likelihood function obeys the t distribution, which is more suitable for small sample sizes, and data assimilation experiments are performed in two atmospheric-ocean-coupled model frameworks. In the results of the comparison, the new scheme shows significant effects under high sampling and model errors.
In addition to the explanation of abbreviations in this paper, we also list main abbreviations and definitions in the text in Appendix A to read more conveniently. The abbreviations of inflation methods are explained as follows: AIb denotes the spatial and temporal adaptive prior covariance inflation scheme in A09; AIa denotes the adaptive inflation scheme that uses AIb into posterior inflation; EIb denotes the enhanced adaptive prior inflation scheme in E18; EIa denotes the enhanced adaptive posterior inflation scheme mAI-a in E19; tXb and tXa are the adaptive prior and posterior inflation schemes proposed in this paper, respectively.
The paper is organized as follows: Section 2 introduces the assimilation method, the basic theory of adaptive inflation and a new adaptive inflation scheme. Section 3 focuses on a series of numerical experiments with a simple five-variable model and compares the new method's performance with other inflation schemes. Section 4 verifies the applicability and effectiveness of the new inflation scheme in another coupled model. Finally, the discussion and conclusion are given in Section 5.

EAKF Assimilation Method
To construct the assimilation frame, we used the ensemble adjustment Kalman filter (EAKF) [3,10,[31][32][33] assimilation method. The process involves two steps. First, the observation increment ∆y o,i is calculated from the state ensemble x and the observation y o : where y is the projection of the state values on the observation space, y = h(x), h is the projection operator; y i is the i-th member of the ensemble; y is the ensemble mean; and r 2 (y, y o ) is the ratio of the model ensemble variance in the observation space and the observation error variance, i.e., σ 2 y /σ 2 y o . Second, the state increment ∆x i is calculated from the observation increment: where cov(x, y) is the error covariance of the state ensemble x and the ensemble y in the observation space. The assimilated analysis ensemble is obtained by adding the state increments to each corresponding member of the state ensemble.

Basic Inflation Theory
This section focuses on the basic inflation scheme to be compared in this paper, and all the equations can be found in A09, E18 and E19. To compensate for the error covariance lost in the ensemble assimilation process and prevent filter divergence, the error covariance needs to be inflated, i.e., P in f = λP, where λ is the inflation factor, which is generally slightly greater than 1. In practice, the inflation of error covariance is generally achieved by inflating the state ensemble spread, as shown in Equation (3). For simplicity, we assume that all equations are at the same time step, so the time subscripts are omitted: where x j,i denotes the i-th member state value of the j-th variable and x j denotes the ensemble mean of the j-th variable. A larger ensemble size corresponds to a smaller inflation factor and vice versa [6]. In the background state framework, considering only scalar systems (which can be extended to vector systems), the background ensemble mean x b and sample varianceσ 2 b are expressed as follows: where b (background) denotes the prior inflation. This can be replaced by a (analysis) denoting the posterior inflation, to represent the statistical analysis of the posterior ensemble. x b,i denotes the ensemble perturbation of the background ensemble, and N is the ensemble size. Theoretically, the background ensemble member x b,i ∼ N µ, σ 2 b , but the background varianceσ 2 b calculated from the sample is an underestimated variance due to reasons such as the finite ensembles: That is, the underestimated variance is inflated to obtain the true or near-true variance. Assuming that the true state of the model is x t , the observation y o can be obtained by the following equation: where the operator h is used to project the state space variables into the observation space (both are consistent by default in this paper, so h is the unit matrix and is omitted after here). ε o is the observation error, which is set to obey a Gaussian distribution with mean 0 and variance σ 2 o . Similarly, we can obtain: where the background error ε b and the analysis error ε a obey a Gaussian distribution with mean 0 and variance σ 2 b , σ 2 a , respectively. The analysis ensemble is calculated from the background ensemble and the observation correction as follows: where the correction term According to the definition of the innovation statistic, the background distance d b is given by Equation (11): The innovation statistics respond to the difference between the observation and the ensemble mean, which is used later in Bayesian theory to calculate the likelihood of inflation. In addition, since the background ensemble is formed by adding random perturbations directly to the initial field, we can assume that the background error ε b is not related to the observation error ε o . However, the analysis ensemble is calculated from the background ensemble and observations, so the analysis error ε a is considered related to the observation error ε o .
With the development of inflation theory, prior inflation has been widely studied and applied, as described in Section 1. It inflates the prior ensemble of states using Equation (3) before the assimilation step. A09 proposed a classical spatial-temporal adaptive inflation algorithm, which is called AI-b in this paper. Similar to the estimation of state variables, the inflation factor as a parameter also requires prior inflation and observations through Bayesian theory to compute the posterior inflation: where p(λ|d b ) is the posterior probability of λ. Equation (12) is used to calculate the value of λ when the posterior probability is the maximum. p(λ) is the prior probability of λ with a model function of 1, i.e., the posterior inflation factor is the prior factor the next time.
Anderson considered the prior probability to obey a Gaussian distribution with mean λ b and variance σ 2 λ,b . Here, norm is a standardized constant. p(d b |λ) is the likelihood, which is also considered to obey a Gaussian distribution, and its mean and variance of the background innovation statistic on the prior λ are given by the following equations, respectively: Since ε o is not related to ε b , the expectation of its product is 0. The variance of the likelihood is denoted as θ 2 = σ 2 o + λσ 2 b by matching Equation (6) and a determined observation error variance. Further, the inflation posterior probability density function (pdf) can be obtained as follows: To calculate λ when the posterior probability is maximized, the above equation is derived, and its final form is for a cubic equation [13]. However, if there is non-exact correspondence between the observation space and the state space, the influence of the correlation coefficient between the observation and the prior state or localization factor should be considered [14]. The inflation factor in the observation space should be a function related to it in the state space: where γ = ρr, ρ is the localization factor, and r is the correlation coefficient between observation and state. If γ = 1, the result based on Equation (15) is a sixth-order equation, which is generally insoluble. Thus, based on Equation (15), A09 performed a Taylor expansion on the likelihood function, the linear term was retained, and finally a quadratic equation concerning λ was obtained, giving a solution close to λ b . Ideally, λ should be greater than 1 to push the ensemble state away from its mean and increase the error covariance. When λ = 1, no inflation is performed.
o , λ will be less than 1 or even less than 0, which will not inflate. Similarly, λ much larger than 1 is also infeasible, which would lead to over-inflation of the ensemble. Therefore, λ should be a reasonable range of variation.
Unlike A09, in the enhanced prior scheme (EIb) in E18, El Gharamti got the innovation statistic d in the likelihood function by each member of the prior ensemble: By calculation, the expectation of d remains the same and the variance is added to the original with a correction term related to the ensemble size. The modified variance of the likelihood function is: Meanwhile, the prior probability in E18 obey the inverse-gamma distribution, as shown in the following equation: where α is the shape parameter, β is the rate parameter, Γ is the gamma function and Γ(x) = +∞ 0 t x−1 e −t dt, (x > 0). If the prior mean (mode) and variance of the Gaussian distribution are available, the two unknown parameters can be found by making them equal to the mode and variance of the inverse-gamma distribution, respectively. Compared with the Gaussian distribution, the inverse-gamma distribution features of this scheme avoid some negative or small inflation values and reduce the impact on the assimilation quality.
Different from the prior inflation, in the enhanced posterior inflation scheme (EIa) in E19, in addition to the inflation factor is acting on the analysis ensemble, the following treatment is applied to the analysis state and variance.
ε o is related to ε a , the likelihood variance is not the same as the results of Equations (14) and (18), but a function of the posterior variance σ 2 a,j , the posterior variance of the previous assimilation step σ 2 a,j−1 and the observation variance σ 2 o,j , where j denotes the j-th observation of the assimilation [29]. To reduce the computational cost in the high-dimensional complex model, El Gharamti decorrelated them as follows: where σ 2 a and x a are not correlated with observations and the innovation statistic becomes d a,i = y o − x a,i . To ensure that the variance σ 2 a is not less than 0, several methods are designed to restrict it. A comparison of the experimental effects shows that σ 2

The New Inflation Scheme
The classical adaptive inflation is computed based on Bayesian theory with a Gaussian framework. However, as described in Section 1, many studies have shown that a framework with a Gaussian distribution is not the only choice and different distributions have some advantages in some aspects. In this paper, we used alternative distributions to obtain new inflation schemes.

Prior Probability
Raanes showed that the inverse-gamma or inverse chi-square distribution is a better choice for the prior pdf of inflation, which is also better than the assimilation effect in the Gaussian framework (note that the gamma and chi-square distributions are equivalent and can be converted into each other [30]). Therefore, this scheme uses the inverse-gamma distribution as in E18 to describe the prior pdf of inflation, as in Equation (19).

Likelihood Function
When the degree of freedom is large enough, the t distribution is believed to become the same as the Gaussian distribution. However, when the sample size is small, the t distribution shows the feature of "heavy-tailed." It is influenced by the sample and deviates significantly from the Gaussian distribution. Large ensemble sizes cannot be used in actual large-scale climate models, so the t distribution is more suitable than the Gaussian distribution for estimating the overall population. For the above reasons, the inflation scheme makes the likelihood function obey the t distribution, where the pdf of the t-distribution is derived from Table A1 in the paper by Raanes [30]: where v is the degree of freedom, which is equal to the ensemble size; M is the number of state variables; and b and B are the parameters in the t distribution pdf. The t distribution has mean b and variance v/(v − 2)B. To find the required parameters, suppose the prior mean and variance of the Gaussian distribution are available so that both means and variances are equal: Bringing them into Equation (22), the likelihood function can be obtained as follows: To enhance the relationship between the innovation statistic d and each state ensemble member, the method proposed in E18 can be used. So, the variance of the likelihood function in the scheme is shown as Equation (18).

Posterior Probability
According to Bayesian theory (Equation (12)), multiplying the likelihood function with the prior probability gives the posterior probability of inflation: where θ is a function of λ and α and β are functions of λ b and σ 2 λ . Therefore, finding the updated posterior inflation is equivalent to finding the value of λ when the posterior probability is maximized.
Let the derivative of the posterior probability be 0. Eventually, the same quadratic equation as Equation (38) in E18 [28] can be obtained: However, l and l are not same with them in E18, the detailed procedure can be found in Appendix B. The root close to λ b is the updated inflation factor. In the posterior inflation scheme, we also use the scheme in E19 for decorrelation, as shown in the previous section.
Since the inflation method is obtained from the likelihood function obeying the t distribution and the prior probability obeying the χ −2 distribution, they can be used for the background state to obtain the prior inflation (tXb) and the analysis state to obtain the posterior inflation (tXa), respectively. The updated inflation variance calculation is not given here because a fixed variance is more appropriate [13] in terms of the calculation's cost and effectiveness. It is proved that even the adaptive varying covariance decreases to a stable value over time [28].

Algorithm Implementation
The computing process and characteristics of the new adaptive inflation algorithm in the sea-air coupled assimilation model based on EAKF are as follows: • Without abandoning the Gaussian framework, the t distribution of the likelihood function and the χ −2 distribution of the prior probabilities are used, assuming that their Gaussian distributions are available, and their product outputs are Gaussian priors when assimilating the next observation.

•
The prior inflation factor is used before each variable assimilation step, and the posterior inflation factor is used after it. • Localization is not considered in this paper, and since the state space and the observation space are consistent, γ = 1.

•
The rate parameter β is calculated by the mean λ b and variance σ 2 λ,b of the prior inflation factor. • The innovation statistic d as well as its variance θ 2 are calculated. Then, the ratio of the gamma function is calculated by the special method proposed in this paper to obtain the values of l and l . • Finally, the quadratic equation containing β, l, l and λ b is solved to obtain the updated inflation factor λ u . The new λ u,j is the prior inflation factor λ b,j+1 when assimilating the next observation.

The Model
We first used a five-variable coupled climate model (5VCCM), a decadal pycnocline prediction model, proposed by Zhang [34,35] and widely used in many studies [3,36], to conduct a series of experiments and analyze the experimental results. The 5VCCM is a simple version of the coupled general circulation model (CGCM), with some similar features, avoiding the enormous costs of using complex models. The 5VCCM consists of five variables: three variables from the Lorenz63 chaotic atmosphere model [37], one variable from the slab ocean model, and one variable from the deep-ocean pycnocline model [38]. The fast atmosphere drives the slower ocean, resulting in sea-air interactions. The governing equations are as follows: .
where all quantities are given in non-dimensional units. x 1 , x 2 and x 3 are atmospheric variables, where x 1 is the flip rate of convection, x 2 is the temperature difference proportional between the up-flow and down-flow fluids and x 3 is the temperature gradient in the vertical direction. ω and η are ocean variables, where ω denotes the slab-ocean and η denotes the deep-ocean pycnocline. A dot above a variable denotes the time tendency. The above five formulas constitute a system of nonlinear differential equations and contain 15 parameters. σ, κ and b are the original parameters in the Lorenz63 model with standard values of 9.95, 28 and 8/3, respectively. c 1 denotes the parameter of atmospheric forcing by the ocean; c 2 denotes the atmospheric forcing on the upper ocean; c 3 and c 4 denote the linear forcing by the deep ocean on the upper ocean and the interaction between them, respectively; and c 5 and c 6 denote the linear forcing by the upper ocean on the deep ocean and their interaction, respectively. Without the interaction between different media, the upper ocean would consist of only the damping term O d ω and the external forcing S(t) = S m + S s cos 2πt/S pd , where O d is the damping coefficient; S m and S s define the magnitude of the annual mean and seasonal cycle, respectively, and S pd defines the timescale of the seasonal cycle. Since the timescale of ω is much slower than that of the atmosphere, the heat capacity O m is much larger than the damping coefficient O d , which means that the timescale of the ocean is O m /O d times that of the atmosphere. In the deep-ocean pycnocline model, η denotes the anomaly of the ocean pycnocline depth, and its equation is derived from the two-term balance model of the zonal-time mean pycnocline [38]; Γ is the constant of proportionality.

Experiment Design
The experiments were designed to compare the performances of new inflation schemes with those of other inflation schemes with different ensemble sizes. Before starting the assimilation experiments, we needed to construct perfect and imperfect assimilation models. We selected the leapfrog time difference scheme as the perfect model scheme [34] and used the Robert-Asselin time filter [39,40] with a time filter coefficient of 0.125. The fourthorder Runge-Kutta (RK4) time difference scheme was used in the imperfect model for comparison [36]. The experimental time step was ∆t = 0.01, and all 15 parameters were considered as standard values in Section 3.1. We assumed that the only source of the model error is from the different time difference schemes. The initial values and the observation of the experiment are generated with reference [36], and the true and observation field required for the experiment can be obtained from the initial values and the model of the input parameters together, as follows.
The five variables (x 1 , x 2 , x 3 , ω, η) of the coupled model were spun up from the initial values (0, 1, 0, 0, 0) for 1000 time units (TUs; 1 TU is 100 time steps) in the perfect and imperfect models, respectively, to obtain true and biased initial fields. Then running the true initial field for 10,000 TUs using the perfect model, we obtained the true values of the five variables about the time series. The observation field was obtained by adding Gaussian white noise with a standard deviation of 2 every 5 steps for x 1 , x 2 and x 3 , and a standard deviation of 0.2 every 20 steps for ω. This observation frequency was based on the actual climate observation system, where the atmosphere has a higher frequency of observations than the ocean. The deep-ocean variable η had no observations, so no inflation was performed on it.
The initial ensembles of perfect and imperfect model assimilations were obtained from true and biased initial fields, respectively, adding only Gaussian white noise consistent with the observed standard deviation on x 2 . The ensembles were used as the initial condition to run 10,000 TUs corresponding to their time difference methods, respectively, and different inflation schemes were used for comparison. The assimilation effect was judged by the rootmean-square error (RMSE). The RMSE (Equation (28)) time series of x 2 in the last 100 TUs, ω in the last 1000 TUs, and η in the 10,000 TUs were selected for analyses and comparison. The mean RMSE (Equation (29)) of the stable last 5000 TUs was also represented.
where the subscript denoting the state is omitted, x is the mean of the state ensemble, and n is the number of steps for analysis. The initial inflation factor was 1.0. The standard deviation of the inflation factor took a fixed value such that σ λ b = 0.1 when using the perfect model and σ λ b = 1.0 when using the imperfect model [14].
The other two experiments were conducted as a reference to the assimilation results. The first was a control (CTRL) experiment that did not introduce any observations, i.e., only model integration was performed. The second was an assimilation experiment with state estimation only (SEO), without introducing covariance inflation or localization.

Result Analysis
Based on the above experimental setup, this section compares and analyzes the performance of the new inflation scheme used in the prior and posterior ensembles in the perfect and imperfect models and shows the effect of the traditional assimilation method with the new adaptive inflation scheme.

Inflation Scheme Comparison Imperfect Model
• Prior inflation scheme The initial bias ensemble was integrated using the RK4 difference method with different inflation schemes. The time series of RMSEs compared with SEO and CTRL are shown in Figure 1. The manually tuned inflation factor was almost unavailable in the complex model, so it was not compared in this paper. The black line is the control experiment CTRL, the magenta line is the SEO, the red line is the spatial-temporal adaptive prior inflation method proposed by A09, the green line is the enhanced adaptive prior inflation method proposed by E18, and the blue line is the new prior inflation scheme proposed. The inset in each graph in Figure 1 shows its partial enlargement for a clear comparison of all inflation schemes.  Figure 1a shows the experimental results for an ensemble size of 5. The RMSEs of CTRL and SEO were much larger than those on adding the inflation scheme. In addition, we compared the effects with the three inflation schemes. There was no apparent difference between the three schemes for x 2 . For the variable ω, AIb performed poorly and often had RMSEs above the observational standard deviation. Both EIb and tXb worked better due to the χ −2 distribution of the inflation prior probability. tXb was a bit better because its likelihood function obeyed the t distribution, which is more suitable for small ensemble sizes. For the variable η without observation, AIb produced unstable results but was better than EIb and tXb. Compared with EIb, tXb produced more stable and better assimilation results. AIb produced poor assimilation results for ω and better but unstable results for η in the sea-air coupled model. The conflict between these two variables was alleviated by EIb, which substantially improved the assimilation effect of ω and stabilized η at the same time. Moreover, the new inflation scheme tXb improved the assimilation quality of these two variables again and reduced their RMSEs. Figure 1b shows the experimental results for an ensemble size of 20. Although the sampling error was reduced, the same assimilation schemes with the inflation factors were still significantly better than the RMSEs of CTRL and SEO, which shows that covariance inflation significantly improves data assimilation quality. To compare the differences between the inflation schemes clearly, we did not compare these two schemes in subsequent experiments. When the ensemble size increased to 20, there was no longer a significant difference between EIb and tXb for ω. The reason is that the larger ensemble size makes the t distribution gradually approach the Gaussian distribution and produces a similar effect. For updated η by the action of other variables only, consistent with the ensemble size of 5, tXb still had a better performance than EIb and was better than AIb here.
To intuitively compare the performance of different inflation methods for different ensemble sizes and to explore the implementation of the new inflation scheme between various sampling errors, we calculated the mean RMSEs of the last 5000 TUs for x 2 , ω and η, and the results are displayed in Figure 2. The blue bar is the AIb scheme of A09, the orange bar is the EIb scheme of E18 and the yellow bar is the tXb prior inflation scheme. The results in Figure 1 show that the different inflation schemes have insignificant effects on the RMSE of x 2 . However, the first subplot of Figure 2 shows that the new inflation scheme has some advantages over the other two for x 2 when the ensemble size is small, while the three schemes show comparable levels when the ensemble size exceeds 20. The advantage of the new scheme is more evident than the advantages of the others for ω. When the sampling error was large, i.e., the ensemble size was 5, the effect of tXb improved by 48.6% relative to the classical AIb scheme. When the ensemble size was less than or equal to 20, tXb was better than EIb, which further indicated that the t distribution of the likelihood function plays a major role for small samples. When the ensemble size was 5, EIb did not show any advantage for η. In contrast, except for a further reduction in RMSEs for x 2 and ω, the effect of tXb improved by 45.9% for the unobserved variable η, reaching a similar level as AIb and showing a better effect than EIb. Throughout, tXb showed promising results when the ensemble size was small (Figure 2). tXb also offered comparable levels to EIb due to the gradual convergence of the t distribution and the Gaussian distribution at larger ensemble sizes. This result indicates that tXb has better results than the other two prior inflation methods for most cases in the imperfect model, and the larger the sampling error, the more pronounced the effect.
In the simple sea-air coupled model, the variables x 1 , x 2 , x 3 and ω provided observations on general characteristics similar to those by most models. The variable η, which changed only under the influence of other variables, had more unique features and reflected the characteristics of the unobserved variables to some extent. So, the results of x 2 and ω showed that 20 ensemble members are enough to significantly reduce the sampling error in the imperfect simple sea-air coupled model. Due to the short integration time of the model, there is not enough capacity to respond to the changes due to the ensemble size, so an excessive ensemble size does not always give better results. Also in this model, the computation times of different inflation schemes with different ensemble sizes were compared, as shown in Table 1. Due to the unstable computer power, the following values are the average results of three times of experiments. The time taken for the three schemes is close when using the same ensemble size, while the computation time will be significantly higher when the ensemble size increases. Therefore, using a small ensemble size can save more time cost.  The posterior inflation results for the three schemes are shown in Figure 3, where the ensemble size is 5 for (a) and 20 for (b). Similar to the results of the prior inflation schemes in Figure 1, the difference in the inflation schemes had no pronounced effect for x 2 . For ω, tXa was better than the other two schemes, especially AIa. For η, AIa showed better results when the ensemble size was 5, but tXa exhibited lower RMSEs when the ensemble size increased. In any case, tXa showed better results than EIa, indicating that the new inflation scheme is superior to the enhanced inflation schemes of E18 and E19 in some aspects. The results of the prior and posterior inflation schemes of E19 and our schemes are compared in the same figure. We selected the variable η with more stable RMSE results for comparison. The RMSEs of η in the last 5000 TUs are shown in Figure 4, along with a high sampling error with an ensemble size of 5 for (a) and a low sampling error with an ensemble size of 20 for (b). The magenta line is the enhanced prior inflation scheme EIb, the red line is the new prior inflation tXb, the green line is the enhanced posterior inflation scheme EIa and the blue line is the new posterior inflation tXa. Irrespective of whether it is the prior or posterior inflation scheme, the result shows that the new inflation method outperforms the enhanced inflation scheme when the ensemble size is small (Figure 4a). Furthermore, all the inflation schemes were more stable for η when the ensemble size increased to 20 (Figure 4b). The prior inflation was better than the posterior inflation for each scheme with an imperfect model and a small sampling error, which is consistent with the conclusion of E19. Moreover, the enhanced prior inflation scheme EIb had the same effect as our posterior inflation scheme tXa, indicating that the new inflation scheme is better than the enhanced scheme overall.

Perfect Model
In the assimilation framework with different inflation schemes, the unbiased model was integrated using the leapfrog scheme. As an example, the time series of RMSEs of η obtained by using three prior inflation schemes are shown in Figure 5. The red line indicates the AIb scheme, the green line indicates the EIb scheme and the blue line indicates the tXb scheme. The results of an ensemble size of 5 are shown in Figure 5a and of 20 in Figure 5b. The RMSEs of η in the perfect model showed a significant reduction compared to those in the imperfect model, and they were often close to 0. When the ensemble size was 5, the RMSEs of tXb were stable at a lower level, but the other two schemes increased much more suddenly at some moments and showed unstable results. When the sampling error reduced, the results of the three inflation schemes improved to some extent, but the new prior inflation scheme was still more stable for η. To better understand the influence of different sampling errors on the assimilation effect of the variable η in the perfect model, the mean RMSEs of the variable η in the last 5000 TUs with different inflation schemes are shown in Figure 6. tXb showed better or comparable levels compared with the other schemes in the perfect model regardless of the sampling error. However, similar to the result in the imperfect model, the special variable η did not show the familiar regularity, but the RMSE decreased when the ensemble size was 100. The RMSE reached a low level because of the smaller sampling error and no model error, and the difference between the schemes was minimal. Such a slight difference is likely to occur by chance, and even different random noises may change.

The Inflation Effect
To clearly show the advantage of adaptive covariance inflation, we compared the mean RMSE in the last 5000 TUs of tXb (blue) with that of SEO (orange) for different ensemble sizes in the imperfect model ( Figure 7a) and the perfect model (Figure 7b). When the model error was large, tXb showed a significant advantage and a large ensemble size for SEO was still challenging to reach an equivalent effect. The RMSE of SEO significantly decreased without a model error, but tXb still performed better at high sampling errors. When the ensemble size increased and the sampling error gradually lowered, SEO had a similar effect to tXb, but it did not exist in the actual model. In the perfect model, the effect of tXb at an ensemble size of 5 was the same as that of SEO at 100 for x 2 and the effect of tXb at 5 was the same as that of SEO at 10 for ω. The above results show that the scheme with the adaptive covariance inflation can effectively reduce the ensemble size, decrease the cost and speed up the computation.

The Model
After verifying the effect of the new adaptive inflation scheme by 5VCCM experiments, we also conducted experiments using another sea-air coupled model with a better physical basis [41]. The North Atlantic Meridional Overturning Circulation Box Model (MOCBM) [42,43] is a low-order model of the North Atlantic climate system consisting of an atmospheric model and an oceanic thermohaline circulation model. The former adds high-and low-latitude temperature variables to the three tropospheric variables in the low-order atmospheric circulation model proposed by Lorenz [44,45], which is different from the Lorenz63 convective model. The latter is a three-box ocean thermohaline circulation model, including the subtropical upper ocean, the subpolar upper ocean and the deep ocean. It evolved from the original two-box model [46], providing a basic understanding of the dynamics of the thermohaline circulation. In addition to the diffusion of temperature and salinity between each box, the upper ocean also exchanges energy with the atmosphere.
The two models are coupled through some variables and coefficients of the upper ocean and atmosphere, and the governing equations are: where the dots above the variables denote the derivatives of the variables concerning time. X denotes the zonal wind and Y and Z denote the amplitudes of cosine and sine phases of the large-scale eddies, respectively. F denotes the diabatic heating contrasts between the low-and high-latitude ocean and G represents the varying zonal heating zonal difference between land and ocean, both directly related to the upper-ocean temperature. The other terms and some of the meanings in the following equation are not described in detail here, and a detailed explanation can be found in the work of Tardif et al. [43] The evolutionary governing equations for temperature and salinity for the three boxes are as follows: where T and S denote the temperature and salinity in the ocean, respectively; V denotes the volume of each box; and subscripts 1, 2 and 3 denote the high-latitude box, the lowlatitude box and the deep-ocean box, respectively. T A1 is the high-latitude air temperature, which is correlated with X, and T A2 is the low-latitude air temperature, which is a constant 25 • C/298.15 K. Q S is the volume-averaged equivalent salt flux, which is linearly related to the eddy energy Y 2 + Z 2 [47]. The meridional overturning circulation (MOC) q has a positive value in the thermal circulation [43] and presents a negative value in the reverse salt circulation, which is obtained from the temperature and salinity of the upper ocean as follows: where α is the thermal expansion coefficient of seawater, β is the salt expansion coefficient and µ is the proportionality constant. The unit of q is Sv, with 1 Sv = 10 6 m 3 s −1 . Other parameters are no longer listed for explanation, and the standard values of all parameters in MOCBM are set as (a, b, F 0 ,

The Build-Biased Model
The next assimilation experiment required establishing an imperfect model. Given that this is achieved in the 5VCCM using different difference schemes, the MOCBM shows biased models using incorrect physical parameters. Since q is directly related to the ocean state, we performed a sensitivity analysis of the physical parameters in the ocean and selected the most sensitive parameter to add bias to the experiment.
We used q to test the sensitivity of parameters, following Zhao et al. [41]. The tested parameters were formed into an ensemble of 20 by adding Gaussian white noise with a standard deviation of 10% of its standard value, while the other parameters retained their standard values. The results were integrated freely for 250 years with the same initial field, and the last 200 years were taken to calculate the time-averaged spread. The sensitivity percentage was obtained from the ratio of the sensitivity of a single parameter to the sum of the sensitivities of all parameters, as shown in Figure 8.  Figure 8 gives the sensitivity percentages of all 11 physical parameters in the oceanic part of the model. Moreover, the MOC is most sensitive to the parameter γ, i.e., a change in γ causes the most different values of q. Therefore, we added a 20% deviation to γ to form a biased model with the wrong parameter, taking a standard value of 0.06364 for γ in the perfect model and 0.076368 in the imperfect model.

Experimental Design
After comparing the similarities and differences between the new and other inflation schemes in the 5VCCM experiment, the main purpose of the MOCBM experiment was to verify the feasibility and effect of the new inflation scheme in this coupled model. The parameters in Section 4.1 and the values of the initial state in this section are from reference [48]. The perfect model uses the standard values of all parameters, and the imperfect model modifies the value of γ. We assumed that the only source of model error in this experiment is the incorrect physical parameters. Both perfect and imperfect models use a fourth-order Runge-Kutta time difference scheme with a time step of 3 h. Starting with the initial state field (X, Y, Z, T 1 , T 2 , T 3 , S 1 , S 2 , S 3 ) = (1.7, 0.0, 0.0, 288.15K, 298.15 K, 283.15 K, 34.21875 psu, 35.0 psu, 34.6 psu), the model runs 2920 steps per year. Since the MOC has long timescale variables, it runs for 5000 years in this paper. The time series obtained by the perfect model is the true state. Following the feature of the existing observing system, observations are generated for only atmospheric and upper-ocean variables. In this study, the standard deviation of the atmospheric variables X, Y and Z were 0.1; T 1 and T 2 were 0.5 K; S 1 and S 2 were 0.1 psu; and the Gaussian white noise corresponding to the standard deviation was added to the true value with the observation frequency of 1 year to obtain the observation field of the model.
The MOCBM experiment also used the EAKF method for data assimilation, with 20 initial ensembles generated by adding white noise with standard deviation to the atmospheric variables X, Y and Z in the initial field. Two experiments were set up for comparison using imperfect models: the control (CTRL) experiment with free integration and the state estimation only (SEO) experiment.
A change in the ensemble size had little effect on the assimilation results in this model, showing that the model is not sensitive to the size of the sampling error, so this experiment selected an ensemble size of 20 for the investigation. In the perfect model experiment, the assimilation effect of SEO was excellent and the impact of adding the inflation factor was not apparent. The ideal model does not exist in practice; thus, the experiment of the perfect model was not conducted.

Result Analysis
In the imperfect model, comparison and assimilation experiments were performed with the same initial ensemble of the above parameter values and states. The average of 20 costumes was taken at each step and compared with the actual values to obtain the state time series, as shown in Figure 9.  Figure 9 shows the 5000-year state time series of q from different comparison experiments. All the values were positive, which indicates that all the 5000 model years have heat-driven circulation, i.e., the ocean flows from the sea surface to the poles, sinks at high latitudes, returns from the deep ocean to the equator and upwells to the upper ocean at low latitudes [43]. All the experiments were compared using their ensemble mean values, and the results of both the CTRL (blue line) and the SEO (orange line) differed significantly from the actual state (red line) due to significant model errors. They did not even match the period of change. The tXb scheme with the new adaptive prior inflation factor (green line) fit the true value better and had the same period of variation, which benefits from the "observed" restrictions on the states and the adjustment of the various inflation factors. Therefore, the new inflation scheme is also applicable to the MOCBM with more obvious physical characteristics, and the adaptive inflation method described in this paper for the sea-air coupled model is feasible.

Discussion and Conclusions
A new adaptive covariance inflation algorithm was designed in this paper, including prior and posterior schemes. Based on Bayesian theory, the prior pdf of inflation obeyed the χ −2 distribution and the likelihood function obeyed the t distribution suitable for small samples. At the same time, the enhancement of the innovation statistic d presented in E18 was used, i.e., a correction was added to the inflation factor and the new adaptive prior inflation tXb was finally obtained. Based on the prior inflation scheme, the decorrelation in E19 was used for the posterior inflation scheme. In the first experiment, the adaptive prior inflation scheme in A09 was first used for the posterior ensemble and was compared with our proposed new scheme and the enhanced method in E18 in the framework of a simple sea-air coupled model. The parameters in the model were not changed, and the model errors only originated from different time difference schemes in the first experiment. The true state field was obtained by the leapfrog scheme, adding Gaussian white noise to generate observations, while the same technique was used for the perfect model integration. Furthermore, the RK4 scheme was used for the imperfect model. A series of experimental results were obtained by changing the ensemble size in the prior or posterior inflation scheme. The second experiment added a model error using incorrect parameters to verify the new inflation scheme's feasibility for other coupled models.
The results show that the new prior inflation tXb has good performance in terms of some parameters compared with the other two schemes in the imperfect model. When the ensemble size was large, the effect of tXb was close to that of EIb because the t distribution tended to be Gaussian. For the posterior inflation scheme, the effect of tXa was still better in most cases. Whether it was the prior or the posterior inflation, the new inflation scheme outperformed the enhanced scheme when the ensemble size was small and had no significant difference for a larger ensemble size.
In conclusion, the new inflation scheme in the imperfect model performs well for a small ensemble size, and it may be more suitable for high-dimensional, large-scale models. In the perfect model (although rare in reality), the new inflation scheme shows more stable results than the other two schemes. However, the results of the experiments are more affected by random errors due to minor sampling errors. Compared to SEO, tXb shows better results, especially in the imperfect model and tXb with a small ensemble size achieves the same effect as SEO with a large ensemble size in the perfect model.
The new inflation scheme also has some positive effects on the simple coupled model. However, there are still some limitations in this study, and the possible future research directions as follows.
1. The method has not been used in a real model, so further testing of the inflation scheme in real atmospheric and ocean models is needed.
2. We have assumed that the state variables are consistent with the observations, i.e., the projection operator h is a unitary matrix, so further verification is needed when h is not a unitary matrix or the observations are not perfect.
3. Due to computer performance limitations, we only performed a small number of iterations for the two simple models. In fact, a longer computation is necessary to reflect the physical processes of the models more clearly. Data Availability Statement: Due to privacy-related restrictions, the data presented in this study are not publicly available but are available on request from the corresponding author.
When M is odd and v is odd: When M is odd and v is even: