1. Introduction
Land subsidence is one of the most critical issues associated with groundwater use. However, its modeling and prediction are challenging. This is because land subsidence is a highly nonlinear process with various uncertainties arising from observation errors; the limited available observations; the correlation between parameters; and aquitard properties such as the slow hydraulic head propagation, inelastic deformation, and the past-maximum effective stress (preconsolidation stress), which is updated over time. This study focuses on quantifying the uncertainties in model parameters and predictions, which is both scientifically and practically significant, providing insights into subsurface processes and supporting decision making in groundwater management.
Uncertainty in land subsidence model parameters is quantified by finding many sets of model parameters that reproduce the observed subsidence within acceptable error ranges. This is an inverse problem and is addressed through data assimilation [1]. Typical data assimilation methods include the ensemble Kalman filter (EnKF) [2,3], the EnKF with multiple data assimilations (EnKF-MDA) [4], the ensemble smoother (ES) [5,6], the ES with multiple data assimilations (ES-MDA) [7,8,9], and the particle filter [10,11,12].
In data assimilation, the balance between optimizing model performance and maintaining model diversity (diversity among ensemble members) is essential. Optimization improves the ensemble’s consistency with the target data but reduces the model diversity. Excessive reduction in model diversity may lead to overconfidence, as the ensemble appears to be more consistent with observations and suggests reduced uncertainty. An extreme case is an ensemble collapse, where iterative assimilation cycles homogenize the ensemble, all members become identical, and the ensemble becomes statistically insignificant [11]. Addressing the reduction in model diversity associated with optimization is essential to avoid underestimating or miscalculating the actual uncertainty in the system.
Kim and Vossepoel [12] focused on the ensemble size (the number of ensemble members) to maintain the model diversity for land subsidence data assimilation with a particle filter. If the ensemble size is small, the complex nature of land subsidence degrades the diversity of the insensitive parameters because of the pseudo-correlation between the parameters and the simulation. It is important to ensure the diversity of the parameters that are insensitive in reproduction analysis, since they may be sensitive in prediction [13,14]. Jha et al. [6] found a case where the ensemble converged too quickly using ES for surface displacement data from an underground gas storage (UGS) reservoir in northern Italy. Their case implies that while data assimilation considers multiple model states simultaneously, premature convergence can occur if the model diversity is not sufficiently maintained, similar to the initial value dependence in deterministic inversion [15]. Refs. [5,6,9] showed that the posterior ensemble is influenced by the quality of observation data, its utilization, and integration with forward analysis. Thus, to properly perform data assimilation for land subsidence, complex and specialized adjustments are necessary regarding the parameter search range and hyperparameter settings [5]. Because of this, Kang et al. [16] described ES as unstable in highly complex problems. To overcome these limitations, attempts are underway to add an initial ensemble selection scheme to ES [16,17], but the algorithm is becoming more complex.
The greatest problem lies in the nonlinear and complex nature of land subsidence. Data assimilation methods that are adaptive to nonlinear problems and flexibly adjust the balance between optimization and model diversity maintenance are necessary to address the different scenarios and challenges in land subsidence analysis.
Evolutionary-based data assimilation (EDA) has never been applied to land subsidence but is a promising option. EDA is a data assimilation process using an evolutionary algorithm (EA), a population-based optimization method inspired by biological evolution [18]. The advantages of adopting EDA are as follows: First, EDA does not assume linearity or differentiability in the optimization problem, making it applicable to nonlinear land subsidence inversion. Second, since EA is a relatively classical optimization method, various approaches have been proposed to maintain the model diversity while advancing optimization, and EDA can take advantage of the accumulated knowledge. Recently, EDA has been studied for nonlinear problems such as the classical Lorenz model [19] and streamflow forecasting [18,20]. If EDA is available in addition to the existing data assimilation methods for land subsidence, a more appropriate approach can be selected depending on the situation.
This study investigates the capability of EDA to quantify the uncertainties in land subsidence model parameters through a case study in Kawajima (Japan), with a particular focus on maintaining the model diversity. In Kawajima, the authors previously developed a one-dimensional land subsidence model with a deterministic evolutionary algorithm [21]. Although the existing model elucidated the land subsidence mechanism caused by seasonal groundwater level fluctuations, quantifying the uncertainty in the model parameters and prediction was a remaining problem. EDA addresses the model parameter uncertainty.
Predictive uncertainty is quantified through an ensemble prediction consisting of members with equivalent confidence. An essential application of quantified predictive uncertainty is decision making [22]. Ideally, all potential predictions should be captured by identifying all the possible model parameter combinations. However, due to practical limitations, the number of predictive ensemble members may be too small, the prediction may be biased, or the predictive uncertainty may be too strongly under- or overestimated to provide helpful information for decision making.
A promising solution is to transform ensemble predictions into predictive probability density functions (PDFs) while correcting for biases and variances in the original ensemble predictions through post-processing, as is often used in weather forecasting [23]. There are multiple advantages to using post-processed predictive PDFs instead of raw ensemble predictions: (1) Post-processed predictive PDFs can be interpreted probabilistically even if the number of original ensemble members is small; (2) post-processing is expected to improve the predictive performance by correcting for the predictive bias and over/underestimated prediction spread; (3) the post-processed predictive PDFs remain robust when the original ensembles consist of several ensembles with different governing equations, initial conditions, and boundary conditions. Developing different ensembles is possible because post-processing is independent of the ensemble-building process. For the same reason, post-processing can be combined with any data assimilation method. If post-processing can complement the potential drawbacks of the filter or smoother approach, post-processed predictive PDFs will outperform the raw ensemble predictions in predictive performance.
To the best of the authors’ knowledge, this study is the first to propose the post-processing of the ensemble predictions of land subsidence. The employed post-processing method is ensemble model output statistics (EMOS) [24], one of the standard statistical post-processing techniques in weather forecasting. EMOS outputs Gaussian PDFs using the linear regression of the mean and variance of the original ensemble predictions. Thus, the EMOS prediction loses the potential multimodality of the original ensemble predictions but inherits the time-series trends of the original ensemble predictions and benefits from the post-processing described above. The regression coefficients are determined by minimizing the average value of the continuous ranked probability score (CRPS) over a given period prior to the prediction. The CRPS is a statistical measure of the distance between the observed value and the predictive distribution. The EMOS prediction with the regression coefficients determined by minimizing the average CRPS corrects for predictive bias and over/under-dispersive spread of the original ensemble prediction.
EDA and EMOS can potentially contribute to land subsidence modeling, but their performance needs to be investigated due to the lack of relevant studies. There are two research objectives: (1) to validate the performance of EDA in quantifying uncertainty in land subsidence model parameters and (2) to validate the performance of EMOS in quantifying long-term predictive uncertainty. This study involves the application of EDA and EMOS in Kawajima (Japan). Evaluating uncertainty in model parameters and in predictions generally serves different purposes and uses. Therefore, there is no guarantee that the parameter ensembles estimated to characterize the subsurface will provide helpful predictions for decision making. Combining EDA and EMOS solves this problem because the model parameter uncertainty and the predictive uncertainty are quantified independently. This point is demonstrated through a case study in this paper.
The paper’s organization is as follows: Section 2 describes the methodology, including the development and adjustment of the combination of EDA and EMOS for the land subsidence modeling. Section 2 also briefly describes the authors’ previous work on field data and the existing deterministic model in Kawajima [21], which are transferred to the case study to examine the performance of the proposed method in this study. Section 3 presents the results and discussions of applying EDA to the land subsidence model in Kawajima. Section 4 describes the post-processing of the prediction of the ensemble constructed in Section 3; the results and discussions on the EMOS operation are presented by comparing the raw ensemble predictions and the EMOS predictions. In Section 5, multiple other factors are considered that can affect the modeling results. Section 6 describes the conclusions.
2. Methods
2.1. Vertically One-Dimensional Land Subsidence Simulator
In this study, we used the existing land subsidence simulator developed by [25]. The simulator performs a coupled analysis of saturated groundwater flow and vertical uniaxial soil deformation. The elastic deformation is described by linear elasticity, and the plastic (inelastic) deformation is described by a modified Cam-clay model [26]. This simulator was previously used in Tokyo [25] and Kawajima (Japan) [21]. Here, only the essential points are concisely introduced (see [21] for details).
The governing equation of the groundwater mass conservation law is as follows:

$$\frac{\partial}{\partial z}\left[\frac{\rho K}{g}\frac{\partial}{\partial z}\left(\frac{p}{\rho}+gz\right)\right]=\frac{\rho}{1+e_0}\frac{\partial e}{\partial t}\tag{1}$$

where $\rho$ is the water density (kg m$^{-3}$) (assumed to be constant); $e_0$ is the initial void ratio (-); $e$ is the void ratio (-); $K$ is the hydraulic conductivity (m s$^{-1}$); $g$ is the gravitational acceleration (m s$^{-2}$); and $p$ is the pore pressure (Pa). The void ratio change is composed of the elastic component $\Delta e_e$ and the plastic component $\Delta e_p$. The elastic change in the void ratio $\Delta e_e$ is linear with the effective stress changes. When the effective stress changes from $\sigma'_0$ to $\sigma'$, $\Delta e_e$ is

$$\Delta e_e=-\frac{(1+e_0)S_s}{\rho g}\left(\sigma'-\sigma'_0\right)\tag{2}$$

where $S_s$ is the specific storage (m$^{-1}$). Here, the compression is taken to be positive, and the effective stress is

$$\sigma'=\sigma-p\tag{3}$$

where $\sigma$ is the total stress. The plastic change in the void ratio $\Delta e_p$ is calculated using the modified Cam-clay model [26].

$$\Delta e_p=-C_c\log_{10}\frac{\sigma'}{\sigma'_c}+\frac{(1+e_0)S_s}{\rho g}\left(\sigma'-\sigma'_c\right)\tag{4}$$

where $C_c$ is the compression index, and $\sigma'_c$ is the past-maximum effective stress (preconsolidation stress). Note that during plastic deformation, the elastic component in the void ratio change is canceled out by (2) and the second term of (4), and only the first term of (4) remains.
2.2. Evolutionary-Based Data Assimilation (EDA)
EDA is a technique that formulates EA for data assimilation from a Bayesian perspective. Therefore, EDA procedures follow the EA procedures. Since the main focus of this study is an EDA exercise for land subsidence and not a formulation, see [18] for a detailed organization of EDA based on a Bayesian perspective.
EA is a population-based optimization method in which the components are described using biological evolution as an analogy. The given optimization problem is called the environment, the solution is called the individual, the set of solutions is called the population, and the evaluation of each solution is called the fitness. EA aims to acquire individuals with high fitness after many cycles of population evolution. Individuals in a population compete with each other based on fitness to improve their fitness in the next cycle.
We call the $i$-th set of subsurface physical parameters individual $i$. The fitness of individual $i$ is defined as the inverse of the root mean square error (RMSE) between the observed and simulated subsidence:

$$F_i=\frac{1}{\mathrm{RMSE}_i}\tag{5}$$

where $\mathrm{RMSE}_i$ is the RMSE calculated by individual $i$. The RMSE is the degree of agreement between the observed and simulated subsidence:

$$\mathrm{RMSE}_i=\sqrt{\frac{1}{N}\sum_{t=1}^{N}\left(d_t^{\mathrm{obs}}-d_{t,i}^{\mathrm{sim}}\right)^2}\tag{6}$$

where $N$ is the total number of observed subsidence data, $d_t^{\mathrm{obs}}$ (m) is the observed subsidence at the $t$-th timestep, and $d_{t,i}^{\mathrm{sim}}$ (m) is the simulated subsidence at the $t$-th timestep by individual $i$. Because a smaller RMSE means a better agreement between the observed and simulated subsidence, an individual with high reproducibility has a high fitness. Equation (6) defines the assimilation target data using a smoothing approach.
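The fitness and RMSE definitions above can be sketched in a few lines of Python. This is a minimal illustration of Equations (5) and (6); the `eps` guard against division by zero for a perfect fit is our addition, not part of the original formulation:

```python
import numpy as np

def rmse(observed, simulated):
    """Degree of agreement between observed and simulated subsidence (Eq. (6))."""
    observed = np.asarray(observed, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    return float(np.sqrt(np.mean((observed - simulated) ** 2)))

def fitness(observed, simulated, eps=1e-12):
    """Fitness of an individual: the inverse of its RMSE (Eq. (5))."""
    return 1.0 / (rmse(observed, simulated) + eps)
```

A simulation that tracks the observations more closely yields a smaller RMSE and therefore a higher fitness.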
EA generates new individuals by mixing the parameters of existing individuals as a crossover analogy. The new individuals are called offspring, and the individuals to be mixed are called parents. The two parental individuals are selected probabilistically from the population through a virtual roulette [27] with the probability proportional to the fitness:

$$P_i=\frac{F_i}{\sum_{j=1}^{n}F_j}\tag{7}$$

where $P_i$ is the probability that individual $i$ is selected as a parent, and $n$ is the population size. The probability of becoming a parent is based on competition; individuals better fitted to the environment are more likely to produce offspring.
The parameter values of the offspring are random weighted averages of the parents’ parameter values, and thus, the offspring inherits the characteristics of the parents. Furthermore, normal random numbers are added to the parameter values of the offspring. This analogy for mutation encourages model diversity maintenance and solution search to escape from the local solution. The process of offspring production is repeated until the cumulative number of offspring reaches the population size. The new population is more fitted to the environment and replaces the current population. This is the evolution of the population to fit the environment better and promotes the assimilation to the observed subsidence defined in (6). Thus, the evolution of the population is equivalent to model updating in the ensemble smoother based on the Kalman gain function considering all past observations. The cycle of generation and replacement of a new population is called a generation, which is equivalent to an assimilation cycle.
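Roulette-wheel selection and the crossover/mutation analogy described above might be sketched as follows. This is an illustrative sketch only: the mutation standard deviation and the per-parameter random weighting are our assumptions, not values from this study:

```python
import numpy as np

rng = np.random.default_rng(0)

def select_parent(population, fitnesses):
    """Virtual roulette: selection probability proportional to fitness (Eq. (7))."""
    f = np.asarray(fitnesses, dtype=float)
    idx = rng.choice(len(population), p=f / f.sum())
    return population[idx]

def make_offspring(parent_a, parent_b, mutation_sigma=0.05):
    """Offspring parameters are random weighted averages of the two parents
    (crossover), plus Gaussian noise (mutation). Both settings are illustrative."""
    a, b = np.asarray(parent_a, dtype=float), np.asarray(parent_b, dtype=float)
    w = rng.random(a.shape)                   # per-parameter random weights in [0, 1)
    child = w * a + (1.0 - w) * b             # inherits parental characteristics
    return child + rng.normal(0.0, mutation_sigma, a.shape)
```

With zero mutation, each offspring parameter lies between the corresponding parental values; the added noise lets the search escape local solutions.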
Specific individuals with higher fitness in a population are called elites. Passing the elites unchanged to the next generation prevents the loss of good individuals once found [28,29]. Elites are equivalent to an ensemble. Note that the elites are a subset of the population, while the nonelite individuals are the search points.
The evolution cycle homogenizes the elites and eventually causes ensemble collapse (Figure 1a, without fitness sharing) because mutation is insufficient to maintain the model diversity. Thus, we added a fitness-sharing procedure [30], the most popular implementation that assists in maintaining model diversity. Fitness sharing modifies the fitness of all individuals downwardly according to the density of individuals in the solution space. In the optimization process with fitness sharing, individuals are aggregated toward better directions in the solution space while repelling each other. As a result, various local optimal individuals are found (Figure 1a).
The concept of fitness sharing is analogous, where each individual has a territory, and the individuals sharing the territory reduce each other’s fitness. The territory is called the niche, and the size of the niche is called the niche radius, which is the key to fitness sharing. The fitness reduction is significant when there are many individuals in a niche. The downward fitness modification of densely distributed individuals promotes model diversification and escape from local solutions by causing individuals to repel each other in the solution space.
Fitness sharing downwardly modifies the fitness $F_i$ of individual $i$ based on the total distance in the solution space between individual $i$ and all the individuals in the same niche. The fitness after the modification, $F'_i$, is calculated as

$$F'_i=\frac{F_i}{\sum_{j=1}^{n}\mathrm{sh}(d_{ij})}\tag{8}$$

where $n$ is the population size, $\mathrm{sh}$ is the sharing function, and $d_{ij}$ is the distance between individuals $i$ and $j$ in the solution space. The sharing function (Figure 1b) determines the degree of fitness modification based on $d_{ij}$. The sharing function is defined as

$$\mathrm{sh}(d_{ij})=\begin{cases}1-\left(\dfrac{d_{ij}}{\sigma_{\mathrm{sh}}}\right)^{\alpha} & \text{if } d_{ij}<\sigma_{\mathrm{sh}}\\[4pt] 0 & \text{otherwise}\end{cases}\tag{9}$$

where $\alpha$ is a constant, typically set to 1 [28,31], that determines the shape of the sharing function (Figure 1b), and $\sigma_{\mathrm{sh}}$ is the niche radius. The sharing function strongly decreases fitness when an individual is similar to another. Note that the sum in the denominator of (8) includes the individual $i$ itself, so the denominator is never zero. Although the niche radius is a key factor controlling the strength of promoting model diversity (Figure 1a), setting an appropriate value is difficult [31,32,33,34,35,36].
Furthermore, $d_{ij}$ is typically measured by the Euclidean distance [37,38]. If $M$ is the dimensionality of the solution space, the Euclidean distance $d_{ij}$ normalized by the upper and lower bounds of each parameter is

$$d_{ij}=\sqrt{\sum_{k=1}^{M}\left(\frac{x_{k,i}-x_{k,j}}{x_k^{\mathrm{ub}}-x_k^{\mathrm{lb}}}\right)^2}\tag{10}$$

where $x_{k,i}$ is the value of parameter $k$ of individual $i$, $x_{k,j}$ is the value of parameter $k$ of individual $j$, $x_k^{\mathrm{ub}}$ is the upper bound of parameter $k$, and $x_k^{\mathrm{lb}}$ is the lower bound of parameter $k$.
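Equations (8)-(10) can be sketched together as follows (a minimal Python illustration; the variable names are ours):

```python
import numpy as np

def normalized_distance(x_i, x_j, lower, upper):
    """Euclidean distance in the solution space, with each parameter
    normalized by its upper and lower bounds (Eq. (10))."""
    d = (np.asarray(x_i, float) - np.asarray(x_j, float)) / (
        np.asarray(upper, float) - np.asarray(lower, float))
    return float(np.sqrt(np.sum(d ** 2)))

def sharing_function(d, niche_radius, alpha=1.0):
    """Sharing function (Eq. (9)): decreases from 1 to 0 inside the niche."""
    return 1.0 - (d / niche_radius) ** alpha if d < niche_radius else 0.0

def shared_fitness(fitnesses, individuals, lower, upper, niche_radius):
    """Downward fitness modification (Eq. (8)). The sum over the niche
    includes the individual itself, so the denominator is never zero."""
    out = []
    for i, f in enumerate(fitnesses):
        s = sum(sharing_function(
                    normalized_distance(individuals[i], x, lower, upper),
                    niche_radius)
                for x in individuals)
        out.append(f / s)
    return out
```

Two identical individuals halve each other's fitness, while individuals farther apart than the niche radius keep their fitness unchanged.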
Now, all the algorithmic components necessary for EDA are in place. Figure 1c shows a flowchart of the implemented EDA. A more specific algorithmic procedure is given in Appendix A. The algorithm aims to obtain diverse elites with comparable reproducibility of the observed subsidence.
An initial population is prepared.
All individuals are independently evaluated using (5) and (6).
The fitness of all individuals is modified through fitness sharing (8)–(10).
The $N_e$ individuals with the highest modified fitness are selected from the population as elites, where $N_e$ is the number of elites. The elites are copied to the empty population pool, where the population pool holds the individuals that will form the population in the next generation.
The average RMSE among elites and the average Euclidean distance among elites are measured to examine the assimilation status and model diversity. The average Euclidean distance among elites, $D$, is calculated as follows:

$$D=\frac{2}{N_e(N_e-1)}\sum_{i=1}^{N_e-1}\sum_{j=i+1}^{N_e}d_{ij}\tag{11}$$

where $d_{ij}$ is the normalized Euclidean distance between elites $i$ and $j$ calculated from (10), and $N_e$ is the number of elites. A decrease in $D$ implies a decrease in model diversity among elites.
If the number of generations reaches the predetermined number of EDA termination generations, the procedure ends with the output of the elites at that point. Otherwise, the EDA procedure moves forward.
Offspring is generated through parental selection, modified fitness, crossover, and mutation. The generated offspring is added to the population pool. The offspring generation and addition to the pool are iterated until the pool’s population reaches the population size.
The population pool replaces the current population, and the procedure returns to the fitness evaluation (the second step).
The implemented EDA simultaneously promotes optimization and diversification for elites by performing generational iterations. The final output is diverse elites, i.e., diverse sets of model parameters with similar reproducibility of observed subsidence. The niche radius controls the intensity of diversification, with larger niche radii increasing diversification. The number of generations is equivalent to the number of assimilation iterations for the same dataset in ES-MDA.
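Putting the pieces together, one generation of the implemented EDA (fitness evaluation, fitness sharing, elitism, and offspring production) could be sketched on a toy problem. The two-parameter forward model, bounds, population size, niche radius, and mutation scale below are all illustrative stand-ins, not the Kawajima configuration:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy stand-in for the subsidence simulator: a 2-parameter forward model.
t = np.linspace(0.0, 1.0, 20)
def forward(theta):
    return theta[0] * t + theta[1] * t ** 2

obs = forward(np.array([0.6, 0.3]))          # synthetic "observed subsidence"
lower, upper = np.zeros(2), np.ones(2)       # parameter bounds

def fitness(theta):                          # Eqs. (5) and (6)
    return 1.0 / (np.sqrt(np.mean((obs - forward(theta)) ** 2)) + 1e-12)

def shared(fit, pop, radius=0.1):            # Eqs. (8)-(10), alpha = 1
    out = np.empty(len(pop))
    for i in range(len(pop)):
        d = np.sqrt((((pop - pop[i]) / (upper - lower)) ** 2).sum(axis=1))
        out[i] = fit[i] / np.where(d < radius, 1.0 - d / radius, 0.0).sum()
    return out

def one_generation(pop, n_elites=5):
    fit = np.array([fitness(th) for th in pop])
    mod = shared(fit, pop)
    elites = pop[np.argsort(mod)[::-1][:n_elites]]   # elitism
    pool = list(elites)                              # next generation's pool
    p = mod / mod.sum()                              # roulette on modified fitness
    while len(pool) < len(pop):
        pa = pop[rng.choice(len(pop), p=p)]
        pb = pop[rng.choice(len(pop), p=p)]
        w = rng.random(2)                            # crossover weights
        child = w * pa + (1.0 - w) * pb + rng.normal(0.0, 0.02, 2)  # + mutation
        pool.append(np.clip(child, lower, upper))
    return np.stack(pool)

pop = rng.uniform(lower, upper, size=(30, 2))
for _ in range(15):                                  # generations
    pop = one_generation(pop)
```

After the generational iterations, the elites form the output ensemble: parameter sets with similar reproducibility that remain spread out in the solution space by the niche radius.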
2.3. Converting Ensemble Predictions to Predictive Probability Distribution Functions with EMOS
This section describes the ensemble model output statistics (EMOS) theory, which converts ensemble simulations into Gaussian PDFs. It also describes how to evaluate the conversion.
2.3.1. Theory of EMOS
The EMOS, proposed by [24], is a post-processing technique that transforms ensemble simulations into Gaussian PDFs while correcting for predictive bias and over/under-dispersive spread of the original ensemble. Since EMOS is a post-processing method, it can be applied to predictions by the parameter ensembles constructed by any type of inversion algorithm. It should also be emphasized that EMOS requires its own statistical training, independent of the construction of the parameter ensemble.
The main subject of EMOS is comparing the original ensemble prediction and the post-processed prediction. Here, we call the post-processed PDFs the EMOS distributions. The EMOS distributions cover both the ensemble assimilation period and the prediction period, where the ensemble assimilation period is the assimilation target period for the parameter ensemble, as defined in (6). Depending on the context, only the prediction period of the EMOS distribution may be of interest, in which case, the EMOS distribution is called an EMOS prediction.
If $\mu_t$ denotes the mean of the EMOS distribution for time $t$, $\mu_t$ is a bias-corrected weighted average of the simulated values of the ensemble members at $t$, as in (12):

$$\mu_t=a+b_1X_{1,t}+\dots+b_mX_{m,t}\tag{12}$$

where $X_{1,t},\dots,X_{m,t}$ denote subsidence quantities at $t$ simulated by an ensemble consisting of $m$ members, $a$ is the time-independent bias parameter, and $b_1,\dots,b_m$ are the time-independent non-negative weights of a linear regression. If $\sigma_t^2$ denotes the variance of the EMOS distribution, $\sigma_t^2$ is a linear function of the variance of the ensemble simulation at $t$, defined as

$$\sigma_t^2=c+dS_t^2\tag{13}$$

where $c$ is a time-independent coefficient, $d$ is a time-independent non-negative coefficient, and $S_t^2$ is the ensemble variance at $t$. Combining (12) and (13) yields the EMOS Gaussian distribution as follows:

$$\mathcal{N}\left(a+b_1X_{1,t}+\dots+b_mX_{m,t},\;c+dS_t^2\right)\tag{14}$$

whose mean and variance depend on the ensemble simulation.
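A minimal sketch of Equations (12)-(14) at a single time step follows; we assume the population ensemble variance (`ddof=0`), since the convention is not stated here:

```python
import numpy as np

def emos_params(ensemble_t, a, b, c, d):
    """EMOS Gaussian parameters at one time step:
    mean = a + b_1*X_1 + ... + b_m*X_m   (Eq. (12))
    var  = c + d * S^2                   (Eq. (13)),
    where S^2 is the ensemble variance (population variance assumed)."""
    x = np.asarray(ensemble_t, dtype=float)
    mu = float(a + np.dot(b, x))
    var = float(c + d * np.var(x))
    return mu, var
```

For example, with `a = 0`, equal weights `1/m`, `c = 0`, and `d = 1`, the EMOS mean and variance reduce to the raw ensemble mean and variance.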
The most important part of the EMOS operation is the determination of the time-independent EMOS coefficients: $a,b_1,\dots,b_m$ and $c,d$. The EMOS coefficients are determined by minimizing the average of the CRPS over a given period prior to the prediction. In this way, EMOS can correct for predictive bias and ensemble spread. The CRPS is a statistical measure of the distance between a point value and a distribution (not limited to a Gaussian distribution). The CRPS has the same unit as the point value and is defined as (15) [39]:

$$\mathrm{crps}(P,x)=\int_{-\infty}^{\infty}\left(F_P(y)-H(y-x)\right)^2\,\mathrm{d}y\tag{15}$$

where $x$ is the point value, $P$ is a distribution, $F_P$ is the cumulative distribution function (CDF) of $P$, and $H$ is the Heaviside function (16):

$$H(y-x)=\begin{cases}0 & \text{if } y<x\\ 1 & \text{if } y\geq x\end{cases}\tag{16}$$

Figure 2 illustrates the concept of the CRPS using the observed cumulative subsidence and the EMOS Gaussian distribution as an example. The CRPS is small when the predictive PDF is sharp near the actual value. Conversely, if the predictive PDF is very dispersive, or if the predictive PDF is sharp but its peak is far from the actual value, the CRPS is large. If the prediction is very confident, i.e., in the case of a deterministic prediction, the CRPS is equivalent to the mean absolute error [39].
When calculating the CRPS for the EMOS distribution at time $t$, the point value is replaced by $y_t$, which is the observation at time $t$; the distribution is replaced by the EMOS Gaussian distribution (14); and the CDF is replaced by the CDF of (14). In this case, the CRPS (15) can be expressed with a closed-form analytical solution [24] as follows:

$$\mathrm{crps}\left(\mathcal{N}\left(\mu_t,\sigma_t^2\right),y_t\right)=\sigma_t\left[z_t\left(2\Phi(z_t)-1\right)+2\varphi(z_t)-\frac{1}{\sqrt{\pi}}\right]\tag{17}$$

where $\varphi$ and $\Phi$ denote the PDF and the CDF, respectively, of a Gaussian distribution with mean 0 and variance 1 evaluated at the normalized error $z_t=(y_t-\mu_t)/\sigma_t$. If $y_t$ is the observation for time $t$ from the time series of observations, the average CRPS over the period from $T_1$ to $T_2$ is calculated from (17) as follows:

$$\overline{\mathrm{CRPS}}=\frac{1}{T_2-T_1+1}\sum_{t=T_1}^{T_2}\mathrm{crps}\left(\mathcal{N}\left(\mu_t,\sigma_t^2\right),y_t\right)\tag{18}$$

where $\overline{\mathrm{CRPS}}$ is the average CRPS over the period from $T_1$ to $T_2$. Training EMOS, or CRPS minimization, refers to finding the time-independent EMOS coefficients $a,b_1,\dots,b_m,c$, and $d$ that minimize (18) for a specific period. This period from $T_1$ to $T_2$ is called the EMOS training period.
CRPS minimization allows the EMOS distribution to reasonably account for the variability in observations over the EMOS training period, i.e., correct for bias and the over/under-dispersive ensemble spread. When the ensemble spread is excessive relative to the training data, CRPS minimization tightens the EMOS distribution, but the variability in the training data does not allow for excessive tightening. Conversely, when the ensemble spread is underestimated relative to the training data, CRPS minimization widens the EMOS distribution, but the variability in the training data does not allow for excessive widening.
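The closed-form expression (17) and the averaging in (18) are straightforward to implement; a sketch using only the Python standard library:

```python
import math

def crps_gaussian(mu, sigma, y):
    """Closed-form CRPS of N(mu, sigma^2) against observation y (Eq. (17)):
    sigma * [ z*(2*Phi(z) - 1) + 2*phi(z) - 1/sqrt(pi) ],  z = (y - mu)/sigma."""
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)      # phi(z)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))             # Phi(z)
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))

def average_crps(mus, sigmas, ys):
    """Average CRPS over a period (Eq. (18)) -- the EMOS training objective."""
    return sum(crps_gaussian(m, s, y)
               for m, s, y in zip(mus, sigmas, ys)) / len(ys)
```

As the text notes, the score degenerates to the absolute error when the distribution becomes very sharp (sigma approaching zero).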
The EMOS coefficients were obtained by numerically minimizing (18) using the Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm. A globally optimal solution is not guaranteed since the BFGS algorithm depends on the initial values. Thus, we prepared 40 sets of initial values and adopted the best result as the optimal solution. The 40 initial sets consisted of 39 randomly prepared sets and the simplest set from [40]: $a=0$, equally weighted $b_1=\dots=b_m=1/m$, $c=0$, and $d=1$. This is the simplest approach because the mean of the predictive Gaussian distribution is then equal to the original ensemble mean, and the variance of the predictive Gaussian distribution is equal to the original ensemble variance. To constrain $b_1,\dots,b_m$ to be non-negative, the following procedure was used: First, (18) was minimized without any constraint on $b_1,\dots,b_m$. If all $b_k$ were non-negative, the minimization was complete. If there were one or more negative coefficients, they were set to zero, and (18) was minimized again under this constraint. The ensemble variance was then recalculated using only the ensemble members remaining in the regression equation. Furthermore, the variance coefficients $c$ and $d$ must keep (13) non-negative, in addition to the non-negativity of $d$. Although non-negativity was not required for $c$, both $c$ and $d$ were obtained through optimization over auxiliary variables $\gamma$ and $\delta$, which satisfied $c=\gamma^2$ and $d=\delta^2$.
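The multi-start strategy can be illustrated as follows. For a dependency-free sketch we replace BFGS with a pick-the-best evaluation of 40 candidate coefficient sets (the simplest set plus 39 random perturbations) and enforce the non-negativity of the variance coefficients through the reparameterization c = gamma^2, d = delta^2; the iterative handling of negative b_k is omitted, and the training data are our own synthetic illustration:

```python
import math
import numpy as np

rng = np.random.default_rng(1)

def crps_gaussian(mu, sigma, y):
    """Closed-form CRPS of N(mu, sigma^2) against observation y (Eq. (17))."""
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))

def avg_crps(coeffs, ensemble, y):
    """Average CRPS (Eq. (18)). ensemble: (T, m) simulations; y: (T,) observations.
    coeffs = (a, b_1..b_m, gamma, delta) with c = gamma**2 and d = delta**2,
    so the variance coefficients are non-negative by construction."""
    a, b = coeffs[0], coeffs[1:-2]
    gamma, delta = coeffs[-2], coeffs[-1]
    mus = a + ensemble @ b
    sig = np.sqrt(np.maximum(gamma ** 2 + delta ** 2 * ensemble.var(axis=1), 1e-12))
    return float(np.mean([crps_gaussian(m_, s_, y_)
                          for m_, s_, y_ in zip(mus, sig, y)]))

# Synthetic training data (illustrative): a biased, under-dispersive ensemble.
T, m = 60, 5
obs = np.linspace(0.0, 0.3, T)                               # "observations" (m)
ens = obs[:, None] + 0.02 + rng.normal(0.0, 0.005, (T, m))   # 2 cm bias

# Simplest set: mean = ensemble mean, variance = ensemble variance.
simplest = np.concatenate(([0.0], np.full(m, 1.0 / m), [0.0, 1.0]))

# Multi-start: the simplest set plus 39 random perturbations; keep the best.
candidates = [simplest] + [simplest + rng.normal(0.0, 0.05, simplest.size)
                           for _ in range(39)]
best = min(candidates, key=lambda c: avg_crps(c, ens, obs))
```

Because the simplest set is itself among the candidates, the selected coefficients can never score worse than the raw-ensemble baseline on the training period.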
2.3.2. Evaluation Criteria
The length of the EMOS training period affects the EMOS distribution [24]. Short training periods pose the problem of arbitrariness in the choice of training period, while long training periods cannot incorporate short-term biases. We shifted the EMOS training period to find the appropriate one and examined the change in predictive performance. The evaluation criteria were (1) the empirical coverage, (2) the predictive coverage, (3) the RMSE, and (4) the CRPS. The RMSE was based on the EMOS mean (the 50th percentile). These criteria were used to measure the accuracy and sharpness of the predictive distributions. The definitions of the empirical coverage and predictive coverage are described below.
Coverage measures the proportion of actual values falling within certain probability intervals (PIs). Let $\left[L_t^{(\beta)},U_t^{(\beta)}\right]$ denote the central $\beta\%$ PI at each time from $T_1$ to $T_2$, where $L_t^{(\beta)}$ is the $(50-\beta/2)\%$ percentile and $U_t^{(\beta)}$ is the $(50+\beta/2)\%$ percentile of the ensemble simulation or the EMOS distribution at time $t$. For example, when considering the 90% PIs for the ensemble simulation, the simulated member values corresponding to the 5% and 95% percentiles form the lower and upper bounds of the intervals. The PIs for the EMOS distribution are denoted in the same way. The indicator variable $k_t$ (19) is used to calculate the coverage of the $\beta\%$ PIs:

$$k_t=\begin{cases}1 & \text{if } L_t^{(\beta)}\leq y_t\leq U_t^{(\beta)}\\ 0 & \text{otherwise}\end{cases}\tag{19}$$

where $y_t$ is the observation for time $t$ from the time series of observations. Then, the coverage over the period from $T_1$ to $T_2$ is defined as follows:

$$\mathrm{coverage}=\frac{100}{T_2-T_1+1}\sum_{t=T_1}^{T_2}k_t\;(\%)\tag{20}$$

There are several types of coverage with different concepts, such as nominal, empirical, and predictive. Nominal coverage is the $\beta\%$ in (19) and is predetermined by the modeler. Empirical coverage is the coverage over the EMOS training period, calculated as the percentage of training data falling within the $\beta\%$ PIs to be trained. The significance of the empirical coverage is to test whether the $\beta\%$ PIs explain $\beta\%$ of the past observations. The predictive coverage refers to the predictive performance calculated by (20) over the prediction period. Ideally, the observations should be indistinguishable from random draws from the PDFs [23]. Thus, both the empirical and predictive coverage should match the nominal coverage. However, it is difficult to expect a complete agreement between the predictive and nominal coverage in long-term prediction. The reason is that nominal coverage expects a time-independent $\beta\%$ prediction accuracy, whereas the predictive uncertainty typically increases over time (see [41] for a rigorous discussion).
2.4. Field Data Description: Kawajima, Japan
We applied this methodology to the field data in Kawajima town (Japan), where the authors previously reported the seasonally progressing land subsidence [21]. The previous work analyzed the land subsidence mechanism through a deterministic inversion for a one-dimensional model using an evolutionary algorithm without fitness sharing. Here, we briefly describe the work by [21] because this study performs an EDA using the same model domain, initial pore pressure distribution, boundary conditions, and observation data.
Kawajima is in the Arakawa lowland, where the entire surface layer is covered by Holocene deposits (as shown in Figure 3a,b). The surrounding surface layers consist of Pleistocene strata. The Arakawa River has eroded these Pleistocene strata and deposited sediments, forming the Holocene strata. Cross-sections of the north–south and east–west geological and hydrogeological profiles are shown in Figure 3c,d. The unconsolidated Holocene sediments have a 25–30 m thickness, posing a high risk of land subsidence due to groundwater extraction from shallow wells.
Over 50% of the land use in Kawajima is agricultural, with most groundwater pumping occurring within a depth of 50 m. Agricultural groundwater usage during the summer significantly surpasses groundwater usage for other purposes. Consequently, groundwater levels exhibit seasonal fluctuations, primarily due to this seasonal pumping. This leads to a cyclical pattern of elastic expansion and elastoplastic compression of the ground each year, contributing to a cumulative process of land subsidence. Deformation data indicate that elastoplastic deformation is confined to formations between 0 and 80 m depth, while only elastic deformation is observed in deeper formations.
Groundwater levels have recovered in the long term, but land subsidence has progressed due to seasonal groundwater level fluctuations (Figure 4). Drought years show both a sudden hydraulic head drop and significant subsidence due to increased water demand. The numerical model domain consisted of 87 meshes with nine types of layer classification (S, F1 to F5, and T1 to T3) (Figure 4c and Table 1). “S” represents the surface soil, “F” represents the aquifer, and “T” represents the aquitard. The following parameters were set for the layers so that meshes belonging to the same layer would have identical properties: the hydraulic conductivity $K$ (m/day), the specific storage $S_s$ (1/m), the compression index $C_c$ (-), the initial void ratio $e_0$ (-), the water density (assumed to be constant at 1000 kg/m$^3$), the solid density (assumed to be constant at 2600 kg/m$^3$), and the overconsolidation depth (OCD) (m). The OCD is the past-maximum thickness of the overburden layer above the target layer [21]. The OCD is equivalent to the preconsolidation head at the initial condition. The preconsolidation head is the threshold hydraulic head below which inelastic compaction begins [42]. The simulation period was from January 1945 to April 2019, with a time step of one month. The hydraulic head shown in Figure 4a is a boundary condition set for the mesh at the center of the observation screen (Figure 4c). The surface mesh had atmospheric pressure. The bottom boundary had zero mass flux. The initial pore pressure distribution was hydrostatic, with a hydraulic head of 13.35 m. The initial preconsolidation stress distribution depended on the OCD. After estimating $K$, $S_s$, $C_c$, $e_0$, and the OCD for the nine layer types using deterministic inversion with an evolutionary algorithm, the model reproduced the observed subsidence with an RMSE accuracy of 3 mm.
The model explained the land subsidence mechanism through three sets of layers:
The first set was the aquifer, consisting of F3, F4, and F5, with dynamically varying hydraulic head boundary conditions. Because of thin clay layers within the aquifer, this set underwent plastic deformation in drought years. Plastic deformation near permeable layers was essential to reproducing the significant subsidence that occurred only in drought years, driven by the sudden drop in hydraulic head.
The second set was T3, which had low hydraulic conductivity. T3 buffered the hydraulic head change propagating upward from the aquifer.
The third set was F2 and T2, where the hydraulic head kept declining without seasonal fluctuations due to T3’s buffer. Due to the slow propagation of hydraulic heads, the aquifer’s long-term head recovery since the 1970s has not been transmitted to the third set, contributing to long-term subsidence.
Although the authors’ previous work [
21] elucidated the land subsidence mechanism, the uncertainty quantification in the model parameters and the predictions remains challenging. The quantification of these uncertainties is essential in considering future groundwater management.
4. Quantification of Predictive Uncertainty Using EMOS
To test the performance of EMOS when applied to land subsidence ensemble prediction, we conducted EMOS training on the parameter ensembles obtained in
Section 3. Then, we compared the predictive performance of the raw ensemble predictions with that of the EMOS predictions. Here, the EMOS distribution trained using ensemble “Name” is called “Name-EMOS”. For example, the EMOS Gaussian predictions acquired through post-processing “A2” are called “A2-EMOS”.
The criteria used to evaluate the predictive performance were empirical coverage, predictive coverage, RMSE, and CRPS. The nominal coverage was 90%. Thus, the 90% PIs (90% probability intervals formed by the upper 95th percentile and lower 5th percentile of the raw ensemble predictions or the EMOS predictions) were the evaluation target for both the empirical coverage and predictive coverage. For the same reason, the ideal value for both the empirical coverage and predictive coverage was 90%. For comparison, we calculated the empirical coverage of the raw ensemble prediction for the same period as the EMOS training period.
The RMSE measures the agreement between the observed subsidence and deterministic predictions. We used the ensemble mean to calculate the RMSE for the raw ensemble predictions. We used the EMOS mean to calculate the RMSE for the EMOS predictions. For both the RMSE and CRPS, smaller values indicate better predictive performance.
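As a rough illustration, the coverage, RMSE, and CRPS criteria can be computed from an ensemble and observations as sketched below. This is a minimal sketch assuming NumPy; the CRPS here is the standard sample-based estimator CRPS = E|X − y| − 0.5·E|X − X′|, not necessarily the study's exact implementation, and the toy ensemble is hypothetical.

```python
import numpy as np

def coverage(obs, lower, upper):
    """Fraction of observations inside [lower, upper] (ideal: the nominal 0.90)."""
    obs = np.asarray(obs, dtype=float)
    return float(np.mean((obs >= lower) & (obs <= upper)))

def rmse(obs, pred):
    """Root-mean-square error between observations and a deterministic prediction."""
    obs, pred = np.asarray(obs, dtype=float), np.asarray(pred, dtype=float)
    return float(np.sqrt(np.mean((obs - pred) ** 2)))

def crps_ensemble(members, y):
    """Sample-based CRPS for one observation y: E|X - y| - 0.5 E|X - X'| (smaller is better)."""
    x = np.asarray(members, dtype=float)
    return float(np.mean(np.abs(x - y)) - 0.5 * np.mean(np.abs(x[:, None] - x[None, :])))

# Toy usage: a hypothetical 100-member ensemble at one time step
rng = np.random.default_rng(0)
ens = rng.normal(0.0, 1.0, size=100)
lo, hi = np.percentile(ens, [5, 95])  # bounds of the 90% PI
inside = coverage([0.1], lo, hi)      # 1.0 when the observation falls inside the PI
score = crps_ensemble(ens, 0.1)
```

For the raw ensemble, the RMSE would use the ensemble mean as `pred`; for the EMOS predictions, the EMOS mean and the Gaussian closed-form CRPS would be used instead.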
Note that the predictive performance here concerns long-term prediction accuracy. The groundwater levels used as the boundary condition during the prediction period were the groundwater levels actually observed in the future; that is, the predictive performance in this study assumes a perfect estimate of future groundwater levels.
4.1. EMOS Results Using Different Ensemble Spreads
The EMOS training was performed for A1 to A4 by shifting the training period. The objectives were (1) to test EMOS's performance on ensembles with different ensemble spreads and (2) to explore an appropriate EMOS training period. In shifting the EMOS training period, we analyzed training periods of 5 to 120 months prior to the prediction.
Figure 10 shows the changes in the evaluation criteria when the EMOS training period was shifted.
4.1.1. Agreement between the Nominal Coverage and the Empirical Coverage
Agreement between the empirical coverage and the nominal coverage (90%) verifies the past statistical consistency between the observations and the distributions (ensemble simulations or EMOS distributions). Raw A1 had 100% empirical coverage for the 5–46-month training periods, meaning its ensemble spread was over-dispersive. Indeed, the parameter distribution (
Figure 6 and
Figure 7) and reproduction analysis (
Figure 8) indicated that raw A1 was over-dispersive. On the other hand, raw A2, A3, and A4 had empirical coverages much lower than the nominal coverage, indicating that their ensemble spreads were under-dispersive.
After EMOS training, A1-EMOS, A2-EMOS, A3-EMOS, and A4-EMOS all exhibited empirical coverages of approximately 80–94%. This means that EMOS successfully corrected the ensemble spread, regardless of whether the original ensemble spread was over- or under-dispersive.
4.1.2. Change in Predictive Performance When Shifting the EMOS Training Period
The long-term predictive performance was assessed for the prediction horizon from 1 to 60 months ahead (5-year prediction), from 1 to 120 months ahead (10-year prediction), and from 1 to 180 months ahead (15-year prediction).
Table 5 shows the predictive performance of raw A1, A2, A3, and A4. The predictive coverage of raw A1 was 100% for all of the 5-year, 10-year, and 15-year predictions because raw A1 was over-dispersive, i.e., the 90% PIs were too broad. This means that the prediction by raw A1 provides less meaningful information for decision making.
On the other hand, the predictive coverages of raw A2, A3, and A4 were below 58% for all of the 5-year, 10-year, and 15-year predictions, much lower than the 90% nominal coverage. The main reason for this poor predictive performance was that the 90% PIs were too narrow, although the failure to capture the long-term trend due to bias also degraded performance. Raw A2, A3, and A4 showed better predictive coverage, RMSE, and CRPS in the 15-year prediction than in the 10-year prediction. This is because the systematic errors related to the Tohoku earthquake in March 2011, reported by [
21], coincidentally acted as a bias correction to the long-term prediction trend since the earthquake.
A good EMOS prediction is robust to shifts in the training period: a robust training period yields output similar to that of neighboring training periods, so the choice of training period is not a source of confusion. The predictive coverage (
Figure 10) had no robust training periods for A1-EMOS. The reason was A1's significant reproduction error during the assimilation period, which made EMOS training difficult. A1-EMOS was sensitive to shifts in the training period, and the prediction bias was often corrected in the wrong direction, causing unstable predictive performance.
On the other hand, A2-EMOS, A3-EMOS, and A4-EMOS had good predictive performance in the 5-year and 10-year predictions for training periods ranging from 74 to 101 months. Specifically, the predictive coverage was stable at around 96%, slightly over-dispersive but satisfactory given the difficulty of long-term prediction and the poor predictive performance of the raw ensembles. When the training period exceeded 101 months, A2-EMOS, A3-EMOS, and A4-EMOS often had 100% predictive coverage in the 5-year and 10-year predictions. For the 15-year prediction, training periods longer than 98 months performed well. Good predictive performance was also observed for training periods of 39 to 47 months because the training data matched the trend of the verification data.
4.2. Comparison between Raw A2 and the A2-EMOS Trained over 92 Months
To examine the EMOS’s performance in detail, we compared the raw A2 ensemble and A2-EMOS trained over 92 months (September 1992 to April 2000) as a typical example from the robust training periods.
Table 6 compares the evaluation criteria for the raw A2 ensemble and A2-EMOS. For all criteria, A2-EMOS was superior to raw A2.
Figure 11 shows the 90% PIs of raw A2 and A2-EMOS. The EMOS mean captured the observations better than the ensemble mean of raw A2 until it was affected by the systematic error due to the Tohoku earthquake in March 2011.
Notably, the predictive uncertainty of A2-EMOS remained constant over time, and A2-EMOS with other robust training periods confirmed the same finding. This may be because raw A2's ensemble spread was not constant over the time horizon: from 1995 to 2000, raw A2's ensemble spread slightly increased, whereas no such time trend appeared in the observations. Thus, when training EMOS, a coefficient on the ensemble variance was not favored, and the time-constant variance term was preferred, resulting in the constant EMOS prediction spread. Restrictions can be placed on the variance coefficients in CRPS minimization to prevent or obtain such a constant prediction spread (e.g., setting the constant variance term to zero prevents it; setting the ensemble-variance coefficient to zero enforces it).
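For concreteness, a minimal sketch of this kind of EMOS fit is given below, assuming SciPy and the standard EMOS Gaussian form y ~ N(a + b·x̄, c + d·S²) with the closed-form Gaussian CRPS as the training loss. The function names and toy data are illustrative assumptions, not the study's actual implementation.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def crps_gauss(mu, sigma, y):
    """Closed-form CRPS of a N(mu, sigma^2) forecast for observation y."""
    z = (y - mu) / sigma
    return sigma * (z * (2 * norm.cdf(z) - 1) + 2 * norm.pdf(z) - 1 / np.sqrt(np.pi))

def fit_emos(xbar, s2, y):
    """Fit y ~ N(a + b*xbar, c + d*s2) by minimizing the mean training CRPS.
    c and d are parametrized as squares to keep the variance positive.
    Setting d to zero forces a time-constant prediction spread; setting c to
    zero ties the spread entirely to the ensemble variance."""
    def loss(p):
        a, b, g, h = p
        mu = a + b * xbar
        sigma = np.sqrt(g**2 + h**2 * s2 + 1e-12)
        return np.mean(crps_gauss(mu, sigma, y))
    res = minimize(loss, x0=[0.0, 1.0, 1.0, 1.0], method="Nelder-Mead")
    a, b, g, h = res.x
    return a, b, g**2, h**2  # a, b, c, d

# Toy training data: a biased, under-dispersive ensemble of a noisy signal
rng = np.random.default_rng(1)
truth = np.linspace(0.0, 5.0, 200)
y = truth + rng.normal(0.0, 0.5, 200)  # observations
xbar = truth + 0.3                     # ensemble mean with a +0.3 bias
s2 = np.full(200, 0.05)                # too-small ensemble variance
a, b, c, d = fit_emos(xbar, s2, y)
# The fitted mean a + b*xbar removes the bias; c + d*s2 widens the spread.
```

When the training-period ensemble variance carries no useful time signal, the fit pushes the weight onto the constant term, which is exactly the mechanism behind the constant A2-EMOS prediction spread described above.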
4.3. EMOS for Updated Ensembles
To test the EMOS in practical situations, we conducted EMOS training on the updated ensembles B, C, and D using the past 92 months of observations.
Figure 12a–f compare the reproduction and prediction between the raw ensembles and the EMOS distributions. The results indicate the necessity and difficulty of considering the seismic effects after March 2011. For example, the raw D predictions included biases that overestimated subsidence after 2015 (
Figure 12e). D-EMOS trained over 92 months also failed to capture the observed trend after the earthquake (
Figure 12f).
Here, we show that EMOS with a short training period can correct the bias resulting from the seismic effect on the prediction. We conducted a training period shift for D-EMOS and checked the change in the predictive coverage (the prediction target was the observations from May 2015 to April 2019).
Figure 12g shows the results. The EMOS prediction with a training period of 20 to 30 months showed good predictive performance due to intensive correction for the short-term bias.
Figure 12h illustrates the prediction results of D-EMOS trained over 20 months (September 2013 to April 2015). The bias related to the earthquake was successfully corrected. Note, however, that the shorter the training period, the more sensitive the EMOS training was to shifts in the training period.
4.4. Strategies for Better and Robust EMOS Prediction
As mentioned in
Section 4.1.2, it is preferable to train EMOS over a robust period, one that yields output similar to that of neighboring training periods. Sometimes, however, training against this basic strategy yields good results. Since a short training period successfully removed the seismic effect from the prediction in
Section 4.3, the modeler sometimes needs to subjectively set the training period depending on the situation (the observed data trend, the limitations of the numerical model, and the potential drawbacks of the filter approach or smoothing approach). In essence, the appropriate EMOS training period should be comprehensively determined by a modeler who understands the context of the problem.
Because EMOS is based on regression, it cannot correct for errors beyond the model representation. Thus, improving the predictive performance of the raw ensemble without relying too heavily on post-processing is essential for improving the EMOS predictive performance. Although not addressed in this study, predictions are robust when the ensemble is composed of ensemble members with different governing equations, initial conditions, and boundary conditions. In this way, ensemble predictions can incorporate various types of uncertainty. Because EMOS is independent of the ensemble construction process, it can output Gaussian predictions using mixed ensemble members. Indeed, this approach is standard in climate forecasts [
50] and streamflow forecasts [
51].
Raw ensemble predictions from EDA based on the smoothing approach can handle multimodal (multipeaked) predictions, but their short-term bias is a disadvantage. In addition, the spread of EDA's ensemble predictions is influenced by the niche radius, making it difficult to control. On the other hand, EMOS prediction offers superior bias correction, prediction spread correction, and interpretability. However, EMOS outputs Gaussian predictions, which greatly simplify the potentially valuable information contained in the raw ensemble predictions. Thus, EMOS may oversimplify the representation of uncertainty in situations where predictions are not expected to cluster around a mean value. Using EMOS for land subsidence prediction requires careful consideration of the specific modeling requirements and the nature of the uncertainties involved.
4.5. Scenario Analysis
We performed scenario analyses to demonstrate practices in groundwater management planning. The visualization of the predictive uncertainty helps decision makers consider whether the effort required to implement each scenario is worth the return obtained. Three hydraulic head scenarios prepared by [
21] were used (
Figure 13). Scenario 1 continued the seasonal fluctuation of the hydraulic head observed in 2019. Scenario 2 halved the seasonal variation of Scenario 1. Scenario 3 had the same seasonal fluctuations as Scenario 1, plus a long-term hydraulic head recovery trend (0.055 m/year). The hydraulic heads of all scenarios were smoothly connected to the observed head in April 2019 and were used as boundary conditions. Ensemble E, assimilated to all the data, was used to generate the predictions for each scenario.
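The three head scenarios can be sketched as boundary-condition series along the following lines. This is an illustrative reconstruction assuming NumPy; the 12-month cycle below is hypothetical and merely stands in for the observed 2019 cycle used in the study.

```python
import numpy as np

def scenario_heads(last_cycle, years, half_amplitude=False, trend_m_per_yr=0.0):
    """Build a monthly hydraulic-head boundary condition by repeating the last
    observed 12-month cycle. Optionally halve the seasonal amplitude about the
    cycle mean (Scenario 2) or add a linear recovery trend (Scenario 3)."""
    cycle = np.asarray(last_cycle, dtype=float)
    if half_amplitude:
        cycle = cycle.mean() + 0.5 * (cycle - cycle.mean())
    heads = np.tile(cycle, years)
    months = np.arange(heads.size)
    return heads + trend_m_per_yr * months / 12.0

# Hypothetical 12-month cycle (m); the study uses the observed 2019 cycle.
cycle_2019 = 10.0 + 2.0 * np.sin(2 * np.pi * np.arange(12) / 12)
s1 = scenario_heads(cycle_2019, years=30)                        # Scenario 1
s2 = scenario_heads(cycle_2019, years=30, half_amplitude=True)   # Scenario 2
s3 = scenario_heads(cycle_2019, years=30, trend_m_per_yr=0.055)  # Scenario 3
```

In practice the series would additionally be blended smoothly into the observed head at April 2019 before being applied as a boundary condition.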
Figure 14 shows the scenario analysis results. The observed subsidence from mid-2014 to 2019 showed a relatively slow subsidence trend. On the other hand, the raw E simulation failed to capture the observed slow subsidence trend in the short term because of the smoothing assimilation of the long-lasting subsidence. The rapid subsidence bias of raw E needs to be corrected to make good predictions.
We conducted EMOS training on ensemble E over 71 months (June 2013 to April 2019). We chose this training period because (1) the training period after late 2013 corrected for the seismic effect on prediction in
Section 4.3, and (2) the longer training period provided more stable training results.
The resulting E-EMOS 90% PIs (
Figure 14) successfully captured the short-term trend of gradual subsidence from June 2013 to April 2019, and thus, good predictive performance was expected after 2019.
Figure 15 compares the E-EMOS 90% PIs for Scenarios 1, 2, and 3. The mean subsidence rate was −0.59 mm/year for Scenario 1, −0.44 mm/year for Scenario 2, and −0.50 mm/year for Scenario 3. The width of the E-EMOS 90% PIs was nearly constant over time at 3.2 mm for all three scenarios because the EMOS coefficient for the ensemble variance was negligibly small. Similar results were obtained when the EMOS training period was shifted by several months.
5. Other Uncertainty Factors
This study primarily focused on model parameters and prediction uncertainties in land subsidence modeling. This section describes how we simplified multiple other factors that possibly influence modeling results.
Search parameter selection: We selected 45 parameters to explore, comprising five parameters (hydraulic conductivity, specific storage, compression index, initial void ratio, and OCD) in each of the nine layer types. The selection considered the parameters' relevance to the simulation results and the overall complexity of the inverse problem. Solid-phase density was assumed to be constant in all layers; its influence was masked by that of the other parameters. It might, however, need to be considered in different geological and hydrogeological environments and at larger scales.
Geological, hydrogeological, and groundwater use homogeneity: We employed a vertical one-dimensional model. Stratigraphy is based on existing borehole data. This model domain structure assumes a horizontally homogeneous distribution of geology, hydrogeology, and groundwater use. In our case, the assumption is supported by the observation data in
Section 2.4. Otherwise, a three-dimensional analysis is required.
Measurement error: Because the measurement errors of the borehole extensometer at Kawajima were not quantified in the existing studies, we could not add reasonable measurement errors and consider their possible effects on the inversion procedure. From another point of view, adding virtual measurement errors to the observation data in ES and ES-MDA is common. However, we did not apply it in EDA. Adding measurement errors aims to maintain the model diversity rather than accurately represent measurement errors. Indeed, the standard algorithm artificially expands the covariance matrix of the measurement errors using inflation factors, e.g., [
8]. Furthermore, explicitly incorporating measurement errors into the algorithm does not guarantee that the reproduction errors of the resulting ensemble will match the input measurement errors. In the sense that the goal is to maintain model diversity, EDA's fitness sharing, controlled by the niche radius, is an alternative to adding measurement errors. Although the niche radius is not directly related to measurement errors, its role in the algorithm is similar to that of inflation factors. Establishing an appropriate niche radius is as difficult as determining appropriate inflation factors.
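To make the role of the niche radius concrete, the sketch below shows classic fitness sharing with a triangular kernel (a Goldberg-style scheme, assumed here for illustration; the study's exact kernel may differ). Crowded members are penalized, which preserves model diversity much as covariance inflation does in ensemble smoothers.

```python
import numpy as np

def shared_fitness(fitness, params, niche_radius):
    """Classic fitness sharing: each member's raw fitness is divided by its
    niche count, so members crowded within the niche radius are penalized
    and ensemble (model) diversity is maintained."""
    params = np.asarray(params, dtype=float)
    # Pairwise distances between members in parameter space
    d = np.linalg.norm(params[:, None, :] - params[None, :, :], axis=-1)
    # Triangular sharing kernel: 1 at distance 0, 0 at the niche radius
    sh = np.where(d < niche_radius, 1.0 - d / niche_radius, 0.0)
    niche_count = sh.sum(axis=1)  # includes sh(0) = 1 for the member itself
    return np.asarray(fitness, dtype=float) / niche_count

# Two members at the same point share their niche; an isolated one does not.
f = shared_fitness([1.0, 1.0, 1.0], [[0.0], [0.0], [10.0]], niche_radius=1.0)
```

A larger niche radius penalizes crowding more aggressively and so keeps the ensemble spread wider, which is the sense in which it plays a role analogous to inflation factors.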
It is crucial to recognize that data assimilation may make simulations consistent with observations even under invalid assumptions. The alignment might mask misinterpretations of underlying phenomena. A comprehensive examination of the consistency between observation data and modeling strategies will ensure that data assimilation does not merely fit models to observations but captures the essence of geophysical processes. The uncertainties and limitations mentioned above should be addressed in future studies.
6. Conclusions
The nonlinear nature of land subsidence and the limited observation data reduce model diversity during the data assimilation process, leading to the underestimation or miscalculation of uncertainty in model parameters and predictions. EDA and EMOS are potentially promising solutions, but their performance had been unknown. This paper presented a case study in Kawajima with two research objectives: (1) to validate the performance of EDA in quantifying uncertainty in land subsidence model parameters and (2) to validate the performance of EMOS in quantifying long-term predictive uncertainty.
When performing EDA using a smoothing approach with multiple data assimilations, it took several dozen assimilation cycles for the ensemble to assimilate the data sufficiently, and model diversity was maintained even after 1000 assimilation cycles for the same dataset. The balance between reproducibility and model diversity is controlled by the niche radius; in this study, the best balance was found in ensemble A2 with a niche radius of 1, a value closely tied to the mutation settings. The average RMSE of the reproduction analysis using A2 was 4.1 mm, indicating EDA's high reproduction performance. The depth distribution of the estimated parameters was consistent with the soil types. Assimilating new observations did not constrain the parameter uncertainty, but the predictive uncertainty improved. Considering the overall results, EDA excelled at maintaining model diversity.
EMOS Gaussian predictions outperformed the raw ensemble predictions because EMOS statistically compensated, using past observations, for the over- or under-dispersive prediction spread and for the short-term bias, a potential weakness of the smoothing approach. For example, in the 5-year prediction period following April 2000, the A2-EMOS 90% PIs trained over 92 months achieved better performance (coverage: 95%, RMSE: 0.16 cm, CRPS: 0.091 cm) than the raw A2 90% PIs (coverage: 31.7%, RMSE: 0.35 cm, CRPS: 0.21 cm). The raw ensemble predictions, influenced by the earthquake since 2011, tended to overestimate long-term subsidence. On the other hand, EMOS predictions with a short training period showed good predictive performance due to the effective correction of the short-term bias. Furthermore, scenario analysis using EMOS predictions showed that a groundwater management strategy controlling seasonal hydraulic head fluctuations reduces land subsidence more than a long-term hydraulic head recovery strategy.
There is no guarantee that the parameter ensembles estimated to characterize the subsurface will make helpful predictions for decision making. Combining EDA and EMOS solves this problem because the model parameter uncertainty and the prediction uncertainty are quantified independently, as was demonstrated in this study. The proposed methodology contributes to understanding and managing groundwater and land subsidence, considering both the model parameter uncertainty and the predictive uncertainty.