1. Introduction
Pricing multi-peril agricultural insurance under compound climate hazards requires evaluating a nonlinear conditional expectation of a terminal claim driven by correlated perils while optimising a green-finance control that feeds back into the loss dynamics. Existing approaches address tail dependence, non-stationary forecasting, and sequential control in isolation [1,2]. The premium naturally inherits the nonlinear expectation structure developed by Peng [3], embedding actuarial properties such as monotonicity and risk loading within a coherent probabilistic framework.
BSDEs were introduced by Pardoux and Peng [4] and have become a central tool in mathematical finance. In insurance, BSDEs have been applied to optimal reinsurance [5] and premium principles under ambiguity [6]. However, applications to multi-peril agricultural insurance with compound dependence and green-finance controls are absent from the literature.
Research on compound climate risk has shifted from single-factor assessments to compound extreme-event modelling via copula-based dependence structures [7,8,9]. Goodwin and Hungerford [1] applied copula-based models to systemic agricultural risk but without temporal dynamics. Black–Scholes and fractional extensions have been used for agricultural premium determination [10,11,12]. However, these assume specific parametric dynamics that may not capture the full dependence spectrum of multi-peril risks. In the domain of green finance, green credit, bonds, and carbon markets mobilise funds and mitigate environmental risks [13], with regional effectiveness varying significantly [14,15]. Miao et al. [16] investigated the influence of green technological innovation on resource utilisation efficiency but did not operationalise this within a dynamic pricing framework.
The broader literature on agricultural insurance design [17,18] and government support mechanisms [19] further motivates the need for rigorous pricing methods that account for climate non-stationarity. Recent advances in agricultural decision systems [20] and food-security assessment under climate change [21] reinforce this urgency. Q-learning [22] and its deep extensions have been applied to financial decision-making [23] but rarely to insurance pricing. The well-known theoretical connection between discrete-time Q-learning and the continuous-time Hamilton–Jacobi–Bellman equation [24] has not been exploited to provide convergence guarantees in an insurance-pricing context. Recent work on deep BSDE solvers [25] opens the possibility of scaling such approaches, while game-theoretic analyses of insurance markets [26,27] and information-asymmetry models [28] provide complementary perspectives on market equilibrium.
Table 1 contrasts the proposed framework with representative existing studies.
The novelty resides not in these components individually but in the structural integration: the BSDE driver provides a single mathematical object in which copula dependence, recurrent forecasting, and policy optimisation interact through rigorously defined information flow, enabling comparison, stability, and convergence results that are inaccessible when the same tools are composed ad hoc. In particular, the copula correlation matrix enters the driver's risk-loading term and simultaneously determines the forward jump structure; the LSTM approximates the conditional expectation required by the Euler step; and Q-learning solves the discrete HJB arising from the same Euler discretisation. This tight coupling, formalised in Theorem 4, is what distinguishes the framework from a mere concatenation of existing methods. Starting from the expected-loss premium, we successively incorporate a copula-based risk loading, a green-finance implementation cost, and an actuarial penalty, with each term motivated by a specific economic or regulatory requirement as detailed in Section 2.2. The existence and uniqueness result in Theorem 2 follows from standard BSDE theory applied to our specific driver. The contribution here is in verifying the requisite conditions for the copula-structured, control-dependent driver as established in Lemma 2, and in showing that the premium admits a nonlinear expectation representation as stated in Proposition 1.
The second contribution concerns structural properties inherited from the nonlinear expectation. The comparison theorem (Theorem 3) and the stability estimate are adaptations of standard BSDE results [30] to the present driver; the contribution lies not in the proofs but in the actuarial interpretation. The comparison theorem, applied to our copula-structured driver, yields monotonicity of premiums with respect to dependence strength (Corollary 1).
The third contribution is the Euler discretisation with identified components and convergence. We derive the Euler scheme for the forward–backward system and show that the three computational steps, namely copula-based dependence estimation, LSTM conditional expectation approximation, and Q-learning discrete Hamilton–Jacobi–Bellman solution, are not independent layers but sequential components of this single scheme. These components are linked by the information flow from dependence estimation to conditional expectation to optimal control, as described in Section 3.
Three key design choices are made. First, BSDEs are used over static actuarial formulae to generate a dynamic, time-consistent premium process that ensures a monotonic relationship between losses and premiums via the comparison theorem. Second, a copula-structured driver separates modeling individual risk distributions from their dependencies, allowing independent optimization of marginals and natural translation of dependency strength into premium loading. Third, while simpler alternatives exist, LSTM and Q-learning are chosen based on a performance–cost trade-off, as the modular framework allows substitution, and an ablation study quantifies each component’s contribution.
Figure 1 provides a schematic of the complete modelling pipeline. Historical loss data enter the copula estimation module (Ingredient I), which outputs the dependence structure and augmented training sequences. These feed into the LSTM forecasting module (Ingredient II), producing base premium rates and risk indices. The Q-learning module (Ingredient III) takes the risk indices as MDP state and outputs optimal green adjustment factors. The three outputs are assembled into the dynamic premium via (18), with loss-ratio feedback closing the loop. This pipeline corresponds to Algorithm 1.
| Algorithm 1 Modular Practitioner Pipeline |
- 1: Module A—Copula Estimation. Fit marginals (MLE/AIC), estimate the copula parameters via IFM, validate (Cramér–von Mises test), generate synthetic sequences.
- 2: Module B—Loss Forecasting. Train the LSTM (5-fold temporal CV) on historical + augmented data; output the base rate and risk index.
- 3: Module C—Green Optimisation. Run Q-learning (Algorithm 2) to obtain the green adjustment factor.
- 4: Assembly. Combine the three outputs into the dynamic premium via (18).
|
| Algorithm 2 Q-Learning for Green Adjustment |
- 1: Input: learning-rate schedule, exploration schedule, discount factor, number of episodes
- 2: Initialise the Q-table to zero for all state–action pairs
- 3: for each episode do
- 4: initialise the state
- 5: for each step within the episode do
- 6: select an ε-greedy action; observe the reward and next state; update Q via (16)
- 7: end for
- 8: end for
- 9: Output: the optimal green adjustment policy
|
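As a concrete illustration of Algorithm 2's loop, the sketch below runs tabular Q-learning on a toy MDP in pure Python. The state space, action set, reward, and all parameter values are hypothetical placeholders (the paper's state is a discretised risk index and the actions are green adjustment factors), so this shows only the update mechanics, not the calibrated setting.

```python
import random

def q_learning(n_states=3, n_actions=3, episodes=500, steps=20,
               alpha=0.1, gamma=0.9, eps=0.1, seed=1):
    """Tabular Q-learning with an epsilon-greedy policy on a toy MDP."""
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = rng.randrange(n_states)
        for _ in range(steps):
            # epsilon-greedy action selection
            if rng.random() < eps:
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda j: Q[s][j])
            # toy dynamics and reward: action 0 is always best, next state random
            r = 1.0 if a == 0 else 0.0
            s2 = rng.randrange(n_states)
            # Robbins--Monro update (the role played by (16) in the paper)
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q
```

After training, the greedy policy prefers the rewarding action in every state, mirroring how the learned green adjustment factor is read off the Q-table.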
The remainder of this paper is organised as follows. Section 2 develops the continuous-time BSDE framework. Section 3 derives the Euler discretisation, identifies the three computational ingredients, and establishes convergence. Section 4 reports the empirical analysis, divided into implementation in Section 4.4 and theoretical verification in Section 4.5. Section 5 discusses implications and limitations. Section 6 concludes.
2. Continuous-Time BSDE Pricing Framework
All results are rigorous under the stated assumptions; the differing status of Conditions (C1)–(C3) is discussed in Remark 6.
2.1. Probability Space, Forward Loss Dynamics, and Green-Finance Mechanism
Let $(\Omega,\mathcal{F},\{\mathcal{F}_t\}_{t\in[0,T]},\mathbb{P})$ be a filtered probability space satisfying the usual conditions, supporting a d-dimensional Brownian motion $W$ and an independent Poisson random measure $N$ on $[0,T]\times\mathbb{R}^d$ with compensator $\nu(\mathrm{d}z)\,\mathrm{d}t$. Here $d=3$ corresponds to the three perils: typhoon ($j=1$), flood ($j=2$), and drought ($j=3$).
Definition 1 (Copula-structured compound loss process). The aggregate loss vector $L_t=(L_t^1,L_t^2,L_t^3)$ satisfies
$$\mathrm{d}L_t^j=\big(\mu_j(t,L_t)-\beta_j G_t^j\big)\,\mathrm{d}t+\sigma_j(t,L_t)\,\mathrm{d}W_t^j+\int_{\mathbb{R}_+}z\,N^j(\mathrm{d}t,\mathrm{d}z),\qquad j=1,2,3,\tag{1}$$
with initial condition $L_0=\ell_0$. Here $\mu_j$ is the baseline drift, $\beta_j>0$ quantifies the marginal loss reduction due to green investment in peril j, and $G_t^j$ is the green adjustment control adapted to the filtration. Equation (1) states that each peril's loss evolves under three forces: a predictable trend ($\mu_j$), random fluctuations ($\sigma_j\,\mathrm{d}W_t^j$), and sudden catastrophic jumps (the Poisson integral). The green control $G_t^j$ acts as a brake on the drift: investing more in green technology slows expected loss growth at rate $\beta_j$ per unit of investment.
The Brownian drivers are correlated via the matrix $\Sigma=(\rho_{ij})$, with $\rho_{ij}=\sin(\pi\tau_{ij}/2)$ obtained from Kendall's $\tau$. The joint distribution of jump sizes is specified by the t-Copula [9] of Definition 2. Marginals are first estimated by MLE, and then the copula parameters are estimated by the IFM method conditional on the fitted marginals [31]. Marginal misspecification is the key vulnerability; the K–S tests for all marginals (Section 4.4.1) and the copula Cramér–von Mises test mitigate this concern.
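The Kendall-τ inversion used for elliptical copulas is $\rho=\sin(\pi\tau/2)$. A minimal stdlib-only sketch (function names are ours, and the τ-a estimator below ignores ties, which suffices for continuous loss data):

```python
import math
from itertools import combinations

def kendall_tau(x, y):
    """Sample Kendall's tau-a: concordant minus discordant pairs, normalised."""
    n = len(x)
    conc = disc = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            conc += 1
        elif s < 0:
            disc += 1
    return (conc - disc) / (n * (n - 1) / 2)

def tau_to_rho(tau):
    """Elliptical-copula inversion: rho = sin(pi * tau / 2)."""
    return math.sin(math.pi * tau / 2.0)
```

In practice one would use `scipy.stats.kendalltau` for the rank statistic and assemble the pairwise results into the correlation matrix entering the t-Copula.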
The decomposition of the drift into a baseline trend and a green-investment reduction reflects the empirical finding (Section 4.9) that green technology adoption reduces expected agricultural losses [13,32], with the effect varying across peril types. The control simultaneously affects the forward loss dynamics through drift reduction and the backward premium through the BSDE driver, creating a feedback loop whose trade-off structure is analysed below.
Definition 2 (t-Copula density). The density of the d-dimensional t-Copula with ν degrees of freedom and correlation matrix Σ is
$$c_{\nu,\Sigma}(u)=\frac{\Gamma\big(\frac{\nu+d}{2}\big)\,\Gamma\big(\frac{\nu}{2}\big)^{d-1}}{\Gamma\big(\frac{\nu+1}{2}\big)^{d}\,|\Sigma|^{1/2}}\;\frac{\big(1+\frac{1}{\nu}\,x^{\top}\Sigma^{-1}x\big)^{-(\nu+d)/2}}{\prod_{j=1}^{d}\big(1+\frac{x_j^{2}}{\nu}\big)^{-(\nu+1)/2}},$$
where $x_j=t_\nu^{-1}(u_j)$.
Theorem 1 (Tail dependence). The bivariate t-Copula with parameters $(\nu,\rho)$ has symmetric tail dependence
$$\lambda_U=\lambda_L=2\,t_{\nu+1}\!\left(-\sqrt{\frac{(\nu+1)(1-\rho)}{1+\rho}}\right).\tag{4}$$
Figure 2, Figure 3 and Figure 4 illustrate the forward loss dynamics for each peril separately. Figure 2 shows the typhoon loss paths with jump-diffusion characteristics; Figure 3 highlights the flood paths' co-movement with typhoon via the copula structure; and Figure 4 displays the contrasting drought dynamics with negative typhoon correlation.
Remark 1 (Time-varying copula extension). The fixed copula extends naturally to a sliding-window estimator (see Section 4.7 for the empirical analysis). If the estimator $\hat\Sigma_t$ is predictable and uniformly bounded, the Lipschitz bound in z is modified as in Lemma 1 and Theorems 2, 3, and 4 continue to hold.
Definition 3 (Time-varying copula estimator). The sliding-window t-copula with window length w replaces the static correlation matrix Σ by the predictable estimator $\hat\Sigma_t=\big(\sin(\pi\hat\tau_{ij}(t)/2)\big)_{ij}$, where $\hat\tau_{ij}(t)$ denotes Kendall's τ estimated from observations in the window $[t-w,t)$, and the degrees-of-freedom parameter is re-estimated jointly over the same window by IFM. We say $\hat\Sigma_t$ is uniformly bounded if $c\,I\preceq\hat\Sigma_t\preceq C\,I$ holds uniformly in t for constants $0<c\le C$.
Lemma 1 (Driver regularity under time-varying copula). Suppose $\hat\Sigma_t$ satisfies Definition 3 and is uniformly bounded with constants $0<c\le C$. Then the driver f in (8), with Σ replaced pointwise by $\hat\Sigma_t$, satisfies Assumption 2 with the z-Lipschitz constant determined by the uniform bound C. Consequently, Theorems 2–4 all continue to hold; the convergence constant in (19) now also depends on c and C. Proof. The Lipschitz bound in y is unchanged from Lemma 2. For the z-bound, replace $\|\Sigma\|$ by $\sup_t\|\hat\Sigma_t\|\le C$ uniformly in t, giving the modified constant. All remaining steps in the proofs of Theorems 2–4 carry through with this modified constant. □
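The sliding-window estimator of Definition 3 can be sketched for a single pair of perils, assuming the $\rho=\sin(\pi\tau/2)$ inversion; the window indexing and names below are illustrative, not the paper's code.

```python
import math
from itertools import combinations

def kendall_tau(x, y):
    """Kendall's tau-a over one window (ties ignored)."""
    n = len(x)
    num = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        num += 1 if s > 0 else (-1 if s < 0 else 0)
    return num / (n * (n - 1) / 2)

def sliding_window_rho(x, y, w):
    """Correlation path rho_t from observations indexed [t - w, t), for t >= w."""
    out = {}
    for t in range(w, len(x) + 1):
        tau = kendall_tau(x[t - w:t], y[t - w:t])
        out[t] = math.sin(math.pi * tau / 2.0)
    return out
```

A regime change inside the sample shows up as a sign flip in the estimated correlation path, which is exactly what the time-varying driver of Lemma 1 consumes.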
Assumption 1. The coefficients satisfy
- (A1) Lipschitz continuity: there exists $K>0$ such that $|\mu(t,x)-\mu(t,x')|+|\sigma(t,x)-\sigma(t,x')|\le K|x-x'|$ for all $t$, $x$, $x'$.
- (A2) Linear growth: $|\mu(t,x)|+|\sigma(t,x)|\le K(1+|x|)$.
- (A3) Jump integrability: $\int_{\mathbb{R}}|z|^{2}\,\nu(\mathrm{d}z)<\infty$.
Under Assumption 1, (1) has a unique strong solution [33].
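Under Assumption 1 the forward equation can be simulated by an Euler–Maruyama scheme. The stdlib-only sketch below uses hypothetical parameter values and approximates the Poisson jump arrivals over a short step by a Bernoulli draw with exponential jump sizes; it illustrates the three forces in (1), not the paper's calibrated simulator.

```python
import math
import random

def simulate_loss_path(l0=1.0, mu=0.05, beta=0.3, g=0.1, sigma=0.2,
                       jump_rate=0.5, jump_mean=0.4, T=1.0, n=250, seed=7):
    """Euler--Maruyama path for one peril: dL = (mu - beta*g) dt + sigma dW + jumps."""
    rng = random.Random(seed)
    dt = T / n
    path = [l0]
    for _ in range(n):
        dw = rng.gauss(0.0, math.sqrt(dt))
        # at most one jump per step: Bernoulli(jump_rate * dt) approximates the
        # Poisson arrival over a short interval; jump size is exponential
        jump = rng.expovariate(1.0 / jump_mean) if rng.random() < jump_rate * dt else 0.0
        path.append(path[-1] + (mu - beta * g) * dt + sigma * dw + jump)
    return path
```

Setting the noise and jumps to zero recovers the deterministic drift, which makes the braking effect of the green control $g$ directly visible in the terminal value.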
2.2. Driver Construction
The driver is assembled from four components. Step 1 (expected-loss baseline): the instantaneous net loss rate. Step 2 (risk loading): the copula risk-loading term bridges copula dependence and premium, reducing to the standard-deviation principle [29] when Σ is the identity. Step 3 (green-finance cost): implementation costs yield a net marginal effect of green investment; when the marginal loss reduction exceeds the marginal cost, higher green investment reduces the premium; otherwise it is counterproductive. Q-learning resolves this trade-off to yield the optimal control. Step 4 (actuarial adequacy penalty): the penalty prevents excessive deviation from the exogenous target, which combines the trailing w-period average loss ratio with the regulatory safety loading.
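To make the four-step assembly concrete, the sketch below composes the steps with hypothetical functional forms (linear loading, linear cost, quadratic penalty) and illustrative parameter values; the paper's exact driver is Equation (8), so treat this only as an illustration of the sign structure.

```python
def driver(y, z_norm, g, *,
           net_loss_rate=0.05,      # Step 1: expected-loss baseline
           eta=0.3,                 # Step 2: copula risk-loading weight
           beta=0.4, cost=0.25,     # Step 3: green benefit vs implementation cost
           kappa=2.0, target=0.06): # Step 4: actuarial adequacy penalty
    """Illustrative four-component driver; all forms and values are hypothetical."""
    risk_loading = eta * z_norm           # grows with dependence-driven volatility
    green_net = (cost - beta) * g         # negative when benefit exceeds cost
    penalty = 0.5 * kappa * (y - target) ** 2
    return net_loss_rate + risk_loading + green_net + penalty
```

With these stand-in values the green term is premium-reducing (beta > cost), and increasing the hedging norm raises the driver, the sign behaviour the comparison theorem later exploits.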
2.3. BSDE Formulation of the Premium
Definition 4 (Premium BSDE). The premium process $(Y_t)_{t\in[0,T]}$ solves
$$Y_t=\xi+\int_t^T f(s,Y_s,Z_s,U_s,G_s)\,\mathrm{d}s-\int_t^T Z_s\,\mathrm{d}W_s-\int_t^T\!\int_{\mathbb{R}^d}U_s(z)\,\tilde N(\mathrm{d}s,\mathrm{d}z),\tag{9}$$
where $Y_t$ is the premium, $Z_t$ is the diffusion-hedging process, $U_t$ is the jump-hedging process, and ξ is the terminal claim.
In plain terms, the comparison theorem (Theorem 3 below) states that "worse inputs yield higher premiums": if the terminal claim is larger or the instantaneous risk loading is higher under scenario (1) than under scenario (2), then the premium under scenario (1) dominates at every point in time. This is the BSDE analogue of the monotonicity axiom in coherent risk measures.
2.4. Existence and Uniqueness
Assumption 2. The driver f satisfies
- (B1) Lipschitz in $(y,z)$: there exist $L_y,L_z>0$ such that $|f(t,y,z,g)-f(t,y',z',g)|\le L_y|y-y'|+L_z|z-z'|$ for all $t$, $g$.
- (B2) Uniform bound in g: $\sup_{g}|f(t,0,0,g)|$ is bounded.
- (B3) Square-integrable terminal condition: $\mathbb{E}|\xi|^{2}<\infty$.
Lemma 2. The driver (8) satisfies Assumption 2, with Lipschitz constants $L_y$ and $L_z$ determined by the penalty weight and the copula risk-loading coefficient. Proof. Lipschitz in y: the terms depending on y are the adequacy penalty and the loading term; each is Lipschitz on the admissible range, and the bound with constant $L_y$ follows. Lipschitz in z: we use the regularisation $\sqrt{\varepsilon+|z|^{2}}$, which is 1-Lipschitz in z for every $\varepsilon>0$; combining with the risk-loading coefficient yields the bound with constant $L_z$, and letting $\varepsilon\to 0$ recovers the original driver with the same constant. □
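The 1-Lipschitz property of the regularisation $z\mapsto\sqrt{\varepsilon+|z|^{2}}$ invoked in the proof can be checked numerically; the brute-force sketch below is a sanity check on the difference quotient, not part of the proof.

```python
import math
import random

def reg(z, eps):
    """The regularised norm used in the z-Lipschitz argument of Lemma 2."""
    return math.sqrt(eps + z * z)

def max_difference_quotient(eps, trials=10000, seed=3):
    """Largest observed |reg(z1) - reg(z2)| / |z1 - z2| over random pairs."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        z1, z2 = rng.uniform(-10, 10), rng.uniform(-10, 10)
        if z1 != z2:
            worst = max(worst, abs(reg(z1, eps) - reg(z2, eps)) / abs(z1 - z2))
    return worst
```

The quotient never exceeds 1 for any positive ε, so the z-Lipschitz constant survives the limit ε → 0, as claimed.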
Theorem 2 (Existence and uniqueness). Under Assumptions 1 and 2, for any fixed admissible control G, the BSDE (9) has a unique adapted solution $(Y,Z,U)$.
Proof sketch. The driver is globally Lipschitz in $(y,z)$ by Lemma 2 and $\xi\in L^{2}$ by (B3); jumps are handled via [30,34]. The Picard iteration contracts under a β-weighted norm for β sufficiently large. □
In actuarial terms, Theorem 2 guarantees that for any given green-finance policy, a well-defined and unique premium process exists, leaving no ambiguity in the price assigned by the framework.
2.5. Comparison Theorem
Theorem 3 (Comparison). For $i=1,2$, let $(Y^{(i)},Z^{(i)},U^{(i)})$ solve BSDE (9) with driver $f^{(i)}$ and terminal condition $\xi^{(i)}$. If $\xi^{(1)}\ge\xi^{(2)}$ a.s. and $f^{(1)}\ge f^{(2)}$ for all arguments a.s., then $Y^{(1)}_t\ge Y^{(2)}_t$ for all t a.s.
Proof. Define $\delta Y=Y^{(1)}-Y^{(2)}$, $\delta Z=Z^{(1)}-Z^{(2)}$, $\delta U=U^{(1)}-U^{(2)}$. Then $\delta Y$ solves a linear BSDE whose coefficients are bounded by (B1). Applying Itô's formula to $e^{\beta t}(\delta Y_t^{-})^{2}$ with β sufficiently large and using $\delta Y_T=\xi^{(1)}-\xi^{(2)}\ge 0$ yields $\delta Y_t^{-}=0$ for all t. □
Corollary 1 (Actuarial monotonicity). If $\Sigma\succeq\Sigma'$ in the Loewner order, then $Y_t(\Sigma)\ge Y_t(\Sigma')$: the premium rises with dependence strength.
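A discrete, deterministic illustration of the comparison theorem: running the backward Euler recursion with a larger terminal condition and a pointwise larger driver produces a premium path that dominates at every step. The two drivers below are hypothetical Lipschitz examples, not the paper's driver (8).

```python
def backward_euler(terminal, f, T=1.0, n=100):
    """Deterministic backward recursion Y_i = Y_{i+1} + f(Y_{i+1}) * dt."""
    dt = T / n
    y = terminal
    path = [y]
    for _ in range(n):
        y = y + f(y) * dt
        path.append(y)
    return list(reversed(path))  # path[0] is Y_0, path[-1] the terminal value

f_low = lambda y: 0.05 + 0.1 * y   # scenario (2)
f_high = lambda y: 0.08 + 0.1 * y  # scenario (1): pointwise larger driver

y_low = backward_euler(1.0, f_low)
y_high = backward_euler(1.2, f_high)  # scenario (1): larger terminal claim
```

Pointwise domination of the whole path, not just the initial premium, is exactly the conclusion of Theorem 3 transplanted to the discrete scheme.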
Remark 2 (Dependence ordering vs. tail dependence). Corollary 1 establishes premium monotonicity with respect to the Loewner ordering of Σ, which controls the correlation component of dependence. The tail dependence coefficient (4) depends on both ρ and ν: increasing ρ (for fixed ν) increases both tail dependence and Σ in the Loewner order, so the corollary applies directly. However, decreasing ν (for fixed Σ) increases tail dependence without changing Σ; in this case, the premium increase operates through the magnitude of the risk-loading term rather than through the comparison theorem. Formally ordering premiums with respect to ν requires additional structure on the driver and is left for future work.
Remark 3 (Sensitivity to copula misspecification). If the true dependence structure is not a t-copula, the stability estimate [30] bounds the premium perturbation by the perturbations of the terminal claim and the driver, so the monotonicity degrades gracefully. Section 4.6 quantifies this empirically. Informally, the stability estimate ensures that small perturbations in the terminal claim or the driver produce only small changes in the premium, a continuity property essential for practitioners who must work with estimated parameters.
2.6. Practitioner Implementation Guide
To facilitate adoption, we provide a modular pipeline (Algorithm 1) with clearly defined input–output interfaces, implementable via standard libraries (scipy 1.11, PyTorch 2.1.0, numpy 1.24). Module B can be replaced by ARIMA or ETS provided Condition (C1) holds; Theorem 4 still guarantees the same convergence rate with a larger constant. Section 4.4.2 quantifies the accuracy trade-off against the LSTM.
2.7. Nonlinear Expectation Interpretation
Proposition 1 (g-Expectation representation). For a fixed admissible control G, define $\mathcal{E}_g[\xi\mid\mathcal{F}_t]:=Y_t$, where $(Y,Z,U)$ solves BSDE (9). Then the premium process is the conditional g-expectation of the terminal claim.
The g-expectation structure yields four actuarial properties:
- (i) Monotonicity: the comparison theorem is the monotonicity of $\mathcal{E}_g$.
- (ii) Risk loading: the nonlinearity yields $\mathcal{E}_g[\xi]\ge\mathbb{E}[\xi]$; the excess is the risk premium, amplified by the copula loading.
- (iii) Stability as continuity: the stability estimate is the continuity of $\mathcal{E}_g$ in both terminal condition and generator [3].
- (iv) From g-expectation to Euler scheme: computing $\mathcal{E}_g$ at each discrete time requires evaluating the driver, which needs the dependence structure (copula), the conditional expectation (LSTM), and the optimal control (Q-learning). The three ingredients of Section 3 are evaluations of $\mathcal{E}_g$ at discrete times.
Remark 4 (Controlled nonlinear expectation). Optimising over admissible controls yields a controlled nonlinear expectation, bridging BSDE theory and stochastic control.
2.8. Optimal Green Control
Proposition 2 (HJB characterisation). Define the value function $v(t,x)$ as the optimal premium given $L_t=x$. Under Assumptions 1 and 2, v is the unique viscosity solution of the HJB Equation (10), where $\mathcal{L}^{g}$ is the infinitesimal generator of L under control g. Proof. For controlled forward–backward SDEs with jump diffusions, the viscosity solution framework is established in [24] (continuous diffusions) and extended to Lévy-driven processes in [35]. The key requirements—Lipschitz driver, bounded controls, and square-integrable jumps—are verified in Assumptions 1 and 2. □
The control is a green-finance adjustment factor: depending on its value it acts as a green discount, a surcharge, or remains neutral. The HJB Equation (10) determines the optimal control by balancing the marginal loss reduction against the implementation cost; Algorithm 2 approximates this in discrete time. The coefficients are estimated by panel regression; the cost parameter is calibrated to Zhejiang pilot data [36].
3. Euler Discretisation of the Forward–Backward System
Let $0=t_0<t_1<\dots<t_n=T$ be a uniform partition with mesh $\Delta t=T/n$.
3.1. Ingredient (I): Dependence Estimation via the t-Copula
The copula parameters $(\nu,\Sigma)$ are estimated via IFM, with goodness-of-fit assessed by the Cramér–von Mises statistic. To address limited historical observations, the estimated copula and marginals are used to generate synthetic correlated loss sequences via Monte Carlo simulation. These augmented data supplement the historical sample for LSTM training in Ingredient (II), improving the conditional expectation approximation without introducing distributional assumptions beyond those already embedded in the copula model.
Outputs include the estimated copula parameters (to the driver risk loading and the Ingredient III risk index) and the fitted marginals (to Ingredient II for forward simulation and data augmentation).
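The Monte Carlo augmentation step can be sketched in pure Python for two perils: draw correlated Gaussians, divide by a common chi-square scale to obtain t-copula-style joint heavy tails. Correlation and degrees of freedom below are illustrative; mapping the margins to uniforms would in practice use the Student-t CDF (e.g. `scipy.stats.t.cdf`), which we omit here.

```python
import math
import random

def sample_t_pairs(rho=0.7, nu=5, n=2000, seed=11):
    """Correlated heavy-tailed pairs with t-copula structure (bivariate, integer dof)."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        # correlated standard normals via a 2x2 Cholesky factor
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # chi-square(nu)/nu as a sum of squared standard normals
        w = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu)) / nu
        scale = 1.0 / math.sqrt(w)
        out.append((z1 * scale, z2 * scale))
    return out
```

The shared scale factor is what induces joint tail events, the feature the t-Copula contributes over a Gaussian copula with the same correlation.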
3.2. Ingredient (II): Conditional Expectation Approximation via LSTM
The LSTM approximates the mapping from the current state to the conditional expectation required by the Euler step. Training uses both historical observations and copula-augmented sequences from Ingredient (I). The train–test split is strictly temporal: 2014–2022 (99 city-year observations) for training, 2023 (11 observations) for testing. Hyperparameters are selected via 5-fold temporal cross-validation with expanding windows. Augmented observations from the fitted t-Copula are weighted at 0.3 relative to historical data, a weight selected by cross-validation to prevent oversmoothing. The single-year test set limits statistical power; this constraint is partially mitigated by leave-one-city-out cross-validation (Section 4.8) and the cross-province experiment (Section 4.3).
The LSTM follows the standard gated architecture of Hochreiter and Schmidhuber [37] with two stacked layers; the hidden-unit count and dropout rate are chosen by the cross-validation above.
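To make the gated architecture concrete, here is a single-unit LSTM cell forward pass in pure Python with hypothetical scalar weights; a real implementation would use `torch.nn.LSTM` with vector-valued states, as in the PyTorch pipeline of Section 2.6.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_cell(x, h_prev, c_prev, W):
    """One step: input/forget/output gates and a candidate write, scalar case."""
    i = sigmoid(W["wi"] * x + W["ui"] * h_prev + W["bi"])   # input gate
    f = sigmoid(W["wf"] * x + W["uf"] * h_prev + W["bf"])   # forget gate
    o = sigmoid(W["wo"] * x + W["uo"] * h_prev + W["bo"])   # output gate
    g = math.tanh(W["wg"] * x + W["ug"] * h_prev + W["bg"]) # candidate value
    c = f * c_prev + i * g        # forget old memory, write gated candidate
    h = o * math.tanh(c)          # expose a gated view of the cell state
    return h, c

def run_sequence(xs, W):
    """Fold a loss sequence through the cell; the final h is the summary feature."""
    h = c = 0.0
    for x in xs:
        h, c = lstm_cell(x, h, c, W)
    return h
```

The additive cell-state update (rather than repeated multiplication) is what lets gradients survive long loss histories, the property that motivates the LSTM choice here.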
Assumption 3 (Forward approximation accuracy). The squared error of the LSTM conditional-expectation approximation is bounded by $C\,\Delta t$, uniformly over the partition, for a constant C.
Proposition 3 (Sufficient condition for (C1) under regularity). Suppose the conditional expectation function φ has bounded derivatives, and suppose the approximator satisfies a uniform approximation bound $\sup_x|\hat\varphi(x)-\varphi(x)|\le\epsilon$. Then the combined error satisfies the bound (14), where the constant depends on the derivatives of φ and the forward SDE coefficients. Proof. Apply Itô's formula to φ over $[t_i,t_{i+1}]$: the remainder is of order Δt. Adding the approximation error via the triangle inequality yields (14). Hence (C1) holds provided ε is of order $\sqrt{\Delta t}$. □
Remark 5. Proposition 3 decomposes Condition (C1) into regularity of φ (guaranteed by Assumption 1 via standard SDE theory) and a uniform approximation bound, which follows from universal approximation theorems for feedforward networks. A rigorous extension to LSTMs remains open; the empirical verification in Section 4.5.3 confirms the required scaling in practice.
Outputs are the base premium rate (to driver Step 1) and the risk index with copula-based tail risk (to the Ingredient III MDP state).
3.3. Ingredient (III): Discrete HJB Solution via Q-Learning
The Euler discretisation of (10) yields a discrete Bellman equation with a state built from the copula-based tail risk from (I) and the loss prediction from (II), a finite set of green adjustment actions, a one-step reward, and a discount factor.
The Q-learning update converges to the optimal action-value function under Robbins–Monro step-size conditions.
The implementation uses a discretised state encoding and a finite action set of green adjustment factors, with the discount factor and reward defined by the discrete Bellman equation. The full Q-learning procedure is summarised in Algorithm 2.
As a comparison, we also implement a Deep Q-Network (DQN) that replaces the tabular Q-function with a neural network parameterised by weights θ. The network consists of two hidden layers with 64 units each and ReLU activations. The loss function is the squared temporal-difference error against a target network whose parameters are updated every 50 episodes. Experience replay with buffer size 5000 is used. The DQN operates on the continuous state without discretisation.
Table 2 compares four RL algorithms on the same MDP. All achieve comparable variance reductions (43.1–44.3%; pairwise differences insignificant). Tabular Q-learning is retained as the default for its convergence guarantee (Theorem 4), interpretability, and efficiency; the gap narrows with finer discretisation.
3.4. Assembling the Discrete Premium
The discrete premium rate combines the LSTM base rate, the copula risk loading, and the Q-learned green adjustment factor, with loss-ratio feedback closing the loop. This is the explicit Euler step for the backward variable Y with the three ingredients substituted.
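The assembly step with loss-ratio feedback can be sketched as follows; the multiplicative form, parameter names, and the feedback gain are hypothetical stand-ins for Equation (18), chosen only to show how the three ingredient outputs combine.

```python
def assemble_premium(base_rate, risk_loading, green_factor,
                     trailing_loss_ratio, target_ratio=0.65, gain=0.5):
    """Illustrative premium assembly: base rate (Ingredient II), dependence
    loading (Ingredient I), green adjustment (Ingredient III), plus a feedback
    term nudging the rate toward the target loss ratio."""
    feedback = 1.0 + gain * (trailing_loss_ratio - target_ratio)
    return base_rate * (1.0 + risk_loading) * green_factor * feedback
```

When the trailing loss ratio exceeds its target, the feedback multiplier exceeds one and the next-period rate rises, which is the loop-closing behaviour described in the pipeline of Figure 1.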
3.5. Convergence of the Euler Scheme
Theorem 4 states that the discrete premium computed by the Euler scheme approaches the true continuous-time premium at rate $n^{-1/2}$. The three conditions (C1)–(C3) quantify the approximation quality of each ingredient: (C1) bounds the LSTM error, (C2) bounds the Q-learning control error, and (C3) bounds the copula estimation error. When all three are controlled, the overall scheme inherits the classical half-order BSDE convergence rate.
Theorem 4 (Discrete-time convergence). Under Assumptions 1 and 2, and provided that
- (C1) Assumption 3 holds (Ingredient II);
- (C2) the Q-learning control error is of order Δt (Ingredient III);
- (C3) the copula estimation error satisfies a uniform bound (Ingredient I);
there exists a constant C independent of n such that the error bound (19) holds, i.e., the scheme converges at rate $n^{-1/2}$.
Remark 6. The rate is rigorous under (C1)–(C3). (C2) and (C3) have full theoretical backing [38]; (C1) has partial theoretical support (Proposition 3) and strong empirical support (Section 4.5.3).
3.6. Complete Proof of Theorem 4
We adapt Zhang's [39] framework with explicit tracking of the three ingredient errors. Throughout, C denotes a generic constant depending on the model data but independent of n.
Define the discrete-time errors in Y and Z at each partition point. From the continuous BSDE integrated over $[t_i,t_{i+1}]$ and the Euler recursion for the discrete premium, subtraction yields a one-step error relation. By the Lipschitz property (B1) and the triangle inequality, the one-step error is controlled by the error at the next partition point plus the three ingredient errors.
Standard regularity of the BSDE solution [39] bounds the time-discretisation remainder, using the Itô isometry and the path regularity of $(Y,Z)$. From the martingale representation and the Euler approximation of Z, the error in Z over each step is controlled by the error in Y and the same remainder.
Squaring the one-step relation, taking expectations, and applying Young's inequality, we substitute the bound for the Z-error, use (C2) for the control error, and sum the per-step copula estimation error from (C3) over the n steps, which contributes an additional term independent of n. Since the terminal errors vanish (both schemes use the same terminal condition on the partition), the discrete Gronwall lemma yields the bound on the Y-errors; summing the Z-bounds completes the proof of (19). □
5. Discussion
The key structural distinction from existing methods is that static copula models [9], machine-learning forecasters [2,37], and reinforcement-learning controllers [23] each address one facet of the pricing problem in isolation. Switching off the nonlinear loading in the driver recovers the linear pricing of copula-only approaches; removing the LSTM eliminates dynamic forecasting; removing Q-learning eliminates policy optimisation. In the proposed framework all three are components of a single Euler scheme for a controlled g-expectation, unified by the convergence theorem rather than assembled ad hoc. The premium ordering of Corollary 1 is with respect to the Loewner order on Σ; the separate role of ν in tail dependence is discussed in Remark 2.
A natural multi-agent extension leads to mean-field games, where each insurer's BSDE driver depends on the empirical premium distribution; as the number of agents grows the equilibrium converges to a McKean–Vlasov BSDE [41], complementing game-theoretic [26,27], subsidy-design [19], and adverse-selection [28] perspectives with a continuous-time stochastic-control foundation.
Several limitations should be acknowledged. The sliding-window copula improves variance reduction by 2.7 pp (Section 4.7); Condition (C1) has partial theoretical and strong empirical support (Proposition 3); copula misspecification degrades gracefully; DQN offers marginal gains at additional computational cost; and Algorithm 1 provides a modular pipeline with graceful degradation to ARIMA. Key open problems are rigorous approximation bounds for LSTMs on jump-diffusion paths, fully parametric dynamic copulas [42], and semiparametric marginals to strengthen copula identifiability.
The cross-sectional dimension of eleven cities remains modest. The bootstrap analysis (Table 6) and cross-province transfer experiment (Table 7) quantify these uncertainties. Nevertheless, extending the framework to a national-scale panel would strengthen external validity and allow finer regional stratification.