1. Introduction
Pricing multi-peril agricultural insurance under compound climate hazards requires evaluating a nonlinear conditional expectation of a terminal claim driven by correlated perils while optimising a green-finance control that feeds back into the loss dynamics. Existing approaches address tail dependence, non-stationary forecasting, and sequential control in isolation [1,2]. The premium naturally inherits the nonlinear expectation structure developed by Peng [3], embedding actuarial properties such as monotonicity and risk loading within a coherent probabilistic framework.
BSDEs were introduced by Pardoux and Peng [4] and have become a central tool in mathematical finance. In insurance, BSDEs have been applied to optimal reinsurance [5] and premium principles under ambiguity [6]. However, applications to multi-peril agricultural insurance with compound dependence and green-finance controls are absent from the literature.
Research on compound climate risk has shifted from single-factor assessments to compound extreme-event modelling via copula-based dependence structures [7,8,9]. Goodwin and Hungerford [1] applied copula-based models to systemic agricultural risk but without temporal dynamics. Black–Scholes and fractional extensions have been used for agricultural premium determination [10,11,12]. However, these assume specific parametric dynamics that may not capture the full dependence spectrum of multi-peril risks. In the domain of green finance, green credit, bonds, and carbon markets mobilise funds and mitigate environmental risks [13], with regional effectiveness varying significantly [14,15]. Miao et al. [16] investigated the influence of green technological innovation on resource utilisation efficiency but did not operationalise this within a dynamic pricing framework.
The broader literature on agricultural insurance design [17,18] and government support mechanisms [19] further motivates the need for rigorous pricing methods that account for climate non-stationarity. Recent advances in agricultural decision systems [20] and food-security assessment under climate change [21] reinforce this urgency. Q-learning [22] and its deep extensions have been applied to financial decision-making [23] but rarely to insurance pricing. The well-known theoretical connection between discrete-time Q-learning and the continuous-time Hamilton–Jacobi–Bellman equation [24] has not been exploited to provide convergence guarantees in an insurance-pricing context. Recent work on deep BSDE solvers [25] opens the possibility of scaling such approaches, while game-theoretic analyses of insurance markets [26,27] and information-asymmetry models [28] provide complementary perspectives on market equilibrium.
Table 1 contrasts the proposed framework with representative existing studies.
The novelty resides not in these components individually but in the structural integration: the BSDE driver provides a single mathematical object in which copula dependence, recurrent forecasting, and policy optimisation interact through rigorously defined information flow, enabling comparison, stability, and convergence results that are inaccessible when the same tools are composed ad hoc. In particular, the copula correlation matrix enters the driver's risk-loading term and simultaneously determines the forward jump structure; the LSTM approximates the conditional expectation required by the Euler step; and Q-learning solves the discrete HJB arising from the same Euler discretisation. This tight coupling, formalised in Theorem 4, is what distinguishes the framework from a mere concatenation of existing methods. Starting from the expected-loss premium, we successively incorporate a copula-based risk loading, a green-finance implementation cost, and an actuarial penalty, with each term motivated by a specific economic or regulatory requirement as detailed in Section 2.2. The existence and uniqueness result in Theorem 2 follows from standard BSDE theory applied to our specific driver. The contribution here is in verifying the requisite conditions for the copula-structured, control-dependent driver as established in Lemma 2, and in showing that the premium admits a nonlinear expectation representation as stated in Proposition 1.
The second contribution concerns structural properties inherited from the nonlinear expectation. The comparison theorem (Theorem 3) and the stability estimate are adaptations of standard BSDE results [30] to the present driver; the contribution lies not in the proofs but in the actuarial interpretation. The comparison theorem, applied to our copula-structured driver, yields monotonicity of premiums with respect to dependence strength (Corollary 1).
The third contribution is the Euler discretisation with identified components and convergence. We derive the Euler scheme for the forward–backward system and show that the three computational steps, namely copula-based dependence estimation, LSTM conditional expectation approximation, and Q-learning discrete Hamilton–Jacobi–Bellman solution, are not independent layers but sequential components of this single scheme. These components are linked by the information flow from dependence estimation to conditional expectation to optimal control, as described in Section 3.
Three key design choices are made. First, BSDEs are used over static actuarial formulae to generate a dynamic, time-consistent premium process that ensures a monotonic relationship between losses and premiums via the comparison theorem. Second, a copula-structured driver separates modeling individual risk distributions from their dependencies, allowing independent optimization of marginals and natural translation of dependency strength into premium loading. Third, while simpler alternatives exist, LSTM and Q-learning are chosen based on a performance–cost trade-off, as the modular framework allows substitution, and an ablation study quantifies each component’s contribution.
Figure 1 provides a schematic of the complete modelling pipeline. Historical loss data enter the copula estimation module (Ingredient I), which outputs the dependence structure and augmented training sequences. These feed into the LSTM forecasting module (Ingredient II), producing base premium rates and risk indices. The Q-learning module (Ingredient III) takes the risk indices as MDP state and outputs optimal green adjustment factors. The three outputs are assembled into the dynamic premium via (18), with loss-ratio feedback closing the loop. This pipeline corresponds to Algorithm 1.
| Algorithm 1 Modular Practitioner Pipeline |
- 1: Module A—Copula Estimation. Fit marginals (MLE/AIC), estimate the copula parameters via IFM, validate (Cramér–von Mises test), generate synthetic sequences.
- 2: Module B—Loss Forecasting. Train the LSTM (5-fold temporal CV) on historical + augmented data; output the base rate and risk index.
- 3: Module C—Green Optimisation. Run Q-learning (Algorithm 2) to obtain the green adjustment factor.
- 4: Assembly. Combine the three outputs into the dynamic premium via (18).
|
| Algorithm 2 Q-Learning for Green Adjustment |
- 1: Input: learning-rate schedule, exploration schedule, discount factor, number of episodes
- 2: Initialise the Q-table to zero for all state–action pairs
- 3: for each episode do
- 4: initialise the state
- 5: for each step within the episode do
- 6: select an ε-greedy action; observe the reward and next state; update Q via (16)
- 7: end for
- 8: end for
- 9: Output: the optimal green adjustment policy
|
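As a concrete illustration of Algorithm 2's loop, the sketch below runs tabular Q-learning on a toy MDP in pure Python. The state space, action set, reward, and all parameter values are hypothetical placeholders (the paper's state is a discretised risk index and the actions are green adjustment factors), so this shows only the update mechanics, not the calibrated setting.

```python
import random

def q_learning(n_states=3, n_actions=3, episodes=500, steps=20,
               alpha=0.1, gamma=0.9, eps=0.1, seed=1):
    """Tabular Q-learning with an epsilon-greedy policy on a toy MDP."""
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = rng.randrange(n_states)
        for _ in range(steps):
            # epsilon-greedy action selection
            if rng.random() < eps:
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda j: Q[s][j])
            # toy dynamics and reward: action 0 is always best, next state random
            r = 1.0 if a == 0 else 0.0
            s2 = rng.randrange(n_states)
            # Robbins--Monro update (the role played by (16) in the paper)
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q
```

After training, the greedy policy prefers the rewarding action in every state, mirroring how the learned green adjustment factor is read off the Q-table.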
The remainder of this paper is organised as follows. Section 2 develops the continuous-time BSDE framework. Section 3 derives the Euler discretisation, identifies the three computational ingredients, and establishes convergence. Section 4 reports the empirical analysis, divided into implementation in Section 4.4 and theoretical verification in Section 4.5. Section 5 discusses implications and limitations. Section 6 concludes.
2. Continuous-Time BSDE Pricing Framework
All results are rigorous under the stated assumptions; the differing status of Conditions (C1)–(C3) is discussed in Remark 6.
2.1. Probability Space, Forward Loss Dynamics, and Green-Finance Mechanism
Let $(\Omega,\mathcal{F},\{\mathcal{F}_t\}_{t\in[0,T]},\mathbb{P})$ be a filtered probability space satisfying the usual conditions, supporting a d-dimensional Brownian motion $W$ and an independent Poisson random measure $N$ on $[0,T]\times\mathbb{R}^d$ with compensator $\nu(\mathrm{d}z)\,\mathrm{d}t$. Here $d=3$ corresponds to the three perils: typhoon ($j=1$), flood ($j=2$), and drought ($j=3$).
Definition 1 (Copula-structured compound loss process). The aggregate loss vector $L_t=(L_t^1,L_t^2,L_t^3)$ satisfies
$$\mathrm{d}L_t^j=\big(\mu_j(t,L_t)-\beta_j G_t^j\big)\,\mathrm{d}t+\sigma_j(t,L_t)\,\mathrm{d}W_t^j+\int_{\mathbb{R}_+}z\,N^j(\mathrm{d}t,\mathrm{d}z),\qquad j=1,2,3,\tag{1}$$
with initial condition $L_0=\ell_0$. Here $\mu_j$ is the baseline drift, $\beta_j>0$ quantifies the marginal loss reduction due to green investment in peril j, and $G_t^j$ is the green adjustment control adapted to the filtration. Equation (1) states that each peril's loss evolves under three forces: a predictable trend ($\mu_j$), random fluctuations ($\sigma_j\,\mathrm{d}W_t^j$), and sudden catastrophic jumps (the Poisson integral). The green control $G_t^j$ acts as a brake on the drift: investing more in green technology slows expected loss growth at rate $\beta_j$ per unit of investment.
The Brownian drivers are correlated via the matrix $\Sigma=(\rho_{ij})$, with $\rho_{ij}=\sin(\pi\tau_{ij}/2)$ obtained from Kendall's $\tau$. The joint distribution of jump sizes is specified by the t-Copula [9] of Definition 2. Marginals are first estimated by MLE, and then the copula parameters are estimated by the IFM method conditional on the fitted marginals [31]. Marginal misspecification is the key vulnerability; the K–S tests for all marginals (Section 4.4.1) and the copula Cramér–von Mises test mitigate this concern.
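The Kendall-τ inversion used for elliptical copulas is $\rho=\sin(\pi\tau/2)$. A minimal stdlib-only sketch (function names are ours, and the τ-a estimator below ignores ties, which suffices for continuous loss data):

```python
import math
from itertools import combinations

def kendall_tau(x, y):
    """Sample Kendall's tau-a: concordant minus discordant pairs, normalised."""
    n = len(x)
    conc = disc = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            conc += 1
        elif s < 0:
            disc += 1
    return (conc - disc) / (n * (n - 1) / 2)

def tau_to_rho(tau):
    """Elliptical-copula inversion: rho = sin(pi * tau / 2)."""
    return math.sin(math.pi * tau / 2.0)
```

In practice one would use `scipy.stats.kendalltau` for the rank statistic and assemble the pairwise results into the correlation matrix entering the t-Copula.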
The decomposition of the drift into a baseline trend and a green-investment reduction reflects the empirical finding (Section 4.9) that green technology adoption reduces expected agricultural losses [13,32], with the effect varying across peril types. The control simultaneously affects the forward loss dynamics through drift reduction and the backward premium through the BSDE driver, creating a feedback loop whose trade-off structure is analysed below.
Definition 2 (t-Copula density). The density of the d-dimensional t-Copula with ν degrees of freedom and correlation matrix Σ is
$$c_{\nu,\Sigma}(u)=\frac{\Gamma\big(\frac{\nu+d}{2}\big)\,\Gamma\big(\frac{\nu}{2}\big)^{d-1}}{\Gamma\big(\frac{\nu+1}{2}\big)^{d}\,|\Sigma|^{1/2}}\;\frac{\big(1+\frac{1}{\nu}\,x^{\top}\Sigma^{-1}x\big)^{-(\nu+d)/2}}{\prod_{j=1}^{d}\big(1+\frac{x_j^{2}}{\nu}\big)^{-(\nu+1)/2}},$$
where $x_j=t_\nu^{-1}(u_j)$.
Theorem 1 (Tail dependence). The bivariate t-Copula with parameters $(\nu,\rho)$ has symmetric tail dependence
$$\lambda_U=\lambda_L=2\,t_{\nu+1}\!\left(-\sqrt{\frac{(\nu+1)(1-\rho)}{1+\rho}}\right).\tag{4}$$
Figure 2, Figure 3 and Figure 4 illustrate the forward loss dynamics for each peril separately. Figure 2 shows the typhoon loss paths with jump-diffusion characteristics; Figure 3 highlights the flood paths' co-movement with typhoon via the copula structure; and Figure 4 displays the contrasting drought dynamics with negative typhoon correlation.
Remark 1 (Time-varying copula extension). The fixed copula extends naturally to a sliding-window estimator (see Section 4.7 for the empirical analysis). If the estimator $\hat\Sigma_t$ is predictable and uniformly bounded, the Lipschitz bound in z is modified as in Lemma 1 and Theorems 2, 3, and 4 continue to hold.
Definition 3 (Time-varying copula estimator). The sliding-window t-copula with window length w replaces the static correlation matrix Σ by the predictable estimator $\hat\Sigma_t=\big(\sin(\pi\hat\tau_{ij}(t)/2)\big)_{ij}$, where $\hat\tau_{ij}(t)$ denotes Kendall's τ estimated from observations in the window $[t-w,t)$, and the degrees-of-freedom parameter is re-estimated jointly over the same window by IFM. We say $\hat\Sigma_t$ is uniformly bounded if $c\,I\preceq\hat\Sigma_t\preceq C\,I$ holds uniformly in t for constants $0<c\le C$.
Lemma 1 (Driver regularity under time-varying copula). Suppose $\hat\Sigma_t$ satisfies Definition 3 and is uniformly bounded with constants $0<c\le C$. Then the driver f in (8), with Σ replaced pointwise by $\hat\Sigma_t$, satisfies Assumption 2 with the z-Lipschitz constant determined by the uniform bound C. Consequently, Theorems 2–4 all continue to hold; the convergence constant in (19) now also depends on c and C. Proof. The Lipschitz bound in y is unchanged from Lemma 2. For the z-bound, replace $\|\Sigma\|$ by $\sup_t\|\hat\Sigma_t\|\le C$ uniformly in t, giving the modified constant. All remaining steps in the proofs of Theorems 2–4 carry through with this modified constant. □
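The sliding-window estimator of Definition 3 can be sketched for a single pair of perils, assuming the $\rho=\sin(\pi\tau/2)$ inversion; the window indexing and names below are illustrative, not the paper's code.

```python
import math
from itertools import combinations

def kendall_tau(x, y):
    """Kendall's tau-a over one window (ties ignored)."""
    n = len(x)
    num = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        num += 1 if s > 0 else (-1 if s < 0 else 0)
    return num / (n * (n - 1) / 2)

def sliding_window_rho(x, y, w):
    """Correlation path rho_t from observations indexed [t - w, t), for t >= w."""
    out = {}
    for t in range(w, len(x) + 1):
        tau = kendall_tau(x[t - w:t], y[t - w:t])
        out[t] = math.sin(math.pi * tau / 2.0)
    return out
```

A regime change inside the sample shows up as a sign flip in the estimated correlation path, which is exactly what the time-varying driver of Lemma 1 consumes.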
Assumption 1. The coefficients satisfy
- (A1) Lipschitz continuity: there exists $K>0$ such that $|\mu(t,x)-\mu(t,x')|+|\sigma(t,x)-\sigma(t,x')|\le K|x-x'|$ for all $t$, $x$, $x'$.
- (A2) Linear growth: $|\mu(t,x)|+|\sigma(t,x)|\le K(1+|x|)$.
- (A3) Jump integrability: $\int_{\mathbb{R}}|z|^{2}\,\nu(\mathrm{d}z)<\infty$.
Under Assumption 1, (1) has a unique strong solution [33].
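Under Assumption 1 the forward equation can be simulated by an Euler–Maruyama scheme. The stdlib-only sketch below uses hypothetical parameter values and approximates the Poisson jump arrivals over a short step by a Bernoulli draw with exponential jump sizes; it illustrates the three forces in (1), not the paper's calibrated simulator.

```python
import math
import random

def simulate_loss_path(l0=1.0, mu=0.05, beta=0.3, g=0.1, sigma=0.2,
                       jump_rate=0.5, jump_mean=0.4, T=1.0, n=250, seed=7):
    """Euler--Maruyama path for one peril: dL = (mu - beta*g) dt + sigma dW + jumps."""
    rng = random.Random(seed)
    dt = T / n
    path = [l0]
    for _ in range(n):
        dw = rng.gauss(0.0, math.sqrt(dt))
        # at most one jump per step: Bernoulli(jump_rate * dt) approximates the
        # Poisson arrival over a short interval; jump size is exponential
        jump = rng.expovariate(1.0 / jump_mean) if rng.random() < jump_rate * dt else 0.0
        path.append(path[-1] + (mu - beta * g) * dt + sigma * dw + jump)
    return path
```

Setting the noise and jumps to zero recovers the deterministic drift, which makes the braking effect of the green control $g$ directly visible in the terminal value.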
2.2. Driver Construction
The driver is assembled from four components. Step 1 (expected-loss baseline): the instantaneous net loss rate. Step 2 (risk loading): the copula risk-loading term bridges copula dependence and premium, reducing to the standard-deviation principle [29] when Σ is the identity. Step 3 (green-finance cost): implementation costs yield a net marginal effect of green investment; when the marginal loss reduction exceeds the marginal cost, higher green investment reduces the premium; otherwise it is counterproductive. Q-learning resolves this trade-off to yield the optimal control. Step 4 (actuarial adequacy penalty): the penalty prevents excessive deviation from the exogenous target, which combines the trailing w-period average loss ratio with the regulatory safety loading.
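To make the four-step assembly concrete, the sketch below composes the steps with hypothetical functional forms (linear loading, linear cost, quadratic penalty) and illustrative parameter values; the paper's exact driver is Equation (8), so treat this only as an illustration of the sign structure.

```python
def driver(y, z_norm, g, *,
           net_loss_rate=0.05,      # Step 1: expected-loss baseline
           eta=0.3,                 # Step 2: copula risk-loading weight
           beta=0.4, cost=0.25,     # Step 3: green benefit vs implementation cost
           kappa=2.0, target=0.06): # Step 4: actuarial adequacy penalty
    """Illustrative four-component driver; all forms and values are hypothetical."""
    risk_loading = eta * z_norm           # grows with dependence-driven volatility
    green_net = (cost - beta) * g         # negative when benefit exceeds cost
    penalty = 0.5 * kappa * (y - target) ** 2
    return net_loss_rate + risk_loading + green_net + penalty
```

With these stand-in values the green term is premium-reducing (beta > cost), and increasing the hedging norm raises the driver, the sign behaviour the comparison theorem later exploits.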
2.3. BSDE Formulation of the Premium
Definition 4 (Premium BSDE). The premium process $(Y_t)_{t\in[0,T]}$ solves
$$Y_t=\xi+\int_t^T f(s,Y_s,Z_s,U_s,G_s)\,\mathrm{d}s-\int_t^T Z_s\,\mathrm{d}W_s-\int_t^T\!\int_{\mathbb{R}^d}U_s(z)\,\tilde N(\mathrm{d}s,\mathrm{d}z),\tag{9}$$
where $Y_t$ is the premium, $Z_t$ is the diffusion-hedging process, $U_t$ is the jump-hedging process, and ξ is the terminal claim.
In plain terms, the comparison theorem (Theorem 3 below) states that "worse inputs yield higher premiums": if the terminal claim is larger or the instantaneous risk loading is higher under scenario (1) than under scenario (2), then the premium under scenario (1) dominates at every point in time. This is the BSDE analogue of the monotonicity axiom in coherent risk measures.
2.4. Existence and Uniqueness
Assumption 2. The driver f satisfies
- (B1) Lipschitz in $(y,z)$: there exist $L_y,L_z>0$ such that $|f(t,y,z,g)-f(t,y',z',g)|\le L_y|y-y'|+L_z|z-z'|$ for all $t$, $g$.
- (B2) Uniform bound in g: $\sup_{g}|f(t,0,0,g)|$ is bounded.
- (B3) Square-integrable terminal condition: $\mathbb{E}|\xi|^{2}<\infty$.
Lemma 2. The driver (8) satisfies Assumption 2, with Lipschitz constants $L_y$ and $L_z$ determined by the penalty weight and the copula risk-loading coefficient. Proof. Lipschitz in y: the terms depending on y are the adequacy penalty and the loading term; each is Lipschitz on the admissible range, and the bound with constant $L_y$ follows. Lipschitz in z: we use the regularisation $\sqrt{\varepsilon+|z|^{2}}$, which is 1-Lipschitz in z for every $\varepsilon>0$; combining with the risk-loading coefficient yields the bound with constant $L_z$, and letting $\varepsilon\to 0$ recovers the original driver with the same constant. □
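The 1-Lipschitz property of the regularisation $z\mapsto\sqrt{\varepsilon+|z|^{2}}$ invoked in the proof can be checked numerically; the brute-force sketch below is a sanity check on the difference quotient, not part of the proof.

```python
import math
import random

def reg(z, eps):
    """The regularised norm used in the z-Lipschitz argument of Lemma 2."""
    return math.sqrt(eps + z * z)

def max_difference_quotient(eps, trials=10000, seed=3):
    """Largest observed |reg(z1) - reg(z2)| / |z1 - z2| over random pairs."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        z1, z2 = rng.uniform(-10, 10), rng.uniform(-10, 10)
        if z1 != z2:
            worst = max(worst, abs(reg(z1, eps) - reg(z2, eps)) / abs(z1 - z2))
    return worst
```

The quotient never exceeds 1 for any positive ε, so the z-Lipschitz constant survives the limit ε → 0, as claimed.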
Theorem 2 (Existence and uniqueness). Under Assumptions 1 and 2, for any fixed admissible control G, the BSDE (9) has a unique adapted solution $(Y,Z,U)$.
Proof sketch. The driver is globally Lipschitz in $(y,z)$ by Lemma 2 and $\xi\in L^{2}$ by (B3); jumps are handled via [30,34]. The Picard iteration contracts under a β-weighted norm for β sufficiently large. □
In actuarial terms, Theorem 2 guarantees that for any given green-finance policy, a well-defined and unique premium process exists, leaving no ambiguity in the price assigned by the framework.
2.5. Comparison Theorem
Theorem 3 (Comparison). For $i=1,2$, let $(Y^{(i)},Z^{(i)},U^{(i)})$ solve BSDE (9) with driver $f^{(i)}$ and terminal condition $\xi^{(i)}$. If $\xi^{(1)}\ge\xi^{(2)}$ a.s. and $f^{(1)}\ge f^{(2)}$ for all arguments a.s., then $Y^{(1)}_t\ge Y^{(2)}_t$ for all t a.s.
Proof. Define $\delta Y=Y^{(1)}-Y^{(2)}$, $\delta Z=Z^{(1)}-Z^{(2)}$, $\delta U=U^{(1)}-U^{(2)}$. Then $\delta Y$ solves a linear BSDE whose coefficients are bounded by (B1). Applying Itô's formula to $e^{\beta t}(\delta Y_t^{-})^{2}$ with β sufficiently large and using $\delta Y_T=\xi^{(1)}-\xi^{(2)}\ge 0$ yields $\delta Y_t^{-}=0$ for all t. □
Corollary 1 (Actuarial monotonicity). If $\Sigma\succeq\Sigma'$ in the Loewner order, then $Y_t(\Sigma)\ge Y_t(\Sigma')$: the premium rises with dependence strength.
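A discrete, deterministic illustration of the comparison theorem: running the backward Euler recursion with a larger terminal condition and a pointwise larger driver produces a premium path that dominates at every step. The two drivers below are hypothetical Lipschitz examples, not the paper's driver (8).

```python
def backward_euler(terminal, f, T=1.0, n=100):
    """Deterministic backward recursion Y_i = Y_{i+1} + f(Y_{i+1}) * dt."""
    dt = T / n
    y = terminal
    path = [y]
    for _ in range(n):
        y = y + f(y) * dt
        path.append(y)
    return list(reversed(path))  # path[0] is Y_0, path[-1] the terminal value

f_low = lambda y: 0.05 + 0.1 * y   # scenario (2)
f_high = lambda y: 0.08 + 0.1 * y  # scenario (1): pointwise larger driver

y_low = backward_euler(1.0, f_low)
y_high = backward_euler(1.2, f_high)  # scenario (1): larger terminal claim
```

Pointwise domination of the whole path, not just the initial premium, is exactly the conclusion of Theorem 3 transplanted to the discrete scheme.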
Remark 2 (Dependence ordering vs. tail dependence). Corollary 1 establishes premium monotonicity with respect to the Loewner ordering of Σ, which controls the correlation component of dependence. The tail dependence coefficient (4) depends on both ρ and ν: increasing ρ (for fixed ν) increases both tail dependence and Σ in the Loewner order, so the corollary applies directly. However, decreasing ν (for fixed Σ) increases tail dependence without changing Σ; in this case, the premium increase operates through the magnitude of the risk-loading term rather than through the comparison theorem. Formally ordering premiums with respect to ν requires additional structure on the driver and is left for future work.
Remark 3 (Sensitivity to copula misspecification). If the true dependence structure is not a t-copula, the stability estimate [30] bounds the premium perturbation by the perturbations of the terminal claim and the driver, so the monotonicity degrades gracefully. Section 4.6 quantifies this empirically. Informally, the stability estimate ensures that small perturbations in the terminal claim or the driver produce only small changes in the premium, a continuity property essential for practitioners who must work with estimated parameters.
2.6. Practitioner Implementation Guide
To facilitate adoption, we provide a modular pipeline (Algorithm 1) with clearly defined input–output interfaces, implementable via standard libraries (scipy 1.11, PyTorch 2.1.0, numpy 1.24). Module B can be replaced by ARIMA or ETS provided Condition (C1) holds; Theorem 4 still guarantees the same convergence rate with a larger constant. Section 4.4.2 quantifies the accuracy trade-off against the LSTM.
2.7. Nonlinear Expectation Interpretation
Proposition 1 (g-Expectation representation). For a fixed admissible control G, define $\mathcal{E}_g[\xi\mid\mathcal{F}_t]:=Y_t$, where $(Y,Z,U)$ solves BSDE (9). Then the premium process is the conditional g-expectation of the terminal claim.
The g-expectation structure yields four actuarial properties:
- (i) Monotonicity: the comparison theorem is the monotonicity of $\mathcal{E}_g$.
- (ii) Risk loading: the nonlinearity yields $\mathcal{E}_g[\xi]\ge\mathbb{E}[\xi]$; the excess is the risk premium, amplified by the copula loading.
- (iii) Stability as continuity: the stability estimate is the continuity of $\mathcal{E}_g$ in both terminal condition and generator [3].
- (iv) From g-expectation to Euler scheme: computing $\mathcal{E}_g$ at each discrete time requires evaluating the driver, which needs the dependence structure (copula), the conditional expectation (LSTM), and the optimal control (Q-learning). The three ingredients of Section 3 are evaluations of $\mathcal{E}_g$ at discrete times.
Remark 4 (Controlled nonlinear expectation). Optimising over admissible controls yields a controlled nonlinear expectation, bridging BSDE theory and stochastic control.
2.8. Optimal Green Control
Proposition 2 (HJB characterisation). Define the value function $v(t,x)$ as the optimal premium given $L_t=x$. Under Assumptions 1 and 2, v is the unique viscosity solution of the HJB Equation (10), where $\mathcal{L}^{g}$ is the infinitesimal generator of L under control g. Proof. For controlled forward–backward SDEs with jump diffusions, the viscosity solution framework is established in [24] (continuous diffusions) and extended to Lévy-driven processes in [35]. The key requirements—Lipschitz driver, bounded controls, and square-integrable jumps—are verified in Assumptions 1 and 2. □
The control is a green-finance adjustment factor: depending on its value it acts as a green discount, a surcharge, or remains neutral. The HJB Equation (10) determines the optimal control by balancing the marginal loss reduction against the implementation cost; Algorithm 2 approximates this in discrete time. The coefficients are estimated by panel regression; the cost parameter is calibrated to Zhejiang pilot data [36].
3. Euler Discretisation of the Forward–Backward System
Let $0=t_0<t_1<\dots<t_n=T$ be a uniform partition with mesh $\Delta t=T/n$.
3.1. Ingredient (I): Dependence Estimation via the t-Copula
The copula parameters $(\nu,\Sigma)$ are estimated via IFM, with goodness-of-fit assessed by the Cramér–von Mises statistic. To address limited historical observations, the estimated copula and marginals are used to generate synthetic correlated loss sequences via Monte Carlo simulation. These augmented data supplement the historical sample for LSTM training in Ingredient (II), improving the conditional expectation approximation without introducing distributional assumptions beyond those already embedded in the copula model.
Outputs include the estimated copula parameters (to the driver risk loading and the Ingredient III risk index) and the fitted marginals (to Ingredient II for forward simulation and data augmentation).
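The Monte Carlo augmentation step can be sketched in pure Python for two perils: draw correlated Gaussians, divide by a common chi-square scale to obtain t-copula-style joint heavy tails. Correlation and degrees of freedom below are illustrative; mapping the margins to uniforms would in practice use the Student-t CDF (e.g. `scipy.stats.t.cdf`), which we omit here.

```python
import math
import random

def sample_t_pairs(rho=0.7, nu=5, n=2000, seed=11):
    """Correlated heavy-tailed pairs with t-copula structure (bivariate, integer dof)."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        # correlated standard normals via a 2x2 Cholesky factor
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # chi-square(nu)/nu as a sum of squared standard normals
        w = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu)) / nu
        scale = 1.0 / math.sqrt(w)
        out.append((z1 * scale, z2 * scale))
    return out
```

The shared scale factor is what induces joint tail events, the feature the t-Copula contributes over a Gaussian copula with the same correlation.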
3.2. Ingredient (II): Conditional Expectation Approximation via LSTM
The LSTM approximates the mapping from the current state to the conditional expectation required by the Euler step. Training uses both historical observations and copula-augmented sequences from Ingredient (I). The train–test split is strictly temporal: 2014–2022 (99 city-year observations) for training, 2023 (11 observations) for testing. Hyperparameters are selected via 5-fold temporal cross-validation with expanding windows. Augmented observations from the fitted t-Copula are weighted at 0.3 relative to historical data, a weight selected by cross-validation to prevent oversmoothing. The single-year test set limits statistical power; this constraint is partially mitigated by leave-one-city-out cross-validation (Section 4.8) and the cross-province experiment (Section 4.3).
The LSTM follows the standard gated architecture of Hochreiter and Schmidhuber [37] with two stacked layers; the hidden-unit count and dropout rate are chosen by the cross-validation above.
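To make the gated architecture concrete, here is a single-unit LSTM cell forward pass in pure Python with hypothetical scalar weights; a real implementation would use `torch.nn.LSTM` with vector-valued states, as in the PyTorch pipeline of Section 2.6.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_cell(x, h_prev, c_prev, W):
    """One step: input/forget/output gates and a candidate write, scalar case."""
    i = sigmoid(W["wi"] * x + W["ui"] * h_prev + W["bi"])   # input gate
    f = sigmoid(W["wf"] * x + W["uf"] * h_prev + W["bf"])   # forget gate
    o = sigmoid(W["wo"] * x + W["uo"] * h_prev + W["bo"])   # output gate
    g = math.tanh(W["wg"] * x + W["ug"] * h_prev + W["bg"]) # candidate value
    c = f * c_prev + i * g        # forget old memory, write gated candidate
    h = o * math.tanh(c)          # expose a gated view of the cell state
    return h, c

def run_sequence(xs, W):
    """Fold a loss sequence through the cell; the final h is the summary feature."""
    h = c = 0.0
    for x in xs:
        h, c = lstm_cell(x, h, c, W)
    return h
```

The additive cell-state update (rather than repeated multiplication) is what lets gradients survive long loss histories, the property that motivates the LSTM choice here.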
Assumption 3 (Forward approximation accuracy). The squared error of the LSTM conditional-expectation approximation is bounded by $C\,\Delta t$, uniformly over the partition, for a constant C.
Proposition 3 (Sufficient condition for (C1) under regularity). Suppose the conditional expectation function φ has bounded derivatives, and suppose the approximator satisfies a uniform approximation bound $\sup_x|\hat\varphi(x)-\varphi(x)|\le\epsilon$. Then the combined error satisfies the bound (14), where the constant depends on the derivatives of φ and the forward SDE coefficients. Proof. Apply Itô's formula to φ over $[t_i,t_{i+1}]$: the remainder is of order Δt. Adding the approximation error via the triangle inequality yields (14). Hence (C1) holds provided ε is of order $\sqrt{\Delta t}$. □
Remark 5. Proposition 3 decomposes Condition (C1) into regularity of φ (guaranteed by Assumption 1 via standard SDE theory) and a uniform approximation bound, which follows from universal approximation theorems for feedforward networks. A rigorous extension to LSTMs remains open; the empirical verification in Section 4.5.3 confirms the required scaling in practice.
Outputs are the base premium rate (to driver Step 1) and the risk index with copula-based tail risk (to the Ingredient III MDP state).
3.3. Ingredient (III): Discrete HJB Solution via Q-Learning
The Euler discretisation of (10) yields a discrete Bellman equation with a state built from the copula-based tail risk from (I) and the loss prediction from (II), a finite set of green adjustment actions, a one-step reward, and a discount factor.
The Q-learning update converges to the optimal action-value function under Robbins–Monro step-size conditions.
The implementation uses a discretised state encoding and a finite action set of green adjustment factors, with the discount factor and reward defined by the discrete Bellman equation. The full Q-learning procedure is summarised in Algorithm 2.
As a comparison, we also implement a Deep Q-Network (DQN) that replaces the tabular Q-function with a neural network parameterised by weights θ. The network consists of two hidden layers with 64 units each and ReLU activations. The loss function is the squared temporal-difference error against a target network whose parameters are updated every 50 episodes. Experience replay with buffer size 5000 is used. The DQN operates on the continuous state without discretisation.
Table 2 compares four RL algorithms on the same MDP. All achieve comparable variance reductions (43.1–44.3%; pairwise differences insignificant). Tabular Q-learning is retained as the default for its convergence guarantee (Theorem 4), interpretability, and efficiency; the gap narrows with finer discretisation.
3.4. Assembling the Discrete Premium
The discrete premium rate combines the LSTM base rate, the copula risk loading, and the Q-learned green adjustment factor, with loss-ratio feedback closing the loop. This is the explicit Euler step for the backward variable Y with the three ingredients substituted.
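The assembly step with loss-ratio feedback can be sketched as follows; the multiplicative form, parameter names, and the feedback gain are hypothetical stand-ins for Equation (18), chosen only to show how the three ingredient outputs combine.

```python
def assemble_premium(base_rate, risk_loading, green_factor,
                     trailing_loss_ratio, target_ratio=0.65, gain=0.5):
    """Illustrative premium assembly: base rate (Ingredient II), dependence
    loading (Ingredient I), green adjustment (Ingredient III), plus a feedback
    term nudging the rate toward the target loss ratio."""
    feedback = 1.0 + gain * (trailing_loss_ratio - target_ratio)
    return base_rate * (1.0 + risk_loading) * green_factor * feedback
```

When the trailing loss ratio exceeds its target, the feedback multiplier exceeds one and the next-period rate rises, which is the loop-closing behaviour described in the pipeline of Figure 1.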
3.5. Convergence of the Euler Scheme
Theorem 4 states that the discrete premium computed by the Euler scheme approaches the true continuous-time premium at rate $n^{-1/2}$. The three conditions (C1)–(C3) quantify the approximation quality of each ingredient: (C1) bounds the LSTM error, (C2) bounds the Q-learning control error, and (C3) bounds the copula estimation error. When all three are controlled, the overall scheme inherits the classical half-order BSDE convergence rate.
Theorem 4 (Discrete-time convergence). Under Assumptions 1 and 2, and provided that
- (C1) Assumption 3 holds (Ingredient II);
- (C2) the Q-learning control error is of order Δt (Ingredient III);
- (C3) the copula estimation error satisfies a uniform bound (Ingredient I);
there exists a constant C independent of n such that the error bound (19) holds, i.e., the scheme converges at rate $n^{-1/2}$.
Remark 6. The rate is rigorous under (C1)–(C3). (C2) and (C3) have full theoretical backing [38]; (C1) has partial theoretical support (Proposition 3) and strong empirical support (Section 4.5.3).
3.6. Complete Proof of Theorem 4
We adapt Zhang's [39] framework with explicit tracking of the three ingredient errors. Throughout, C denotes a generic constant depending on the model data but independent of n.
Define the discrete-time errors in Y and Z at each partition point. From the continuous BSDE integrated over $[t_i,t_{i+1}]$ and the Euler recursion for the discrete premium, subtraction yields a one-step error relation. By the Lipschitz property (B1) and the triangle inequality, the one-step error is controlled by the error at the next partition point plus the three ingredient errors.
Standard regularity of the BSDE solution [39] bounds the time-discretisation remainder, using the Itô isometry and the path regularity of $(Y,Z)$. From the martingale representation and the Euler approximation of Z, the error in Z over each step is controlled by the error in Y and the same remainder.
Squaring the one-step relation, taking expectations, and applying Young's inequality, we substitute the bound for the Z-error, use (C2) for the control error, and sum the per-step copula estimation error from (C3) over the n steps, which contributes an additional term independent of n. Since the terminal errors vanish (both schemes use the same terminal condition on the partition), the discrete Gronwall lemma yields the bound on the Y-errors; summing the Z-bounds completes the proof of (19). □
5. Discussion
The key structural distinction from existing methods is that static copula models [9], machine-learning forecasters [2,37], and reinforcement-learning controllers [23] each address one facet of the pricing problem in isolation. Switching off the nonlinear loading in the driver recovers the linear pricing of copula-only approaches; removing the LSTM eliminates dynamic forecasting; removing Q-learning eliminates policy optimisation. In the proposed framework all three are components of a single Euler scheme for a controlled g-expectation, unified by the convergence theorem rather than assembled ad hoc. The premium ordering of Corollary 1 is with respect to the Loewner order on Σ; the separate role of ν in tail dependence is discussed in Remark 2.
A natural multi-agent extension leads to mean-field games, where each insurer's BSDE driver depends on the empirical premium distribution; as the number of agents grows the equilibrium converges to a McKean–Vlasov BSDE [41], complementing game-theoretic [26,27], subsidy-design [19], and adverse-selection [28] perspectives with a continuous-time stochastic-control foundation.
Several limitations should be acknowledged. The sliding-window copula improves variance reduction by 2.7 pp (Section 4.7); Condition (C1) has partial theoretical and strong empirical support (Proposition 3); copula misspecification degrades gracefully; DQN offers marginal gains at additional computational cost; and Algorithm 1 provides a modular pipeline with graceful degradation to ARIMA. Key open problems are rigorous approximation bounds for LSTMs on jump-diffusion paths, fully parametric dynamic copulas [42], and semiparametric marginals to strengthen copula identifiability.
The cross-sectional dimension of eleven cities remains modest. The bootstrap analysis (Table 6) and cross-province transfer experiment (Table 7) quantify these uncertainties. Nevertheless, extending the framework to a national-scale panel would strengthen external validity and allow finer regional stratification.