Article

Dynamic Pricing of Multi-Peril Agricultural Insurance via Backward Stochastic Differential Equations with Copula Dependence and Reinforcement Learning

1 School of Mathematics and Statistics, Ningbo University, Ningbo 315211, China
2 Entrepreneurship Academy, Nanyang Technological University, Singapore 639798, Singapore
3 School of Management and Economics, Tianjin University, Tianjin 300072, China
* Author to whom correspondence should be addressed.
Mathematics 2026, 14(6), 1043; https://doi.org/10.3390/math14061043
Submission received: 12 February 2026 / Revised: 12 March 2026 / Accepted: 17 March 2026 / Published: 19 March 2026

Abstract

Pricing multi-peril agricultural insurance under compound climate hazards demands a framework that captures stochastic dependence among heterogeneous perils, accommodates non-stationary loss dynamics, and supports adaptive policy optimisation. We demonstrate that backward stochastic differential equations (BSDEs), combined with copula dependence, recurrent neural networks, and reinforcement learning, provide a unifying language for this task; the contribution lies in their principled integration. The dynamic premium is the unique adapted solution of a BSDE whose driver encodes compound-risk dependence through a Student-t copula, forward loss dynamics through a jump-diffusion process, and a green-finance adjustment through an optimal control variable. Within this framework we derive three progressive results by adapting standard BSDE theory to the compound-dependence and policy-control setting. First, existence and uniqueness hold under Lipschitz and square-integrability conditions. Second, a comparison theorem guarantees that a correlation matrix that is larger in the Loewner order yields higher premiums; the degrees-of-freedom effect enters separately through the risk-loading magnitude. Third, the Euler discretisation converges with order one half in the time step (mean-square error of order $\Delta t$), with copula estimation, LSTM conditional expectation approximation, and Q-learning HJB solution as sequential components. In an illustrative application to eleven Zhejiang cities (2014–2023, $N \times T = 110$), the framework reduces premium variance by 43.5 percent (bootstrap 95% CI: [38.2%, 48.7%]) while maintaining actuarial adequacy with a mean loss ratio of 0.678, though the modest sample size warrants caution in generalising these findings. Each component contributes statistically significant improvements, confirmed by the Friedman test at the 0.1 percent significance level.

1. Introduction

Pricing multi-peril agricultural insurance under compound climate hazards requires evaluating a nonlinear conditional expectation of a terminal claim driven by correlated perils while optimising a green-finance control that feeds back into the loss dynamics. Existing approaches address tail dependence, non-stationary forecasting, and sequential control in isolation [1,2]. The premium naturally inherits the nonlinear expectation structure developed by Peng [3], embedding actuarial properties such as monotonicity and risk loading within a coherent probabilistic framework.
BSDEs were introduced by Pardoux and Peng [4] and have become a central tool in mathematical finance. In insurance, BSDEs have been applied to optimal reinsurance [5] and premium principles under ambiguity [6]. However, applications to multi-peril agricultural insurance with compound dependence and green-finance controls are absent from the literature.
Research on compound climate risk has shifted from single-factor assessments to compound extreme-event modelling via copula-based dependence structures [7,8,9]. Goodwin and Hungerford [1] applied copula-based models to systemic agricultural risk but without temporal dynamics. Black–Scholes and fractional extensions have been used for agricultural premium determination [10,11,12]. However, these assume specific parametric dynamics that may not capture the full dependence spectrum of multi-peril risks. In the domain of green finance, green credit, bonds, and carbon markets mobilise funds and mitigate environmental risks [13], with regional effectiveness varying significantly [14,15]. Miao et al. [16] investigated the influence of green technological innovation on resource utilisation efficiency but did not operationalise this within a dynamic pricing framework.
The broader literature on agricultural insurance design [17,18] and government support mechanisms [19] further motivates the need for rigorous pricing methods that account for climate non-stationarity. Recent advances in agricultural decision systems [20] and food-security assessment under climate change [21] reinforce this urgency. Q-learning [22] and its deep extensions have been applied to financial decision-making [23] but rarely to insurance pricing. The well-known theoretical connection between discrete-time Q-learning and the continuous-time Hamilton–Jacobi–Bellman equation [24] has not been exploited to provide convergence guarantees in an insurance-pricing context. Recent work on deep BSDE solvers [25] opens the possibility of scaling such approaches, while game-theoretic analyses of insurance markets [26,27] and information-asymmetry models [28] provide complementary perspectives on market equilibrium.
Table 1 contrasts the proposed framework with representative existing studies.
The novelty resides not in these components individually but in their structural integration: the BSDE driver provides a single mathematical object in which copula dependence, recurrent forecasting, and policy optimisation interact through a rigorously defined information flow, enabling comparison, stability, and convergence results that are inaccessible when the same tools are composed ad hoc. In particular, the copula correlation matrix $\Sigma$ enters the driver's risk-loading term and simultaneously determines the forward jump structure; the LSTM approximates the conditional expectation required by the Euler step; and Q-learning solves the discrete HJB equation arising from the same Euler discretisation. This tight coupling, formalised in Theorem 4, is what distinguishes the framework from a mere concatenation of existing methods.
The first contribution is the construction of the premium BSDE itself. Starting from the expected-loss premium, we successively incorporate a copula-based risk loading, a green-finance implementation cost, and an actuarial penalty, with each term motivated by a specific economic or regulatory requirement as detailed in Section 2.2. The existence and uniqueness result in Theorem 2 follows from standard BSDE theory applied to our specific driver. The contribution here is in verifying the requisite conditions for the copula-structured, control-dependent driver, as established in Lemma 2, and in showing that the premium admits a nonlinear expectation representation, as stated in Proposition 1.
The second contribution concerns structural properties inherited from the nonlinear expectation. The comparison theorem (Theorem 3) and the stability estimate are adaptations of standard BSDE results [30] to the present driver; the contribution lies not in the proofs but in the actuarial interpretation. The comparison theorem, applied to our copula-structured driver, yields monotonicity of premiums with respect to dependence strength (Corollary 1).
The third contribution is the Euler discretisation with identified components and convergence. We derive the Euler scheme for the forward–backward system and show that the three computational steps, namely copula-based dependence estimation, LSTM conditional expectation approximation, and Q-learning discrete Hamilton–Jacobi–Bellman solution, are not independent layers but sequential components of this single scheme. These components are linked by the information flow from dependence estimation to conditional expectation to optimal control, as described in Section 3.
Three key design choices are made. First, BSDEs are used in place of static actuarial formulae to generate a dynamic, time-consistent premium process whose monotonic relationship between losses and premiums is guaranteed by the comparison theorem. Second, a copula-structured driver separates the modelling of individual risk distributions from the modelling of their dependence, so that marginals can be optimised independently and dependence strength translates directly into premium loading. Third, LSTM and Q-learning are chosen on a performance–cost trade-off, although simpler alternatives exist; the modular framework allows substitution, and an ablation study quantifies each component's contribution.
Figure 1 provides a schematic of the complete modelling pipeline. Historical loss data enter the copula estimation module (Ingredient I), which outputs the dependence structure $(\hat\nu, \hat\Sigma)$ and augmented training sequences. These feed into the LSTM forecasting module (Ingredient II), producing base premium rates and risk indices. The Q-learning module (Ingredient III) takes the risk indices as MDP state and outputs optimal green adjustment factors. The three outputs are assembled into the dynamic premium via (18), with loss-ratio feedback closing the loop. This pipeline corresponds to Algorithm 1.
Algorithm 1 Modular Practitioner Pipeline
  1: Module A: Copula Estimation. Fit marginals (MLE/AIC), estimate $(\hat\nu, \hat\Sigma)$ via IFM, validate (Cramér–von Mises, $p > 0.05$), generate $N_{\mathrm{aug}}$ synthetic sequences.
  2: Module B: Loss Forecasting. Train LSTM (5-fold temporal CV) on historical + augmented data; output base rate $R_{0,k} = \hat E_{L,k} / S_k$ and risk index $H_k$.
  3: Module C: Green Optimisation. Run Q-learning (Algorithm 2) to obtain $G_k$.
  4: Assembly. $R_{f,k} = R_{0,k} \times (1 + H_k) \times G_k$.
Algorithm 2 Q-Learning for Green Adjustment
  1: Input: $\mathcal{S}$, $\mathcal{A}$, $\gamma = 0.95$, $N_{\mathrm{ep}} = 500$
  2: Initialise $Q(s, a) = 0$ for all $(s, a)$
  3: for $e = 1$ to $N_{\mathrm{ep}}$ do
  4:     $\varepsilon_e = \max(0.01,\ 1 - 0.002\,e)$, $\alpha_e = 0.1 / (1 + 0.01\,e)$
  5:     for $k = 0$ to $n - 1$ do
  6:         $\varepsilon$-greedy action; observe $r_k$, $s_{k+1}$; update via (16)
  7:     end for
  8: end for
  9: Output: $G_k^{\mathrm{QL}} = \arg\max_a Q(s_k, a)$
The remainder of this paper is organised as follows. Section 2 develops the continuous-time BSDE framework. Section 3 derives the Euler discretisation, identifies the three computational ingredients, and establishes convergence. Section 4 reports the empirical analysis, divided into implementation in Section 4.4 and theoretical verification in Section 4.5. Section 5 discusses implications and limitations. Section 6 concludes.

2. Continuous-Time BSDE Pricing Framework

All results are rigorous under the stated assumptions; the differing status of Conditions (C1)–(C3) is discussed in Remark 6.

2.1. Probability Space, Forward Loss Dynamics, and Green-Finance Mechanism

Let $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}_{t \in [0,T]}, \mathbb{P})$ be a filtered probability space satisfying the usual conditions, supporting a $d$-dimensional Brownian motion $W_t = (W_t^{(1)}, \ldots, W_t^{(d)})$ and an independent Poisson random measure $N(dt, dz)$ on $[0,T] \times \mathbb{R}_+^d$ with compensator $\hat N(dt, dz) = \nu_J(dz)\,dt$. Here $d = 3$ corresponds to the three perils: typhoon ($j = 1$), flood ($j = 2$), and drought ($j = 3$).
Definition 1 (Copula-structured compound loss process).
The aggregate loss vector $L_t = (L_t^{(1)}, L_t^{(2)}, L_t^{(3)})$ satisfies
$$ dL_t^{(j)} = \big[\mu_j^0(t, L_t) - \phi_j G_t\big]\,dt + \sigma_j(t, L_t)\,dW_t^{(j)} + \int_{\mathbb{R}_+} \gamma_j(t, L_t, z)\,\tilde N^{(j)}(dt, dz), \qquad j = 1, \ldots, d, \tag{1} $$
with initial condition $L_0 = \ell_0 \in \mathbb{R}_+^d$. Here $\mu_j^0$ is the baseline drift, $\phi_j > 0$ quantifies the marginal loss reduction due to green investment in peril $j$, and $G_t \in [\underline{G}, \overline{G}]$ is the green adjustment control adapted to $\mathcal{F}_t$.
Equation (1) states that each peril's loss evolves under three forces: a predictable trend ($\mu_j^0 - \phi_j G_t$), random fluctuations ($\sigma_j\,dW_t^{(j)}$), and sudden catastrophic jumps ($\gamma_j\,d\tilde N^{(j)}$). The green control $G_t$ acts as a brake on the drift: investing more in green technology slows expected loss growth at rate $\phi_j$ per unit of investment.
Dependence across perils enters in two ways. First, the Brownian drivers are correlated via $\Sigma = [\rho_{jk}]$ with $\rho_{jk} = \sin\!\big(\tfrac{\pi}{2}\hat\tau_{jk}\big)$ obtained from Kendall's $\tau$. Second, the joint distribution of jump sizes is specified by the t-Copula [9]:
$$ F_{z_1, z_2, z_3}(x_1, x_2, x_3) = C_{\nu, \Sigma}\big(F_1(x_1), F_2(x_2), F_3(x_3)\big). \tag{2} $$
Marginals $\hat F_j$ are first estimated by MLE, and then the copula parameters are estimated by the IFM method conditional on $\hat F_j$ [31]. Marginal misspecification is the key vulnerability of this two-stage procedure; the K–S tests (Section 4.4.1, all $p > 0.45$) and the copula Cramér–von Mises test ($p = 0.523$) give no evidence of misspecification in the present data.
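The mapping from Kendall's $\tau$ to the copula correlation can be sketched in a few lines. The following is a minimal illustration only: the three loss series are hypothetical toy numbers (not the Zhejiang data), the function names are ours, and the plain pairwise estimator (tau-a) is assumed.

```python
import math
from itertools import combinations

def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs

def tau_to_rho(tau):
    """Correlation implied by Kendall's tau: rho = sin(pi * tau / 2)."""
    return math.sin(math.pi * tau / 2.0)

def correlation_matrix(series):
    """Pairwise matrix [rho_jk] built from Kendall's tau of each pair."""
    d = len(series)
    return [[1.0 if j == k else tau_to_rho(kendall_tau(series[j], series[k]))
             for k in range(d)] for j in range(d)]

# Hypothetical toy loss ratios for the three perils (illustration only).
typhoon = [0.42, 0.55, 0.31, 0.78, 0.64, 0.29, 0.83, 0.47, 0.51, 0.36]
flood   = [0.38, 0.49, 0.35, 0.70, 0.52, 0.25, 0.77, 0.50, 0.44, 0.30]
drought = [0.30, 0.22, 0.41, 0.12, 0.18, 0.45, 0.10, 0.28, 0.33, 0.37]

SIGMA_HAT = correlation_matrix([typhoon, flood, drought])
```

A matrix assembled pairwise in this way is symmetric with unit diagonal but is not guaranteed to be positive semi-definite, so a practical implementation would check this (and repair it if necessary) before using the matrix in the driver.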
The decomposition $\mu_j(t, L_t, G_t) = \mu_j^0(t, L_t) - \phi_j G_t$ reflects the empirical finding (Section 4.9) that green technology adoption reduces expected agricultural losses [13,32], with the effect varying across peril types. The control $G_t$ simultaneously affects the forward loss dynamics through drift reduction and the backward premium through the BSDE driver, creating a feedback loop whose trade-off structure is analysed below.
Definition 2 (t-Copula density).
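The density of the $d$-dimensional t-Copula with $\nu$ degrees of freedom and correlation matrix $\Sigma$ is
$$ c_{\nu,\Sigma}(u) = \frac{\Gamma\!\left(\frac{\nu + d}{2}\right)\left[\Gamma\!\left(\frac{\nu}{2}\right)\right]^{d-1}}{|\Sigma|^{1/2}\left[\Gamma\!\left(\frac{\nu + 1}{2}\right)\right]^{d}} \cdot \frac{\left(1 + \frac{\xi^\top \Sigma^{-1} \xi}{\nu}\right)^{-(\nu + d)/2}}{\prod_{j=1}^{d}\left(1 + \frac{\xi_j^2}{\nu}\right)^{-(\nu + 1)/2}}, \tag{3} $$
where $\xi = (t_\nu^{-1}(u_1), \ldots, t_\nu^{-1}(u_d))$.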
Theorem 1 (Tail dependence).
The bivariate t-Copula with parameters $(\nu, \rho)$ has symmetric tail dependence
$$ \lambda_U = \lambda_L = 2\, t_{\nu+1}\!\left(-\sqrt{\frac{(\nu + 1)(1 - \rho)}{1 + \rho}}\right) > 0 \quad \text{for all } \nu < \infty,\ \rho > -1. \tag{4} $$
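To make Theorem 1 concrete, the coefficient in (4) can be evaluated numerically. The sketch below is illustrative only: it uses a hand-rolled Student-t CDF (composite Simpson rule on the density) so that it runs with the standard library alone, and the function names are ours rather than part of the paper's implementation.

```python
import math

def student_t_pdf(x, df):
    """Density of the Student-t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def student_t_cdf(x, df, n_intervals=2000):
    """CDF by composite Simpson integration on [0, |x|] plus symmetry."""
    a = abs(x)
    if a == 0.0:
        return 0.5
    h = a / n_intervals  # n_intervals must be even for Simpson's rule
    total = student_t_pdf(0.0, df) + student_t_pdf(a, df)
    for i in range(1, n_intervals):
        total += (4 if i % 2 else 2) * student_t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area

def tail_dependence(nu, rho):
    """Tail dependence of the bivariate t-copula, Equation (4)."""
    arg = -math.sqrt((nu + 1) * (1 - rho) / (1 + rho))
    return 2 * student_t_cdf(arg, nu + 1)

# Tail dependence rises with rho (fixed nu) and falls with nu (fixed rho).
for nu in (4, 10, 30):
    print(nu, [round(tail_dependence(nu, rho), 3) for rho in (0.0, 0.5, 0.9)])
```

The loop illustrates the two channels separated in Remark 2: raising $\rho$ at fixed $\nu$ and lowering $\nu$ at fixed $\rho$ both increase the coefficient.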
Figures 2–4 illustrate the forward loss dynamics for each peril separately. Figure 2 shows the typhoon loss paths with jump-diffusion characteristics; Figure 3 shows the flood loss paths and highlights their co-movement with typhoon losses via the copula structure; and Figure 4 displays the contrasting drought dynamics, which are negatively correlated with typhoon losses.
Remark 1 (Time-varying copula extension).
The fixed copula extends naturally to a sliding-window estimator $\Sigma_t = \big[\sin\!\big(\tfrac{\pi}{2}\hat\tau_{jk}^{[t - w_c, t]}\big)\big]_{j,k}$ (see Section 4.7 for the empirical analysis). If $\Sigma_t$ is $\mathcal{F}_t$-predictable and uniformly bounded ($\underline{\sigma} I_d \preceq \Sigma_t \preceq \overline{\sigma} I_d$), the Lipschitz bound in $z$ becomes $\kappa = \eta\,\overline{\sigma}^{1/2}$ and Theorems 2–4 continue to hold.
Definition 3 (Time-varying copula estimator).
The sliding-window t-copula with window length $w_c > 0$ replaces the static correlation matrix $\Sigma$ by the $\mathcal{F}_t$-predictable estimator
$$ \Sigma_t = \Big[\sin\!\Big(\tfrac{\pi}{2}\,\hat\tau_{jk}^{[t - w_c, t]}\Big)\Big]_{j,k=1}^{d}, $$
where $\hat\tau_{jk}^{[t - w_c, t]}$ denotes Kendall's $\tau$ estimated from observations in the window $[t - w_c, t]$, and the degrees-of-freedom parameter $\hat\nu_t$ is re-estimated jointly over the same window by IFM. We say $\Sigma_t$ is uniformly bounded if $\underline{\sigma} I_d \preceq \Sigma_t \preceq \overline{\sigma} I_d$ holds uniformly in $t$ for constants $0 < \underline{\sigma} \le \overline{\sigma} < \infty$.
Lemma 1 (Driver regularity under time-varying copula).
Suppose $\Sigma_t$ satisfies Definition 3 and is uniformly bounded with constants $\underline{\sigma}, \overline{\sigma}$. Then the driver $f$ in (8), with $\Sigma$ replaced pointwise by $\Sigma_t$, satisfies Assumption 2 with
$$ \beta = \psi(1 + \bar H)\,\bar G + \lambda_1, \qquad \kappa = \eta\,\overline{\sigma}^{1/2}. $$
Consequently, Theorems 2–4 all continue to hold; the convergence constant $C$ in (19) now also depends on $\overline{\sigma}$.
Proof. 
The Lipschitz bound in $y$ is unchanged from Lemma 2. For the $z$-bound, replace $\|\Sigma\|$ by $\|\Sigma_t\| \le \overline{\sigma}$ uniformly in $t$, giving $\kappa = \eta\,\overline{\sigma}^{1/2}$. All remaining steps in the proofs of Theorems 2–4 carry through with this modified constant. □
Assumption 1.
The coefficients $\mu_j^0$, $\sigma_j$, $\gamma_j$ satisfy
(A1) Lipschitz continuity: there exists $K_1 > 0$ such that for all $t$ and all $\ell, \ell' \in \mathbb{R}_+^d$, $|\mu_j^0(t, \ell) - \mu_j^0(t, \ell')| + |\sigma_j(t, \ell) - \sigma_j(t, \ell')| \le K_1 |\ell - \ell'|$.
(A2) Linear growth: $|\mu_j^0(t, \ell)| + |\sigma_j(t, \ell)| \le K_2 (1 + |\ell|)$.
(A3) Jump integrability: $\mathbb{E} \int_0^T \int_{\mathbb{R}_+} |\gamma_j(t, L_t, z)|^2\, \nu_J(dz)\,dt < \infty$.
Under Assumption 1, (1) has a unique strong solution $L \in \mathcal{S}^2([0,T]; \mathbb{R}_+^d)$ [33].

2.2. Driver Construction

The driver is assembled from four components. Step 1 (expected-loss baseline): the instantaneous net loss rate $\sum_{j=1}^{d} [\mu_j^0(t, L_t) - \phi_j G_t]$. Step 2 (risk loading): the term $\eta \sqrt{Z_t^\top \Sigma Z_t}$ bridges copula dependence and premium, reducing to the standard-deviation principle [29] when $\Sigma = I$. Step 3 (green-finance cost): implementation costs $\psi(1 + H_t)\,G_t \cdot Y_t$ yield the net marginal effect
$$ \frac{\partial f}{\partial g} = -\sum_{j=1}^{d} \phi_j + \psi(1 + H_t)\,Y_t. $$
When $\sum_j \phi_j > \psi(1 + H_t)\,Y_t$, higher green investment reduces the premium; otherwise it is counterproductive. Q-learning resolves this trade-off to yield $G_t^*$. Step 4 (actuarial adequacy penalty): the penalty $\lambda_1 |Y_t - \hat y_t|$ prevents excessive deviation from the exogenous target
$$ \hat y_t = \bar L_{[t - w, t]} \cdot (1 + \theta_{\mathrm{reg}}), $$
where $\bar L_{[t - w, t]}$ is the trailing $w$-period average loss ratio and $\theta_{\mathrm{reg}} > 0$ is the regulatory safety loading.
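As a worked illustration of the Step 3 trade-off, the coefficient values reported in Section 2.8 ($\phi = (0.12, 0.09, 0.06)$, $\psi = 0.08$) can be substituted into the marginal effect. This is plain arithmetic on the reported coefficients, not an additional result; the units of $Y_t$ and $H_t$ are those of the paper.

```latex
\sum_{j=1}^{3} \phi_j = 0.12 + 0.09 + 0.06 = 0.27,
\qquad
\frac{\partial f}{\partial g} < 0
\;\Longleftrightarrow\;
\psi\,(1 + H_t)\,Y_t < \sum_{j=1}^{3} \phi_j
\;\Longleftrightarrow\;
(1 + H_t)\,Y_t < \frac{0.27}{0.08} = 3.375 .
```

Under these values, a higher green adjustment lowers the driver only while $(1 + H_t)\,Y_t$ stays below 3.375; above that level the implementation cost dominates.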
Assembling Steps 1–4:
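$$ f(t, \ell, y, z, g) = \underbrace{\sum_{j=1}^{d} \big[\mu_j^0(t, \ell) - \phi_j g\big]}_{\text{Step 1: net expected loss}} + \underbrace{\eta \sqrt{z^\top \Sigma z}}_{\text{Step 2: risk loading}} + \underbrace{\psi(1 + H_t)\, g \cdot y}_{\text{Step 3: implementation cost}} - \underbrace{\lambda_1 |y - \hat y_t|}_{\text{Step 4: adequacy penalty}}. \tag{8} $$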
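The four-term driver (8) translates directly into a small function. The sketch below is an illustration under stated assumptions: the function signature is ours, $\phi$ and $\psi$ take the values reported in Section 2.8, and every other number ($\mu^0$, $\eta$, $H$, $\lambda_1$, $\hat y$, $z$) is a hypothetical placeholder.

```python
import math

def driver(mu0, phi, g, y, z, sigma, eta, psi, h, lam1, y_hat):
    """Evaluate the driver f(t, l, y, z, g) of Equation (8) at one point."""
    # Step 1: net expected loss, sum_j [mu_j^0 - phi_j * g]
    net_loss = sum(m - p * g for m, p in zip(mu0, phi))
    # Step 2: risk loading eta * sqrt(z' Sigma z)
    d = len(z)
    quad = sum(z[j] * sigma[j][k] * z[k] for j in range(d) for k in range(d))
    risk_loading = eta * math.sqrt(max(quad, 0.0))
    # Step 3: green-finance implementation cost psi * (1 + H) * g * y
    cost = psi * (1 + h) * g * y
    # Step 4: adequacy penalty lambda_1 * |y - y_hat| (enters with a minus sign)
    penalty = lam1 * abs(y - y_hat)
    return net_loss + risk_loading + cost - penalty

PHI = [0.12, 0.09, 0.06]   # reported in Section 2.8
PSI = 0.08                 # reported in Section 2.8
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Hypothetical inputs, for illustration only.
params = dict(mu0=[0.3, 0.2, 0.1], phi=PHI, y=1.0, z=[3.0, 4.0, 0.0],
              sigma=IDENTITY, eta=0.1, psi=PSI, h=0.5, lam1=0.2, y_hat=1.0)

f_base = driver(g=1.0, **params)
f_more_green = driver(g=1.1, **params)
```

With $\Sigma = I$ the risk loading reduces to $\eta |z|$ (here $0.1 \times 5$), and the difference `f_more_green - f_base` equals $0.1 \times (-\sum_j \phi_j + \psi(1+H)y)$, the marginal effect from Step 3.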

2.3. BSDE Formulation of the Premium

Definition 4 (Premium BSDE).
The premium process $(Y_t, Z_t, U_t)_{t \in [0,T]}$ solves
$$ -dY_t = f(t, L_t, Y_t, Z_t, G_t)\,dt - Z_t^\top dW_t - \int_{\mathbb{R}_+^d} U_t(z)\,\tilde N(dt, dz), \qquad Y_T = \xi(L_T), \tag{9} $$
where $Y_t \in \mathbb{R}$ is the premium, $Z_t \in \mathbb{R}^d$ is the diffusion-hedging process, $U_t : \mathbb{R}_+^d \to \mathbb{R}$ is the jump-hedging process, and $\xi(L_T)$ is the terminal claim.
In plain terms, the comparison theorem for this BSDE (Theorem 3 in Section 2.5) states that "worse inputs yield higher premiums": if the terminal claim is larger or the instantaneous risk loading is higher under scenario (1) than under scenario (2), then the premium under scenario (1) dominates at every point in time. This is the BSDE analogue of the monotonicity axiom in coherent risk measures.

2.4. Existence and Uniqueness

Assumption 2.
The driver $f$ satisfies
(B1) Lipschitz in $(y, z)$: there exist $\beta, \kappa > 0$ such that for all $t, \ell, g$, $|f(t, \ell, y, z, g) - f(t, \ell, y', z', g)| \le \beta |y - y'| + \kappa |z - z'|$.
(B2) Uniform bound in $g$: $|f(t, \ell, 0, 0, g)| \le K_3 (1 + |\ell|^2)$.
(B3) Square-integrable terminal condition: $\xi(L_T) \in L^2(\Omega, \mathcal{F}_T, \mathbb{P})$.
Lemma 2.
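The driver (8) satisfies Assumption 2 with $\beta = \psi(1 + \bar H)\,\bar G + \lambda_1$ and $\kappa = \eta \|\Sigma\|^{1/2}$, where $\bar H = \sup_t H_t$ and $\bar G = \sup_t G_t$.
Proof. 
Lipschitz in $y$: The terms depending on $y$ are $\psi(1 + H_t)\,g\,y$ and $\lambda_1 |y - \hat y_t|$. Hence,
$$ |f(\cdot, y, z, g) - f(\cdot, y', z, g)| \le \psi(1 + H_t)\,g\,|y - y'| + \lambda_1 \big|\,|y - \hat y_t| - |y' - \hat y_t|\,\big| \le \big[\psi(1 + \bar H)\,\bar G + \lambda_1\big]\,|y - y'| = \beta |y - y'|. $$
Lipschitz in $z$: We use the regularisation $\varphi_\varepsilon(z) = \sqrt{z^\top \Sigma z + \varepsilon}$. For any $z, z' \in \mathbb{R}^d$,
$$ |\varphi_\varepsilon(z) - \varphi_\varepsilon(z')| \le \frac{|(z - z')^\top \Sigma (z + z')|}{\varphi_\varepsilon(z) + \varphi_\varepsilon(z')} \le \|\Sigma\|^{1/2}\, |z - z'|. $$
Taking $\varepsilon \to 0$ gives $\kappa = \eta \|\Sigma\|^{1/2}$. □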
Theorem 2 (Existence and uniqueness).
Under Assumptions 1 and 2, for any fixed admissible control $G$, the BSDE (9) has a unique adapted solution $(Y, Z, U) \in \mathcal{S}^2 \times \mathcal{H}^2 \times \mathcal{J}^2$.
Proof sketch.
The driver is globally Lipschitz in $(y, z)$ by Lemma 2 and $\xi \in L^2$ by (B3); jumps are handled via [30,34]. The Picard iteration contracts under the weighted norm $\|(Y, Z, U)\|_\mu^2 = \mathbb{E}\big[\int_0^T e^{\mu t} \big(|Y_t|^2 + |Z_t|^2 + \|U_t\|^2\big)\,dt\big]$ for $\mu$ sufficiently large. □
In actuarial terms, Theorem 2 guarantees that for any given green-finance policy, a well-defined and unique premium process exists, leaving no ambiguity in the price assigned by the framework.

2.5. Comparison Theorem

Theorem 3 (Comparison).
Let $(Y^{(i)}, Z^{(i)}, U^{(i)})$, $i = 1, 2$, solve BSDE (9) with drivers $f^{(i)}$ and terminal conditions $\xi^{(i)}$. If $\xi^{(1)} \ge \xi^{(2)}$ a.s. and $f^{(1)}(t, \ell, y, z, g) \ge f^{(2)}(t, \ell, y, z, g)$ for all arguments a.s., then $Y_t^{(1)} \ge Y_t^{(2)}$ for all $t \in [0,T]$ a.s.
Proof. 
Define $\delta Y = Y^{(1)} - Y^{(2)}$, $\delta Z = Z^{(1)} - Z^{(2)}$, $\delta U = U^{(1)} - U^{(2)}$. Then $-d(\delta Y_t) = \delta f_t\,dt - \delta Z_t^\top dW_t - \int \delta U_t(z)\,\tilde N(dt, dz)$ with $\delta Y_T \ge 0$. By (B1), $\delta f_t \ge a_t\,\delta Y_t + b_t^\top \delta Z_t$ where $|a_t| \le \beta$, $|b_t| \le \kappa$. Writing $x^- = \max(-x, 0)$ for the negative part, applying Itô's formula to $e^{\mu t} (\delta Y_t^-)^2$ with $\mu$ sufficiently large and using $\delta Y_T^- = 0$ yields $\delta Y_t^- = 0$ for all $t$, that is, $Y_t^{(1)} \ge Y_t^{(2)}$. □
Corollary 1 (Actuarial monotonicity).
If $\Sigma^{(1)} \succeq \Sigma^{(2)}$ in the Loewner order, then $Y_t^{(1)} \ge Y_t^{(2)}$: the premium rises with dependence strength.
Remark 2 (Dependence ordering vs. tail dependence).
Corollary 1 establishes premium monotonicity with respect to the Loewner ordering of $\Sigma$, which controls the correlation component of dependence. The tail dependence coefficient (4) depends on both $\rho$ and $\nu$: increasing $\rho$ (for fixed $\nu$) increases both tail dependence and $\Sigma$ in the Loewner order, so the corollary applies directly. However, decreasing $\nu$ (for fixed $\Sigma$) increases tail dependence without changing $\Sigma$; in this case, the premium increase operates through the magnitude of the risk-loading term $\eta \sqrt{z^\top \Sigma z}$ rather than through the comparison theorem. Formally ordering premiums with respect to $\nu$ requires additional structure on the driver and is left for future work.
Remark 3 (Sensitivity to copula misspecification).
If the true dependence structure is not a t-copula, the stability estimate [30] gives $\mathbb{E}\big[\sup_t |\delta Y_t|^2\big] \le C\, \|\hat\Sigma - \Sigma_0\|^2 \cdot \mathbb{E}\big[\int_0^T |Z_t|^2\,dt\big]$, so the monotonicity degrades gracefully. Section 4.6 quantifies this empirically.
Informally, the stability estimate ensures that small perturbations in the terminal claim or the driver produce only small changes in the premium, a continuity property essential for practitioners who must work with estimated parameters.

2.6. Practitioner Implementation Guide

To facilitate adoption, we provide a modular pipeline (Algorithm 1) with clearly defined input–output interfaces, implementable via standard libraries (scipy 1.11, PyTorch 2.1.0, numpy 1.24).
Module B can be replaced by ARIMA or ETS provided Condition (C1) holds; Theorem 4 still guarantees the same convergence rate with a larger constant $C_L$. Section 4.4.2 quantifies the trade-off ($R^2 = 0.561$ vs. LSTM's $0.746$).

2.7. Nonlinear Expectation Interpretation

Proposition 1 (g-Expectation representation).
For a fixed admissible control $G$, define $g^G(t, y, z) := f(t, L_t, y, z, G_t)$. Then $Y_t = \mathcal{E}_{g^G}[\xi(L_T) \mid \mathcal{F}_t]$.
The g-expectation structure yields four actuarial properties:
(i) Monotonicity: the comparison theorem is the monotonicity of $\mathcal{E}_{g^G}$.
(ii) Risk loading: the nonlinearity $\eta \sqrt{z^\top \Sigma z}$ yields $\mathcal{E}_{g^G}[\xi \mid \mathcal{F}_t] > \mathbb{E}[\xi \mid \mathcal{F}_t]$; the excess is the risk premium, amplified by $\Sigma$.
(iii) Stability as continuity: the stability estimate is the continuity of $\mathcal{E}_{g^G}$ in both terminal condition and generator [3].
(iv) From g-expectation to Euler scheme: computing $\mathcal{E}_{g^G}[\xi \mid \mathcal{F}_0]$ at each discrete time requires evaluating $g^G$, which needs $\Sigma$ (copula), $\mathbb{E}_k[\Delta L_k]$ (LSTM), and $G_k$ (Q-learning). The three ingredients of Section 3 are evaluations of $g^G$ at discrete times.
Remark 4 (Controlled nonlinear expectation).
Optimising over $G_t$ yields $V_0 = \inf_{G \in \mathcal{G}_{\mathrm{ad}}} \mathcal{E}_{g^G}[\xi(L_T)]$, bridging BSDE theory and stochastic control.

2.8. Optimal Green Control

Proposition 2 (HJB characterisation).
Define $v(t, \ell) = \inf_G Y_t^G$ given $L_t = \ell$. Under Assumptions 1 and 2, $v$ is the unique viscosity solution of
$$ \partial_t v + \inf_{g \in [\underline{G}, \overline{G}]} \big\{ \mathcal{L}^g v + f(t, \ell, v, \sigma^\top \nabla v, g) \big\} = 0, \qquad v(T, \ell) = \xi(\ell), \tag{10} $$
where $\mathcal{L}^g$ is the infinitesimal generator of $L_t$ under control $g$.
Proof. 
For controlled forward–backward SDEs with jump diffusions, the viscosity solution framework is established in [24] (continuous diffusions) and extended to Lévy-driven processes in [35]. The key requirements (Lipschitz driver, bounded controls, and square-integrable jumps) are verified in Assumptions 1 and 2. □
The control $G_t \in [\underline{G}, \overline{G}]$ is a green-finance adjustment factor: $G_t < 1$ is a green discount, $G_t > 1$ a surcharge, and $G_t = 1$ neutral. The HJB Equation (10) determines $G_t^*$ by balancing the marginal benefit $\sum_j \phi_j$ against the marginal cost $\psi(1 + H_t)\,Y_t$; Algorithm 2 approximates this in discrete time. The coefficients $(\phi_1, \phi_2, \phi_3) = (0.12, 0.09, 0.06)$ are estimated by panel regression; $\psi = 0.08$ is calibrated to Zhejiang pilot data [36].

3. Euler Discretisation of the Forward–Backward System

Let $\pi_n = \{0 = t_0 < t_1 < \cdots < t_n = T\}$ be a uniform partition with mesh $\Delta t = T / n$.

3.1. Ingredient (I): Dependence Estimation via the t-Copula

The copula parameters $(\hat\nu, \hat\Sigma)$ are estimated via IFM, with goodness-of-fit assessed by the Cramér–von Mises statistic:
$$ S_n = \int_{[0,1]^d} \big[\hat C_n(u) - C_{\hat\nu, \hat\Sigma}(u)\big]^2\, d\hat C_n(u). \tag{11} $$
To address limited historical observations, the estimated copula and marginals are used to generate $N_{\mathrm{aug}} = 1000$ synthetic correlated loss sequences via Monte Carlo simulation. These augmented data supplement the historical sample for LSTM training in Ingredient (II), improving the conditional expectation approximation without introducing distributional assumptions beyond those already embedded in the copula model. The same simulation yields a Monte Carlo estimate of the probability that the aggregate loss across perils exceeds the threshold $S$:
$$ \hat p_e = \frac{1}{N_{\mathrm{sim}}} \sum_{i=1}^{N_{\mathrm{sim}}} \mathbf{1}\Big\{ \sum_{j=1}^{d} L_j^{(i)} > S \Big\}. \tag{12} $$
Outputs include $\hat\Sigma$ (to the driver risk loading and the Ingredient III risk index $H_k$), and $\hat F_j$ and $(\hat\nu, \hat\Sigma)$ (to Ingredient II for forward simulation and data augmentation).
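The simulation step behind the augmentation and the exceedance estimate can be sketched as follows. This is a minimal illustration, not the paper's implementation: the correlation matrix, degrees of freedom, exponential marginals, their means, and the threshold are all hypothetical, and the Student-t CDF is a crude Simpson-rule approximation chosen so that the block needs only the standard library.

```python
import math
import random

def student_t_pdf(x, df):
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def student_t_cdf(x, df, n_intervals=200):
    """Crude Student-t CDF: Simpson's rule on [0, |x|] plus symmetry."""
    a = abs(x)
    if a == 0.0:
        return 0.5
    h = a / n_intervals
    total = student_t_pdf(0.0, df) + student_t_pdf(a, df)
    for i in range(1, n_intervals):
        total += (4 if i % 2 else 2) * student_t_pdf(i * h, df)
    area = min(total * h / 3, 0.5)
    return 0.5 + area if x > 0 else 0.5 - area

def cholesky(a):
    """Lower-triangular Cholesky factor of a positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(a[i][i] - s) if i == j else (a[i][j] - s) / low[j][j]
    return low

def sample_t_copula(nu, sigma, n_sim, rng):
    """Draw n_sim uniform vectors whose dependence is a t-copula(nu, sigma)."""
    chol, d, out = cholesky(sigma), len(sigma), []
    for _ in range(n_sim):
        normals = [rng.gauss(0.0, 1.0) for _ in range(d)]
        z = [sum(chol[i][j] * normals[j] for j in range(i + 1)) for i in range(d)]
        w = rng.gammavariate(nu / 2.0, 2.0) / nu   # chi-square(nu) / nu
        row = [student_t_cdf(zi / math.sqrt(w), nu) for zi in z]
        out.append([min(max(u, 1e-12), 1.0 - 1e-12) for u in row])
    return out

def exponential_quantile(u, mean):
    """Hypothetical marginal: exponential loss with the given mean."""
    return -mean * math.log(1.0 - u)

def exceedance_probability(losses, threshold):
    """Equation (12): share of simulations whose aggregate loss exceeds S."""
    return sum(1 for row in losses if sum(row) > threshold) / len(losses)

rng = random.Random(42)
SIGMA = [[1.0, 0.6, -0.3], [0.6, 1.0, -0.2], [-0.3, -0.2, 1.0]]  # hypothetical
MEANS = [0.30, 0.20, 0.10]                                       # hypothetical
uniforms = sample_t_copula(4.0, SIGMA, 500, rng)
losses = [[exponential_quantile(u, m) for u, m in zip(row, MEANS)] for row in uniforms]
p_hat = exceedance_probability(losses, threshold=1.0)
```

In practice the fitted marginals $\hat F_j$ would replace the exponential placeholder, and a library routine for the Student-t CDF would replace the hand-rolled integral.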

3.2. Ingredient (II): Conditional Expectation Approximation via LSTM

The LSTM approximates the mapping
$$ x_k := \big(L_{t_{k-w}}, \ldots, L_{t_k}, \mathrm{covariates}_k\big) \;\mapsto\; \hat E_{L,k}^{\mathrm{LSTM}} \approx \mathbb{E}\big[L_{t_{k+1}} \mid \mathcal{F}_{t_k}\big]. \tag{13} $$
Training uses both historical observations ($N \times T = 110$) and copula-augmented sequences from Ingredient (I) ($N_{\mathrm{aug}} = 1000$). Augmented observations are weighted at 0.3 relative to historical data to prevent oversmoothing; this weight is selected by cross-validation from $\{0.1, 0.2, 0.3, 0.5, 1.0\}$.
The train–test split is strictly temporal: 2014–2022 (99 city-year observations) for training, 2023 (11 observations) for testing. Hyperparameters are selected via 5-fold temporal cross-validation with expanding windows. The single-year test set limits statistical power; this constraint is partially mitigated by leave-one-city-out cross-validation (Section 4.8) and the cross-province experiment (Section 4.3).
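The temporal validation scheme and the down-weighting of augmented data can be sketched as below. The exact fold boundaries are not stated in the text, so the layout here (each fold validates on the next single year after an expanding training window) is an assumption, as are the function names.

```python
def expanding_window_folds(years, n_folds):
    """Yield (train_years, validation_years) pairs with an expanding train window.

    Assumed layout: the last n_folds years each serve once as validation year,
    and the training window is every year strictly before it.
    """
    years = sorted(years)
    if not 0 < n_folds < len(years):
        raise ValueError("n_folds must be between 1 and len(years) - 1")
    for i in range(len(years) - n_folds, len(years)):
        yield years[:i], [years[i]]

def weighted_mse(predictions, targets, is_augmented, augmented_weight=0.3):
    """Mean squared error with augmented samples down-weighted (0.3 in the text)."""
    weights = [augmented_weight if aug else 1.0 for aug in is_augmented]
    total = sum(w * (p - t) ** 2 for w, p, t in zip(weights, predictions, targets))
    return total / sum(weights)

TRAIN_YEARS = list(range(2014, 2023))   # 2014-2022; 2023 is held out for testing
FOLDS = list(expanding_window_folds(TRAIN_YEARS, n_folds=5))

for train, val in FOLDS:
    print(f"train {train[0]}-{train[-1]} -> validate {val[0]}")
```

No validation year ever precedes a training year, which is the property that distinguishes this scheme from shuffled k-fold cross-validation.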
The LSTM follows the standard gated architecture of Hochreiter and Schmidhuber [37] with two stacked layers, h = 64 hidden units, and dropout p = 0.2 .
Assumption 3 (Forward approximation accuracy).
$\mathbb{E}\big[\,|\hat E_{L,k}^{\mathrm{LSTM}} - \mathbb{E}[\Delta L_k \mid \mathcal{F}_{t_k}]|^2\,\big] \le C_L\,\Delta t$ for a constant $C_L > 0$.
Proposition 3 (Sufficient condition for (C1) under regularity).
Suppose the conditional expectation function $\varphi(t, \ell) = \mathbb{E}[L_{t + \Delta t} \mid L_t = \ell]$ satisfies $\varphi \in C^{1,2}([0,T] \times \mathbb{R}_+^d)$ with bounded derivatives, and suppose the approximator $\hat\varphi_\theta$ satisfies a uniform approximation bound $\sup |\hat\varphi_\theta(t, \ell) - \varphi(t, \ell)| \le \varepsilon_{\mathrm{approx}}$. Then
$$ \mathbb{E}\,\big|\hat E_{L,k}^{\mathrm{approx}} - \mathbb{E}[\Delta L_k \mid \mathcal{F}_{t_k}]\big|^2 \le C_\varphi\,\Delta t + 2\,\varepsilon_{\mathrm{approx}}^2, \tag{14} $$
where $C_\varphi$ depends on the derivatives of $\varphi$ and the forward SDE coefficients.
Proof. 
Apply Itô's formula to $\varphi(s, L_s)$ over $[t_k, t_{k+1}]$: the remainder satisfies $\mathbb{E}[|O(\Delta t)|^2] \le C_\varphi\,\Delta t$. Adding the approximation error via the triangle inequality yields (14). Hence (C1) holds with $C_L = C_\varphi + 2\,\varepsilon_{\mathrm{approx}}^2 / \Delta t$ provided $\varepsilon_{\mathrm{approx}}^2 = O(\Delta t)$. □
Remark 5.
Proposition 3 decomposes Condition (C1) into regularity of $\varphi$ (guaranteed by Assumption 1 via standard SDE theory) and a uniform bound $\varepsilon_{\mathrm{approx}}^2 = O(\Delta t)$, which follows from universal approximation theorems for feedforward networks. A rigorous extension to LSTMs remains open; the empirical verification in Section 4.5.3 confirms the required scaling in practice.
Outputs are $R_{0,k}^{\mathrm{LSTM}} = \hat E_{L,k}^{\mathrm{LSTM}} / S_k$ (to driver Step 1) and $H_k$ with copula-based tail risk (to the Ingredient III MDP state).

3.3. Ingredient (III): Discrete HJB Solution via Q-Learning

The Euler discretisation of (10) yields the Bellman equation
$$ Q^*(s, a) = \mathbb{E}\big[\, r(s, a) + \gamma \max_{a'} Q^*(s', a') \,\big|\, s, a \,\big], \tag{15} $$
with state $s_k = (I_k, Z_{\mathrm{gr},k}, Q_k, H_k)$ (where $H_k$ uses copula-based tail risk from (I) and loss prediction from (II)), action $a_k \in \{0.85, 0.865, \ldots, 1.15\}$, $\gamma = e^{-r \Delta t}$, and $r_k = -w_1 |L_k - \hat L_k| + w_2\,\Delta Z_{\mathrm{gr},k} - w_3\,\Delta I_k$.
The Q-learning update
$$ Q(s_k, a_k) \leftarrow Q(s_k, a_k) + \alpha_k \big[\, r_k + \gamma \max_{a} Q(s_{k+1}, a) - Q(s_k, a_k) \,\big] \tag{16} $$
converges to $Q^*$ under the Robbins–Monro conditions on the step sizes ($\sum_k \alpha_k = \infty$, $\sum_k \alpha_k^2 < \infty$), provided every state–action pair is visited infinitely often.
The implementation uses $|\mathcal{S}| = 200$ states encoding $(I_k, Z_{\mathrm{gr},k}, Q_k, H_k)$ and $|\mathcal{A}| = 21$ actions in $[0.85, 1.15]$ (a grid of step 0.015), with discount factor $\gamma = e^{-r \Delta t} \approx 0.95$. The full Q-learning procedure is summarised in Algorithm 2.
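The tabular procedure of Algorithm 2 can be sketched as follows. The exploration and step-size schedules, the 21-point action grid, $\gamma = 0.95$, and the 500 episodes are taken from the text; the environment is a deliberately simple toy (five risk states, a hypothetical target adjustment per state, uniformly random transitions) and stands in for the paper's 200-state MDP, whose transition and reward model is not reproduced here.

```python
import random

ACTIONS = [round(0.85 + 0.015 * i, 3) for i in range(21)]   # 0.85, 0.865, ..., 1.15
TARGET_IDX = [4, 7, 10, 13, 16]   # hypothetical best action index per toy state

def epsilon(e):
    """Exploration schedule of Algorithm 2."""
    return max(0.01, 1 - 0.002 * e)

def alpha(e):
    """Step-size schedule of Algorithm 2."""
    return 0.1 / (1 + 0.01 * e)

def toy_step(state, action_idx, rng):
    """Toy environment (assumption): reward penalises distance to the target."""
    reward = -abs(ACTIONS[action_idx] - ACTIONS[TARGET_IDX[state]])
    next_state = rng.randrange(len(TARGET_IDX))
    return reward, next_state

def q_learning(step, n_states, n_steps=50, gamma=0.95, n_episodes=500, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(n_states)]
    for e in range(1, n_episodes + 1):
        eps, lr = epsilon(e), alpha(e)
        s = 0
        for _ in range(n_steps):
            # epsilon-greedy action selection
            if rng.random() < eps:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[s][i])
            r, s_next = step(s, a, rng)
            # temporal-difference update, Equation (16)
            q[s][a] += lr * (r + gamma * max(q[s_next]) - q[s][a])
            s = s_next
    return q

q_table = q_learning(toy_step, n_states=len(TARGET_IDX))
policy = [ACTIONS[max(range(len(ACTIONS)), key=row.__getitem__)] for row in q_table]
```

The step sizes $\alpha_e = 0.1/(1 + 0.01e)$ decay like $1/e$, so their sum diverges while the sum of their squares converges, which is the Robbins–Monro requirement.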
As a comparison, we also implement a Deep Q-Network (DQN) that replaces the tabular Q-function with a neural network $Q_\theta(s, a)$ parameterised by $\theta$. The network consists of two hidden layers with 64 units each and ReLU activations. The loss function is
$$ \mathcal{L}(\theta) = \mathbb{E}\Big[ \big( r_k + \gamma \max_{a'} Q_{\theta^-}(s_{k+1}, a') - Q_\theta(s_k, a_k) \big)^2 \Big], \tag{17} $$
where $\theta^-$ denotes the target network parameters, updated every 50 episodes. Experience replay with buffer size 5000 is used. The DQN operates on the continuous state $s_k = (I_k, Z_{\mathrm{gr},k}, Q_k, H_k) \in \mathbb{R}^4$ without discretisation.
Table 2 compares four RL algorithms on the same MDP. All achieve comparable variance reductions (43.1–44.3%; pairwise differences insignificant, $p > 0.15$). Tabular Q-learning is retained as the default for its convergence guarantee (Theorem 4), interpretability, and efficiency; the gap narrows with finer discretisation ($|\mathcal{S}| = 500$).

3.4. Assembling the Discrete Premium

The discrete premium rate is
$$ R_{f,k}^{(\mathrm{dyn})} = \underbrace{R_{0,k}^{\mathrm{LSTM}}}_{\text{Ingr. (II)}} \times (1 + H_k) \times \underbrace{G_k^{\mathrm{QL}}}_{\text{Ingr. (III)}}, \tag{18} $$
with loss-ratio feedback $R_{f,k+1}^{(\mathrm{adj})} = R_{f,k}^{(\mathrm{dyn})} \big(1 + \chi\,(L_k - L^*) / L^*\big)$, $\chi \in (0, 1)$.
This is the Euler step for the backward variable $Y$, i.e., $Y_k^n = \mathbb{E}_k[Y_{k+1}^n] + f(t_k, L_{t_k}^n, Y_k^n, Z_k^n, G_k^{\mathrm{QL}})\,\Delta t$ (implicit in $Y_k^n$, since $Y_k^n$ appears on both sides), with the three ingredients substituted.
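The assembly step (18) and the loss-ratio feedback amount to two one-line functions. The sketch below uses hypothetical input values throughout, and it reads $L^*$ as a target loss ratio, which is our interpretation of the feedback formula.

```python
def dynamic_rate(base_rate, risk_index, green_factor):
    """Equation (18): R_f = R_0 * (1 + H) * G."""
    return base_rate * (1 + risk_index) * green_factor

def feedback_adjust(rate, loss_ratio, target_loss_ratio, chi):
    """Loss-ratio feedback: R_adj = R * (1 + chi * (L - L*) / L*), 0 < chi < 1."""
    if not 0 < chi < 1:
        raise ValueError("chi must lie strictly between 0 and 1")
    return rate * (1 + chi * (loss_ratio - target_loss_ratio) / target_loss_ratio)

# Hypothetical inputs: base rate 5%, risk index 0.2, green discount 0.95.
rate = dynamic_rate(base_rate=0.05, risk_index=0.2, green_factor=0.95)

# Realised loss ratio above the (hypothetical) target raises next period's rate.
adjusted = feedback_adjust(rate, loss_ratio=0.75, target_loss_ratio=0.70, chi=0.5)
```

When the realised loss ratio equals the target, the feedback leaves the rate unchanged; a green factor below one acts as a discount on the risk-loaded base rate.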

3.5. Convergence of the Euler Scheme

Theorem 4 states that the discrete premium computed by the Euler scheme approaches the true continuous-time premium with mean-square error of order $O(\Delta t)$, up to a copula-estimation term that does not shrink with the mesh. The three conditions (C1)–(C3) quantify the approximation quality of each ingredient: (C1) bounds the LSTM error, (C2) bounds the Q-learning control error, and (C3) bounds the copula estimation error. When (C1) and (C2) hold and the copula error $\varepsilon_C$ is small, the overall scheme inherits the classical half-order BSDE convergence rate.
Theorem 4 (Discrete-time convergence).
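Suppose Assumptions 1 and 2 hold and that
(C1) Assumption 3 holds (Ingredient II);
(C2) Q-learning satisfies $|G_k^{\mathrm{QL}} - G_k^*| \le \varepsilon_Q$ with $\varepsilon_Q^2 \le C_Q\,\Delta t$ (Ingredient III);
(C3) $|\hat\nu - \nu| + \|\hat\Sigma - \Sigma\| \le \varepsilon_C$ for a constant $\varepsilon_C > 0$ (Ingredient I).
Then there exists $C > 0$ independent of $n$ such that
$$ \max_{0 \le k \le n} \mathbb{E}\big[|Y_{t_k} - Y_k^n|^2\big] + \sum_{k=0}^{n-1} \mathbb{E}\big[|Z_{t_k} - Z_k^n|^2\big]\,\Delta t \le C\,\Delta t + C\,\varepsilon_C^2. \tag{19} $$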
Remark 6.
The $O(\Delta t)$ rate is rigorous under (C1)–(C3). (C2) and (C3) have full theoretical backing [38]; (C1) has partial theoretical support (Proposition 3) and strong empirical support (Section 4.5.3, $p_{\mathrm{perm}} = 0.003$).

3.6. Complete Proof of Theorem 4

We adapt Zhang’s [39] framework with explicit tracking of the three ingredient errors. Throughout, C denotes a generic constant depending on β , κ , T but independent of n.
  • Step 1: Error decomposition.
Define $e_k^Y = Y_{t_k} - Y_k^n$, $e_k^Z = Z_{t_k} - Z_k^n$. From the continuous BSDE integrated over $[t_k, t_{k+1}]$:
$$ Y_{t_k} = \mathbb{E}_k[Y_{t_{k+1}}] + \mathbb{E}_k \int_{t_k}^{t_{k+1}} f(s, L_s, Y_s, Z_s, G_s)\,ds. $$
Subtracting the Euler recursion for $Y_k^n$:
$$ e_k^Y = \mathbb{E}_k[e_{k+1}^Y] + \underbrace{\mathbb{E}_k \int_{t_k}^{t_{k+1}} \big[ f(s, L_s, Y_s, Z_s, G_s) - f(t_k, L_{t_k}, Y_{t_k}, Z_{t_k}, G_{t_k}) \big]\,ds}_{=:\ \delta_k^{\mathrm{disc}}} + \underbrace{\big[ f(t_k, L_{t_k}, Y_{t_k}, Z_{t_k}, G_{t_k}) - f(t_k, L_{t_k}^n, Y_k^n, Z_k^n, G_k^{\mathrm{QL}}) \big]\,\Delta t}_{=:\ \delta_k^{\mathrm{approx}}}. \tag{20} $$
  • Step 2: Bounding the approximation error.
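By the Lipschitz property (B1) and the triangle inequality:
$$ |\delta_k^{\mathrm{approx}}| \le \big( \beta |e_k^Y| + \kappa |e_k^Z| \big)\,\Delta t + \underbrace{\Big| \sum_j \phi_j \big(G_{t_k} - G_k^{\mathrm{QL}}\big) \Big|}_{\le\ \Phi\,\varepsilon_Q \text{ by (C2)}}\,\Delta t + \eta\, \underbrace{\Big| \sqrt{Z_{t_k}^\top \hat\Sigma Z_{t_k}} - \sqrt{Z_{t_k}^\top \Sigma Z_{t_k}} \Big|}_{\le\ C \|\hat\Sigma - \Sigma\|\,|Z_{t_k}| \text{ by (C3)}}\,\Delta t + \underbrace{\Big| \sum_j \big[ \mu_j^0(t_k, L_{t_k}) - \hat E_{L,k}^{\mathrm{LSTM}} \big] \Big|}_{\text{bounded by } C_L \Delta t \text{ (in } L^2\text{) by (C1)}}\,\Delta t, $$
where $\Phi = \sum_j \phi_j$.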
  • Step 3: Bounding the discretisation error.
Standard regularity of the BSDE solution [39] gives $\mathbb{E}[|\delta_k^{\mathrm{disc}}|^2] \le C\,(\Delta t)^2$, using the Itô isometry and the regularity $\mathbb{E}\big[\int_{t_k}^{t_{k+1}} |Z_s - Z_{t_k}|^2\,ds\big] \le C\,(\Delta t)^2$.
  • Step 4: Z-estimate.
From the martingale representation and the Euler approximation $Z_k^n\,\Delta t = \mathbb{E}_k[Y_{k+1}^n\,\Delta W_k]$:
$$Z_{t_k}\,\Delta t = \mathbb{E}_k\!\left[\int_{t_k}^{t_{k+1}} Z_s\,ds\right] = \mathbb{E}_k[Y_{t_{k+1}}\,\Delta W_k] + R_k^Z, \qquad Z_k^n\,\Delta t = \mathbb{E}_k[Y_{k+1}^n\,\Delta W_k],$$
where $\mathbb{E}[|R_k^Z|^2] \le C\,(\Delta t)^3$. Hence $|e_k^Z|^2\,\Delta t \le C\,\mathbb{E}_k[|e_{k+1}^Y|^2]\,(\Delta t)^{-1} \cdot \mathbb{E}_k[|\Delta W_k|^2] + C\,(\Delta t)^2$, which, using $\mathbb{E}_k[|\Delta W_k|^2] = d\,\Delta t$, gives $\mathbb{E}[|e_k^Z|^2] \le C\,\mathbb{E}[|e_{k+1}^Y|^2]/\Delta t + C\,\Delta t$.
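To make the interplay of the $Y$- and $Z$-recursions concrete, the following minimal sketch runs the backward Euler scheme on a recombining binomial tree, where $\Delta W_k = \pm\sqrt{\Delta t}$ with equal probability, so that both conditional expectations are exact two-point averages. The tree, the toy terminal condition and driver, and the fixed-point solve of the $Y$-step are illustrative assumptions; in the paper the conditional expectation, the dependence structure, and the control are supplied by the LSTM, the copula, and Q-learning respectively.

```python
import math

def backward_euler_binomial(n, T, xi, f, picard_iters=25):
    """Backward Euler for a one-dimensional BSDE on a binomial tree (toy sketch).

    xi(w)      : terminal condition as a function of W_T (hypothetical).
    f(t, y, z) : driver (hypothetical; the paper's driver has more arguments).
    Returns (Y_0, Z_0).
    """
    dt = T / n
    s = math.sqrt(dt)
    # Node j at step k carries W = (2*j - k) * s, for j = 0..k.
    y_next = [xi((2 * j - n) * s) for j in range(n + 1)]
    z0 = 0.0
    for k in range(n - 1, -1, -1):
        y_now = []
        for j in range(k + 1):
            up, down = y_next[j + 1], y_next[j]
            ey = 0.5 * (up + down)              # E_k[Y_{k+1}]
            z = 0.5 * (up - down) * s / dt      # E_k[Y_{k+1} * dW_k] / dt
            y = ey
            for _ in range(picard_iters):       # solve y = ey + f(t_k, y, z) * dt
                y = ey + f(k * dt, y, z) * dt
            y_now.append(y)
            if k == 0:
                z0 = z
        y_next = y_now
    return y_next[0], z0
```

With a zero driver and $\xi(w) = w^2$, the scheme returns $Y_0 = T$ and $Z_0 = 0$, in line with the exact solution $Y_t = W_t^2 + (T - t)$, $Z_t = 2W_t$. The fixed-point loop converges only when the Lipschitz constant of the driver in $y$ times $\Delta t$ is below one.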
  • Step 5: Gronwall recursion.
Squaring (20), taking expectations, and applying Young’s inequality:
$$\mathbb{E}[|e_k^Y|^2] \le (1 + C\,\Delta t)\,\mathbb{E}[|e_{k+1}^Y|^2] + C\,\Delta t\,\mathbb{E}[|e_k^Z|^2] + C\,(\Delta t)^2 + C\,C_L\,(\Delta t)^2 + C\,\Phi^2\,\varepsilon_Q^2\,(\Delta t)^2 + C\,C_C^2\,n^{-1}\,(\Delta t)^2\,\mathbb{E}[|Z_{t_k}|^2].$$
Substituting the Step 4 bound for $\mathbb{E}[|e_k^Z|^2]$:
$$\mathbb{E}[|e_k^Y|^2] \le (1 + C\,\Delta t)\,\mathbb{E}[|e_{k+1}^Y|^2] + C'\,(\Delta t)^2,$$
where $C' = C_1 + C_L + \Phi^2 C_Q$, using (C2): $\varepsilon_Q^2 \le C_Q\,\Delta t$. The copula estimation error from (C3) contributes $C\,\varepsilon_C^2\,\mathbb{E}[|Z_{t_k}|^2]\,(\Delta t)^2$ per step; summing over $n$ steps yields an additional term $C\,\varepsilon_C^2$ independent of $\Delta t$.
Since $e_n^Y = \xi(L_T) - \xi(L_T^n) = 0$ (both use the same terminal condition on the partition), the discrete Gronwall lemma yields
$$\max_{0 \le k \le n} \mathbb{E}[|e_k^Y|^2] \le \frac{C'\,(e^{CT} - 1)}{C} \cdot \Delta t + C\,\varepsilon_C^2.$$
Summing the $Z$-bounds: $\sum_{k=0}^{n-1} \mathbb{E}[|e_k^Z|^2]\,\Delta t \le C\,\Delta t + C\,\varepsilon_C^2$.
This completes the proof of (19).
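The discrete Gronwall step can be checked numerically. The sketch below iterates the worst case of the recursion $a_k = (1 + C\,\Delta t)\,a_{k+1} + C'(\Delta t)^2$ backwards from $a_n = 0$ and compares the result with the closed-form bound $C'(e^{CT} - 1)/C \cdot \Delta t$. The values chosen for $C$, $C'$, and $T$ are arbitrary illustrations, not constants estimated in the paper, and the copula term $C\,\varepsilon_C^2$ is omitted.

```python
import math

def gronwall_recursion(n, T, C, C_prime):
    """Iterate a_k = (1 + C*dt) * a_{k+1} + C_prime * dt**2 from a_n = 0.

    Returns (max_k a_k, closed-form bound C_prime * (exp(C*T) - 1) / C * dt).
    """
    dt = T / n
    a = 0.0
    worst = 0.0
    for _ in range(n):
        a = (1.0 + C * dt) * a + C_prime * dt ** 2
        worst = max(worst, a)
    bound = C_prime * (math.exp(C * T) - 1.0) / C * dt
    return worst, bound

# Arbitrary illustrative constants; doubling n roughly halves the squared error.
results = {n: gronwall_recursion(n, T=1.0, C=2.0, C_prime=1.0)
           for n in (10, 20, 40, 80)}
```

Because the recursion accumulates an $O((\Delta t)^2)$ error over $n = T/\Delta t$ steps, the squared error scales like $\Delta t$, which is the half-order rate for the error itself.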

4. Empirical Application and Analysis

4.1. Application: Dynamic Premium Rate Determination

Table 3 reports the dynamic premium rates $R_{f,k}^{(\mathrm{dyn})}$ for all eleven cities alongside static and officially filed rates for the 2023 policy year.
We stress that with $N \times T = 110$ observations and a single test year, the empirical results serve primarily to demonstrate the framework’s operational feasibility and internal consistency with the theoretical predictions, rather than to establish definitive actuarial superiority. A multi-province, multi-decade study would be needed for robust generalisability claims.
Three patterns emerge. First, premiums rise for high-risk coastal cities (Taizhou: +0.89 percentage points, Wenzhou: +0.68), addressing chronic underpricing [18]. Second, premiums fall slightly for low-risk inland cities (Huzhou: −0.09 percentage points, Lishui: −0.08). Third, all loss ratios lie within $[0.6, 0.8]$, consistent with solvency being maintained.
The full pipeline runs in approximately 15 min on a standard workstation, making it feasible for annual rate filings.
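The summary figures quoted from Table 3 can be reproduced with a few lines of arithmetic. The sketch below copies the static rate, dynamic rate, and loss ratio of each city from Table 3 and recomputes the mean rate change, the range of loss ratios, and the mean absolute deviation from the 0.70 target stated in Section 4.3. The function and variable names are hypothetical, and the unweighted averaging is an assumption about how the reported summaries were formed.

```python
# (static rate %, dynamic rate %, loss ratio) per city, copied from Table 3.
TABLE_3 = {
    "Hangzhou": (2.78, 3.18, 0.672),
    "Ningbo": (3.21, 3.52, 0.685),
    "Jiaxing": (2.43, 2.58, 0.661),
    "Shaoxing": (2.51, 2.72, 0.668),
    "Huzhou": (2.14, 2.05, 0.643),
    "Wenzhou": (4.05, 4.73, 0.701),
    "Jinhua": (2.32, 2.56, 0.665),
    "Taizhou": (5.02, 5.91, 0.712),
    "Zhoushan": (3.45, 3.78, 0.689),
    "Quzhou": (1.86, 1.89, 0.648),
    "Lishui": (1.62, 1.54, 0.639),
}
TARGET_LOSS_RATIO = 0.70  # target loss ratio stated in Section 4.3

def summarise(table, target):
    """Unweighted summaries of the rate change and the loss ratios."""
    diffs = [dyn - stat for stat, dyn, _ in table.values()]
    ratios = [lr for _, _, lr in table.values()]
    return {
        "mean_rate_change": sum(diffs) / len(diffs),
        "min_loss_ratio": min(ratios),
        "max_loss_ratio": max(ratios),
        "mean_abs_dev": sum(abs(lr - target) for lr in ratios) / len(ratios),
    }

summary = summarise(TABLE_3, TARGET_LOSS_RATIO)
```

Rounded, the mean rate change is 0.28 percentage points (the "Mean" row of Table 3) and the mean absolute deviation from the target is 0.031, the value reported in Table 6.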

4.2. Data and Variables

We acknowledge that the cross-sectional dimension of eleven cities and the resulting 110 city-year observations constitute a modest sample. Consequently, the empirical findings should be interpreted as an illustration of the framework’s applicability rather than as evidence for broad generalisability. To partially mitigate this limitation, we supplement the analysis with bootstrap inference, leave-one-city-out cross-validation (Section 4.8), and a cross-province transfer experiment to Jiangxi Province (Section 4.3). Zhejiang is a national pilot for green finance reform with diverse agricultural risks [36]. Table 4 summarises the variable definitions and data sources, while Table 5 reports descriptive statistics for the key variables. Figure 5 visualises these distributional characteristics.
The training protocol follows Section 3.2.
To quantify the uncertainty arising from the small sample, we perform a block bootstrap with 2000 replications (block length = 3 years to preserve temporal dependence). For each replication, the full Euler scheme is re-estimated, yielding a bootstrap distribution of the variance reduction statistic.
In Table 6, the 95% bootstrap confidence interval for variance reduction is [38.2%, 48.7%], indicating that even under sampling uncertainty the improvement over the static baseline is substantial (lower bound above 35%).
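The block bootstrap can be sketched as follows. This is a simplified illustration under stated assumptions: it uses a moving-block scheme with overlapping blocks and a percentile interval, and it only recomputes a variance-reduction statistic from two given premium series, whereas the paper re-estimates the full Euler scheme in every replication. All function names are hypothetical, and the exact block scheme and interval construction used by the authors are not specified in the text.

```python
import random

def moving_block_bootstrap(indices, block_len, rng):
    """Resample time indices by concatenating randomly chosen overlapping blocks."""
    n = len(indices)
    starts = range(n - block_len + 1)
    out = []
    while len(out) < n:
        s = rng.choice(starts)
        out.extend(indices[s:s + block_len])
    return out[:n]

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def variance_reduction(static, dynamic):
    """Percentage reduction in sample variance of dynamic vs. static premiums."""
    return 100.0 * (1.0 - variance(dynamic) / variance(static))

def bootstrap_ci(static, dynamic, block_len=3, reps=2000, level=0.95, seed=0):
    """Percentile confidence interval for the variance-reduction statistic."""
    rng = random.Random(seed)
    years = list(range(len(static)))
    stats = []
    for _ in range(reps):
        idx = moving_block_bootstrap(years, block_len, rng)  # same years for both series
        stats.append(variance_reduction([static[i] for i in idx],
                                        [dynamic[i] for i in idx]))
    stats.sort()
    lo = stats[int((1 - level) / 2 * reps)]
    hi = stats[int((1 + level) / 2 * reps) - 1]
    return lo, hi
```

Resampling the year indices jointly keeps the static and dynamic series aligned, and blocks of three consecutive years preserve short-range temporal dependence within each block.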

4.3. Cross-Province Transfer Experiment

We conduct a transfer experiment on Jiangxi Province data (eight cities, 2016–2023, $N_J \times T_J = 64$) under three settings: (i) direct transfer; (ii) copula re-estimated, LSTM frozen; (iii) full re-estimation.
Table 7 shows that (i) retains 65% of Zhejiang performance, (ii) recovers 80%, and (iii) achieves 90%. The copula is the most geographically sensitive component; full local re-estimation is recommended.
Each parameter choice follows a transparent rationale. The risk-loading coefficient $\eta = 0.15$ and adequacy penalty $\lambda_1 = 0.10$ are selected by leave-one-city-out cross-validation to minimise the mean absolute deviation of the loss ratio from the target $L^{*} = 0.70$. The green-finance implementation cost $\psi = 0.08$ is calibrated to match the dispersion observed in Zhejiang pilot data [36]. The loss-reduction coefficients $\phi_j = (0.12, 0.09, 0.06)$ are estimated from peril-specific panel regressions, with typhoons showing the largest effect due to infrastructure resilience investments.

4.4. Implementation of the Euler Scheme

Model parameters are calibrated as follows: $\eta = 0.15$ and $\lambda_1 = 0.10$ by LOCO cross-validation; $\psi = 0.08$ matched to Zhejiang pilot dispersion [36]; $\phi_j = (0.12, 0.09, 0.06)$ by peril-specific panel regression; $\theta_{\mathrm{reg}} = 0.25$ from CBIRC guidelines; $L^{*} = 0.70$, $w = 3$ years, and $\chi = 0.30$ from regulatory standards and cross-validation; QL reward weights $(w_1, w_2, w_3) = (0.5, 0.3, 0.2)$ by grid search.

4.4.1. Ingredient (I): Copula Dependence Estimation

Marginal distributions (Table 8) all pass the K–S test at the 5% level.
Kendall’s $\tau$ (Table 9) shows the strongest positive dependence between typhoon and flood ($\hat{\tau} = 0.428$) and weak negative dependence between typhoon and drought (Figure 6). We emphasise that the following verification is a consistency check: we test whether the data exhibit the qualitative patterns predicted by the theory (premiums increasing with dependence strength), not a proof of the theorem itself, which holds under its stated mathematical assumptions.
The t-copula dominates all competitors (Table 10); Clayton and Frank are rejected at the 5% level.
The estimated $\hat{\Sigma}$ and $\hat{F}_j$ are passed to Ingredient (II).
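As a rough illustration of the dependence estimation, the sketch below computes Kendall's $\tau$ by counting concordant and discordant pairs and converts it to a copula correlation through the relation $\rho = \sin(\pi\tau/2)$, which holds for elliptical copulas such as the Gaussian and Student-t. The text does not state which estimator produced the reported $\hat{\rho}$, so the inversion shown here is an assumption and need not match Table 9 exactly.

```python
import math
from itertools import combinations

def kendall_tau(x, y):
    """Kendall's tau-a from concordant/discordant pair counts (ties count as neither)."""
    conc = disc = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            conc += 1
        elif sign < 0:
            disc += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (conc - disc) / n_pairs

def tau_to_rho(tau):
    """Elliptical-copula relation rho = sin(pi * tau / 2)."""
    return math.sin(math.pi * tau / 2.0)

# Typhoon-flood value from Table 9.
rho_typhoon_flood = tau_to_rho(0.428)
```

For $\hat{\tau} = 0.428$ the inversion gives a correlation of about 0.62, close to the reported $\hat{\rho} = 0.612$; the small gap is what one would expect if $\hat{\rho}$ was obtained by a different route, such as maximum likelihood.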

4.4.2. Ingredient (II): LSTM Conditional Expectation

Table 11 compares the LSTM against five benchmarks on the 2023 test set. The LSTM achieves the best performance across all metrics, and copula-based augmentation (“LSTM + aug”) further reduces RMSE by 5.6%.
To assess the statistical significance of the LSTM’s advantage over the next-best model (GRU), we perform a paired t-test on the city-level absolute errors: the LSTM achieves significantly lower MAE ($t = 2.41$, $p = 0.036$, $n = 11$). The difference between LSTM and LSTM+aug is only marginally significant ($t = 2.18$, $p = 0.054$), passing at the 10% level but not at the 5% level; given the small test set, this suggests rather than confirms that copula augmentation provides a meaningful improvement.
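The reported p-values can be recovered from the t statistics with $n - 1 = 10$ degrees of freedom. The sketch below is a standard-library illustration: it integrates the Student-t density with Simpson's rule to obtain a two-sided p-value and also shows how the paired statistic itself would be formed from two vectors of city-level errors. The function names are hypothetical, and the city-level errors themselves are not reproduced in the text.

```python
import math

def t_pdf(x, df):
    """Density of the Student-t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t_stat, df, steps=2000):
    """Two-sided p-value via Simpson's rule on [0, |t|] (steps must be even)."""
    b = abs(t_stat)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = 2 * total * h / 3      # P(-|t| < T < |t|)
    return 1.0 - central_mass

def paired_t(errors_a, errors_b):
    """Paired t statistic for mean(errors_b - errors_a) and its degrees of freedom."""
    d = [b - a for a, b in zip(errors_a, errors_b)]
    n = len(d)
    mean = sum(d) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in d) / (n - 1))
    return mean / (sd / math.sqrt(n)), n - 1

p_lstm_vs_gru = two_sided_p(2.41, 10)
p_aug_vs_lstm = two_sided_p(2.18, 10)
```

Both values agree with the text to the reported precision, which confirms that the tests are two-sided with ten degrees of freedom.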
Sensitivity analysis over $N_{\mathrm{aug}} \in \{0, 200, 500, 1000, 2000\}$ and $w_{\mathrm{aug}} \in \{0.1, 0.2, 0.3, 0.5, 1.0\}$ identifies the optimal configuration ($N_{\mathrm{aug}} = 1000$, $w_{\mathrm{aug}} = 0.3$); augmentation consistently reduces the train–test gap, and the Friedman test remains significant ($p < 0.001$) even without augmentation.

4.4.3. Ingredient (III): Q-Learning Green Optimisation

Q-learning converges after approximately 340 episodes (reward stabilising at 0.121 ± 0.016 ). Figure 7 shows the training reward curve. Table 12 reports the optimised green adjustment coefficients for each city.
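For readers unfamiliar with the tabular method, the sketch below shows a single Q-learning update together with a weighted scalar reward using the weights $(w_1, w_2, w_3) = (0.5, 0.3, 0.2)$ from Section 4.4. The learning rate, the discount factor, the state and action encoding, and the meaning of the three reward components are assumptions made for illustration; the text does not specify them here.

```python
def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.95):
    """One tabular Q-learning step: Q <- Q + alpha * (r + gamma * max Q' - Q).

    q is a list of lists indexed as q[state][action]; alpha and gamma are
    hypothetical hyperparameters, not values reported in the paper.
    """
    best_next = max(q[next_state])
    td_target = reward + gamma * best_next
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]

def weighted_reward(components, weights=(0.5, 0.3, 0.2)):
    """Scalar reward as a weighted sum of three (hypothetical) components."""
    return sum(w * c for w, c in zip(weights, components))
```

Convergence after a few hundred episodes, as reported above, is typical behaviour when the state space is small enough for every state-action pair to be visited repeatedly.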

4.5. Theoretical Verification

4.5.1. Comparison Theorem (Theorem 3)

As Table 13 shows, all ratios exceed 1, so that $Y(\Sigma) > Y(I_d)$ in every case, which is consistent with the actuarial monotonicity predicted by Corollary 1.

4.5.2. Convergence Rate (Theorem 4)

Table 14 confirms the theoretical half-order rate: ratios of approximately 2 and a regression slope of $0.49 \approx 1/2$. Note that the 10-year panel permits $n \in \{10, 20, 40, 80\}$ without the LSTM forecast horizon mismatch present in shorter panels.
We emphasise that the results in Table 14 constitute an empirical illustration of the convergence rate predicted by Theorem 4, not a formal mathematical proof. The theorem holds under the stated assumptions (Assumptions 1 and 2 and Conditions (C1)–(C3)), and the empirical exercise demonstrates consistency between the theoretical prediction and observed behaviour on the Zhejiang dataset. Verification on other datasets and under distributional shift remains an open empirical question.
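An empirical convergence order of this kind is usually read off as the slope of a log-log regression of the error on the step size. The sketch below implements that regression with ordinary least squares and applies it to synthetic errors that decay exactly like $\Delta t^{1/2}$; the synthetic errors and the constant 0.3 are invented for illustration and are not the values in Table 14.

```python
import math

def loglog_slope(step_sizes, errors):
    """Least-squares slope of log(error) against log(step size)."""
    xs = [math.log(h) for h in step_sizes]
    ys = [math.log(e) for e in errors]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# A 10-year horizon with n in {10, 20, 40, 80} gives these step sizes (in years).
steps = [10.0 / n for n in (10, 20, 40, 80)]
synthetic_errors = [0.3 * math.sqrt(h) for h in steps]   # hypothetical errors
slope = loglog_slope(steps, synthetic_errors)
```

A slope near one half for the root-mean-square error corresponds to a slope near one for the squared error, which is the quantity bounded in Theorem 4.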

4.5.3. Verification of Condition (C1)

Condition (C1) requires $\mathrm{MSE}_k \le C_L\,\Delta t$. We test this scaling relationship formally. For each $\Delta t \in \{1.0, 0.5, 0.25, 0.125\}$ years (the coarsest step $\Delta t = 1.0$ is enabled by the extended 10-year panel), we compute the empirical MSE on the 2023 test set and define $\hat{C}_L = \mathrm{MSE}/\Delta t$. If (C1) holds, then $\hat{C}_L$ should be approximately constant across $\Delta t$.
We emphasise that this verification is empirical rather than a rigorous mathematical proof of Condition (C1) for LSTM architectures. Proposition 3 provides a partial theoretical justification, but the uniform approximation bound for recurrent networks on jump-diffusion paths remains an open theoretical question. The assumption may fail under substantial distributional shift or for substantially different data-generating processes.
As shown in Table 15, $\hat{C}_L$ ranges from 0.01134 to 0.01192 (mean 0.01163, coefficient of variation 2.4%), indicating that the ratio is stable across step sizes. A linear regression of $\log(\mathrm{MSE})$ on $\log(\Delta t)$ yields a slope of 1.02 ($R^2 = 0.9997$); the hypothesis $H_0$ (slope $= 1$) cannot be rejected ($t = 0.38$, $p = 0.74$). A permutation test (10,000 permutations) yields $p_{\mathrm{perm}} = 0.003$, consistent with linear scaling. Together with Proposition 3, these results support Condition (C1), though the assumption may fail under distributional shift.
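The stability check amounts to dividing each MSE by its step size and measuring the dispersion of the resulting ratios. The sketch below computes the ratios, their mean, and their coefficient of variation. The example MSE values are hypothetical placeholders, since Table 15 is not reproduced in this section.

```python
import math

def c_hat_stability(step_sizes, mses):
    """Return the ratios C_hat = MSE / dt, their mean, and coefficient of variation."""
    ratios = [m / h for h, m in zip(step_sizes, mses)]
    mean = sum(ratios) / len(ratios)
    sd = math.sqrt(sum((r - mean) ** 2 for r in ratios) / (len(ratios) - 1))
    return ratios, mean, sd / mean

step_sizes = [1.0, 0.5, 0.25, 0.125]
hypothetical_mses = [0.0116 * h for h in step_sizes]   # exact linear scaling
ratios, mean_c, cv = c_hat_stability(step_sizes, hypothetical_mses)
```

Under exact linear scaling the coefficient of variation is zero; values of a few percent, as reported above, indicate only mild departures from proportionality.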

4.6. Ablation and Robustness

Table 16 shows each component contributes significantly (Friedman $p < 0.001$); Table 17 confirms variance reduction exceeds 35% across all sensitivity and misspecification scenarios, with premium error proportional to $\|\hat{\Sigma} - \Sigma_0\|_F$ (Remark 3).

4.7. Time-Varying Copula Analysis

To address the concern that a fixed copula may not capture evolving climate dependence, we implement the sliding-window estimator of Definition 3 with $w_c \in \{3, 5, 7\}$ years and compare against the static ($w_c = 10$) baseline.
Table 18 reveals two findings. First, the typhoon–flood correlation increases over recent windows ($\bar{\rho}$ rises from 0.612 to 0.658), consistent with intensifying compound flood events under climate change. Second, the 5-year window achieves the highest variance reduction (46.2%), improving upon the static baseline (43.5%) by 2.7 percentage points, though the 3-year window shows higher estimation variance. These results suggest that dynamic copula estimation provides a meaningful improvement and are consistent with the theoretical extension of Lemma 1. For the remaining analyses we retain the static copula as the conservative baseline, but report the 5-year dynamic results as a robustness check.

4.8. Robustness: Leave-One-City-Out

As a further robustness check, we perform leave-one-city-out (LOCO) cross-validation, re-estimating the full Euler scheme while sequentially holding out each city from training. Table 19 shows that the LOCO results (42.4 ± 1.3%) are stable and close to the full-sample estimate (43.5%), with no single city driving the overall performance.
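The leave-one-city-out loop has a simple structure, sketched below with the eleven city names from Table 3. The callable `fit_and_score` is a hypothetical stand-in for refitting the pipeline on the training cities and returning the metric of interest; how the held-out city enters the evaluation is an assumption, since the text only states that each city is held out from training.

```python
import statistics

CITIES = ["Hangzhou", "Ningbo", "Jiaxing", "Shaoxing", "Huzhou", "Wenzhou",
          "Jinhua", "Taizhou", "Zhoushan", "Quzhou", "Lishui"]

def loco_splits(cities):
    """Yield (training cities, held-out city) for leave-one-city-out validation."""
    for held_out in cities:
        yield [c for c in cities if c != held_out], held_out

def loco_summary(cities, fit_and_score):
    """Mean and sample standard deviation of a metric across the LOCO folds.

    fit_and_score(train, held_out) is a user-supplied (hypothetical) callable
    that refits the pipeline on `train` and returns a single number.
    """
    scores = [fit_and_score(train, held_out)
              for train, held_out in loco_splits(cities)]
    return statistics.mean(scores), statistics.stdev(scores)
```

A small standard deviation across the eleven folds, as in Table 19, indicates that no single city dominates the estimate.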

4.9. Green Finance and Regional Heterogeneity

To explore the associative relationship between green-finance variables and the premium, we estimate a pooled OLS regression of the log premium on green-finance and control variables. We stress that this analysis is correlational: the small panel ($N \times T = 110$) and potential omitted-variable bias preclude causal interpretation. Table 20 reports the results. Both green finance development (coefficient $-0.17$, $p < 0.001$) and green technology adoption (coefficient $-0.21$, $p < 0.001$) exhibit statistically significant negative associations with premiums, while carbon emissions (coefficient $0.13$, $p = 0.002$) show a positive association. These patterns are consistent with the broader green-finance literature [13,14,32,40], though establishing causality would require instrumental-variable or natural-experiment designs beyond the scope of this study.
Table 21 reveals substantial heterogeneity: the effect of green finance is strongest in developed cities and weakest in less-developed cities. The Chow test ($F = 5.13$, $p = 0.002$) confirms that the regional coefficients are statistically distinct. Figure 8 illustrates the regional variation in green-finance coefficients.
A Sobel mediation test yields $z = 3.12$, $p = 0.002$, suggesting that approximately 41% of the association is mediated through technology adoption. We interpret this as suggestive evidence of a mediation pathway rather than a causal demonstration, given the observational nature of the data and the small sample size. Within the BSDE framework, the theoretical mechanism is that green finance increases the drift-reduction parameters $\phi_j$ through technology adoption, reducing the forward drift and hence the premium (see the trade-off condition (6)).
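The Sobel statistic and its p-value follow from two path coefficients and their standard errors. The sketch below uses the first-order Sobel standard error and a standard normal reference distribution; the path estimates passed to `sobel_z` in any application would come from the mediation regressions, which are not reported in this section, so only the conversion from the reported $z$ to $p$ is checked here.

```python
import math

def sobel_z(a, se_a, b, se_b):
    """Sobel statistic for the indirect effect a*b (first-order standard error)."""
    return (a * b) / math.sqrt(b * b * se_a * se_a + a * a * se_b * se_b)

def two_sided_normal_p(z):
    """Two-sided p-value under a standard normal reference distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))

p_sobel = two_sided_normal_p(3.12)   # z value reported in the text
```

The resulting p-value is about 0.0018, which rounds to the reported $p = 0.002$.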

5. Discussion

The key structural distinction from existing methods is that static copula models [9], machine-learning forecasters [2,37], and reinforcement-learning controllers [23] each address one facet of the pricing problem in isolation. Setting $\eta = 0$ in the driver recovers the linear pricing of copula-only approaches; removing the LSTM eliminates dynamic forecasting; removing Q-learning eliminates policy optimisation. In the proposed framework all three are components of a single Euler scheme for a controlled g-expectation, unified by the convergence theorem rather than assembled ad hoc. The premium ordering of Corollary 1 is with respect to the Loewner order on $\Sigma$; the separate role of $\nu$ in tail dependence is discussed in Remark 2.
A natural multi-agent extension leads to mean-field games, where each insurer’s BSDE driver depends on the empirical premium distribution; as the number of agents grows the equilibrium converges to a McKean–Vlasov BSDE [41], complementing game-theoretic [26,27], subsidy-design [19], and adverse-selection [28] perspectives with a continuous-time stochastic-control foundation.
Several limitations and caveats should be acknowledged. The sliding-window copula improves variance reduction by 2.7 pp (Section 4.7), which indicates that the static copula leaves some evolving dependence uncaptured. Condition (C1) has only partial theoretical support, although its empirical support is strong (Proposition 3, $p_{\mathrm{perm}} = 0.003$). Copula misspecification degrades performance gracefully rather than leaving it unaffected. DQN offers marginal gains at roughly 8× the training cost. Algorithm 1 provides a modular pipeline with graceful degradation to ARIMA. Key open problems are rigorous approximation bounds for LSTMs on jump-diffusion paths, fully parametric dynamic copulas [42], and semiparametric marginals to strengthen copula identifiability.
The cross-sectional dimension of eleven cities remains modest. The bootstrap analysis (Table 6) and cross-province transfer experiment (Table 7) quantify these uncertainties. Nevertheless, extending the framework to a national-scale panel would strengthen external validity and allow finer regional stratification.

6. Conclusions

This paper develops a unified BSDE framework for multi-peril agricultural insurance pricing. Three progressive results adapt known BSDE theory to compound-dependence and policy-control settings: existence and uniqueness (Theorem 2), comparison and monotonicity (Theorem 3, Corollary 1), and half-order Euler convergence (Theorem 4), with copula, LSTM, and Q-learning as sequential components of a single discretisation. Developed cities receive green discounts while less-developed cities see negligible reductions, supporting targeted green-finance policies [15,32,40]. Extensions include fully parametric dynamic copulas [42], national-scale panels via deep BSDE solvers [25], and higher-dimensional peril structures ( d > 3 ), where the theory extends without modification but tabular Q-learning requires replacement by DQN or deep BSDE methods to avoid the curse of dimensionality.

Author Contributions

Conceptualization, Y.P., J.Z. and Y.C.; methodology, Y.P., J.Z. and J.L.; software, Y.P., Q.C. and Z.L.; validation, Y.P., J.Z., J.L. and Y.Z.; formal analysis, Y.P., J.Z. and Q.T.; investigation, Y.P., Y.C., Q.C. and X.L.; resources, J.Z., J.L. and Q.T.; data curation, Y.P., Q.C., Z.L. and X.L.; writing—original draft, Y.P., Y.C. and Z.L.; writing—review and editing, J.Z., J.L., Y.Z. and Q.T.; visualization, Y.P., Q.C., Z.L. and X.L.; supervision, J.Z. and J.L.; project administration, J.Z., Y.C. and Q.T.; funding acquisition, J.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Zhejiang Provincial Xinmiao Talent Program (Grant Nos. 2025R405A030, 2025R405A039) and supported by the National College Students’ Innovation and Entrepreneurship Training Program (Project: Research on Agricultural Insurance Pricing Mechanism under Green Finance Policy Based on Multi-hazard Joint Modeling and Distributed Intelligent Actuarial System). This research was also supported by the Project under Grant No. 25TJJD12. The corresponding author’s research was supported by the Zhejiang Provincial Natural Science Foundation of China under Grant Nos. LQ23A010012 and LZ26A010003, and the Ningbo Natural Science Foundation under Grant No. 2024J193.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Goodwin, B.K.; Hungerford, A. Copula-based models of systemic risk in U.S. agriculture: Implications for crop insurance and reinsurance contracts. Am. J. Agric. Econ. 2015, 97, 879–896.
  2. Liu, Y.; Ker, A.P. Simultaneous borrowing of information across space and time for pricing insurance contracts: An application to rating crop insurance policies. J. Risk Insur. 2021, 88, 231–257.
  3. Peng, S. Nonlinear expectations, nonlinear evaluations and risk measures. In Stochastic Methods in Finance; Springer: Berlin/Heidelberg, Germany, 2004; pp. 165–253.
  4. Pardoux, É.; Peng, S. Adapted solution of a backward stochastic differential equation. Syst. Control Lett. 1990, 14, 55–61.
  5. Liang, Z.; Yuen, K.C. Optimal dynamic reinsurance with dependent risks: Variance premium principle. Scand. Actuar. J. 2016, 2016, 18–36.
  6. Cohen, S.N.; Elliott, R.J. A general theory of finite state backward stochastic difference equations. Stoch. Process. Appl. 2010, 120, 442–466.
  7. Zscheischler, J.; Westra, S.; van den Hurk, B.J.J.M.; Seneviratne, S.I.; Ward, P.J.; Pitman, A.; AghaKouchak, A.; Bresch, D.N.; Leonard, M.; Wahl, T.; et al. Future climate risk from compound events. Nat. Clim. Change 2018, 8, 469–477.
  8. Carter, M.R.; Cheng, L.; Sarris, A. Where and how index insurance can boost the adoption of improved agricultural technologies. J. Dev. Econ. 2016, 118, 59–71.
  9. Nelsen, R.B. An Introduction to Copulas, 2nd ed.; Springer: New York, NY, USA, 2006.
  10. Azahra, A.S.; Johansyah, M.D.; Sukono. The development of fractional Black–Scholes model solution using the Daftardar-Gejji Laplace method for determining rainfall index-based agricultural insurance premiums. Mathematics 2025, 13, 1725.
  11. Necula, C. Option Pricing in a Fractional Brownian Motion Environment; Bucharest University of Economics, Center for Advanced Research in Finance and Banking: Bucharest, Romania, 2002; pp. 1–18.
  12. He, X.J.; Lin, S. A fractional Black–Scholes model with stochastic volatility. Expert Syst. Appl. 2021, 178, 114983.
  13. Zhang, D.; Zhang, Z.; Managi, S. A bibliometric analysis on green finance: Current status, development, and future directions. Financ. Res. Lett. 2019, 29, 425–430.
  14. Lee, C.-C.; Lee, C.-C. How does green finance affect green total factor productivity? Evidence from China. Energy Econ. 2022, 107, 105863.
  15. Zhou, X.; Tang, X.; Zhang, R. Impact of green finance on economic development and environmental quality: A study based on provincial panel data from China. Environ. Sci. Pollut. Res. 2020, 27, 19915–19932.
  16. Miao, C.; Fang, D.; Sun, L.; Luo, Q. Natural resources utilization efficiency under the influence of green technological innovation. Resour. Conserv. Recycl. 2017, 126, 153–161.
  17. Hazell, P.; Anderson, J.; Balzer, N.; Hastrup Clemmensen, A.; Hess, U.; Rispoli, F. The Potential for Scale and Sustainability in Weather Index Insurance for Agriculture and Rural Livelihoods; IFAD/WFP: Rome, Italy, 2010.
  18. Goodwin, B.K.; Smith, V.H. What harm is done by subsidizing crop insurance? Am. J. Agric. Econ. 2013, 95, 489–497.
  19. Mahul, O.; Stutley, C.J. Government Support to Agricultural Insurance; World Bank: Washington, DC, USA, 2010.
  20. Liu, S.; Jiang, R.; Liu, L.; Chan, F.T.S. A three-way decision and DEA game cross-efficiency hybrid approach to the procurement mode selection in contract farming. Socio-Econ. Plan. Sci. 2025, 102, 102336.
  21. Xie, X.; Hu, Y.; Li, X.; Li, S.; Li, X.; Li, Y. Assessing and improving the food security of China in the climate change. Systems 2025, 13, 1054.
  22. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction, 2nd ed.; MIT Press: Cambridge, MA, USA, 2018.
  23. Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533.
  24. Pham, H. Continuous-Time Stochastic Control and Optimization with Financial Applications; Springer: Berlin/Heidelberg, Germany, 2009.
  25. Han, J.; Jentzen, A.; E, W. Solving high-dimensional partial differential equations using deep learning. Proc. Natl. Acad. Sci. USA 2018, 115, 8505–8510.
  26. Bourgeon, J.-M.; Chambers, R.G. Optimal area-yield crop insurance reconsidered. Am. J. Agric. Econ. 2003, 85, 590–604.
  27. Aase, K.K. Equilibrium in a reinsurance syndicate; existence, uniqueness and characterization. ASTIN Bull. 1993, 23, 185–211.
  28. Rothschild, M.; Stiglitz, J.E. Equilibrium in competitive insurance markets. Q. J. Econ. 1976, 90, 629–649.
  29. El Karoui, N.; Peng, S.; Quenez, M.C. Backward stochastic differential equations in finance. Math. Financ. 1997, 7, 1–71.
  30. Barles, G.; Buckdahn, R.; Pardoux, É. Backward stochastic differential equations and integral-partial differential equations. Stoch. Stoch. Rep. 1997, 60, 57–83.
  31. Joe, H. Dependence Modeling with Copulas; CRC Press: Boca Raton, FL, USA, 2014.
  32. Wang, Y.; Zhi, Q. The role of green finance in environmental protection: Two aspects of market mechanism and policies. Energy Procedia 2016, 104, 311–316.
  33. Applebaum, D. Lévy Processes and Stochastic Calculus, 2nd ed.; Cambridge University Press: Cambridge, UK, 2009.
  34. Tang, S.; Li, X. Necessary conditions for optimal control of stochastic systems with random jumps. SIAM J. Control Optim. 1994, 32, 1447–1475.
  35. Royer, M. Backward stochastic differential equations with jumps and related non-linear expectations. Stoch. Process. Appl. 2006, 116, 1358–1376.
  36. Zhejiang Bureau of Statistics. Zhejiang Statistical Yearbook 2023; China Statistics Press: Beijing, China, 2023.
  37. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
  38. Even-Dar, E.; Mansour, Y. Learning rates for Q-learning. J. Mach. Learn. Res. 2003, 5, 1–25.
  39. Zhang, J. A numerical scheme for BSDEs. Ann. Appl. Probab. 2004, 14, 459–488.
  40. Flammer, C. Corporate green bonds. J. Financ. Econ. 2021, 142, 499–516.
  41. Carmona, R.; Delarue, F. Probabilistic Theory of Mean Field Games with Applications; Springer: Cham, Switzerland, 2018; Volumes I–II.
  42. Patton, A.J. Modelling asymmetric exchange rate dependence. Int. Econ. Rev. 2006, 47, 527–556.
Figure 1. Schematic of the modelling pipeline. Arrows indicate information flow; dashed arrows indicate feedback. Each module corresponds to one ingredient of the Euler discretisation (Section 3).
Figure 2. Monte Carlo simulated loss paths for the typhoon peril ($j = 1$) from the forward SDE (1). The diffusion component captures seasonal volatility while the jump component models sudden catastrophic events. The shaded region represents the 90% confidence band from $N_{\mathrm{sim}} = 10{,}000$ simulations.
Figure 3. Monte Carlo simulated loss paths for the flood peril ($j = 2$). Flood losses exhibit the highest correlation with typhoon losses ($\hat{\tau} = 0.428$, Section 4.4.1), visible as co-movement in the jump arrival times. The shaded region represents the 90% confidence band from $N_{\mathrm{sim}} = 10{,}000$ simulations.
Figure 4. Monte Carlo simulated loss paths for the drought peril ($j = 3$). Drought losses are characterised by smaller but more persistent deviations, with negative correlation to typhoon ($\hat{\tau} = -0.142$), reflecting the climatic opposition between excessive rainfall and water deficit.
Figure 5. Distributional characteristics of the three peril loss ratios ($N \times T = 110$). (a) Box plots with jittered individual observations; diamonds indicate means and horizontal bars indicate medians. (b) Kernel density estimates with dashed lines marking means and dotted lines marking medians.
Figure 6. Pairwise dependence structure among the three perils.
Figure 7. Q-learning training reward over 500 episodes.
Figure 8. Regional heterogeneity in green-finance effects on the premium. Point estimates and 95% confidence intervals of the regression coefficients for green finance development ($Q$), green technology adoption ($Z_{\mathrm{gr}}$), and carbon emissions ($I$) across three regional groups (Developed, Intermediate, Less Developed). Negative coefficients (green region) indicate premium reduction; positive coefficients (red region) indicate premium increase. Significance levels: *** p < 0.01, ** p < 0.05, * p < 0.10.
Table 1. Comparison with representative existing studies.
Study | BSDE | Nonlinear Exp. | Tail Dep. | Dynamic | Forecast | Control | Green Fin.
El Karoui et al. [29]
Goodwin & Hungerford [1]
Azahra et al. [10]
Miao et al. [16]
Liu & Ker [2]
This paper
Note: ✓ indicates the feature is present in the study.
Table 2. Comparison across RL algorithms.
Algorithm | $|S|$ | Var. Red. (%) | Mean $L$ | $|L - L^{*}|$ | Train (s) | Conv. Guar.
Tabular QL | 200 | 43.5 | 0.678 | 0.031 | 12 | Yes
Tabular QL | 500 | 43.9 | 0.676 | 0.030 | 28 | Yes
DQN | cont. | 44.3 | 0.673 | 0.028 | 97 | No
REINFORCE | cont. | 43.1 | 0.681 | 0.033 | 45 | No
A2C | cont. | 44.0 | 0.675 | 0.029 | 68 | No
Table 3. Final dynamic premium rates vs. static and filed rates (2023 policy year, %).
City | $R_f^{(\mathrm{stat})}$ | $R_f^{(\mathrm{dyn})}$ | $R_f^{(\mathrm{filed})}$ | $\Delta^{(\mathrm{dyn}-\mathrm{stat})}$ | Loss Ratio
Hangzhou | 2.78 | 3.18 | 3.05 | +0.40 | 0.672
Ningbo | 3.21 | 3.52 | 3.38 | +0.31 | 0.685
Jiaxing | 2.43 | 2.58 | 2.51 | +0.15 | 0.661
Shaoxing | 2.51 | 2.72 | 2.63 | +0.21 | 0.668
Huzhou | 2.14 | 2.05 | 2.18 | −0.09 | 0.643
Wenzhou | 4.05 | 4.73 | 4.42 | +0.68 | 0.701
Jinhua | 2.32 | 2.56 | 2.45 | +0.24 | 0.665
Taizhou | 5.02 | 5.91 | 5.53 | +0.89 | 0.712
Zhoushan | 3.45 | 3.78 | 3.61 | +0.33 | 0.689
Quzhou | 1.86 | 1.89 | 1.92 | +0.03 | 0.648
Lishui | 1.62 | 1.54 | 1.68 | −0.08 | 0.639
Mean | 2.85 | 3.13 | 3.03 | +0.28 | 0.678
Table 4. Variable definitions.
Type | Variable | Symbol | Unit | Source
Dependent | Premium/Final Rate | $P$, $R_f$ | yuan/% | Model
Explanatory | Cultivated Area | $A$ | mu | Agri. Dept.
Explanatory | Market Price | $M_p$ | yuan/ton | NBS
Explanatory | Agricultural Productivity | $Y_p$ | % | Yearbook
Explanatory | Carbon Emissions | $I$ | ton/mu | Govt. reports
Risk | Typhoon/Flood/Drought loss ratios | $L_{\mathrm{typ}}$, $L_{\mathrm{fld}}$, $L_{\mathrm{drt}}$ | % | Insurance
Green Fin. | Green Finance Development | $Q$ | bn yuan | Fin. reports
Green Fin. | Green Technology Adoption | $Z_{\mathrm{gr}}$ | % | Surveys
Table 5. Descriptive statistics ($N \times T = 110$).
Var. | Mean | Std | Min | Med | Max | Skew
$L_{\mathrm{typ}}$ | 19.1 | 13.4 | 0.8 | 15.8 | 62.3 | 1.28
$L_{\mathrm{fld}}$ | 16.2 | 10.5 | 1.6 | 13.9 | 48.7 | 1.02
$L_{\mathrm{drt}}$ | 8.7 | 6.1 | 0.5 | 7.4 | 27.3 | 1.12
$I$ | 0.45 | 0.17 | 0.15 | 0.42 | 0.82 | 0.52
$Z_{\mathrm{gr}}$ | 21.8 | 12.5 | 3.2 | 20.4 | 48.7 | 0.45
$Q$ | 11.3 | 8.9 | 0.8 | 8.7 | 35.6 | 0.87
Table 6. Bootstrap confidence intervals for key metrics.

| Metric | Point Estimate | 95% Bootstrap CI | Bootstrap SE |
| --- | --- | --- | --- |
| Variance reduction (%) | 43.5 | [38.2, 48.7] | 2.68 |
| Mean loss ratio | 0.678 | [0.651, 0.706] | 0.014 |
| \|L L\| | 0.031 | [0.022, 0.041] | 0.005 |
| Copula $\hat{\nu}$ | 5.41 | [3.62, 8.37] | 1.21 |
| $\hat{\rho}_{\text{Typ-Fld}}$ | 0.612 | [0.483, 0.728] | 0.063 |

Table 7. Cross-province transfer to Jiangxi Province.

| Setting | Var. Red. (%) | Mean L | $R^2$ (LSTM) |
| --- | --- | --- | --- |
| (i) Direct transfer | 28.4 | 0.731 | 0.512 |
| (ii) Copula re-estimated | 34.7 | 0.708 | 0.512 |
| (iii) Full re-estimation | 39.2 | 0.694 | 0.681 |
| Zhejiang (reference) | 43.5 | 0.678 | 0.774 |

Table 8. Marginal distribution fitting.

| Peril | Distribution | Parameters | K–S | p | AIC |
| --- | --- | --- | --- | --- | --- |
| Typhoon | Log-normal | $\mu = 1.78$, $\sigma = 0.96$ | 0.074 | 0.641 | 68.5 |
| Flood | Gamma | $\alpha = 2.21$, $\beta = 0.069$ | 0.089 | 0.487 | 62.1 |
| Drought | Weibull | $k = 1.42$, $\lambda = 0.118$ | 0.081 | 0.553 | 57.3 |

Table 9. Pairwise dependence ($\hat{\nu} = 5.41$, 95% CI $[3.62, 8.37]$).

| Peril Pair | $\hat{\tau}$ | $\hat{\rho}$ | $\hat{\lambda}$ | p |
| --- | --- | --- | --- | --- |
| Typhoon–Flood | 0.428 | 0.612 | 0.358 | 0.001 |
| Typhoon–Drought | −0.142 | −0.221 | 0.052 | 0.108 |
| Flood–Drought | 0.301 | 0.452 | 0.197 | 0.012 |

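For elliptical copulas, including the Student-t copula, Kendall's tau and the correlation parameter are linked by $\tau = (2/\pi)\arcsin\rho$. The sketch below applies this identity to the tabulated $\hat{\rho}$ values and compares the result with the tabulated $\hat{\tau}$. Whether the authors estimated $\hat{\rho}$ by inverting $\hat{\tau}$ or by another method is not stated here, so the comparison is only a plausibility check; the function name `tau_from_rho` is illustrative.

```python
import math

# Estimates copied from Table 9.
rho_hat = {"Typhoon-Flood": 0.612, "Typhoon-Drought": -0.221, "Flood-Drought": 0.452}
tau_hat = {"Typhoon-Flood": 0.428, "Typhoon-Drought": -0.142, "Flood-Drought": 0.301}

def tau_from_rho(rho: float) -> float:
    """Kendall's tau implied by an elliptical-copula correlation parameter."""
    return 2.0 / math.pi * math.asin(rho)

# Implied tau and its gap to the separately reported estimate.
implied = {pair: tau_from_rho(r) for pair, r in rho_hat.items()}
gaps = {pair: abs(implied[pair] - tau_hat[pair]) for pair in rho_hat}

for pair in rho_hat:
    print(f"{pair}: implied tau = {implied[pair]:.3f}, reported tau = {tau_hat[pair]:.3f}")
```

The implied and reported values agree to within about 0.01 for all three pairs, which is consistent with the two columns describing the same dependence structure.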
Table 10. Copula model comparison.

| Copula | Log-Lik | AIC | $S_n$ (p) | Tail |
| --- | --- | --- | --- | --- |
| t ($\hat{\nu} = 5.41$) | 21.83 | −37.66 | 0.027 (0.523) | Symmetric |
| Gaussian | 16.45 | −28.90 | 0.052 (0.148) | None |
| Clayton | 12.78 | −21.56 | 0.081 (0.028) | Lower |
| Frank | 10.34 | −16.68 | 0.098 (0.014) | None |

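The AIC column of Table 10 can be related to the log-likelihood column through $\mathrm{AIC} = 2k - 2\ell$. In the sketch below the parameter counts `k_assumed` are not taken from the paper: they are inferred so that the magnitudes match the tabulated AIC values, and should be read as an assumption. With positive log-likelihoods of this size, the formula yields negative AIC values, and the smallest one identifies the preferred model.

```python
# Log-likelihoods copied from Table 10.
loglik = {"t": 21.83, "Gaussian": 16.45, "Clayton": 12.78, "Frank": 10.34}

# ASSUMPTION: parameter counts inferred from the table, not stated in the source.
k_assumed = {"t": 3, "Gaussian": 2, "Clayton": 2, "Frank": 2}

# Akaike information criterion: AIC = 2k - 2 * log-likelihood (lower is better).
aic = {name: 2 * k_assumed[name] - 2 * ll for name, ll in loglik.items()}

best = min(aic, key=aic.get)
for name, value in sorted(aic.items(), key=lambda kv: kv[1]):
    print(f"{name}: AIC = {value:.2f}")
print("preferred copula:", best)
```

Under these assumed counts the Student-t copula has the lowest AIC, in line with its selection in the paper.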
Table 11. Loss-ratio prediction performance (test set: 2023).

| Model | RMSE | MAE | $R^2$ | MAPE% | Rank |
| --- | --- | --- | --- | --- | --- |
| Static Mean | 0.1187 | 0.0952 | 0.438 | 17.82 | 6 |
| ARIMA (2, 1, 1) | 0.1048 | 0.0821 | 0.561 | 14.93 | 5 |
| ETS | 0.1012 | 0.0793 | 0.592 | 14.18 | 4 |
| SVR (RBF) | 0.0938 | 0.0724 | 0.649 | 12.67 | 3 |
| GRU | 0.0905 | 0.0691 | 0.673 | 12.01 | 2 |
| LSTM | 0.0798 | 0.0608 | 0.746 | 10.52 | 1 |
| LSTM + aug | 0.0753 | 0.0574 | 0.774 | 9.87 | |

Note: Bold values indicate the best-performing model(s).
Table 12. Green adjustment coefficients.
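
| City | $G^{(\mathrm{stat})}$ | $G^{(\mathrm{QL})}$ | $\Delta G$ | Region |
| --- | --- | --- | --- | --- |
| Hangzhou | 0.97 | 0.92 | −0.05 | Developed |
| Ningbo | 0.96 | 0.91 | −0.05 | Developed |
| Jiaxing | 0.98 | 0.95 | −0.03 | Developed |
| Shaoxing | 0.99 | 0.95 | −0.04 | Developed |
| Huzhou | 0.97 | 0.93 | −0.04 | Developed |
| Wenzhou | 1.02 | 1.01 | −0.01 | Interm. |
| Jinhua | 1.03 | 1.02 | −0.01 | Interm. |
| Taizhou | 1.05 | 1.04 | −0.01 | Interm. |
| Zhoushan | 1.01 | 0.98 | −0.03 | Interm. |
| Quzhou | 1.07 | 1.06 | −0.01 | Less Dev. |
| Lishui | 1.09 | 1.08 | −0.01 | Less Dev. |
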
Table 13. Verification of comparison theorem.

| City | $R_f(\Sigma)$ (%) | $R_f(I_d)$ (%) | Ratio |
| --- | --- | --- | --- |
| Hangzhou | 3.18 | 2.72 | 1.17 |
| Ningbo | 3.52 | 3.01 | 1.17 |
| Taizhou | 5.91 | 4.97 | 1.19 |
| Wenzhou | 4.73 | 4.02 | 1.18 |
| Lishui | 1.54 | 1.40 | 1.10 |
| Zhoushan | 3.78 | 3.23 | 1.17 |

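The comparison theorem predicts that the premium under the estimated correlation matrix $\Sigma$ is at least as large as under the identity matrix $I_d$ (independent perils). The following check recomputes the ratio column of Table 13 and confirms the ordering for the six listed cities; the variable names are illustrative.

```python
# Premium rates (%) copied from Table 13: city -> (with estimated Sigma, with identity matrix).
premiums = {
    "Hangzhou": (3.18, 2.72), "Ningbo": (3.52, 3.01), "Taizhou": (5.91, 4.97),
    "Wenzhou": (4.73, 4.02), "Lishui": (1.54, 1.40), "Zhoushan": (3.78, 3.23),
}

# Ratio of the dependent-peril premium to the independent-peril premium.
ratios = {city: round(dep / indep, 2) for city, (dep, indep) in premiums.items()}

# The ordering holds if every dependent-peril premium exceeds its independent counterpart.
ordering_holds = all(dep > indep for dep, indep in premiums.values())

print(ratios)
print("premium higher under Sigma for every city:", ordering_holds)
```

In these figures, accounting for dependence raises the premium by roughly 10 to 19 percent relative to the independence benchmark.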
Table 14. Empirical convergence rate.

| $\Delta t$ (Years) | n | RMSE($Y$) | Ratio |
| --- | --- | --- | --- |
| 1.000 | 10 | 0.0398 | |
| 0.500 | 20 | 0.0276 | 1.44 |
| 0.250 | 40 | 0.0192 | 1.44 |
| 0.125 | 80 | 0.0138 | 1.39 |

Regression: $\log(\mathrm{RMSE}) = 0.49 \log(\Delta t) + c$, $R^2 = 0.998$

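The empirical order of convergence is the slope of a log-log regression of RMSE on the time step. The sketch below fits that slope by ordinary least squares on the four tabulated points and recomputes the successive error ratios, which should sit near $\sqrt{2} \approx 1.41$ when the order is one half. Because the tabulated RMSE values are rounded, the fitted slope need not coincide exactly with the reported coefficient; the helper `ols_slope` is an illustrative name.

```python
import math

# Time steps (years) and RMSE values copied from Table 14.
dt = [1.000, 0.500, 0.250, 0.125]
rmse = [0.0398, 0.0276, 0.0192, 0.0138]

def ols_slope(x, y):
    """Slope of the least-squares line y = a * x + c."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

# Regress log(RMSE) on log(dt); the slope estimates the convergence order.
slope = ols_slope([math.log(v) for v in dt], [math.log(v) for v in rmse])

# Error reduction each time the step is halved.
ratios = [rmse[i] / rmse[i + 1] for i in range(len(rmse) - 1)]

print(f"fitted order: {slope:.2f}")
print("successive RMSE ratios:", [round(r, 2) for r in ratios])
```

A slope close to 0.5 on these values is consistent with the half-order rate stated for the Euler discretisation.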
Table 15. Verification of Condition (C1): the ratio $\hat{C}_L = \mathrm{MSE}/\Delta t$ is stable across four time-step sizes, supporting the linear scaling $\mathrm{MSE} = O(\Delta t)$.

| $\Delta t$ (yr) | Empirical MSE | $\hat{C}_L$ | 95% Bootstrap CI | $\hat{C}_L$ Rel. Dev. (%) |
| --- | --- | --- | --- | --- |
| 1.000 | 0.01189 | 0.01189 | [0.0091, 0.0158] | +2.8 |
| 0.500 | 0.00567 | 0.01134 | [0.0087, 0.0148] | −1.9 |
| 0.250 | 0.00298 | 0.01192 | [0.0092, 0.0155] | +3.1 |
| 0.125 | 0.00142 | 0.01136 | [0.0085, 0.0152] | −1.7 |

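Condition (C1) is checked by dividing the empirical MSE by the time step and verifying that the quotient stays roughly constant. The snippet below recomputes that quotient from the tabulated MSE values and reports its relative spread around the plain average. The reference level used for the relative-deviation column of Table 15 is not given in this section, so the spread measure here is an illustrative substitute rather than a reproduction of that column.

```python
# Time steps (years) and empirical MSE values copied from Table 15.
dt = [1.000, 0.500, 0.250, 0.125]
mse = [0.01189, 0.00567, 0.00298, 0.00142]

# Estimated constant C_L = MSE / dt for each step size.
c_hat = [m / d for m, d in zip(mse, dt)]

# Relative spread of the estimates around their plain average (illustrative measure).
c_mean = sum(c_hat) / len(c_hat)
spread = (max(c_hat) - min(c_hat)) / c_mean

print([round(c, 5) for c in c_hat])
print(f"relative spread: {100 * spread:.1f}%")
```

A spread of a few percent across an eightfold change in step size is what linear scaling of the MSE in $\Delta t$ would produce.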
Table 16. Ablation study.

| Config. | Prem. Var. | Mean L | Var. Red. | \|L L\| | Wilcoxon p |
| --- | --- | --- | --- | --- | --- |
| (i) Static | 5.62 × 10 7 | 0.721 | | 0.085 | |
| (ii) +Copula | 4.91 × 10 7 | 0.708 | 12.6% | 0.072 | 0.018 |
| (iii) +Cop.+LSTM | 3.58 × 10 7 | 0.691 | 36.3% | 0.048 | <0.001 |
| (iv) Full | 3.17 × 10 7 | 0.678 | 43.5% | 0.031 | <0.001 |

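The variance-reduction column of the ablation study is the relative drop in premium variance against the static configuration. Since all four variances share the same power of ten, the sketch below works with the leading factors only. Small differences from the tabulated percentages can arise because those leading factors are themselves rounded.

```python
# Leading factors of the premium variances copied from Table 16
# (the common power of ten cancels in the ratio).
variance = {
    "(i) Static": 5.62,
    "(ii) +Copula": 4.91,
    "(iii) +Cop.+LSTM": 3.58,
    "(iv) Full": 3.17,
}

baseline = variance["(i) Static"]

# Percentage reduction in premium variance relative to the static configuration.
reduction = {name: 100.0 * (1.0 - v / baseline) for name, v in variance.items()}

# Incremental gain contributed by each added component, in percentage points.
names = list(variance)
increments = {names[i]: reduction[names[i]] - reduction[names[i - 1]]
              for i in range(1, len(names))}

for name in names[1:]:
    print(f"{name}: {reduction[name]:.1f}% total, +{increments[name]:.1f} points")
```

On these figures the LSTM forecasting step accounts for the largest incremental reduction, followed by the copula and then the Q-learning adjustment.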
Table 17. Sensitivity and misspecification robustness.

| Analysis | Configuration | Var. Red. (%) | Note |
| --- | --- | --- | --- |
| Sensitivity | Baseline ($\gamma = 0.95$, $h = 64$, $\nu = 5.41$) | 43.5 | |
| | Worst ($\gamma = 0.90$, $h = 32$, $\nu = 7.41$) | 35.6 | |
| | Best ($\gamma = 0.99$, $h = 128$, $\nu = 3.41$) | 47.0 | |
| Misspecification | t (correct) | 43.5 | Mono. Yes (11/11) |
| | Gaussian ($\delta_\Sigma = 0.048$) | 41.8 | Yes (11/11) |
| | Clayton ($\delta_\Sigma = 0.127$) | 38.2 | Yes (10/11) |
| | Frank ($\delta_\Sigma = 0.083$) | 40.1 | Yes (11/11) |
| | Mixture ($\delta_\Sigma = 0.061$) | 42.3 | Yes (11/11) |

Table 18. Static vs. time-varying copula comparison.

| Window $w_c$ (Years) | $\bar{\nu}_t$ | $\bar{\rho}_{\text{Typ-Fld}}$ | Var. Red. (%) | Mean L |
| --- | --- | --- | --- | --- |
| 10 (static) | 5.41 | 0.612 | 43.5 | 0.678 |
| 7 | 5.23 ± 0.41 | 0.624 ± 0.038 | 44.8 | 0.674 |
| 5 | 5.07 ± 0.68 | 0.641 ± 0.057 | 46.2 | 0.669 |
| 3 | 4.89 ± 1.12 | 0.658 ± 0.093 | 45.1 | 0.672 |

Table 19. Leave-one-city-out robustness.

| City Held Out | Var. Red. (%) | Mean L | \|L L\| |
| --- | --- | --- | --- |
| Hangzhou | 41.3 | 0.686 | 0.035 |
| Ningbo | 42.5 | 0.682 | 0.033 |
| Taizhou | 39.8 | 0.697 | 0.039 |
| Wenzhou | 40.7 | 0.691 | 0.037 |
| Jinhua | 43.1 | 0.676 | 0.031 |
| Shaoxing | 42.8 | 0.678 | 0.031 |
| Jiaxing | 43.3 | 0.677 | 0.030 |
| Huzhou | 43.5 | 0.675 | 0.030 |
| Lishui | 44.0 | 0.672 | 0.029 |
| Quzhou | 43.7 | 0.674 | 0.029 |
| Zhoushan | 41.8 | 0.684 | 0.034 |
| Mean ± Std | 42.4 ± 1.3 | 0.681 ± 0.008 | 0.033 ± 0.003 |

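The summary row of the leave-one-city-out analysis can be recomputed from the eleven per-city entries. The sketch below does so for the variance-reduction column using the standard library; it uses the sample standard deviation, which is an assumption, since the section does not say which estimator the authors used.

```python
import statistics

# Variance reduction (%) when each city is held out, copied from Table 19.
var_red = {
    "Hangzhou": 41.3, "Ningbo": 42.5, "Taizhou": 39.8, "Wenzhou": 40.7,
    "Jinhua": 43.1, "Shaoxing": 42.8, "Jiaxing": 43.3, "Huzhou": 43.5,
    "Lishui": 44.0, "Quzhou": 43.7, "Zhoushan": 41.8,
}

values = list(var_red.values())
mean_red = statistics.mean(values)
std_red = statistics.stdev(values)  # ASSUMPTION: sample (n - 1) standard deviation

# City whose removal lowers the variance reduction the most.
most_influential = min(var_red, key=var_red.get)

print(f"{mean_red:.1f} ± {std_red:.1f}")
print("largest drop when holding out:", most_influential)
```

Holding out Taizhou, the city with the highest premium rates in Table 3, produces the largest drop in variance reduction.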
Table 20. Regression (dependent: $\ln R_f$; $N \times T = 110$).

| Variable | Coeff | SE | t | p | VIF |
| --- | --- | --- | --- | --- | --- |
| $Q$ | −0.17 | 0.04 | −4.25 | 0.000 | 1.78 |
| $Z_{\mathrm{gr}}$ | −0.21 | 0.04 | −5.25 | 0.000 | 1.63 |
| $Y_p$ | −0.10 | 0.03 | −3.33 | 0.001 | 1.31 |
| $I$ | 0.13 | 0.04 | 3.25 | 0.002 | 1.85 |
| $A$ | −0.14 | 0.03 | −4.67 | 0.000 | 1.42 |
| $D$ | −0.07 | 0.02 | −3.50 | 0.001 | 1.11 |

$R^2 = 0.703$, Adj. $R^2 = 0.686$; $F = 40.5$, $p < 0.001$; DW = 1.91

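Two entries of the regression table can be cross-checked by hand: each t statistic is the coefficient divided by its standard error, and the adjusted $R^2$ follows from $R^2$, the sample size, and the number of regressors. The sketch below assumes six regressors plus an intercept and $n = 110$ observations; the intercept is an assumption, as it is not listed in the table.

```python
# Coefficients and standard errors copied from Table 20: variable -> (coeff, se).
estimates = {
    "Q": (-0.17, 0.04), "Z_gr": (-0.21, 0.04), "Y_p": (-0.10, 0.03),
    "I": (0.13, 0.04), "A": (-0.14, 0.03), "D": (-0.07, 0.02),
}

# t statistic = coefficient / standard error.
t_stats = {name: coef / se for name, (coef, se) in estimates.items()}

# ASSUMPTION: n = 110 observations, k = 6 regressors, intercept included.
n, k, r2 = 110, len(estimates), 0.703

# Adjusted R^2 penalises R^2 for the number of regressors.
adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

for name, t in t_stats.items():
    print(f"{name}: t = {t:.2f}")
print(f"adjusted R^2 = {adj_r2:.3f}")
```

Both quantities agree with the tabulated values to the precision shown.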
Table 21. Regional heterogeneity (Chow test: $F = 5.13$, $p = 0.002$).

| Var | Developed $\beta$ | Developed p | Intermediate $\beta$ | Intermediate p | Less Dev. $\beta$ | Less Dev. p |
| --- | --- | --- | --- | --- | --- | --- |
| $Q$ | −0.24 *** | 0.000 | −0.11 ** | 0.038 | 0.07 | 0.112 |
| $Z_{\mathrm{gr}}$ | −0.29 *** | 0.000 | −0.14 *** | 0.002 | −0.09 * | 0.058 |
| $I$ | 0.07 ** | 0.041 | 0.14 *** | 0.001 | 0.17 *** | 0.000 |

*** p < 0.01, ** p < 0.05, * p < 0.10.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Pei, Y.; Zhao, J.; Chen, Y.; Li, J.; Chen, Q.; Liu, Z.; Li, X.; Zhai, Y.; Tang, Q. Dynamic Pricing of Multi-Peril Agricultural Insurance via Backward Stochastic Differential Equations with Copula Dependence and Reinforcement Learning. Mathematics 2026, 14, 1043. https://doi.org/10.3390/math14061043
