Reinforcement Learning for Stand Structure Optimization of Pinus yunnanensis Secondary Forests in Southwest China

Xuan, Shuai; Wang, Jianming; Chen, Yuling

doi:10.3390/f14122456


by Shuai Xuan ¹, Jianming Wang ¹,* and Yuling Chen ²

¹ School of Mathematics and Computer Science, Dali University, Dali 671003, China

² Institute of Remote Sensing and Geographic Information System, School of Earth and Space Sciences, Peking University, Beijing 100871, China

* Author to whom correspondence should be addressed.

Forests 2023, 14(12), 2456; https://doi.org/10.3390/f14122456

Submission received: 30 November 2023 / Revised: 14 December 2023 / Accepted: 15 December 2023 / Published: 17 December 2023

(This article belongs to the Special Issue Artificial Intelligence and Machine Learning Applications in Forestry)


Abstract:

To enhance the efficiency and precision of multi-objective optimization in secondary Pinus yunnanensis forests in southwestern China, this study integrated spatial and non-spatial structural indicators to establish objective functions and constraints for assessing forest structure. Candidate trees for felling were determined using the random selection method (RSM), Q-value method (QVM), and V-map method (VMM). The selection of trees to optimize the forest stand structure (FSS) was treated as a set of decisions by a reinforcement learning (RL) agent. Leveraging RL’s trial-and-error strategy, we continually refined the agent’s decision-making process and applied it to multi-objective optimization. Simulated felling experiments conducted in four circular sample plots (P1–P4) compared RL, Monte Carlo (MC), and particle swarm optimization (PSO) in FSS optimization. The values of the objective function (VOFs) improved in all plots. RL-based strategies achieved VOF increases of 17.24%, 44.92%, 34.66%, and 17.10% for P1–P4, respectively, outperforming MC-based (10.73%, 41.54%, 30.39%, and 15.07%, respectively) and PSO-based (14.08%, 37.78%, 26.17%, and 16.23%, respectively) approaches. The hybrid M7 scheme, integrating RL with the RSM, consistently outperformed the other schemes across all plots, yielding an average 26.81% increase in VOF compared with the average enhancement of all schemes (17.42%). These results indicate that RL offers superior optimization performance for Pinus yunnanensis secondary forests, particularly when combined with the RSM, and highlight its potential for optimizing sustainable forest management strategies.

Keywords:

reinforcement learning methods; Voronoi diagram; forest stand structure; multi-objective optimization

1. Introduction

Pinus yunnanensis holds a dominant position among the conifers of southwestern China and is the primary forest type in Yunnan Province. Widely planted for reforestation and ecological engineering, the species has attracted significant attention [1]. Across southwestern China, it covers approximately 52% of the forested area and contributes 32% of the overall timber yield [2]. As a pioneer tree species, Pinus yunnanensis thrives in rocky, nutrient-poor soils. Its tolerance of shade and its resilience to drought, attributed to its deep root system, are further distinctive attributes [3]. It is also important for regional economic development and ecological restoration [2,3,4]. Understanding the dynamics of Pinus yunnanensis is pivotal to restoring secondary forests and, in turn, contributes to China’s goal of achieving carbon neutrality by 2060, because of the potential to cultivate effective forest carbon sinks within the region [5].

Secondary forests often face challenges such as uneven forest stand structure (FSS), diminished biodiversity, and heightened vulnerability to fires, pests, and diseases [6,7,8]. At present, most secondary forests containing Pinus yunnanensis in Yunnan Province, China, are in the young and middle-aged stages. These forests exhibit a simple stand composition, unhealthy structure, severe quality degradation, and significant susceptibility to fires, pests, and diseases [9]. The local government has prioritized the improvement and protection of the ecological environment, which creates a need for research on the sound management of Pinus yunnanensis secondary forests. However, relevant studies remain scarce; in particular, the secondary forests of Pinus yunnanensis on the eastern slopes of Cangshan Mountain in Dali, Yunnan Province, have yet to be addressed in the existing literature. Comprehensive research on Pinus yunnanensis secondary forests therefore has both theoretical significance and substantial practical value.

Optimizing and regulating FSS is one of the most direct and effective technical approaches in forest management and planning, particularly in the management of secondary forests. The existing studies have predominantly focused on rectangular plots for adjustments, with a variety of optimization methods, yet research addressing the optimization of circular small plots regarding FSS remains scarce. The majority of studies related to FSS optimization primarily revolve around selective logging as the main optimization measure. Identifying the combination of trees that need selective logging rapidly and accurately is a critical and challenging issue within the optimization process. The key to resolving these issues lies in determining FSS indicators, the preliminary identification of trees for selective cutting, constructing optimization models, and designing solution algorithms. FSS indicators encompass both non-spatial and spatial structural indicators, including non-spatial indicators such as tree diameter distributions [10], species counts, and canopy density [11], alongside spatial indicators like the mingling index (M) [12,13], canopy competition index (CI) [14,15], angle index (W) [16,17], storey index (S) [18], and open comparison (OP) [19]. These indicators comprehensively evaluate various aspects of FSS from different perspectives.

Three primary methods are utilized for the initial determination of potential trees for felling: the random selection method (RSM) [20], Q-value method (QVM) [21], and V-map method (VMM) [22]. The RSM is straightforward, involving the random sampling of initial forest trees within a plot to assemble a collection of trees for potential felling. The QVM employs a multiplication and division strategy, considering larger-is-better metrics as the numerator and smaller-is-better metrics as the denominator when combining forest structure indicators. This process generates a comprehensive single-tree evaluation index, Q. By computing the Q-value for each central tree and sorting them in ascending order, trees ranked higher exhibit poorer performance within the spatial structure, thus making them more suitable for removal. The top ‘n’ central trees within the felling quota constitute the collection of trees identified by the QVM for potential felling. Similar to the QVM’s approach to tree selection, the VMM divides the forest trees within a plot into a Voronoi diagram. It calculates the angle index for each central tree and the average angle index for the plot. Utilizing the average angle index as the criterion for tree selection, trees with an angle index higher or lower than the average, based on optimization goals, are selected for the collection of trees identified for potential felling.

An enhanced understanding of the factors impacting forest ecosystems has led us to realize that a well-structured forest stand is not merely a singular objective. The improvement of multiple structural indicators interacts, presenting a challenge akin to multi-objective optimization, characterized by multi-objectives and non-linearity. In such scenarios, traditional simplistic mathematical planning methods struggle to suffice, whereas multi-objective optimization problems (MOPs) offer a more comprehensive solution. They permit us to concurrently balance and optimize multiple interrelated objectives, rather than solely pursuing a singular optimal solution. This approach enables a more holistic comprehension and management of the diversity and complexity within FSSs, thereby effectively enhancing the overall health of forest ecosystems. Commonly used intelligent optimization algorithms for solving MOPs include heuristic algorithms like Monte Carlo (MC) [23,24,25,26], bio-inspired algorithms such as genetic algorithms (GA) [27], simulated annealing (SA) [28,29], and particle swarm optimization (PSO) [30,31,32,33], among others. MC is an early heuristic algorithm applied in forest management due to its simplicity and high programmability, yet its results often lack accuracy due to inherent algorithmic shortcomings. Bio-inspired algorithms like GA and PSO frequently encounter issues like susceptibility to local optima. Throughout the solution process, these methods tend to exhibit high fluctuations and discreteness in the values of the objective function (VOFs) [34]. However, particle swarm optimization offers the advantage of fast search speed and finds relatively more applications in multi-objective optimization of forest stand structures (MOFSSs). 
Yet, due to the inherent complexity of forest ecosystems, using these algorithms for such problems often presents challenges such as extensive computational requirements, intricate solution processes, and difficulties in optimization [26,30,35]. Particularly in cases where forest stand structures are complex or involve numerous trees, the aforementioned algorithms often struggle with low solution efficiency and subpar optimization accuracy.

RL represents an intelligent algorithm rooted in the trial-and-error strategy, characterized by its conceptual simplicity, absence of strict modeling requirements, dynamic decision-making capabilities, and robust adaptability [36]. Leveraging continuous trial and learning, RL can progressively enhance its efficacy, rendering it well-suited for addressing dynamically evolving multi-objective optimization challenges. The inherent complexity of RL resides in formulating an effective reward function, a pivotal aspect that enables the algorithm to converge toward improved outcomes within a shorter time frame [37,38]. Furthermore, the efficiency of RL hinges on the ongoing interaction between the intelligent agent and its environment, necessitating a substantial volume of samples for training. As a result, RL has been widely applied in dynamic multi-objective optimization fields [39,40] such as optimal smart grid management, intelligent transportation systems, and unmanned aircraft control [41,42]. These domains not only provide ample training samples but also establish a conducive environment for effective interactions with intelligent agents.

RL has found extensive application in the field of forestry. Research endeavors have concentrated on various aspects such as simulated fire fighting [43], forest fire detection [44], optimal harvest modeling [45], and forest management [46], offering crucial insights into these domains. Despite their significant contributions to the efficient preservation and management of forests, these research areas differ significantly in their focus and methodologies. Each domain adopts distinct approaches and tools tailored to address specific challenges and requirements. However, there is currently a significant gap in research concerning the application of RL in multi-objective optimization within the field of forest stand structure. This study is focused on utilizing RL techniques to address the multi-objective optimization challenges associated with secondary forest stand structure of Pinus yunnanensis. We aim to bridge the existing gap in RL-based multi-objective optimization within the domain of FSS optimization. While other machine learning techniques have found widespread application in forestry [47,48], particularly in single-objective studies such as fire prediction [49], yield forecasting [50], and LiDAR-based forest structure identification [51], our extensive literature review has revealed a lack of these methods applied to address multi-objective optimization in FSSs. This highlights a deficiency in these techniques when tackling complex multi-objective optimization scenarios. By employing RL methods, our study focuses on the multi-objective optimization of secondary forests of Pinus yunnanensis. Differing from existing research, we concentrate on addressing the challenges posed by multiple objectives, an area not fully explored in the current literature. Our research carves out a new path in forest structure optimization, aiming to bridge the research gap in machine learning approaches to multi-objective optimization in FSSs. 
Simultaneously, while deep learning excels in managing large datasets and pattern recognition, its application in forest management encounters certain limitations. Challenges in environmental interaction and reward function design restrict its current primary application to forest inventory and planning [52,53], without fully addressing complex multi-objective optimization problems.

Hence, this study chose the MC algorithm for its simplicity, ease of programmability, and historical relevance, despite its potential for yielding less accurate results. As a supplementary method, we utilized the PSO algorithm, known for its rapid search speed and widespread application, albeit with a susceptibility to local optima, as our control group. Utilizing the forest stand survey database of Pinus yunnanensis secondary forests in Yunnan Province, southwestern China, we conducted simulated felling optimization across various sample plots. Through comparative analysis of the simulation outcomes, we assessed the impact of three methods for determining potential felling trees on the optimization of FSSs. Furthermore, this study evaluated the performance of three multi-objective optimization algorithms and explored the application of Q-learning, an RL method, in optimizing the structure of secondary forests dominated by Pinus yunnanensis. The research findings demonstrate the potential advantages of employing RL in addressing optimization challenges related to FSS. This approach holds promise in overcoming prevalent hurdles in widespread applications. Specifically focusing on the management and optimization of secondary Pinus yunnanensis forests in southwest China, this study provides valuable guidance and direction. Moreover, its adaptability and transferability offer valuable insights for similar forest ecosystem management issues in other regions.

2. Materials and Methods

2.1. Study Areas

The forest data were collected in the vicinity of Cangshan Mountain (25°34′–26°00′ N, 99°55′–100°12′ E), Yunnan Province, southwestern China (Figure 1). The surveyed areas encompass Zhonghe Peak, Foding Peak, Malong Peak, Lan Peak, and others. These peaks are situated on the eastern slope of Cangshan Mountain, which falls within the subtropical climate zone and is characterized by an annual average temperature of 15 °C and prevailing southwest monsoon winds. Annual rainfall exceeds 1000 mm, with distinct dry and wet seasons; rainfall is concentrated from May to October, a period that accounts for 84% of the total annual precipitation [9]. The prevalent soil type in this area is red soil.

The main tree species in the study areas include Pinus yunnanensis, Pinus armandii Franch., Quercus acutissima Carruth, Vaccinium bracteatum Thunb, Ternstroemia gymnanthera, Betula alnoides, Gaultheria griffithiana Wight, Eucalyptus robusta Smith, Quercus aliena, Q. variabilis, Abies fabri, Keteleeria evelyniana, Alnus nepalensis, Juglans regia L, and Populus tremula Linn, among others.

2.2. Study Site and Data Collection

The fundamental statistics of the sample plots are presented in Table 1. The predominant tree species, Pinus yunnanensis, was surveyed across four circular sample plots, each located at a distinct peak. The radii of these circular sample plots measure 35 m (Zhonghe Peak), 32 m (Foding Peak), 20 m (Malong Peak), and 19 m (Lan Peak).

Between July and December 2022, forestry operations were carried out within the aforementioned sample plots. Over this timeframe, the geographical coordinates, elevation, slope, slope direction, and radius of each sample plot were meticulously measured and recorded. A comprehensive examination of live standing trees with a diameter at breast height (DBH, the diameter of a tree measured at 1.3 m above the ground) equal to or greater than 5 cm (DBH ≥ 5 cm) was conducted within the sample plots. For each tree, essential forestry attributes, such as species, DBH, tree height (TH, the vertical distance from the tree base to its highest point), and crown width (CW, the average of the longest horizontal distances of the crown in the four cardinal directions), were documented with specialized altimeters and distance measurers. The accurate relative coordinates to the center of the sample plots for the base of each tree were determined using a Topcon GTS-2002 autofocus total station (Topcon, Tokyo, Japan).

2.3. Determination of Spatial Structure Units and Edge Correction

The Voronoi method was employed to delineate the spatial structure units [54]. Constructing forest stand spatial structure units from the Voronoi diagram accounts for the position of each individual tree, resulting in a more rational and comprehensive representation. Based on the measured relative positional coordinates of the forest trees, the Voronoi diagram was generated using R 4.2.0. In this diagram, each polygon corresponds to a spatial structure unit formed by a forest tree and its neighboring trees [55,56]. The spatial structure units of trees near the plot edge can be affected by the boundary of the sample plot, potentially introducing calculation errors in the spatial structure index.

To mitigate these errors, this study applied the buffer zone method [57,58]. Specifically, a circular band b meters wide inside the sample plot boundary served as the buffer zone. When calculating the spatial structure index for the spatial structure unit composed of a central tree and its adjacent trees, the forest trees located within the buffer zone were considered only as adjacent trees in forming the spatial structure unit [59], as depicted in Figure 2.
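The construction of spatial structure units with a buffer zone can be sketched as follows. This is a minimal illustration rather than the procedure used in the study (which relied on R): it finds Voronoi neighbors through the equivalent Delaunay empty-circumcircle test in pure Python, and it assumes that trees in the outer band of width `buffer_width` act only as neighbors and never as central trees. The function names, the coordinates, and the buffer interpretation are all assumptions made for the example.

```python
from itertools import combinations
from math import hypot


def circumcircle(a, b, c):
    """Centre and squared radius of the circle through a, b, c (None if collinear)."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return ux, uy, (ax - ux) ** 2 + (ay - uy) ** 2


def voronoi_neighbors(points):
    """Map each tree index to the set of trees sharing a Voronoi edge with it.

    Two trees share a Voronoi edge exactly when they are joined by a Delaunay
    edge, so a triple whose circumcircle contains no other tree contributes
    three neighbour pairs. Brute force, O(n^4): only suitable for small examples.
    """
    nbrs = {i: set() for i in range(len(points))}
    for i, j, k in combinations(range(len(points)), 3):
        cc = circumcircle(points[i], points[j], points[k])
        if cc is None:
            continue
        ux, uy, r2 = cc
        empty = all(
            (px - ux) ** 2 + (py - uy) ** 2 >= r2 - 1e-9
            for m, (px, py) in enumerate(points)
            if m not in (i, j, k)
        )
        if empty:
            for p, q in ((i, j), (i, k), (j, k)):
                nbrs[p].add(q)
                nbrs[q].add(p)
    return nbrs


def central_trees(points, plot_radius, buffer_width):
    """Indices of trees outside the buffer band (assumed rule: the band is the
    outermost `buffer_width` metres of a circular plot centred at the origin)."""
    return [i for i, (x, y) in enumerate(points)
            if hypot(x, y) <= plot_radius - buffer_width]


# Hypothetical tree coordinates (metres, relative to the plot centre).
trees = [(0.0, 0.0), (2.0, 0.3), (-0.2, 2.1), (-2.2, -0.1), (0.1, -1.9)]
neighbors = voronoi_neighbors(trees)
for i in central_trees(trees, plot_radius=3.0, buffer_width=1.0):
    print(i, sorted(neighbors[i]))
```

For plots with hundreds of trees, a dedicated computational geometry library would be the practical choice; the brute-force test is used here only to keep the sketch self-contained.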

2.4. Stand Structure Indexes

(1) Tree Diameter Classes [10]

Grouping trees into diameter classes is a key part of the analysis, because a larger number of diameter classes is associated with better stand growth. Trees were classified by DBH, starting at 6 cm and advancing in 2 cm increments. The number of diameter classes was required to remain unchanged by stand optimization:

$$D = D_{0} \tag{1}$$

where $D_{0}$ denotes the number of diameter classes before selection cutting, and $D$ the number after selection cutting.

(2) Number of Species

A key objective of selection cutting is to preserve tree species diversity, so that no species is inadvertently eliminated. The number of tree species was therefore required to remain unchanged:

$$T = T_{0} \tag{2}$$

where $T_{0}$ is the number of tree species before selection cutting, and $T$ the number after selection cutting.

(3) Cutting Intensity

The vigor of the stand after optimization depends on the cutting intensity. Ideally, the annual cutting volume should remain below the annual growth of the stand. Previous research [61,62] has indicated that the thinning intensity for secondary forests of Pinus yunnanensis should not exceed 35%:

$$N \geq N_{0} (1 - 35\%) \tag{3}$$

where $N_{0}$ is the total number of trees before selection cutting, and $N$ the number after selection cutting.

(4) CD (canopy density) [11]

In practical forest management, the formation of a continuous canopy is a basic requirement. Generally, a canopy density of not less than 0.7 indicates continuous canopy cover:

$$CD \geq 0.7 \tag{4}$$

Spatial Structure Indexes

(1) M (mingling index)

The mingling index M quantifies the spatial segregation of tree species. It is the proportion of the neighboring trees j of focal tree i that do not belong to the same species as tree i [12,13]:

$$M_{i} = \frac{1}{n} \sum_{j = 1}^{n} v_{ij} \tag{5}$$

where $M_{i}$ is the mingling index of focal tree i, n is the number of neighboring trees, and $v_{ij}$ is a binary variable: $v_{ij} = 1$ when neighboring tree j is not of the same species as tree i, and $v_{ij} = 0$ otherwise. $M_{i}$ ranges from 0 to 1, with $M_{i} = 0$ signifying zero mixing and $M_{i} = 1$ indicating extremely strong mixing.
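As a small illustration of Equation (5), the mingling index of one focal tree can be computed from species labels alone. The function name and the example neighborhood below are hypothetical.

```python
def mingling_index(focal_species, neighbor_species):
    """M_i: share of neighbours whose species differs from the focal tree's."""
    n = len(neighbor_species)
    if n == 0:
        raise ValueError("a focal tree needs at least one neighbour")
    return sum(1 for s in neighbor_species if s != focal_species) / n


# Hypothetical spatial structure unit: two of the four neighbours differ.
m_i = mingling_index(
    "Pinus yunnanensis",
    ["Pinus yunnanensis", "Pinus armandii", "Alnus nepalensis", "Pinus yunnanensis"],
)
print(m_i)  # 0.5
```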

(2) CI (canopy competition index)

To gauge the competitive pressures among trees, we designed a CI that employs the overlap area of canopies. The index is calculated as follows [15]:

$$CI_{i} = \frac{1}{Z_{i}} \times \sum_{j = 1}^{n} AO_{ij} \times \frac{L_{j}}{L_{i}} \tag{6}$$

In Equation (6), $CI_{i}$ is the canopy competition index of object tree i; $Z_{i}$ is the projected area of the canopy of object tree i; $L_{i} = H_{i} \times Cw_{i} \times CL_{i}$ (height × canopy width × crown length of object tree i); $L_{j} = H_{j} \times Cw_{j} \times CL_{j}$ (height × canopy width × crown length of competing tree j); and $AO_{ij}$ is the canopy overlap area between object tree i and competing tree j. When object tree i and competing tree j do not overlap, $AO_{ij} = 1$.

When object tree i overlaps with competing tree j, the canopy overlap area $AO_{ij}$ is calculated as follows.

The total area of shading produced by the n competing trees j on object tree i is first calculated with the formula [14]:

$$S_{0} = \frac{Cw_{i}^{2}}{2} \arccos\left(\frac{q_{j}^{2}}{2 Cw_{i}^{2}} - 1\right) - \frac{1}{4} \sum_{j = 1}^{n} q_{j} \sqrt{4 Cw_{i}^{2} - q_{j}^{2}} \tag{7}$$

The total shading area of object tree i over the n competing trees j is then calculated with the formula [14]:

$$S_{1} = \frac{1}{2} \sum_{j = 1}^{n} \left[ Cw_{j}^{2} \arccos\left(1 - \frac{4 Cw_{i}^{2} - q_{j}^{2}}{2 Cw_{j}^{2}}\right) - \frac{\sqrt{4 Cw_{i}^{2} - q_{j}^{2}}}{2} \sqrt{4 Cw_{j}^{2} - \left(4 Cw_{i}^{2} - q_{j}^{2}\right)} \right] \tag{8}$$

Finally, the total area of the overlapping part of object tree i and competing tree j is calculated with the formula:

$$AO_{ij} = S_{0} + S_{1} \tag{9}$$

In Equations (7) and (8), $q_{j} = \frac{L_{ij}^{2} - (Cw_{j}^{2} - Cw_{i}^{2})}{L_{ij}}$, where $L_{ij}$ is the distance between competing tree j and object tree i, $Cw_{i}$ is the canopy radius of object tree i, $Cw_{j}$ is the canopy radius of competing tree j, and n is the number of competing trees.
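Once the overlap areas are known, Equation (6) reduces to a weighted sum. The sketch below takes the $AO_{ij}$ values as given inputs rather than deriving them from Equations (7)–(9); the function names and all numbers are hypothetical and serve only to show the arithmetic.

```python
def crown_size(height, crown_width, crown_length):
    """L = H x Cw x CL, the size term used to weight each competitor."""
    return height * crown_width * crown_length


def canopy_competition_index(z_i, l_i, competitors):
    """CI_i = (1 / Z_i) * sum_j AO_ij * L_j / L_i.

    z_i         -- projected canopy area of the object tree
    l_i         -- crown size term of the object tree
    competitors -- iterable of (ao_ij, l_j) pairs, one per competing tree
    """
    if z_i <= 0 or l_i <= 0:
        raise ValueError("z_i and l_i must be positive")
    return sum(ao_ij * l_j / l_i for ao_ij, l_j in competitors) / z_i


# Hypothetical object tree: 8 m tall, crown 3 m wide and 4 m long, 10 m^2 canopy.
l_i = crown_size(8.0, 3.0, 4.0)            # 96.0
competitors = [(2.0, 48.0), (1.0, 192.0)]  # (overlap area, L_j) per competitor
print(canopy_competition_index(10.0, l_i, competitors))
```

A smaller competitor with a large overlap and a larger competitor with a small overlap can contribute comparably, because each overlap is scaled by the size ratio $L_{j} / L_{i}$.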

(3) W (uniform angle index)

The uniform angle index W describes the horizontal distribution pattern of the stand. It is the proportion of angles $\alpha$ between the focal tree i and its nearest neighbors that are smaller than a predefined standard angle $\alpha_{0}$ [16,17]:

$$W_{i} = \frac{1}{n} \sum_{j = 1}^{n} z_{ij} \tag{10}$$

where $W_{i}$ is the uniform angle index of tree i, and $z_{ij}$ is a binary variable: $z_{ij} = 1$ if the angle $\alpha$ between trees i and j is less than $\alpha_{0}$, and $z_{ij} = 0$ otherwise. The value of $W_{i}$ indicates the degree of distribution uniformity, ranging from an extremely uniform distribution ($W_{i} = 0$) to a highly uneven distribution ($W_{i} = 1$).
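Equation (10) can be illustrated with a short sketch. It assumes the common reading in which $\alpha$ is the smaller angle between each pair of adjacent neighbors as seen from the focal tree, and it leaves the standard angle as a parameter because the text does not fix its value; the 72° used in the example is the conventional choice for four neighbors and is an assumption here, as are the function name and coordinates.

```python
from math import atan2, degrees


def uniform_angle_index(focal, neighbors, alpha0_deg):
    """W_i: share of adjacent-neighbour angles smaller than the standard angle.

    focal      -- (x, y) of the focal tree
    neighbors  -- list of (x, y) positions, at least two trees
    alpha0_deg -- standard angle in degrees
    """
    if len(neighbors) < 2:
        raise ValueError("at least two neighbours are required")
    # Direction of every neighbour around the focal tree, sorted counter-clockwise.
    bearings = sorted(
        degrees(atan2(y - focal[1], x - focal[0])) % 360.0 for x, y in neighbors
    )
    n = len(bearings)
    small = 0
    for k in range(n):
        gap = (bearings[(k + 1) % n] - bearings[k]) % 360.0
        alpha = min(gap, 360.0 - gap)  # the smaller of the two angles
        if alpha < alpha0_deg:
            small += 1
    return small / n


# Hypothetical neighbourhoods around a focal tree at the origin.
regular = [(1, 0), (0, 1), (-1, 0), (0, -1)]   # neighbours 90 degrees apart
clustered = [(1, 0), (1, 1), (0, 1), (-1, 0)]  # two 45-degree gaps
print(uniform_angle_index((0, 0), regular, 72.0))    # 0.0
print(uniform_angle_index((0, 0), clustered, 72.0))  # 0.5
```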

(4) S (storey index)

The storey index S captures the vertical diversity and complexity of a stand. It is the proportion of neighboring trees j that are not in the same height level as the focal tree i [18]:

$$S_{i} = \frac{1}{n} \sum_{j = 1}^{n} v_{ij} \tag{11}$$

where $S_{i}$ is the storey index of tree i, and $v_{ij}$ is a binary variable: $v_{ij} = 0$ if tree i shares the same height level as tree j, and $v_{ij} = 1$ otherwise. The $S_{i}$ index therefore reflects the vertical structural diversity of the stand.
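As a worked illustration of Equation (11), consider a hypothetical focal tree in the middle height level whose four neighbors fall in the upper, middle, middle, and lower levels. Only the first and last neighbors differ from the focal tree, so:

```latex
S_{i} = \frac{1}{4}\,(1 + 0 + 0 + 1) = 0.5
```

The height levels in this example are assumed for illustration; the rule used to assign trees to levels is not restated here.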

(5) OP (open comparison)

The OP describes the light environment and growing space of tall trees within the stand. It quantifies the extent to which the focal tree i within a spatial structure unit is shaded by its neighboring trees j [19]:

$$OP_{i} = \frac{1}{n} \sum_{j = 1}^{n} t_{ij} \tag{12}$$

In Equation (12), $OP_{i}$ is the open comparison of object tree i, and $t_{ij}$ is a discrete variable: $t_{ij} = 1$ if the horizontal distance between object tree i and neighboring tree j exceeds the difference in height between them, and $t_{ij} = 0$ otherwise. $OP_{i}$ takes five values: 0 indicates that the light environment of object tree i is entirely shaded; 0.25 that it is somewhat obscured; 0.5 that it is moderately open; 0.75 that it is notably open; and 1 that it is entirely open, with abundant direct light.
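A minimal sketch of Equation (12) follows. It assumes that the height "disparity" is the absolute height difference between the two trees; the function name and the sample trees are hypothetical.

```python
from math import hypot


def open_comparison(focal, neighbors):
    """OP_i: share of neighbours farther away than their height difference.

    focal     -- (x, y, height) of the object tree
    neighbors -- list of (x, y, height) tuples
    """
    if not neighbors:
        raise ValueError("at least one neighbour is required")
    fx, fy, fh = focal
    unshaded = sum(
        1 for x, y, h in neighbors
        # Assumption: the height disparity is taken as an absolute difference.
        if hypot(x - fx, y - fy) > abs(h - fh)
    )
    return unshaded / len(neighbors)


# Hypothetical unit: coordinates in metres, heights in metres.
focal = (0.0, 0.0, 10.0)
neighbors = [(3.0, 0.0, 12.0), (0.0, 2.0, 15.0), (-4.0, 0.0, 9.0), (0.0, -1.0, 13.0)]
print(open_comparison(focal, neighbors))  # 0.5
```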

The quantification of stand structure is the basis of stand structure optimization. In this study, we selected the number of tree diameter classes, the number of tree species, and the canopy density as indexes to quantify the non-spatial structure of the stand and the uniform angle index, mingling index, crown competition index, storey index, and open comparison as indexes to quantify the spatial structure of the forest stand. The uniform angle index describes the horizontal distribution pattern of trees, the mingling index describes the degree of species isolation, the crown competition index describes the competitive pressure among trees, the storey index describes the vertical distribution pattern of trees, and the open comparison describes the intensity of understory illumination.

2.5. Methods for Selecting Trees for Cutting

2.5.1. RSM (Random Selection Method) [20]

A swift random sampling approach was employed from the original pool of retained trees (comprising all trees within the stand) to acquire the subset of trees designated for selection, ensuring that the cutting intensity remained within the prescribed limits (Figure 3).
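The random selection step can be sketched in a few lines. The sketch assumes that the candidate set is drawn up to the full 35% quota and that trees are referred to by integer identifiers; the function name, the `seed` parameter, and the example stand size are hypothetical.

```python
import random


def random_selection(tree_ids, max_intensity=0.35, seed=None):
    """Draw a random candidate set whose size respects the cutting intensity."""
    rng = random.Random(seed)
    ids = list(tree_ids)
    # Rounding down keeps the number of removed trees within the limit.
    quota = int(len(ids) * max_intensity)
    return sorted(rng.sample(ids, quota))


# Hypothetical stand of 20 trees; at most 35% of them may be selected.
candidates = random_selection(range(20), seed=1)
print(len(candidates), candidates)
```

Passing a `seed` makes a simulated felling run repeatable, which helps when the same candidate set has to be compared across optimization algorithms.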

2.5.2. QVM (Q-Value Method) [21]

In previous studies, a comprehensive index, Q, was formulated to assess the spatial structure of trees. This index integrates multiple spatial structural indicators using multiplication and division, with larger-is-better metrics as the numerator and smaller-is-better metrics as the denominator. These indicators, namely the mingling index (M), canopy competition index (CI), angle index (W), storey index (S), and open comparison (OP), collectively capture diverse aspects of tree spatial structure. Higher values of M, OP, and S, coupled with lower values of CI, typically indicate a superior overall spatial structure. W should approximate the value expected for a random distribution.

To ensure equal significance and influence of each structural index in the comprehensive evaluation, a normalization method with identical weights was employed. This approach effectively synthesizes various indices, ensuring their equitable contributions and mitigating the impact of different scales.

The final formula is represented as:

$$Q_{i} = \frac{\frac{1 + S_{i}}{\delta_{S}} \cdot \frac{1 + OP_{i}}{\delta_{OP}} \cdot \frac{1 + M_{i}}{\delta_{M}}}{\frac{1 + CI_{i}}{\delta_{CI}} \cdot \frac{1 + W_{i}}{\delta_{W}}} \tag{13}$$

In Equation (13), $W_{i}$ is the angle index, $M_{i}$ the mingling index, $S_{i}$ the storey index, $OP_{i}$ the open comparison, and $CI_{i}$ the canopy competition index of tree i. The terms $\delta_{W}$, $\delta_{M}$, $\delta_{S}$, $\delta_{OP}$, and $\delta_{CI}$ are the standard deviations of the respective indices. This formula provides a comprehensive evaluation value for each tree, and the trees are sorted by Q in ascending order. Trees nearer the top of this ranking (smaller Q) exhibit poorer spatial structure and are consequently more suitable for harvesting. Within a specific harvesting limit, the top n ranked trees form the set designated for felling.
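The Q-value ranking can be sketched as follows. The text does not say whether the standard deviations are population or sample values, so the population form (`statistics.pstdev`) is an assumption here, as are the function names, the dictionary layout, and the index values of the three example trees.

```python
from statistics import pstdev

HIGHER_IS_BETTER = ("S", "OP", "M")  # numerator of Equation (13)
LOWER_IS_BETTER = ("CI", "W")        # denominator of Equation (13)


def q_values(trees):
    """Q_i for every tree; `trees` is a list of dicts keyed by index name."""
    keys = HIGHER_IS_BETTER + LOWER_IS_BETTER
    sd = {k: pstdev([t[k] for t in trees]) for k in keys}
    if any(v == 0 for v in sd.values()):
        raise ValueError("an index with zero standard deviation cannot be scaled")
    result = []
    for t in trees:
        numerator = 1.0
        for k in HIGHER_IS_BETTER:
            numerator *= (1 + t[k]) / sd[k]
        denominator = 1.0
        for k in LOWER_IS_BETTER:
            denominator *= (1 + t[k]) / sd[k]
        result.append(numerator / denominator)
    return result


def felling_candidates(trees, n):
    """Indices of the n trees with the smallest Q (poorest spatial structure)."""
    q = q_values(trees)
    return sorted(range(len(trees)), key=lambda i: q[i])[:n]


# Hypothetical index values for three trees.
stand = [
    {"S": 0.75, "OP": 0.75, "M": 0.75, "CI": 0.2, "W": 0.50},
    {"S": 0.25, "OP": 0.25, "M": 0.00, "CI": 1.5, "W": 0.75},
    {"S": 0.50, "OP": 0.50, "M": 0.50, "CI": 0.6, "W": 0.25},
]
print(felling_candidates(stand, 1))  # [1]
```

Because every tree is divided by the same standard deviations, the scaling changes the magnitude of Q but not the order of the trees within one plot.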

2.5.3. VMM (V-Map Method) [22]

For stands with a random distribution pattern, the range [0.475, 0.517] is identified as the optimal uniform angle index interval for an ideal stand [63], with a midpoint of 0.496. This range represents an optimal balance for the spatial structure of the stand. The interval is based on previous research or empirical evidence indicating that stands within this range tend to exhibit more favorable spatial characteristics. The proximity of the horizontal pattern of a stand to a random distribution is assessed by the deviation of the uniform angle index from the midpoint of 0.496, $|\bar{W} - 0.496|$, which indicates how far the spatial arrangement departs from the ideal pattern. During the initial selection of trees for cutting, priority is given to the nearest neighbor of the reference tree with the highest $|\bar{W} - 0.496|$ value. This closest neighbor is initially chosen for cutting, while trees exhibiting weak, moderate, or suppressed competition are identified as potential candidates for retention. If necessary, further calculations are performed using the critical distance formula proposed by Johann (1982) [64]:

$$G_{iz} = \frac{h_{z}}{A} \cdot \frac{d_{i}}{d_{z}} \tag{14}$$

where $G_{iz}$ is the critical distance between reference tree z and competing tree i, $h_{z}$ is the height of reference tree z, $d_{i}$ is the DBH of competing tree i, $d_{z}$ is the DBH of reference tree z, and A is the parameter defining the cutting weight, which is commonly set to $A = 5$ (Figure 4).
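Equation (14) is simple enough to check by hand, and the sketch below does so in code. The decision rule in `within_critical_distance` (a competitor standing closer than the critical distance is treated as a removal candidate) is an assumption about how the critical distance is applied, and the function names and tree measurements are hypothetical.

```python
def critical_distance(h_z, d_i, d_z, a=5.0):
    """G_iz = (h_z / A) * (d_i / d_z), as in Equation (14).

    h_z -- height of reference tree z (m)
    d_i -- DBH of competing tree i (cm)
    d_z -- DBH of reference tree z (cm)
    a   -- cutting-weight parameter A
    """
    if a <= 0 or d_z <= 0:
        raise ValueError("A and d_z must be positive")
    return (h_z / a) * (d_i / d_z)


def within_critical_distance(distance, h_z, d_i, d_z, a=5.0):
    """Assumed rule: True when the competitor is closer than G_iz."""
    return distance < critical_distance(h_z, d_i, d_z, a)


# Hypothetical pair: 15 m tall reference tree (DBH 25 cm), competitor DBH 20 cm.
g = critical_distance(15.0, 20.0, 25.0)  # (15 / 5) * (20 / 25), about 2.4 m
print(round(g, 2), within_critical_distance(2.0, 15.0, 20.0, 25.0))
```

A thicker competitor or a taller reference tree enlarges the critical distance, so the same spacing is more likely to flag the competitor.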

2.6. Optimization Model for Selection Cutting of Forest Stand Structure

2.6.1. Constraints

(1) Spatial Structure Constraints

After optimization and adjustment, the value of each sub-objective should not be worse than its pre-optimization value. This ensures the conservation and improvement of the spatial diversity of the forest stand. Specifically, it ensures that the horizontal distribution pattern of the stand moves toward a random distribution, the level of mixing increases, competition within the stand decreases, vertical richness increases, and the openness and light penetration of the stand improve. Mathematically, it is expressed as:

$$\begin{cases} |\bar{W} - 0.496| \leq |\bar{W}_{0} - 0.496| \\ \bar{M} \geq \bar{M}_{0} \\ \overline{CI} \leq \overline{CI}_{0} \\ \bar{S} \geq \bar{S}_{0} \\ \overline{OP} \geq \overline{OP}_{0} \end{cases} \tag{15}$$

Here, $\bar{W}$, $\bar{M}$, $\overline{CI}$, $\bar{S}$, and $\overline{OP}$ are the uniform angle index, mingling index, canopy competition index, storey index, and open comparison of the forest stand after selection cutting, and $\bar{W}_{0}$, $\bar{M}_{0}$, $\overline{CI}_{0}$, $\bar{S}_{0}$, and $\overline{OP}_{0}$ are the values of the same indices before selection cutting.

(2) Non-spatial Structure Constraints

The selection cutting process must also adhere to non-spatial structure constraints. These constraints preserve the number of diameter classes and the number of tree species within the stand, keep the cutting intensity at or below 35% of the tree count, and maintain a canopy density of at least 0.7. They are formally represented as:

\left\{\begin{matrix} D = D_{0} \\ T = T_{0} \\ N \geq N_{0} (1 - 35\%) \\ C D \geq 0.7 \end{matrix}\right.

(16)

where

D_{0}

signifies the number of diameter classes of trees within the stand before selection cutting, D represents the number of diameter classes after cutting;

T_{0}

denotes the count of tree species before cutting, T reflects the count of tree species after cutting;

N_{0}

stands for the total tree count within the stand before cutting, N corresponds to the total tree count after cutting; and

C D

quantifies the canopy density post-cutting.
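The constraint systems (15) and (16) amount to a feasibility test on a candidate cutting plan. A minimal sketch of such a check is given below; the `StandSummary` container and its field names are hypothetical, and the stand-level means are assumed to have been computed beforehand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StandSummary:
    """Stand-level values before or after cutting (hypothetical container)."""
    w_bar: float             # mean uniform angle index
    m_bar: float             # mean mingling index
    ci_bar: float            # mean competition index
    s_bar: float             # mean storey index
    op_bar: float            # mean open comparison
    n_diameter_classes: int  # D
    n_species: int           # T
    n_trees: int             # N
    canopy_density: float    # CD


def satisfies_constraints(before: StandSummary, after: StandSummary,
                          max_intensity: float = 0.35,
                          min_canopy: float = 0.7,
                          w_mid: float = 0.496) -> bool:
    """Return True if `after` meets constraint systems (15) and (16)."""
    spatial = (
        abs(after.w_bar - w_mid) <= abs(before.w_bar - w_mid)
        and after.m_bar >= before.m_bar
        and after.ci_bar <= before.ci_bar
        and after.s_bar >= before.s_bar
        and after.op_bar >= before.op_bar
    )
    non_spatial = (
        after.n_diameter_classes == before.n_diameter_classes
        and after.n_species == before.n_species
        and after.n_trees >= before.n_trees * (1 - max_intensity)
        and after.canopy_density >= min_canopy
    )
    return spatial and non_spatial
```

A plan that removes more than 35% of the trees, loses a species or diameter class, or worsens any of the five spatial indices is rejected as a whole.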

2.6.2. Construction of the Model

Solving a multi-objective optimization problem (MOP) means finding a solution that balances several objectives under a set of constraints. Because these objectives are often interconnected and bound by constraints, an ideal solution is difficult to obtain directly. The sub-objectives are therefore combined into a single overall objective function (OF), and the optimal solution is sought with respect to this OF. In the multi-objective optimization of forest stand structure (MOFSS), a larger value of the objective function (VOF) corresponds to a more favorable spatial structure of the forest stand.

In this study, we carefully selected five pivotal spatial structure indices: W (uniform angle index), M (mingling index),

C I

(canopy competition index), S (storey index), and

O P

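(open comparison). Following the 'multiplication and division' strategy, the indices that should be as large as possible (mingling, open comparison, and storey index) are placed in the numerator, and those that should be as small as possible (competition index and the deviation of the angle index from 0.496) are placed in the denominator, so that a single ratio integrates all five indices. This yields the OF for the multi-objective optimization model of FSS: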

\max OF = \frac{1}{N} \sum_{i = 1}^{N} \frac{\frac{1 + M_{i}}{\delta_{M}} \cdot \frac{1 + OP_{i}}{\delta_{OP}} \cdot \frac{1 + S_{i}}{\delta_{S}}}{\frac{1 + CI_{i}}{\delta_{CI}} \cdot \frac{1 + |W_{i} - 0.496|}{\delta_{|W - 0.496|}}}

(17)

Here,

W_{i}

,

M_{i}

,

S_{i}

,

O P_{i}

, and

C I_{i}

represent the uniform angle index, mingling index, storey index, open comparison, and canopy competition of the central tree i, respectively. Additionally,

δ_{W}

,

δ_{M}

,

δ_{S}

,

δ_{O P}

, and

δ_{C I}

denote the standard deviations of their respective structural parameters. The midpoint of the range [0.475, 0.517] is 0.496, where a smaller value of

|W_{i} - 0.496|

indicates a closer alignment of the forest stand’s horizontal distribution pattern with randomness.
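As an illustration of Equation (17), the sketch below evaluates the OF from per-tree index values. It assumes the population standard deviation for each δ (the text does not say whether the population or sample form is used), and the function and argument names are illustrative.

```python
from statistics import pstdev
from typing import Sequence

W_MID = 0.496  # midpoint of the optimal angle-scale interval [0.475, 0.517]


def objective_value(w: Sequence[float], m: Sequence[float], ci: Sequence[float],
                    s: Sequence[float], op: Sequence[float]) -> float:
    """Value of the objective function (Equation (17)) for one stand."""
    n = len(w)
    if n == 0 or any(len(x) != n for x in (m, ci, s, op)):
        raise ValueError("all index sequences must be non-empty and equally long")
    w_dev = [abs(x - W_MID) for x in w]
    # Assumption: delta is the population standard deviation of each index.
    sd_m, sd_ci, sd_s, sd_op, sd_w = (pstdev(x) for x in (m, ci, s, op, w_dev))
    if min(sd_m, sd_ci, sd_s, sd_op, sd_w) == 0:
        raise ValueError("an index with zero standard deviation cannot be scaled")
    total = 0.0
    for i in range(n):
        numerator = ((1 + m[i]) / sd_m) * ((1 + op[i]) / sd_op) * ((1 + s[i]) / sd_s)
        denominator = ((1 + ci[i]) / sd_ci) * ((1 + w_dev[i]) / sd_w)
        total += numerator / denominator
    return total / n
```

With the standard deviations held fixed, raising every tree's mingling index increases the OF, while raising every tree's competition index lowers it, which is the behavior the ratio form is designed to give.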

2.7. Algorithms for Solving the Model

2.7.1. MC (Monte Carlo)

MC, a classical approach for addressing MOPs, has been extensively applied to tackle MOFSS problems [20]. Its fundamental principle involves calculating the VOF and related indices for the retained trees after each simulated logging iteration. By iteratively filtering out trees that do not meet the constraints, the MC algorithm generates an approximation of optimal solutions for the MOFSS problem.

The general procedure of the MC algorithm for the MOFSS is as follows: (1) Set the initial control parameters and initialize

I = 0

(count of successive non-feasible solutions) and

U = 0

(count of feasible solutions that bring no improvement). (2) Take the initial set of retained trees as

g^{*}

and calculate its VOF, denoted VOF*. (3) Determine the trees proposed for felling using the method for selecting candidate cutting trees, which yields the retained tree set g. (4) Verify whether the set g satisfies all constraints. If it does, calculate the VOF for set g; otherwise, increment I and, provided I is within the limit

I_{\max}

, return to step (3). (5) If VOF > VOF*, assign the set g and its VOF to

g^{*}

and VOF*, respectively, reset

U = 0

, and return to step (3) to re-select felling trees; otherwise, increment U. (6) If

U > U_{\max}

, the algorithm is considered to have converged, and the best retained set g* with its VOF* is returned as the approximately optimal solution; otherwise, return to step (3).

The MC flow chart for solving MOFSS is presented in Figure 5.
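The MC procedure can be sketched as a random search loop. This is a minimal sketch under stated assumptions, not the authors' code: `vof` and `feasible` are caller-supplied functions, the proposed felling set is drawn uniformly at random from the candidate pool, and I is counted as a running total of infeasible draws.

```python
import random
from typing import Callable, FrozenSet, Iterable, Optional, Tuple


def monte_carlo_search(
    trees: Iterable[int],
    candidates: Iterable[int],
    vof: Callable[[FrozenSet[int]], float],
    feasible: Callable[[FrozenSet[int]], bool],
    max_cut: int,
    i_max: int = 10_000,
    u_max: int = 500,
    seed: Optional[int] = None,
) -> Tuple[FrozenSet[int], float]:
    """Return the best retained set g* and its VOF* found by random search."""
    rng = random.Random(seed)
    all_trees = frozenset(trees)
    pool = sorted(set(candidates) & all_trees)  # trees proposed for felling
    best, best_vof = all_trees, vof(all_trees)  # g* and VOF*
    infeasible = 0  # I: infeasible draws so far (assumed not to reset)
    stale = 0       # U: consecutive feasible draws without improvement
    if not pool or max_cut < 1:
        return best, best_vof
    while infeasible <= i_max and stale <= u_max:
        k = rng.randint(1, min(max_cut, len(pool)))
        retained = all_trees - frozenset(rng.sample(pool, k))
        if not feasible(retained):
            infeasible += 1
            continue
        value = vof(retained)
        if value > best_vof:
            best, best_vof, stale = retained, value, 0
        else:
            stale += 1
    return best, best_vof
```

In practice, `feasible` would wrap the constraint systems (15) and (16) and `vof` would evaluate Equation (17) on the retained trees.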

2.7.2. PSO (Particle Swarm Optimization)

The integration of PSO, renowned for its spatial search efficiency, with multi-objective problems has gained traction in addressing the multi-objective optimization model of FSS [54,65]. The fundamental concept involves representing each tree in the forest as a particle within the solution space of the PSO algorithm. The two-dimensional spatial layout of the entire forest corresponds to the objective solution space, and the positions of particles in this solution space mimic the spatial coordinates of trees. The VOF for stand structure optimization acts as the fitness value of particles, transforming the MOFSS problem into an iterative optimization process within the solution space of PSO.

The key steps encompass setting up the parameters of the PSO algorithm, mapping the Voronoi space of the forest as the objective solution space, and associating the spatial coordinates of trees with potential particle positions. Particles collectively explore the Voronoi space of the forest to identify trees that fulfill the optimization objectives. During each iteration, particles search for a Voronoi polygon that satisfies the objectives. The position of the best-performing particle (corresponding to a tree) updates the current position of the particle, leading it to ‘land’ on a tree that aligns with improved FSS. The flowchart outlining the PSO procedure for MOFSS is illustrated in Figure 6.
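For reference, the canonical PSO update on which schemes of this kind are usually built is given below. This is the textbook form and is not stated in this article; the position x, velocity v, personal best, global best, and the uniform random numbers r_1 and r_2 are notation introduced here for illustration, while w, c_1, and c_2 are the inertia weight and learning factors listed in Section 3.1.

```latex
\begin{aligned}
v_{i}^{t+1} &= w\, v_{i}^{t}
  + c_{1} r_{1} \left( x_{i}^{\mathrm{pbest}} - x_{i}^{t} \right)
  + c_{2} r_{2} \left( x^{\mathrm{gbest}} - x_{i}^{t} \right), \\
x_{i}^{t+1} &= x_{i}^{t} + v_{i}^{t+1},
\qquad r_{1}, r_{2} \sim U(0, 1)
\end{aligned}
```

In the stand setting described above, the updated position would then be matched to the tree whose Voronoi polygon contains it, which is how a particle 'lands' on a tree.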

2.7.3. RL (Reinforcement Learning)

Acknowledging the inherent accuracy limitations of MC and the susceptibility of PSO to local optimization, RL presents an alternative with distinct advantages. RL is characterized by its trial-and-error approach, absence of a predefined model, dynamic decision making, and adaptive capabilities. Introducing RL into the realm of MOFSS is a novel endeavor that transforms the process of tree selection into actions guided by an intelligent agent. This agent makes decisions about which trees to fell based on a system of rewards and penalties, continually refining its decision-making strategies, approximating the optimal approach, and ultimately identifying relatively optimal solutions.

In the context of RL applied to MOFSS, tree selection is transformed into intelligent agent actions. The agent’s actions involve deciding whether to cut or not cut specific trees from the set designated for cutting. The environment responds by providing rewards or penalties to the agent based on a predefined reward function. This feedback influences the agent’s subsequent actions. Through the process of maximizing rewards, the agent aims to identify the actions that yield the highest cumulative rewards, thereby contributing to the enhancement of the forest stand structure. The flowchart depicting the RL approach to solving MOFSS is illustrated in Figure 7. The pseudocode for applying RL to MOFSS is detailed in Appendix A.

2.7.4. Optimization Schemes

As shown in Table 2, four real forest stands of varying density and condition were selected for simulation experiments, both to evaluate the effectiveness of these algorithms in practical MOFSS applications and to investigate the impact of tree count on model performance. The method used to select proposed cutting trees determines the candidate set and therefore influences the overall performance of the models. In this study, spatial structure units were based on the V (Voronoi) diagram, and the tree selection methods were the RSM, QVM, and VMM. The model was solved using MC, PSO, and RL, which gives a total of nine distinct stand structure optimization schemes. A maximum of 10,000 loop iterations was set for all of these schemes.

3. Results

3.1. Algorithm Parameter Settings

The parameter configurations for each solution algorithm in the experimental setup are outlined in Table 3.

For all algorithms, the iteration counter I is initialized to 0, and the maximum number of iterations (

I_{\max}

) is set to 10,000, a value commonly used in optimization studies. For the MC algorithm, U counts the consecutive feasible solutions that fail to improve the objective function; it is initialized as

U = 0

, and

U_{\max}

is set to 500. That is, if the nth feasible solution has an objective function value of VOF(n) and none of the 500 consecutive feasible solutions found after it is better than VOF(n), then VOF(n) is regarded as an optimal solution.

The PSO algorithm uses the standard parameters p, w,

c_{1}

, and

c_{2}

. Specifically, p denotes the number of particles, w stands for the inertia weight,

c_{1}

represents the individual particle learning factor, and

c_{2}

signifies the global learning factor. The values of these parameters follow conventional settings.

To implement RL for MOFSS, this study abstracts the problem as a one-dimensional walk in which the RL agent can move in two directions. At the start of each iteration, the agent is at the 'starting point' (

state = 0

). If the agent chooses to cut a tree and thereby enhances the FSS, it advances forward (

state = state + 1

). If it decides not to cut, it either remains stationary (

state = state

) or takes a step backward (

state = state - 1

); refer to Figure 7 or the pseudocode for the precise movement rules. The 'end point' lies 100 units from the 'starting point' (

state_{\max} = 100

). The parameters a, b, c, and d are the reward and penalty values, tuned over numerous trial runs; rewards are positive and penalties are negative.
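The walk described above can be sketched with tabular Q-learning. Everything specific in this sketch is an assumption made for illustration: the rule that decides between staying and stepping back, the reward values (placeholders for a, b, c, and d, whose tuned values are given in Table 3), and the learning rate, discount factor, and exploration rate.

```python
import random
from typing import Callable, List, Optional, Tuple

STATE_MAX = 100     # 'end point' of the walk
NO_CUT, CUT = 0, 1  # the agent's two actions


def step(state: int, action: int, cut_improves: bool) -> Tuple[int, float]:
    """Hypothetical movement and reward rule (placeholder reward values)."""
    if action == CUT and cut_improves:
        return min(state + 1, STATE_MAX), 1.0  # move forward, reward
    if action == NO_CUT and cut_improves:
        return max(state - 1, 0), -1.0         # missed a useful cut: step back
    return state, -0.1                         # otherwise stay, small penalty


def q_update(q: List[List[float]], state: int, action: int, reward: float,
             next_state: int, alpha: float = 0.1, gamma: float = 0.9) -> None:
    """One tabular Q-learning update, applied in place."""
    target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])


def run_episode(q: List[List[float]], cut_improves: Callable[[int], bool],
                epsilon: float = 0.1, max_steps: int = 10_000,
                seed: Optional[int] = None) -> int:
    """Run one epsilon-greedy episode and return the final state."""
    rng = random.Random(seed)
    state = 0
    for _ in range(max_steps):
        if state >= STATE_MAX:
            break
        if rng.random() < epsilon:
            action = rng.choice((NO_CUT, CUT))  # explore
        else:
            action = max((NO_CUT, CUT), key=lambda a: q[state][a])  # exploit
        next_state, reward = step(state, action, cut_improves(state))
        q_update(q, state, action, reward, next_state)
        state = next_state
    return state
```

Here `cut_improves` stands in for the environment's check of whether felling the current candidate tree raises the VOF without violating the constraints.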

3.2. Results of Simulated Selective Cutting Optimization

After selective cutting under the different schemes, the numbers of diameter classes and tree species in each sample plot remained unchanged, and the spatial structure indices of each forest stand improved to varying degrees. The deviation of the angle index from the random midpoint (

|W_{i} - 0.496|

), which describes the horizontal distribution pattern, decreased slightly, bringing the pattern closer to a random distribution. The canopy competition index decreased, indicating a reduction in inter-tree competition. The mingling index improved, most notably in sample plot P4. The storey index and open comparison values increased, enhancing vertical structural diversity and light penetration. Overall, VOFs increased, indicating that the forest stand structure was successfully optimized in all plots (see Figure 8).

3.3. Algorithm Performance

The optimization strategies were simulated across the four sample plots.

We selected VOF as the sole evaluation metric for MOFSS because it is widely applied in this research domain, is comprehensive, and is easy to interpret. Prior studies [66,67] have demonstrated the effectiveness of VOF in assessing forest structure by considering the spatial arrangement as a whole, which also ensures comparability across related studies. Using VOF alone simplifies the problem, yields results that are easier to interpret, and matches the optimization model adopted in this study, which avoids complications of model complexity and compatibility. Other parameters used to assess the optimization of forest structure, such as object value [20], comprehensive homogeneity index [66], and fitness value [67], follow design principles and computation methods that closely resemble those of VOF: most adopt the principle of multiplication and division to integrate multiple forest structure evaluation indices.

Figure 9 shows the best optimization outcome achieved with each scheme. The horizontal axis lists the forest stand structure optimization schemes applied to each of the four sample plots, and the vertical axis gives the corresponding optimal objective function values. Across the four sample plots, the MC-type and PSO-type strategies each produced the second-best solution twice: MC-type schemes for sample plots P2 and P3, both with M1 (r-MC), and PSO-type schemes for sample plots P1 and P4, with M6 (V-PSO) and M5 (Q-PSO), respectively. The two categories therefore differ little in how often they come second, which indicates that both MC-type and PSO-type approaches handle MOFSS effectively. Notably, M7 (r-RL) achieved the highest VOF within the 35% cutting-intensity limit in every sample plot.

To evaluate the optimization performance of each scheme on the secondary forest stand structure of Pinus yunnanensis, this study compares their effectiveness in terms of algorithm convergence speed and the extent of stand structure optimization. The advantages and drawbacks of the stand structure are gauged through the value of the objective function. To ensure that sample site conditions do not bias the comparison results, the optimization schemes were simulated for logging selection in the four sample plots, and the outcomes are presented in Table 4. Similarly, the convergence progress of each optimization scheme in addressing the stand structure optimization challenge within the same plots is depicted in Figure 10.

As indicated in Table 4, the objective function value of the stand structure increased from 0.3515 to 0.3894 for sample plot P1, from 0.2814 to 0.3649 for P2, from 0.3748 to 0.4598 for P3, and from 0.4812 to 0.5340 for P4. In the simulated logging selection experiments across these four sample plots, the highest objective function value achieved by the reinforcement learning scheme was 0.5983, and the optimal objective function value reached by the Monte Carlo scheme was 0.4887. The values obtained using the reinforcement learning algorithm (0.4121, 0.4078, 0.5047, and 0.5635 for P1–P4, respectively) exceeded those of the MC (0.3892, 0.3983, 0.4887, and 0.5537, respectively) and PSO (0.4010, 0.3877, 0.4729, and 0.5593, respectively) schemes in every plot.

Figure 10 records the iterations performed during the model runs, which gives partial insight into model runtime; convergence times vary as the number of iterations increases. Table 5 complements the iteration logs with the time each strategy required to reach its maximum objective function value in each plot. Taken together, the iteration and runtime results show that the RL algorithm converges slightly faster than the MC and PSO algorithms when reaching similar objective function values. The MC algorithm converges more slowly in cases such as M3 in sample plots P1 and P3 and M1 in sample plot P4, and the PSO algorithm tends to become trapped in local optima, as with M6 in sample plots P1, P3, and P4.
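The relative improvements reported for each algorithm class can be reproduced from the values in the text above. The short sketch below assumes that the first value of each 'increased from' pair (0.3515, 0.2814, 0.3748, and 0.4812) is the pre-cutting VOF of P1–P4; the dictionary and function names are illustrative.

```python
# Pre-cutting VOF of each sample plot (assumed to be the "from" values in the text).
INITIAL_VOF = {"P1": 0.3515, "P2": 0.2814, "P3": 0.3748, "P4": 0.4812}

# Best VOF per algorithm class for P1-P4, as listed in the text.
BEST_VOF = {
    "RL": [0.4121, 0.4078, 0.5047, 0.5635],
    "MC": [0.3892, 0.3983, 0.4887, 0.5537],
    "PSO": [0.4010, 0.3877, 0.4729, 0.5593],
}


def relative_gain(before: float, after: float) -> float:
    """Percentage increase of `after` over `before`."""
    return (after - before) / before * 100.0


gains = {
    name: [round(relative_gain(b, a), 2) for b, a in zip(INITIAL_VOF.values(), values)]
    for name, values in BEST_VOF.items()
}

for name, values in gains.items():
    print(name, values)
# RL [17.24, 44.92, 34.66, 17.1]
# MC [10.73, 41.54, 30.39, 15.07]
# PSO [14.08, 37.78, 26.17, 16.23]
```

These percentages agree with the VOF increases stated in the abstract for the RL-, MC-, and PSO-based approaches.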

3.4. Influence of Proposed Selective Felling Tree Determination Methods

As depicted in Figure A1, within the MC-class optimization schemes (M1, M2, M3), the optimization scheme utilizing random selection as the method for determining proposed logging (M1) significantly outperformed the optimization schemes employing the QVM or VMM (M2, M3) in enhancing the stand structure of the sampled areas.

In the PSO-class optimization schemes (M4, M5, M6), the optimal results achieved by the schemes utilizing the QVM or VMM for selecting trees to be felled (M5, M6) were generally comparable in optimizing the stand structure across each sample site, and slightly superior to the optimization scheme combining RSM and PSO (M4). Similar to the MC-class scheme, the scheme that combined the RL with the RSM for selecting trees to be felled (M7) demonstrated the best performance across the four sample sites (M7 > M8 > M9).

It is also evident from Table 4 that the average improvement in the objective function value of M7 across the four sample plots amounts to 26.81%, which is significantly higher than the average improvement of all the schemes (17.42%).

4. Discussion

In the multi-objective optimization of forest stand structure (MOFSS), the search for effective solution methods remains a central problem. Machine learning methods such as random forests and decision trees have been applied in forest management, but their use for dynamic multi-objective forest structure optimization remains limited in the literature: our review did not reveal substantial investigations that apply them to this problem, which points to a notable research gap. Similarly, although deep learning excels at handling extensive datasets and complex pattern recognition, its practical application in forest management is constrained by challenges in environmental interaction and reward function design.

As of now, a universally acknowledged, efficient approach to address MOFSSs remains elusive. To overcome the constraints of the current algorithms and leverage the inherent advantages of RL in multi-objective optimization, this study introduces a pioneering approach rooted in RL principles to address the complexities of optimizing multi-objective FSSs. Simulated felling experiments were conducted using data obtained from circular sample plots in the southwestern region of China within secondary forests of Pinus yunnanensis.

The findings underscore the promising potential of RL in optimizing FSSs, indicating its capability to mitigate prevailing challenges in widespread application. Particularly pertinent to the management and optimization of Pinus yunnanensis secondary forests in southwest China, this study offers crucial guidance and direction. The outcomes of this study are not confined to a specific region or tree species; their universality offers valuable insights for addressing similar forest ecosystem management issues in other regions or with different tree species.

4.1. Results of Simulated Selective Cutting Optimization

In terms of FSS, simulated selective cutting led to improved horizontal distribution patterns, enhanced the spatial segregation of tree species, reduced the competitive pressure among trees, enriched vertical tree structures, and improved understory light conditions [66,68,69]. While maintaining the number of tree diameter classes and species and adhering to cutting intensity and canopy density constraints, the post-cutting forest stand structure surpassed the pre-cutting condition [70,71,72]. This underscores the feasibility of applying RL to solve MOFSS and the potential of using RL to enhance FSS. The results also demonstrate the efficacy of both MC and PSO in addressing MOFSS, aligning with findings from previous research [20,54,65,73,74].

Due to the limited availability of current data, this study has relied on existing research findings and the insights of previous researchers [61,62]. In the simulated selection felling experiment with Pinus yunnanensis conducted in four sample plots, the selection intensity was uniformly set not to exceed 35% of the total tree count within each plot. However, it is important to note that varying site conditions and forest stand structures may lead to different optimal selection limits across different plots [67,75].

In the subsequent phases of our research, we intend to address this gap by conducting additional experiments to gather more relevant data. Our goal is to refine and tailor the selection intensity limits for each individual sample plot, taking into consideration the specific site characteristics and forest composition.

4.2. Algorithm Performance

Based on the results of the simulated selective cutting experiments in the four sample plots, the RL optimization schemes consistently yielded higher maximum VOFs (0.4121, 0.4078, 0.5047, and 0.5635, respectively) than the MC class (0.3892, 0.3983, 0.4887, and 0.5537, respectively) and the PSO class (0.4010, 0.3877, 0.4729, and 0.5593, respectively). Notably, at similar VOFs, the RL algorithm converged slightly faster than the MC and PSO algorithms. The results for M3 in sample plots P1 and P3 and for M1 in P4 show that the MC scheme converges more slowly, while the results for M6 in sample plots P1, P3, and P4 indicate that the PSO algorithm is prone to becoming stuck in local optima. In summary, the RL algorithm is competitive with traditional approaches such as the MC and PSO algorithms for solving the MOFSS.

In the simulated harvesting experiments, the RL optimization scheme did not exhibit significant advantages in terms of time complexity and space complexity [76,77,78] when compared with the MC and PSO classes. Our future research will focus on enhancing the efficiency of utilizing RL to solve the MOFSS. We aim to bridge the gap between RL and existing optimization methods for FSS in terms of time and space complexity.

4.3. Influence of Proposed Selective Felling Tree Determination Methods

To explore how the method for determining proposed felling trees affects the outcome, the model-solving process selected trees from the candidate sets generated by each of the three methods. The experimental results indicate that optimization schemes combining the RSM with either the MC or RL algorithm (M1, M7) achieve better results than schemes combining the QVM or VMM with the same algorithms (M1 > M2 > M3, M7 > M8 > M9). With the PSO algorithm, by contrast, schemes using the QVM or VMM reach a higher fitness level than the scheme using the RSM (M5 > M4, M6 > M4).

Particularly, the M7 optimization scheme, which combines the RSM with the RL algorithm, exhibits an average enhancement of 26.81% in VOFs across the four sample plots. This improvement is significantly higher than the average enhancement observed in all schemes (17.42%). These results suggest that the RSM is highly adaptable and effective in conjunction with RL.

These findings may be attributed to the number of proposed felling trees generated by each method and to the characteristics of the algorithms used to solve the MOP. The RSM generates the most proposed felling trees, while the QVM and VMM yield comparatively fewer. MC involves a stochastic element in its estimations [79,80,81,87], whereas RL continually refines its performance through iterative experimentation and learning [38,42,82,88]. Larger sets of proposed felling trees offer more samples, which can enhance the experimental outcomes of both MC and RL [38,81,83,84].

PSO, on the other hand, is better suited to optimizing a relatively small number of trees chosen using the QVM or VMM, which helps avoid local optima [85,86,89,90]. The relatively favorable outcomes of the schemes that combine the PSO algorithm with the QVM or VMM (M5 > M4, M6 > M4) provide indirect support for this view. However, reducing the sample size might cause potential global optimal solutions to be missed and weaken the global search capability of the PSO algorithm [91]. Consequently, besides seeking a more suitable method for determining proposed felling trees within PSO, the characteristics of multi-objective optimization of forest stand structure need to be considered comprehensively. Further investigation is warranted into strategies that keep PSO from becoming trapped in local optima when addressing such problems.

Based on the optimal reinforcement learning scheme, we have also developed a harvested tree information table (Table 6) that provides strong support for real-world forest stand management decisions [54,65,92]. This table integrates scientific algorithms while addressing the practical needs of multi-objective optimization for forest stand structure. By implementing these optimization results in practical scenarios, we effectively achieve sustainable forest stand management, highlighting the practical application value of this study in the field of forest resource management.

Compared to some existing methods, this approach demonstrates higher solution efficiency; however, there remain several areas for continuous improvement:

(1) This study exclusively employs the fundamental Q-learning algorithm in RL. Exploring and validating more advanced algorithms for solving the MOP of FSS that offer improved solution efficiency is a promising avenue for further research.

(2) Due to the constraints posed by sample site conditions, the current study’s sample size for each site primarily ranges between 200 and 500. In-depth investigation and research are needed to address the efficiency of solving the MOP of FSS using the RL algorithm with larger and even super-large sample sizes.

(3) Achieving an optimized FSS is not a one-step process; it requires multiple adjustments to converge to an ideal state. The optimization objective is dynamic and adaptable. During each adjustment, the direction and goal of optimization should be precisely defined based on the forest stand’s specific circumstances and operational objectives.

(4) Replanting is a complementary strategy to selective cutting in the FSS optimization process. Sustainable forest management necessitates the integration of selective logging and replanting principles. Investigating an optimization model that incorporates replanting in FSS optimization using the RL algorithm is an important topic for future research.

5. Conclusions

The introduction of reinforcement learning (RL) to address multi-objective optimization for forest stand structure (MOFSS) is a pioneering step, marking the first application of RL in forest stand structure (FSS) optimization research. By translating the tree selection process in FSS into RL-based intelligent agent actions, this research has successfully revolutionized the approach to solving FSS multi-objective problems. This pioneering use of RL showcases its potential to transform conventional methodologies in FSS optimization practices.

Extensive experimental comparisons unequivocally confirm RL’s exceptional effectiveness in MOFSS. The RL scheme achieves significantly higher values of the objective function (VOFs) across various plots compared to the Monte Carlo (MC) and particle swarm optimization (PSO) methods, quantifying RL’s excellence and establishing its practical relevance in forest management.

Three RL schemes (M7, M8, and M9) based on different tree selection methods have been introduced and rigorously analyzed. Among these, the M7 scheme, combining the random selection method (RSM) with RL, emerges as a superior performer across the four sample plots. This comprehensive analysis provides valuable insights into practical MOFSS solutions, emphasizing RL’s compatibility with RSM.

The empirical validation of the optimal RL-based scheme for generating felling tables strongly reinforces its support for real-world stand structure optimization decisions. This emphasizes the practical significance of integrating RL into FSS strategies. The results of this study transcend specific regional or tree species boundaries; their universality provides invaluable insights for tackling analogous forest ecosystem management challenges in diverse regions or with varying tree species.

Author Contributions

Conceptualization, S.X. and J.W.; methodology, S.X. and J.W.; software, S.X.; validation, S.X.; formal analysis, S.X. and Y.C.; investigation, S.X. and J.W.; resources, J.W.; data curation, S.X. and Y.C.; writing—original draft preparation, S.X.; writing—review and editing, S.X. and J.W.; visualization, S.X. and Y.C.; supervision, J.W.; project administration, J.W.; funding acquisition, J.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (grant number 32001313), Yunnan Fundamental Research Projects (grant number 202201AT070006), Yunnan Postdoctoral Research Fund Projects (grant number ynbh20057), and Fundamental Research Joint Special Youth Project of Local Undergraduate Universities in Yunnan Province (grant number 2018FH001-106).

Data Availability Statement

The original contributions presented in the study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

The pseudocode detailing the application of reinforcement learning (RL) to multi-objective forest stand structure optimization is provided as follows. Here, $S$ represents states, $A$ represents actions, $\epsilon$ signifies the exploration rate for the $\epsilon$-greedy policy, $\alpha$ is the learning rate, $\gamma$ denotes the discount factor, and $MAXEPISODES$ stands for the maximum number of episodes. The algorithm iteratively updates the Q-values using the Q-learning formula while considering different states and actions and their corresponding rewards ($R$) based on the observed outcomes. We trained the agent by designing a suitable reward function to facilitate quicker and more accurate identification of trees that should be harvested. Subsequently clearing these identified trees from the stand resulted in a substantial improvement in the overall quality of the forest structure within the sample plot (Algorithm A1).

Algorithm A1: RL for MOFSS
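
As a rough illustration of the tabular Q-learning update described above, the sketch below trains an agent on a toy one-dimensional chain rather than on a forest stand. The environment, the reward values, and the default hyperparameters are assumptions made for this example; only the roles of $\epsilon$, $\alpha$, $\gamma$, and the episode limit follow the description, and the code is not the authors' Algorithm A1 listing.

```python
import random


def q_learning(n_states=6, n_actions=2, epsilon=0.1, alpha=0.5, gamma=0.9,
               max_episodes=500, max_steps=100, seed=0):
    """Tabular Q-learning on a toy chain: action 1 moves right, action 0 moves left.

    All names and values here are illustrative assumptions, not the paper's settings.
    """
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    goal = n_states - 1  # terminal state; its Q-values are never updated

    for _ in range(max_episodes):
        state = 0
        for _ in range(max_steps):
            if state == goal:
                break
            # epsilon-greedy action selection with random tie-breaking
            if rng.random() < epsilon:
                action = rng.randrange(n_actions)
            else:
                best = max(q[state])
                action = rng.choice([a for a, v in enumerate(q[state]) if v == best])

            # toy transition and reward (assumed): +10 at the goal, -1 per other step
            next_state = min(state + 1, goal) if action == 1 else max(state - 1, 0)
            reward = 10.0 if next_state == goal else -1.0

            # Q-learning update: Q(s,a) += alpha * (R + gamma * max_a' Q(s',a') - Q(s,a))
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q
```

Taking the arg-max of each row of the returned table gives the greedy policy. In a stand-structure setting, the states, actions, and rewards would instead have to encode the stand, the felling decisions, and the change in the objective function, which this toy example does not attempt.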

Appendix B

Appendix B.1

This figure illustrates the stacked plot of the best results attained by each optimization scheme in optimizing FSS across different sample sites. We believe that this figure helps to deepen the reader’s understanding of the advantages of reinforcement learning in multi-objective optimization of forest stand structures.

Figure A1. The stacked representation of optimization effects for each scenario. Note: This figure demonstrates the influence of methods for determining proposed selective cutting trees on optimization results.

Appendix B.2

The following figure supplements Table 5, providing a detailed overview of the objective function values achieved by various optimization approaches over different runtime intervals in the simulated multi-objective forest stand structure experiments. Analyzing the relationship between runtime and values of the objective function (VOFs) across different scenarios (as depicted in Figure A2a,c), it becomes evident that Scheme M7, based on reinforcement learning (RL), consistently achieves the highest VOFs in the shortest time. Moreover, in Figure A2b,d, where the attained VOFs are comparable, Scheme M7, driven by RL, also exhibits relatively shorter running times. Therefore, collectively, Scheme M7 based on reinforcement learning outperforms other schemes concerning both achieved VOFs and time complexity.

Figure A2. Convergence status of individual optimization strategies across varied sites on time scales. Note: This figure illustrates the convergence behavior of each optimization scheme (M1–9) during a simulated selective cutting experiment conducted on four sample sites (P1–4).

References

1. Sun, H.L.; Li, S.C.; Xiong, W.L.; Yang, Z.R.; Cui, B.S. Influence of slope on root system anchorage of Pinus yunnanensis. Ecol. Eng. 2008, 32, 60–67.
2. Jin, Z.; Peng, J. Yunnan Pine (Pinus yunnanensis franch.); Yunnan Science and Technology Press: Kunming, China, 2004; pp. 1–66.
3. Xu, Y.; Woeste, K.; Cai, N.; Kang, X.; Li, G.; Chen, S.; Duan, A. Variation in needle and cone traits in natural populations of Pinus yunnanensis. J. For. Res. 2015, 27, 41–49.
4. Sinicae, I.B.K.A. Flora Yunnanica. Tomus 6. (Spermatophyta); Wu, C.Y., Ed.; Science Press: Beijing, China, 1995; p. 759.
5. Deng, W.; Xiang, W.; Ouyang, S.; Hu, Y.; Chen, L.; Zeng, Y.; Deng, X.; Zhao, Z.; Forrester, D.I. Spatially explicit optimization of the forest management tradeoff between timber production and carbon sequestration. Ecol. Indic. 2022, 142, 109193.
6. Jiang, L.; Lu, Y.C.; Liao, S.X.; Li, K.; Li, G.Q. A study on diameteral structure of Yunnan pine forest in the plateaus of mid-Yunnan Province. For. Res. 2008, 21, 126–130.
7. Zaizhi, Z. Status and perspectives on secondary forests in tropical China. J. Trop. For. Sci. 2001, 2001, 639–651.
8. Li, L.F.; Han, M.Y.; Zheng, W.; Su, J.W.; Li, W.C.; Zheng, S.H.; Gong, J.B. The causes of formation of low quality forest of Pinus yunnanensis and their classification. J. West China For. Sci. 2009, 38, 94–99.
9. Yuan, R.; Yang, S.; Wang, B. Study on the altitudinal pattern of vegetation distribution along the eastern slope of Cangshan Mountain, Yunnan, China. J. Yunnan Univ. 2008, 30, 318–325.
10. de Lima, R.A.F.; Batista, J.L.F.; Prado, P.I. Modeling tree diameter distributions in natural forests: An evaluation of 10 statistical models. For. Sci. 2015, 61, 320–327.
11. Abdollahnejad, A.; Panagiotidis, D.; Surovỳ, P. Forest canopy density assessment using different approaches—Review. J. For. Sci. 2017, 63, 107–116.
12. Petritan, A.M.; Biris, I.A.; Merce, O.; Turcu, D.O.; Petritan, I.C. Structure and diversity of a natural temperate sessile oak (Quercus petraea L.)—European Beech (Fagus sylvatica L.) forest. For. Ecol. Manag. 2012, 280, 140–149.
13. Bettinger, P.; Tang, M. Tree-level harvest optimization for structure-based forest management based on the species mingling index. Forests 2015, 6, 1121–1144.
14. Bao, Q.; Liang, X.W.; Weng, G.S. The Concept of Tree Crown Overlap and its Calculating Methods of Area. J. Northeast For. Univ. 1995, 23, 103–109.
15. Wang, J.M. Study on Decision Technology of Tending Felling for Larix Principis-Rupprechtii Plantation Forest. Ph.D. Thesis, Beijing Forestry University, Beijing, China, 2017.
16. Zhang, G.; Hui, G.; Zhao, Z.; Hu, Y.; Wang, H.; Liu, W.; Zang, R. Composition of basal area in natural forests based on the uniform angle index. Ecol. Inform. 2018, 45, 1–8.
17. Wan, P.; Zhang, G.; Wang, H.; Zhao, Z.; Hu, Y.; Zhang, G.; Hui, G.; Liu, W. Impacts of different forest management methods on the stand spatial structure of a natural Quercus aliena var. acuteserrata forest in Xiaolongshan, China. Ecol. Inform. 2019, 50, 86–94.
18. Yong, L.; Hao, Z.; Xianjun, W.; Zhiang, D.; Jianjun, L. Storey structure study of Cyclobalanopsis myrsinaefolia mixed stand based on storey index. For. Resour. Manag. 2012, 81–84.
19. Joshi, C.; De Leeuw, J.; Skidmore, A.K.; Van Duren, I.C.; Van Oosten, H. Remotely sensed estimation of forest canopy density: A comparison of the performance of four methods. Int. J. Appl. Earth Obs. Geoinf. 2006, 8, 84–95.
20. Tang, M.P.; Tang, S.Z.; Lei, X.D.; Li, X.F. Study on Spatial Structure Optimizing Model of Stand Selection Cutting. Sci. Silvae Sin. 2004, 40, 25–31.
21. Yu, Y.T. Study on Forest Structure of Different Recovery Stages and Optimization Models of Natural Mixed Spruce-fir Secondary Forests on Selective Cutting. Ph.D. Thesis, Beijing Forestry University, Beijing, China, 2019.
22. Hao, Y.L. Study on Cutting Tree Determining Method Based on Forest Stand Spatial Structure Optimization. Ph.D. Thesis, Chinese Academy of Forestry, Beijing, China, 2012.
23. Nelson, J.; Finn, S. The influence of cut-block size and adjacency rules on harvest levels and road networks. Can. J. For. Res. 1991, 21, 595–600.
24. Haight, R.G.; Travis, L.E. Wildlife conservation planning using stochastic optimization and importance sampling. For. Sci. 1997, 43, 129–139.
25. Barrett, T.; Gilless, J.; Davis, L. Economic and fragmentation effects of clearcut restrictions. For. Sci. 1998, 44, 569–577.
26. Boston, K.; Bettinger, P. An analysis of Monte Carlo integer programming, simulated annealing, and tabu search heuristics for solving spatial harvest scheduling problems. For. Sci. 1999, 45, 292–301.
27. Okasha, N.M.; Frangopol, D.M. Lifetime-oriented multi-objective optimization of structural maintenance considering system reliability, redundancy and life-cycle cost using GA. Struct. Saf. 2009, 31, 460–474.
28. Singh, H.K.; Ray, T.; Smith, W. Surrogate assisted simulated annealing (SASA) for constrained multi-objective optimization. In Proceedings of the IEEE Congress on Evolutionary Computation, Barcelona, Spain, 18–23 July 2010; pp. 1–8.
29. Cakir, B.; Altiparmak, F.; Dengiz, B. Multi-objective optimization of a stochastic assembly line balancing: A hybrid simulated annealing algorithm. Comput. Ind. Eng. 2011, 60, 376–384.
30. Liu, C.A. Research on Dynamic Multiobjective Optimization Evolutionary Algorithms. Nat. Sci. J. Hainan Univ. 2010, 28, 176–182.
31. Zhang, Y.; Gong, D.w.; Ding, Z.h. Handling multi-objective optimization problems with a multi-swarm cooperative particle swarm optimizer. Expert Syst. Appl. 2011, 38, 13933–13941.
32. Hu, C.Y.; Yao, H.; Yan, X.S. Multiple Particle Swarms Coevolutionary Algorithm for Dynamic Multi-Objective Optimization Problems and Its Application. J. Comput. Res. Dev. 2013, 50, 1313–1323.
33. Biswas, D.; Panja, S.; Guha, S. Multi objective optimization method by PSO. Procedia Mater. Sci. 2014, 6, 1815–1822.
34. Gunantara, N. A review of multi-objective optimization: Methods and its applications. Cogent Eng. 2018, 5, 1502242.
35. Moore, C.T.; Conroy, M.J.; Boston, K. Forest management decisions for wildlife objectives: System resolution and optimality. Comput. Electron. Agric. 2000, 27, 25–39.
36. Levine, S.; Finn, C.; Darrell, T.; Abbeel, P. End-to-end training of deep visuomotor policies. J. Mach. Learn. Res. 2016, 17, 1334–1373.
37. Wiering, M.A.; Van Otterlo, M. Reinforcement learning. Adapt. Learn. Optim. 2012, 12, 729.
38. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 2018.
39. Abels, A.; Roijers, D.; Lenaerts, T.; Nowé, A.; Steckelmacher, D. Dynamic weights in multi-objective deep reinforcement learning. In Proceedings of the International Conference on Machine Learning, PMLR, Long Beach, CA, USA, 9–15 June 2019; pp. 11–20.
40. Zou, F.; Yen, G.G.; Tang, L.; Wang, C. A reinforcement learning approach for dynamic multi-objective optimization. Inf. Sci. 2021, 546, 815–834.
41. Anderson, R.N.; Boulanger, A.; Powell, W.B.; Scott, W. Adaptive stochastic control for the smart grid. Proc. IEEE 2011, 99, 1098–1115.
42. Glavic, M.; Fonteneau, R.; Ernst, D. Reinforcement learning for electric power system decision and control: Past considerations and perspectives. IFAC-PapersOnLine 2017, 50, 6918–6927.
43. Haksar, R.N.; Schwager, M. Distributed deep reinforcement learning for fighting forest fires with a network of aerial robots. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 December 2018; pp. 1067–1074.
44. Moussa, N.; Nurellari, E.; Azbeg, K.; Boulouz, A.; Afdel, K.; Koutti, L.; Salah, M.B.; El Alaoui, A.E.B. A reinforcement learning based routing protocol for software-defined networking enabled wireless sensor network forest fire detection. Future Gener. Comput. Syst. 2023, 149, 478–493.
45. Tahvonen, O.; Suominen, A.; Malo, P.; Viitasaari, L.; Parkatti, V.P. Optimizing high-dimensional stochastic forestry via reinforcement learning. J. Econ. Dyn. Control. 2022, 145, 104553.
46. Malo, P.; Tahvonen, O.; Suominen, A.; Back, P.; Viitasaari, L. Reinforcement learning in optimizing forest management. Can. J. For. Res. 2021, 51, 1393–1409.
47. Shi, Y.; Xu, L.; Zhou, Y.; Ji, B.; Zhou, G.; Fang, H.; Yin, J.; Deng, X. Quantifying driving factors of vegetation carbon stocks of Moso bamboo forests using machine learning algorithm combined with structural equation model. For. Ecol. Manag. 2018, 429, 406–413.
48. Liu, Z.; Peng, C.; Work, T.; Candau, J.N.; DesRochers, A.; Kneeshaw, D. Application of machine-learning methods in forest ecology: Recent progress and future challenges. Environ. Rev. 2018, 26, 339–350.
49. Pang, Y.; Li, Y.; Feng, Z.; Feng, Z.; Zhao, Z.; Chen, S.; Zhang, H. Forest fire occurrence prediction in China based on machine learning methods. Remote Sens. 2022, 14, 5546.
50. Silva, J.P.M.; da Silva, M.L.M.; de Mendonça, A.R.; da Silva, G.F.; de Barros Junior, A.A.; da Silva, E.F.; Aguiar, M.O.; Santos, J.S.; Rodrigues, N.M.M. Prognosis of forest production using machine learning techniques. Inf. Process. Agric. 2021, 10, 71–84.
51. Neuville, R.; Bates, J.S.; Jonard, F. Estimating forest structure from UAV-mounted LiDAR point cloud using machine learning. Remote Sens. 2021, 13, 352.
52. Hamedianfar, A.; Mohamedou, C.; Kangas, A.; Vauhkonen, J. Deep learning for forest inventory and planning: A critical review on the remote sensing approaches so far and prospects for further applications. Forestry 2022, 95, 451–465.
53. Sani-Mohammed, A.; Yao, W.; Heurich, M. Instance segmentation of standing dead trees in dense forest from aerial imagery using deep learning. ISPRS Open J. Photogramm. Remote Sens. 2022, 6, 100024.
54. Liu, S. Research on the Analysis and Multi-objective Intelligent Optimization of Stand Structure of Natural Secondary Forest. Ph.D. Thesis, Central South University of Forestry and Technology, Changsha, China, 2017.
55. Liao, J.; Li, Z.; Quets, J.J.; Nijs, I. Effects of space partitioning in a plant species diversity model. Ecol. Model. 2013, 251, 271–278.
56. Magnussen, S.; Allard, D.; Wulder, M.A. Poisson Voronoi tiling for finding clusters in spatial point patterns. Scand. J. For. Res. 2006, 21, 239–248.
57. Sterner, R.W.; Ribic, C.A.; Schatz, G.E. Testing for life historical changes in spatial patterns of four tropical tree species. J. Ecol. 1986, 74, 621–633.
58. Hui, G.Y.; Hu, Y.B.; Zhao, Z.H. Research progress of structure-based forest management. For. Res. 2018, 31, 85–93.
59. Liu, S.; Zhang, J.; Li, J.J.; Zhou, G.X.; Wu, S.C. Edge Correction of Voronoi Diagram in Forest Spatial Structure Analysis. Sci. Silvae Sin. 2017, 53, 28–37.
60. Xiang, B. Study on the Structure Adjustment and Optimization of Quercus Secondary Forest in Hunan Province. Master’s Thesis, Central South University of Forestry and Technology, Changsha, China, 2019.
61. Su, J.W.; Li, L.F.; Zheng, W.; Yang, W.B.; Han, M.Y.; Huang, Z.M.; Xu, P.B.; Feng, Z.W. Effect of Intermediate Cutting Intensity on Growth of Pinus yunnanensis Plantation. J. West China For. Sci. 2010, 39, 27–32.
62. Han, M.Y.; Li, L.F.; Zheng, W.; Su, J.W.; Li, W.C.; Gong, J.B.; Zheng, S.H. Effects of different intensity of thinning on the improvement of middle-aged Yunnan pine stand. J. Cent. South Univ. For. Technol. 2011, 31, 27–33.
63. Zhang, G.; Hui, G.; Hu, Y.; Zhao, Z.; Guan, X.; von Gadow, K.; Zhang, G. Designing near-natural planting patterns for plantation forests in China. For. Ecosyst. 2019, 6, 1–13.
64. Johann, K. Der A-Wert ein objektiver Parameter zur Bestimmung der Freistellungsstärke von Zentralbäumen. Dtsch. Verb. Forstl. Forschungsanstalten–Sekt. Ertragskunde 1982, 25, 27.
65. Huang, J. The Research on Dynamic Multi-objectives Optimization of Forest Space Based on multi-swarm PSO Algorithm. Master’s Thesis, Central South University of Forestry and Technology, Changsha, China, 2017.
66. Li, J.J.; Li, J.P.; Liu, S.Q.; Zhang, H.W.; Feng, X.L. Homogeneity Index of Stand Spatial Structure of Mangrove Ecological System. Sci. Silvae Sin. 2010, 46, 6–14.
67. Qing, D.; Peng, J.; Li, J.; Liu, S.; Deng, Q. Genetic algorithm to solve the optimization problem of stand spatial structure. J. For. Environ. 2022, 42, 434–441.
68. Poudyal, B.H.; Maraseni, T.; Cockfield, G. Evolutionary dynamics of selective logging in the tropics: A systematic review of impact studies and their effectiveness in sustainable forest management. For. Ecol. Manag. 2018, 430, 166–175.
69. Ehbrecht, M.; Schall, P.; Ammer, C.; Seidel, D. Quantifying stand structural complexity and its relationship with forest management, tree species diversity and microclimate. Agric. For. Meteorol. 2017, 242, 1–9.
70. Silva, P.H.d.; Gomide, L.R.; Figueiredo, E.O.; Carvalho, L.M.T.d.; Ferraz-Filho, A.C. Optimal selective logging regime and log landing location models: A case study in the Amazon forest. Acta Amaz. 2018, 48, 18–27.
71. Cazzolla Gatti, R.; Castaldi, S.; Lindsell, J.A.; Coomes, D.A.; Marchetti, M.; Maesano, M.; Di Paola, A.; Paparella, F.; Valentini, R. The impact of selective logging and clearcutting on forest structure, tree diversity and above-ground biomass of African tropical forests. Ecol. Res. 2015, 30, 119–132.
72. Li, J.J.; Liu, S.; Zhang, H.R.; Kuang, Z.F.; Wang, C.L.; Zhang, J.; Cao, X.P. Heterogeneity evaluation of forest ecological system spatial structure in Dongting Lake. Acta Ecol. Sin. 2013, 33, 3732–3741.
73. Li, N. The Research on Dynamic multi-objective optimization model of forest structure under CMIP5 model. Master’s Thesis, Central South University of Forestry and Technology, Changsha, China, 2019.
74. Chen, Y.T.; Chang, C.T. Multi-coefficient goal programming in thinning schedules to increase carbon sequestration and improve forest structure. Ann. For. Sci. 2014, 71, 907–915.
75. Zhou, C.; Wu, Z.; Zhou, X.; Zeng, H.; Zeng, L. Impact of Cutting Intensity on Forest Stand Growth and Understand Species Diversity of the Mixed Plantations of Cunninghamia lancelata and Broadleaved Trees. For. Eng. 2023, 39, 46–53.
76. Alsuwaiyel, M.H. Algorithms: Design Techniques and Analysis; World Scientific: Singapore, 2021; Volume 15.
77. Zenil, H. A review of methods for estimating algorithmic complexity: Options, challenges, and new directions. Entropy 2020, 22, 612.
78. Oliveto, P.S.; Witt, C. Improved time complexity analysis of the simple genetic algorithm. Theor. Comput. Sci. 2015, 605, 21–41.
79. Roy, V. Convergence diagnostics for Markov chain Monte Carlo. Annu. Rev. Stat. Its Appl. 2020, 7, 387–412.
80. Sobol, I.M. A Primer for the Monte Carlo Method; CRC Press: Boca Raton, FL, USA, 2018.
81. Zio, E.; Zio, E. Monte Carlo Simulation: The Method; Springer: Cham, Switzerland, 2013.
82. Szepesvári, C. Algorithms for Reinforcement Learning; Springer Nature: Berlin/Heidelberg, Germany, 2022.
83. Shresthamali, S.; Kondo, M.; Nakamura, H. Multi-Objective Resource Scheduling for IoT Systems Using Reinforcement Learning. J. Low Power Electron. Appl. 2022, 12, 53.
84. Fang, M.; Li, Y.; Cohn, T. Learning how to active learn: A deep reinforcement learning approach. arXiv 2017, arXiv:1708.02383.
85. Jain, M.; Saihjpal, V.; Singh, N.; Singh, S.B. An overview of variants and advancements of PSO algorithm. Appl. Sci. 2022, 12, 8392.
86. Jordehi, A.R. Enhanced leader PSO (ELPSO): A new PSO variant for solving global optimisation problems. Appl. Soft Comput. 2015, 26, 401–417.
87. Rubinstein, R.Y.; Kroese, D.P. Simulation and the Monte Carlo Method; John Wiley & Sons: Hoboken, NJ, USA, 2016.
88. Botvinick, M.; Ritter, S.; Wang, J.X.; Kurth-Nelson, Z.; Blundell, C.; Hassabis, D. Reinforcement learning, fast and slow. Trends Cogn. Sci. 2019, 23, 408–422.
89. Garg, H. A hybrid PSO-GA algorithm for constrained optimization problems. Appl. Math. Comput. 2016, 274, 292–305.
90. Das, H.; Jena, A.K.; Nayak, J.; Naik, B.; Behera, H. A novel PSO based back propagation learning-MLP (PSO-BP-MLP) for classification. In Computational Intelligence in Data Mining-Volume 2: Proceedings of the International Conference on CIDM, 20–21 December 2014; Springer: New Delhi, India, 2015; pp. 461–471.
91. Bansal, J.C. Particle swarm optimization. In Evolutionary and Swarm Intelligence Algorithms; Springer: Cham, Switzerland, 2019; pp. 11–23.
92. Stephens, S.L.; Lydersen, J.M.; Collins, B.M.; Fry, D.L.; Meyer, M.D. Historical and current landscape-scale ponderosa pine and mixed conifer forest structure in the Southern Sierra Nevada. Ecosphere 2015, 6, 1–63.

Figure 1. Description of study sites: Cangshan Mountain, Yunnan Province, southwest China. P1–P4 designate the plot locations.

Figure 2. Enhancing precision in sample plot edge correction using the buffer zone method. Note: The cells represent the effective range around each tree within the plot, while the dots represent the location of trees. a represents the diameter of the plot, and b stands for the buffer zone width, with both units measured in meters. The determination of the buffer zone width is influenced by multiple factors. When partitioning buffer zones for plot adjustments, careful consideration should be given to plot size, analysis methods for stand structure indices, and geographical location. This study thoroughly takes into account the aforementioned influencing factors and, based on the existing research and experience [60], sets the buffer zone width at 2 m.

Figure 3. Illustration of the random selection method (RSM) process flow.

Figure 4. Illustration of the V-map method (VMM) process flow.

Figure 5. Flowchart of the MC solution process for MOFSS.

Figure 6. Flowchart of the PSO solution process for MOFSS.

Figure 7. Flowchart of the RL solution process for MOFSS.

Figure 8. Alterations in stand structure indices for different optimization scenarios in different sites. Note: Horizontal coordinates represent stand structure indices of various optimization schemes after selective cutting: D is the number of diameter classes in the forest stand after selective cutting, T is the number of tree species in the forest stand after selective cutting, $CD$ is the canopy density of the forest stand after selective cutting, W is the difference between the uniform angle index and the median value of the range of random distributions in the forest stand after selective cutting, $CI$ is the crown competition index after selective cutting, M is the mingling index after selective cutting, S is the storey index after selective cutting, $OP$ is the open comparison after selective cutting, VOF is the objective function value after selective cutting. Within our study, we have employed three distinct selective logging methods, along with three corresponding model-solving algorithms. These combinations have resulted in the formulation of nine unique scenarios for optimizing stand structure, denoted M1 to M9. The specific correspondence between these nine scenarios is meticulously detailed in Table 2. The vertical coordinate is the magnitude of change in each index before and after selective cutting.

Figure 9. Optimal outcomes for each optimization strategy. Note: M1–9 represent nine distinct scenarios for optimizing stand structure, with the specific compositions of these scenarios provided in Table 2.

Figure 10. Convergence status of individual optimization strategies across varied sites. Note: This figure illustrates the convergence behavior of each optimization scheme (M1–9) during a simulated selective cutting experiment conducted on four sample sites (P1–4).

Table 1. Key information about sample plot attributes.

Sample Plots	East Long.	North Lat.	Elevation (m)	Slope (°)	Slope Dir.	Sample Plot Radius (m)	Tree Species Composition	Stand Density (Trees/ha)
P1	100°08.2149″	25°41.5280″	2254	13.45	East	35	8 PY-2 PA-BA-TG	1481
P2	100°10.9639″	25°38.1518″	2271	16.15	South	32	7 PY-3 PA	1822
P3	100°09.3947″	25°39.9506″	2195	17.70	NE	20	7 PY-3 PA-QA-VB-GG-BA	1830
P4	100°07.1906″	25°43.5923″	2138	5.10	NE	19	10 PY-QA	1975

Note: NE, northeast; PY, Pinus yunnanensis; PA, Pinus armandii; QA, Quercus acutissima; VB, Vaccinium bracteatum; GG, Gaultheria griffithiana; BA, Betula alnoides; TG, Ternstroemia gymnanthera; DBH, diameter at breast height. The column ‘Tree Species Composition’ lists the tree species present in each plot. The numerical value preceding a species indicates the proportion of that species for every 10 trees within the plot.

Table 2. Nine optimization schemes for FSS.

	RSM	QVM	VMM
MC	r-MC(M1)	Q-MC(M2)	V-MC(M3)
PSO	r-PSO(M4)	Q-PSO(M5)	V-PSO(M6)
RL	r-RL(M7)	Q-RL(M8)	V-RL(M9)

Note: The abbreviations correspond to the following methods and algorithms: “r” denotes random selection method (RSM), “Q” represents Q-value method (QVM), “V” signifies V-map method (VMM), “MC” stands for Monte Carlo algorithm, “PSO” refers to particle swarm optimization algorithm, and “RL” indicates reinforcement learning algorithm. These three methodologies for devising the proposed selection felling approach are paired with three distinct algorithms for solving the model, resulting in the generation of a total of nine schemes for MOFSSs.

Table 3. Optimizing algorithm parameter configuration.

Algorithms	Parameters and Parameter Values	Value Meaning
MC	$I = 0$	Initial iteration
	$I_{max} = 10{,}000$	Maximum number of iterations
	$U = 0$	Initial number of consecutive iterations without OF improvement
	$U_{max} = 500$	Maximum number of consecutive iterations without OF improvement
PSO	$I = 0$	Initial iteration
	$I_{max} = 10{,}000$	Maximum number of iterations
	$p = 20$	Number of particles
	$w = 0.5$	Inertia weights
	$c_1 = 0.5$	Individual particle learning factor
	$c_2 = 0.5$	Global learning factor
RL	$I = 0$	Initial iteration
	$I_{max} = 10{,}000$	Maximum number of iterations
	$state = 0$	Initial position of the agent
	$state_{max} = 100$	Farthest position the agent is allowed to move to
	$a = 150$, $b = 10$, $c = -1$, $d = 1$	Reward and penalty values
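
To make the interplay of the two MC stopping parameters in Table 3 concrete, the following is a minimal sketch of a random search that halts either after $I_{max}$ iterations or after $U_{max}$ consecutive iterations without improvement of the objective function (OF). The plan encoding, the felling probability `fell_prob`, and `toy_objective` are hypothetical placeholders chosen for this example; the study's actual objective function, constraints, and tree selection methods are not reproduced here.

```python
import random


def mc_search(objective, n_trees, i_max=10_000, u_max=500, fell_prob=0.2, seed=0):
    """Random search over felling plans with the two stopping rules of Table 3."""
    rng = random.Random(seed)
    best_plan = [0] * n_trees          # 0 = retain the tree, 1 = fell it
    best_value = objective(best_plan)
    i = 0                              # I: iteration counter
    u = 0                              # U: consecutive iterations without OF improvement
    while i < i_max and u < u_max:
        # draw a random candidate plan (assumed sampling scheme)
        plan = [1 if rng.random() < fell_prob else 0 for _ in range(n_trees)]
        value = objective(plan)
        if value > best_value:
            best_plan, best_value = plan, value
            u = 0                      # improvement found: reset the stall counter
        else:
            u += 1
        i += 1
    return best_plan, best_value, i


def toy_objective(plan):
    """Hypothetical score: felling even-indexed trees helps, odd-indexed trees hurts."""
    gain = sum(1 for k, cut in enumerate(plan) if cut and k % 2 == 0)
    loss = sum(1 for k, cut in enumerate(plan) if cut and k % 2 == 1)
    return gain - loss
```

A call such as `mc_search(toy_objective, n_trees=12)` returns the best plan found, its score, and the number of iterations used, which shows whether the run stopped on the stall limit or on the iteration cap.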

Table 4. The variation in VOFs before and after simulated cutting.

VOF		P1	P2	P3	P4	Average Value of Scheme	Improvement Magnitude of Scheme	Average Improvement Magnitude of Scheme
Before Cutting		0.3515	0.2814	0.3748	0.4812	0.3722
After Cutting	M1	0.3892	0.3983	0.4887	0.5537	0.4575	22.92%	17.42%
	M2	0.3858	0.3420	0.4588	0.5180	0.4261	14.48%
	M3	0.3676	0.3066	0.4144	0.5109	0.3999	7.44%
	M4	0.3975	0.3705	0.4560	0.5292	0.4383	17.76%
	M5	0.3933	0.3773	0.4729	0.5593	0.4507	21.09%
	M6	0.4009	0.3877	0.4675	0.5373	0.4484	20.47%
	M7	0.4121	0.4078	0.5047	0.5635	0.4720	26.81%
	M8	0.3916	0.3866	0.4610	0.5186	0.4395	18.08%
	M9	0.3663	0.3076	0.4144	0.5149	0.4008	7.68%
Average Value of Sample Plot		0.3894	0.3649	0.4598	0.5340

Note: The ‘Average Value of Sample Plot’ represents the average maximum objective function value achieved by the 9 optimization schemes within a specific sample plot. The ‘Average Value of Scheme’ represents the mean of the maximum VOFs obtained by a particular optimization scheme across the simulated cutting experiments in the four sample plots. The ‘Improvement Magnitude of Scheme’ signifies the percentage increase in the mean of the maximum VOFs relative to the mean initial VOFs before cutting in the four sample plots for a specific optimization scheme. The ‘Average Improvement Magnitude of Scheme’ denotes the percentage increase in the mean of the maximum VOFs achieved by the nine optimization schemes in all sample plots relative to the mean initial VOFs before cutting in the four sample plots.
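
The definitions in the note can be checked against the tabulated values with a short calculation. The sketch below recomputes the scheme averages and improvement magnitudes from the VOFs in Table 4; the function and variable names are illustrative. Because the published percentages appear to be derived from averages rounded to four decimals, the recomputed figures can differ from the table in the second decimal place.

```python
# VOFs from Table 4, ordered as P1, P2, P3, P4
BEFORE = [0.3515, 0.2814, 0.3748, 0.4812]
AFTER = {
    "M1": [0.3892, 0.3983, 0.4887, 0.5537],
    "M2": [0.3858, 0.3420, 0.4588, 0.5180],
    "M3": [0.3676, 0.3066, 0.4144, 0.5109],
    "M4": [0.3975, 0.3705, 0.4560, 0.5292],
    "M5": [0.3933, 0.3773, 0.4729, 0.5593],
    "M6": [0.4009, 0.3877, 0.4675, 0.5373],
    "M7": [0.4121, 0.4078, 0.5047, 0.5635],
    "M8": [0.3916, 0.3866, 0.4610, 0.5186],
    "M9": [0.3663, 0.3076, 0.4144, 0.5149],
}


def mean(values):
    return sum(values) / len(values)


def improvement_pct(after, before=BEFORE):
    """Percentage increase of the mean VOF after cutting over the mean VOF before cutting."""
    return (mean(after) - mean(before)) / mean(before) * 100.0


def overall_improvement_pct(after_by_scheme=AFTER, before=BEFORE):
    """Same measure, using the mean over all schemes and all plots."""
    scheme_means = [mean(v) for v in after_by_scheme.values()]
    return (mean(scheme_means) - mean(before)) / mean(before) * 100.0


if __name__ == "__main__":
    for name, values in AFTER.items():
        print(f"{name}: mean VOF {mean(values):.4f}, improvement {improvement_pct(values):.2f}%")
    print(f"All schemes: {overall_improvement_pct():.2f}%")
```

Ranking the schemes by `improvement_pct` places M7 first, consistent with the table.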

Table 5. Running time taken to attain the maximum VOF across different sites for each strategy.

Running Time (min)	M1	M2	M3	M4	M5	M6	M7	M8	M9
P1	7847.881	4997.854	9793.233	4198.420	3404.310	3401.141	3398.441	7394.313	3400.292
P2	4845.591	3450.688	2635.718	5053.131	4240.855	5058.874	7229.825	4242.174	2641.545
P3	7892.756	2645.365	6600.294	7424.275	5039.715	5039.071	7372.265	8207.307	5007.491
P4	12,947.977	3475.821	6615.058	6632.085	3459.199	5033.976	5035.914	4242.309	6626.676

Note: This table illustrates the time required for each optimization strategy to achieve the maximum VOF across four different sites. Each column represents a specific strategy (M1–M9), and each row denotes a different site (P1–P4). The values in the table represent the time (in minutes) taken by each strategy to reach the maximum VOF in the respective site. These findings serve as a reference for the time efficiency of each method in achieving optimization. To keep the main text concise, the line graphs illustrating the runtime and VOFs of the optimization models across the simulated FSSs have been moved to Figure A2. That figure further emphasizes the advantages of RL in multi-objective optimization of FSSs.

Table 6. Partial selective cutting information.

Sample Site	Optimal Scheme	Tree Number	DBH (cm)	TH (m)	CW (m)	Tree Species
P1	M7	76	11.90	10.40	1.43	PY
	M7	234	7.80	7.20	1.78	PA
	M7	335	11.70	15.00	1.35	PY
	M7	519	14.80	12.40	1.85	PY
	M7	638	9.70	10.52	1.15	PY
P2	M7	66	15.50	7.80	1.95	PY
	M7	224	10.50	7.50	1.43	PY
	M7	319	8.50	5.70	2.08	PA
	M7	505	5.00	4.10	1.10	PY
	M7	670	19.40	12.75	2.25	PY
P3	M7	61	22.10	15.85	1.75	PY
	M7	143	16.00	13.27	1.45	PY
	M7	160	7.30	6.40	1.26	PY
	M7	275	9.40	8.67	1.24	PY
	M7	299	13.20	7.36	2.10	PY
P4	M7	19	8.32	7.67	0.93	PY
	M7	68	7.70	6.27	1.19	PY
	M7	145	22.80	11.20	1.93	PY
	M7	194	5.62	6.50	1.58	PY
	M7	247	16.15	11.56	2.00	PY

Note: PY, Pinus yunnanensis; PA, Pinus armandii; DBH, diameter at breast height; TH, tree height; CW, crown width. This table presents partial selective cutting information output by the optimal scheme (M7) for each sample site.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
