Prediction on Moisture Content of Living Trees Using a Multi-Scale One-Dimensional Convolutional Neural Network with Attention Mechanism Based on Data Augmentation

Guo, Jiaxing; Cool, Julie; Luo, Chaoguang; Zhong, Yan; Ji, Fengfeng; Yu, Kuanjie; Qin, Ruixia; Xu, Huadong; Hu, Yanbo

doi:10.3390/f17050618

Open Access Article


by Jiaxing Guo ¹, Julie Cool ², Chaoguang Luo ³, Yan Zhong ¹, Fengfeng Ji ¹, Kuanjie Yu ¹, Ruixia Qin ¹, Huadong Xu ¹,* and Yanbo Hu ⁴,*

¹ College of Mechanical and Electrical Engineering, Northeast Forestry University, Harbin 150040, China

² Centre for Advanced Wood Processing, Department of Wood Science, The University of British Columbia, Vancouver, BC V6T 1Z4, Canada

³ School of Management, Zhejiang University, Hangzhou 310058, China

⁴ College of Life Sciences, Northeast Forestry University, Harbin 150040, China

* Authors to whom correspondence should be addressed.

Forests 2026, 17(5), 618; https://doi.org/10.3390/f17050618

Submission received: 26 March 2026 / Revised: 2 May 2026 / Accepted: 14 May 2026 / Published: 20 May 2026

(This article belongs to the Section Forest Inventory, Modeling and Remote Sensing)


Abstract

A nondestructive, rapid, and portable method for detecting moisture content (MC) in living tree trunks is not yet available. Tree radar, based on ground-penetrating radar (GPR) technology, is a promising approach for trunk MC detection owing to its high penetration depth and low susceptibility to environmental interference. However, its application to living trees is constrained by curvature-induced complexity in wave propagation, interspecific structural heterogeneity, and the limited number of labeled MC samples obtainable through destructive coring, which collectively lead to poor model performance. This study proposes a novel GPR-based MC detection method employing a multi-scale one-dimensional convolutional neural network integrated with an attention mechanism and mixed data augmentation (mixed-MS1DCNNAM). GPR amplitude data from the first 6.5 ns of B-scan signals, extracted with a custom program developed in MATGPR, were used to capture MC-related features. A mixed model for four tree species with diameters at breast height (DBH) of 15–30 cm achieved an R² of 0.7908 and an RMSE of 0.1059, outperforming traditional models; test metrics were calculated at the tree level by averaging predictions from five directional GPR scans per tree. Furthermore, three DBH-specific sub-models (15–20 cm, 20–25 cm, and 25–30 cm) and four single-species sub-models were developed, yielding improved performance (DBH sub-models: R² ≥ 0.7246, RMSE ≤ 0.1033; single-species sub-models: RMSE ≤ 0.0959, MAE ≤ 0.0626, except for European white birch). These results highlight the effectiveness of stratification by DBH class and tree species. Overall, this study addresses the aforementioned challenges and establishes a generalizable nondestructive approach for living trees under field conditions, supporting sustainable forest management in tree growth monitoring, forest disaster monitoring, harvested timber storage, and wood quality assessment.

Keywords:

GPR signals; living tree moisture content; machine learning; non-destructive testing; sustainable forest management

1. Introduction

Living trees constitute an indispensable component of forest ecosystems [1,2]. They sustain atmospheric oxygen levels through photosynthesis, provide habitats for wildlife, support biodiversity, mitigate soil erosion, and contribute to water purification [3]. Moisture content (MC) in living trees is a key indicator of tree physiological status and directly regulates tree growth and development [4,5].

In sustainable forest management, accurate MC detection in living trees enables timely identification of water stress, thereby supporting optimal physiological function and reducing the risks of fire, pest outbreaks, and storm damage [5,6]. Previous studies have demonstrated that MC significantly affects non-destructive assessments of living tree biomass [7] as well as the evaluation of wood physical properties such as elastic modulus and bending strength [8]. Reliable MC estimation can therefore facilitate preliminary prediction of wood quality and mechanical performance. Moreover, accurate MC measurement supports raw material sorting, which can improve drying efficiency and reduce costs in industrial wood production [9].

The MC of living trees frequently fluctuates during growth due to variations in climate, season, soil properties, tree species, and age, making accurate MC estimation challenging [10,11]. Researchers have explored diverse non-destructive approaches for measuring MC in living trees, but each faces inherent limitations that hinder field applicability. For example, γ-ray methods pose potential safety risks [12]. Nuclear magnetic resonance and computed tomography are costly and operationally impractical for field use [13,14,15,16]. Resistance methods and time domain reflectometry (TDR) both require pre-drilling, causing irreversible damage to living trees [6,17,18]. Currently available handheld wood MC meters, including two-pin resistance-based devices (e.g., MD914 and KT-80) and pinless electromagnetic meters (e.g., KT-50 and Proster), are primarily designed for flat surfaces, and their effective detection depth is typically limited to approximately 5 cm, making them unsuitable for measuring trunk MC in living trees [19]. In short, existing technologies cannot simultaneously satisfy the core demands of field-based MC detection: rapidity, accuracy, non-destructiveness, and safety.

Tree radar is a non-destructive testing method based on ground-penetrating radar (GPR) technology and has shown considerable potential for MC detection in living trees [18,20]. GPR signals can penetrate tree trunks and provide information on internal wood conditions, including MC and the extent of decay. By measuring reflected electromagnetic signals, GPR can detect the significant dielectric differences in wood caused by varying MC [9,21,22,23]. For instance, Hans et al. (2015) combined amplitude data from GPR signals with a partial least squares regression (PLSR) model to estimate MC through the bark of logs from three tree species [9]. However, prediction accuracy was relatively low for thawed logs with diameters ranging from 10.4 to 34.5 cm (R² = 0.56–0.83, RMSE = 0.07–0.14), and log curvature was shown to have a significant impact on GPR signal amplitude and, consequently, on MC detection accuracy [9]. Subsequently, Duchesne et al. (2023) developed predictive models based on GPR signals and locally weighted partial least squares regression to estimate MC in logs with diameters ranging from 12 to 32 cm (R² = 0.76–0.87, RMSE = 0.108–0.163) [24]. To mitigate the impact of log curvature, Guo et al. (2023) extracted 31 feature parameters from GPR signals and input them into a backpropagation neural network (BPNN) to detect MC in small-diameter (10–15 cm) logs, achieving improved prediction accuracy (R² = 0.9758–0.9846, RMSE = 0.036–0.043) [25]. Moreover, other machine learning models, such as gradient boosting decision trees with hyperparameter optimization (R² = 0.9635, RMSE = 0.0132) [26] and the group method of data handling (GMDH) (RMSE = 0.052) [27], have also been successfully applied to predict wood MC in different application scenarios. Collectively, these studies highlight the considerable potential of machine learning techniques for wood MC detection.

During field detection, GPR signals are influenced by various factors such as temperature, humidity, surface curvature, and tree species, which increases signal variability. Additionally, obtaining MC through core sampling causes substantial damage to living trees, thereby limiting large-scale data collection. Since model performance strongly depends on the quantity and diversity of training data, data augmentation techniques are essential for mitigating data scarcity and improving dataset representativeness. In recent years, these techniques have been increasingly applied across diverse fields to expand datasets and improve model performance. For example, Le Guennec et al. (2016) proposed data augmentation strategies, including window slicing and window warping, to address insufficient training data for convolutional neural networks in time-series classification [28]. Similarly, Um et al. (2017) combined rotation, permutation, and time-warping operations with a convolutional neural network to classify the motion states of Parkinson's disease patients from wearable-sensor data [29]. Their results showed a substantial improvement in classification accuracy, from 77.54% to 86.88%.

Convolutional neural networks (CNNs), as classic deep learning models, incorporate local perception, weight sharing, and down-sampling [28,30,31]. These design principles effectively reduce network complexity and endow CNNs with strong robustness. For example, Li et al. (2024) employed a CNN-based model with near-infrared spectroscopy data for the non-destructive geographical origin identification of red dates from four regions in Xinjiang, achieving a classification accuracy of 86.67% on a dataset of 400 samples [32]. Such results suggest that one-dimensional CNNs (1DCNNs) are capable of end-to-end feature learning from raw signals through properly configured convolutional and pooling operations. Nevertheless, a single-scale convolutional kernel restricts the network to features of a fixed receptive field, which limits its ability to capture multi-scale signal characteristics and results in incomplete feature representations. In contrast, multi-scale one-dimensional convolutional neural networks (MS1DCNNs) employ convolutional kernels with different receptive fields to extract features at multiple scales, providing a more comprehensive representation of complex, high-dimensional signal data. Wang et al. (2022) showed that an MS1DCNN can effectively capture multi-scale features from fruit vibration spectra for peach firmness prediction (R² = 0.844, RMSE = 0.429 N/mm) and weight prediction (R² = 0.794, RMSE = 29.954 g) [33]. Wang et al. (2025) also demonstrated that multi-scale temporal modeling can effectively capture hierarchical patterns, including global trends and subtle local variations, highlighting the advantages of multi-scale approaches for complex sequence modeling [34].

Attention mechanisms have been demonstrated to enhance model performance by dynamically reweighting features, thereby emphasizing task-relevant local information while suppressing less informative or redundant features [31,35,36]. When integrated with convolutional neural networks, attention mechanisms can further improve prediction accuracy. For example, Zhao et al. (2024) proposed a convolutional neural network-spatial channel attention-bidirectional long short-term memory model with dual attention mechanisms, achieving more accurate estimation of forest canopy height [36]. Li et al. (2024) developed a fault diagnosis method for critical mechanical components by incorporating a hybrid attention mechanism and improved convolutional layers, resulting in approximately 9% higher prediction accuracy compared with other methods [35].

In this study, a non-destructive approach for accurate MC detection in living trees based on GPR technology was proposed. To address the pronounced variability of GPR signals collected under field conditions and the limited availability of MC data, a multi-scale one-dimensional convolutional neural network integrating data augmentation and an attention mechanism was developed. This framework was expected to enhance the extraction of informative features across multiple scales and improve sensitivity to MC-related signal characteristics, thereby providing a feasible solution for MC detection in living trees that supports forest disaster monitoring and forest management.

2. Materials and Methods

2.1. Selection of Living Trees

This study was conducted in the Mao'ershan Forest Farm and the Da Hinggan Ling region in Northeast China, covering diverse growth environments, from May to October in 2021–2023. The study focused on four economically important tree species commonly found in Northeast China: Picea rubens Sarg. (Red spruce), Larix gmelinii (Dahurian larch), Betula pubescens (European white birch), and Fraxinus mandshurica (Manchurian ash). The typical wood density of these species ranges from 0.35 to 0.80 g/cm³. In total, measurements were conducted on 307 living trees with DBH values between 15 and 30 cm.

2.2. GPR Data Acquisition in Living Trees

The experimental temperature ranged from 5 to 35 °C. A tree radar (TRU-900, TreeRadar Inc., Silver Spring, MD, USA) operating at 900 MHz, with a radar gain of 0 and a dielectric constant setting of 10, was used to obtain B-scan images of each living tree from five different directions at breast height (Figure 1a). Wood cores were then extracted at breast height using an increment borer, promptly placed in pre-weighed centrifuge tubes, and sealed with the tube caps (Figure 1b). A single wood core was collected from the first measurement direction, and its MC was used as the reference value for the corresponding tree; this tree-level MC value was assigned to all five directional GPR scans acquired at breast height. It was assumed that the MC of the core sample adequately represents the average moisture condition across the breast-height cross-section. Collecting only one core per tree minimizes wounding stress and potential physiological disturbance to the tree.

To prevent pathogenic infection, the bore holes were sealed with neutral-cure silicone sealant. The samples were then weighed and dried at 105 °C without caps for approximately 6 h, until no further change in weight was observed. After drying, the tubes were immediately capped and re-weighed (Figure 1c).

Finally, the MC of wood cores was calculated using Equation (1).

MC = \frac{(M_{w} - M_{t}) - (M_{d} - M_{t})}{M_{d} - M_{t}} \times 100\%

(1)

where MC is the moisture content of the wood core (dry basis), M_w and M_d are the weights of the centrifuge tube containing the wet and the oven-dried core, respectively, and M_t is the weight of the empty centrifuge tube.
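Equation (1) can be checked with a small helper. This is a minimal sketch, not the authors' code; the function name and the example weights are hypothetical. Because the tube weight M_t cancels in the numerator, the result is the mass of water lost on drying divided by the oven-dry mass of the core.

```python
def wood_core_mc(m_wet: float, m_dry: float, m_tube: float, as_percent: bool = True) -> float:
    """Dry-basis moisture content of a wood core, following Eq. (1).

    m_wet:  weight of tube + wet core (g)
    m_dry:  weight of tube + oven-dried core (g)
    m_tube: weight of the empty tube (g)
    """
    water = (m_wet - m_tube) - (m_dry - m_tube)  # mass of water lost on drying
    dry_core = m_dry - m_tube                    # oven-dry mass of the core
    if dry_core <= 0:
        raise ValueError("oven-dry core mass must be positive")
    mc = water / dry_core                        # fractional (dry-basis) MC
    return mc * 100.0 if as_percent else mc

# Hypothetical weights, not measurements from this study.
print(wood_core_mc(13.0, 12.0, 10.0))                    # percent
print(wood_core_mc(13.0, 12.0, 10.0, as_percent=False))  # fraction
```

The fractional form is the scale on which RMSE and MAE are reported later in the paper.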

Amplitude information was extracted from the central trace of the GPR time-domain waveforms derived from each B-scan. At the beginning and end of a scan, slight variations in antenna positioning may occur due to unavoidable hand movement, increasing measurement noise. Focusing on the central trace minimizes such edge effects and motion-induced disturbances, thereby improving the stability and reliability of the extracted signal. The extraction was performed with a custom MATLAB R2022b script built on MATGPR. A relationship between GPR signal amplitude and MC was observed, particularly in the first and second peak-valley regions (Figure 2). Hans et al. (2015) similarly noted that MC variations resulted in consistent positional offsets of wave amplitudes, which may affect the accuracy of MC prediction models [9]. To reduce the computational complexity and processing time associated with long waveforms, only the GPR signal amplitudes within the first 6.5 ns were used as input to the MC prediction model. No additional noise filtering was applied because the signals were relatively smooth.
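The trace-selection step can be illustrated in NumPy. This is a hypothetical sketch, not the authors' MATLAB/MATGPR script. It makes three assumptions: the B-scan is stored as a (time samples × traces) array, time zero is the first sample, and the sampling interval `dt_ns` is known from the instrument. The text does not state the sampling interval, so the sketch leaves it as a parameter rather than reproducing the 68-sample input length.

```python
import numpy as np

def central_trace_window(bscan: np.ndarray, dt_ns: float, window_ns: float = 6.5) -> np.ndarray:
    """Return the early-time amplitudes of the central trace of a B-scan.

    bscan:     2-D array of shape (n_samples, n_traces); rows are time samples,
               columns are traces along the scan (layout assumed).
    dt_ns:     sampling interval in ns (instrument-specific, assumed known).
    window_ns: length of the retained time window (6.5 ns in this study).
    """
    n_samples, n_traces = bscan.shape
    center = n_traces // 2  # middle trace, away from start/end hand-motion effects
    # Keep samples whose time t = k * dt_ns (k = 0, 1, ...) lies within the window;
    # the small epsilon guards against floating-point round-off at the boundary.
    n_keep = min(int(np.floor(window_ns / dt_ns + 1e-9)) + 1, n_samples)
    return bscan[:n_keep, center].copy()
```

Choosing the middle column reflects the rationale in the text: the first and last traces are the ones most affected by hand movement at the start and end of a scan.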

2.3. Data Augmentation Methods

2.3.1. Scaling Data Augmentation

Scaling data augmentation applies controlled numerical scaling transformations to increase data diversity, simulating amplitude variations encountered in real scenarios. A scaling factor s is first randomly sampled from the uniform distribution U(a, b) over a specified interval [a, b] (a < b). The probability density function of U(a, b) is f(x) = 1/(b - a), so every point within [a, b] has the same probability density, which provides diverse ratios for the subsequent scaling operations. In this study, the interval was set to [0.9, 1.1].

Given a one-dimensional data sequence {x_i} (i = 1, 2, …, n), where n is the number of data points and x_i is the value of the i-th data point, the scaling operation yields a new sequence {y_i} (i = 1, 2, …, n):

y_{i} = s \cdot x_{i}, \quad i = 1, 2, \dots, n

(2)

where s is the scaling factor generated as described above. When s > 1, the data are numerically stretched (amplified); when s < 1, the data are numerically compressed (attenuated).

Three new data samples of different scales are generated through the above-described scaling operation, and these samples, together with the original data, constitute the dataset after scaling data augmentation.
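A minimal NumPy sketch of scaling augmentation, assuming one global factor s per augmented copy, as Equation (2) implies. The function name and the choice to return the sampled factors are illustrative and are not taken from the authors' implementation.

```python
import numpy as np

def scaling_augment(x, n_aug=3, low=0.9, high=1.1, rng=None):
    """Scaling augmentation, Eq. (2): y_i = s * x_i with s ~ U(low, high).

    Returns an array of shape (n_aug + 1, n): the original sequence followed by
    n_aug scaled copies, plus the sampled scaling factors.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    factors = rng.uniform(low, high, size=n_aug)  # one global factor per copy
    scaled = factors[:, None] * x[None, :]        # broadcast each factor over all points
    return np.vstack([x[None, :], scaled]), factors
```

With `n_aug=3`, each original sequence becomes four training rows, matching the "three new samples plus the original" description.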

2.3.2. Noise Data Augmentation

Noise data augmentation injects stochastic noise into time-series data to increase dataset diversity. Noise values are drawn from a normal distribution N(μ, σ²) with μ = 0 and σ = 0.1, i.e., N(0, 0.1²), whose probability density function is

f (x) = \frac{1}{σ \sqrt{2 π}} e^{- \frac{(x - μ)^{2}}{2 σ^{2}}}

This distribution generates random values fluctuating around the mean μ with a standard deviation of σ, enabling diverse noise-induced disturbances to be injected into the time series.

For an original time series {x_i} (i = 1, 2, …, n), where n is the number of data points and x_i is the value of the i-th data point, a noise sequence {ε_i} (i = 1, 2, …, n) is first generated. This noise sequence has the same length as the original time series and follows the N(0, 0.1²) distribution. A new sequence {y_i} (i = 1, 2, …, n) is then generated by adding the noise to the original data:

y_{i} = x_{i} + ε_{i}, \quad i = 1, 2, \dots, n

(3)

This adds normally distributed perturbations to the original data, simulating real-world noise interference.

This process is repeated to produce three augmented samples, which are combined with the original data to complete the noise data augmentation.
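A minimal NumPy sketch of noise augmentation with σ = 0.1. As described in Section 2.6, augmentation is applied to standardized training data, so σ is in z-score units. The function name is hypothetical.

```python
import numpy as np

def noise_augment(x, n_aug=3, sigma=0.1, rng=None):
    """Noise augmentation, Eq. (3): y_i = x_i + eps_i with eps_i ~ N(0, sigma^2).

    Returns an array of shape (n_aug + 1, n): the original sequence followed by
    n_aug independently perturbed copies.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    noise = rng.normal(0.0, sigma, size=(n_aug, x.size))  # fresh noise for every copy and point
    return np.vstack([x[None, :], x[None, :] + noise])
```

Unlike scaling, the perturbation here is independent at every time index, so it mimics point-wise measurement noise rather than a change in overall signal level.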

2.3.3. Time-Warping Data Augmentation

Time-warping data augmentation introduces random non-linear temporal deformations into time-series data to enhance the diversity of the training dataset and improve model robustness.

Given an original one-dimensional time series x = {x(t)} (t = 0, 1, …, T - 1), where T denotes the sequence length and x(t) represents the observed signal value at time index t, K uniformly distributed temporal control points are first defined over the time interval [0, T - 1]:

\mathcal{T} = \{τ_{k}\}_{k = 1}^{K}, \quad τ_{k} = \frac{(k - 1)(T - 1)}{K - 1}.

(4)

At each control point τ_k, a temporal scaling factor is independently sampled from a normal distribution:

r_{k} \sim N (1, σ^{2}), \quad k = 1, 2, \dots, K,

(5)

where the mean is set to 1 to preserve the global time scale in expectation, and σ² controls the magnitude of the temporal perturbations. A continuous temporal scaling function is then constructed over the entire time domain by cubic spline interpolation through the control points {(τ_k, r_k)} (k = 1, 2, …, K):

s (t) = \mathrm{Spline} (\{(τ_{k}, r_{k})\}_{k = 1}^{K}), \quad t \in [0, T - 1].

(6)

The function s(t) characterizes the local rate of temporal transformation at time index t: s(t) > 1 corresponds to local temporal stretching, whereas s(t) < 1 indicates local temporal compression.

To guarantee a strictly monotonic temporal transformation, the temporal scaling function s(t) is first converted into a cumulative time-mapping function by integration (or, in discrete form, cumulative summation):

ϕ (t) = \sum_{u = 0}^{t} s (u), \quad t = 0, 1, \dots, T - 1.

(7)

Because ϕ(t) may alter the overall temporal span, it is subsequently normalized to preserve the original time range [0, T - 1]:

ϕ^{⋆} (t) = (T - 1) \frac{ϕ (t) - ϕ (0)}{ϕ (T - 1) - ϕ (0)}.

(8)

Provided that s(t) > 0 for all t (which is typically the case when σ is small), the normalized mapping function ϕ⋆(t) is strictly increasing and defines a valid temporal warping over the original domain.

Based on the resulting time-mapping function, a time-warped sequence x̄ = {x̄(t)} (t = 0, 1, …, T - 1) is generated by resampling the original signal:

\bar{x} (t) = x (ϕ^{⋆} (t)), \quad t = 0, 1, \dots, T - 1,

(9)

where x(·) denotes the original time series evaluated at non-integer indices via interpolation.

This transformation introduces smooth and non-linear temporal variations into the original sequence, effectively simulating temporal misalignment and local speed fluctuations that commonly arise in real-world measurement processes.

The above procedure is repeated three times to generate three time-warped variants, which, together with the original sequence, form the augmented training dataset.
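Equations (4)–(9) can be sketched in NumPy as follows. This is an illustrative implementation under several assumptions. The text does not report K or σ, so K = 4 and σ = 0.1 here are hypothetical defaults. A natural cubic spline stands in for the unspecified spline variant, and linear interpolation is used for resampling at non-integer indices.

```python
import numpy as np

def natural_cubic_spline(xk, yk, t):
    """Evaluate a natural cubic spline through uniformly spaced knots (xk, yk) at points t."""
    K = xk.size
    h = xk[1] - xk[0]
    M = np.zeros(K)  # second derivatives at knots; natural ends: M[0] = M[-1] = 0
    if K > 2:
        A = (np.diag(np.full(K - 2, 4.0))
             + np.diag(np.ones(K - 3), 1)
             + np.diag(np.ones(K - 3), -1))
        rhs = 6.0 / h**2 * (yk[2:] - 2.0 * yk[1:-1] + yk[:-2])
        M[1:-1] = np.linalg.solve(A, rhs)
    i = np.clip(np.floor((t - xk[0]) / h).astype(int), 0, K - 2)  # segment index per point
    a = (xk[i + 1] - t) / h
    b = (t - xk[i]) / h
    return a * yk[i] + b * yk[i + 1] + ((a**3 - a) * M[i] + (b**3 - b) * M[i + 1]) * h**2 / 6.0

def time_warp(x, rng, K=4, sigma=0.1):
    """One time-warped copy of x following Eqs. (4)-(9)."""
    x = np.asarray(x, dtype=float)
    T = x.size
    t = np.arange(T, dtype=float)
    tau = np.linspace(0.0, T - 1.0, K)                        # Eq. (4): uniform control points
    r = rng.normal(1.0, sigma, size=K)                        # Eq. (5): r_k ~ N(1, sigma^2)
    s = natural_cubic_spline(tau, r, t)                       # Eq. (6): smooth rate s(t)
    phi = np.cumsum(s)                                        # Eq. (7): cumulative time map
    phi_star = (T - 1) * (phi - phi[0]) / (phi[-1] - phi[0])  # Eq. (8): rescale to [0, T-1]
    return np.interp(phi_star, t, x)                          # Eq. (9): resample at phi*(t)

def time_warp_augment(x, n_aug=3, K=4, sigma=0.1, seed=None):
    """Original sequence plus n_aug time-warped variants, shape (n_aug + 1, T)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    return np.vstack([x] + [time_warp(x, rng, K, sigma) for _ in range(n_aug)])
```

Because ϕ⋆(0) = 0 and ϕ⋆(T - 1) = T - 1, every warped copy keeps the first and last samples of the original; only the interior timing is perturbed. With σ = 0 the warp reduces to the identity.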

2.3.4. Mixed Data Augmentation

To enhance the diversity of the training data and improve the generalization capability of the proposed model, a mixed data augmentation strategy was adopted in which multiple transformations are applied jointly to the original time-series data. Each augmented sample is synthesized by applying three complementary transformations in sequence: noise injection, amplitude scaling, and time warping (Figure 3).

Given a normalized time-series sample

x = \{x (t)\}_{t = 0}^{T - 1}

(10)

Gaussian noise is first added to simulate measurement noise commonly encountered in practical data acquisition:

\tilde{x} (t) = x (t) + ϵ (t), ϵ (t) \sim N (0, σ_{n}^{2})

(11)

where σ_n² controls the noise intensity.

Next, amplitude scaling is applied to the noisy signal to introduce moderate variations in signal magnitude:

\hat{x} (t) = α \tilde{x} (t), \quad α \sim N (1, σ_{a}^{2})

(12)

where α is a randomly sampled scaling factor and σ_a² determines the degree of amplitude variation.

Finally, time warping is performed by applying a smooth non-linear transformation to the time axis. Using a strictly monotonic time-mapping function ϕ⋆(t), the time-warped signal is obtained by resampling:

\bar{x} (t) = \hat{x} (ϕ^{⋆} (t)), \quad t = 0, 1, \dots, T - 1,

(13)

where ϕ⋆(t) is constructed through spline-interpolated temporal scaling and normalization, as described in the previous subsection.

By applying the above transformations, three augmented samples are generated for each original time-series instance. These augmented samples, together with the original data, form the final training dataset.

Overall, this mixed augmentation strategy increases data diversity from both amplitude and temporal perspectives and is expected to enhance the robustness of the model against noise contamination, scale variations, and temporal distortions.
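The three-stage pipeline of Equations (11)–(13) can be sketched as below. It applies noise, then amplitude scaling, then time warping. All parameter values are hypothetical because the text does not report them. σ_n = 0.1 is borrowed from Section 2.3.2, while σ_a = 0.05, K = 4, and σ_w = 0.1 are illustrative. A single α is drawn per augmented sample. As in the time-warping sketch, a natural cubic spline and linear resampling are assumed.

```python
import numpy as np

def natural_cubic_spline(xk, yk, t):
    """Natural cubic spline through uniformly spaced knots (xk, yk), evaluated at t."""
    K, h = xk.size, xk[1] - xk[0]
    M = np.zeros(K)  # knot second derivatives, zero at both ends
    if K > 2:
        A = (np.diag(np.full(K - 2, 4.0))
             + np.diag(np.ones(K - 3), 1) + np.diag(np.ones(K - 3), -1))
        M[1:-1] = np.linalg.solve(A, 6.0 / h**2 * (yk[2:] - 2.0 * yk[1:-1] + yk[:-2]))
    i = np.clip(np.floor((t - xk[0]) / h).astype(int), 0, K - 2)
    a, b = (xk[i + 1] - t) / h, (t - xk[i]) / h
    return a * yk[i] + b * yk[i + 1] + ((a**3 - a) * M[i] + (b**3 - b) * M[i + 1]) * h**2 / 6.0

def mixed_augment_one(x, rng, sigma_n=0.1, sigma_a=0.05, K=4, sigma_w=0.1):
    """Noise (Eq. 11) -> amplitude scaling (Eq. 12) -> time warping (Eq. 13)."""
    x = np.asarray(x, dtype=float)
    T = x.size
    t = np.arange(T, dtype=float)
    x_tilde = x + rng.normal(0.0, sigma_n, size=T)   # Eq. (11): additive Gaussian noise
    x_hat = rng.normal(1.0, sigma_a) * x_tilde       # Eq. (12): one global factor alpha
    tau = np.linspace(0.0, T - 1.0, K)               # control points, as in Eq. (4)
    s = natural_cubic_spline(tau, rng.normal(1.0, sigma_w, size=K), t)
    phi = np.cumsum(s)
    phi_star = (T - 1) * (phi - phi[0]) / (phi[-1] - phi[0])
    return np.interp(phi_star, t, x_hat)             # Eq. (13): resample at phi*(t)

def mixed_augment(x, n_aug=3, seed=None, **params):
    """Original sample plus n_aug mixed-augmented copies, shape (n_aug + 1, T)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    return np.vstack([x] + [mixed_augment_one(x, rng, **params) for _ in range(n_aug)])
```

Setting all three standard deviations to zero turns every stage into the identity, which gives a convenient sanity check on the pipeline.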

2.4. Design of Attention Mechanism

Inspired by the selective visual processing of human cognition, attention mechanisms have since been widely adopted across diverse domains, notably computer vision and natural language processing. Their purpose is to improve existing models by allowing computational resources to focus on task-relevant information [30,32]. Attention mechanisms are mainly classified into hard attention and soft attention.

Hard attention selects regions of interest as the input. In image recognition, it can effectively filter out meaningless background data and thus focus precisely on the target object. However, because hard attention directly restricts the input, it is difficult to adapt fully to time-series prediction scenarios [37]. In comparison, soft attention uses weights learned by the neural network to reweight the global input features across spatial positions or channels, enabling attention to specific spatial regions or channels. Moreover, soft attention is differentiable and therefore compatible with backpropagation, enabling end-to-end training.

A soft attention mechanism was designed for predicting the MC of living trees from GPR amplitude data:

d = \tanh (x W + b)

(14)

Here, the input x is multiplied by the weight matrix W, and the bias vector b is added. The result is then passed through the tanh activation function, which compresses values to the range (−1, 1) and yields the intermediate result d. A Softmax operation is then applied to d to obtain the attention weights a. Suppose that d has dimensions (m, n), where m is the number of samples and n is the feature dimension:

a_{i j} = \frac{\exp (d_{i j})}{\sum_{k = 1}^{n} \exp (d_{i k})}

(15)

where i = 1, 2, …, m and j = 1, 2, …, n. The Softmax operation is performed along the feature dimension, so that for each sample i, the weights a_ij over all feature dimensions sum to 1. Finally, the input x and the attention weights a are multiplied element-wise to obtain the output of the attention mechanism:

\mathrm{output} = x \odot a

(16)

Through these steps, the attention mechanism assigns a weight a_ij to each feature dimension of the input x and reweights the input accordingly, thereby emphasizing important features. This enables the model to focus on the feature components most relevant to the task.
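A NumPy forward pass of Equations (14)–(16) is given below as an illustration. It assumes that x is an (m, n) array, as in the text, and that W is n × n, so that d has the same shape as x, which the element-wise product in Equation (16) requires. W and b are placeholders here; in the model they are trainable parameters. The text does not specify how the pooled (length × channels) feature map is presented to this module, so the sketch stays at the level of a 2-D input.

```python
import numpy as np

def soft_attention(x, W, b):
    """Soft attention forward pass: x (m, n), W (n, n), b (n,). Returns (output, weights)."""
    d = np.tanh(x @ W + b)                      # Eq. (14): values in (-1, 1)
    e = np.exp(d - d.max(axis=1, keepdims=True))  # shift for stability; softmax is unchanged
    a = e / e.sum(axis=1, keepdims=True)        # Eq. (15): softmax over the feature axis
    return x * a, a                             # Eq. (16): element-wise reweighting

# Example with hypothetical random parameters.
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 5))
out, a = soft_attention(x, rng.normal(size=(5, 5)), rng.normal(size=5))
print(a.sum(axis=1))  # each row of weights sums to 1
```

With W = 0 and b = 0, d is zero everywhere and each weight becomes 1/n, so the module simply rescales the input uniformly. Any learned emphasis therefore comes from W and b.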

2.5. MS1DCNNAM Model

To further enhance prediction performance for MC detection in living trees, this study uses the amplitude data of the time-domain signal extracted from the central trace of each B-scan as training features. In the MS1DCNNAM model, the input layer first receives the 68 × 1 GPR signal amplitude vector and passes it to the first convolutional layer, which employs 8 convolutional kernels of size 3 for feature extraction. The rectified linear unit (ReLU) is employed as the activation function to mitigate the vanishing gradient problem and introduce non-linearity, thereby enhancing the model's feature representation capability. The convolution operation is expressed as follows:

y (i) = \sum_{j = 0}^{k - 1} x (i + j) \times w (j) + b

(17)

where y is the output, x is the input, w is the convolutional kernel, b is the bias, k is the kernel size, and j is the index within the convolutional kernel. The output of the convolutional layer is then processed by a max-pooling layer (pooling window size of 2, stride of 2). This down-sampling reduces the feature dimension, lowers the computational complexity, and mitigates the risk of overfitting.

To enhance the model's ability to capture multi-scale features from GPR signals, a multi-kernel convolution design is introduced in the second convolutional layer. This layer contains 4 groups of parallel filters with sizes of 1 × 1, 3 × 1, 5 × 1, and 7 × 1, with 8 filters in each group, to extract feature information under different receptive fields. The outputs of the groups are concatenated and then processed with batch normalization, which mitigates internal covariate shift and improves training stability. This is followed by a max-pooling operation (pool size = 2, stride = 2) that further down-samples the feature maps, retaining important information while reducing feature dimensionality.

Next, the model extracts deeper abstract features through a convolutional layer containing 64 convolutional kernels of size 5 × 1, combined with batch normalization and a max-pooling layer (pool size = 2, stride = 2) for down-sampling. The attention mechanism module described above is incorporated after this pooling operation. To prevent overfitting, a dropout layer randomly deactivates a fraction of neurons during training, thereby enhancing model robustness and generalization. The feature maps are then flattened into a vector of length 512 and fed into the fully connected layers. Compared with a single fully connected layer, a multi-layer fully connected structure can extract higher-level, more abstract features through successive non-linear transformations, which can improve the model's generalization ability.
Therefore, this study adopts a multi-layer fully connected structure to further enhance the model's ability to predict living-tree MC. Six fully connected layers were used: five hidden layers with 256, 128, 64, 32, and 16 neurons in sequence, followed by a single-neuron (1 × 1) output layer that predicts the MC (Figure 4). The calculation of each fully connected layer is as follows:

y = W^{T} x + b

(18)

where y denotes the output vector, W the weight matrix, x the input vector, and b the bias vector.
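The helpers below are hypothetical and are not part of the authors' TensorFlow/Keras code. They illustrate Equation (17) for a single input channel and trace tensor shapes through the described architecture. The padding mode is not stated in the text. The trace assumes "same" padding in every convolution, which lets the four parallel branches (kernel sizes 1, 3, 5, 7) keep equal lengths and be concatenated. Under that assumption, the length goes 68 → 34 → 17 → 8 after the three pooling layers, and flattening 8 × 64 gives the stated 512-element vector. Other padding choices, such as "valid" in the first layer only, can also yield 512.

```python
import numpy as np

def conv1d_eq17(x, w, b=0.0):
    """Eq. (17) for one input channel: y(i) = sum_j x(i + j) * w(j) + b (valid positions only)."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    k = w.size
    return np.array([np.dot(x[i:i + k], w) for i in range(x.size - k + 1)]) + b

def pool_len(n, size=2, stride=2):
    """Output length of a 1-D max-pooling layer without padding."""
    return (n - size) // stride + 1

def trace_shapes(n_in=68):
    """List (stage, length, channels) through the network, assuming 'same' padding in every conv."""
    shapes = [("input", n_in, 1)]
    n = pool_len(n_in)  # conv: 8 kernels of size 3 keeps the length; pooling halves it
    shapes.append(("conv 3x1 (8) + max-pool", n, 8))
    n = pool_len(n)     # branches with kernels 1, 3, 5, 7 (8 filters each), concatenated
    shapes.append(("multi-scale conv + BN + max-pool", n, 4 * 8))
    n = pool_len(n)
    shapes.append(("conv 5x1 (64) + BN + max-pool", n, 64))
    shapes.append(("attention + dropout", n, 64))  # both preserve the shape
    shapes.append(("flatten", n * 64, 1))
    for units in (256, 128, 64, 32, 16, 1):        # five hidden layers + output neuron
        shapes.append((f"dense {units}", units, 1))
    return shapes

for stage, length, channels in trace_shapes():
    print(f"{stage:34s} {length:4d} x {channels}")
```

Shape traces like this are a quick way to check that a written architecture description is internally consistent before implementing it in a framework.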

This study used a living-tree MC dataset containing 1535 GPR signal amplitude samples (307 trees × 5 directional scans). Each set of five GPR signals from the same tree shares a single reference MC value derived from one core sample. The DBH of the sampled trees ranged from 15 to 30 cm. As shown in Table 1, the means, medians, and standard deviations of MC in the training set, test set, and complete dataset were similar, indicating that the tree-wise data partitioning adequately represented the overall data distribution.

The dataset was divided into a training set (70% of the data) and a test set (30%). To ensure a rigorous evaluation and avoid information leakage, the split was grouped by tree: all scans acquired from the same tree were assigned exclusively to either the training set or the test set. This tree-wise partitioning prevented samples from the same individual tree from appearing in both sets, thereby providing a more realistic assessment of model generalization. For each tree in the test set, predictions from the five directional scans at breast height were averaged to obtain a single tree-level estimate. All reported test metrics (R², RMSE, and MAE) were then calculated from these tree-level predictions against the corresponding reference MC values derived from core samples. This averaging reduces directional variability and provides a more robust representation of the overall moisture condition of each tree.
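The tree-wise split and tree-level evaluation can be sketched as follows. The function names, the random selection of test trees, and the rounding of the 30% share to a whole number of trees are all assumptions for illustration. The text states only that whole trees were assigned to one set and that the five scan-level predictions per test tree were averaged before computing R², RMSE, and MAE.

```python
import numpy as np

def grouped_split(tree_ids, test_frac=0.3, seed=1):
    """Boolean train/test masks that keep all scans of a tree in the same set."""
    tree_ids = np.asarray(tree_ids)
    rng = np.random.default_rng(seed)
    unique = np.unique(tree_ids)
    n_test = int(round(test_frac * unique.size))
    test_trees = rng.choice(unique, size=n_test, replace=False)
    test_mask = np.isin(tree_ids, test_trees)
    return ~test_mask, test_mask

def tree_level_metrics(tree_ids, y_pred, y_ref):
    """Average scan-level predictions per tree, then compute R^2, RMSE, and MAE."""
    tree_ids, y_pred, y_ref = map(np.asarray, (tree_ids, y_pred, y_ref))
    trees = np.unique(tree_ids)
    pred = np.array([y_pred[tree_ids == t].mean() for t in trees])
    ref = np.array([y_ref[tree_ids == t][0] for t in trees])  # one core MC per tree
    resid = pred - ref
    rmse = float(np.sqrt(np.mean(resid ** 2)))
    mae = float(np.mean(np.abs(resid)))
    r2 = float(1.0 - np.sum(resid ** 2) / np.sum((ref - ref.mean()) ** 2))
    return r2, rmse, mae
```

Because the reference MC is shared by all five scans of a tree, averaging first means each tree counts once in the metrics, regardless of how many scans it contributed.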

To mitigate the influence of scale differences among features and promote stable and efficient model convergence, the feature data were standardized as follows:

z = \frac{x - μ}{σ}

(19)

where z is the standardized feature value, x is the original feature value, and μ and σ are the mean and standard deviation of the feature, respectively, computed from the training set only.
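A minimal sketch of Equation (19) with statistics fitted on the training set and reused for the test set, as described in Section 2.6. It assumes per-feature statistics (one μ and one σ per time sample of the 68-point input) and the population standard deviation (NumPy's default `ddof=0`). Neither detail is stated in the text.

```python
import numpy as np

def fit_standardizer(X_train):
    """Per-feature mean and standard deviation from the training set only (Eq. (19))."""
    X_train = np.asarray(X_train, dtype=float)
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0)
    sigma = np.where(sigma > 0, sigma, 1.0)  # avoid division by zero for constant features
    return mu, sigma

def standardize(X, mu, sigma):
    """Apply z = (x - mu) / sigma with statistics fitted on the training set."""
    return (np.asarray(X, dtype=float) - mu) / sigma
```

Fitting μ and σ on the training trees alone keeps test-set information out of the preprocessing step, which complements the tree-wise split.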

2.6. Comparative Analysis of Prediction Models

The standardized training data were augmented by scaling, noise addition, time warping (TW), and mixed data augmentation. The augmented data were fed into the MS1DCNNAM model for training, yielding the scale-MS1DCNNAM, noise-MS1DCNNAM, TW-MS1DCNNAM, and mixed-MS1DCNNAM prediction models (Figure 4). The deep learning models were implemented in TensorFlow/Keras using the Adam optimizer with a mean squared error (MSE) loss, a batch size of 10, and 500 training epochs with a 20% validation split; a fixed random seed (seed = 1) was applied to support reproducibility. Standardization was performed using z-score normalization, with the parameters (mean and standard deviation) calculated exclusively from the training set and then applied to the test set. RMSE and MAE were computed on the fractional scale (i.e., MC expressed as a fraction rather than as a percentage). In contrast, residuals were calculated as the difference between predicted and measured MC and expressed in percent (%).

In addition, support vector regression (SVR), gradient boosting decision tree (GBDT), random forest (RF), K-nearest neighbors (KNN), partial least squares (PLS), and backpropagation neural network (BPNN) models are commonly used for machine learning tasks in forestry engineering. Specifically, SVR maps the input data into a high-dimensional feature space through kernel functions and identifies an optimal hyperplane to perform regression [38]; GBDT sequentially trains weak decision trees and accumulates their predictions to optimize model performance [39]; RF combines multiple independent decision trees to enhance the model's generalization ability [40]; KNN predicts a sample's value from the nearest training samples in feature space [41]; PLS extracts latent variables to mitigate multicollinearity [9]; and BPNN learns non-linear relationships by updating its weights through error backpropagation [18].

Using identical training and test datasets, the prediction performance of the aforementioned models was compared. The comparison included six common machine learning algorithms (SVR, GBDT, RF, KNN, PLS, and BPNN), the 1DCNN, 1DCNNAM, and MS1DCNNAM models, and the MS1DCNNAM models with data augmentation. For the six common machine learning algorithms, hyperparameters were tuned by grid search combined with five-fold cross-validation (Table 2).
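Grid search with five-fold cross-validation can be illustrated with KNN, which has a single tuned hyperparameter in Table 2. The hyperparameter names in Table 2 follow scikit-learn conventions, so a library routine such as scikit-learn's `GridSearchCV(cv=5)` is a plausible route; the sketch below instead uses only the standard library. The helper names, the shuffled sample-wise folds, and the use of mean validation RMSE as the selection criterion are all assumptions: the source does not state how folds were formed, and grouping folds by tree would mirror the tree-wise test split more closely.

```python
import math
import random


def knn_predict(train_X, train_y, x, k):
    """Mean target of the k nearest training samples (Euclidean distance)."""
    ranked = sorted((math.dist(row, x), target) for row, target in zip(train_X, train_y))
    nearest = ranked[:k]
    return sum(target for _, target in nearest) / len(nearest)


def k_fold_indices(n, n_folds=5, seed=1):
    """Shuffle sample indices once and deal them into n_folds folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::n_folds] for f in range(n_folds)]


def grid_search_knn(X, y, k_grid, n_folds=5):
    """Return the k with the lowest mean validation RMSE across the folds."""
    folds = k_fold_indices(len(X), n_folds)
    best_k, best_rmse = None, float("inf")
    for k in k_grid:
        fold_rmses = []
        for f in range(n_folds):
            val = set(folds[f])
            train_X = [X[i] for i in range(len(X)) if i not in val]
            train_y = [y[i] for i in range(len(X)) if i not in val]
            sq_errors = [(knn_predict(train_X, train_y, X[i], k) - y[i]) ** 2 for i in folds[f]]
            fold_rmses.append(math.sqrt(sum(sq_errors) / len(sq_errors)))
        mean_rmse = sum(fold_rmses) / n_folds
        if mean_rmse < best_rmse:
            best_k, best_rmse = k, mean_rmse
    return best_k, best_rmse


# Usage on hypothetical 1-D data following y = 2x; the grid mirrors Table 2 for KNN.
X = [[i / 40] for i in range(40)]
y = [2 * row[0] for row in X]
best_k, best_rmse = grid_search_knn(X, y, k_grid=range(1, 21))
```

The same loop generalizes to the other models by iterating over the Cartesian product of their search ranges.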

2.7. Model Construction for Different Diameter Ranges

Redman et al. (2016) emphasized that variations in log surface curvature should be taken into account when acquiring GPR signals on log bark surfaces [42]. These curvature variations, which depend on log diameter, result in different gaps between the GPR antenna and the log surface, thereby affecting the amplitude of the electromagnetic waves [42]. For living trees, the trunk surface of large-diameter trees is more irregular and uneven than that of small-diameter trees, and this irregularity is especially pronounced in large-diameter white birch (Figure 1a). Consequently, the influence of DBH variability on surface curvature must be considered in MC detection using tree radar. To further enhance the detection accuracy of the mixed model, the living trees with DBH in the range of 15–30 cm were subdivided into three DBH classes: 15–20 cm (90 trees, 450 MC data points), 20–25 cm (102 trees, 510 MC data points), and 25–30 cm (115 trees, 575 MC data points). For the DBH ranges of 15–20, 20–25, 25–30, and 15–30 cm, the training and testing datasets exhibited descriptive statistics (minimum, maximum, mean, median, and standard deviation) similar to those of the full MC dataset, indicating that the tree-wise data partitioning adequately represented the overall distribution (Table 1).

2.8. Model Construction for Different Tree Species

Previous studies have identified variations in internal structure (such as bark thickness, heartwood, and sapwood) among tree species, and these differences could affect the accuracy of MC prediction in mixed-species scenarios [25]. Therefore, the 307 living trees were categorized into four species groups for the development of four single-species models: red spruce (69 trees; 345 MC measurements, 38.9%–118.0%), Dahurian larch (56 trees; 280 measurements, 38.8%–63.3%), European white birch (89 trees; 445 measurements, 51.6%–136.3%), and Manchurian ash (93 trees; 465 measurements, 41.6%–115.2%). Across the four single-species models, the training and testing datasets exhibited descriptive statistics (minimum, maximum, mean, median, and standard deviation) similar to those of the full MC dataset of each species, indicating that the tree-wise data partitioning adequately represented the overall distribution (Table 1).

3. Results and Discussion

3.1. Comparative Analysis of Different Prediction Models

For MC prediction in living trees, all augmentation-enhanced models outperformed the MS1DCNNAM model trained without augmentation, demonstrating the effectiveness of data augmentation in improving model performance (Table 3). Among them, the mixed-MS1DCNNAM model achieved the best predictive performance (R² = 0.7908, RMSE = 0.1059), followed by the noise-MS1DCNNAM (R² = 0.7837, RMSE = 0.1076), TW-MS1DCNNAM (R² = 0.7824, RMSE = 0.1080), and scale-MS1DCNNAM (R² = 0.7796, RMSE = 0.1087) models. The improvement is likely associated with the increased diversity of the training dataset, which facilitates more comprehensive feature learning and mitigates overfitting, thereby improving model generalizability to unseen samples.

Specifically, time-warping augmentation applies smooth, non-linear temporal deformations to the waveform signals, mimicking realistic variations in propagation time caused by heterogeneous moisture distributions in living trees. This approach preserves the overall signal morphology while enriching local temporal variability, thereby enhancing the model's ability to capture moisture-sensitive temporal features. Noise augmentation injects stochastic perturbations into the original signals, simulating measurement uncertainties and environmental interference during real-world data acquisition. Training on these noisy samples encourages the model to learn noise-invariant features, improving robustness and generalization under practical conditions. Scale augmentation alters signal amplitude through scaling transformations, expanding the dynamic range of the input data and allowing the model to learn feature representations across multiple signal magnitudes. This exposure to different amplitude levels increases feature diversity and improves the model's adaptability to variations in signal intensity caused by differences in tree structure, moisture distribution, or sensor coupling conditions. The superior performance of the mixed-MS1DCNNAM model can be attributed to the complementary effects of combining these augmentation strategies. By simultaneously enhancing temporal diversity, noise robustness, and adaptability to amplitude variations, the mixed augmentation scheme promotes a richer and more balanced feature space. This synergy enables the model to capture complex, non-linear relationships between waveform characteristics and MC more effectively than any single augmentation strategy, resulting in the highest predictive accuracy among all evaluated models.
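The three augmentations described above can be sketched on a single amplitude trace. This is a minimal illustration, not the authors' implementation: the standard deviations (`sigma`), the number of warp knots, the piecewise-linear speed curve used for time warping, and the way `mixed_augment` pools one copy of each transform with the original are all assumptions (Figure 3 defines the actual mixed scheme). As stated in the text, augmentation would be applied to the standardized training data only, and every augmented copy keeps the measured MC label of its source scan.

```python
import math
import random


def scale_augment(signal, rng, sigma=0.1):
    """Multiply the whole trace by one random factor drawn from N(1, sigma^2)."""
    factor = rng.gauss(1.0, sigma)
    return [factor * v for v in signal]


def noise_augment(signal, rng, sigma=0.05):
    """Add independent zero-mean Gaussian noise to every sample."""
    return [v + rng.gauss(0.0, sigma) for v in signal]


def _interp(xs, ys, x):
    """Piecewise-linear interpolation of y(x) for strictly increasing xs."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for i in range(1, len(xs)):
        if x <= xs[i]:
            w = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + w * (ys[i] - ys[i - 1])


def time_warp_augment(signal, rng, sigma=0.2, n_knots=4):
    """Resample the trace along a smooth, monotonically increasing time axis."""
    n = len(signal)
    knots_x = [i * (n - 1) / (n_knots + 1) for i in range(n_knots + 2)]
    speeds = [max(0.1, rng.gauss(1.0, sigma)) for _ in knots_x]  # local stretch factors
    speed = [_interp(knots_x, speeds, t) for t in range(n)]
    cum = [0.0]
    for s in speed[1:]:
        cum.append(cum[-1] + s)  # strictly increasing warped time
    warped = [c * (n - 1) / cum[-1] for c in cum]  # rescale to the original time span
    return [_interp(warped, signal, t) for t in range(n)]


def mixed_augment(signals, labels, seed=1):
    """Pool each original trace with one scaled, one noisy and one time-warped copy."""
    rng = random.Random(seed)
    out_x, out_y = [], []
    for s, label in zip(signals, labels):
        for variant in (list(s), scale_augment(s, rng), noise_augment(s, rng),
                        time_warp_augment(s, rng)):
            out_x.append(variant)
            out_y.append(label)  # augmentation keeps the measured MC label
    return out_x, out_y


# Usage on a hypothetical 50-sample trace labelled with MC = 0.72.
sig = [math.sin(0.3 * i) for i in range(50)]
X_aug, y_aug = mixed_augment([sig], [0.72])
```

Time warping here moves samples along the time axis without changing their order, which is what keeps the overall waveform morphology intact.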

As shown in Table 3, the MS1DCNNAM model (R² = 0.7576, RMSE = 0.1139) achieved slightly higher prediction accuracy than the 1DCNNAM model (R² = 0.7522, RMSE = 0.1152), suggesting that its multi-scale architecture enhanced feature extraction by capturing information at multiple scales, thereby providing a more comprehensive description of the data. Further comparison showed that the 1DCNNAM model outperformed the 1DCNN model (R² = 0.7390, RMSE = 0.1183). This improvement can be attributed to the attention mechanism, which adaptively assigns weights to different features, emphasizing task-relevant information and facilitating the extraction of key features. Moreover, both the MS1DCNNAM and 1DCNNAM models achieved higher prediction accuracy than the conventional models, including SVR, GBDT, RF, KNN, BPNN, and PLS (R² = 0.5318–0.7057, RMSE = 0.1256–0.1584). These results further indicate that integrating multi-scale convolutional layers with an attention mechanism enables the MS1DCNNAM model to effectively capture both local features and global contextual information from GPR signal data, thereby enhancing prediction performance.

3.2. Mixed Model

As shown in Figure 5a, the training and validation losses exhibit similar decreasing trends, indicating stable model convergence; however, a noticeable performance gap between the training and test sets persists (training R² = 0.9951 versus test R² = 0.7908; Table 3). This suggests that moderate overfitting may be present, although it remains within an acceptable range. Such a gap may be primarily attributed to the limited sample size and the intrinsic variability of GPR signals under field conditions. Although the mixed-MS1DCNNAM model yielded better prediction performance for living tree MC than the other prediction models, with a relatively small mean absolute error (MAE = 0.0806), its maximum residual exceeded 30 percentage points (Figure 5b). This is likely attributable to the marked differences in surface curvature among living trees and to the effect of tree species diversity on model predictions. Specifically, differences in surface curvature across trees with varying DBH led to substantial changes in the gap between the radar antenna and the bark surface, resulting in varying degrees of electromagnetic wave refraction and scattering and thereby affecting signal stability. In addition, variations in wood density, heartwood and sapwood thickness, and texture among species could influence the propagation characteristics of electromagnetic waves, further reducing the model's predictive accuracy.

Therefore, the model was further subdivided by narrower DBH ranges and by tree species to reduce the influence of surface curvature and species diversity on model performance and to improve prediction accuracy and generalization.

3.3. Different DBH Sub-Models

The sub-models for the three DBH ranges yielded lower prediction errors than the mixed model (RMSE ≤ 0.1033 versus 0.1059), although the R² of the 20–25 cm model (0.7246) was below that of the mixed model (0.7908) (Table 4). This suggests that the more consistent surface curvature within a narrower DBH range reduces the disturbance to GPR signal propagation, leading to higher predictive performance of the corresponding models. Among them, the model for the 15–20 cm DBH class had the best predictive performance (R² = 0.8885, RMSE = 0.0758) (Figure 6b). The models for the 20–25 cm and 25–30 cm DBH classes showed lower predictive performance (R² = 0.7246 and 0.8031; RMSE = 0.1033 and 0.0872, respectively). The higher prediction accuracy in the smallest DBH class is likely attributable to its relatively regular trunk surfaces, whereas larger-DBH living trees generally exhibit more irregular surfaces; the trend is not strictly monotonic, however, as the 25–30 cm class outperformed the 20–25 cm class. Surface irregularity induces complex reflection and refraction of electromagnetic waves, resulting in greater signal variation and higher model errors, particularly in white birch trees with larger DBHs (Figure 1a).

Therefore, to reduce the influence of surface curvature on MC prediction in large-DBH living trees, future research will focus on accurately quantifying the curvature of the detection surface and developing sub-models for different curvature ranges to further improve prediction accuracy. Specifically, for each tree, GPR signals should be collected from a single direction corresponding to a defined surface segment, and the MC obtained from the wood core sample can be used as the label for that specific directional signal. This strategy is intended to establish a more direct correspondence between localized GPR responses and moisture measurements, thereby reducing the uncertainty introduced by cross-directional variability and improving the physical consistency between input signals and target labels.

In practical applications, the MC of a living tree can be estimated by conducting GPR measurements along multiple directions at breast height and averaging the predicted MC values, which helps mitigate the influence of directional variability. Specifically, each directional measurement can be processed using the sub-model corresponding to its surface curvature, and the mean MC value can be used to represent the MC of the living tree at breast height.
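The proposed field procedure, in which each directional scan is routed to the sub-model for its surface curvature and the predictions are averaged, can be sketched as follows. Curvature-specific sub-models are future work in this study, so everything here is hypothetical: the function name, the curvature bin edges (in cm⁻¹), and the constant stand-in models are illustrative placeholders for trained networks.

```python
from bisect import bisect_right


def estimate_tree_mc(scans, curvature_edges, sub_models):
    """Average per-direction MC predictions, each from the sub-model for its curvature bin.

    scans: list of (curvature, features) pairs, one per scan direction.
    curvature_edges: sorted inner bin edges, e.g. [c1, c2] defines 3 bins.
    sub_models: one callable per bin, mapping features -> MC as a fraction.
    """
    predictions = []
    for curvature, features in scans:
        bin_index = bisect_right(curvature_edges, curvature)  # 0 .. len(edges)
        predictions.append(sub_models[bin_index](features))
    return sum(predictions) / len(predictions)


# Usage: hypothetical constant-output stand-ins for three curvature sub-models.
models = [lambda f: 0.60, lambda f: 0.70, lambda f: 0.80]
scans = [(0.07, None), (0.09, None), (0.12, None), (0.075, None), (0.095, None)]
tree_mc = estimate_tree_mc(scans, [0.08, 0.10], models)  # ≈ 0.68
```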

3.4. Single-Species Sub-Models

In addition to the DBH-based sub-models, mixed-MS1DCNNAM models were developed separately for each tree species. According to Figure 7, within the DBH range of 15–30 cm, the maximum residuals of the four single-species sub-models were all within 25 percentage points. Among these models, the larch model showed the lowest MAE (0.0450), with a maximum residual of approximately 8 percentage points, whereas the white birch model exhibited the largest MAE (0.0941), with a maximum residual of approximately 20 percentage points. Note, however, that a lower MAE does not necessarily indicate better model performance. Both larch and spruce are coniferous species with non-porous wood and similar surface curvatures; nevertheless, their predictive performances differed markedly, with coefficients of determination of 0.3067 for larch and 0.8321 for spruce (Table 5). This discrepancy is likely attributable to the difference in MC range between the two species, which is much narrower in larch (24.5 percentage points) than in spruce (79.1 percentage points). The limited MC range in larch yields small absolute errors while masking the model's inability to capture the underlying relationship between GPR signals and MC. Because GPR signals are strongly influenced by surface curvature within the 15–30 cm DBH range, the limited MC variation in larch makes it difficult for the mixed-MS1DCNNAM model to resolve the subtle MC-related signal changes, resulting in low predictive performance. Therefore, owing to its limited predictive reliability (low R²), the larch model was excluded from further comparative analysis.
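The link between a narrow MC range and a low R² can be made explicit with a short worked check. It assumes that RMSE² approximates the mean squared error and that the test-set MC standard deviations in Table 1 (23.5% for spruce and 6.6% for larch, written as fractions s_y) approximate the spread of the tree-level reference values; because Table 1 is computed over scans rather than trees, the figures are approximate. The bound uses the general fact that RMSE ≥ MAE.

```latex
R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}
    \approx 1 - \frac{\mathrm{RMSE}^2}{s_y^2},
\qquad \mathrm{RMSE} \ge \mathrm{MAE}
\;\Rightarrow\; R^2 \lesssim 1 - \frac{\mathrm{MAE}^2}{s_y^2}

\text{Spruce: } s_y \approx 0.235,\ \mathrm{RMSE} = 0.0959
  \;\Rightarrow\; R^2 \approx 1 - \frac{0.0959^2}{0.235^2} \approx 0.83
  \quad (\text{reported } 0.8321)

\text{Larch: } s_y \approx 0.066,\ \mathrm{MAE} = 0.0450
  \;\Rightarrow\; R^2 \lesssim 1 - \frac{0.0450^2}{0.066^2} \approx 0.54
  \quad (\text{reported } 0.3067)
```

Thus, even if every larch error were as small as the reported MAE, R² could not exceed about 0.54, whereas the larger absolute errors for spruce remain small relative to its much wider MC spread.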

Further comparison of MC prediction performance revealed marked differences among the coniferous species spruce (RMSE = 0.0959, MAE = 0.0626) and the broad-leaved species white birch (RMSE = 0.1063, MAE = 0.0941) and Manchurian ash (RMSE = 0.0561, MAE = 0.0429) (Table 5). Although the MC ranges of spruce, white birch, and Manchurian ash are similar (79.1, 84.7, and 73.6 percentage points, respectively), the prediction error for white birch remained markedly higher than that of the other two species. This is likely due mainly to the relatively small curvature changes and regular surfaces of spruce and Manchurian ash, in contrast to the more irregular surface of white birch with pronounced curvature variations (Figure 1a). Substantial surface curvature variation alters the distance between the tree radar detection plane and the bark, which strongly affects electromagnetic wave propagation and increases model prediction errors. This factor likely also contributes to the relatively large residuals observed in both the mixed model and the DBH-class models. Therefore, future studies should incorporate targeted preprocessing strategies, including systematic outlier detection and treatment prior to model training. Such data cleaning procedures are expected to enhance model robustness, improve predictive accuracy, and further reduce overall error.
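As one possible form of the proposed outlier screening, the sketch below applies Tukey's fences (values beyond 1.5 × IQR outside the quartiles) to a scalar summary of each training scan. The source does not specify a method; the choice of Tukey's fences, the screened quantity (here a hypothetical per-scan peak amplitude), and the helper names are assumptions. Screening should be fitted on the training data only, for the same leakage reasons as standardization.

```python
import statistics


def iqr_outlier_mask(values, k=1.5):
    """Flag values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [not (low <= v <= high) for v in values]


def drop_flagged(samples, mask):
    """Keep only the samples whose mask entry is False."""
    return [s for s, flagged in zip(samples, mask) if not flagged]


# Usage: hypothetical peak amplitudes of six training scans; the last one is anomalous.
peaks = [1.0, 1.1, 0.9, 1.05, 0.95, 3.0]
mask = iqr_outlier_mask(peaks)
kept = drop_flagged(peaks, mask)
```

Flagged scans could be removed or inspected (for example, for poor antenna contact on irregular bark) rather than discarded automatically.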

In addition, the prediction errors of the single-species models, except for European white birch (RMSE ≤ 0.0959, MAE ≤ 0.0626), were markedly smaller than those of the mixed model (RMSE = 0.1059, MAE = 0.0806). This suggests that variations in internal structure, bark curvature, and internal moisture distribution, particularly between different tree types (conifer vs. broad-leaved and diffuse-porous vs. ring-porous woods), could influence the reflection of electromagnetic wave signals.

According to the results of this study, constructing single-species models within specific surface curvature ranges could more effectively reduce prediction errors and further enhance detection accuracy in practical applications. Nevertheless, despite the higher predictive accuracy of such models, a universal mixed-species model would be more practical for efficient detection across forest management applications, such as drought severity assessment and fire risk evaluation. Wang et al. (2023) demonstrated that a grouping strategy capable of distinguishing characteristic patterns across different local regions can effectively improve model performance and mitigate noise interference [43]. Building on this insight, future work will focus on integrating a similar grouping mechanism into the proposed mixed-species model framework and expanding the dataset to include additional tree species, thereby further improving prediction accuracy, generalization capability, and robustness for practical field detection.

4. Conclusions

In this study, a novel non-destructive approach was developed by integrating GPR signals with the mixed-MS1DCNNAM model to predict the MC of living trees across different species and DBH ranges at experimental temperatures of 5–35 °C. The results demonstrated that the proposed mixed-species model based on the mixed-MS1DCNNAM framework outperformed the other models in living tree MC prediction, while the narrower-DBH and single-species sub-models further reduced prediction errors. In particular, the 15–20 cm DBH class model exhibited the highest predictive performance, which is likely attributable to the more regular surface geometry and limited variation in surface curvature of smaller trunks. Moreover, interspecific differences in internal structure influenced electromagnetic wave propagation, thereby affecting MC prediction accuracy. In conclusion, these findings underscore the importance of accounting for both surface curvature and species variability in GPR-based MC detection and provide practical guidance for sustainable forest management in tree growth monitoring, forest disaster monitoring, storage of harvested timber, and wood quality assessment.

Author Contributions

J.G.: Conceptualization, Methodology, Software, Data curation, Formal analysis, Visualization, Funding acquisition, Writing—original draft. J.C.: Methodology, Writing—review and editing. C.L.: Formal analysis, Investigation, Writing—review and editing. Y.Z.: Investigation, Formal analysis, Validation, Writing—review and editing. F.J.: Investigation, Data curation, Writing—review and editing. K.Y.: Investigation, Data curation, Writing—review and editing. R.Q.: Investigation, Writing—review and editing. H.X.: Methodology, Supervision, Project administration, Funding acquisition, Writing—review and editing. Y.H.: Methodology, Supervision, Project administration, Funding acquisition, Writing—review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This work was financially supported by the China Scholarship Council (No. 202306600035) and Innovation Foundation for the Doctoral Program of Forestry Engineering of Northeast Forestry University (No. LYGC202115) and Natural Science Foundation of Heilongjiang Province (Grant number: LH2024C054) and National Key Research and Development Program of China (Grant number: 2021YFD2201205).

Data Availability Statement

Data will be made available on reasonable request.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

1. Feeley, K.J.; Zuleta, D. Changing forests under climate change. Nat. Plants 2022, 8, 984–985.
2. Oldekop, J.A.; Rasmussen, L.V.; Agrawal, A.; Bebbington, A.J.; Meyfroidt, P.; Bengston, D.N.; Blackman, A.; Brooks, S.; Davidson-Hunt, I.; Davies, P.; et al. Forest-linked livelihoods in a globalized world. Nat. Plants 2020, 6, 1400–1407.
3. Duque, A.; Peña, M.A.; Cuesta, F.; González-Caro, S.; Kennedy, P.; Phillips, O.L.; Calderón-Loor, M.; Blundo, C.; Carilla, J.; Cayola, L.; et al. Mature Andean forests as globally important carbon sinks and future carbon refuges. Nat. Commun. 2021, 12, 2138.
4. Scoffoni, C.; Chatelet, D.S.; Pasquet-kok, J.; Rawls, M.; Donoghue, M.J.; Edwards, E.J.; Sack, L. Hydraulic basis for the evolution of photosynthetic productivity. Nat. Plants 2016, 2, 16072.
5. Ramage, M.H.; Burridge, H.; Busse-Wicher, M.; Fereday, G.; Reynolds, T.; Shah, D.U.; Wu, G.; Yu, L.; Fleming, P.; Densley-Tingley, D.; et al. The wood from the trees: The use of timber in construction. Renew. Sustain. Energy Rev. 2017, 68, 333–359.
6. Constantz, J.; Murphy, F. Monitoring moisture storage in trees using time domain reflectometry. J. Hydrol. 1990, 119, 31–42.
7. Djomo, A.N.; Ibrahima, A.; Saborowski, J.; Gravenhorst, G. Allometric equations for biomass estimations in Cameroon and pan moist tropical equations including biomass data from Africa. For. Ecol. Manag. 2010, 260, 1873–1885.
8. Fathi, H.; Nasir, V.; Kazemirad, S. Prediction of the mechanical properties of wood using guided wave propagation and machine learning. Constr. Build. Mater. 2020, 262, 120848.
9. Hans, G.; Redman, D.; Leblon, B.; Nader, J.; La Rocque, A. Determination of log moisture content using early-time ground penetrating radar signal. Wood Mater. Sci. Eng. 2015, 10, 112–129.
10. Zheng, L.T.; He, D.; He, X.M.; Yin, F.; Yan, N.R. Neighborhood interactions and environment modulate individual-level trait correlations and divergent functional trade-offs among co-occurring trees. J. Plant Ecol. 2025, 18, rtaf099.
11. McDowell, N.G.; Allen, C.D.; Marshall, L. Growth, carbon-isotope discrimination, and drought-associated mortality across a Pinus ponderosa elevational transect. Glob. Change Biol. 2010, 16, 399–415.
12. Edwards, W.R.N.; Jarvis, P.G. A method for measuring radial differences in water content of intact tree stems by attenuation of gamma radiation. Plant Cell Environ. 1983, 6, 255–260.
13. Jones, M.; Aptaker, P.S.; Cox, J.; Gardiner, B.A.; McDonald, P.J. A transportable magnetic resonance imaging system for in situ measurements of living trees: The Tree Hugger. J. Magn. Reson. 2012, 218, 133–140.
14. Wang, Q.; Liu, X.E.; Yang, S.; Jiang, M.; Cao, J. Non-destructive detection of density and moisture content of heartwood and sapwood based on X-ray computed tomography (X-CT) technology. Eur. J. Wood Wood Prod. 2019, 77, 1053–1062.
15. Kumar, R.; Hosseinzadehtaher, M.; Hein, N.; Shadmand, M.; Jagadish, S.V.K.; Ghanbarian, B. Challenges and advances in measuring sap flow in agriculture and agroforestry: A review with focus on nuclear magnetic resonance. Front. Plant Sci. 2022, 13, 1036078.
16. Wei, Q.; Leblon, B.; La Rocque, A. On the use of X-ray computed tomography for determining wood properties: A review. Can. J. For. Res. 2011, 41, 2120–2140.
17. Luo, Z.; Deng, Z.; Singha, K.; Zhang, X.; Liu, N.; Zhou, Y.; He, X.; Guan, H. Temporal and spatial variation in water content within living tree stems determined by electrical resistivity tomography. Agric. For. Meteorol. 2020, 291, 108058.
18. Li, Y.; Guo, L.; Wang, J.; Wang, Y.; Xu, D.; Wen, J. An Improved Sap Flow Prediction Model Based on CNN-GRU-BiLSTM and Factor Analysis of Historical Environmental Variables. Forests 2023, 14, 1310.
19. Forsén, H.; Tarvainen, V. Accuracy and Functionality of Hand Held Wood Moisture Content Meters; Technical Research Centre of Finland: Espoo, Finland, 2000; Volume 420.
20. Wu, X.; Li, G.; Jiao, Z.; Wang, X. Reliability of acoustic tomography and ground-penetrating radar for tree decay detection. Appl. Plant Sci. 2018, 6, e01187.
21. Rodríguez-Abad, I.; Martínez-Sala, R.; García-García, F.; Capuz-Lladró, R. Non-destructive methodologies for the evaluation of moisture content in sawn timber structures: Ground-penetrating radar and ultrasound techniques. Near Surf. Geophys. 2010, 8, 475–482.
22. Lei, Y.; He, Z.; Zi, Y.; Hu, Q. Fault diagnosis of rotating machinery based on multiple ANFIS combination with GAs. Mech. Syst. Signal Process. 2007, 21, 2280–2294.
23. Zhang, J.; Zhang, C.; Lu, Y.; Zheng, T.; Dong, Z.; Tian, Y.; Jia, Y. In-situ recognition of moisture damage in bridge deck asphalt pavement with time-frequency features of GPR signal. Constr. Build. Mater. 2020, 244, 118295.
24. Duchesne, I.; Tong, Q.; Hans, G. Using Ground Penetrating Radar (GPR) to Predict Log Moisture Content of Commercially Important Canadian Softwoods. Forests 2023, 14, 2396.
25. Guo, J.; Wang, P.; Qin, R.; Zhao, L.; Tang, X.; Zeng, J.; Xu, H. Detection of moisture content in logs using multi-parameter GPR signal analysis and neural network models. Holzforschung 2023, 77, 240–247.
26. Yu, M.; Yan, J.; Chu, J.; Qi, H.; Xu, P.; Liu, S.; Zhou, L.; Gao, J. Accurate prediction of wood moisture content using terahertz time-domain spectroscopy combined with machine learning algorithms. Ind. Crops Prod. 2025, 227, 120771.
27. Rahimi, S.; Avramidis, S. Predicting moisture content in kiln dried timbers using machine learning. Eur. J. Wood Wood Prod. 2022, 80, 681–692.
28. Le Guennec, A.; Malinowski, S.; Tavenard, R. Data augmentation for time series classification using convolutional neural networks. In Proceedings of the ECML/PKDD Workshop on Advanced Analytics and Learning on Temporal Data; HAL-SHS: Villeurbanne, France, 2016.
29. Um, T.T.; Pfister, F.M.J.; Pichler, D.; Endo, S.; Lang, M.; Hirche, S.; Fietzek, U.; Kulić, D. Data augmentation of wearable sensor data for parkinson's disease monitoring using convolutional neural networks. In Proceedings of the 19th ACM International Conference on Multimodal Interaction, Glasgow, UK, 13–17 November 2017; pp. 216–220.
30. Wang, Z.; Luo, Y.; Huang, D.; Ge, N.; Lu, J. Category-Adaptive Domain Adaptation for Semantic Segmentation. In Proceedings of the ICASSP 2022, 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, 23–27 May 2022; pp. 3773–3777.
31. Sun, J.; Ding, H.; Li, N.; Sun, X.; Dong, X. Intelligent Fault Diagnosis of Hydraulic System Based on Multiscale One-Dimensional Convolutional Neural Networks with Multiattention Mechanism. Sensors 2024, 24, 7267.
32. Li, X.; Wu, J.; Bai, T.; Wu, C.; He, Y.; Huang, J.; Li, X.; Shi, Z.; Hou, K. Variety classification and identification of jujube based on near-infrared spectroscopy and 1D-CNN. Comput. Electron. Agric. 2024, 223, 109122.
33. Wang, D.; Feng, Z.; Ji, S.; Cui, D. Simultaneous prediction of peach firmness and weight using vibration spectra combined with one-dimensional convolutional neural network. Comput. Electron. Agric. 2022, 201, 107341.
34. Wang, Z.; Ge, N.; Lu, J. Motion In-Betweening with Spatial and Temporal Transformers. IEEE Trans. Circuits Syst. Video Technol. 2025, 35, 5671–5683.
35. Li, X.; Xiao, S.; Zhang, F.; Huang, J.; Xie, Z.; Kong, X. A fault diagnosis method with AT-ICNN based on a hybrid attention mechanism and improved convolutional layers. Appl. Acoust. 2024, 225, 110191.
36. Zhao, Z.; Jiang, B.; Wang, H.; Wang, C. Forest Canopy Height Retrieval Model Based on a Dual Attention Mechanism Deep Network. Forests 2024, 15, 1132.
37. Chen, X.; Zhang, B.; Gao, D. Bearing fault diagnosis base on multi-scale CNN and LSTM model. J. Intell. Manuf. 2021, 32, 971–987.
38. Wang, P.; Tan, S.; Zhang, G.; Wang, S.; Wu, X. Remote Sensing Estimation of Forest Aboveground Biomass Based on Lasso-SVR. Forests 2022, 13, 1597.
39. Gao, W.; Li, Z.; Chen, Q.; Jiang, W.; Feng, Y. Modelling and prediction of GNSS time series using GBDT, LSTM and SVM machine learning approaches. J. Geod. 2022, 96, 71.
40. Zhang, W.; Wu, C.; Zhong, H.; Li, Y.; Wang, L. Prediction of undrained shear strength using extreme gradient boosting and random forest based on Bayesian optimization. Geosci. Front. 2021, 12, 469–477.
41. Zhang, S.; Li, X.; Zong, M.; Zhu, X.; Wang, R. Efficient kNN Classification with Different Numbers of Nearest Neighbors. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 1774–1785.
42. Redman, J.D.; Hans, G.; Diamanti, N. Impact of Wood Sample Shape and Size on Moisture Content Measurement Using a GPR-Based Sensor. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2016, 9, 221–227.
43. Wang, Z.; Wang, J.; Ge, N.; Lu, J. HiMoReNet: A Hierarchical Model for Human Motion Refinement. IEEE Signal Process. Lett. 2023, 30, 868–872.

Figure 1. Flow chart of the field experiment. (a) Tree radar B-scan signal acquisition from a white birch with DBH of 28 cm. (b) Extracting a tree core from a living tree using an increment borer. (c) Weighing with an analytical balance after oven drying.

Figure 2. Time-domain variations of electromagnetic signals in living Manchurian ash trees (DBH 22–23 cm) under different MC conditions (41.6%–100%).

Figure 3. Flowchart of mixed data augmentation.

Figure 4. Flowchart of the MS1DCNNAM prediction model combined with different data augmentation methods for living tree moisture content detection.

Figure 5. (a) Training and validation loss curves and (b) residual analysis of moisture content predictions for four tree species in the 15–30 cm DBH range. (The residuals are reported in percentage points).

Figure 6. Training and validation loss curves and residual analysis of moisture content predictions for four tree species in the 15–20, 20–25 and 25–30 cm DBH classes: (a) training and validation loss curves for the 15–20 cm DBH class; (b) residual analysis of moisture content predictions for the 15–20 cm DBH class; (c) training and validation loss curves for the 20–25 cm DBH class; (d) residual analysis of moisture content predictions for the 20–25 cm DBH class; (e) training and validation loss curves for the 25–30 cm DBH class; (f) residual analysis of moisture content predictions for the 25–30 cm DBH class.

Figure 7. Training and validation loss curves and residual analysis of moisture content predictions for four single tree species in the 15–30 cm DBH range: (a) training and validation loss curves for red spruce; (b) residual analysis of moisture content predictions for red spruce; (c) training and validation loss curves for Dahurian larch; (d) residual analysis of moisture content predictions for Dahurian larch; (e) training and validation loss curves for white birch; (f) residual analysis of moisture content predictions for white birch; (g) training and validation loss curves for Manchurian ash; (h) residual analysis of moisture content predictions for Manchurian ash.

Table 1. Descriptive statistics of moisture content across different DBH classes and tree species.

Category	Dataset	Min (%)	Max (%)	Mean (%)	Median (%)	Std (%)	Number
DBH 15–30 cm	All selected set	38.8	136.3	72.4	69.4	22.0	1535
	Training set	38.8	133.3	71.8	68.5	21.4	1070
	Testing set	38.9	136.3	73.8	71.6	23.2	465
DBH 15–20 cm	All selected set	43.2	133.3	74.2	73.3	21.6	450
	Training set	43.2	133.3	72.3	71.9	21.1	310
	Testing set	43.5	123.8	69.0	58.1	22.8	140
DBH 20–25 cm	All selected set	38.8	132.0	72.7	70.7	21.6	510
	Training set	38.8	132.0	75.6	71.6	21.8	355
	Testing set	41.6	123.1	66.1	57.1	19.7	155
DBH 25–30 cm	All selected set	38.9	136.3	70.8	64.0	22.5	575
	Training set	38.9	136.3	73.3	64.9	23.2	400
	Testing set	40.5	122.3	65.1	60.8	19.7	175
Red spruce	All selected set	38.9	118.0	70.7	69.5	21.7	345
	Training set	38.9	118.0	69.2	69.3	20.7	240
	Testing set	43.2	113.7	74.4	73.0	23.5	105
Dahurian larch	All selected set	38.8	63.3	51.0	50.5	5.7	280
	Training set	38.8	63.3	51.1	50.6	5.4	195
	Testing set	40.5	63.2	50.8	50.5	6.6	85
White birch	All selected set	51.6	136.3	86.0	80.0	25.0	445
	Training set	51.6	136.3	84.2	77.4	25.1	310
	Testing set	54.1	128.4	90.1	87.9	24.5	135
Manchurian ash	All selected set	41.6	115.2	75.1	76.1	14.1	465
	Training set	44.7	115.2	76.4	76.5	14.5	325
	Testing set	41.6	97.2	72.0	71.9	12.8	140

Each DBH class category pools all four tree species, and each species category spans the full 15–30 cm DBH range.
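When a training set and a testing set partition the all-selected set, the all-selected mean should equal their count-weighted average, which gives a quick consistency check on the descriptive statistics in Table 1. The short Python sketch below applies that check to two rows; the function name `pooled_mean` is illustrative and not part of the study's code.

```python
def pooled_mean(groups):
    """Count-weighted mean of (mean, n) pairs from disjoint subsets."""
    total_n = sum(n for _, n in groups)
    return sum(mean * n for mean, n in groups) / total_n


# (mean MC in %, number of samples) for the training and testing sets,
# copied from Table 1.
dbh_15_30 = [(71.8, 1070), (73.8, 465)]
red_spruce = [(69.2, 240), (74.4, 105)]

print(round(pooled_mean(dbh_15_30), 1))   # 72.4, the "All selected set" mean
print(round(pooled_mean(red_spruce), 1))  # 70.8
```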

Table 2. Hyperparameter optimization for the SVR, GBDT, RF, KNN, BPNN and PLS models.

Algorithm	Hyperparameter	Definition	Search Range
SVR	C	Regularization parameter	(0.1, 1, 10)
	epsilon	Epsilon-insensitive loss parameter	(0.01, 0.1, 0.2)
GBDT	n_estimators	Number of trees	(300, 500, …, 1100)
	max_depth	Maximum depth of a tree	(3, 5, …, 9)
RF	n_estimators	Number of trees	(100, 300, …, 1500)
	max_features	Maximum number of features when splitting a node	(5, 7, …, 15)
KNN	n_neighbors	Number of neighbors	(1, 2, …, 20)
BPNN	hidden_layer_sizes	Number of hidden neurons	One-layer structure: (10), (20), (30), (40); two-layer structure: (10, 5), (20, 10), (30, 15), (40, 20)
	activation	Activation function	(relu, tanh)
PLS	n_components	Number of latent variables	(6, 8, 12, 14)
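The discrete search ranges in Table 2 are consistent with an exhaustive grid search over each model's hyperparameters. The stdlib-only Python sketch below enumerates the fully specified SVR grid (3 × 3 = 9 candidates) and keeps the combination with the lowest validation score. The scoring function is a hypothetical placeholder: the validation protocol is not described in this section, and in practice the function would fit the model (for example scikit-learn's `SVR`, whose `C` and `epsilon` argument names match the table) and return a validation error.

```python
from itertools import product


def expand_grid(grid):
    """Yield every combination of hyperparameter values as a dict."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def grid_search(grid, score_fn):
    """Evaluate score_fn on every combination; lower scores are better.

    Ties keep the first combination encountered.
    """
    best_params, best_score = None, float("inf")
    for params in expand_grid(grid):
        score = score_fn(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# SVR search range from Table 2 (both hyperparameters are listed explicitly).
svr_grid = {"C": [0.1, 1, 10], "epsilon": [0.01, 0.1, 0.2]}


def toy_validation_error(params):
    # Hypothetical stand-in for "train SVR with params, return validation error";
    # it is NOT meant to reproduce the values selected in Table 3.
    return (params["C"] - 10) ** 2 + (params["epsilon"] - 0.1) ** 2


best, score = grid_search(svr_grid, toy_validation_error)
print(len(list(expand_grid(svr_grid))))  # 9
print(best)  # {'C': 10, 'epsilon': 0.1}
```

The same `grid_search` helper applies unchanged to the other rows of Table 2 once their ranges are written out as explicit value lists.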

Table 3. Performance of different models for living trees across four species in the 15–30 cm DBH range.

Category	Model	Hyperparameter Values	Testing R²	Testing RMSE	Testing MAE	Training R²	Training RMSE	Training MAE
Augmentation-enhanced models	Mixed-MS1DCNNAM	epochs = 500, batch_size = 10	0.7908	0.1059	0.0806	0.9951	0.0151	0.0092
	TW-MS1DCNNAM	epochs = 500, batch_size = 10	0.7824	0.1080	0.0821	0.9837	0.0274	0.0101
	Noise-MS1DCNNAM	epochs = 500, batch_size = 10	0.7837	0.1076	0.0850	0.9815	0.0291	0.0099
	Scale-MS1DCNNAM	epochs = 500, batch_size = 10	0.7796	0.1087	0.0807	0.9886	0.0229	0.0066
Optimized 1DCNN models	MS1DCNNAM	epochs = 500, batch_size = 10	0.7576	0.1139	0.0868	0.9561	0.0448	0.0174
	1DCNNAM	epochs = 500, batch_size = 10	0.7522	0.1152	0.0861	0.9521	0.0468	0.0210
	1DCNN	epochs = 300, batch_size = 16	0.7390	0.1183	0.0903	0.9548	0.0455	0.0172
Conventional machine learning models	SVR	Kernel: rbf, C: 1, epsilon: 0.01	0.7057	0.1256	0.0926	0.8223	0.0903	0.0563
	GBDT	n_estimators: 900, max_depth: 3	0.6173	0.1432	0.1065	0.9988	0.0075	0.0058
	RF	n_estimators: 900, max_features: 9	0.6154	0.1436	0.1052	0.9537	0.0461	0.0328
	KNN	n_neighbors: 7	0.5470	0.1558	0.1106	0.7303	0.1112	0.0791
	BPNN	hidden_layer_sizes: (10, 5), activation: relu	0.6603	0.1349	0.1075	0.6645	0.1240	0.0946
	PLS	n_components: 14	0.5318	0.1584	0.1235	0.4804	0.1544	0.1207

Training metrics were calculated at the scan level to monitor model fitting during optimization, while test metrics were evaluated at the tree level by averaging predictions from five directional scans for each tree, providing a more application-relevant assessment.
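The note above states that test metrics are computed per tree by averaging the predictions of the five directional scans. A minimal Python sketch of that aggregation follows. It assumes each scan record carries a tree identifier, the tree's measured MC and the scan's prediction, that each tree has a single measured MC value, and that MC is expressed as a decimal fraction (consistent with the RMSE magnitudes in Table 3); the record layout and the function name are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def tree_level_metrics(scan_records):
    """Average scan predictions per tree, then score R², RMSE and MAE over trees.

    scan_records: iterable of (tree_id, measured_mc, predicted_mc), one per scan.
    Assumes a single measured MC value per tree.
    """
    predictions = defaultdict(list)
    measured = {}
    for tree_id, y_true, y_pred in scan_records:
        predictions[tree_id].append(y_pred)
        measured[tree_id] = y_true

    tree_ids = list(measured)
    y = [measured[t] for t in tree_ids]
    y_hat = [sum(predictions[t]) / len(predictions[t]) for t in tree_ids]

    n = len(y)
    y_mean = sum(y) / n
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_mean) ** 2 for a in y)
    return {
        "R2": 1 - ss_res / ss_tot,
        "RMSE": sqrt(ss_res / n),
        "MAE": sum(abs(a - b) for a, b in zip(y, y_hat)) / n,
    }


# Toy example: three trees, five directional scans each (values are invented).
scans = (
    [("A", 0.60, p) for p in (0.55, 0.65, 0.60, 0.62, 0.58)]
    + [("B", 0.90, p) for p in (0.80, 0.85, 0.90, 0.95, 0.75)]
    + [("C", 0.45, p) for p in (0.50, 0.50, 0.50, 0.50, 0.50)]
)
metrics = tree_level_metrics(scans)
print({k: round(v, 4) for k, v in metrics.items()})
# {'R2': 0.9524, 'RMSE': 0.0408, 'MAE': 0.0333}
```

Averaging before scoring lets scan-to-scan scatter within a tree (tree B above) partly cancel, which is one reason tree-level test metrics and scan-level training metrics are not directly comparable.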

Table 4. Performance of the mixed-MS1DCNNAM model for four living tree species in the 15–20, 20–25 and 25–30 cm DBH classes.

Category	Hyperparameter Values	Testing R²	Testing RMSE	Testing MAE	Training R²	Training RMSE	Training MAE
DBH 15–20 cm	epochs = 300, batch_size = 10	0.8885	0.0758	0.0342	0.9991	0.0064	0.0046
DBH 20–25 cm	epochs = 300, batch_size = 8	0.7246	0.1033	0.0755	0.9976	0.0105	0.0079
DBH 25–30 cm	epochs = 300, batch_size = 4	0.8031	0.0872	0.0688	0.9989	0.0075	0.0050

Training metrics were calculated at the scan level, while test metrics were evaluated at the tree level by averaging predictions from five directional scans for each tree.
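Using the DBH-stratified models on a new tree requires routing it to the matching sub-model. The Python sketch below is a hypothetical illustration: the class-boundary convention (lower bound inclusive, with 30 cm assigned to the 25–30 cm class) is an assumption, since this section does not state how trees at exactly 20 or 25 cm were assigned, and `SUBMODEL_CONFIG` simply restates the Table 4 hyperparameters.

```python
# Training settings per DBH sub-model, restated from Table 4.
SUBMODEL_CONFIG = {
    "15-20 cm": {"epochs": 300, "batch_size": 10},
    "20-25 cm": {"epochs": 300, "batch_size": 8},
    "25-30 cm": {"epochs": 300, "batch_size": 4},
}


def dbh_class(dbh_cm):
    """Return the DBH sub-model key for a tree.

    Boundary convention (assumed): [15, 20), [20, 25), [25, 30].
    """
    if not 15 <= dbh_cm <= 30:
        raise ValueError(f"DBH {dbh_cm} cm is outside the 15-30 cm range of the sub-models")
    if dbh_cm < 20:
        return "15-20 cm"
    if dbh_cm < 25:
        return "20-25 cm"
    return "25-30 cm"


print(dbh_class(18.4))                                 # 15-20 cm
print(SUBMODEL_CONFIG[dbh_class(27.0)]["batch_size"])  # 4
```

Raising an error outside 15–30 cm, rather than falling back to the mixed model, reflects that the mixed model was also built only from trees in the 15–30 cm DBH range.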

Table 5. Performance of the mixed-MS1DCNNAM model for four single tree species in the 15–30 cm DBH range.

Tree Species	Hyperparameter Values	Testing R²	Testing RMSE	Testing MAE	Training R²	Training RMSE	Training MAE
Red spruce	epochs = 300, batch_size = 4	0.8321	0.0959	0.0626	0.9978	0.0097	0.0073
Dahurian larch	epochs = 200, batch_size = 4	0.3067	0.0544	0.0450	0.9902	0.0053	0.0042
White birch	epochs = 300, batch_size = 12	0.8096	0.1063	0.0941	0.9991	0.0073	0.0053
Manchurian ash	epochs = 300, batch_size = 4	0.8053	0.0561	0.0429	0.9989	0.0049	0.0033

Training metrics were calculated at the scan level, while test metrics were evaluated at the tree level by averaging predictions from five directional scans for each tree.
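Table 5 pairs the lowest test R² (Dahurian larch) with one of the lowest test RMSEs. The identity below shows how both can hold. It assumes the metrics express MC as a decimal fraction (so an RMSE of 0.0544 corresponds to roughly 5.4 percentage points of MC) and uses the testing-set standard deviations from Table 1 as an approximate stand-in for the tree-level σ_y, so the results are estimates rather than exact reproductions.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
    = 1 - \frac{\mathrm{RMSE}^2}{\sigma_y^2},
\qquad
\mathrm{RMSE}^2 = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2,
\quad
\sigma_y^2 = \frac{1}{n}\sum_{i=1}^{n} (y_i - \bar{y})^2

\text{Dahurian larch: } R^2 \approx 1 - \frac{0.0544^2}{0.066^2} \approx 0.32
\qquad
\text{White birch: } R^2 \approx 1 - \frac{0.1063^2}{0.245^2} \approx 0.81
```

Both estimates land close to the reported 0.3067 and 0.8096. The larch model has the smallest absolute error of the four species, but the larch testing set has a standard deviation of only about 6.6 percentage points, so a similar error explains a much smaller share of the variance.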

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Guo, J.; Cool, J.; Luo, C.; Zhong, Y.; Ji, F.; Yu, K.; Qin, R.; Xu, H.; Hu, Y. Prediction on Moisture Content of Living Trees Using a Multi-Scale One-Dimensional Convolutional Neural Network with Attention Mechanism Based on Data Augmentation. Forests 2026, 17, 618. https://doi.org/10.3390/f17050618

AMA Style

Guo J, Cool J, Luo C, Zhong Y, Ji F, Yu K, Qin R, Xu H, Hu Y. Prediction on Moisture Content of Living Trees Using a Multi-Scale One-Dimensional Convolutional Neural Network with Attention Mechanism Based on Data Augmentation. Forests. 2026; 17(5):618. https://doi.org/10.3390/f17050618

Chicago/Turabian Style

Guo, Jiaxing, Julie Cool, Chaoguang Luo, Yan Zhong, Fengfeng Ji, Kuanjie Yu, Ruixia Qin, Huadong Xu, and Yanbo Hu. 2026. "Prediction on Moisture Content of Living Trees Using a Multi-Scale One-Dimensional Convolutional Neural Network with Attention Mechanism Based on Data Augmentation" Forests 17, no. 5: 618. https://doi.org/10.3390/f17050618

APA Style

Guo, J., Cool, J., Luo, C., Zhong, Y., Ji, F., Yu, K., Qin, R., Xu, H., & Hu, Y. (2026). Prediction on Moisture Content of Living Trees Using a Multi-Scale One-Dimensional Convolutional Neural Network with Attention Mechanism Based on Data Augmentation. Forests, 17(5), 618. https://doi.org/10.3390/f17050618

