Article

Research on Intelligent Wood Species Identification Method Based on Multimodal Texture-Dominated Features and Deep Learning Fusion

1 College of Materials and Chemical Engineering, Southwest Forestry University, Kunming 650224, China
2 College of Big Data and Intelligent Engineering, Southwest Forestry University, Kunming 650224, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Plants 2026, 15(1), 108; https://doi.org/10.3390/plants15010108
Submission received: 24 November 2025 / Revised: 22 December 2025 / Accepted: 26 December 2025 / Published: 30 December 2025

Abstract

To address the reliance of traditional wood species identification on manual experience, as well as its slow speed and insufficient robustness, this study takes hyperspectral images of cross-sections of 10 typical wood species commonly found in Puer, Yunnan, China, as the research object. It combines spectral and texture feature extraction techniques and proposes an intelligent wood species identification method based on the fusion of multimodal texture-dominated features and deep learning. First, an SOC710-VP hyperspectral imager is used to collect hyperspectral data under standard laboratory lighting conditions, and a hyperspectral database of wood cross-sections is constructed through reflectance calibration. Second, in the spectral space construction stage, a comprehensive similarity matrix is built from four types of spectral similarity indicators. Representative bands are selected using two Max–Min strategies (partitioned quota and coverage awareness), and multi-scale wavelet fusion is performed to generate high-resolution fused images from which interest point features are extracted. Third, in the texture space construction stage, three types of texture feature matrices are generated from the PCA first principal component map, and interest point features are extracted. Fourth, in the complementary collaborative learning stage, the ST-former model is constructed: the weights of the trained SpectralFormer++ and TextureFormer are imported, and only the fusion weights are optimized to realize category-adaptive spectral–texture feature fusion. Experimental results show that the overall classification accuracy of the proposed joint model reaches 90.27%, which is about 8% higher on average than that of the single-modal models.

1. Introduction

Wood has always been one of the essential renewable materials in people’s lives [1]. Owing to its natural characteristics, such as attractive texture and color, good elasticity, light weight, and ease of processing, wood plays a pivotal role as a structural material, in decoration, and as an energy fuel, and it is one of the main substrates in production and daily life [2,3,4]. However, driven by profit, counterfeiting and substitution with inferior material often occur in the wood trade. Passing off inferior products as high-quality ones and fake products as genuine ones has seriously harmed the wood trading market [5,6,7]. At the same time, illegally logged wood accounts for as much as 10% of the total wood cut worldwide, causing great damage to the environment [8]. To protect the legitimate rights and interests of consumers, convenient and accurate intelligent methods for wood species identification have become an active research direction [9].
Wood species identification methods are mainly divided into macro and micro categories. Macro identification judges the species through the color, texture, and smell of wood. Although this method is simple, it is prone to misjudgment because the features may not be significant or may be easy to counterfeit. Micro identification identifies tree species by means of the microscopic structure of wood (such as pore characteristics, wood ray characteristics, etc.). According to the different magnification, it can be further divided into high-magnification identification and low-magnification identification [10,11,12,13]. In recent years, innovative technologies such as near-infrared spectroscopy (NIRS), isotope and mineral element methods, and DNA molecular markers have provided important approaches for wood species identification [14].
As an analytical technology based on molecular vibration and rotational energy level transitions, NIRS achieves qualitative and quantitative analysis of substance components by detecting the absorption spectrum of molecules in the near-infrared region [15,16]. Richardson et al. [17] collected near-infrared spectral data of wood powder from different regions and converted the data with a mathematical relationship calibration model, which realized the spectral conversion between intact wood and wood powder. Zhuang et al. [18] conducted a feasibility study on the origin traceability of Pterocarpus santalinus based on peak and valley feature extraction. The main component content of the same wood from different origins is different. The vibrational absorption of hydrogen-containing groups in the near-infrared spectrum can reflect this difference information. The experimental results show the high sensitivity and accuracy of this method. Raobelina et al. [19] extracted key spectral feature variables of wood samples and used principal component analysis to reduce the dimensionality of the original high-dimensional wavenumber variables, which significantly reduced the number of wavenumber variables involved in modeling. Experimental results show that compared with traditional discriminant models, this method greatly reduces the model complexity while maintaining high classification accuracy, providing a reliable analytical method for wood species identification. Peng et al. [20] used three methods to preprocess the spectrum to eliminate interference information and then used a variety of machine learning algorithms to achieve good stability of the neural network model and high accuracy of wood species identification.
As a natural organic material, wood experiences isotope fractionation during its growth process under the combined effects of climatic conditions, ecological environment, and metabolic activities in organisms [21]. This fractionation phenomenon is manifested in significant differences in the abundance of isotopes from different pathways in wood [22]. Chen et al. [23] proposed that C3 and C4 plants have different fractionations when fixing CO2 through photosynthesis and are affected by environmental factors (such as drought and light intensity), thereby distinguishing wood from arid and humid areas (stomatal closure of plants under drought conditions leads to enrichment). Landry et al. [24] showed that oxygen isotopes (δ18O) reflect the isotopic composition of water sources (precipitation, groundwater), which is affected by temperature, altitude, and latitude, distinguishing wood from different river basins or altitudes. He et al. [25] revealed that plant water comes from precipitation, which is related to latitude and the continental effect. Tonouewaj et al. [26] confirmed, through the study of nearly 1000 wood samples collected from different regions such as Cameroon, Congo, Gabon, and Indonesia, that the elemental fingerprint method is highly feasible and accurate in distinguishing the origin of wood at national and even more refined regional scales. Kanbayashi et al. [27] adopted inductively coupled plasma mass spectrometry (ICP-MS) technology to carry out high-sensitivity detection and analysis of trace elements. Stukonyte et al. [28] proposed a rapid non-destructive testing method for wood based on X-ray fluorescence spectroscopy (XRF). Hao et al. [29] proposed an analytical method based on laser ablation inductively coupled plasma mass spectrometry (LA-ICP-MS), which can realize high-precision and high-resolution detection of the distribution characteristics of mineral elements in tree growth rings.
The DNA of different wood species is highly similar overall, yet small sequence differences remain. It is these seemingly small differences that provide an effective way to distinguish woods that are almost indistinguishable in structure and appearance. Yan et al. [30] proposed a more refined wood classification method, aiming to overcome the limitation of the traditional classification system, which stops at the “genus” level. Tran et al. [31] selected three DNA barcode sequences with important identification value, namely, rpoC1, trnH-psbA, and ITS, as molecular markers, established a complete molecular biological detection system, and traced the origin of wood samples. Mo et al. [32] designed and constructed a wood species traceability system based on high-resolution barcode technology, highlighting the application value of DNA barcode technology in the field of wood identification. Ajdary et al. [33] constructed a wood DNA fingerprint database by analyzing the unique DNA sequence polymorphisms, single nucleotide polymorphisms (SNPs), and microsatellite markers of wood samples, combined with bioinformatics analysis and machine learning algorithms, to accurately identify different woods.
The advantages and disadvantages of the commonly used methods for identifying wood species are shown in Table 1.
In view of the above problems, this study proposes an intelligent wood species identification method based on the fusion of multimodal texture-dominated features and deep learning. The overall flow of the research method is shown in Figure 1. First, the SOC710-VP hyperspectral imager is used to collect hyperspectral data under standard laboratory lighting conditions, and a hyperspectral database of wood cross-sections is constructed through reflectance calibration. In the spectral space construction stage, two Max–Min strategies are used to select representative bands, and multi-scale wavelet fusion is performed to generate high-resolution fused images. In the texture space construction stage, three types of texture feature matrices are generated based on the PCA first principal component map. In the single-model training stage, the spectral branch SpectralFormer++ and the texture branch TextureFormer are constructed, and a multimodal interest point guidance mechanism is introduced to realize fine modeling and feature complementation in the texture domain. Finally, in the complementary collaborative learning stage, the ST-former model is constructed, and the trained SpectralFormer++ and TextureFormer weights are imported. Only the fusion weights are optimized to achieve category-adaptive spectral–texture feature fusion.

2. Data Collection and Preprocessing

2.1. Wood Samples

This study selects hyperspectral images of cross-sections of 10 different tree species widely distributed in Puer, Yunnan, China, as the research objects. These samples are collected from different tree bodies to ensure the diversity and representativeness of the research data.

2.2. Collection Method

The specific steps are as follows. First, a circular saw is used to cut along the cross-section from the breast height of the tree (about 1.3 m high) to form a cylindrical wood sample with a thickness of about 30 cm. Subsequently, an SOC710-VP hyperspectral imager and a microscope are used to collect hyperspectral images of the selected samples, and the magnification of the microscope is set to 40–50×. Figure 2 shows the completed image collection environment.

2.3. Dataset Preprocessing

Since the samples have circular cross-sections, the acquired images display rectangular frames, and the four-corner areas usually do not contain effective wood cross-sectional texture information, which is likely to introduce background noise. To ensure the reliability and practicality of the experimental data, ENVI Classic 5.6 is used to crop the collected wood cross-section images to the inscribed rectangular region of the wood cross-section. Figure 3 shows a comparison between the original image (42 bands) and the cropped image (42 bands). Figure 4 presents the collected hyperspectral images of the 10 wood species samples and their corresponding RGB images.

3. Methods and Analysis

In digital image processing, the structure of image data exhibits a significant hierarchical organization depending on the image type [34]. During hyperspectral image processing, each pixel at a given spatial location contains complete spectral information, manifested as a spectral reflectance curve composed of L discrete sampling points. This curve accurately characterizes the spectral properties of the spatial location across different wavelengths [35,36,37].

3.1. Multimodal Texture-Dominated Spectral Space Construction

3.1.1. A Hyperspectral Band Selection Method Based on Four Types of Indicators

Before processing hyperspectral image data, dimensionality reduction is usually required. Currently, dimensionality reduction methods for hyperspectral images are mainly divided into two categories: feature extraction and feature selection. In terms of feature extraction, the main algorithms include principal component analysis (PCA), Fisher Linear Discriminant Analysis (FLD), and Minimum Noise Fraction (MNF) [38,39,40].
To comprehensively consider the image features of wood hyperspectral images, this study proposes a hyperspectral band selection method based on four categories of indicators. Compared with traditional methods, this method shows significant advantages in subsequent feature extraction and classification. To further investigate the characteristics of the 3028 cropped images, this study conducted a band similarity analysis on a hyperspectral cube consisting of 128 bands for each sample. Four analytical techniques were employed in the analysis, namely, Fast Frequency Domain Differential Mapping (FDDM), Difference Hash Similarity (DHashM), Mutual Information (MIM), and Structural Similarity Index (SSIM). The integrated use of these techniques aims to provide a detailed description and evaluation of the differences between bands from multiple dimensions, ensuring the accuracy and completeness of the analysis results.
To address the inconsistency in value scales among different indicators, the similarity results of each indicator are first normalized, and a comprehensive similarity matrix $S_{ij}^{comb}$ is then constructed through weighted fusion to characterize the overall similarity between any two spectral bands.
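The normalization and weighted fusion step can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes min-max normalization, assumes each indicator has already been expressed as a similarity (larger means more alike), and uses random toy matrices in place of the FDDM, DHashM, MIM, and SSIM results. The function names `minmax` and `combined_similarity` are hypothetical.

```python
import numpy as np

def minmax(m):
    """Linearly rescale a matrix to [0, 1]; a constant matrix maps to zeros."""
    lo, hi = m.min(), m.max()
    return np.zeros_like(m, dtype=float) if hi == lo else (m - lo) / (hi - lo)

def combined_similarity(mats, weights=None):
    """Weighted fusion of normalized band-similarity matrices (each B x B)."""
    mats = [minmax(np.asarray(m, dtype=float)) for m in mats]
    if weights is None:
        weights = np.full(len(mats), 1.0 / len(mats))  # equal weights
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()                  # weights sum to 1
    return sum(w * m for w, m in zip(weights, mats))

# Toy stand-ins for four indicator matrices on very different value scales.
rng = np.random.default_rng(0)
raw = [rng.random((6, 6)) * scale for scale in (1.0, 64.0, 3.0, 1.0)]
raw = [(m + m.T) / 2 for m in raw]   # make each toy matrix symmetric
S_comb = combined_similarity(raw)
```

Because every matrix is rescaled to [0, 1] before fusion, no single indicator dominates merely because of its numeric range.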
After obtaining the comprehensive similarity matrix $S_{ij}^{comb}$, a multi-strategy band selection mechanism is introduced. It takes two forms, Strategy A (partitioned quota + Max–Min) and Strategy B (coverage-reward Max–Min), which ensure that the selected bands cover the visible, red, and near-infrared regions while balancing information redundancy and diversity.
(1) Strategy A: Partitioned quota with the maximum–minimum distance method (Max–Min)
First, based on the wavelength range calibrated by the spectrometer (372.53–1038.57 nm), the entire spectrum is divided into three segments: the visible light region (VIS), the red light region (RED), and the near-infrared region (NIR). Then, based on the number of bands and their distribution characteristics in each segment, a quota ratio of 3:3:4 is set, and within each segment, the allotted number of bands is selected using the Max–Min method.
Let the comprehensive difference matrix be $\Delta_{ij} = 1 - S_{ij}^{comb}$ and the selected band set be $\Omega$. In each iteration, the unselected band with the largest minimum distance to the selected set is added:
$$b^{*} = \arg\max_{b \notin \Omega} \min_{s \in \Omega} \Delta_{b,s}$$
This continues until the set number of bands $k$ is reached. This strategy avoids the redundancy caused by excessively dense bands and ensures the representativeness of each spectral segment.
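A minimal sketch of the greedy Max–Min rule with per-segment quotas is given below. Several details are assumptions, since the text does not state them: the seed band (here the candidate with the largest mean difference to the others), the segment boundaries, and the toy difference matrix based on band-index distance. The helper names are hypothetical.

```python
import numpy as np

def max_min_select(delta, k, candidates=None, seed_band=None):
    """Greedy Max-Min: repeatedly add the band whose minimum difference
    to the already selected set is largest."""
    candidates = list(range(delta.shape[0])) if candidates is None else list(candidates)
    if seed_band is None:
        # Assumption: start from the candidate with the largest mean difference.
        sub = delta[np.ix_(candidates, candidates)]
        seed_band = candidates[int(np.argmax(sub.mean(axis=1)))]
    selected = [seed_band]
    while len(selected) < min(k, len(candidates)):
        rest = [b for b in candidates if b not in selected]
        min_dist = [delta[b, selected].min() for b in rest]
        selected.append(rest[int(np.argmax(min_dist))])
    return selected

def quota_select(delta, segments, quotas):
    """Strategy A sketch: run Max-Min independently inside each segment."""
    chosen = []
    for name, bands in segments.items():
        chosen += max_min_select(delta, quotas[name], candidates=bands)
    return sorted(chosen)

# Toy example: 20 bands whose difference grows with index distance.
n = 20
idx = np.arange(n)
delta = np.abs(idx[:, None] - idx[None, :]) / (n - 1)
segments = {"VIS": range(0, 7), "RED": range(7, 13), "NIR": range(13, 20)}  # hypothetical split
quotas = {"VIS": 3, "RED": 3, "NIR": 4}                                      # 3:3:4 ratio
picked = quota_select(delta, segments, quotas)
```

Running the selection separately per segment is what guarantees the quota; the Max–Min rule then spreads the chosen bands apart within each segment.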
(2) Strategy B: Coverage-aware Max–Min method
To further improve the coverage of the selected bands across the three major spectral ranges, a coverage reward factor $\lambda_{cov}$ is introduced, and a scoring function is defined:
$$score(b) = \min_{s \in \Omega} \Delta_{b,s} \times \begin{cases} 1 + \lambda_{cov}, & \text{if } r(b) \notin C, \\ 1, & \text{if } r(b) \in C, \end{cases}$$
where $r(b)$ denotes the spectral region (VIS, RED, or NIR) to which band $b$ belongs, and $C$ denotes the set of spectral regions that have already been covered by the selected band set.
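The coverage-aware scoring can be illustrated with a short sketch. The value of `lam_cov`, the seed band, and the toy region layout are assumptions made for the example only; the text does not report them.

```python
import numpy as np

def coverage_aware_select(delta, region_of, k, lam_cov=0.5, seed_band=0):
    """Strategy B sketch: Max-Min score multiplied by (1 + lam_cov) when the
    candidate band lies in a spectral region that is not yet covered."""
    selected = [seed_band]
    while len(selected) < k:
        covered = {region_of[b] for b in selected}
        best_band, best_score = None, -np.inf
        for b in range(delta.shape[0]):
            if b in selected:
                continue
            score = delta[b, selected].min()
            if region_of[b] not in covered:
                score *= 1.0 + lam_cov      # reward an uncovered region
            if score > best_score:
                best_band, best_score = b, score
        selected.append(best_band)
    return selected

# Toy example: 9 bands, three per region, difference = index distance.
idx = np.arange(9)
delta = np.abs(idx[:, None] - idx[None, :]) / 8.0
region_of = ["VIS"] * 3 + ["RED"] * 3 + ["NIR"] * 3
picked = coverage_aware_select(delta, region_of, k=3)
```

In this toy case the reward steers the third pick into the RED region, so all three regions are represented after three selections.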

3.1.2. Band Fusion Method Based on Wavelet Transform

After obtaining the most representative set of bands from the spectral data of each sample, this study designs a multi-band fusion algorithm based on multi-scale wavelet transform to further integrate the complementary information contained in different bands, performing joint spatial and frequency domain optimization of the spectral information.
Let the hyperspectral cube obtained after band filtering be $I_{band} \in \mathbb{R}^{B \times H \times W}$, where $B$ represents the number of bands. Based on the band index sets obtained from Strategies A and B, the corresponding band images $\{I_{band,b}\}_{b=1}^{N_{band}}$ are extracted, where $N_{band} = 10$. This study uses the Daubechies-4 wavelet basis function and sets the decomposition level $L = 2$ to ensure a balance between resolution and stability.

3.1.3. Spectral Feature Extraction

After the wavelet fusion images are obtained, this study employs an interest point detection algorithm based on frequency-domain grid suppression and multi-feature fusion scoring to extract the most representative texture regions from wood cross-sectional images.
(1) Frequency-domain notch de-gridding
Because periodic interference fringes often appear during imaging with the experimental equipment, notch filtering is first applied to the input fused image $I_{fuse}(x, y)$ in the frequency domain.
The image is Fourier transformed into $F(u, v)$, the grid peak centers $(u_i, v_i)$ are automatically estimated on the amplitude spectrum, and a Gaussian notch mask is constructed:
$$M(u, v) = \prod_{i} \left[ 1 - \alpha_n \exp\left( -\frac{(u - u_i)^2 + (v - v_i)^2}{2\sigma_n^2} \right) \right]$$
where $\alpha_n$ is the suppression intensity (0.85) and $\sigma_n$ is the notch radius (3.5). After mask multiplication and the inverse transform, a smoothed image $I_{dn}(x, y)$ is obtained in which the periodic grid noise is largely suppressed.
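The Gaussian notch mask can be sketched as below. The peak centers are passed in by hand here, whereas the text estimates them automatically from the amplitude spectrum; the synthetic fringe image and the function names are likewise assumptions for illustration.

```python
import numpy as np

def gaussian_notch_mask(shape, peaks, alpha_n=0.85, sigma_n=3.5):
    """M(u,v) = prod_i [1 - alpha_n * exp(-((u-u_i)^2 + (v-v_i)^2) / (2 sigma_n^2))]."""
    v, u = np.mgrid[:shape[0], :shape[1]]      # v: row index, u: column index
    mask = np.ones(shape, dtype=float)
    for ui, vi in peaks:
        d2 = (u - ui) ** 2 + (v - vi) ** 2
        mask *= 1.0 - alpha_n * np.exp(-d2 / (2.0 * sigma_n ** 2))
    return mask

def notch_filter(image, peaks, **kwargs):
    """Apply the mask to the centered spectrum and transform back."""
    F = np.fft.fftshift(np.fft.fft2(image))
    M = gaussian_notch_mask(image.shape, peaks, **kwargs)
    return np.real(np.fft.ifft2(np.fft.ifftshift(F * M)))

# Toy example: a flat image corrupted by a fringe with 8 cycles along x.
_, xx = np.mgrid[:64, :64]
img = 0.5 + 0.5 * np.cos(2 * np.pi * 8 * xx / 64)
# After fftshift the zero frequency sits at (32, 32); the fringe peaks at u = 32 +/- 8.
filtered = notch_filter(img, peaks=[(40, 32), (24, 32)])
```

With $\alpha_n = 0.85$ the spectral peak is attenuated to about 15% of its amplitude rather than removed outright, which limits ringing around the notch.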
(2) Local enhancement and denoising
In the spatial domain, non-local means denoising (NL-Means, strength 4.0) is applied, followed by joint enhancement using CLAHE (contrast-limited adaptive histogram equalization) and an unsharp mask.
CLAHE adaptively enhances the brightness distribution using an 8 × 8 tile grid, with a clip limit of 2.0 to prevent excessive noise amplification. Subsequently, an unsharp mask (radius 1.2, amount 1.3) and gamma correction ($\gamma_h = 0.95$) are applied to improve local texture contrast. For samples with uneven illumination, a homomorphic filtering mode can be selected, which separates the image in the logarithmic domain into a low-frequency illumination component and a high-frequency reflectance component and amplifies the reflectance term to correct brightness.
(3) Multi-feature scoring map construction
To comprehensively measure the texture saliency of local regions, a weighted scoring map $S_{kpt}(x, y)$ is defined:
$$S_{kpt} = 0.25\,N(Harris) + 0.15\,N(Entropy) + 0.15\,N(Sobel) + 0.15\,N(Laplace) + 0.15\,N(DoG) + 0.15\,N(SNR)$$
where $N(\cdot)$ represents the linear normalization operation.
(4) Interest point extraction
After constructing the multi-feature scoring map $S_{kpt}(x, y)$, a candidate point set is generated using Harris corner detection combined with local peak search, with a relative threshold $t = 0.02$ and a minimum spacing $d = 6$. While ensuring a uniform spatial distribution, the $K = 100$ highest-scoring points are selected as the final interest points and sorted from highest to lowest score.
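The final selection step (threshold, minimum spacing, top $K$) can be approximated with a greedy procedure. This sketch is a simplification under stated assumptions: it takes every pixel above the relative threshold as a candidate instead of running Harris corner detection with local peak search, and it uses a random toy scoring map. The function name is hypothetical.

```python
import numpy as np

def select_interest_points(score, k=100, rel_thresh=0.02, min_dist=6):
    """Greedily keep the k highest-scoring pixels that are at least
    min_dist pixels apart and above rel_thresh * max(score)."""
    ys, xs = np.nonzero(score >= rel_thresh * score.max())
    order = np.argsort(-score[ys, xs], kind="stable")   # descending score
    points = []
    for idx in order:
        y, x = int(ys[idx]), int(xs[idx])
        far_enough = all((y - py) ** 2 + (x - px) ** 2 >= min_dist ** 2
                         for py, px, _ in points)
        if far_enough:
            points.append((y, x, float(score[y, x])))
            if len(points) == k:
                break
    return points   # already ordered from highest to lowest score

# Toy scoring map standing in for S_kpt.
rng = np.random.default_rng(0)
score_map = rng.random((64, 64))
pts = select_interest_points(score_map)
```

Visiting candidates in descending score order means the spacing constraint always favors the stronger of two nearby points.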

3.2. Multimodal Texture-Dominated Texture Space Construction

To fully explore the differential information of wood cross-sectional texture structure, this paper establishes a two-dimensional feature space dominated by multimodal texture based on hyperspectral data. First, using the first principal component of the original hyperspectral image as the grayscale base image, three types of complementary texture features are extracted from it: Sobel edge features, second-order geometric moment features, and Gabor energy features.
(1) Principal component grayscale base image generation
The first principal component is extracted from the original hyperspectral cube $C \in \mathbb{R}^{H \times W \times B}$ ($B = 128$) using principal component analysis (PCA):
$$I_{pca} = PCA_1(C)$$
PCA employs a randomized singular value decomposition solver and retains only the first principal direction, yielding a normalized grayscale image $I_{pca} \in [0, 1]$ that serves as the base image for texture feature extraction.
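A compact sketch of the first-principal-component projection follows. It uses an exact SVD from NumPy instead of the randomized solver reported in the text, and min-max rescaling to [0, 1] is an assumption about how the image is normalized. The toy cube has 8 bands rather than 128.

```python
import numpy as np

def first_principal_component(cube):
    """Project an H x W x B cube onto its first principal axis and
    rescale the result to [0, 1]."""
    h, w, b = cube.shape
    X = cube.reshape(-1, b).astype(float)      # one row per pixel (copy)
    X -= X.mean(axis=0)                        # center each band
    # Exact SVD for simplicity; rows of vt are the principal axes.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    pc1 = X @ vt[0]
    pc1 = (pc1 - pc1.min()) / (pc1.max() - pc1.min() + 1e-12)
    return pc1.reshape(h, w)

# Toy cube with random reflectance values.
rng = np.random.default_rng(0)
cube = rng.random((16, 16, 8))
I_pca = first_principal_component(cube)
```

The sign of a principal axis is arbitrary, so the resulting image may appear inverted between runs or solvers; this does not affect gradient magnitudes but can matter for signed features.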
(2) Sobel edge features
The gradients are calculated in the horizontal and vertical directions using the Sobel operator:
$$G_x = \frac{\partial I_{pca}}{\partial x}, \quad G_y = \frac{\partial I_{pca}}{\partial y}$$
After normalizing the gradient magnitudes in both directions, the gradients are stacked to form a two-channel edge feature map:
$$F_{sobel} = [N(G_x), N(G_y)] \in \mathbb{R}^{H \times W \times 2}$$
(3) Second-order geometric moment features
To characterize the local structural distribution of wood grain, a two-dimensional weight kernel of window size $w = 7$ is constructed, and the second-order geometric moments are calculated:
$$M_{20} = I_{pca} * k_{20}, \quad M_{02} = I_{pca} * k_{02}, \quad M_{11} = I_{pca} * k_{11}$$
where $*$ denotes two-dimensional convolution and
$$k_{20} = x^2, \quad k_{02} = y^2, \quad k_{11} = xy$$
After convolution, the matrices are normalized and stacked to form a three-channel moment feature map:
$$F_{moments} = [N(M_{20}), N(M_{02}), N(M_{11})]$$
This modality reflects the anisotropy and geometric distribution of texture.
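The moment maps can be sketched with NumPy alone. The kernel coordinates are assumed to be centered on the window, reflective padding is an assumption, and $N(\cdot)$ is taken to be min-max normalization. Because the three kernels are unchanged under $(x, y) \to (-x, -y)$, the sliding-window correlation used here gives the same result as convolution.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def second_order_moments(img, w=7):
    """Return an H x W x 3 map of normalized M20, M02, M11 responses."""
    r = w // 2
    y, x = np.mgrid[-r:r + 1, -r:r + 1].astype(float)   # centered coordinates
    kernels = {"M20": x ** 2, "M02": y ** 2, "M11": x * y}
    padded = np.pad(img, r, mode="reflect")
    windows = sliding_window_view(padded, (w, w))        # H x W x w x w
    maps = []
    for name in ("M20", "M02", "M11"):
        m = np.einsum("hwij,ij->hw", windows, kernels[name])
        maps.append((m - m.min()) / (m.max() - m.min() + 1e-12))
    return np.stack(maps, axis=-1)

# Toy grayscale image standing in for I_pca.
rng = np.random.default_rng(0)
img = rng.random((32, 32))
F_moments = second_order_moments(img)
```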
(4) Gabor energy features
To characterize the texture frequency characteristics at different directions and scales, a Gabor filter bank with six orientations $\theta \in \{0^\circ, 30^\circ, 60^\circ, 90^\circ, 120^\circ, 150^\circ\}$ is constructed, and its kernel function is defined as follows:
$$g(x, y; \theta) = \exp\left( -\frac{x'^2 + \gamma_g^2 y'^2}{2\sigma_g^2} \right) \cos\left( \frac{2\pi x'}{\lambda} \right)$$
where
$$x' = x\cos\theta + y\sin\theta, \quad \lambda = 8, \quad \sigma_g = 0.56\lambda, \quad \gamma_g = 0.5$$
The filtered energy for each direction is normalized and stacked to form a six-channel feature map:
$$F_{gabor} \in \mathbb{R}^{H \times W \times 6}$$
This feature can capture directional, periodic textures such as wood rings and rays.
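The six-orientation kernel bank can be built directly from the kernel definition. Two points are assumptions: the second rotated coordinate is taken as the standard $y' = -x\sin\theta + y\cos\theta$, and the kernel support is truncated at about three standard deviations, since the text does not give a kernel size.

```python
import numpy as np

def gabor_kernel(theta_deg, lam=8.0, gamma_g=0.5, size=None):
    """Real (cosine) Gabor kernel with sigma_g = 0.56 * lam."""
    sigma_g = 0.56 * lam
    if size is None:
        size = 2 * int(np.ceil(3 * sigma_g)) + 1   # assumption: +/- 3 sigma support
    r = size // 2
    y, x = np.mgrid[-r:r + 1, -r:r + 1].astype(float)
    t = np.deg2rad(theta_deg)
    xr = x * np.cos(t) + y * np.sin(t)             # x'
    yr = -x * np.sin(t) + y * np.cos(t)            # y' (standard rotation, assumed)
    envelope = np.exp(-(xr ** 2 + gamma_g ** 2 * yr ** 2) / (2 * sigma_g ** 2))
    return envelope * np.cos(2 * np.pi * xr / lam)

# One kernel per orientation used in the text.
bank = {theta: gabor_kernel(theta) for theta in (0, 30, 60, 90, 120, 150)}
```

Each kernel would then be convolved with the base image and the response energy normalized per orientation to obtain the six channels.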
(5) Intra-modal interest point filtering and saving
A comprehensive score matrix is calculated on each modal feature map (gradient magnitude for the Sobel modality, L2 norm for the moment modality, and maximum energy response for the Gabor modality), and the top $K = 100$ salient points are selected from each matrix.

3.3. Single Model Training

3.3.1. SpectralFormer++

To further improve the physical interpretability and classification robustness of wood spectral feature modeling, this paper proposes SpectralFormer++ based on the SpectralFormer model [41]. This model optimizes the input modeling and feature normalization based on the characteristics of spectral data while keeping the Transformer backbone unchanged, thereby significantly improving the model performance and stability. As shown in Figure 5, the spectral derivative prior is extended through the first-order and second-order difference spectral input channels, enabling the model to perceive changes in spectral morphology at the input stage, thus enhancing its ability to represent fluctuations in curvature, peak shape, and slope.
To further standardize the numerical scale of input features and stabilize the training process, SpectralFormer++ introduces a lightweight linear embedding layer on top of the differential multi-channel input, uniformly projecting the three-channel input into a 64-dimensional embedding space:
$$E = LN(X_{sp} W + b), \quad W \in \mathbb{R}^{3 \times d}, \quad d = 64$$
Layer normalization is applied at the output to eliminate amplitude deviations between different bands. This structure enables the model to express the dynamic features of the spectrum at the input stage, simultaneously sensing the slope changes at the edges of the absorption band and the curvature features of the peak region, thereby more accurately distinguishing the differences in spectral response of different wood components.
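The three-channel derivative input and its linear embedding can be sketched as follows. The use of central differences (`np.gradient`), a layer normalization without learned scale and shift, random projection weights, and the synthetic spectrum are all assumptions made for illustration; the trained model would learn $W$ and $b$.

```python
import numpy as np

def derivative_channels(spectrum):
    """Stack the raw spectrum with its first- and second-order
    differences, giving an L x 3 input."""
    d1 = np.gradient(spectrum)     # first-order difference (slope)
    d2 = np.gradient(d1)           # second-order difference (curvature)
    return np.stack([spectrum, d1, d2], axis=-1)

def embed(x_sp, W, b, eps=1e-5):
    """E = LN(x_sp @ W + b), normalizing over the embedding dimension."""
    z = x_sp @ W + b
    mu = z.mean(axis=-1, keepdims=True)
    var = z.var(axis=-1, keepdims=True)
    return (z - mu) / np.sqrt(var + eps)

# Toy reflectance curve with 128 bands.
grid = np.linspace(0.0, 1.0, 128)
spectrum = 0.5 + 0.3 * np.sin(6 * grid)
x_sp = derivative_channels(spectrum)

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 64))       # stand-in for the learned projection
b = np.zeros(64)
E = embed(x_sp, W, b)
```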

3.3.2. TextureFormer

To further enhance the spatial representation and structural robustness of wood cross-sectional texture features, this paper proposes the TextureFormer model based on the hierarchical shifted-window architecture of the Swin Transformer [42,43,44]. This model optimizes input modeling, front-end fusion, and attention guidance while maintaining the core hierarchical structure and local attention mechanism. The improved model significantly enhances local texture modeling accuracy and overall classification robustness without increasing computational cost.
(1) Multi-channel input modeling based on Fusion–Sobel–Moments–Gabor
Wood cross-sections exhibit significant anisotropy and multi-scale texture features. A single modality is insufficient to simultaneously describe complex structures such as vessel outlines, ray distribution, and periodic textures. Therefore, TextureFormer constructs four-modal complementary features at the input and stacks them along the channel dimension to form a 12-channel tensor $X_{mm} \in \mathbb{R}^{H \times W \times 12}$:
$$X_{mm} = concat(X_f, X_s, X_m, X_g)$$
This multimodal construction allows the model to explicitly obtain complementary information on “global layering, edge orientation, geometry, and periodic texture” at the input layer, providing rich priors for subsequent fusion and attention allocation. After normalization and alignment, all modalities correspond completely in space, providing geometric consistency for subsequent interest point guidance.
(2) Texture Stem: Lightweight Blending and Channel Recalibration
To achieve high-quality fusion of 12-channel features without reducing spatial resolution, this paper designs the Texture Stem module shown in Figure 6.
$$F_{stem} = Conv_{1\times1}(SE(\sigma_a(BN(DWConv(\sigma_a(BN(Conv(X_{mm}))))))))$$
  • Local fusion: The first layer of 3 × 3 convolution mixes the 12 modal channels into a 64-dimensional feature space, completing the initial coupling of multi-source features:
    $$X_1 = \sigma_a(BN(Conv_{3\times3}(X_{mm}))) \in \mathbb{R}^{H \times W \times 64}$$
  • Spatial refinement: Depthwise separable $DWConv_{3\times3}$ convolution enhances spatial consistency while maintaining a low parameter count:
    $$X_2 = \sigma_a(BN(DWConv_{3\times3}(X_1)))$$
  • Channel recalibration via the SE mechanism:
    $$w = \sigma(W_2\,\delta(W_1\,GAP(X_2))), \quad X_3 = X_2 \odot w$$
  • High-dimensional embedding: The final 1 × 1 convolution increases the feature dimension to $embed\_dim = 384$:
    $$T \in \mathbb{R}^{H \times W \times 384}$$
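Of the four stem stages, the channel recalibration step is the easiest to isolate, and a minimal NumPy sketch is given here. It assumes $\delta$ is ReLU and $\sigma$ is the logistic sigmoid, as in the usual squeeze-and-excitation design, and it uses a reduction to 16 hidden units with random weights; none of these values is stated in the text.

```python
import numpy as np

def squeeze_excite(x, W1, W2):
    """x: H x W x C feature map.
    w = sigmoid(W2 @ relu(W1 @ GAP(x))); returns (x * w, w)."""
    s = x.mean(axis=(0, 1))                   # global average pooling -> (C,)
    z = np.maximum(W1 @ s, 0.0)               # channel reduction + ReLU
    w = 1.0 / (1.0 + np.exp(-(W2 @ z)))       # channel expansion + sigmoid -> (C,)
    return x * w, w                           # broadcast over H and W

# Toy 64-channel feature map standing in for X_2.
rng = np.random.default_rng(0)
x2 = rng.random((8, 8, 64))
W1 = rng.normal(size=(16, 64)) * 0.1          # hypothetical reduction to 16 units
W2 = rng.normal(size=(64, 16)) * 0.1
x3, w = squeeze_excite(x2, W1, W2)
```

Each channel is scaled by a single gate in (0, 1), so the spatial layout of the features is left untouched.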
(3) Interest-guided group attention mechanism
Although the Texture Stem fuses 12-channel multimodal texture features into a unified high-dimensional representation, the translation invariance of convolutional operations tends to weaken or even eliminate the spatial saliency distributions of different texture modalities. As a result, the model may fail to preserve the key spatial locations of structural texture features, such as edge orientations, frequency responses, and geometric abrupt changes. Therefore, multimodal texture fusion performed solely in the pixel domain is insufficient to explicitly model spatial saliency information that is critical for structural discrimination.
To compensate for the lack of explicit spatial saliency priors in convolution-based feature fusion, we propose a Modality-Guided Grouped Multi-Head Attention (MG-GMH) mechanism to explicitly model the spatial importance of key texture regions. Rather than encoding texture content itself, MG-GMH introduces structured spatial saliency constraints to guide the attention mechanism toward regions that are more critical for wood structural discrimination.
(1) Interest-Based Saliency Modeling
For each sample, candidate interest points are extracted independently from four texture modalities, namely, fusion texture, Sobel, moments, and Gabor features. Each interest point is represented as $(x_i, y_i, s_i)$, where $(x_i, y_i)$ denotes the spatial coordinate and $s_i \in [0, 1]$ denotes the original confidence score. To jointly consider confidence strength and spatial dispersion, a True Score is defined as
$$T_i = \alpha s_i + (1 - \alpha) \frac{1}{K_i} \sum_{j \in S_i} \min\left(1, \frac{d_{ij}}{L}\right)$$
where $d_{ij}$ is the Euclidean distance between point $i$ and a previously selected point $j$, $L$ is a normalization constant, and $\alpha \in [0, 1]$ controls the trade-off between confidence and spatial diversity. Based on the True Score, a fixed number of interest points are selected for each modality.
The selected interest points are then projected onto the Transformer patch grid with stride $p$, and a sparse token-level saliency map is constructed as
$$t_x = \left\lfloor \frac{x_k}{p} \right\rfloor, \quad t_y = \left\lfloor \frac{y_k}{p} \right\rfloor, \quad H_m[t_y, t_x] = \max\left(H_m[t_y, t_x],\, T_k\right)$$
where $H_m$ denotes the token-level saliency distribution of modality $m$. This process aligns interest points from continuous spatial coordinates to discrete Transformer tokens, providing modality-specific saliency priors for subsequent attention guidance.
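The True Score selection and the projection onto the token grid can be sketched together. The values $\alpha = 0.5$, $L = 32$, and $p = 16$ are assumptions chosen for the toy example, the averaging is taken over the points selected so far, and the first point is given a dispersion term of 1 because no point has been selected yet. Function names are hypothetical.

```python
import numpy as np

def select_by_true_score(points, n_select, alpha=0.5, L=32.0):
    """points: (n, 3) array of (x, y, s). Greedily pick the point with
    the highest T_i = alpha*s_i + (1-alpha)*mean_j min(1, d_ij / L)."""
    pts = np.asarray(points, dtype=float)
    remaining = list(range(len(pts)))
    chosen, scores = [], []
    while remaining and len(chosen) < n_select:
        best, best_t = None, -np.inf
        for i in remaining:
            if chosen:
                d = np.linalg.norm(pts[chosen, :2] - pts[i, :2], axis=1)
                spread = np.minimum(1.0, d / L).mean()
            else:
                spread = 1.0   # assumption: no penalty before anything is selected
            t = alpha * pts[i, 2] + (1 - alpha) * spread
            if t > best_t:
                best, best_t = i, t
        chosen.append(best)
        scores.append(best_t)
        remaining.remove(best)
    return pts[chosen, :2], np.array(scores)

def token_saliency(xy, scores, grid_hw, p):
    """Scatter scores onto the patch grid, keeping the max per token."""
    H = np.zeros(grid_hw)
    for (x, y), t in zip(xy, scores):
        ty = min(int(y // p), grid_hw[0] - 1)
        tx = min(int(x // p), grid_hw[1] - 1)
        H[ty, tx] = max(H[ty, tx], t)
    return H

# Toy candidates (x, y, confidence) on a 128 x 128 image, patch stride 16.
cands = [(10, 10, 0.9), (12, 11, 0.85), (100, 90, 0.6), (50, 20, 0.3)]
xy, T = select_by_true_score(cands, n_select=2)
H_m = token_saliency(xy, T, grid_hw=(8, 8), p=16)
```

In the toy data the second-most confident candidate sits almost on top of the first, so the dispersion term passes it over in favor of a more distant point.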
(2) Modality-Based Attention Head Grouping
To prevent saliency interference across different texture modalities, MG-GMH introduces a modality-level grouping constraint at the attention head level. Given an attention head index $h$, its corresponding modality is determined as
$$h \bmod 4 \mapsto \{Fusion, Sobel, Moments, Gabor\}$$
This grouping strategy ensures that each group of attention heads is guided exclusively by the saliency prior of its assigned modality, thereby achieving modality-specific structural decoupling within the attention mechanism.
(3) Saliency Bias Injection
Within each attention window, the token-level saliency vector of modality $m$ is denoted as
$$h^m = (h_1^m, \ldots, h_N^m)$$
An additive saliency bias matrix is constructed as
$$B_h[i, j] = h_j^m$$
and injected into the attention logits of the $h$-th attention head:
$$A_{logit}^{h} = \frac{QK^{\top}}{\sqrt{d}} + RPB + B_h$$
The bias varies along the key dimension and is broadcast across queries, so that tokens with higher saliency values receive larger attention weights after normalization. In this manner, MG-GMH explicitly guides the attention mechanism to focus on structurally salient texture regions, as illustrated in Figure 7.
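The head grouping and bias injection can be illustrated for a single attention head and a single window. This sketch assumes a softmax normalization, treats the relative position bias as a given $N \times N$ matrix, and uses zero queries and keys so that the effect of the saliency bias is visible in isolation. All names are hypothetical.

```python
import numpy as np

MODALITIES = ("Fusion", "Sobel", "Moments", "Gabor")

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def guided_attention(Q, K, rpb, saliency, head_index):
    """Q, K: N x d; rpb: N x N relative position bias;
    saliency: dict mapping modality name -> length-N vector."""
    modality = MODALITIES[head_index % 4]                 # head-to-modality grouping
    N, d = Q.shape
    bias = np.broadcast_to(saliency[modality], (N, N))    # B_h[i, j] = h_j^m
    logits = Q @ K.T / np.sqrt(d) + rpb + bias
    return softmax(logits), modality

# Toy window with 4 tokens; only the Sobel prior is non-zero.
N, d = 4, 8
Q = np.zeros((N, d))
K = np.zeros((N, d))
rpb = np.zeros((N, N))
saliency = {m: np.zeros(N) for m in MODALITIES}
saliency["Sobel"] = np.array([0.0, 0.9, 0.1, 0.0])
attn, modality = guided_attention(Q, K, rpb, saliency, head_index=5)
```

Head 5 maps to the Sobel group, and every query row shifts attention toward token 1, the most salient key under that prior.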

3.4. Complementary and Collaborative Learning Training

3.4.1. Method Overview

To fully leverage the complementarity of SpectralFormer++ and TextureFormer, this section adopts the complementary collaborative learning paradigm ST-former, which trains only the class-wise fusion weights. After the two branches have fully converged on their respective data and tasks, their parameters are frozen, and only the set of scalar fusion coefficients $\{\lambda_c\}_{c=1}^{K}$, one per class, is trained to complete the subsequent complementary fusion.
Given the spectral branch logit $T_k^s$ and the texture branch logit $T_k^t$ of class $k$, the fused logit is defined as
$$\tilde{T}_k = (1 - \sigma_s(\lambda_k))\,T_k^s + \sigma_s(\lambda_k)\,T_k^t$$

3.4.2. Training Objective and Loss Function

Only $\{\lambda_c\}$ are updated, by minimizing the cross-entropy loss over a batch of $B$ samples with labels $y_i$:
$$\mathcal{L}_{CE} = -\frac{1}{B} \sum_{i=1}^{B} \log \frac{\exp(\tilde{T}_{y_i})}{\sum_{k=1}^{K} \exp(\tilde{T}_k)}$$
To prevent $\sigma_s(\lambda_c)$ from collapsing to extreme values (i.e., overly biased toward 0 or 1), a mild regularization term is introduced:
$$\mathcal{L}_{reg} = \frac{1}{K} \sum_{c=1}^{K} \bigl(\sigma_s(\lambda_c) - 0.5\bigr)^{2}$$
The final objective is as follows:
$$\mathcal{L} = \mathcal{L}_{CE} + \beta\, \mathcal{L}_{reg}, \quad \beta \in [10^{-4}, 10^{-2}]$$
In practice, we found that $\beta = 5 \times 10^{-3}$ stabilizes training while avoiding suppression of class-specific bias. If the majority of the $\sigma_s(\lambda_c)$ values remain close to 0.5 after training, $\beta$ can be decreased.
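The objective above can be illustrated with a small, self-contained sketch that updates only the fusion coefficients while the branch logits stay fixed. The plain gradient-descent loop, the learning rate, the step count, and all function names are assumptions made for illustration; the source does not state which optimizer is used for this stage.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    z = sum(exps)
    return [e / z for e in exps]

def fused_batch(lam, spectral, texture):
    """Fuse each sample's branch logits with class-wise weights sigmoid(lam_k)."""
    s = [sigmoid(l) for l in lam]
    return [[(1 - sk) * a + sk * b for sk, a, b in zip(s, ts, tt)]
            for ts, tt in zip(spectral, texture)]

def objective(lam, spectral, texture, labels, beta=5e-3):
    """L = L_CE + beta * L_reg on a batch of frozen branch logits."""
    fused = fused_batch(lam, spectral, texture)
    ce = -sum(math.log(softmax(f)[y]) for f, y in zip(fused, labels)) / len(labels)
    reg = sum((sigmoid(l) - 0.5) ** 2 for l in lam) / len(lam)
    return ce + beta * reg

def gradient(lam, spectral, texture, labels, beta=5e-3):
    """Analytic dL/dlam_k, obtained by the chain rule through the sigmoid."""
    n_cls, batch = len(lam), len(labels)
    s = [sigmoid(l) for l in lam]
    grad = [0.0] * n_cls
    fused = fused_batch(lam, spectral, texture)
    for f, ts, tt, y in zip(fused, spectral, texture, labels):
        p = softmax(f)
        for k in range(n_cls):
            err = p[k] - (1.0 if k == y else 0.0)  # dL_CE / d(fused logit k)
            grad[k] += err * s[k] * (1 - s[k]) * (tt[k] - ts[k]) / batch
    for k in range(n_cls):
        grad[k] += beta * (2.0 / n_cls) * (s[k] - 0.5) * s[k] * (1 - s[k])
    return grad

def train_lambda(spectral, texture, labels, steps=200, lr=0.5, beta=5e-3):
    """Plain gradient descent on lam only (optimizer choice is an assumption)."""
    lam = [0.0] * len(spectral[0])
    for _ in range(steps):
        g = gradient(lam, spectral, texture, labels, beta)
        lam = [l - lr * gk for l, gk in zip(lam, g)]
    return lam
```

On toy logits where the spectral branch is more confident for one class and the texture branch for another, the learned coefficients move in opposite directions, which mirrors the class-level interpretation given to the coefficients in Section 4.4.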

4. Experiment and Results

4.1. Experimental Analysis of Wood Spectral Space Construction

4.1.1. Experimental Analysis of Hyperspectral Band Selection

Taking the hyperspectral cube of the cross-section of Quercus aliena Blume wood as an example, similarity matrices are first calculated via four types of metrics (FDDM, DHashM, MIM, SSIM), and heatmaps are generated, as shown in Figure 8.
In this study, equal weighting is used to combine the evaluation indicators: each of the four core indicator categories is assigned the same weight coefficient, $w_m = 0.25$. This design ensures that the characteristic attributes of each dimension receive equal consideration during the analysis, thereby preventing the features of any dimension from being weakened or exaggerated in the final evaluation results due to weight discrepancies. The contributions of the different dimensions to the overall evaluation therefore remain balanced, making the final evaluation results more comprehensive and objective, as shown in Figure 9.
The multi-strategy band selection mechanism weights candidate bands by their spectral difference and gives additional weight to spectral regions that are not yet covered, ensuring that the selected bands are more evenly distributed across the full spectrum and provide more comprehensive information. Finally, the top 10 band pairs with the lowest similarity are extracted from the comprehensive similarity matrix, as shown in Table 2.
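The equal-weight fusion of the four similarity matrices and the extraction of the least similar band pairs can be sketched as follows. This assumes each metric matrix has already been scaled to a common range such as [0, 1]; the function names and the nested-list representation are illustrative only.

```python
def combine_similarity(matrices, weights=None):
    """Weighted sum of same-sized similarity matrices (equal weights by default)."""
    m = len(matrices)
    if weights is None:
        weights = [1.0 / m] * m  # four metrics give w_m = 0.25 each
    n = len(matrices[0])
    return [[sum(w * mat[i][j] for w, mat in zip(weights, matrices))
             for j in range(n)] for i in range(n)]

def least_similar_pairs(S, top=10):
    """Return up to `top` band pairs (i, j), i < j, with the lowest similarity."""
    n = len(S)
    pairs = [(S[i][j], i, j) for i in range(n) for j in range(i + 1, n)]
    pairs.sort()
    return [(i, j) for _, i, j in pairs[:top]]
```

Only the upper triangle is scanned, so each unordered band pair is reported once.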
Overall, the bands selected by Strategy A are relatively evenly distributed across the visible, red, and near-infrared regions, effectively reflecting the main reflection characteristics of wood in different bands. In contrast, Strategy B, due to the introduction of a coverage reward factor, yields selected bands that are more dispersed across the full spectrum, with stronger information complementarity.
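A greedy Max–Min selection with an optional coverage reward, in the spirit of Strategy B, might look like the sketch below. It uses the band difference defined as one minus the comprehensive similarity. The additive form of the reward, the fixed seed band, and the function signature are assumptions for illustration; the partitioned-quota rule of Strategy A is not reproduced here.

```python
def max_min_select(S, k, regions=None, lam_cov=0.0, seed=0):
    """Greedily pick k bands, each maximizing its minimum difference to the chosen set.

    S: comprehensive similarity matrix; regions: optional list giving the
    spectral region (e.g. "VIS", "RED", "NIR") of each band; lam_cov: reward
    added to bands whose region is not yet covered (assumed additive form).
    """
    n = len(S)
    delta = [[1.0 - S[i][j] for j in range(n)] for i in range(n)]
    selected = [seed]
    covered = {regions[seed]} if regions else set()
    while len(selected) < k:
        best, best_score = None, float("-inf")
        for b in range(n):
            if b in selected:
                continue
            score = min(delta[b][j] for j in selected)
            if regions and regions[b] not in covered:
                score += lam_cov
            if score > best_score:
                best, best_score = b, score
        selected.append(best)
        if regions:
            covered.add(regions[best])
    return selected
```

With `lam_cov=0.0` the function reduces to plain Max–Min selection; a positive reward pulls the choice toward regions that have no selected band yet, which is one way to obtain the more dispersed band set attributed to Strategy B.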

4.1.2. Experimental Analysis of Wavelet Transform Band Fusion

During image reconstruction, the fused subband set $(A_F, \{H_F, V_F, D_F\})$ is processed via the inverse wavelet transform to obtain a single fused image $I_F$. To eliminate brightness differences between samples and improve visualization, the fusion results are linearly normalized and converted to uint8 format, with the pixel range normalized to $[0, 255]$. Finally, fused images are generated according to Strategies A and B, respectively, as shown in Figure 10.
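The linear normalization step can be sketched as a min–max rescaling. Treating "linear normalization" as min–max scaling is an assumption, and the function below works on a nested list rather than an image array to stay dependency-free.

```python
def normalize_to_uint8(image):
    """Linearly rescale a 2-D list of floats to integers in [0, 255]."""
    flat = [v for row in image for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A constant image has no dynamic range; map it to zeros.
        return [[0 for _ in row] for row in image]
    scale = 255.0 / (hi - lo)
    return [[int(round((v - lo) * scale)) for v in row] for row in image]
```

Because the minimum and maximum are taken per image, each fused result uses the full display range regardless of its original brightness.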

4.1.3. Experimental Analysis of Spectral Feature Extraction

To avoid the influence of a single fusion scheme on subsequent detection, the wavelet fusion results of Strategies A and B are each enhanced and subjected to multi-feature weighted evaluation, yielding two score maps, $S_A(x, y)$ and $S_B(x, y)$, as shown in Figure 11.
It should be noted that the Harris corner response is not only used as a corner strength feature for weighted calculation in the multi-feature score map but also employed for generating candidate points during the interest point extraction stage. The former reflects the saliency of local intensity changes, while the latter determines candidate locations; the two cooperate with each other to achieve high-precision interest point detection, as shown in Figure 12.
Finally, the 128-dimensional reflectance spectral features of the corresponding pixels are extracted from the original hyperspectral image at the interest point locations. The resulting spectral features possess strong physical interpretability and can reflect differences in the spectral responses of different wood components. Figure 13 shows the average spectral reflectance curves of ten tree species.
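One simple way to turn a score map into interest points and then read out the per-pixel spectra is sketched below. It uses a relative threshold, a minimum spacing, and a cap on the number of points, matching the roles of the symbols t, d, and K in the nomenclature. The greedy suppression rule, the default values, and the function names are assumptions and may differ from the scoring actually used in the paper.

```python
import math

def select_interest_points(score, t=0.1, d=2.0, k=100):
    """Greedy selection: strongest responses first, at least d pixels apart."""
    peak = max(max(row) for row in score)
    cands = [(v, y, x) for y, row in enumerate(score)
             for x, v in enumerate(row) if v >= t * peak]
    cands.sort(key=lambda c: -c[0])  # descending by score
    kept = []
    for _, y, x in cands:
        if all(math.hypot(y - py, x - px) >= d for py, px in kept):
            kept.append((y, x))
            if len(kept) == k:
                break
    return kept

def spectra_at(cube, points):
    """Read the reflectance vector cube[y][x] at each interest point (y, x)."""
    return [cube[y][x] for y, x in points]
```

For a cube with 128 bands, each returned vector would be the 128-dimensional reflectance spectrum of the corresponding pixel.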

4.2. Experimental Analysis of Wood Texture Space Construction

The Sobel modality can effectively characterize structural features such as wood pores, fiber boundaries, and vessel distributions. As shown in Figure 14, the left panel displays the multi-feature score distribution map of the Sobel modality, while the right panel presents the corresponding interest point detection results.
The Moments modality characterizes the local texture energy distribution and shape orientation through second-order geometric moments, effectively capturing the scale differences and geometric configuration features of wood structure. As shown in Figure 15, the left image shows the score distribution map of the Moments modality, and the right image shows the results of its interest point detection.
The Gabor modality reflects the directional and frequency characteristics of textures and can effectively characterize the periodic radial distribution of wood textures. As shown in Figure 16, the left panel shows the multi-feature score distribution map of the Gabor modality, while the right panel shows the corresponding interest point detection results; the interest points are mainly concentrated in the high-frequency regions of texture energy.
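For reference, a real-valued Gabor kernel in its common textbook form can be generated as in the sketch below, using the orientation, wavelength, envelope standard deviation, and aspect ratio listed in the nomenclature. The zero phase offset, the kernel size argument, and the function name are assumptions; the paper's exact kernel and orientation set are not restated in this section.

```python
import math

def gabor_kernel(size, theta, wavelength, sigma, gamma):
    """Return a size x size real Gabor kernel (size should be odd).

    g(x, y) = exp(-(x'^2 + gamma^2 * y'^2) / (2 * sigma^2)) * cos(2 * pi * x' / wavelength)
    where (x', y') are the coordinates rotated by theta.
    """
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength))
        kernel.append(row)
    return kernel
```

Convolving the base image with kernels at several orientations and taking the response energy per orientation is one standard route to a multi-channel Gabor feature map.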

4.3. Experimental Analysis of Single Model Training

(1) Training and Performance Evaluation of SpectralFormer++
The model is trained end-to-end for 300 epochs using the AdamW optimizer (learning rate $1 \times 10^{-4}$), a batch size of 256, and the cross-entropy loss function. As shown in Figure 17, the training loss decreases continuously, while the training and validation accuracies increase steadily and converge in the later stages, indicating a stable training process and good generalization performance.
On the test set, SpectralFormer++ achieves an overall classification accuracy of 81.55%. As shown in the confusion matrix in Figure 18, categories such as Ailanthus, Eucalyptus Wild, and Red Stemmed Nan exhibit high diagonal accuracy, indicating that the model discriminates well between tree species with distinct spectral features. However, for woods with similar spectral characteristics, such as Iron Knife Wood, Birch, White Birch, White Gun Barrel, and Yellow Camphor Tree, a certain degree of confusion remains, with diagonal accuracies ranging from 0.54 to 0.74. This is mainly due to the highly similar reflectance characteristics of these tree species in the visible and near-infrared bands.
(2) Training and Performance Evaluation of TextureFormer
To verify the effectiveness of TextureFormer in multimodal texture feature recognition tasks, training and testing are conducted on the fused multimodal texture dataset using the AdamW optimizer (learning rate $3 \times 10^{-4}$, batch size 32, 300 training epochs). During training, the model input is a 12-channel stacked texture image composed of four modalities (Fusion, Sobel, Moments, and Gabor), combined with interest-point-guided coordinates as auxiliary input. As shown in Figure 19, the loss decreases continuously during training, while the accuracy improves rapidly in the early iterations and stabilizes after approximately 100 epochs.
On the test set, the overall classification accuracy of the TextureFormer model reaches 86.27%. Its confusion matrix, shown in Figure 20, demonstrates that the model performs well overall. Quercus acutissima, White Gun Barrel, and Simao Pine achieve high classification accuracy, indicating that the model effectively learns the key features of tree species with distinctly different texture structures. However, some confusion remains among certain samples. In particular, categories such as Red Stemmed Nan, Eucalyptus Wild, and White Birch, which have similar textures or grayscale distributions, show relatively lower accuracy, indicating that the model still has difficulty distinguishing these subtle texture differences.

4.4. Complementary Collaborative Learning Parameter Training and Performance Evaluation

Figure 21 shows the evolution curves of $\lambda_k$ for the ten tree species. The results indicate that $\sigma_s(\lambda_k)$ for every category undergoes a phase of rapid fluctuation at the beginning of training and then gradually converges and stabilizes in different ranges, with the overall distribution concentrated between 0.4 and 0.6. This suggests that the model achieves a relatively balanced weight allocation between the spectral and texture modalities. The $\lambda_k$ values for Eucalyptus Wild, Red Stemmed Nan, and similar species are slightly lower, indicating that the model relies more on SpectralFormer++ when distinguishing woods with prominent spectral features, whereas the higher $\lambda_k$ values for Yellow Camphor Tree, Betula, and similar samples suggest that their recognition depends mainly on the texture branch, TextureFormer. Overall, the convergence of $\lambda_k$ indicates that training only the fusion parameters avoids overfitting and oscillation and provides good interpretability at the category level.
The test results of the joint model are shown in Figure 22. Compared with the single-modality models, the diagonal of the row-normalized confusion matrix is more concentrated, and the overall classification accuracy increases to 90.27%. For categories such as Ailanthus and Quercus acutissima, the diagonal values exceed 0.9, indicating that the model can fully utilize the complementary information of spectral morphology and texture structure to achieve high-confidence discrimination. For categories such as Iron Knife Wood and White Birch, which have similar textures and adjacent spectral features, partial confusion still exists, but their error rates are significantly reduced compared with the single-modality models.
In addition, classes such as Birch, White Gun Barrel, and Simao Pine, which are easily confused in the single-branch models, improve markedly after fusion training, with the proportion of off-diagonal elements reduced by more than half, indicating that the adaptive weight adjustment mechanism enhances the efficiency of modality collaboration. In summary, the ST-former model, through learnable class-wise fusion weights, balances the importance of spectral and texture features. It improves overall recognition accuracy, generalization capability, and classification robustness while maintaining interpretability and stability in the model structure, offering a reliable fusion approach for future multimodal wood recognition.
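The two quantities read from the confusion matrices in this section, the row-normalized diagonal and the overall accuracy, can be computed as in the following sketch. The function names are illustrative, and any counts passed in are the caller's own data, not values from the paper.

```python
def row_normalize(cm):
    """Divide each row of a confusion matrix by its row sum (true-class total)."""
    out = []
    for row in cm:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out

def overall_accuracy(cm):
    """Fraction of all samples that lie on the diagonal."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / sum(sum(row) for row in cm)
```

After row normalization, each diagonal entry is the per-class recall, and the off-diagonal entries in a row show where samples of that class are misassigned.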

5. Discussion

The proposed method is an interdisciplinary application of computer vision, spectroscopy, and artificial intelligence in wood science, and it has distinct advantages over wood species identification methods based on NIRS, stable isotopes and mineral elements, or DNA. Compared with the NIRS method [19,45,46], it can learn the most discriminative microscopic features from hyperspectral images of wood cross-sections, yielding higher recognition accuracy and more robust models. Compared with stable isotope and mineral element methods [24,47,48], it can establish a complex nonlinear mapping between image-derived spectral and texture features and tree species, which allows it to capture deeper and more complex patterns and identify wood species more accurately. Compared with the DNA method [32,49,50], it is completely non-destructive and faster, whereas DNA sampling usually requires drilling or cutting a small amount of wood for testing.
The classification results shown in Figure 22 indicate that, even under the spectral–texture joint modeling framework, noticeable confusion still exists among certain tree species, suggesting that the joint model retains inherent limitations when distinguishing specific categories. Further analysis reveals that species with lower classification accuracy generally exhibit highly similar spectral reflectance curve shapes, while their cross-sectional textures show only subtle differences in vessel distribution, directional patterns, and structural scale. Under such conditions, neither spectral nor texture modalities can provide sufficiently discriminative information, and multimodal fusion alone is unable to fundamentally overcome the limited separability of the original features. As a result, the performance improvement of the joint model for highly similar categories remains constrained.
From a modeling perspective, the current joint framework primarily relies on one-dimensional spectral sequences and two-dimensional cross-sectional texture features for classification. While this design is effective for many tree species, it imposes an upper bound on representational capacity when involving species that are highly similar in structure and composition. In particular, higher-level structural differences within wood, such as cell-scale organization, micro-anatomical characteristics, or three-dimensional spatial distributions, are not explicitly modeled in the current approach. This limitation further restricts the discriminative capability of the joint model for certain species.
In addition, environmental factors associated with tree growth are not incorporated into the current recognition framework, although such factors may introduce discriminative information that is not captured by spectral and texture features alone. Previous studies have shown that growth environment conditions, including climate factors, temperature variation, precipitation, and soil nutrient availability, can influence wood formation processes and are partially reflected in wood structural and material characteristics [21,51]. For tree species with highly similar spectral and texture features, variations induced by growth environments may provide supplementary cues that help reduce inter-class ambiguity.
Based on the above analysis, future work aimed at improving classification accuracy across all categories—particularly for highly similar species—may consider extending the current spectral–texture joint framework in several directions. First, incorporating samples collected from diverse growth regions and environmental conditions could enhance the model’s ability to capture intra-class variability. Second, integrating higher-resolution texture information, micro-anatomical features, or three-dimensional structural representations may help compensate for the limitations of one-dimensional spectral and two-dimensional texture features. Finally, environmental parameters such as climatic and nutrient indicators could be introduced as sample-level conditional information or as auxiliary constraints at the decision stage, thereby complementing existing features from a growth-mechanism perspective and improving discrimination among highly similar tree species.

6. Conclusions

This paper addresses the problems of traditional wood identification methods, such as low efficiency, susceptibility to noise interference, and difficulty in distinguishing similar tree species, and proposes an intelligent wood species identification method based on the integration of multimodal texture-dominant features and deep learning. Based on hyperspectral imaging data, the method constructs deep feature representations from two perspectives (the spectral domain and the texture domain) and combines them through learnable modal fusion weights to achieve both high accuracy and interpretability in wood recognition. For spectral feature modeling, the SpectralFormer++ model is designed with a spectral derivative prior and a front-end embedding normalization mechanism, which alleviate the instability caused by band redundancy and amplitude differences and improve both the morphological expressiveness of the spectral features and the convergence stability of the model. For texture feature modeling, the TextureFormer model is proposed, which captures the fine-grained texture structures and directional patterns of wood cross-sections through a fusion-based Texture Stem and a grouped attention mechanism for multimodal input. Based on the outputs of the two models, the ST-former complementary collaborative fusion framework is constructed, and class-wise adaptive weight learning is realized, enabling the model to balance the contributions of spectral and texture features for different wood categories.
The experimental results show that the overall classification accuracy of the proposed joint model on the dataset of 10 typical wood species from Yunnan reaches 90.27%, which is approximately 8% higher than that of the single-modal models. Among these species, Eucalyptus rudis Endl, Phoebe rufescens H. W. Li, and other tree species with significant texture differences achieve the highest recognition accuracy, while Betula platyphylla Sukaczev, Senna siamea (Lam.) H. S. Irwin & Barneby, and other tree species with similar spectra show a significantly reduced confusion rate. This demonstrates the effectiveness of the complementary synergistic mechanism in feature fusion and modal decoupling. The inter-class distribution of the fusion parameters further supports the interpretability of the model: the values for texture-dominated tree species are significantly higher than those for spectrum-dominated classes, indicating that the model can adaptively learn the feature dependencies of different tree species.
In general, the spectral–texture collaborative learning framework proposed in this paper achieves a balance between recognition accuracy, stability, and interpretability for wood species recognition tasks and provides a novel and promising idea for multimodal intelligent recognition with potential for application in the field of hyperspectral wood recognition. Future work can extend this framework to larger-scale datasets with more tree species and integrate 3D spectral structures or microstructural images to develop a more versatile and robust wood recognition system.

Author Contributions

Conceptualization, Y.H. and Z.L.; methodology, T.Z.; software, H.L. and M.Q.; validation, R.N.; formal analysis, Y.M.; investigation, Q.F.; data curation, H.L.; writing—original draft preparation, M.C. and T.Z.; writing—review and editing, Y.H.; visualization, Q.F.; supervision, Z.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Key R&D Program Project of Yunnan Province, China (202503AA080013), and the Yunnan Fundamental Research Projects (202501AT070245); Research on Intelligent Perception and Mending Methods for Rotary-Cut Veneer in Plywood (YJS-KCJJ-2025-24).

Data Availability Statement

The data underlying this article will be shared on reasonable request to the corresponding authors. The data are not publicly available due to various data sources that have been introduced in the paper.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
| Abbreviation | Common name | Scientific name |
|---|---|---|
| AI | Ailanthus | Ailanthus altissima (Mill.) Swingle |
| BI | Birch | Betula alnoides Buch.-Ham. ex D. Don |
| EW | Eucalyptus Wild | Eucalyptus rudis Endl |
| IKW | Iron Knife Wood | Senna siamea (Lam.) H. S. Irwin & Barneby |
| QA | Quercus Acutissima | Quercus aliena Blume |
| RSN | Red Stemmed Nan | Phoebe rufescens H. W. Li |
| SP | Simao Pine | Pinus kesiya var. langbianensis (A. Chev.) Gaussen |
| WB | White Birch | Betula platyphylla Sukaczev |
| WGB | White Gun Barrel | Fraxinus malacophylla Hemsl |
| YCT | Yellow Camphor Tree | Cinnamomum parthenoxylon (Jack) Meisn. |

| Symbol | Description |
|---|---|
| $i, j$ | Indices of spectral bands |
| $S_{ij}^{comb}$ | Comprehensive similarity between spectral bands $i$ and $j$, obtained by weighted fusion of multiple indicators |
| $\Delta_{ij}$ | Spectral difference between band $i$ and band $j$, defined as $\Delta_{ij} = 1 - S_{ij}^{comb}$ |
| $\Omega$ | Set of selected spectral bands |
| $b$ | Index of a candidate spectral band |
| $b^{*}$ | Selected band maximizing the minimum distance to the current band set |
| $k$ | Target number of selected spectral bands |
| $\lambda_{cov}$ | Coverage reward factor in the coverage-aware Max–Min band selection strategy |
| $r_b$ | Spectral region (VIS, RED, or NIR) to which band $b$ belongs |
| $C$ | Set of spectral regions already covered by the selected band set |
| $I_{band}$ | Hyperspectral image cube after band filtering |
| $B$ | Number of spectral bands after selection |
| $H, W$ | Spatial height and width of the hyperspectral image |
| $I_{band,b}$ | Image corresponding to the $b$-th selected spectral band |
| $N_{band}$ | Number of selected band images used for fusion ($N_{band} = 10$) |
| $L$ | Decomposition level of the wavelet transform |
| $\mathrm{db4}$ | Daubechies-4 wavelet basis function |
| $I_{fuse}(x, y)$ | Input fused image in the spatial domain |
| $F(u, v)$ | Fourier transform of the input image |
| $(u_i, v_i)$ | Center coordinates of the $i$-th periodic interference peak in the frequency domain |
| $M(u, v)$ | Gaussian notch filter mask in the frequency domain |
| $\alpha_n$ | Suppression intensity of the notch filter |
| $\sigma_n$ | Radius of the Gaussian notch filter |
| $I_{dn}(x, y)$ | Image after notch filtering and inverse Fourier transform |
| $\gamma_h$ | Gamma correction parameter used in local enhancement |
| $C \in \mathbb{R}^{H \times W \times B}$ | Original hyperspectral image cube |
| $I_{pca}$ | Grayscale base image obtained from the first principal component of the hyperspectral cube |
| $G_x, G_y$ | Horizontal and vertical Sobel gradient responses |
| $F_{sobel}$ | Two-channel Sobel edge feature map |
| $N(\cdot)$ | Linear normalization operator applied to feature response maps |
| $w$ | Window size for local geometric moment computation |
| $k_{20}, k_{02}, k_{11}$ | Second-order geometric moment kernels |
| $M_{20}, M_{02}, M_{11}$ | Second-order geometric moment response maps |
| $F_{moments}$ | Three-channel normalized geometric moment feature map |
| $g(x, y; \theta)$ | Gabor filter kernel at orientation $\theta$ |
| $\theta$ | Gabor filter orientation angle |
| $x', y'$ | Rotated spatial coordinates in the Gabor filter |
| $\lambda$ | Wavelength of the Gabor filter |
| $\sigma_g$ | Standard deviation of the Gaussian envelope in the Gabor filter |
| $\gamma_g$ | Spatial aspect ratio of the Gabor filter |
| $F_{gabor}$ | Six-channel Gabor energy feature map |
| $S_{kpt}(x, y)$ | Multi-feature scoring map for texture saliency |
| $\mathrm{Harris}$ | Harris corner response map |
| $\mathrm{Entropy}$ | Local entropy feature map |
| $\mathrm{Sobel}$ | Gradient magnitude feature map |
| $\mathrm{Laplace}$ | Laplacian response map |
| $\mathrm{DoG}$ | Difference-of-Gaussians response map |
| $\mathrm{SNR}$ | Signal-to-noise ratio feature map |
| $t$ | Relative threshold for interest point detection |
| $d$ | Minimum spatial distance between interest points |
| $K$ | Number of selected interest points ($K = 100$) |
| $X_{sp}$ | Three-channel spectral derivative input token consisting of original, first-order, and second-order spectral features |
| $W$ | Linear projection matrix in the spectral embedding layer |
| $b$ | Bias term of the spectral embedding layer |
| $d$ | Embedding dimension of the spectral token ($d = 64$) |
| $E$ | Layer-normalized spectral embedding output |
| $\mathrm{LN}(\cdot)$ | Layer normalization applied to the spectral embedding |
| $X_f$ | Fusion-based texture feature map |
| $X_s$ | Sobel-based texture feature map |
| $X_m$ | Second-order geometric moment texture feature map |
| $X_g$ | Gabor energy-based texture feature map |
| $X_{mm}$ | Multimodal texture feature tensor concatenated along the channel dimension |
| $F_{stem}$ | Output feature map of the Texture Stem module |
| $X_1$ | Intermediate feature map after the first $3 \times 3$ convolution in the Texture Stem |
| $X_2$ | Intermediate feature map after depthwise separable convolution in the Texture Stem |
| $X_3$ | Channel-recalibrated feature map after SE attention |
| $w$ | Channel-wise attention weight generated by the SE mechanism |
| $T$ | High-dimensional texture embedding after the final $1 \times 1$ convolution ($T \in \mathbb{R}^{H \times W \times 384}$) |
| $m$ | Modality index corresponding to fusion, Sobel, moments, or Gabor texture modality |
| $h$ | Attention head index in the multi-head attention mechanism |
| $T_i$ | True Score of the $i$-th interest point combining confidence strength and spatial dispersion |
| $s_i$ | Original confidence score of the $i$-th interest point |
| $d_{ij}$ | Euclidean distance between interest point $i$ and a previously selected point $j$ |
| $L$ | Normalization constant in the True Score computation |
| $\alpha$ | Trade-off parameter controlling confidence strength and spatial dispersion |
| $p$ | Patch stride used to map spatial coordinates to Transformer tokens |
| $H^{m}$ | Token-level saliency map of modality $m$ |
| $h^{m}$ | Token-level saliency vector of modality $m$ within an attention window |
| $B_h$ | Modality-guided saliency bias matrix for the $h$-th attention head |
| $A_{logit}^{h}$ | Attention logits of the $h$-th attention head with saliency bias injection |
| $\mathrm{RPB}$ | Relative position bias in window-based self-attention |
| $Q, K, V$ | Query, key, and value matrices in the attention mechanism |
| $T_k^{s}$ | Spectral branch logit of class $k$ |
| $T_k^{t}$ | Texture branch logit of class $k$ |
| $\tilde{T}_k$ | Fused logit of class $k$ |
| $\lambda_c$ | Class-wise fusion coefficient in complementary collaborative learning |
| $\sigma_s(\cdot)$ | Sigmoid function applied to fusion coefficients |
| $\mathcal{L}_{CE}$ | Cross-entropy loss for fused logits |
| $\mathcal{L}_{reg}$ | Regularization loss preventing fusion weight collapse |
| $\mathcal{L}$ | Final training objective combining classification and regularization losses |
| $\beta$ | Weighting factor for the regularization term |
| $B$ | Batch size |
| $K$ | Number of classes |
| $K_i$ | Number of selected interest points in $S_i$ (i.e., $K_i = \lvert S_i \rvert$) |
| $i$ | Training sample index |
| $k$ | Class index |
| $c$ | Class index for fusion coefficient |

References

  1. Chen, Y.; Meng, Y.; Zhang, J.; Xie, Y.; Guo, H.; He, M.; Shi, X.; Mei, Y.; Sheng, X.; Xie, D. Leakage Proof, Flame-Retardant, and Electromagnetic Shield Wood Morphology Genetic Composite Phase Change Materials for Solar Thermal Energy Harvesting. Nano-Micro Lett. 2024, 16, 196.
  2. Mo, L.; Crowther, T.W.; Maynard, D.S.; Johan, V.D.H.; Ma, H.Z.; Lalasia, B.M.; Liang, J.J.; Sergio, D.M.; Nabuurs, G.J.; Reich, P.B. The global distribution and drivers of wood density and their impact on forest carbon stocks. Nat. Ecol. Evol. 2024, 8, 2195–2212.
  3. Khoo, P.S.; Rizal, M.A.M.; Ilyas, R.A.; Yajid, M.A.M.; Shukur, A.H.; Yahya, M.Y.; Wahit, M.U. Unveiling Favorable Mechanical Properties of Lignocellulosic Wood—Reinforced Thermoplastic Composites as Future Green and Sustainable Materials. Fibers Polym. 2025, 26, 1425–1448.
  4. Korjakins, A.; Sahmenko, G.; Lapkovskis, V. A Short Review of Recent Innovations in Acoustic Materials and Panel Design: Emphasizing Wood Composites for Enhanced Performance and Sustainability. Appl. Sci. 2025, 15, 4644.
  5. DeLigne, L.; Fredriksson, M.; Thygesen, L.G.; Thybring, E.E. Influence of degradation products from thermal wood modification on wood-water interactions. J. Mater. Sci. 2025, 60, 3346–3364.
  6. Zhu, Y.; Li, L. Wood of trees: Cellular structure, molecular formation, and genetic engineering. J. Integr. Plant Biol. 2024, 66, 443–467.
  7. Fu, Z.; Lu, Y.; Wu, G.; Bai, L.; Daniel, B.R.; Lyu, J.; Liu, S.; Rojas, O. Wood elasticity and compressible wood-based materials: Functional design and applications. Prog. Mater. Sci. 2025, 147, 101354.
  8. Hussein, A.; Bulbul, A.; Wu, Q.; Lin, H.; Kameshwar, H.; Mohammad, S. Effect of partial delignification and densification on chemical, morphological, and mechanical properties of wood: Structural property evolution. Ind. Crops Prod. 2024, 213, 118430.
  9. Everton, J.; Thiago, F.; Camila, C.; Miller, L.; Goncalves, D.; Oliveira, S.L.; Marangoni, B.; Cena, C. Making wood inspection easier: FTIR spectroscopy and machine learning for Brazilian native commercial wood species identification. RSC Adv. 2024, 14, 7283.
  10. Reh, R.; Kristak, L.; Kral, P.; Pipiska, T.; Jopek, M. Perspectives on Using Alder, Larch, and Birch Wood Species to Maintain the Increasing Particleboard Production Flow. Polymers 2024, 16, 1532.
  11. Niemz, P.; Sandberg, D. Critical wood-particle properties in the production of particleboard. Wood Mater. Sci. Eng. 2022, 17, 386–387.
  12. Janceva, S.; Andersone, A.; Spulle, U.; Tupciauskas, R.; Papadopoulou, E.; Bikovens, O.; Andzs, M.; Zaharova, N.; Rieksts, G.; Telysheva, G.; et al. Eco-friendly adhesives based on the oligomeric condensed tannins-rich extract from alder bark for particleboard and plywood production. Materials 2022, 15, 3894.
  13. Szadkowska, D.; Auriga, R.; Lesiak, A.; Szadkowski, J.; Marchwicka, M. Influence of Pine and Alder Woodchips Storage Method on the Chemical Composition and Sugar Yield in Liquid Biofuel Production. Polymers 2022, 14, 3495.
  14. Lima, M.; Ramalho, F.; Trugilho, P.; Bufalino, L.; Moreira, M. Classifying waste wood from Amazonian species by near-infrared spectroscopy (NIRS) to improve charcoal production. Renew. Energy 2022, 193, 584–594.
  15. Sydor, M.; Mirski, R.; Stuper-Szablewska, K.; Rogoziński, T. Efficiency of Machine Sanding of Wood. Appl. Sci. 2021, 11, 2860.
  16. Singh, N.; Rana, A.; Badhotiya, G.K. Raw material particle terminologies for development of engineered wood. Mater. Today Proc. 2021, 46, 11243–11246.
  17. Novaes, T.; Ramalho, F.; Araujo, E.D.; Lima, M.D.R.; Silva, M.G.; Ferreira, G.C.; Hein, P.R.G. Discrimination of amazonian forest species by NIR spectroscopy: Wood surface effects. Wood Prod. 2023, 81, 159–172.
  18. Richardson, S.; Simeone, J.; Deklerck, V. The global wood species priority list: A living database of tree species most at risk for illegal logging, unsustainable deforestation, and high rates of trade globally. Wood Fiber Sci. 2023, 55, 31–42.
  19. Zhuang, P.; Niu, J.; Cheng, J.; Lu, J.; Sun, J.; He, T. Spectral Identification of Pterocarpus santalinus Based on Peak and Valley Feature Extraction Technology. Spectrosc. Spectr. Anal. 2024, 44, 3463–3472.
  20. Raobelina, A.; Chaix, G.; Razafimahatratra, A.; Rakotoniaina, S.P.; Ramananantoandro, T. Use of a portable near infrared spectrometer for wood identification of four dalbergia species from Madagascar. Wood Fiber Sci. 2023, 55, 4–17.
  21. Peng, H.; Yu, H.; Zhan, T.; Jiang, J.; Lyu, J. Photo-stabilization effect of extractives on the photo-degradation of Red pine. Eur. J. Wood Wood Prod. 2024, 82, 905–915.
  22. Gao, J.; Kim, Y.; Liang, Y.; Qiu, J. Study on the Fractionation Law of C, H, O Stable Isotopes in Different Parts of Tree Plants. J. Southwest For. Univ. (Nat. Sci.) 2022, 42, 178–183.
  23. Broda, M.; Popescu, C.; Poszwa, K.; Roszyk, E. How thermal treatment affects the chemical composition and the physical, mechanical and swelling properties of Scots pine juvenile and mature wood. Wood Sci. Technol. 2024, 58, 1153–1180.
  24. Chen, Z.; Xue, X.; Cheng, R.; Wu, H.; Gao, H.; Gao, Z. Geographical origin classification of Phoebe zhennan and Phoebe bournei by solid phase microextraction and gas chromatography-mass spectrometry. Wood Sci. 2023, 69, 21–35.
  25. Landry, U.; Boivin, G.; Schorr, D.; Mottoul, M.; Mary, A.; Abid, L.; Carrere, M.; Laratte, B. Recent Developments and Trends in Sustainable and Functional Wood Coatings. Curr. For. Rep. 2023, 9, 319–331.
  26. He, Y.; Sun, Q.; Guo, B.; Niu, R.; Wang, D. Study on Origin Traceability of Pueraria lobata Based on Mineral Element Fingerprint. J. Nucl. Agric. Sci. 2021, 35, 1565–1573.
  27. Tonouewa, J.; Biaou, S.; Assede, E.; Aguilar, A. Timber traceability, determining effective methods to combat illegal logging in Africa: A review. Trees For. People 2024, 18, 100709.
  28. Kanbayashi, T.; Matsunaga, M.; Kobayashi, M. Cellular-level chemical changes in Japanese beech (Fagus crenata Blume) during artificial weathering. Holzforschung 2021, 75, 900–907.
  29. Stukonyte, L.; Borrell, A.; Drago, M.; Lockyer, C.; Vikingsson, G. Effect of Formic Acid Treatment on Carbon and Nitrogen Stable Isotope Ratios in Sperm Whale Teeth Dentine. Rapid Commun. Mass Spectrom. 2023, 37, 9500–9523.
  30. Hao, Y.; Lu, F.; Pyo, S.; Kim, M.; Ko, J.; Yan, X.; Ralph, J.; Li, Q. PagMYB128 regulates secondary cell wall formation by direct activation of cell wall biosynthetic genes during wood formation in poplar. J. Integr. Plant Biol. 2024, 66, 1658–1674.
  31. Yan, H.; Shang, Y.; Wang, L.; Tian, X.; Tran, V.; Yao, L.; Zeng, B.; Hu, Z. Construction of a New Agrobacterium tumefaciens-Mediated Transformation System based on a Dual Auxotrophic Approach in Cordyceps militaris. J. Microbiol. Biotechnol. 2024, 34, 1178–1187.
  32. Tran, U.; Thai, H.; Vu, T.; Nguyen, G.; Trinh, M.; Tran, H.; Pham, H.; Le, N. An efficient Agrobacterium-mediated system based on the pyrG auxotrophic marker for recombinant expression in the filamentous fungus Penicillium rubens. Biotechnol. Lett. 2023, 45, 689–702.
  33. Mo, J.; Tamboli, D.; Haviarova, E. Prediction of the color change of surface thermally treated wood by artificial neural network. Eur. J. Wood Wood Prod. 2023, 81, 1135–1146. [Google Scholar] [CrossRef]
  34. Chen, F.; Zeng, Y.; Zeng, Y.; Cheng, Q.; Xiao, L.; Ji, J.; Hou, X.; Huang, Q.; Lei, Z. Tissue culture and Agrobacterium-mediated genetic transformation of the oil crop sunflower. PLoS ONE 2024, 19, 298–312. [Google Scholar] [CrossRef]
  35. Zoubir, B.; Imane, S.; Nassim, L.; Kenza, A.; Ibtihal, A. Fast and automatic solar module geo-labeling for optimized large-scale photovoltaic systems inspection from UAV thermal imagery using deep learning segmentation. Clean. Eng. Technol. 2025, 28, 101048. [Google Scholar]
  36. Lu, J.; Wang, K.; Ding, H.; Shao, Z.; Qin, R.; Huo, G. MSCA-Sp R-CNN: A segmentation algorithm for pneumonia small lesions integrating multi-scale channel attention and sub-pixel upsampling. Multimed. Syst. 2025, 31, 70–77. [Google Scholar] [CrossRef]
  37. Doaa, A.; Karma, M.; Sherin, M. Pan-BCS: An Enhanced Panoptic Biventricular 3D Cardiac Assistive Model Integrating Feature Pyramid Networks and Parallel Semantic Segmentation. In Proceedings of the 2024 International Conference on Machine Intelligence and Smart Innovation, Alexandria, Egypt, 12–14 May 2024; Volume 5, pp. 74–79. [Google Scholar]
  38. Oleh, Z.; Ihor, F. Using Neural Networks to Identify Objects in an Image. Comput. Des. Syst. Theory Pract. 2024, 6, 232–240. [Google Scholar]
  39. Bian, L.; Wang, Z.; Zhang, Y.; Li, L.; Zhang, Y.; Yang, C.; Fang, W.; Zhao, J.; Zhu, C.; Meng, Q.; et al. A broadband hyperspectral image sensor with high spatio-temporal resolution. Nature 2024, 635, 73–81. [Google Scholar] [CrossRef]
  40. Ahmad, M.; Ghous, U.; Usama, M.; Mazzara, M. WaveFormer: Spectral-Spatial Wavelet Transformer for Hyperspectral Image Classification. IEEE Geosci. Remote Sens. Lett. 2024, 21, 5502405. [Google Scholar] [CrossRef]
  41. Yang, J.; Yang, Z.; Wang, N.; Wang, Y.; Wang, L. DMTF-GAN: A Dual-Branch Mapping GAN With Transformer for Few-Shot Hyperspectral Image Classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2025, 18, 25654–25671. [Google Scholar] [CrossRef]
  42. Ullah, F.; Irfan, U.; Khalil, K.; Salabat, K.; Wang, Q.; Algamdi, S.; Aldossary, H. Squeeze-SwinFormer: Spectral Squeeze and Excitation Swin Transformer Network for Hyperspectral Image Classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2025, 18, 21400–21418. [Google Scholar] [CrossRef]
  43. Hong, D.; Han, Z.; Yao, J.; Gao, L.; Zhang, B.; Plaza, A.; Chanussot, J. SpectralFormer: Rethinking hyperspectral image classification with transformers. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5518615. [Google Scholar] [CrossRef]
  44. Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; Volume 42, pp. 2011–2023. [Google Scholar]
  45. Fu, R.; Fu, X.; Zhang, W.; Li, D.; Guan, C.; Zhang, H. A Qualitative and Quantitative NIRs Study on Larch Wood Surface Color Change by UV Light Irradiation. Spectrosc. Spectr. Anal. 2022, 42, 56–61. [Google Scholar]
  46. Noernberg, L.; Pasquini, C.; Cardoso, G.; Santos, O.; Quevedo, F.; Fernandes, M.; Missio, A.; Gomes, N. Application of near-infrared spectroscopy (NIRS) in the chemical characterization of Corymbia hybrid wood. J. Wood Chem. Technol. 2025, 45, 35–42. [Google Scholar] [CrossRef]
  47. Andrea, R.; Corona, C.; Poszwa, A.; Belingard, C.; Dominguez, M.; Stoffel, M.; Crivellaro, A.; Crouzevialle, R.; Cerbelaud, F.; Costa, G.; et al. Combining conventional tree-ring measurements with wood anatomy and strontium isotope analyses enables dendroprovenancing at the local scale. Sci. Total Environ. 2023, 858, 159887. [Google Scholar] [CrossRef] [PubMed]
  48. Buntgen, U.; Oulehle, F.; Oppenheimer, C.; Svoboda, J.; Kochergina, Y.; Rybnicek, M.; Kolar, T.; Novak, M.; Kempf, M.; Trnka, M. Potential and Limitations of Strontium Isotopic Fingerprinting in Wood. Geophys. Res. Lett. 2025, 52, e2025GL117556. [Google Scholar] [CrossRef]
  49. Kim, M.; Im, S.; Kim, T. DNA Barcodes for Wood Identification of Anatomically Similar Species of Genus Chamaecyparis. Forests 2024, 15, 1106. [Google Scholar] [CrossRef]
  50. Qi, J.; Gao, X.; Nan, J.; Ayaovi, A.; Zhao, M.; Fan, J.; He, H. Early detection and tracking of wood borers using improved environmental DNA aggregation and TaqMan quantitative PCR approaches in forests. Insect Sci. 2025, 52, 1–16. [Google Scholar] [CrossRef]
  51. Wang, Y.; Wang, T.; Crocetti, R.; Walinder, M. Effect of moisture on the edgewise flexural properties of acetylated and unmodified birch plywood: A comparison of strength, stiffness and brittleness properties. Eur. J. Wood Wood Prod. 2023, 82, 341–355. [Google Scholar] [CrossRef]
Figure 1. Overall flowchart of the research methodology.
Figure 2. Instrument placement diagram.
Figure 3. Comparison between the original image and the cropped image: (a) original image (band 42); (b) image after cropping (band 42).
Figure 4. Hyperspectral and corresponding RGB images of the 10 tree species samples: (a) Ailanthus altissima (Mill.) Swingle; (b) Betula alnoides Buch.-Ham. ex D. Don; (c) Eucalyptus rudis Endl.; (d) Senna siamea (Lam.) H. S. Irwin & Barneby; (e) Quercus aliena Blume; (f) Phoebe rufescens H. W. Li; (g) Pinus kesiya var. langbianensis (A. Chev.) Gaussen; (h) Betula platyphylla Sukaczev; (i) Fraxinus malacophylla Hemsl.; (j) Cinnamomum parthenoxylon (Jack) Meisn.
Figure 5. Spectral derivative prior modeling diagram.
Figure 6. Texture Stem module. The red solid squares and dashed squares illustrate the channel-wise feature excitation and recalibration introduced by the SE module, reflecting the enhancement and adjustment of channel feature responses.
Figure 7. Multimodal keypoint-guided attention mechanism. The color bar indicates the relative magnitude of the saliency bias $B_i$, where warmer colors represent higher saliency values.
Figure 8. Similarity heatmap of Q. aliena Blume based on four types of indicator bands: (a) FDDM; (b) DHashM; (c) MIM; (d) SSIM.
Figure 9. Fusion heatmap of Q. aliena Blume based on the integrated similarity matrix.
Figure 10. Comparison of wavelet fusion images under different strategies: (a) Strategy A; (b) Strategy B.
Figure 11. Multi-feature fusion evaluation score maps of Strategies A and B: (a) Strategy A; (b) Strategy B.
Figure 12. Detected interest points overlaid on the multi-feature evaluation map.
Figure 13. Average spectral reflectance curves of ten representative wood species across all heights.
Figure 14. Score distribution map and detected interest points of the Sobel modality.
Figure 15. Score distribution map and detected interest points of the Moments modality.
Figure 16. Score distribution map and detected interest points of the Gabor modality.
Figure 17. Training loss and accuracy curves of the SpectralFormer++ model.
Figure 18. Row-normalized confusion matrix of the SpectralFormer++ model on the test set.
Figure 19. Training loss and accuracy curves of the TextureFormer model.
Figure 20. Row-normalized confusion matrix of the TextureFormer model on the test set.
Figure 21. Convergence curves of category-wise fusion parameters $\lambda_k$.
Figure 22. Row-normalized confusion matrix of the TextureFormer model on the test set.
Table 1. Comparison of identification methods for wood species.
| Method | Characteristic Markers | Advantages | Disadvantages |
| --- | --- | --- | --- |
| NIRS | Absorption spectra of molecules in the near-infrared region | Fast, non-destructive, and efficient. | Characteristic chemical substances are difficult to identify; interference from overlapping adjacent chromatographic peaks affects the detection results. |
| Stable isotope | Isotopic ratio | High success rate of origin identification; minimal pollution. | δ13C, δ2H, δ18O, etc., are susceptible to interference from interannually/seasonally driven radial fractionation; demanding requirements for instruments and equipment; databases of stable isotopes in wood are lacking. |
| Mineral elements | Characteristic elements | Many types of candidate elements; high throughput. | Selecting characteristic element markers is cumbersome and labor-intensive; results are easily disturbed by human agricultural activities such as fertilizer application; databases of mineral elements in wood are lacking. |
| DNA | Characteristic gene fragments | Easy to classify, good repeatability, high stability, and less susceptible to environmental interference. | High-quality DNA is difficult to obtain reliably from deeply processed wood products. |
Table 2. Lowest-similarity top 10 band pairs of Q. aliena Blume.
| Strategy A | Strategy B |
| --- | --- |
| Band_8 (409.24 nm) | Band_8 (409.24 nm) |
| Band_11 (424.97 nm) | Band_12 (430.22 nm) |
| Band_16 (451.20 nm) | Band_16 (451.20 nm) |
| Band_58 (671.46 nm) | Band_40 (577.06 nm) |
| Band_64 (702.93 nm) | Band_67 (718.66 nm) |
| Band_69 (729.15 nm) | Band_111 (949.42 nm) |
| Band_72 (744.88 nm) | Band_115 (970.39 nm) |
| Band_115 (970.39 nm) | Band_120 (996.61 nm) |
| Band_125 (1022.84 nm) | Band_124 (1017.59 nm) |
| Band_128 (1038.57 nm) | Band_127 (1033.33 nm) |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Huang, Y.; Zhu, T.; Liang, Z.; Li, H.; Qin, M.; Niu, R.; Ma, Y.; Feng, Q.; Chen, M. Research on Intelligent Wood Species Identification Method Based on Multimodal Texture-Dominated Features and Deep Learning Fusion. Plants 2026, 15, 108. https://doi.org/10.3390/plants15010108