Article

Spatial–Spectral Feature Fusion and Spectral Reconstruction of Multispectral LiDAR Point Clouds by Attention Mechanism

1 College of Geomatics and Geo-Information, Guilin University of Technology, Guilin 541004, China
2 State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430079, China
3 Perception and Effectiveness Assessment for Carbon-Neutrality Efforts, Engineering Research Center of Ministry of Education, The Institute for Carbon Neutrality, Wuhan University, Wuhan 430079, China
4 Electronic Information School, Wuhan University, Wuhan 430072, China
* Author to whom correspondence should be addressed.
First author.
Remote Sens. 2025, 17(14), 2411; https://doi.org/10.3390/rs17142411
Submission received: 9 June 2025 / Revised: 10 July 2025 / Accepted: 10 July 2025 / Published: 12 July 2025

Abstract

High-quality multispectral LiDAR (MSL) data are crucial for land cover (LC) classification. However, the Titan MSL system encounters challenges of inconsistent spatial–spectral information due to its unique scanning and data saving method, restricting subsequent classification accuracy. Existing spectral reconstruction methods often require empirical parameter settings and involve high computational costs, limiting automation and complicating application. To address this problem, we introduce the dual attention spectral optimization reconstruction network (DossaNet), leveraging an attention mechanism and spatial–spectral information. DossaNet can adaptively adjust weight parameters, streamline the multispectral point cloud acquisition process, and integrate it into classification models end-to-end. The experimental results show the following: (1) DossaNet exhibits excellent generalizability, effectively recovering accurate LC spectra and improving classification accuracy. Metrics across the six classification models show some improvements. (2) Compared with the method lacking spectral reconstruction, DossaNet can improve the overall accuracy (OA) and average accuracy (AA) of PointNet++ and RandLA-Net by a maximum of 4.8%, 4.47%, 5.93%, and 2.32%. Compared with the inverse distance weighted (IDW) and k-nearest neighbor (KNN) approach, DossaNet can improve the OA and AA of PointNet++ and DGCNN by a maximum of 1.33%, 2.32%, 0.86%, and 2.08% (IDW) and 1.73%, 3.58%, 0.28%, and 2.93% (KNN). The findings further validate the effectiveness of our proposed method. This method provides a more efficient and simplified approach to enhancing the quality of multispectral point cloud data.

1. Introduction

Land cover (LC) classification is critical for monitoring the ecological environment [1], land resources [2], and climate change [3]. Passive remote sensing has long achieved excellent land cover classification results owing to its rich spectral information [4,5]. Similarly, active LiDAR has achieved excellent classification results by exploiting the 3D spatial information it collects, which distinguishes objects by height and location [6,7]. As the demand for land cover classification increases, multi-source data fusion techniques have become a new research hotspot. LC classification accuracy can be effectively improved by integrating spectral information from remote sensing images with spatial information from LiDAR [8,9]. However, data fusion requires precise alignment of spatial–spectral features and unification of resolution, a task that is difficult to achieve perfectly because of differences in data attributes and acquisition methods.
Multispectral LiDAR (MSL) can acquire the 3D spatial and spectral information of ground objects, offering a new approach for multi-source data fusion [10,11,12]. Several research institutions have conducted studies to use MSL effectively. For example, Hakala et al. [13] from the Finnish Geodetic Institute (FGI) designed an eight-channel full-waveform LiDAR system that successfully captured Norway spruce’s 3D multispectral point clouds. Gong et al. [14], from Wuhan University, discovered that a specific combination of four wavelengths (556, 670, 700, and 780 nm) effectively characterizes ground object features. Thus, they designed a four-channel MSL system, which enhanced data acquisition efficiency and application range. Niu et al. [15] developed an MSL system with four wavelengths (531, 570, 670, and 780 nm) and successfully extracted the vertical distribution of vegetation biochemical component features. As technology matures, the platforms for MSL systems gradually transition from ground-based to airborne, enabling large-scale data collection. The Titan airborne MSL system (532, 1064, and 1550 nm) developed by Teledyne Optech in Canada has been applied for large-scale earth observation. It has been successfully applied to tree species classification [16] and LC classification [17].
Researchers have further assessed the potential and feasibility of using Titan data for LC classification [18,19]. However, the acquired data suffer from spatial–spectral inconsistency [20], because each Titan channel is acquired at a different scan angle and saved independently, which degrades classification accuracy when the data are used directly. Rasterization [21] is a common workaround, but it reduces the effective feature dimensionality and discards valuable information. To address this challenge, Wichmann et al. [22] proposed a nearest neighbor interpolation method that directly unifies spatial and spectral features in 3D space, avoiding the negative impact of feature dimensionality reduction. However, this method is unsuitable for scenarios with uneven point cloud density because it leads to noticeable discontinuities in intensity, so it does not fundamentally resolve the spatial–spectral inconsistency problem.
Thus, researchers have proposed some methods to optimize the radius selection process and solve the issues caused by spatial–spectral reconstruction in uneven point cloud density distribution. Weinmann et al. [23] proposed a feature entropy-based approach for selecting the optimal neighborhood, simplifying the neighborhood radius selection process. However, the selected neighborhood radius is fixed and does not fundamentally resolve the uneven point cloud density problem. Jing et al. [24] proposed an adaptive neighborhood radius selection method tailored to varying point cloud densities, effectively addressing the issue of spectral discontinuity. However, the bilinear interpolation method causes distortion at the edges of objects in areas with lower point cloud density. Subsequently, Shi et al. [25] proposed segmenting different objects into independent point cloud blocks. Then, they performed interpolation on point clouds belonging to a single class and avoided spectral interference between different categories. Nevertheless, these methods involve numerous empirical algorithm parameters that render them difficult to accurately implement.
The end-to-end structure and automatic feature extraction of deep learning can effectively avoid the difficulty of setting many empirical parameters in traditional machine learning. In recent years, researchers have successfully applied an increasing number of deep learning-based methods to point cloud optimization [26,27,28,29]. These methods leverage the depth and learning capabilities of deep neural networks to minimize errors in specific application tasks (such as registration [30], reconstruction [31], segmentation [32], and classification [33]), enhancing task accuracy. Following this line of work, this study integrates the attention mechanism from deep learning to optimize the spectral reconstruction process of Titan data, improving classification accuracy and exploring optimal solutions. The main contributions of this research can be summarized as follows.
  • DossaNet is an improved approach based on an attention mechanism and a learnable module that aims to adaptively adjust the weighting coefficients based on spatial–spectral features. This approach enables spectral reconstruction while precisely providing multispectral point clouds for subsequent LC classification.
  • We propose a spatial–spectral attention (SSA) reconstruction module. Using a feature concatenation approach, SSA successfully integrates spatial and spectral features, thereby achieving the complementary advantages of multimodal features and enhancing spectral reconstruction.
  • Our spectral reconstruction approach demonstrates good generalizability and applies to most models. Compared with models without spectral reconstruction, our approach makes some improvements in many metrics, with an overall accuracy (OA) exceeding 82.80%. Specifically, the OA of PointNet++ increased by 4.8%, RandLA-Net improved by 5.93%, and DGCNN increased by 1%. Furthermore, our approach improves the classification accuracy of most models when compared with the IDW and KNN methods.

2. Area and Dataset Partitioning

The study area is located in the vicinity of the University of Houston, Texas, and its surrounding regions. The area features lush vegetation, a dense population, and a complex LC distribution [34]. The research uses the Houston 2018 dataset [35], collected by the Teledyne Optech Titan MW system on 16 February 2017 and released publicly during the 2018 IEEE GRSS Data Fusion Contest. The official classification labels are rough labels produced by an automated algorithm and cannot fulfill the demands of subsequent tasks. Therefore, we selected areas with relatively consistent ground object types and, using the high-resolution aerial orthophoto acquired simultaneously as a reference, annotated them in the CloudCompare (2.13.2) [36] software. The LC types in the study area were broadly categorized into seven classes, and the proportion of each category is roughly identical between the training and testing sets to improve the overall generalization ability of the model. The specific proportion of each class is presented in Table 1. The selected area was then divided into 32 sub-regions (300 m × 300 m), with 24 sub-regions used for training and eight for testing. Figure 1 provides an overview of the study area.
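As a concrete illustration of this tiling scheme, the sketch below groups a point cloud into 300 m × 300 m tiles; the array layout and the helper name are illustrative assumptions, not part of the released dataset tooling.

```python
import numpy as np

def split_into_tiles(points, tile_size=300.0):
    """Group points (N x D array whose first two columns are easting/northing
    in metres) into square tiles of side tile_size; returns {tile index: subset}."""
    ij = np.floor((points[:, :2] - points[:, :2].min(axis=0)) / tile_size).astype(int)
    tiles = {}
    for idx in np.unique(ij, axis=0):
        mask = np.all(ij == idx, axis=1)
        tiles[tuple(idx)] = points[mask]
    return tiles

# The 32 resulting sub-regions can then be assigned to training (24) and testing (8).
```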

3. Methodology

We propose a multispectral point cloud spectral reconstruction method leveraging spatial spectral attention, which can effectively improve the accuracy of classification tasks. This method enhances the end-to-end classification accuracy and automation by using the advantages of spatial–spectral features and the ability of the attention mechanism to adjust weights adaptively. Figure 2 illustrates the specific implementation process of the method in detail. Initially, the Titan single-channel point cloud is preprocessed to obtain a standard single-channel point cloud.
Then, spectral reconstruction is performed on the single-channel point cloud set, integrating the attention mechanism and DossaNet to obtain multispectral point clouds. Finally, six models are used to classify the obtained multispectral point clouds, validating the effectiveness of the multispectral point cloud spectral reconstruction method for classification across different models.

3.1. Single-Channel Point Cloud Preprocessing

Building upon the Titan point cloud processing framework proposed by Luo et al. [38] and Wang et al. [39], denoising and normalization operations are performed on the single-channel point cloud to enhance data quality.
The raw point cloud dataset contains a certain amount of noise. We use the statistical outlier removal (SOR) filter [40] in CloudCompare [36] to remove outliers statistically. In this study, the number of nearest neighbors and the standard deviation multiplier of the SOR filter were set to 6 and 1, respectively. However, some noise points persist in the denoised point cloud due to multipath effects, atmospheric conditions, or sensor imperfections. These remaining noise points are further eliminated through visual inspection.
After denoising, the spectral values of the point cloud exhibit a wide range due to the presence of strongly reflective objects in the study area. Spectral normalization improves the optimization convergence speed of subsequent models, ensuring consistent input for downstream deep learning models and facilitating the visualization of classification results. When spectral features are derived from spectral reflectance calculations, the intensity values of each channel are first converted into relative reflectance. Then, following the data processing method proposed by Wichmann et al. [22] and comprehensively considering data availability and the distribution of point cloud intensities, a spectral correction factor φ (φ = 0.98) is applied to the spectral intensities of each channel. Additionally, an isotropic normalization method is applied so that the point cloud data have the same scale across all channels, avoiding differences caused by varying scales between channels. Finally, values exceeding one are removed, resulting in a point cloud with corrected reflectance values. The calculation formula for spectral normalization is as follows:
$$
\begin{aligned}
(I_{n1}, I_{n2}, I_{n3}) &= \big(\min(I_{1}^{(in)}),\ \min(I_{2}^{(in)}),\ \min(I_{3}^{(in)})\big) \\
(I_{m1}, I_{m2}, I_{m3}) &= \varphi \big(\max(I_{1}^{(in)}),\ \max(I_{2}^{(in)}),\ \max(I_{3}^{(in)})\big) \\
\delta &= \big(I_{m1} - I_{n1},\ I_{m2} - I_{n2},\ I_{m3} - I_{n3}\big) \\
\big(I_{1}^{(out)}, I_{2}^{(out)}, I_{3}^{(out)}\big) &= \frac{\big(I_{1}^{(in)} - I_{n1},\ I_{2}^{(in)} - I_{n2},\ I_{3}^{(in)} - I_{n3}\big)}{\delta}
\end{aligned} \tag{1}
$$
where $(I_{n1}, I_{n2}, I_{n3})$ indicates the minimum intensity values, $(I_{m1}, I_{m2}, I_{m3})$ denotes the maximum intensity values, $\varphi$ corresponds to the spectral correction coefficient, $\delta$ refers to the scaling factor that adjusts the distribution range of the point cloud intensity, and $(I_{1}^{(out)}, I_{2}^{(out)}, I_{3}^{(out)})$ represents the normalized intensity values.
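A minimal NumPy sketch of this per-channel normalization is given below; using the 99.5th-percentile maximum (Section 4.1) and clipping values above one are assumptions about the exact implementation details.

```python
import numpy as np

def normalize_channel_intensity(intensity, phi=0.98, clip_percentile=99.5):
    """Sketch of the per-channel normalization in Equation (1).
    intensity: 1-D array of raw intensities for one Titan channel."""
    i_min = intensity.min()                                   # I_n
    i_max = phi * np.percentile(intensity, clip_percentile)   # I_m = phi * max
    delta = i_max - i_min                                     # scaling factor delta
    out = (intensity - i_min) / delta
    return np.clip(out, 0.0, 1.0)                             # values above 1 are clipped

# Applied independently to the 532, 1064, and 1550 nm channels.
```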

3.2. Dual Spectral Reconstruction Method Based on Spatial–Spectral Attention

The nearest neighbor interpolation method is a commonly used interpolation technique in the spectral reconstruction of Titan multispectral point clouds. It uses the spatial correlations between features to obtain multispectral point clouds. Yang et al. [41] applied the nearest neighbor interpolation method to interpolate the spectra of the Houston 2017 dataset and successfully performed LC classification using the resulting multispectral point clouds. However, the nearest neighbor interpolation method is not fully applicable to all regions due to differences in point cloud density. To address this issue, we propose DossaNet, which is based on an attention mechanism and learnable methods. DossaNet consists of three modules: the spatial attention (SA) spectral reconstruction module, the SSA, and the loss function supervision module (Loss).

3.2.1. SA Spectral Reconstruction Module

The SA module is designed based on the attention mechanism and spatial features, as shown in Figure 3. A point set $P_i = \{p_i, p_{i1}, p_{i2}, \ldots, p_{ik}\}$ is selected through KNN, where $p_i$ is the central point of set $P_i$, $p_{ik}$ represents the $k$-nearest neighbors of $p_i$, and $spe_{ik}$ refers to the corresponding spectral value. In Figure 3, the spatial and spectral features are represented by yellow and blue rectangles, respectively.
We extract the high-dimensional feature set $RP_i = \{RP_{i1}, RP_{i2}, RP_{i3}, \ldots, RP_{ik}\}$ corresponding to $P_i$ to capture more feature differences within the local neighborhood. First, the relative position coordinates $\Delta p_i = \{\Delta p_{i1}, \Delta p_{i2}, \ldots, \Delta p_{ij}\}$ between each neighboring point $p_{ij}$ and the central point $p_i$ are calculated. Then, the high-dimensional feature set $RP_i$ corresponding to $\Delta p_i$ is derived using a Gaussian function. The specific calculation formulas are as follows:
$$G(f) = \exp(-2f^{2}) \tag{2}$$
$$\Delta p_{ij} = p_{ij} - p_i \tag{3}$$
$$RP_{ij} = G(\Delta p_{ij}) \tag{4}$$
Subsequently, the attention mechanism is applied to the high-dimensional feature set $RP_i$ to generate the spatial attention feature vector $H$ for the neighboring point set, further extracting the deep relationships between the neighborhood features. The calculation formula is as follows:
$$H = \theta\big(\mathrm{ReLU}(\mathrm{BN}(\gamma(RP_{ij})))\big) \tag{5}$$
In Equation (5), $\gamma$ and $\theta$ represent two different multilayer perceptrons (MLPs) with nonlinear activation functions, and ReLU and BN denote the activation and batch normalization functions, respectively. Then, the softmax function is used to reduce redundant features in the feature vector, generating the spatial-feature spectral weight vector $W^{spa}$, defined as follows:
$$W_{i}^{spa} = \frac{\exp(H_i)}{\sum_{i=1}^{j} \exp(H_i)} \tag{6}$$
where $W_{i}^{spa}$ and $H_i$ represent the elements of the spectral weight vector $W^{spa}$ and the spatial attention feature vector $H$, respectively.
Finally, the weighted spectral value $spe$ at point $p_i$ is calculated as the dot product between the spatial-feature spectral weight vector $W^{spa}$ and the corresponding spectral vector $spe_i$ of the neighboring points. The formula is as follows:
$$spe = \sum_{j=1}^{k} W_{ij}^{spa} \cdot spe_{ij} \tag{7}$$
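To make the data flow of Equations (2)–(7) concrete, a minimal PyTorch sketch of the SA idea follows; layer widths, the Gaussian encoding, and the tensor layout are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class SAModule(nn.Module):
    """Sketch of the SA spectral reconstruction module (Section 3.2.1)."""
    def __init__(self, hidden=16):
        super().__init__()
        self.gamma = nn.Linear(3, hidden)      # gamma: MLP on relative coordinates
        self.bn = nn.BatchNorm1d(hidden)       # BN in Equation (5)
        self.theta = nn.Linear(hidden, 1)      # theta: one attention score per neighbor

    def forward(self, rel_pos, nbr_spec):
        # rel_pos:  (N, k, 3) relative coordinates p_ij - p_i from a KNN search
        # nbr_spec: (N, k)    single-channel spectra of the k neighbors
        rp = torch.exp(-2.0 * rel_pos ** 2)                 # Gaussian encoding G (Eqs. 2-4)
        h = self.gamma(rp)                                  # (N, k, hidden)
        h = self.bn(h.transpose(1, 2)).transpose(1, 2)      # normalize the hidden channels
        h = self.theta(torch.relu(h)).squeeze(-1)           # attention vector H, shape (N, k)
        w = torch.softmax(h, dim=-1)                        # spatial weight vector W^spa (Eq. 6)
        return (w * nbr_spec).sum(dim=-1)                   # weighted spectral value spe (Eq. 7)
```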

3.2.2. SSA Spectral Optimization Module

In spectral reconstruction, relying solely on spatial features to obtain multispectral point clouds often introduces deviations, which can reduce the accuracy of subsequent classification. Reichler et al. [42] identified a correlation in spectral values among spatially close point clouds. This relationship can be leveraged for point cloud spectral reconstruction tasks. On the basis of this finding, we developed an SSA module to optimize the multispectral point clouds generated by the SA module. The SSA module integrates spatial and spectral features, further exploring the correlations among point clouds and improving spectral reconstruction accuracy. The SSA architecture is illustrated in Figure 4.
The SSA module further extracts spectral feature differences within the neighborhood produced by the SA module. A linear layer is used to map the multispectral value matrix $C$ of the neighborhood point set, obtaining the spectral features $f_{ij}^{k}$ and the relative spectral features $\Delta f_{ij}^{k}$, where $C_i = [m_{i1}, m_{i2}, \ldots, m_{ij}]^{T}$, $m_{ij} = (c_{ij}^{1}, c_{ij}^{2}, c_{ij}^{3})$ indicates the multispectral values of the neighborhood points of point $i$, and $k$ denotes the spectral channel index. Then, the spectral high-dimensional feature set $SS_{i}^{k}$ is obtained by applying a Gaussian function, as described in the following equation:
$$f_{ij}^{k} = \mathrm{Linear}(c_{ij}^{k}), \qquad \Delta f_{ij}^{k} = f_{ij}^{k} - f_{i}^{k}, \quad k = 1, 2, 3, \qquad SS_{ij}^{k} = G(\Delta f_{ij}^{k}) \tag{8}$$
Subsequently, we combine the high-dimensional spatial feature set $RP_i$ and the spectral high-dimensional feature set $SS_{i}^{k}$. Then, the spatial–spectral feature relationship is applied to compute the local feature matrix $I$ of the neighborhood, as follows:
$$I = \theta\big(\mathrm{ReLU}(\mathrm{BN}(\gamma(RP_{ij} \,\|\, SS_{ij}^{k})))\big) \tag{9}$$
The symbol $\|$ represents the concatenation operation; the definitions of $\gamma$ and $\theta$ are the same as in Equation (5). Then, we generate the spectral secondary optimization weight matrix $W^{spe}$ through the softmax function, obtaining the optimized weights for each spectral channel. Finally, the optimized multispectral point cloud $m_i$ at point $p_i$ is computed using the spectral optimization weight matrix $W^{spe}$ and the neighborhood multispectral matrix $C$, as expressed by the following equations:
$$W_{ij}^{spe} = \frac{\exp(I_{ij})}{\sum_{i=1}^{j} \exp(I_{ij})} \tag{10}$$
$$m_i = \sum_{j=1}^{3} \big(W_{ij}^{spe} \odot C_i\big) \tag{11}$$
where $\odot$ represents the Hadamard product.
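A compact PyTorch sketch of the SSA idea follows, under the same caveats as the SA sketch above: the feature dimensions and the way the per-channel weights are applied are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class SSAModule(nn.Module):
    """Sketch of the SSA spectral optimization module (Section 3.2.2)."""
    def __init__(self, hidden=16):
        super().__init__()
        self.linear = nn.Linear(3, 3)            # maps neighbor spectra to features f
        self.gamma = nn.Linear(6, hidden)        # acts on the concatenation [RP || SS]
        self.bn = nn.BatchNorm1d(hidden)
        self.theta = nn.Linear(hidden, 3)        # one score per spectral channel

    def forward(self, rel_pos, nbr_spec, centre_spec):
        # rel_pos: (N, k, 3); nbr_spec: (N, k, 3) multispectral matrix C;
        # centre_spec: (N, 3) current spectral estimate at the centre point
        rp = torch.exp(-2.0 * rel_pos ** 2)                                       # spatial encoding RP
        f = self.linear(nbr_spec)
        ss = torch.exp(-2.0 * (f - self.linear(centre_spec).unsqueeze(1)) ** 2)   # spectral encoding SS
        h = self.gamma(torch.cat([rp, ss], dim=-1))                               # Eq. (9) input
        h = self.bn(h.transpose(1, 2)).transpose(1, 2)
        scores = self.theta(torch.relu(h))                                        # (N, k, 3)
        w = torch.softmax(scores, dim=1)                                          # weights over neighbors per channel
        return (w * nbr_spec).sum(dim=1)                                          # optimized multispectral value m_i
```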

3.2.3. Mask L1 Loss

The goal of spectral reconstruction is to ensure that the reconstructed spectral values approach the spectral values measured by the Titan sensor. The L1 norm loss (L1 Loss) constrains the difference between the model's predicted and actual values. However, because each point carries a true value in only one channel, a Mask L1 Loss is used to supervise the model. The calculation process is shown in Figure 5, and the specific mathematical formulas are as follows:
$$mask_i = \mathbb{1}(y_i \neq 0) \tag{12}$$
$$Loss = \frac{1}{n} \sum_{i=1}^{n} \big\| (y_i - x_i) \odot mask_i \big\|_{1} \tag{13}$$
where $y_i$ represents the actual values of the three-channel spectrum, $x_i$ denotes the predicted values of the three-channel spectrum, and $mask_i$ corresponds to the mask encoding: positions with actual values are set to 1, whereas all other positions are set to 0.
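The Mask L1 Loss can be sketched in PyTorch as follows, under the assumption that the single measured channel of each point is identified by its non-zero value:

```python
import torch

def mask_l1_loss(pred, target):
    """Sketch of the Mask L1 Loss (Equations (12) and (13)).
    pred, target: (N, 3) predicted / measured three-channel spectra,
    where only one channel of each target point holds a measured value."""
    mask = (target != 0).float()                       # 1 at the measured channel, 0 elsewhere
    per_point = torch.abs((target - pred) * mask).sum(dim=1)
    return per_point.mean()                            # average over the n points
```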

3.2.4. Point Cloud Reconstruction Quality Evaluation

We use the average spectral angle mapper (SAM) as an evaluation metric to quantitatively assess the quality of the reconstructed point cloud. Chakravarty et al. [43] showed that SAM reflects the correlation between spectra: the smaller the SAM value, the stronger the correlation, as shown in Equations (14) and (15):
$$SAM_i = \arccos \frac{m_i^{(pre)} \cdot m_i^{(optimize)}}{\lVert m_i^{(pre)} \rVert\, \lVert m_i^{(optimize)} \rVert} \tag{14}$$
$$\overline{SAM} = \frac{1}{n} \sum_{i=1}^{n} SAM_i \tag{15}$$
where $m_i^{(pre)}$ represents the original point cloud spectrum, $m_i^{(optimize)}$ denotes the spectrum of the point cloud after reconstruction, and $SAM_i \in [0, \frac{\pi}{2}]$.
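For reference, a short NumPy sketch of the average SAM computation in Equations (14) and (15) is given below; the small epsilon guarding against zero norms is an added safeguard, not part of the original formulation.

```python
import numpy as np

def mean_sam(spec_pre, spec_opt, eps=1e-8):
    """Average spectral angle mapper between original and reconstructed spectra.
    spec_pre, spec_opt: (N, 3) arrays of per-point multispectral values."""
    dot = np.sum(spec_pre * spec_opt, axis=1)
    norm = np.linalg.norm(spec_pre, axis=1) * np.linalg.norm(spec_opt, axis=1) + eps
    sam = np.arccos(np.clip(dot / norm, -1.0, 1.0))    # each SAM_i lies in [0, pi/2] for non-negative spectra
    return sam.mean()                                  # Equation (15)
```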

3.3. Point Cloud Classification Verification Method

The study evaluates the effectiveness of DossaNet in multispectral point cloud classification using six classic deep learning models: three classic convolutional architectures, PointNet++ [44], DGCNN [45], and RandLA-Net [46]; two methods based on adaptive convolution kernels for point clouds, KPConv [47] and PAConv [48]; and Point Transformer [49], which is derived from the Transformer architecture. These models are widely used in the point cloud classification field. To ensure the validity of the tests, the study integrates DossaNet as a plug-and-play module, embedding it in each model in an end-to-end manner. The study is then configured using the parameters outlined below, with the specific architecture and parameter configurations provided in Table 2. Notably, the down-sampling method is crucial for subsequent point cloud processing, so we employed different methods tailored to the characteristics of the Titan MSL data and each model. Specifically, PointNet++, PAConv, and Point Transformer utilize farthest point sampling (FPS) to preserve the global distribution of points, ensuring hierarchical feature extraction while maintaining spatial coherence for attention mechanisms. DGCNN, however, does not require down-sampling because it dynamically constructs KNN graphs to capture local relationships without relying on predefined sampling strategies. RandLA-Net adopts random sampling (RS) to strike a balance between computational efficiency and feature retention, reducing potential information loss due to feature propagation. Finally, KPConv utilizes voxel sampling (VS) to align points with a voxel grid, thereby enabling efficient convolution operations through spatial discretization.
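As an illustration of the farthest point sampling strategy used by PointNet++, PAConv, and Point Transformer, a plain NumPy sketch follows; it is a generic FPS routine, not the exact implementation inside those models.

```python
import numpy as np

def farthest_point_sampling(points, n_samples):
    """Iteratively select the point farthest from the already selected set.
    points: (N, 3) coordinates; returns the indices of the sampled points."""
    n = points.shape[0]
    selected = np.zeros(n_samples, dtype=int)
    dist = np.full(n, np.inf)
    selected[0] = 0                                    # arbitrary (often random) seed point
    for i in range(1, n_samples):
        d = np.linalg.norm(points - points[selected[i - 1]], axis=1)
        dist = np.minimum(dist, d)                     # distance to the nearest selected point
        selected[i] = int(np.argmax(dist))
    return selected
```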
We use cross-entropy loss to supervise the classification network to achieve enhanced training results and alleviate the imbalance caused by the varying number of training samples across different categories. The mathematical formula is as follows:
$$CrossEntropy\ Loss = -\sum_{i=1}^{C} y_i \log(q_i) \tag{16}$$
where $C$ represents the total number of categories, $y_i$ indicates the ground-truth label (1 for the correct class and 0 otherwise), and $q_i$ refers to the predicted probability that the sample belongs to the $i$-th category.
All experiments were conducted on an Ubuntu 20.04 operating system. The hardware configuration consisted of an Intel Xeon Gold 6148 CPU, 400 GB of RAM, and four 24 GB NVIDIA GeForce RTX 3090 GPUs. Training was carried out using the PyTorch 1.8 framework. Based on the density of the Titan MSL point set and previous experience, the number of neighbors k was set to 6 when constructing the SA and SSA modules. The training hyperparameters are presented in Table 3. During training, a checkpoint was created every two epochs, saving the model with the best mean intersection over union (MIoU). After training, the best-performing checkpoint was selected and tested as the final model.

4. Result

4.1. Single-Channel Point Cloud Preprocessing Results

Accurate preprocessing of single-channel point clouds is essential for enhancing spectral reconstruction. Consequently, the point clouds from the three Titan channels are processed sequentially, as illustrated in Figure 6 and Figure 7. Figure 6 depicts the denoising results of the single-channel point clouds, whereas Figure 7 shows the outcomes of intensity normalization. After denoising, all point clouds were accurately assigned intensity values, and noise was effectively removed. The denoising effect for the 532 nm point cloud is the most pronounced, with noise at the edges largely eliminated. However, after denoising, certain point clouds exhibited high intensity values due to strongly reflective objects in the region, such as vehicle surfaces, white lane markings, and light-colored building facades. Therefore, the 99.5th percentile of the intensity values of each channel was selected as the maximum intensity for that channel's point cloud. Intensity values exceeding 1 after normalization were subsequently adjusted to 1.

4.2. Dual MSL Cloud Reconstruction Based on SSA

We propose DossaNet, a model designed to extract deep information embedded in spatial–spectral features effectively. By leveraging an attention mechanism, DossaNet dynamically adjusts the interpolation weights in spectral reconstruction to ensure high-precision results. This approach also solves challenges, including difficulty determining interpolation parameters and inconsistencies in point cloud density. The structure of DossaNet comprises two main modules: the SA module and the SSA module. Theoretically, deeper networks typically yield improved performance [50]. We used the classification results of KPConv as the evaluation metric and tested the performance of different network structures to evaluate the effectiveness of DossaNet and identify the optimal structure. The results presented in Table 4 show that DossaNet is highly effective in spectral reconstruction. At the same time, it maintains a low parameter count and FLOPs. The “SA + 3 × SSA” model structure demonstrated the best performance, achieving the highest OA and AA accuracies of 95.03% and 92.29%, respectively. Thus, we have validated the effectiveness of DossaNet and identified the optimal structure.
The optimal spectral reconstruction performance is achieved when DossaNet adopts the "SA + 3 × SSA" structure. This network configuration was used for the spectral reconstruction of the Titan single-channel point cloud set, with the number of neighboring points set to 6. The single-wavelength point cloud set is then input into the "SA + 3 × SSA" spectral reconstruction module to generate a multispectral point cloud with unified spatial features. Figure 8 visualizes the results for a specific region (Area 2), depicting the state before spectral reconstruction (subfigure (a)), after the initial reconstruction with the SA module (subfigure (b)), and after the secondary spectral refinement with the SSA module (subfigure (c)). The three-channel point clouds are displayed in false color using the RGB channels in CloudCompare [36]. Frames A, B, and C mark three different LC types: buildings, impervious ground, and trees, respectively. In frame A, the SA module effectively fills in the missing spectral information using spatial features, restoring the general spectral characteristics of the building, although some finer details are still missing. The SSA module then uses the spatial–spectral information to extract the relationship between the two, further enhancing the building's details. These details include not only the boundaries of the target but also more realistic building edges, rendering them easily distinguishable. In frames B and C, DossaNet likewise recovers the missing spectral details of impervious ground and trees. This improvement is primarily attributed to the rich spatial–spectral information within the local neighborhood and the ability of the attention mechanism to model cross-modal correlations. High-precision multispectral point clouds were obtained through efficient extraction of spatial–spectral features and spectral reconstruction.
To assess the quality of the multispectral point cloud after spectral reconstruction, we sequentially calculated the average SAM values for the spectral reconstruction of both the SA and SSA modules, along with the zero-value transformation rate for the SA module. The results are presented in Table 5. The average SAM values of the SA module are approximately equal to the zero-value transformation rate, indicating that the SA module effectively recovers missing spectral information by leveraging spatial features, thereby improving the quality of spectral reconstruction. Additionally, the SAM values of the SSA module are lower than those of the SA module, suggesting that the SSA module further enhances the spectral quality of the point cloud by utilizing spectral correlations. Furthermore, both the SA and SSA modules are designed based on an attention mechanism, and the effectiveness of these two modules further validates the overall effectiveness and versatility of DossaNet.

4.3. Classification of LC Types

The study evaluates the model’s classification performance using five quantitative assessment metrics: overall accuracy (OA), average accuracy (AA), kappa, mean intersection over union (MIoU), and intersection over union (IoU). Generally, the higher values of these evaluation metrics indicate improved model classification accuracy.
We used DossaNet and different methods to reconstruct multispectral point clouds of the study area and classified them using six models. The classification results and comparative results of different spectral reconstruction methods are presented in Table 6 and Table 7. The OA and kappa coefficient of our method for all models exceeded 82.80% and 0.7732, respectively, with the highest accuracy achieved by Point Transformer, reaching 95.33% for OA and 0.9390 for kappa. For different LC types (IoU), all models performed best in classifying trees, with an accuracy exceeding 91.45%, followed by grasslands, impervious surfaces, and buildings, which reached 77.61%, 65.75%, and 60.87%, respectively. However, some LC types could not be effectively recognized, and the main reasons can be attributed to two factors. On the one hand, the imbalance in the dataset caused difficulties for the models to extract meaningful features, affecting classification accuracy. For example, the four main LC types (trees, buildings, impervious ground, and grasslands) accounted for 96.5%, whereas the three other types only accounted for 3.5%. This imbalance directly led to decreased classification performance or even failure to recognize the three other LC types. On the other hand, the structure of the models also impacted classification accuracy. For instance, PointNet++ extracts relevant information by aggregating local features layer by layer. However, due to the small size, varied shape, and broad spectral distribution of targets, such as vehicles and power lines, many small target features are drowned out by larger target features or the background. As a result, PointNet++ cannot classify these targets accurately. Similarly, RandLA-Net uses a self-attention mechanism and a local feature aggregation module to compensate for information loss at key points. However, the random sampling method still results in some local and global information loss, preventing the model from extracting sufficient features to identify bare ground effectively.
To further verify the effectiveness of DossaNet, the classification results are compared with those obtained without spectral reconstruction and with the IDW and KNN methods. The experimental results indicate that DossaNet performs better, further illustrating the advantages and generalizability of our approach. Overall, our method improves the OA by 4.8%, 1.01%, 5.93%, 0.73%, 0.56%, and 0.32% compared with the results without spectral reconstruction. Compared with the IDW and KNN methods, the accuracy was improved or on par, with PointNet++, DGCNN, and PAConv improving the OA by 1.33%, 0.86%, and 0.02% (IDW) and 1.73%, 0.28%, and 0.05% (KNN), respectively, while the other models achieved comparable results. This comparison indicates that the fusion of spatial and spectral features provides additional effective features for spectral reconstruction, leading to more accurate multispectral point clouds. Furthermore, our method improved the AA across most models, surpassing the results of models without spectral reconstruction by 4.47%, 0.35%, 4.27%, 2.30%, 3.52%, and 2.56%. Compared with the IDW and KNN methods, apart from RandLA-Net, whose accuracy was on par, the other models showed improvements of 2.32%, 2.08%, 0.79%, 1.55%, and 0.60% (IDW) and 3.58%, 2.93%, 1.43%, 1.84%, and 1.84% (KNN). Thus, our method not only enhances the overall classification performance but also alleviates the class imbalance issue.
To exhibit a more intuitive demonstration of the classification results using DossaNet in spectral reconstruction, we visualize the results for areas 2, 7, and 12 (Figure 9). Evidently, most models accurately identified and distinguished the LC types, depicting the shape and contours of the LC in various scenarios. However, in the case of PointNet++, the visualization results are less satisfactory due to the model architecture and sampling method.
Furthermore, to demonstrate the differences between our proposed method and the approaches lacking spectral reconstruction (a), IDW (b), and KNN (c), we selected one scene from the classification results of each of the six models for comparative analysis. The visualized results are shown in Figure 10. In scenes A and F, trees and buildings have similar heights. Method (a) misclassifies buildings as trees, and methods (b) and (c) misclassify points near the edges. Our method captures spectral features and the correlations between adjacent points and then further extracts deep features, successfully resolving the misclassification issues of the other methods. This indicates that spectral information enhances the quality of multispectral point clouds. In scenes B and C, impervious ground and grassland exhibit similar shapes, making it difficult for methods (a), (b), and (c) to accurately identify the boundary between the two types of terrain. In contrast, our method sharpens the boundaries of terrain features using spectral details, highlighting important features and enabling the model to capture and recognize them successfully. In scene D, methods (a), (b), and (c) struggle to identify the terrain effectively using only spatial features. Our method improves the classification results significantly by integrating spatial and spectral features. In scene E, methods (a), (b), and (c) misclassify some bare land as grassland, whereas our method accurately distinguishes between the two, yielding superior results. These detailed visualizations further validate the versatility and effectiveness of our spectral reconstruction method.

5. Discussion

The significance of this study lies in optimizing spectral reconstruction methods to improve LC or land use classification accuracy in urban areas. The research introduces an end-to-end spectral reconstruction method incorporating an attention mechanism and spatial–spectral features. This method can adaptively acquire and adjust the weighting factors, enhancing the overall automation of the process. It addresses the issues of spatial–spectral information inconsistency and the difficulty in determining appropriate weighting factors during spectral reconstruction. It also generates more realistic spectral values for multispectral point clouds, providing more accurate data for classification tasks.
In spectral reconstruction, we use a fixed neighborhood scale to extract spatial–spectral features and obtain reasonable weighting factors through model training. Feature extraction and subsequent classification tasks are influenced to varying degrees as the neighborhood scale changes [51]. We evaluated the results using the classification outcomes of KPConv as a performance metric to investigate the impact of neighborhood scale on spectral reconstruction performance. We compared spectral reconstruction results for different neighborhood scales, as shown in Table 8. Classification accuracy progressively declines with increases in the value of K. This negative impact is evident across OA, AA, and other indices. Therefore, this study selects K = 6, avoiding using larger scales. Moreover, as the neighborhood size increases, the features included within the neighborhood become more complex. This situation increases the introduction of irrelevant spectral information from unintended targets, degrading the performance of the feature extractor. This conclusion aligns with the findings of Chen et al. [52].
In addition to the neighborhood scale, spectral loss functions also influence spectral reconstruction results. Similarly, we evaluated the results using the classification outcomes of KPConv as a performance metric to investigate the impact of Mask L1 Loss on spectral reconstruction performance. We compared the spectral reconstruction results obtained with and without the inclusion of Mask L1 Loss, as shown in Table 9. The application of Mask L1 Loss leads to improvements in OA, AA, kappa, and MIoU, further demonstrating the effectiveness of Mask L1 Loss.
Although the proposed spectral reconstruction method has achieved promising results on the Titan point cloud, it still has some limitations. In contrast to image data, 3D point clouds require more time-consuming and labor-intensive annotation, and some datasets contain annotation errors [53]. The classification accuracy of the fully supervised training method employed here is limited by the accuracy of the annotations and the associated labeling costs, posing challenges for further improvement. Weakly supervised learning methods are known for their robustness in mitigating the impact of inaccurate annotations, offering significant application value in scenarios where annotation costs are high or labeling is complex [54]. These methods provide a new direction for future work. Our future research will focus on developing weakly supervised deep learning methods to extract more features with fewer labels, thereby improving training effectiveness and reducing costs.

6. Conclusions

High-precision multispectral Titan point clouds can effectively enhance the accuracy of subsequent LC classification. In this study, we propose a spectral reconstruction method that utilizes an attention mechanism to integrate spatial and spectral features. This method adaptively adjusts the weighted parameters in spectral reconstruction. It streamlines the acquisition process of Titan MSL point clouds, enabling end-to-end spectral reconstruction. Experiments validated the feasibility and effectiveness of the proposed method. Qualitative and quantitative assessments indicate that the proposed method has advantages in the spectral reconstruction of MSL point clouds, offering valuable insights for related research and applications. Although the proposed method achieves high accuracy in spectral reconstruction, an obvious limitation is that the collected MSL data only consists of three different wavelength channels, which is significantly fewer than the channels in hyperspectral LiDAR. In the future, we plan to obtain a partial hyperspectral point cloud dataset to further evaluate the effectiveness and generalizability of the proposed method on hyperspectral point clouds. Additionally, balancing accuracy and computational efficiency in large-scale scenarios will be another focus of our future research.

Author Contributions

Conceptualization, G.Z. and S.S.; Methodology, S.S. and S.B.; Software, H.Q. and S.B.; Validation, H.Q.; Formal Analysis, H.Q. and S.B.; Investigation, H.Q.; Resources, G.Z., W.G. and S.S.; Data Curation, X.T.; Writing—Original Draft Preparation, H.Q.; Writing—Review and Editing, G.Z., H.Q. and S.S.; Visualization, H.Q.; Supervision, S.S. and X.T.; Project Administration, G.Z., W.G. and S.S.; Funding Acquisition, G.Z., W.G. and S.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (42461050 and 42471413), Guangxi Surveying and Mapping LiDAR Intelligent Equipment Technology Mid-Test Base (grant #: Guike AD23023012), Guangxi Science and Technology Talent Grand Project (grant #: Guike AD19254002), Natural Science Foundation of Hubei Province (grant #:2024AFA069), State Key Laboratory of Spatial Datum Special Research Funding (grant#: SKLGIE2023-Z-3-1), and LIES-MARS Special Research Funding.

Data Availability Statement

The original data presented in the study are openly available at https://machinelearning.ee.uh.edu/2018-ieee-grss-data-fusion-challenge-fusion-of-multispectral-lidar-and-hyperspectral-data (accessed on 1 July 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Chen, W.; Zhao, R.; Lu, H. Response of ecological environment quality to land use transition based on dryland oasis ecological index (DOEI) in dryland: A case study of oasis concentration area in Middle Heihe River, China. Ecol. Indic. 2024, 165, 112214. [Google Scholar] [CrossRef]
  2. Buchner, J.; Yin, H.; Frantz, D.; Kuemmerle, T.; Askerov, E.; Bakuradze, T.; Bleyhl, B.; Elizbarashvili, N.; Komarova, A.; Lewińska, K.E.; et al. Land-cover change in the Caucasus Mountains since 1987 based on the topographic correction of multi-temporal Landsat composites. Remote Sens. Environ. 2020, 248, 111967. [Google Scholar] [CrossRef]
  3. Eva, E.A.; Marzen, L.J.; Lamba, J.; Ahsanullah, S.M.; Mitra, C. Projection of land use and land cover changes based on land change modeler and integrating both land use land cover and climate change on the hydrological response of Big Creek Lake Watershed, South Alabama. J. Environ. Manage. 2024, 370, 122923. [Google Scholar] [CrossRef] [PubMed]
  4. Tong, X.-Y.; Xia, G.-S.; Lu, Q.; Shen, H.; Li, S.; You, S.; Zhang, L. Land-cover classification with high-resolution remote sensing images using transferable deep models. Remote Sens. Environ. 2020, 237, 111322. [Google Scholar] [CrossRef]
  5. Li, W.; Sun, K.; Li, W.; Wei, J.; Miao, S.; Gao, S.; Zhou, Q. Aligning semantic distribution in fusing optical and SAR images for land use classification. ISPRS J. Photogramm. Remote Sens. 2023, 199, 272–288. [Google Scholar] [CrossRef]
  6. Zhang, W.; Li, W.; Zhang, C.; Hanink, D.M.; Li, X.; Wang, W. Parcel-based urban land use classification in megacity using airborne LiDAR, high resolution orthoimagery, and Google Street View. Comput. Environ. Urban Syst. 2017, 64, 215–228. [Google Scholar] [CrossRef]
  7. Jin, H.; Mountrakis, G. Fusion of optical, radar and waveform LiDAR observations for land cover classification. ISPRS J. Photogramm. Remote Sens. 2022, 187, 171–190. [Google Scholar] [CrossRef]
  8. Xu, Z.; Guan, K.; Casler, N.; Peng, B.; Wang, S. A 3D convolutional neural network method for land cover classification using LiDAR and multi-temporal Landsat imagery. ISPRS J. Photogramm. Remote Sens. 2018, 144, 423–434. [Google Scholar] [CrossRef]
  9. Zhang, Y.; Gao, H.; Zhou, J.; Zhang, C.; Ghamisi, P.; Xu, S.; Li, C.; Zhang, B. A cross-modal feature aggregation and enhancement network for hyperspectral and LiDAR joint classification. Expert Syst. Appl. 2024, 258, 125145. [Google Scholar] [CrossRef]
  10. Yan, W.Y.; Shaker, A.; El-Ashmawy, N. Urban land cover classification using airborne LiDAR data: A review. Remote Sens. Environ. 2015, 158, 295–310. [Google Scholar] [CrossRef]
  11. Chen, B.; Shi, S.; Gong, W.; Sun, J.; Guo, K.; Du, L.; Yang, J.; Xu, Q.; Song, S. A spectrally improved point cloud classification method for multispectral LiDAR. Int. Arch. Photogramm. Remote Sens. Spatial. Inf. Sci. 2020, XLIII-B3-2020, 501–505. [Google Scholar] [CrossRef]
  12. Wang, L.; Lu, D.; Xu, L.; Robinson, D.T.; Tan, W.; Xie, Q.; Guan, H.; Chapman, M.A.; Li, J. Individual tree species classification using low-density airborne multispectral LiDAR data via attribute-aware cross-branch transformer. Remote Sens. Environ. 2024, 315, 114456. [Google Scholar] [CrossRef]
  13. Hakala, T.; Suomalainen, J.; Kaasalainen, S.; Chen, Y. Full waveform hyperspectral LiDAR for terrestrial laser scanning. Opt. Express 2012, 20, 7119–7127. [Google Scholar] [CrossRef]
  14. Gong, W.; Sun, J.; Shi, S.; Yang, J.; Du, L.; Zhu, B.; Song, S. Investigating the Potential of Using the Spatial and Spectral Information of Multispectral LiDAR for Object Classification. Sensors 2015, 15, 21989–22002. [Google Scholar] [CrossRef]
  15. Niu, Z.; Xu, Z.; Sun, G.; Huang, W.; Wang, L.; Feng, M.; Li, W.; He, W.; Gao, S. Design of a new multispectral waveform LiDAR instrument to monitor vegetation. IEEE Geosci. Remote Sens. Lett. 2015, 12, 1506–1510. [Google Scholar] [CrossRef]
  16. Kukkonen, M.; Maltamo, M.; Korhonen, L.; Packalen, P. Multispectral airborne LiDAR data in the prediction of boreal tree species composition. IEEE Trans. Geosci. Remote Sens. 2019, 57, 3462–3471. [Google Scholar] [CrossRef]
  17. Chen, X.; Chengming, Y.E.; Li, J.; Chapman, M.A. Quantifying the carbon storage in urban trees using multispectral ALS data. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 3358–3365. [Google Scholar] [CrossRef]
  18. Teo, T.A.; Wu, H.M. Analysis of land cover classification using multi-wavelength LiDAR system. Appl. Sci. 2017, 7, 663. [Google Scholar] [CrossRef]
  19. Wang, Q.; Gu, Y. A discriminative tensor representation model for feature extraction and classification of multispectral LiDAR data. IEEE Trans. Geosci. Remote Sens. 2020, 58, 1568–1586. [Google Scholar] [CrossRef]
  20. Pan, S.; Guan, H.; Chen, Y.; Yu, Y.; Gonçalves, W.N.; Junior, J.M.; Li, J. Land-cover classification of multispectral LiDAR data using CNN with optimized hyper-parameters. ISPRS J. Photogramm. Remote Sens. 2020, 166, 241–254. [Google Scholar] [CrossRef]
  21. Zou, X.; Zhao, G.; Li, J.; Yang, Y.; Fang, Y. 3D land cover classification based on multispectral LiDAR point clouds. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci 2016, XLI-B1, 741–747. [Google Scholar] [CrossRef]
  22. Wichmann, V.; Bremer, M.; Lindenberger, J.; Rutzinger, M.; Georges, C.; Petrini-Monteferri, F. Evaluating the potential of multispectral airborne lidar for topographic mapping and land cover classification. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2015, II-3/W5, 113–119. [Google Scholar] [CrossRef]
  23. Weinmann, M.; Jutzi, B.; Mallet, C. Semantic 3D scene interpretation: A framework combining optimal neighborhood size selection with relevant features. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2014, II-3, 181–188. [Google Scholar] [CrossRef]
  24. Jing, Z.; Guan, H.; Zhao, P.; Li, D.; Yu, Y.; Zang, Y.; Wang, H.; Li, J. Multispectral LiDAR point cloud classification using SE-PointNet++. Remote Sens. 2021, 13, 2516. [Google Scholar] [CrossRef]
  25. Shi, S.; Tang, X.; Chen, B.; Chen, B.; Xu, Q.; Bi, S.; Gong, W. Point cloud data processing optimization in spectral and spatial dimensions based on multispectral LiDAR for urban single-wood extraction. ISPRS Int. J. Geoinf. 2023, 12, 90. [Google Scholar] [CrossRef]
  26. Dovrat, O.; Lang, I.; Avidan, S. Learning to sample. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA; 2019; pp. 2760–2769. [Google Scholar]
  27. Lang, I.; Manor, A.; Avidan, S. Samplenet: Differentiable point cloud sampling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA; 2020; pp. 7578–7588. [Google Scholar]
  28. Potamias, R.A.; Bouritsas, G.; Zafeiriou, S. Revisiting point cloud simplification: A learnable feature preserving approach. In Proceedings of the European Conference on Computer Vision; Springer Nature: Cham, Switzerland, 2022; pp. 586–603. [Google Scholar] [CrossRef]
  29. Yang, Y.; Wang, A.; Bu, D.; Feng, Z.; Liang, J. AS-Net: An attention-aware downsampling network for point clouds oriented to classification tasks. J. Vis. Commun. Image Represent. 2022, 89, 103639. [Google Scholar] [CrossRef]
  30. Ma, G.; Wei, H. A Novel Sketch-Based Framework Utilizing Contour Cues for Efficient Point Cloud Registration. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–16. [Google Scholar] [CrossRef]
  31. Kong, G.; Zhang, C.; Fan, H. Large-Scale 3-D Building Reconstruction in LoD2 from ALS Point Clouds. IEEE Geosci. Remote Sens. Lett. 2025, 22, 1–5. [Google Scholar] [CrossRef]
  32. Lu, D.; Zhao, R.; Xu, L.; Zhou, J.; Gao, K.; Gong, Z.; Zhang, D. 3D-UMamba: 3D U-Net with state space model for semantic segmentation of multi-source LiDAR point clouds. Int. J. Appl. Earth Obs. Geoinf. 2025, 136, 104401. [Google Scholar] [CrossRef]
  33. Yao, J.; Zhang, B.; Li, C.; Hong, D.; Chanussot, J. Extended Vision Transformer (ExViT) for Land Use and Land Cover Classification: A Multimodal Deep Learning Framework. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–15. [Google Scholar] [CrossRef]
  34. Kia, H.Z.; Choi, Y.; Nelson, D.; Park, J.; Pouyaei, A. Large eddy simulation of sneeze plumes and particles in a poorly ventilated outdoor air condition: A case study of the University of Houston main campus. Sci. Total Environ. 2023, 891, 164694. [Google Scholar] [CrossRef]
  35. Prasad, S.; Saux, B.L.; Yokoya, N.; Hansch, R. 2018 IEEE GRSS Data Fusion Challenge–Fusion of Multispectral LiDAR and Hyperspectral Data; IEEE Dataport: Hoston, MA, USA, 2020. [Google Scholar]
  36. Cloud Compare Team. Cloud Compare (Version 2.13.2). 2022. [Software]. Available online: http://www.cloudcompare.org/ (accessed on 1 July 2025).
  37. Fernandez-Diaz, J.C.; Carter, W.E.; Glennie, C.; Shrestha, R.L.; Pan, Z.; Ekhtari, N.; Singhania, A.; Hauser, D.; Sartori, M. Capability assessment and performance metrics for the Titan multispectral mapping lidar. Remote Sens. 2016, 8, 936. [Google Scholar] [CrossRef]
  38. Luo, B.; Yang, J.; Song, S.; Shi, S.; Gong, W.; Wang, A.; Du, L. Target classification of similar spatial characteristics in complex urban areas by using multispectral LiDAR. Remote Sens. 2022, 14, 238. [Google Scholar] [CrossRef]
  39. Wang, Q.W.; Gu, Y.F.; Yang, M.; Wang, C. Multi-attribute smooth graph convolutional network for multispectral points classification. Sci. China Technol. Sci. 2021, 64, 2509–2522. [Google Scholar] [CrossRef]
  40. Rusu, R.B. Semantic 3d object maps for everyday manipulation in human living environments. KI Künstliche Intell. 2010, 24, 345–348. [Google Scholar] [CrossRef]
  41. Yang, J.; Luo, B.; Gan, R.; Wang, A.; Shi, S.; Du, L. Multiscale adjacency matrix CNN: Learning on multispectral LiDAR point cloud via multiscale local graph convolution. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 855–870. [Google Scholar] [CrossRef]
  42. Reichler, M.; Taher, J.; Manninen, P.; Kaartinen, H.; Hyyppä, J.; Kukko, A. Semantic segmentation of raw multispectral laser scanning data from urban environments with deep neural networks. ISPRS J. Photogramm. Remote Sens. 2024, 12, 100061. [Google Scholar] [CrossRef]
  43. Chakravarty, S.; Paikaray, B.; Mishra, R.; Dash, S. Hyperspectral Image Classification using Spectral Angle Mapper. In Proceedings of the IEEE International Women in Engineering (WIE) Conference on Electrical and Computer Engineering (WIECON-ECE), Dhaka, Bangladesh, 4–5 December 2021; pp. 87–90. [Google Scholar] [CrossRef]
  44. Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. PointNet++: Deep hierarchical feature learning on point sets in a metric space. Proc. Adv. Neural Inf. Process. Syst. 2017, 30, 5099–5108. [Google Scholar]
  45. Wang, Y.; Sun, Y.; Liu, Z.; Sarma, S.E.; Bronstein, M.M.; Solomon, J.M. Dynamic graph cnn for learning on point clouds. ACM Trans. Graph. 2019, 38, 1–12. [Google Scholar] [CrossRef]
  46. Hu, Q.; Yang, B.; Xie, L.; Rosa, S.; Guo, Y.; Wang, Z.; Trigoni, N.; Markham, A. Randla-net: Efficient semantic segmentation of large-scale point clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA; 2020; pp. 11108–11117. [Google Scholar]
  47. Thomas, H.; Qi, C.R.; Deschaud, J.-E.; Marcotegui, B.; Goulette, F.; Guibas, L.J. Kpconv: Flexible and deformable convolution for point clouds. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6410–6419. [Google Scholar] [CrossRef]
  48. Xu, M.; Ding, R.; Zhao, H.; Qi, X. Paconv: Position adaptive convolution with dynamic kernel assembling on point clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 19–25 June 2021; pp. 3173–3182. [Google Scholar]
  49. Zhao, H.; Jiang, L.; Jia, J.; Torr, P.H.S.; Koltun, V. Point transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 16259–16268. [Google Scholar]
  50. Chen, H.; Cheng, J.; Ruan, X.; Li, J.; Ye, L.; Chu, S.; Cheng, L.; Zhang, K. Satellite remote sensing and bathymetry co-driven deep neural network for coral reef shallow water benthic habitat classification. Int. J. Appl. Earth Obs. Geoinf. 2024, 132, 104054. [Google Scholar] [CrossRef]
  51. Liu, T.; Ma, T.; Du, P.; Li, D. Semantic segmentation of large-scale point cloud scenes via dual neighborhood feature and global spatial-aware. Int. J. Appl. Earth Obs. Geoinf. 2024, 129, 103862. [Google Scholar] [CrossRef]
  52. Chen, X.; Mao, J.; Zhao, B.; Tao, W.; Qin, M.; Wu, C. Building contour extraction from hypervoxel growth point cloud surface neighborhood azimuth geometric features. J. Build. Eng. 2025, 101, 111914. [Google Scholar] [CrossRef]
  53. Wang, Y.; Sun, P.; Chu, W.; Li, Y.; Chen, Y.; Lin, H.; Dong, Z.; Yang, B.; He, C. Efficient multi-modal high-precision semantic segmentation from MLS point cloud without 3D annotation. Int. J. Appl. Earth Obs. Geoinf. 2024, 135, 104243. [Google Scholar] [CrossRef]
  54. Huang, S.; Hu, Q.; Ai, M.; Zhao, P.; Li, J.; Cui, H.; Wang, S. Weakly supervised 3D point cloud semantic segmentation for architectural heritage using teacher-guided consistency and contrast learning. Autom. Construct. 2024, 168, 105831. [Google Scholar] [CrossRef]
Figure 1. Overview of the study area. (a) Location of the study area and dataset partitioning. The area marked with orange text represents the training area, while the area marked with green text corresponds to the testing area. (b–d) Photos of the Titan MSL system [37]. (e) Titan MSL data. (f) LC types in the study area.
Figure 2. Specific implementation process of the multispectral point cloud spectral reconstruction method and classification result.
Figure 3. Architecture of the SA module.
Figure 4. Architecture of the SSA module.
Figure 5. Schematic diagram of Mask L1 Loss calculation process.
Figure 6. Denoising results of the single-channel point clouds. (a–c) Original point clouds before denoising. (d–f) Point clouds after denoising. The colors are rendered based on intensity, with low intensity shown in blue and high intensity in red.
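As background for the denoising step shown in Figure 6, the sketch below applies a standard statistical outlier removal (SOR) filter to a point cloud; the neighbor count and threshold are generic defaults and the NumPy/SciPy implementation is an illustrative stand-in, not the exact filter or parameters used to produce Figure 6.

```python
import numpy as np
from scipy.spatial import cKDTree

def statistical_outlier_removal(xyz, k=20, std_ratio=2.0):
    """Keep points whose mean distance to their k nearest neighbours lies
    within mean + std_ratio * std over the whole cloud (common SOR rule)."""
    dist, _ = cKDTree(xyz).query(xyz, k=k + 1)   # first neighbour is the point itself
    mean_dist = dist[:, 1:].mean(axis=1)
    threshold = mean_dist.mean() + std_ratio * mean_dist.std()
    return xyz[mean_dist <= threshold]
```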
Figure 7. Outcomes of single-channel point cloud intensity normalization.
Figure 8. Spectral reconstruction results for a specific area (Area 2). (a) Original spectra before reconstruction. (b) Spectral results after reconstruction with the SA module. (c) Spectral results after reconstruction with the SSA module. All channel points are displayed in false color. Frame A marks houses, frame B marks impervious surfaces, and frame C marks trees.
Figure 9. Visualization of the classification results using DossaNet for spectral reconstruction.
Figure 10. Comparison of spectral reconstruction methods across different models and scenes. Differences are highlighted with black dashed boxes.
Table 1. Proportion of LC classes in the training and testing datasets.

LC Types        Impervious Ground  Grass  Tree  Building  Car  Power Line  Bare Land
Training (%)    28.9               26.2   23.0  18.4      1.7  1.0         0.8
Testing (%)     29.1               25.7   22.5  19.1      1.9  1.0         0.7
Difference (%)  0.2                0.5    0.5   0.7       0.2  0.0         0.1
Table 2. Specific architecture and parameter configurations of the classification models.

Model              Down-Sampling Method  Points per Layer / Voxel Size  Neighborhood Point Search  Neighborhood Points
PointNet++         FPS                   8192, 2048, 512, 128, 32       SNS + MSG                  16 + 32
DGCNN              ×                     ×                              KNN                        20
RandLA-Net         RS                    8192, 2048, 512, 128, 32       KNN                        16
KPConv             VS                    0.4, 0.8, 1.6, 3.2, 6.4        SNS                        16
PAConv             FPS                   8192, 2048, 512, 128, 32       KNN                        1, 2, 4, 8, 16
Point Transformer  FPS                   8192, 2048, 512, 128, 32       KNN                        8 + 16
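To make the down-sampling and neighborhood columns of Table 2 concrete, the snippet below is a generic illustration of farthest point sampling (FPS) followed by a k-nearest-neighbour grouping, which several of the listed backbones rely on; it is a plain NumPy/SciPy sketch for readability, not the models' optimized CUDA implementations, and the sizes are taken from the first set-abstraction level as an example.

```python
import numpy as np
from scipy.spatial import cKDTree

def farthest_point_sampling(xyz, n_samples):
    """Greedy FPS: iteratively pick the point farthest from those already chosen."""
    n = xyz.shape[0]
    chosen = np.zeros(n_samples, dtype=int)
    dist = np.full(n, np.inf)
    chosen[0] = 0
    for i in range(1, n_samples):
        dist = np.minimum(dist, np.linalg.norm(xyz - xyz[chosen[i - 1]], axis=1))
        chosen[i] = np.argmax(dist)
    return chosen

# Example matching the first level in Table 2: 8192 -> 2048 points, 16 neighbours.
xyz = np.random.rand(8192, 3)
centers = xyz[farthest_point_sampling(xyz, 2048)]
_, neighbour_idx = cKDTree(xyz).query(centers, k=16)   # (2048, 16) grouping indices
```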
Table 3. The training hyperparameters.

Hyperparameter  Value
Epoch           100
Batch size      8
Optimizer       AdamW
Learning rate   Initial rate of 1 × 10−3, decayed to 1 × 10−5 over the first 80 epochs with a cosine annealing strategy, then held at 1 × 10−5 for the final 20 epochs
Loss function   Cross-Entropy and Mask L1
Dropout rate    0.5
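As a concrete reading of the learning-rate row in Table 3, the following PyTorch sketch combines cosine annealing from 1 × 10⁻³ to 1 × 10⁻⁵ over the first 80 epochs with a constant 1 × 10⁻⁵ for the last 20 epochs; the model and training loop are placeholders, not the authors' code.

```python
import torch

model = torch.nn.Linear(8, 7)                      # placeholder model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
# Cosine annealing from 1e-3 down to 1e-5 over the first 80 epochs.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=80, eta_min=1e-5)

for epoch in range(100):
    # ... one training epoch with Cross-Entropy + Mask L1 would run here ...
    if epoch < 80:
        scheduler.step()      # decaying phase
    # after epoch 80 the learning rate stays at 1e-5 for the remaining 20 epochs
```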
Table 4. Effectiveness verification results and classification accuracy of different structures of DossaNet. The testing model is KPConv; bolded numbers represent the highest accuracy.

Structure     OA      AA      Kappa   MIoU    Params (M)  FLOPs (G)
None          0.9430  0.8999  0.9256  0.8618  /           /
SA            0.9493  0.9196  0.9339  0.8840  0.006       0.029
SA + 1 × SSA  0.9462  0.9224  0.9299  0.8841  0.018       0.064
SA + 2 × SSA  0.9481  0.9184  0.9323  0.8833  0.030       0.098
SA + 3 × SSA  0.9503  0.9229  0.9352  0.8892  0.042       0.132
SA + 4 × SSA  0.9493  0.9219  0.9339  0.8864  0.054       0.166
Table 5. Spectral reconstruction quality evaluation results of the multispectral point cloud.

Metric                              Original Data  SA     SSA
Average SAM (°)                     0              58.41  57.1
Zero-value transformation rate (%)  66.67          64.90  /
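As a reference for how the two indicators in Table 5 are commonly computed, the sketch below evaluates the mean spectral angle mapper (SAM) value in degrees between two sets of per-point spectra and the share of zero-valued entries; the array names and the choice of reference spectra are illustrative assumptions, not the paper's exact evaluation code.

```python
import numpy as np

def average_sam_degrees(spectra_a, spectra_b, eps=1e-12):
    """Mean spectral angle (degrees) between two (N, C) spectra arrays."""
    dot = np.sum(spectra_a * spectra_b, axis=1)
    norms = np.linalg.norm(spectra_a, axis=1) * np.linalg.norm(spectra_b, axis=1)
    cos_angle = np.clip(dot / (norms + eps), -1.0, 1.0)
    return np.degrees(np.arccos(cos_angle)).mean()

def zero_value_rate(spectra):
    """Percentage of spectral entries that are exactly zero."""
    return 100.0 * np.mean(spectra == 0)
```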
Table 6. Classification and comparison results of using different spectral reconstruction methods in various models. Bold values represent the highest accuracy. None denotes no spectral reconstruction; Our denotes DossaNet.

Model              Spectral Reconstruction  OA      AA      Kappa   MIoU
PointNet++         None                     0.7800  0.4790  0.7100  0.3930
                   IDW                      0.8147  0.5005  0.7558  0.4220
                   KNN                      0.8107  0.4879  0.7484  0.4130
                   Our                      0.8280  0.5237  0.7732  0.4515
DGCNN              None                     0.9332  0.8685  0.9129  0.8242
                   IDW                      0.9347  0.8512  0.9146  0.8081
                   KNN                      0.9405  0.8427  0.9206  0.8030
                   Our                      0.9433  0.8720  0.9260  0.8351
RandLA-Net         None                     0.8548  0.7371  0.8097  0.6696
                   IDW                      0.9143  0.7807  0.8877  0.7302
                   KNN                      0.9082  0.7702  0.8807  0.7165
                   Our                      0.9141  0.7798  0.8874  0.7291
KPConv             None                     0.9430  0.8999  0.9256  0.8618
                   IDW                      0.9505  0.9150  0.9354  0.8804
                   KNN                      0.9496  0.9086  0.9343  0.8821
                   Our                      0.9503  0.9229  0.9352  0.8892
PAConv             None                     0.9473  0.8975  0.9312  0.8615
                   IDW                      0.9527  0.9172  0.9382  0.8854
                   KNN                      0.9524  0.9143  0.9377  0.8824
                   Our                      0.9529  0.9327  0.9384  0.8956
Point Transformer  None                     0.9501  0.8913  0.9348  0.8564
                   IDW                      0.9536  0.9109  0.9394  0.8783
                   KNN                      0.9527  0.8985  0.9382  0.8660
                   Our                      0.9533  0.9169  0.9390  0.8829
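As background for the IDW and KNN baselines compared in Tables 6 and 7, the sketch below shows a generic neighborhood-based interpolation of a missing spectral channel from the nearest points that do carry a return in that channel; the neighbor count, distance power, and SciPy KD-tree usage are illustrative assumptions rather than the exact settings of the compared methods.

```python
import numpy as np
from scipy.spatial import cKDTree

def interpolate_missing_channel(xyz_known, values_known, xyz_query,
                                k=6, power=2, use_idw=True):
    """Fill one spectral channel at query points from k spatial neighbours.

    xyz_known:    (M, 3) coordinates of points with a valid value in this channel.
    values_known: (M,)   intensities of that channel.
    xyz_query:    (Q, 3) coordinates of points whose channel is missing.
    """
    tree = cKDTree(xyz_known)
    dist, idx = tree.query(xyz_query, k=k)
    neigh = values_known[idx]                        # (Q, k) neighbour intensities
    if use_idw:
        w = 1.0 / np.maximum(dist, 1e-6) ** power    # inverse-distance weights
        return (w * neigh).sum(axis=1) / w.sum(axis=1)
    return neigh.mean(axis=1)                        # plain KNN average
```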
Table 7. Classification results and comparison results of using different spectral reconstruction methods in various models (IoU). Bold values represent the highest accuracy. None denotes no spectral reconstruction; Our denotes DossaNet.

Model              Spectral Reconstruction  Impervious Ground  Grass   Building  Tree    Car     Power Line  Bare Ground
PointNet++         None                     0.6175             0.6687  0.5427    0.8394  0.0824  0           0
                   IDW                      0.6596             0.7434  0.6030    0.8658  0.0820  0           0
                   KNN                      0.6621             0.7314  0.5648    0.8631  0.0699  0           0
                   Our                      0.6575             0.7761  0.6087    0.9145  0.2041  0           0
DGCNN              None                     0.8319             0.8358  0.9398    0.9494  0.8594  0.6863      0.6669
                   IDW                      0.8434             0.8409  0.9396    0.9488  0.8614  0.6514      0.5712
                   KNN                      0.8556             0.8578  0.9473    0.9516  0.8669  0.6044      0.5376
                   Our                      0.8566             0.8589  0.9508    0.9578  0.8769  0.6608      0.6841
RandLA-Net         None                     0.6386             0.6526  0.9205    0.9480  0.7839  0.7434      0
                   IDW                      0.7786             0.8135  0.9387    0.9550  0.8526  0.7729      0
                   KNN                      0.7723             0.7895  0.9364    0.9548  0.8059  0.7566      0
                   Our                      0.7926             0.8022  0.9341    0.9553  0.8262  0.7935      0
KPConv             None                     0.8481             0.8511  0.9504    0.9734  0.9289  0.8816      0.5993
                   IDW                      0.8719             0.8646  0.9568    0.9740  0.9303  0.8810      0.6843
                   KNN                      0.8738             0.8691  0.9607    0.9708  0.9197  0.8651      0.7157
                   Our                      0.8692             0.8606  0.9587    0.9735  0.9321  0.8787      0.7518
PAConv             None                     0.8630             0.8648  0.9564    0.9679  0.9110  0.8414      0.6263
                   IDW                      0.8733             0.8728  0.9649    0.9704  0.9228  0.8545      0.7391
                   KNN                      0.8828             0.8748  0.9657    0.9706  0.9225  0.8565      0.7030
                   Our                      0.8728             0.8682  0.9642    0.9730  0.9224  0.8714      0.7969
Point Transformer  None                     0.8765             0.8667  0.9598    0.9717  0.9059  0.8145      0.5995
                   IDW                      0.8814             0.8738  0.9627    0.9705  0.9112  0.8071      0.7415
                   KNN                      0.8818             0.8718  0.9630    0.9688  0.8977  0.8066      0.6722
                   Our                      0.8797             0.8746  0.9596    0.9700  0.9010  0.8321      0.7636
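The OA, AA, Kappa, per-class IoU, and MIoU values reported in Tables 4 and 6–9 all derive from the test-set confusion matrix. The snippet below is a minimal sketch of those standard definitions, assuming AA is the mean per-class recall; the confusion-matrix variable is an illustrative input, not data from the paper.

```python
import numpy as np

def classification_metrics(cm):
    """OA, AA, Cohen's kappa, per-class IoU and MIoU from a confusion matrix
    cm of shape (C, C), rows = reference labels, columns = predictions."""
    cm = cm.astype(float)
    total = cm.sum()
    tp = np.diag(cm)
    oa = tp.sum() / total
    aa = np.mean(tp / np.maximum(cm.sum(axis=1), 1e-12))        # mean per-class recall
    pe = np.sum(cm.sum(axis=0) * cm.sum(axis=1)) / total ** 2   # chance agreement
    kappa = (oa - pe) / (1 - pe)
    iou = tp / np.maximum(cm.sum(axis=0) + cm.sum(axis=1) - tp, 1e-12)
    return oa, aa, kappa, iou, iou.mean()
```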
Table 8. Spectral reconstruction results of using different neighborhood scales. Testing model is KPConv.

Neighborhood Scale K  OA      AA      Kappa   MIoU
3                     0.9493  0.9141  0.9283  0.8783
6                     0.9503  0.9229  0.9352  0.8892
12                    0.9501  0.9221  0.9349  0.8874
18                    0.9494  0.9193  0.9340  0.8857
32                    0.9488  0.9155  0.9321  0.8795

IOU per class
Neighborhood Scale K  Impervious Ground  Grass   Building  Tree    Car     Power Line  Bare Ground
3                     0.8671             0.8600  0.9594    0.9709  0.9248  0.8778      0.6983
6                     0.8692             0.8606  0.9587    0.9735  0.9321  0.8787      0.7518
12                    0.8665             0.8621  0.9598    0.9735  0.9309  0.8764      0.7425
18                    0.8673             0.8592  0.9575    0.9731  0.9302  0.8786      0.7339
32                    0.8651             0.8583  0.9586    0.9709  0.9238  0.8623      0.7179
Table 9. Influence of Mask L1 Loss on spectral reconstruction results. Testing model is KPConv.

Loss Function  OA      AA      Kappa   MIoU
None           0.9493  0.9187  0.9339  0.8825
Mask L1 Loss   0.9503  0.9229  0.9352  0.8892