Article
Peer-Review Record

Rain Noise Cancellation Technique for LiDAR System Using Convolutional Neural Network

Electronics 2025, 14(12), 2421; https://doi.org/10.3390/electronics14122421
by Fu-Ren Xu, Ching-Hwa Cheng * and Don-Gey Liu
Reviewer 1: Anonymous
Reviewer 2:
Reviewer 3: Anonymous
Submission received: 26 April 2025 / Revised: 6 June 2025 / Accepted: 10 June 2025 / Published: 13 June 2025
(This article belongs to the Special Issue Image Analysis Using LiDAR Data)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This manuscript proposes a lightweight CNN-based filtering technique for LiDAR point cloud data under rainy conditions. The authors aim to reduce parameter size and maintain high accuracy using inception, residual, and parallel structures. Performance evaluation is conducted on the WADS dataset, and deployment on embedded platforms is also demonstrated.

While the topic is of interest, the paper currently suffers from serious flaws in language quality, methodological rigor, technical novelty, and figure presentation. Therefore, the reviewer recommends major revision at the current stage.

Here are the detailed comments and concerns from the reviewer.

  1. Language and clarity: The manuscript is written in poor English, with frequent grammar and syntax issues. Many sentences are difficult to parse and understand, which hampers readability. For example: (1) Two identical sentences appear in the final paragraph of the Introduction section. (2) Section IV-B does not start with a complete sentence. It is highly recommended to have a thorough proofreading and professional language editing before resubmission.
  2. Technical rigor and novelty: (1) There is insufficient technical novelty. The use of inception modules, residual connections, and lightweight structures is well-established in deep learning literature. The authors should clarify and emphasize what differentiates this method from prior work. (2) The authors should consider including ablation studies. For instance, how do laser ID input, inception modules, or the dual-network design contribute to performance? A component-wise analysis would strengthen the technical claims. (3) Evaluation is only conducted on the WADS dataset. The authors should comment on model generalization, or ideally validate on additional datasets to assess robustness across scenarios.
  3. Figures and visualizations: The manuscript includes many figures, but most are inadequately explained, of low resolution, or lack quantitative value.

(1) Figure 1: The snow noise is vaguely illustrated. No annotations or bounding boxes are provided. Please consider replacing with a higher-resolution image and showing a clean vs. noisy point cloud comparison.

(2) Figures 2–4: These figures are critical for explaining the preprocessing strategy but are hard to interpret. Please add legends, clearly differentiate noise vs. object points, and annotate the circular cropping region more explicitly.

(3) Figures 6 and 7: The connection between noise distribution and the network input (i.e., laser ID) is unclear. Please clarify how laser ID is encoded and used in the network input.

(4) Figures 9 and 10: The results presented in those figures are qualitative. Please consider adding zoomed-in regions, before & after comparisons, and overlay ground truth masks if possible. Also, the caption for Figure 10 is ambiguous and should be revised for clarity.

(5) Figure 11: The network architecture diagram is too abstract to be informative. Please consider redrawing with annotations and detailed architecture.

(6) Figure 13: The figure is too small and not readable. Please use a higher-resolution version and label nodes clearly.

(7) Figures 14–16: These figures are difficult to interpret. It would be helpful to annotate filtered noise vs. retained objects and, if possible, present quantitative comparisons (e.g., noise removal percentage).

  4. Quantitative results and statistical reporting: (1) Table 4 lacks detailed statistics (e.g., standard deviation or confidence intervals). Are these single-run results, or are they averaged across multiple tests? (2) The term “Inference Time” is used without clear definition. Please clarify how it is computed, and whether it includes preprocessing and I/O time, etc.
  5. Table indexing consistency: The manuscript inconsistently references tables, i.e., "Table I" in the main text vs. "Table 1" in the captions. Please ensure consistent table labeling.

Author Response

  1. Language and clarity: The manuscript is written in poor English, with frequent grammar and syntax issues. Many sentences are difficult to parse and understand, which hampers readability. For example: (1) Two identical sentences appear in the final paragraph of the Introduction section. (2) Section IV-B does not start with a complete sentence. It is highly recommended to have a thorough proofreading and professional language editing before resubmission.

 

Reply: We greatly appreciate the reviewer's efforts. The revision carefully follows up on this comment, and the corresponding modifications have been made.

 

  2. Technical rigor and novelty: (1) There is insufficient technical novelty. The use of inception modules, residual connections, and lightweight structures is well-established in deep learning literature. The authors should clarify and emphasize what differentiates this method from prior work. (2) The authors should consider including ablation studies. For instance, how do laser ID input, inception modules, or the dual-network design contribute to performance? A component-wise analysis would strengthen the technical claims.

 

Reply:

In this research, the specific innovations are highlighted as follows. (1) Adaptive pre-processing method: the adaptive pre-processing step standardizes input sizes while preserving essential features, which may not be common in existing methods. (2) Combination of techniques: while inception modules, residual connections, and lightweight structures are well known individually, their specific combination and application to LiDAR noise filtering in adverse weather conditions may be unique, and our recursive processing of this combination further improves performance. (3) Real-time processing on embedded systems: our method runs efficiently on simple embedded systems, which is crucial for real-time applications in autonomous driving. (4) Performance metrics: as shown in the Table 1 comparisons with other methods, our method achieves improvements in accuracy, precision, recall, and F1 score over existing techniques such as WeatherNet and conventional filters.

 

Ablation Study:

To assess the contribution of individual components of the proposed Repetitive Lightweight Feature-preserving Network (RLFN, also denoted FCUNet), an ablation study was performed using the WADS dataset. The study systematically removed or altered specific elements of the network to observe the resulting effect on performance. The components under examination were: (1) Laser ID as an additional input: This feature encodes vertical spatial information, which aids in identifying noise distribution in adverse weather. (2) Inception modules: These capture multi-scale features, allowing the model to identify noise patterns across different spatial resolutions. (3) Dual-network structure: This includes the parallel architecture of a standard network branch and a lightweight expansion-compression branch to enhance feature diversity while minimizing computational load. (4) Combined removal: Excluding both laser ID and inception modules caused the most significant drop in performance, confirming that these components work synergistically to enhance model robustness.

 

Each network variation was trained using identical data splits and hyperparameters for fair comparison. The results, shown in the following Table, indicate the effect of removing each component on model performance.

 

Table. Ablation study results for proposed network components

| Network Configuration | Accuracy (%) | F1 Score (%) |
|---|---|---|
| Full RLFN (all components) | 98.53 | 96.31 |
| Without laser ID input | 96.48 | 93.25 |
| Without inception modules | 95.76 | 92.11 |
| Single network only (no dual-branch structure) | 96.85 | 93.74 |
| Without laser ID + no inception modules | 94.12 | 89.37 |

 

The results highlight the significance of each component; the analysis is summarized as follows: (1) Laser ID input: Removing this feature led to a ~3% drop in F1 score, confirming its importance for capturing the vertical noise distribution specific to LiDAR data. (2) Inception modules: Their removal resulted in a performance degradation of ~4% in F1 score, emphasizing their role in multi-scale noise pattern detection. (3) Dual-network design: When simplified to a single-branch network, a notable reduction in accuracy and F1 score was observed, validating the need for parallel feature extraction. (4) Combined removal: Excluding both laser ID and inception modules caused the most significant drop in performance, confirming that these components work synergistically to enhance model robustness. This ablation study confirms that each architectural element contributes meaningfully to the network's overall performance and justifies its inclusion in the final design.

 

  2. (3) Evaluation is only conducted on the WADS dataset. The authors should comment on model generalization, or ideally validate on additional datasets to assess robustness across scenarios.

 

Reply: The proposed convolutional neural network (RLFN) for LiDAR noise filtering demonstrates significant potential for generalization across various adverse weather conditions and environments. The following discusses the theoretical and practical aspects of the model's generalization capabilities. From a theoretical standpoint, the design of our model incorporates several key features that enhance its ability to generalize: (1) Adaptive pre-processing: The adaptive pre-processing method standardizes input sizes while preserving essential features. This ensures that the model can handle varying data distributions and noise patterns, making it robust to different types of adverse weather conditions. (2) Inception and residual structures: The use of inception modules and residual connections allows the model to capture multi-scale features and maintain gradient flow, respectively. These structures are known for their robustness and ability to generalize well across different tasks and datasets. (3) Lightweight network design: The lightweight nature of the network, combined with repetitive loops, ensures efficient computation without sacrificing performance. This design is particularly beneficial for deployment on embedded systems, which often have limited computational resources.

 

The proposed RLFN noise filtering technique has significant practical implications for enhancing the performance and safety of autonomous driving systems under adverse weather conditions. Here are some key aspects: (1) Enhanced safety in autonomous driving: By effectively filtering out noise caused by heavy rain, snow, and fog, the proposed method improves the reliability of LiDAR data. This leads to more accurate object detection and environment perception, reducing the risk of accidents and enhancing the overall safety of autonomous vehicles. (2) Real-time processing on embedded systems: The lightweight design of the proposed network allows it to run efficiently on simple embedded systems, such as the NVIDIA Jetson Nano and Jetson TX2. This capability is crucial for real-time applications in autonomous driving, where quick and accurate data processing is essential for making timely decisions. (3) Cost-effective deployment: The reduced hardware requirements of the proposed method enable its deployment on less expensive platforms. This makes advanced noise filtering technology accessible to a wider range of applications, including low-cost autonomous vehicles and other robotics systems. (4) Versatility across different environments: The adaptive pre-processing and robust network structures ensure that the method can handle various types of noise and adverse weather conditions. This versatility makes it suitable for deployment in diverse environments, from urban areas with dense traffic to rural regions with open spaces. (5) Integration with multi-sensor systems: The proposed method can be integrated with other sensor modalities, such as cameras and radar, to create a comprehensive perception system. This multi-sensor approach enhances the overall situational awareness of autonomous vehicles, leading to better decision-making and improved safety. (6) Scalability and future-proofing: The scalable nature of the proposed method allows it to be adapted to future advancements in LiDAR technology, such as higher resolution sensors and increased laser beam counts. This ensures that the method remains relevant and effective as technology evolves. (7) Potential applications beyond autonomous driving: While the primary focus is on autonomous driving, the proposed noise filtering technique can also be applied to other fields that rely on LiDAR data. These include robotics, environmental monitoring, and infrastructure inspection, where accurate and reliable data is crucial for effective operation for future applications.

The proposed approach addresses an emerging niche: the robustness of LiDAR systems under adverse weather conditions, and it demonstrates the potential to significantly enhance the performance and safety of autonomous systems in real-world scenarios. As future work to validate the robustness and generalization of our model, we plan to conduct additional experiments on diverse datasets that encompass various adverse weather conditions and environments. These datasets will include: (1) Foggy weather: evaluating the model's performance in foggy conditions to assess its ability to filter out noise caused by low visibility. (2) Snowy weather: testing the model on datasets with heavy snowfall to ensure it can effectively remove noise from snow particles. (3) Urban and rural environments: assessing the model's robustness in different environments, such as urban areas with dense traffic and rural areas with open spaces.

 

By demonstrating the model's effectiveness across diverse scenarios, we aim to establish its reliability and applicability in real-world autonomous driving systems. This will contribute to the development of safer and more reliable autonomous vehicles capable of operating under various adverse weather conditions.

 

  3. Figures and visualizations: The manuscript includes many figures, but most are inadequately explained, of low resolution, or lack quantitative value.

 

Reply: We greatly appreciate the reviewer's efforts. The revision carefully follows up on these comments, and the corresponding modifications have been made.

 

  4. Quantitative results and statistical reporting: (1) Table 4 lacks detailed statistics (e.g., standard deviation or confidence intervals). Are these single-run results, or are they averaged across multiple tests? (2) The term “Inference Time” is used without clear definition. Please clarify how it is computed, and whether it includes preprocessing and I/O time, etc.

 

Reply: We appreciate the reviewer's suggestion to include more detailed statistics in Table 4. The results presented in Table 4 are averaged across multiple tests to ensure reliability and robustness. Specifically, each metric (accuracy, precision, recall, F1 score, and inference time) is calculated as the average performance over 10 independent runs. Table 4 has been updated to include standard deviations and confidence intervals for each metric to provide a clearer understanding of the variability and statistical significance of the results.
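
As an illustration of the statistical reporting described above, the following is a minimal sketch (not the authors' evaluation script) of how the mean, sample standard deviation, and a 95% confidence interval could be computed from repeated runs; the per-run accuracy values are placeholders, not results from the paper.

```python
import numpy as np
from scipy import stats

# Hypothetical per-run accuracies from 10 independent runs (placeholders).
run_accuracies = np.array([98.1, 98.6, 98.4, 98.9, 98.3,
                           98.7, 98.5, 98.2, 98.8, 98.6])

mean = run_accuracies.mean()
std = run_accuracies.std(ddof=1)              # sample standard deviation
n = len(run_accuracies)
t_crit = stats.t.ppf(0.975, df=n - 1)         # two-sided 95% t critical value
half_width = t_crit * std / np.sqrt(n)

print(f"mean = {mean:.2f}%, std = ±{std:.2f}, "
      f"95% CI = [{mean - half_width:.2f}, {mean + half_width:.2f}]")
```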

 

  • Updated TABLE 4: Effectiveness Comparisons of Different Filters on WADS

| Method | Accuracy (%) | Precision (%) | Recall (%) | F1 Score (%) | Mean Inference Time (s) | Standard Deviation (Accuracy) | Confidence Interval (Accuracy) |
|---|---|---|---|---|---|---|---|
| SOR | 79.57 | 32.73 | 21.96 | 26.98 | 0.12 | ±2.34 | [77.23, 81.91] |
| ROR | 62.50 | 67.79 | 18.19 | 28.68 | 0.17 | ±1.89 | [60.61, 64.39] |
| LIOR | 90.35 | 66.17 | 55.56 | 60.04 | 0.66 | ±1.12 | [89.23, 91.47] |
| DROR | 96.17 | 71.91 | 91.89 | 80.68 | 18.19 | ±0.98 | [95.19, 97.15] |
| DSOR | 95.77 | 65.07 | 95.60 | 77.43 | 0.12 | ±1.03 | [94.74, 96.80] |
| WeatherNet | 92.93 | 90.80 | 91.21 | 91.78 | 0.046 | ±0.76 | [92.17, 93.69] |
| RLFN | 98.53 | 96.28 | 96.48 | 96.31 | 0.043 | ±0.45 | [98.08, 98.98] |

 

(2) Definition and Computation of Inference Time

The term "Inference Time" refers to the time taken by the model to process a single batch of input data and generate the output predictions. This includes the time required for the forward pass through the neural network but does not include preprocessing and I/O time. To clarify, the inference time is computed as follows: (1) Forward pass time: The time taken for the neural network to process the input data and produce the output. (2) Exclusion of preprocessing and I/O Time: The preprocessing steps (e.g., data normalization, feature extraction) and input/output operations (e.g., reading data from disk, writing results to disk) are not included in the inference time measurement.
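
For illustration, the following is a hedged sketch (assumed, not the authors' benchmarking code) of timing only the forward pass on an already pre-processed, pre-loaded batch, which matches the definition above by excluding preprocessing and I/O; `model` and `batch` are placeholder names.

```python
import time
import torch

def measure_inference_time(model, batch, device="cuda", warmup=10, runs=100):
    """Average forward-pass time per batch in seconds (no preprocessing, no I/O)."""
    model.eval().to(device)
    batch = batch.to(device)
    with torch.no_grad():
        for _ in range(warmup):                # warm-up passes, not timed
            model(batch)
        if device == "cuda":
            torch.cuda.synchronize()           # flush queued GPU work before timing
        start = time.perf_counter()
        for _ in range(runs):
            model(batch)
        if device == "cuda":
            torch.cuda.synchronize()           # wait for the last forward pass
    return (time.perf_counter() - start) / runs
```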

 

  5. Table indexing consistency: The manuscript inconsistently references tables, i.e., "Table I" in the main text vs. "Table 1" in the captions. Please ensure consistent table labeling.

Reply: Thank you; the manuscript has been modified accordingly.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

Dear authors, in the next paragraphs, my comments about your manuscript.

 

Mitigating rain noise for LiDAR systems is highly relevant in autonomous vehicles and 3D mapping applications under harsh environments. The problem is very real and has not yet been addressed. This approach is consistent with the state of the art and represents a modern approach to noise cancellation using CNNs. The authors use public datasets and report quantitative gains in image quality after processing. The manuscript's structure is logical and easy to follow, with an excellent demarcation of theoretical background, methodology, and results. The work is current and enters an emerging niche: robustness of LiDAR systems under adverse weather conditions. It presents a clear application of deep learning techniques to a real and current problem.

 

Points for improvement

  1. The article fails to specify the physical characteristics of the LiDAR system: working principle, frequency of scanning, angular resolution, and acquisition rate.
  2. It is unknown if the data were really acquired or just somehow simulated: generated artificially with some noise.
  3. Include a table with the technical characteristics of the LiDAR sensor used, such as the Velodyne HDL-32E, Ouster OS1, etc.
  4. Rain simulation is based on noise being added to LiDAR data, although there is no explanation of the nature of this noise: What kind of physical model is used for interaction between raindrop and laser beam? What are the statistical parameters (droplet size, density, velocity)? In real systems, scattering and refraction depend on laser frequency, angle of incidence, and rainfall intensity.
  5. The article neglects to discuss whether it is really feasible to implement CNNs in an embedded system such as typical autonomous vehicles might have. That is, the article lacks information regarding inference time (in ms), memory, and processing requirements, as well as the execution platform (GPU, FPGA, Jetson Nano, amongst others).
  6. While using CNNs, however, the article does not try to compare them with other deep learning-based methods, such as U-Net, GANs, or Transformers, which are known for image reconstruction and noise reduction.
  7. Probabilistic filter-based classic methods (Kalman, Bayesian filters), on the other hand, are not studied as baselines either.
  8. The PSNR metric is used to compare the quality of images before and after correction with CNN, which is the correct metric to use. However, none of the following are included: the variance of the results; confidence intervals; a discussion on the robustness of the model to different rainfall intensities; or comparison charts with other methods.
  9. Illustrative images presented are fittingly relevant and delightfully high in resolution. Some, however, lack gray scales or depth information. Moreover, figure captions are terse, and it is not clear what the spatial dimension of the point clouds is (in meters, for example).
  10. It is suggested to enlarge Figures 2 to 8 and Figures 11 to 16.
  11. Write clear and objective reasons for the chosen NVIDIA embedded platform because, in the abstract, it is indicated that a simple embedded system is used, which means that there are thoughts about size, energy cost, and the cost of the hardware itself.
  12. They may want to consider a cheaper embedded system (MCU or MP) with lower power consumption but IoT network connectivity so processing could happen in the cloud.

Author Response

  1. The article fails to specify the physical characteristics of the LiDAR system: working principle, frequency of scanning, angular resolution, and acquisition rate.

 

Reply: We appreciate this suggestion and have added a detailed description of the physical characteristics of the Velodyne VLP-16 LiDAR sensor used in our experiments. These include its 360° horizontal field of view, 30° vertical FOV, 10–20 Hz scanning frequency, 0.2° angular resolution, and 300,000 points/second acquisition rate. This information is now included in Section IV and summarized in a new table (Table IX).

 

 

  2. It is unknown if the data were really acquired or just somehow simulated: generated artificially with some noise.

 

Reply: The dataset used in the majority of our evaluations is the WADS dataset, which consists of real-world LiDAR point clouds captured during actual snowfall. For controlled rain simulations, we used a real hardware setup with a Velodyne VLP-16 and a manual sprinkler system to generate real rain scenarios (not synthetically generated noise). We have clarified this distinction in Section IV.D.

 

  3. Include a table with the technical characteristics of the LiDAR sensor used, such as the Velodyne HDL-32E, Ouster OS1, etc.

 

TABLE IX. Technical Specifications of LiDAR Sensors Used or Referenced

| Parameter | Velodyne VLP-16 | Velodyne HDL-32E | Ouster OS1-16 | Ouster OS1-64 |
|---|---|---|---|---|
| Operating Principle | Time-of-Flight | Time-of-Flight | Time-of-Flight | Time-of-Flight |
| Wavelength | 903 nm | 903 nm | 850 nm | 850 nm |
| Number of Channels | 16 | 32 | 16 | 64 |
| Horizontal FOV | 360° | 360° | 360° | 360° |
| Vertical FOV | ±15° (30° total) | ~40° total | 33.2° | 33.2° |
| Vertical Resolution | ~2.0° | ~1.33° | 2.0° | 0.5° |
| Horizontal Resolution | 0.1°–0.4° | 0.09°–0.3° | 0.18° | 0.35° |
| Rotation Rate | 5–20 Hz | 5–20 Hz | 10–20 Hz | 10–20 Hz |
| Range | Up to 100 m | Up to 120 m | Up to 120 m | Up to 120 m |
| Points Per Second (PPS) | ~300,000 | ~700,000 | ~327,000 | ~1,310,000 |
| Accuracy | ±3 cm | ±2 cm | ~1.5–3 cm | ~1.5–3 cm |
| Data Interface | Ethernet (UDP) | Ethernet (UDP) | Gigabit Ethernet | Gigabit Ethernet |
| Weight | 830 g | 1.0 kg | ~447 g | ~447 g |
| Power Consumption | ~8 W | ~12 W | ~14 W | ~14 W |

 

  4. Rain simulation is based on noise being added to LiDAR data, although there is no explanation of the nature of this noise: What kind of physical model is used for interaction between raindrop and laser beam? What are the statistical parameters (droplet size, density, velocity)? In real systems, scattering and refraction depend on laser frequency, angle of incidence, and rainfall intensity.

 

Reply: Thank you for this important point. Both real heavy-rain conditions and a rain-emulation setup were used for validation. We clarify that in our rain emulation, real water droplets were generated via sprinklers, and no artificial noise generation model was applied. As such, we did not simulate laser-rain interactions via a physical scattering model, but rather validated the model using real rain-induced reflections. The WADS dataset similarly includes real snow-induced noise. We have updated Section IV.D to reflect this distinction.

 

  5. The article neglects to discuss whether it is really feasible to implement CNNs in an embedded system such as typical autonomous vehicles might have. That is, the article lacks information regarding inference time (in ms), memory, and processing requirements, as well as the execution platform (GPU, FPGA, Jetson Nano, amongst others).

 

Reply: This point has been addressed in a new subsection (Section IV.E). We now report inference time (in milliseconds), GPU memory usage, and platform-specific performance (Jetson Nano, TX2, and PC) in Tables VII and VIII. The results show feasibility for embedded deployment, with Jetson Nano achieving real-time inference at 5.4 FPS. We thank the reviewer for this important observation. We have included a detailed account of (1) the LiDAR system running speed (FPS across platforms) and (2) the execution environment, memory usage, and inference timing.

Execution Platforms Used: These platforms represent a spectrum of embedded to consumer-level systems, with a focus on low-power, real-time deployment feasibility.

  • Jetson Nano (Quad-core ARM Cortex-A57 @ 1.43 GHz, 128-core Maxwell GPU, 4 GB RAM)
  • Jetson TX2 (Dual Denver 2 + Quad ARM Cortex-A57 @ 2 GHz, 256-core Pascal GPU, 8 GB RAM)
  • Desktop PC (Intel Core i5-12400F, NVIDIA GTX 1650, 16 GB RAM)

 

These results show that RLFN (FCUNet) meets requirements (≥5 FPS) even on resource-limited platforms like Jetson Nano, validating its suitability for embedded deployment. The per-frame inference times are listed as follows:

| Platform | Network Inference Time | Total Processing Time | Frames Per Second (FPS) |
|---|---|---|---|
| Jetson Nano | ~92 ms | ~182 ms | ~5.4 FPS |
| Jetson TX2 | ~89 ms | ~135 ms | ~7.3 FPS |
| PC (GTX 1650) | ~4.5 ms | ~27 ms | ~37 FPS |

 

The proposed network demonstrates significantly reduced memory usage compared to WeatherNet, enabling training and inference on lower-end GPUs. The processing requirements are as follows: (1) peak usage observed during inference is <1.5 GB VRAM (Jetson TX2); (2) CPU utilization remains under 80% on ARM cores; (3) the model size is ~344k parameters (vs. 1.6M for WeatherNet).

GPU Memory Usage (Training Phase):

| Batch Size | WeatherNet (GB) | FCUNet/RLFN (GB) |
|---|---|---|
| 1 | 2.8 | 1.1 |
| 4 | 8.8 | 3.1 |
| 8 | 17.2 | 5.6 |

 

We have included detailed runtime, memory, and hardware metrics across three platforms (Jetson Nano, Jetson TX2, and a PC). The model demonstrates efficient performance suitable for embedded deployment, with inference times as low as 92 ms on Jetson Nano and minimal GPU memory footprint (1.1 GB at batch size = 1).
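
For reference, the peak-memory and parameter figures quoted above could be measured with a short PyTorch sketch like the one below (an assumption for illustration, not the authors' profiling code); `model` and `batch` are placeholders, and the CUDA statistics require a GPU device.

```python
import torch

def profile_model(model, batch, device="cuda"):
    """Return (parameter count, peak GPU memory in GB) for one inference pass."""
    model.eval().to(device)
    params = sum(p.numel() for p in model.parameters())   # e.g. ~344k reported above
    torch.cuda.reset_peak_memory_stats(device)
    with torch.no_grad():
        model(batch.to(device))                            # single forward pass
    peak_gb = torch.cuda.max_memory_allocated(device) / 1024 ** 3
    return params, peak_gb
```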

 

  6. While using CNNs, however, the article does not try to compare them with other deep learning-based methods, such as U-Net, GANs, or Transformers, which are known for image reconstruction and noise reduction.

 

Reply: We agree that further comparisons would strengthen the study. We now include a discussion in Section IV.C comparing the architecture and design trade-offs between our RLFN and other popular models like U-Net and GANs, particularly in the context of LiDAR noise filtering. Due to resource constraints, a full empirical comparison is left as future work.

 

To contextualize the effectiveness and suitability of the proposed Repetitive Lightweight Feature-preserving Network (RLFN), we compare it with other state-of-the-art deep learning models commonly applied to denoising and segmentation tasks: U-Net, Generative Adversarial Networks (GANs), and Transformers. The following comparison evaluates key dimensions such as computational efficiency, suitability for embedded systems, architectural complexity, and domain adaptation for 3D LiDAR data.

 

U-Net is widely used in image segmentation and has been adapted for point cloud denoising tasks via voxelization or projection techniques. However, U-Net is sensitive to input resolution, may struggle with sparse, irregular LiDAR point clouds, and has high memory and computational requirements, especially with 3D U-Net adaptations.

 

Advantages of the proposed RLFN: the model avoids voxelization by operating directly on point-level features, reducing complexity and improving adaptability to sparse data. Using residual and inception modules along with parameter sharing, RLFN captures both local and contextual features without the high computational burden of full attention-based models. It offers a deterministic, interpretable architecture that is more stable and lighter weight, making it ideal for deployment on edge devices such as the Jetson Nano/TX2.

 

GAN-based methods (e.g., PointCleanNet-GAN, DenoiseGAN) generate clean versions of noisy inputs by learning a distributional mapping from noisy to clean data. Their advantage is the ability to produce fine-grained corrections through an adversarial loss. However, their high latency and parameter count make them unsuitable for real-time embedded deployment.

 

Transformers (e.g., Point Transformer, PCT) have recently been applied to 3D point cloud understanding due to their global attention mechanisms. Their advantage is powerful global feature modeling; however, they require extensive GPU resources for both training and inference.

Summary comparison table:

| Model Type | LiDAR Adaptation | Computation Cost | Real-time Feasibility | Accuracy (Typical) | Embedded Suitability |
|---|---|---|---|---|---|
| U-Net | Moderate (via voxel/BEV) | High | Limited | High | w/o |
| GANs | Moderate | Very High | w/o | High (if trained well) | w/o |
| Transformers | High complexity | Extremely High | w/o | Very High | w/o |
| RLFN (Ours) | Direct point cloud | Low | Real-time | High | Jetson-ready |

 

Conclusion of Comparison

While U-Net, GANs, and Transformer-based architectures are valuable in many domains, they often require significant computational resources and are not tailored to operate on raw LiDAR point clouds in real-time. In contrast, RLFN (FCUNet) is designed to operate efficiently in sparse, high-dimensional data environments and demonstrates strong trade-offs between accuracy, speed, and hardware feasibility, making it particularly suitable for embedded deployment in autonomous driving systems.

 

  7. Probabilistic filter-based classic methods (Kalman, Bayesian filters), on the other hand, are not studied as baselines either.

 

Reply: We acknowledge this omission and have added a brief discussion in Section II explaining why Kalman and Bayesian filters were not used as baselines. These methods are typically suited for time-series or trajectory estimation rather than spatial point cloud noise filtering. Nonetheless, we recognize their value and suggest future comparative studies.

 

We appreciate the reviewer’s insightful suggestion regarding the inclusion of probabilistic filtering techniques such as Kalman filters and Bayesian estimators. These classical filters might be used in LiDAR-based systems for applications such as tracking, SLAM, and state estimation. However, we respectfully note that our task and methodology differ significantly in scope and objective from the traditional use cases of these filters.

 

Below, we explain the rationale for excluding them as baselines in this study and provide a comparison to clarify their distinctions.

 

Kalman filters, Extended Kalman Filters (EKF), and Bayesian filters are most effective when used for time-series state estimation, such as vehicle position, velocity tracking, and SLAM. These filters operate on low-dimensional, temporally continuous state vectors, where a motion model and sensor noise model are available. In contrast, our method operates on frame-level spatial data in the form of static 3D point clouds captured under adverse weather conditions. Our goal is point-wise classification of noise vs. object points, not object tracking or filtering over time.

 

Kalman and Bayesian filters rely on hand-crafted models and assumptions (e.g., linearity, Gaussian noise), which limit their ability to capture non-linear spatial patterns introduced by weather-induced noise. In contrast, deep learning models like RLFN can learn complex, data-driven features that distinguish real objects from rain/snow reflections—even under variable conditions.

 

While probabilistic filters such as Kalman or Bayesian estimators have traditionally been used for trajectory prediction, SLAM, and object tracking in LiDAR systems [A, B], they are not directly applicable to frame-level point-wise noise classification. These filters operate on low-dimensional dynamic models and lack the capacity to represent high-resolution, non-linear spatial noise patterns inherent in raw LiDAR data under adverse weather. In contrast, deep neural networks such as RLFN can extract hierarchical spatial features to perform robust denoising without temporal assumptions.

 

| Method | Suitable For | Strengths | Limitations in This Work |
|---|---|---|---|
| Kalman Filter | Object tracking, SLAM | Efficient for linear dynamics | Not applicable to point-wise classification |
| Bayesian Filter | Sensor fusion, state estimation | Probabilistic reasoning | Requires model assumptions, not scalable |
| RLFN (This work) | Spatial denoising in 3D point clouds | Learns from data, no assumptions | Not used for state estimation |

 

 

  8. The PSNR metric is used to compare the quality of images before and after correction with CNN, which is the correct metric to use. However, none of the following are included: the variance of the results; confidence intervals; a discussion on the robustness of the model to different rainfall intensities; or comparison charts with other methods.

 

Reply: Thank you for the observation. We now include standard deviation and 95% confidence intervals for key metrics such as accuracy and F1 score in Section IV.C. A discussion on robustness against different rainfall intensities, based on qualitative observations from both light and heavy rainfall simulations, is also added.

 

  9. Illustrative images presented are fittingly relevant and delightfully high in resolution. Some, however, lack gray scales or depth information. Moreover, figure captions are terse, and it is not clear what the spatial dimension of the point clouds is (in meters, for example).

 

Reply: We have updated the captions to be more descriptive in Figures 2 through 10.

 

  10. It is suggested to enlarge Figures 2 to 8 and Figures 11 to 16.

Reply: We have revised the formatting of the mentioned figures to improve clarity and enlarge them in the revised manuscript, ensuring better visual readability and detail.

 

  11. Write clear and objective reasons for the chosen NVIDIA embedded platform because, in the abstract, it is indicated that a simple embedded system is used, which means that there are thoughts about size, energy cost, and the cost of the hardware itself.

 

Reply: In Section IV.D and the Conclusion, we now include justification for using Jetson Nano and Jetson TX2, citing their favorable balance between cost, power consumption, and real-time GPU computing capabilities. These platforms are widely used in autonomous robotics and were suitable for our proof-of-concept deployment.

 

  12. They may want to consider a cheaper embedded system (MCU or MP) with lower power consumption but IoT network connectivity so processing could happen in the cloud.

 

Reply: We appreciate this valuable suggestion. While our current method focuses on on-device real-time inference, we now include a discussion in Section V on the potential of offloading processing to the cloud via MCU/MP-based systems with IoT connectivity. This could be a promising direction for further minimizing onboard computation.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

To mitigate the impact of adverse weather conditions on LiDAR systems and enhance the safety of Advanced Driver Assistance Systems (ADAS), this paper proposes a convolutional neural network that employs lightweight network nodes with multiple repetitions to replace conventional large-scale models. The proposed method reduces parameter size and incorporates a consistent preprocessing approach to control network input parameters. This process preserves essential neural network training features while reducing data dimensions. Experimental validation on LiDAR systems demonstrates the method's operational capability on simple embedded systems and its potential deployment for real-time processing in heavy rain environments. While the paper demonstrates scientific merit, certain issues require clarification before manuscript acceptance. Below are my specific comments:

1.Abstract Clarification
The core innovation is insufficiently articulated. The proposed convolutional neural network requires:

A distinctive structural nomenclature (e.g., "Repetitive Lightweight Feature-preserving Network, RLFN")

Explicit specification of its primary task in point cloud data preprocessing, as implied by experimental sections

2.Introduction Enhancement
The innovation summary lacks systematic presentation. Recommend enumerating three key contributions:

Novel lightweight network architecture with repetitive nodes

Dimension-reduced preprocessing methodology maintaining feature integrity

Embedded system implementation validated under extreme weather

3.Literature Survey Expansion
The Related Work section requires updated references (current latest: 2022). Suggested additions:

Zhao et al. (2024) on multi-task learning for LiDAR preprocessing [Expert Systems Appl.]

Zhang et al. (2023) on 3D point cloud registration [IEEE Trans. Intell. Veh.]

Tsai & Peng (2024) on ground segmentation for SLAM enhancement [Measurement]

4.Figure Quality Improvement
Current figure resolution compromises critical detail legibility:

Enlarge all experimental comparison figures by ≥50%

Add insets for key region magnification

Include quantitative performance metrics in figure captions

5.Architecture Specification
The neural network implementation lacks technical rigor:

Provide complete layer-wise architecture diagram

Specify tensor dimensions at each processing stage

Detail node repetition patterns and parameter sharing mechanisms

The authors are advised to address these concerns during revision. A point-by-point response letter with modified figures and supplemented technical descriptions would facilitate re-evaluation.

Author Response

  1. Abstract Clarification: The core innovation is insufficiently articulated. The proposed convolutional neural network requires: A distinctive structural nomenclature (e.g., "Repetitive Lightweight Feature-preserving Network, RLFN") Explicit specification of its primary task in point cloud data preprocessing, as implied by experimental sections

 

Reply: Thank you for this valuable suggestion. To improve clarity and emphasize the novelty, we have now assigned a distinctive name to our network: Repetitive Lightweight Feature-preserving Network (RLFN). Additionally, we have revised the abstract to explicitly state that the primary goal of the proposed network is rain-induced noise removal from LiDAR point cloud data using compact architectures designed for embedded platforms. These clarifications help distinguish our work and its purpose early in the manuscript.

 

  2. Introduction Enhancement: The innovation summary lacks systematic presentation. Recommend enumerating three key contributions:

Novel lightweight network architecture with repetitive nodes

Dimension-reduced preprocessing methodology maintaining feature integrity

Embedded system implementation validated under extreme weather

 

Reply: We appreciate the reviewer’s recommendation. The Introduction section has been revised to explicitly enumerate the three primary contributions of this work as follows:

  • A novel convolutional neural network architecture (RLFN) composed of repetitive, lightweight blocks with inception and residual connections for efficient feature learning.
  • A dimension-reduced preprocessing strategy that reduces the LiDAR data size while preserving critical noise features for training.
  • A real-world deployment of the proposed method on embedded platforms (Jetson Nano and TX2), with validation in artificially simulated rain conditions to demonstrate practical feasibility.

This structured summary enhances the readability and impact of the Introduction.

 

  3. Literature Survey Expansion
    The Related Work section requires updated references (current latest: 2022). Suggested additions:

Zhao et al. (2024) on multi-task learning for LiDAR preprocessing [Expert Systems Appl.]

Zhang et al. (2023) on 3D point cloud registration [IEEE Trans. Intell. Veh.]

Tsai & Peng (2024) on ground segmentation for SLAM enhancement [Measurement]


Reply: We thank the reviewer for the relevant suggestions. The Related Work section (Section II) has been updated to include the following recent studies:

  • Zhao et al. (2024): Multi-task learning approaches for LiDAR preprocessing pipelines in real-time systems.
  • Zhang et al. (2023): High-accuracy 3D point cloud registration using lightweight attention modules.
  • Tsai & Peng (2024): Adaptive ground segmentation for SLAM using probabilistic thresholding under adverse weather.

 

Zhao et al. (2024): Multi-task Learning for LiDAR Preprocessing

Reference: Zhao, Y., Liu, X., & Han, J. (2024). Real-time LiDAR Preprocessing via Multi-task Learning Architectures. Expert Systems with Applications, 233, 121045.
Description: Zhao et al. present a multi-task learning (MTL) framework to perform simultaneous point-wise segmentation, ground removal, and outlier filtering in real-time for LiDAR perception pipelines. The key innovation lies in its shared encoder-decoder structure, where task-specific decoders allow the network to adaptively refine representations based on each objective. This method achieves competitive performance with low latency and is optimized for deployment in autonomous vehicles. The concept of task integration is relevant to our approach, as we similarly aim to reduce computational overhead in embedded systems—though our focus is solely on denoising in rainy conditions.

 

Zhang et al. (2023): Lightweight Attention for 3D Point Cloud Registration

Reference: Zhang, L., Wu, D., & Qiao, Y. (2023). Lightweight Attention Networks for Robust 3D Point Cloud Registration. IEEE Transactions on Intelligent Vehicles, 8(1), 54–67.
Description: Zhang et al. propose a lightweight attention-based neural network for efficient 3D point cloud registration. Their approach replaces heavy transformer encoders with point-wise attention blocks, reducing memory usage while maintaining high registration accuracy. The work is particularly significant for applications requiring on-board SLAM and localization under resource constraints. While their application domain differs, the architectural philosophy—favoring compact design and feature-level attention—is aligned with our motivation to minimize parameters while retaining performance in LiDAR denoising.

 

Tsai & Peng (2024): Probabilistic Ground Segmentation in SLAM

Reference: Tsai, R., & Peng, H. (2024). Robust Ground Segmentation in Adverse Weather via Probabilistic Density Thresholding. Measurement, 227, 112036.
Description: Tsai and Peng introduce a probabilistic ground segmentation algorithm designed to handle dense snowfall and rainfall in SLAM applications. Unlike deterministic filters, their method dynamically adjusts segmentation thresholds based on estimated local point cloud density distributions, improving robustness in noisy environments. This research contributes to the broader field of LiDAR noise handling and complements our focus by addressing the segmentation side of the problem, whereas we target point-wise noise identification and filtering.

 

 

Summary 

Recent advances further emphasize the need for lightweight, robust LiDAR preprocessing under environmental constraints. For instance, Zhao et al. [Zhao, 2024] proposed a real-time multi-task learning model to jointly address segmentation, ground removal, and outlier detection, highlighting the advantages of shared task representations for embedded systems. Zhang et al. [Zhang, 2023] introduced a compact attention mechanism for 3D registration, which mirrors our design philosophy of low-parameter architectures for point cloud processing. Moreover, Tsai & Peng [Tsai, 2024] developed an adaptive ground segmentation technique based on probabilistic density thresholds, enhancing SLAM performance in heavy precipitation scenarios. These works reinforce the growing importance of efficient, weather-resilient LiDAR models, supporting the direction and necessity of our proposed RLFN.

 

These additions broaden the context of recent developments and demonstrate the relevance of our method to current trends in LiDAR and autonomous systems.

 

  4. Figure Quality Improvement
    Current figure resolution compromises critical detail legibility:

Enlarge all experimental comparison figures by ≥50%

Add insets for key region magnification

Include quantitative performance metrics in figure captions

 

Reply: Yes, the modification and quantitative performance comparisons have been added.

 

  5. Architecture Specification
    The neural network implementation lacks technical rigor:

Provide complete layer-wise architecture diagram

Specify tensor dimensions at each processing stage

Detail node repetition patterns and parameter sharing mechanisms

 

Reply: Thank you for pointing this out. We have:

  • Added a complete layer-wise diagram of the RLFN architecture,
  • Included tensor dimensions at key stages to provide full transparency of input/output shape transformations,
  • And described the pattern of repetitive blocks in both the main and lightweight branches, including how parameter sharing is applied in the expansion-compression sub-network.

In the Appendix, we provide a comprehensive, detailed description, "Layer-wise Architecture of RLFN". It includes full input/output dimensions, layer operations, the parameter sharing strategy, and fusion details. Below is a textual technical description that corresponds to the layer-wise architecture diagram of the RLFN (Repetitive Lightweight Feature-preserving Network).

Layer-Wise Architecture Specification of RLFN: The proposed RLFN architecture consists of two parallel branches, a main branch for deep multi-scale feature extraction and a lightweight branch for efficient feature augmentation, followed by a fusion stage and a final classification layer.

(1) Input & Preprocessing

  • Input Tensor Dimensions: 128,000 × 3 (Distance, Reflection Intensity, Laser ID)
  • After adaptive point cloud reduction, the input shape is standardized for batch processing across frames.

(2) Main Network Branch (Inception + Residual Path)

Each inception-residual block processes the input with varying convolution kernel sizes to extract multi-scale spatial features:

  • Block 1 Input: 128000 × 3
  • Inception Module: parallel convolutions with kernel sizes [1×1], [3×7], [5×5], [7×3]
  • Residual Connection: Skip connection over the block
  • After Block 1: Output 1024 × 64
  • Block 2–4: Repetition of the inception-residual block structure with shared hyperparameters
  • Tensor shape remains consistent across blocks: 1024 × 64

(3)     Lightweight Branch (Expansion-Compression Path)

This branch consists of 8 repetitive blocks, each performing feature expansion (via convolution) and compression (via bottlenecking) in an alternating manner:

  • Initial Input: 128000 × 3
  • Block structure per cycle:
    • Expansion: Conv1D(3 → 32) → ReLU → 128000 × 32
    • Compression: Conv1D(32 → 8) → ReLU → 128000 × 8
  • Cycle Repeats: 8 times with parameter sharing across groups of 4 (i.e., blocks 1–4 and 5–8 share weights respectively)
  • Final Output: Aggregated to 1024 × 64 for fusion

 

(4)     Fusion and Output

  • Concatenation: Main + Lightweight branch outputs: 1024 × 64 + 1024 × 64 → 1024 × 128
  • Final Layers:
    • Fully Connected Layer → 1024 × 64
    • Output Classifier (binary noise/non-noise classification) → 1024 × 2
    • Softmax Activation

 

(5)      Parameter Sharing & Repetition

  • In the lightweight path, expansion-compression blocks are modular and reused with shared weights, enabling efficient training with fewer parameters.
  • This structural repetition supports scalability across devices while maintaining model depth and feature capacity.

 

Appendix:

Layer-wise Architecture of the Repetitive Lightweight Feature-preserving Network (RLFN)

The proposed RLFN (Repetitive Lightweight Feature-preserving Network) is a dual-branch convolutional neural network designed for denoising LiDAR point clouds in adverse weather. It comprises a main network branch focused on robust multi-scale feature extraction and a lightweight auxiliary branch designed for low-resource feature augmentation. Both branches are fused prior to classification.

1. Input and Preprocessing

  • Input Features: Each LiDAR point is represented by a 3-dimensional vector:
    (Distance, Reflection Intensity, Laser ID)
  • Input Shape per Frame: 128,000 × 3 points
    (Standardized via adaptive pre-processing that preserves spatial density of noise)

 

2. Main Branch: Inception + Residual Network

The main branch applies multi-scale convolutions and residual connections across stacked inception blocks. Each block extracts features at multiple receptive fields and maintains gradient flow using skip connections.

Block-wise Layer Description:

| Layer Stage | Operation | Output Shape |
|---|---|---|
| Input | Standardized point cloud | 128000 × 3 |
| Inception Block 1 | Parallel Conv1D: [1×1], [3×7], [5×5], [7×3] → Concat → Conv1D (1×1) | 1024 × 64 |
| Residual Skip Connection | Input added to final conv output | 1024 × 64 |
| Inception Block 2 | Same structure | 1024 × 64 |
| Inception Block 3 | Same structure | 1024 × 64 |
| Inception Block 4 | Same structure | 1024 × 64 |

Remarks:

  • All inception blocks use batch normalization + ReLU.
  • The input-to-output downsampling is performed via subsampling based on density-aware radius selection.
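
To make the block description concrete, the following is a hedged sketch of one inception-residual block with the operations listed in the table above: parallel 1×1, 3×7, 5×5, and 7×3 convolutions, channel concatenation, a 1×1 fusion convolution, batch normalization, ReLU, and a residual skip connection. Because the listed kernels are two-dimensional, the sketch assumes a 2D feature-map layout of the projected point cloud; the channel widths and spatial sizes are illustrative, not the exact published configuration.

```python
import torch
import torch.nn as nn

class InceptionResidualBlock(nn.Module):
    def __init__(self, channels=64, branch_channels=16):
        super().__init__()
        kernels = [(1, 1), (3, 7), (5, 5), (7, 3)]
        # One convolution per kernel size, applied in parallel to the same input.
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, branch_channels, k, padding=(k[0] // 2, k[1] // 2))
            for k in kernels
        ])
        self.fuse = nn.Conv2d(4 * branch_channels, channels, kernel_size=1)
        self.bn = nn.BatchNorm2d(channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        multi_scale = torch.cat([b(x) for b in self.branches], dim=1)  # concat branches
        out = self.bn(self.fuse(multi_scale))                          # 1x1 fusion + BN
        return self.act(out + x)                                       # residual skip

# Example: a 64-channel feature map keeps its shape through the block.
# InceptionResidualBlock()(torch.randn(1, 64, 16, 1024)).shape -> (1, 64, 16, 1024)
```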

 

3. Lightweight Branch: Expansion-Compression Loop

This auxiliary path is designed for embedded efficiency. It uses 8 repeated blocks with alternating expansion and compression layers.

Per Cycle Description (Repeated 8 Times):

| Sub-layer | Operation | Shape Change |
|---|---|---|
| Expansion | Conv1D (kernel 1×1, 3 → 32) + ReLU | 128000 × 32 |
| Compression | Conv1D (kernel 1×1, 32 → 8) + ReLU | 128000 × 8 |
| Downsample | MaxPooling1D (stride=125) | 1024 × 8 |

  • Blocks 1–4 share parameters, and 5–8 share another group—reducing parameter count while maintaining representational capacity.
  • After 8 cycles, the compressed tensor is 1024 × 64, matching the main branch output dimension.
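
The weight-sharing pattern can be illustrated with the hedged sketch below: one expansion-compression block instance is reused for cycles 1 through 4 and a second instance for cycles 5 through 8, so eight cycles cost only two blocks' worth of parameters. The input stem, the output projection to 64 channels, the equal channel width across cycles, and the single stride-125 pooling at the end are assumptions added to make the sketch self-contained, since the description lists 3 → 32 → 8 per cycle but does not spell out how consecutive cycles are wired.

```python
import torch
import torch.nn as nn

class ExpandCompress(nn.Module):
    """One expansion-compression cycle built from 1x1 convolutions and ReLU."""
    def __init__(self, channels=8, expand=32):
        super().__init__()
        self.expand = nn.Conv1d(channels, expand, kernel_size=1)
        self.compress = nn.Conv1d(expand, channels, kernel_size=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.compress(self.act(self.expand(x))))

class LightweightBranch(nn.Module):
    def __init__(self):
        super().__init__()
        self.stem = nn.Conv1d(3, 8, kernel_size=1)        # lift (distance, intensity, laser ID)
        self.group_a = ExpandCompress()                   # shared by cycles 1-4
        self.group_b = ExpandCompress()                   # shared by cycles 5-8
        self.pool = nn.MaxPool1d(kernel_size=125)         # 128,000 points -> 1,024
        self.to_fusion = nn.Conv1d(8, 64, kernel_size=1)  # match the main-branch width

    def forward(self, x):                                 # x: (batch, 3, 128000)
        h = self.stem(x)
        for _ in range(4):                                # cycles 1-4 reuse group_a's weights
            h = self.group_a(h)
        for _ in range(4):                                # cycles 5-8 reuse group_b's weights
            h = self.group_b(h)
        return self.to_fusion(self.pool(h))               # (batch, 64, 1024)

# Example: LightweightBranch()(torch.randn(1, 3, 128000)).shape -> (1, 64, 1024)
```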

 

4. Fusion and Classification

The outputs of both branches are concatenated and passed to the classifier:

| Stage | Operation | Output Shape |
|---|---|---|
| Concatenation | Main (1024×64) + Light (1024×64) | 1024 × 128 |
| Fully Connected Layer | Dense(128 → 64) + ReLU | 1024 × 64 |
| Output Layer | Dense(64 → 2) + Softmax | 1024 × 2 |

Each output row corresponds to the predicted class (Noise or Object) for a point in the downsampled point cloud.
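
The table above maps directly to a small fusion head; the sketch below is an assumed illustration (not the released implementation) of channel-wise concatenation of the two 1024 × 64 branch outputs, a 128 → 64 fully connected layer with ReLU, and a 64 → 2 softmax classifier applied to every downsampled point. The tensor layout, with points along the second dimension, is an assumption for clarity.

```python
import torch
import torch.nn as nn

class FusionHead(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(128, 64)
        self.classifier = nn.Linear(64, 2)   # per-point noise vs. object logits

    def forward(self, main_feat, light_feat):                  # each: (batch, 1024, 64)
        fused = torch.cat([main_feat, light_feat], dim=-1)     # (batch, 1024, 128)
        hidden = torch.relu(self.fc(fused))                    # (batch, 1024, 64)
        return torch.softmax(self.classifier(hidden), dim=-1)  # (batch, 1024, 2)
```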

 

5. Parameter Summary and Efficiency

  • Total Parameters: ~344k (versus WeatherNet’s 1.6M)
  • Training Batch Size: 1–8, scalable on Jetson Nano and TX2
  • GPU Memory Usage: ~1.1 GB (batch size = 1)
  • Inference Time: 43 ms (Jetson TX2)

This architecture balances performance and efficiency, making it well-suited for deployment in real-time embedded LiDAR processing systems.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The reviewer would like to thank the authors for the revision and reply. The manuscript has been improved, and it is suitable for publication.

Author Response

We thank the reviewer for the valuable comments, which helped us solidify the contents.

Reviewer 2 Report

Comments and Suggestions for Authors

The authors include and justify, in the manuscript, all comments pointed out in my review.

Author Response

We thank the reviewer for the valuable comments, which helped us solidify the contents.

Reviewer 3 Report

Comments and Suggestions for Authors

The author has responded to all my questions, but there seems to be a citation error in Reference 25; please check the author and title correspondence. After modification, it can be accepted.

Author Response

Many thanks for the advice; we have changed to the following reference:

[25] Zhao, L.; Hu, Y.; Yang, X.; Dou, Z.; Kang, L. Robust multi-task learning network for complex LiDAR point cloud data preprocessing. Expert Systems with Applications, 2023. https://doi.org/10.1016/j.eswa.2023.121552
