1. Introduction
LIDAR (Light Detection and Ranging) sensors are used in mining and mineral processing to provide detailed three-dimensional measurements of the environment. These measurements support topographic mapping, rock characterization, autonomous mining vehicle navigation, and obstacle detection, among other applications.
However, mining operations such as blasting, crushing, secondary reduction, and material transport by earthmoving machines and trucks generate dust that can remain suspended in the air for extended periods. Dust particles scatter and absorb light, which introduces noise into LIDAR measurements [1]. This phenomenon can generate false measurements, producing erroneous distance readings, false detections, or missed detections of objects of interest [2]. To address this problem, dust measurements must be filtered from LIDAR readings [3].
Filtering algorithms make it possible to identify dust signals and separate them from other environmental data. Most algorithms model dust measurements as outliers and filter them by comparing the position of each measurement with those of its neighbors and/or by analyzing the intensity of the measurement, since dust measurements typically have a lower intensity than non-dust measurements. Moreover, new hyperspectral LIDARs can simultaneously obtain spatial and spectral information [4,5], meaning that each measurement is extended with spectral information. Such information allows for a richer description of points and enables filtering algorithms that exploit these properties. However, given that these sensors are not yet widely adopted in standard mining applications such as obstacle detection and autonomous navigation, they are not considered in this work. Therefore, our focus is on filtering methods that can be used with standard industrial LIDARs, such as those used in mining vehicles.
Among other techniques, filtering based on machine learning stands out because statistical classifiers can learn to identify patterns in data [6]. Statistical classifiers can be trained using different types of sensors and information sources, allowing them to adapt to various operating conditions in mining environments. They can significantly improve the accuracy of the 3D data obtained by effectively identifying and differentiating dust measurements.
Two challenges for the use of statistical classifiers in dust filtering are the processing speed and the availability of data for training the models [3]. Real-time dust filtering is crucial in some applications, such as autonomous navigation and obstacle detection, which require real-time decision-making. Regarding training data, a related requirement is the existence of open dust databases with which to develop and benchmark different dust-filtering methods.
In this paper, we address both challenges. First, we propose a neural network-based method capable of filtering dust measurements in real time from point clouds obtained using LIDARs. We train and validate the proposed method using real data. Second, we build a database using LIDAR sensor data from different dusty environments. This database is made public for use in the training and benchmarking of dust-filtering methods.
The main contributions of this paper are as follows: (1) a neural network-based method for the real-time dust filtering of point clouds obtained with LIDAR sensors, which includes a novel neural network encoding and processing architecture, as well as the use of novel features; (2) UCHILE-Dust, a publicly available database for training and testing dust-filtering methods.
3. Proposed Dust-Filtering Method
The proposed dust-filtering method is based on a modified version of the PointNet++ architecture [12], which receives point cloud data directly, groups the points hierarchically, computes internal feature representations, and classifies each point as dust or non-dust. For each LIDAR measurement, its 3D position, intensity, and temporal displacement are provided to the network. When the scans are acquired from a moving vehicle, odometry information is also considered. Thus, the dust-filtering process takes into account geometric, intensity, and temporal features.
3.1. Neural Architecture: Reduced-PointNet++
The PointNet network, proposed in [13], is able to directly process the spatial coordinates and other attributes of the points to classify and segment a point cloud. Unlike previous approaches that required converting the point clouds into 2D or volumetric representations, PointNet works directly on 3D data without the need for voxelization or projections, which makes it more efficient in terms of memory and processing. PointNet is based on the use of symmetric functions, such as global max-pooling, which allows it to be invariant to the order of the points, a fundamental property in point clouds where there is no predefined order. In addition, the model incorporates learned transformations that align the point cloud before processing, reducing the variability introduced by differences in data orientation. However, due to its reliance on global feature aggregation, PointNet is limited in its ability to capture local relationships between points, which may affect its performance on tasks that require fine-structure information, as in the case of filtering dust measurements.
PointNet++, proposed in [12], improves PointNet by hierarchically grouping points and applying PointNet subnetworks at multiple scales, allowing it to better capture local features and spatial relationships. This improvement makes it suitable for dust filtering.
To achieve real-time processing of large point clouds, a reduced version of PointNet++ was designed in this work. Two major changes were made to PointNet++: first, the number of sampled points was reduced, which also reduced the number of set abstraction (SA) layers, and some blocks were eliminated to simplify the architecture. Second, the MLP configurations in both the abstraction and propagation blocks were adjusted. This led to a significant decrease in the number of parameters and an improvement in the efficiency of the model.
Table 1 shows these changes. The reduced architecture was obtained empirically through iterative pruning: layers were progressively removed while monitoring accuracy, and the process stopped once further pruning caused a non-negligible degradation in performance.
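The following is a minimal sketch of this pruning procedure; `config`, `build_model`, `train_and_eval`, and the tolerance value are hypothetical stand-ins, not the exact routines used in this work.

```python
def prune_architecture(config, build_model, train_and_eval, tol=0.01):
    """Greedily remove blocks from `config` (a list of layer specifications)
    while the validation accuracy stays within `tol` of the reference model."""
    ref_acc = train_and_eval(build_model(config))
    pruned = list(config)
    while len(pruned) > 1:
        # Evaluate every single-block removal and keep the best candidate.
        candidates = [pruned[:i] + pruned[i + 1:] for i in range(len(pruned))]
        scored = [(train_and_eval(build_model(c)), c) for c in candidates]
        best_acc, best_cfg = max(scored, key=lambda s: s[0])
        if ref_acc - best_acc > tol:
            break  # non-negligible degradation reached: stop pruning
        pruned = best_cfg
    return pruned
```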
In Section 4, a comparison of the performance of PointNet++ and its reduced version in dust-filtering tasks is presented.
3.2. Point Cloud Features
Let us consider a point cloud $P_t = \{p_i^t\}_{i=1}^{N_t}$ containing points $p_i^t$ belonging to frame $t$, where $t$ represents an instant in a temporal sequence of length $T$. Each point is defined by $p_i^t = (x_i^t, f_i^t)$, where $x_i^t \in \mathbb{R}^3$ corresponds to spatial data and $f_i^t$ represents a vector of associated attributes, such as intensity ($I_i^t$) or other properties such as temporal information.
A first alternative for incorporating temporal information is to compute the spatial differences between points in consecutive point clouds. For each point $p_i^t$ in $P_t$, we look for the nearest point in $P_{t-1}$. This is achieved by minimizing the following Euclidean distance:
$$d_{ij} = \left\| x_i^t - x_j^{t-1} \right\|_2 .$$
Then, the nearest point $p_{j^*}^{t-1}$ is determined as
$$j^* = \operatorname*{arg\,min}_j \, d_{ij},$$
and the spatial difference vector is computed as
$$\Delta x_i^t = x_i^t - x_{j^*}^{t-1}.$$
The vector of spatial differences $\Delta x_i^t$, or its magnitude $\left\| \Delta x_i^t \right\|$, can then be used as a temporal attribute of each point.
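As an illustration, this nearest-neighbor difference can be computed efficiently with a k-d tree. The following is a minimal NumPy/SciPy sketch; the function and variable names are ours and not part of the method's implementation.

```python
import numpy as np
from scipy.spatial import cKDTree

def temporal_difference_features(xyz_t, xyz_prev):
    """For each point of frame t, find its nearest neighbor in frame t-1
    and return the spatial difference vectors and their magnitudes."""
    tree = cKDTree(xyz_prev)            # index the previous frame once
    _, nn_idx = tree.query(xyz_t, k=1)  # nearest neighbor of each point
    delta = xyz_t - xyz_prev[nn_idx]    # difference vectors, shape (N, 3)
    return delta, np.linalg.norm(delta, axis=1)
```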
A second alternative for incorporating temporal information, proposed in [14] as temporal variation-aware interpolation, is to generate an interpolated feature that represents the local information of the previous point cloud $P_{t-1}$ projected onto the current point cloud $P_t$. To achieve this, first, for each point $p_i^t$ in $P_t$, we calculate the distances
$$d_{ik} = \left\| x_i^t - x_k^{t-1} \right\|_2$$
to its $K$-nearest neighbors in $P_{t-1}$. Then, the interpolation weight $w_{ik}$ for each neighbor point is computed from $d_{ik}$ and two hyperparameters $\alpha$ and $\beta$, following [14]. The weights are then normalized using the softmax function:
$$\hat{w}_{ik} = \frac{\exp(w_{ik})}{\sum_{k'=1}^{K} \exp(w_{ik'})}.$$
Then, for each of the $K$ nearest neighbors of $p_i^t$ in the previous point cloud, the intensity value and the difference of the intensity values are fed to an MLP layer with ReLU activation, and intermediate features $g_{ik}$ are computed (see details of the network architecture in [14]). Afterward, all intermediate features are aggregated to generate the interpolated feature $\tilde{f}_i^t$ as
$$\tilde{f}_i^t = \sum_{k=1}^{K} \hat{w}_{ik} \odot g_{ik},$$
where $\odot$ denotes element-wise multiplication.
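A compact PyTorch sketch of this interpolation step is given below. The MLP width and tensor layout are illustrative assumptions; the unnormalized weights $w_{ik}$ are taken as an input, since their exact form is defined in [14].

```python
import torch
import torch.nn as nn

class TemporalInterpolation(nn.Module):
    """Aggregates intensity-based neighbor features from frame t-1 into an
    interpolated feature for each point of frame t (sketch, after [14])."""

    def __init__(self, hidden=32):
        super().__init__()
        # Input per neighbor: its intensity and the intensity difference.
        self.mlp = nn.Sequential(nn.Linear(2, hidden), nn.ReLU())

    def forward(self, intensity_t, intensity_knn, weights):
        # intensity_t:   (N, 1)    intensity of each current point
        # intensity_knn: (N, K, 1) intensities of its K neighbors in t-1
        # weights:       (N, K)    unnormalized interpolation weights w_ik
        diff = intensity_t.unsqueeze(1) - intensity_knn         # (N, K, 1)
        g = self.mlp(torch.cat([intensity_knn, diff], dim=-1))  # (N, K, H)
        w_hat = torch.softmax(weights, dim=1).unsqueeze(-1)     # (N, K, 1)
        return (w_hat * g).sum(dim=1)                           # (N, H)
```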
As shown in Table 2, different variants of our dust-filtering method can be built depending on the information (feature vector) used; a feature-assembly sketch follows the list:
SI: Spatial + Intensity features.
STdm: Spatial + Temporal-magnitude-difference features.
STdv: Spatial + Temporal-vector-difference features.
STi: Spatial + Temporal-interpolated features.
SITdm: Spatial + Intensity + Temporal-magnitude-difference features.
SITdv: Spatial + Intensity + Temporal-vector-difference features.
SITi: Spatial + Intensity + Temporal-interpolated features.
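The following minimal sketch illustrates how these per-point feature vectors can be assembled; the function name and array layout are ours, for illustration only.

```python
import numpy as np

def build_features(xyz, intensity=None, temporal=None):
    """Concatenate per-point inputs for one variant of Table 2, e.g.,
    SI -> [x, y, z, I] and SITdv -> [x, y, z, I, dx, dy, dz]."""
    parts = [np.asarray(xyz, dtype=np.float32)]
    if intensity is not None:   # "I" variants: append return intensity
        parts.append(np.asarray(intensity, dtype=np.float32).reshape(-1, 1))
    if temporal is not None:    # "Tdm" (scalar), "Tdv" (vector), or "Ti"
        t = np.asarray(temporal, dtype=np.float32)
        parts.append(t.reshape(len(parts[0]), -1))
    return np.concatenate(parts, axis=1)
```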
Finally, in the case where the 3D data are acquired from a LIDAR mounted on a moving vehicle or robot, the odometry information is used to align the point clouds before the temporal features are computed. Thus, before calculating temporal features between point clouds $P_{t-1}$ and $P_t$, the points of $P_{t-1}$ are projected to frame $t$ using the rotation and translation matrices between frames $t-1$ and $t$, $R_{t-1}^{t}$ and $T_{t-1}^{t}$, respectively. These matrices are calculated from the vehicle's odometry.
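A minimal sketch of this alignment is shown below, assuming odometry poses $(R, T)$ expressed in a common world frame; the pose convention and names are assumptions that may differ per platform.

```python
import numpy as np

def align_previous_frame(xyz_prev, R_prev, T_prev, R_curr, T_curr):
    """Project the points of frame t-1 into the coordinate system of frame t,
    given the odometry pose (R, T) of each frame in a common world frame."""
    world = xyz_prev @ R_prev.T + T_prev   # sensor frame t-1 -> world
    return (world - T_curr) @ R_curr       # world -> sensor frame t
```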
4. Experimental Results
4.1. UCHILE-Dust Database
The UCHILE-Dust dataset was acquired during campaigns conducted from September 2024 to February 2025. An Ouster OS0 LIDAR was configured at 1024 × 64 resolution, providing a 360° azimuth field of view, and operated in dual-return mode at 10 Hz. Each frame was extracted with 3D coordinates and return intensity, and robot odometry was associated where applicable.
Recordings were saved either as PCAP files (direct OS0 stream) or ROS bag files (robot-mounted OS0 with odometry). A multistep preprocessing pipeline was applied, including frame extraction, odometry alignment (moving-sensor case), multiecho merging, distance filtering, pre-labeling with dust-free references, manual annotation using labelCloud, and final conversion to the S3DIS (Stanford Large-Scale 3D Indoor Spaces) format. The database is available at https://github.com/nicolasCruzW21/UCHILE-Dust, accessed on 4 September 2025.
Table 3 shows a general overview of the database, considering the different subsets captured in different environments. It is important to note that the percentage of dust points in all subsets is less than 12%.
4.1.1. Interior 1 and 2 Subsets
Captured indoors at the Field Robotics Laboratory of the Advanced Mining Technology Center (AMTC) of the Universidad de Chile (UCHILE) using a static OS0 LIDAR sensor. Dust was manually dispersed across the scene, which includes a combination of glass and concrete surfaces that introduce complexities with transparent and reflective materials. A rock breaker hammer is present in the center of the room.
4.1.2. Exterior 1 and 2 Subsets
Captured outdoors in the AMTC courtyard. Dust was dispersed between the sensor and a nearby wall or in an open space to assess the impact of multipath reflections.
4.1.3. Carén Subset
Captured in a large, dry, and windy open field with flat terrain and a quarry. The site is located in the Carén park in the Metropolitan Region of Chile and belongs to UCHILE. A Panther robot equipped with an OS0 LIDAR was used to collect data while in motion. Dust was introduced using an air blower. Carén is the most realistic subset for mobile perception tasks.
Example point cloud frames for each subset are shown in Figure 1.
4.2. Experimental Setup
Each variant of the model (SI, STdm, STdv, STi, SITdm, SITdv, and SITi) was trained using its respective training and validation sets. Three learning rates were tested for each variant (0.01, 0.005, and 0.008), and the one that produced the best results was selected, since the average accuracy varied with the learning rate in each experiment. In all cases, a batch size of 16 was used. The other hyperparameters were the following:
A class-weighted cross-entropy loss (CrossEntropyLoss) was used, since the classes are imbalanced; the weight of each class was set to the inverse of its proportion within the corresponding training dataset (see the sketch after this list).
The number of training epochs was set to 100, with early stopping after 10 epochs of patience on the average accuracy over the validation set. This metric was chosen because it is invariant to class imbalance.
Data Augmentation methods were used: rotations, scaling, occlusion, and noise.
A dropout rate of 0.7 was used to reduce overfitting and was applied to the last convolution layer before the classification layers.
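As an illustration of the class-weighted loss, the following PyTorch sketch computes the weights as the inverse of each class's proportion; the class counts and the number of points per cloud are hypothetical, while the batch size of 16 matches the setup above.

```python
import torch
import torch.nn as nn

# Hypothetical class counts for (non-dust, dust) in a training set.
counts = torch.tensor([880_000.0, 95_000.0])
weights = counts.sum() / counts              # inverse-proportion weights
criterion = nn.CrossEntropyLoss(weight=weights)

# Per-point logits for a batch of 16 clouds with 4096 points each.
logits = torch.randn(16, 2, 4096)
labels = torch.randint(0, 2, (16, 4096))
loss = criterion(logits, labels)
```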
We used two methods as baselines: LIDROR, the traditional method with the best reported dust-filtering results, and the CNN-based method of [6], which reports better filtering results than LIDROR. For the CNN-based method, we used the hyperparameters of the original paper and learning rate values of 0.01, 0.005, and 0.008, selecting the learning rate that produced the best results. For LIDROR, we determined the hyperparameters using a grid search over a predefined range for each parameter. Table 4 shows the parameters obtained. The angular-resolution parameter depends on the LIDAR model and was set to 0.006134.
The following metrics were employed to assess the quality of the various methods employed for dust classification: accuracy, precision, recall, and F1-score. Accuracy is the proportion of all classifications that are correct, precision is the proportion of dust classifications that are actually dust, recall is the proportion of actual dust samples that are correctly classified as dust, and F1-score is the harmonic mean of precision and recall.
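In terms of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN), taking dust as the positive class, these metrics can be written as:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \mathrm{Precision} = \frac{TP}{TP + FP},$$
$$\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad F_1 = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}.$$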
4.3. Results in Real Environments with Static Sensors
The results of applying different dust-filtering methods to the Interior 1, Interior 2, Exterior 1, and Exterior 2 datasets are shown in
Table 5,
Table 6,
Table 7 and
Table 8, respectively. In each instance, the methods are trained using exclusively data from the corresponding datasets.
The results obtained allow for the following conclusions to be drawn. All variants of the proposed method demonstrate superior performance compared to the LIDROR and CNN methods, with the exception of the Interior 1 dataset. The CNN method consistently performs better than LIDROR, and on the Interior 1 dataset, it shows the best performance. On the Interior 2, Exterior 1, and Exterior 2 datasets, one of the variants of the proposed method achieves the best performance. On the Exterior 1 dataset, both baselines have very low precision: 0.03 for LIDROR and 0.09 for CNN. This means that almost all the points they label as dust are false positives, so they are unable to filter dust reliably in this case.
The most challenging datasets are Interior 1 and Exterior 2, on which all methods have low precision. Given that false positives can be costly in real deployments, these datasets represent an ideal benchmark for further experimentation. Furthermore, low F1-scores are expected on these kinds of datasets and are driven by the large imbalance between classes.
Finally, a comparison of the proposed variants reveals that combining temporal-difference and intensity features is generally superior to using either temporal differences or intensity features alone.
4.4. Results in Real Environments with Moving Sensors
The results of applying different dust-filtering methods to the Carén dataset, in which the LIDAR sensor is mounted on a mobile robot, are shown in Table 9 and Table 10. Table 9 shows the results without odometry correction, and Table 10 shows the results with odometry correction.
The results obtained allow for the following conclusions to be drawn. All variants of the proposed method outperform both baselines. In fact, LIDROR achieves very low precision with this dataset. The CNN-based method also shows lower precision values than most variants of the proposed method.
Secondly, as in the preceding subsection, the use of temporal features enhances dust filtering. However, aligning and analyzing the geometric characteristics of LIDAR points belonging to consecutive frames is challenging with moving sensors, because the points lie in different reference systems. In static scenes, temporal variations arise predominantly from dust motion; therefore, magnitude-based temporal descriptors combined with intensity (e.g., SITdm) are particularly effective. When odometry is not applied, the ego-motion of the platform makes the entire environment appear to move, introducing apparent motion in all points; under these conditions, purely geometric temporal cues become less reliable, and intensity-driven temporal interpolation (e.g., STi/SITi) becomes more informative. Once odometry correction is applied, consecutive frames are geometrically aligned, compensating for global motion and revealing the true local dynamics of dust; in this case, directional temporal descriptors combined with intensity (e.g., SITdv) better capture the irregular, locally dispersed motion of dust compared to the coherent behavior of static surfaces. Therefore, we hypothesize that the application of odometry correction together with temporal features allows for a better characterization and filtering of dust points. This is evident in Table 10, where the results of the proposed variants improve when odometry correction is used. The only two cases in which the accuracy decreases are those corresponding to the STi and STdv variants. The SITdv variant yielded the best results, with a precision of 0.7, a recall of 0.97, and an F1-score of 0.82.
Figure 2 shows three examples of the dust-filtering method in action. In all three cases, the method removes most of the observed dust.
4.5. Measuring the Generalization Capabilities of the Method
To evaluate the generalization capabilities of the variants of the proposed method and the CNN method, the methods were trained and validated using the Interior 1, Interior 2, Exterior 1, and Exterior 2 datasets, and then tested using the Carén dataset. The results of these experiments are shown in Table 11 and Table 12, which present the results without and with odometry correction, respectively.
Comparing the results of the proposed methods trained and validated on the Carén dataset (Table 9 and Table 10) with those obtained when training in one environment and testing in another (Table 11 and Table 12) reveals only a slight advantage for in-domain training. This finding indicates that the model is learning underlying dust features rather than environment-specific features, such as the relative positions of the objects in the scene, and suggests that the model generalizes correctly.
In the case of the CNN-based method, we observe a slightly higher drop in precision compared to the proposed methods. In all cases, the proposed methods achieve higher precision, recall, and F1-score.
Another conclusion is that the benefits of odometry correction are lost in this experimental setup; in most cases, no improvement in results is observed. One possible reason is that the training and validation sets were captured with a static sensor, so the models never encountered odometry-corrected temporal features during training.
4.6. Performance Comparison Between PointNet++ and Reduced-PointNet++
As a final experiment, the original PointNet++ architecture was compared with the reduced-PointNet++ architecture proposed here, for all of the method variants presented in this work.
All experiments were carried out on a platform with an NVIDIA GTX TITAN GPU with 12 GB of VRAM (NVIDIA, Santa Clara, CA, USA), a 6-core (12-thread) Intel® Core™ i7-8700K processor (Intel, Santa Clara, CA, USA) running at 3.70 GHz, and 16 GB of RAM.
Table 13 shows a comparison of the execution times of both architectures. It can be seen that the reduced-PointNet++ architecture outperforms the original PointNet++ architecture in terms of speed. The execution times are, on average, 50% shorter.
It must be noted that this reduction in execution time does not sacrifice performance. Table 14 shows the results of using the original PointNet++ architecture when the methods are trained on the Interior 1, Interior 2, Exterior 1, and Exterior 2 datasets but tested on the Carén dataset, without using odometry information. These results are directly comparable to those in Table 11. Comparing both tables reveals that the results are similar: in some cases, the original PointNet++ obtains slightly better results, while in others, the proposed reduced-PointNet++ does.
These results show that reducing the network size is a valid approach for the dust detection problem; moreover, since the reduction in parameters did not affect performance, there may still be room for further pruning.
5. Discussion
5.1. Analysis of the Results
This work presents a real-time dust filtering method for LIDAR point clouds based on a reduced-PointNet++ architecture. From the results obtained, it is clear that the proposed approach outperforms the baseline heuristic method and the baseline machine learning method. Furthermore, the experiments demonstrated that temporal information is useful in distinguishing dust from static objects, and the integration of odometry further improved the performance in mobile applications.
This is a promising approach for robotic applications in challenging dust-filled environments, since it allows dust to be filtered from LIDAR measurements in order to perform basic navigation tasks such as SLAM, obstacle avoidance, and emergency stops. Furthermore, the proposed reduction in network size, which reduces computational requirements and inference times, allows the network to be deployed in embedded systems such as those found on robotic platforms, which are usually constrained in size and, in dusty environments, often limited to passive cooling.
The approach was tested and developed using the UCHILE-Dust dataset, which is made publicly available to support the development and benchmarking of future methods. The dataset includes indoor, outdoor, and mobile robot recordings, covering a wide range of scenarios with varying dust densities, allowing for a wide range of possible applications. Our experiments showed that models trained on UCHILE-Dust generalized well from static environments to mobile robot deployments. This suggests that models trained on this dataset learn dust-specific features rather than environment-specific features such as the relative positions of the objects in the scene (layout).
5.2. Mining Applications
2D and 3D LIDAR sensors are used in mining and mineral processing to provide detailed three-dimensional measurements of the environment. Given that their use can be hindered by the presence of dust generated in different mining processes (blasting, crushing, secondary reduction, material transport by earthmoving machines and trucks, etc.), it seems natural to use dust-filtering methods to remove dust measurements from LIDAR observations.
For example, dust filtering can be used to increase the robustness of obstacle detection and autonomous navigation systems for haul trucks in open-pit mines. It can also be used to filter transient dust clouds after blasting, to increase the accuracy of stockpile volume calculations, to improve 3D measurements and rock detection in impact-hammer automation, and to remove noisy measurements that can produce false deformation alerts in pit-wall monitoring, among other applications.
In our case, we are applying the proposed method to our autonomous navigation and loading system for LHD (load–haul–dump) vehicles [15] and to our autonomous impact-hammer operating system [16]. We expect field data from these deployments to be available in the near future to assess and report on the benefits of the dust-filtering method in these two applications.
6. Conclusions
This article proposes a dust-filtering method based on a neural network that can filter dust measurements in real time from point clouds obtained using LIDARs. The method was validated using real data, yielding state-of-the-art results. In addition, a public database was created using LIDAR sensor data from various dusty environments. This database is expected to play a crucial role in the development and validation of dust-filtering methods.
Future work will include measuring the advantages of using this type of algorithm in mining applications with LIDAR data, including obstacle detection for mining vehicles in open pits, autonomous navigation in mining tunnels, autonomous operation of impact hammers, and the creation of 3D models of stockpiles, among others.
Furthermore, it is worthwhile to study how novel hyperspectral LIDARs can be applied in mining applications.