Article

Feature Selection Based on Height Mutual Information in Airborne LiDAR Filtering

1
School of Resources Environment Science and Technology, Hubei University of Science and Technology, Xianning 437100, China
2
GNSS Research Center, Wuhan University, Wuhan 430079, China
3
Research Center of Beidou + Industrial Development of Key Research Institute of Humanities and Social Sciences of Hubei Province, Xianning 437100, China
4
Institute of Geospatial Information, Information Engineering University, Zhengzhou 450001, China
5
Surveying and Mapping Engineering, Henan Technical College of Construction, Zhengzhou 450064, China
6
Institute of Telecommunication and Navigation Satellites, China Academy of Space Technology, Beijing 100094, China
*
Author to whom correspondence should be addressed.
Remote Sens. 2026, 18(7), 1031; https://doi.org/10.3390/rs18071031
Submission received: 12 February 2026 / Revised: 26 March 2026 / Accepted: 27 March 2026 / Published: 30 March 2026

Highlights

What are the main findings?
  • To address the essential role of height data in airborne LiDAR filtering, a novel feature selection method based on height mutual information is proposed.
  • The proposed feature selection method demonstrates enhanced effectiveness and reliability for filtering, as evidenced by a statistically significant improvement in the average kappa coefficient.
What are the implications of the main findings?
  • Height-related features serve as pivotal discriminative factors in filtering airborne LiDAR data, playing a central role in separating ground points from non-ground points and significantly enhancing the accuracy of point cloud classification.
  • The proposed feature selection method effectively identifies and eliminates contextually redundant geometric features, thereby enhancing filtering efficiency and improving the discriminative power of the final feature set for more accurate ground point classification.

Abstract

Filtering constitutes a critical step in the post-processing of airborne Light Detection And Ranging (LiDAR) data. Over the past decade, machine learning has emerged as a prominent methodological paradigm across numerous disciplines, attracting significant research interest in its application to LiDAR filtering. From a machine learning perspective, filtering is essentially a binary classification task that aims to discriminate between ground and non-ground points. However, the limited information inherent in point clouds often leads to the generation of highly correlated features, particularly those derived from height data, which can compromise filtering accuracy. To address this issue, feature selection becomes imperative. In this study, we employed height-based mutual information as a criterion to identify and eliminate less discriminative features for filtering. The AdaBoost (Adaptive Boosting) algorithm was adopted as the classifier for point cloud filtering. For each point, nineteen features were derived from the raw LiDAR point cloud based on height and other geometric attributes within a defined neighborhood. The performance of the proposed feature selection approach was evaluated using benchmark datasets provided by the International Society for Photogrammetry and Remote Sensing (ISPRS). Experimental results demonstrate that the method is effective and reliable. After removing three selected features, the average kappa coefficient improved and the average Type I and total errors decreased, although a slight increase in Type II error (0.15%) was observed.

1. Introduction

Airborne LiDAR is a new active remote sensing technology consisting of a laser scanner, a GPS (Global Positioning System) and an INS (Inertial Navigation System), which can generate a DSM (Digital Surface Model) quickly and efficiently [1]. Given the excellent data acquisition capability of airborne LiDAR hardware, algorithms and software for data post-processing are urgently needed. Thus, many commercial companies and research institutions have developed and released LiDAR data processing software, such as the TerraSolid series based on MicroStation and LiDAR_Suite developed by the Chinese TianQing Technology Company. In nearly all airborne LiDAR application software, ground point extraction, usually termed filtering, is a critical step and a necessary function.
In general, airborne LiDAR point clouds primarily comprise three types of points: ground points, non-ground points, and noise points [2]. Conventionally, ground points refer to those representing the bare-earth surface, typically characterized by lower elevations within a local area. Non-ground points encompass returns from vegetation, buildings, vehicles, power lines, and other above-ground objects. Noise points, often resulting from system artifacts or moving entities such as birds or aircraft, are usually defined as those with elevations significantly deviating from the local average. Therefore, filtering in airborne LiDAR data processing is fundamentally a procedure for extracting ground points from the raw point cloud [3]. Over the past two decades, numerous filtering algorithms have been developed. Traditional algorithms can be categorized according to different criteria. For instance, based on directional scanning modes, Meng [4] grouped them into six classes: segmentation/clustering, morphological, directional scanning, contour-based, TIN (Triangular Irregular Network), and interpolation-based methods. From the perspective of ground definitions, Shan and Sampath [5] divided algorithms into labeling and adjustment categories. According to different reference strategies, Sithole and Vosselman [6] classified filtering approaches into four groups: slope-based, block-minimum, surface-based, and clustering/segmentation. However, with the advancement of machine learning classifiers, many contemporary filtering algorithms no longer fit neatly into these traditional taxonomic frameworks.
Most traditional filtering algorithms require several parameters to be set manually and empirically. Taking the classical PTD (progressive TIN densification) filtering method as an example, six parameters must be determined for each test dataset: the distance between the center point and the fitted plane, the angle from the line connecting the current point and the triangle point to the triangle, the maximum building size, the terrain slope, and the maximum and minimum side lengths of the triangle. These parameters often need to be adjusted according to local terrain relief and may even require reconfiguration for adjacent areas, making full automation difficult to achieve. In contrast, intelligent classifiers such as AdaBoost (Adaptive Boosting) and SVM (Support Vector Machine) have been introduced into airborne LiDAR data processing. Their general workflow involves (1) training a classifier using sample data with different features, and (2) applying the trained classifier to filter unlabeled point clouds. In airborne LiDAR filtering, apart from the original elevation and echo attributes, most descriptive features are derived from geometric characteristics, such as planimetric coordinates. Through continuous exploration, the number of features used has increased, ranging from five [7] to thirty-eight [8] in reported studies. While a larger feature set can improve classifier precision, it also introduces redundancy due to geometric correlations among features. Such redundancy can in turn degrade both classification accuracy and computational efficiency. Therefore, an appropriate evaluation criterion is needed to select an optimal feature subset from the original feature space, thereby eliminating redundancy. This process, known as feature selection, aims to maintain or improve filtering accuracy by retaining the most informative features while removing redundant ones [9].
It should be noted that the effectiveness of handcrafted features for LiDAR filtering and classification persists, despite the prevailing trend in deep learning toward automated feature extraction [10].
During the last decades, many evaluation criteria have been proposed, which can be categorized into five types: distance measures, consistency measures, dependency measures, information measures, and probability of error measures [11,12]. According to different evaluation criteria, feature selection algorithms can be categorized into three schemes: filter, wrapper and embedded schemes [13,14].
A filter scheme finds the optimal feature subset by ranking the importance of the original features. Yin [15] proposed a correlation-coefficient-based method for high-dimensional imbalanced data and showed that the samples in rare classes are essential to the learning performance on those classes. Herman [16] proposed a mutual-information-based method to select a less redundant and more informative set of features. Zhang [17] proposed an entropy-based method that handles categorical, numerical, and heterogeneous data in order to alleviate the curse of dimensionality, enhance learning performance, and provide better readability and interpretability. Bouchene [18] proposed a normalized mean difference (NMD) filter-based feature selection method designed to address limitations of traditional techniques.
A wrapper scheme, proposed by Kohavi and John [19] in 1997, treats a classifier as a black box and finds an optimal subset with a search algorithm. De Stefano [20] proposed a genetic algorithm in which feature subsets are evaluated by means of a specifically devised separability index for handwriting recognition. Mafarja [21] proposed a whale optimization algorithm (WOA)-based method to eliminate irrelevant and redundant features and enhance the classification accuracy. Hassan [14] proposed a novel Markov blanket-based wrapper feature selection method that works out of the box for both classification and regression tasks on mixed data. Chaudhuri [22] proposed the concept of search space division (SSD), which leads to smaller search spaces and hence reduced computational cost for high-dimensional data.
An embedded scheme, such as a decision tree or an artificial neural network, embeds feature selection in the classifier training process, without separating samples into training and test sets, so that features are retained or eliminated directly. Jiménez-Navarro [23] embedded a feature selection process in a general-purpose methodology and proposed a novel general-purpose layer for neural networks to remove the influence of irrelevant features. Wang [24] proposed an unsupervised feature selection algorithm, EUFS (Embedded Unsupervised Feature Selection), which directly embeds feature selection into a clustering algorithm via sparse learning without the transformation. The alternating direction method of multipliers is used to address the optimization problem of EUFS.
Nowadays, many researchers focus on building hybrid models that combine these schemes for feature selection. Shiri [25] proposed a hybrid filter–wrapper feature selection method based on Equilibrium Optimization (EO) and Simulated Annealing (SA), in which SA is used to escape local optima so that EO can select the best feature subset more accurately. Hu [26] proposed a filter–wrapper model that obtains a feature subset from high-dimensional data in a short time, improving effectiveness and efficiency by avoiding wasted searches on low-ranked features. Li [27] proposed a novel feature extraction and selection scheme for hybrid fault diagnosis of gearboxes based on the S transform, non-negative matrix factorization, mutual information and multi-objective evolutionary algorithms.
Within the field of LiDAR data processing, research specifically focused on feature selection is relatively scarce. This scarcity may be attributed to the fact that, compared to optical imagery, fewer features can be directly derived from airborne LiDAR data itself. Consequently, scholarly efforts have been more concentrated on studying the effectiveness of feature extraction rather than on selection. Xu [28] proposes a LiDAR SLAM (Simultaneous Localization and Mapping) system that addresses localization and mapping issues by grouping consistent and stable geometry features to better express the environmental properties in both odometry and loop closure detection. Cai [8] proposes a feature selection method using mRMR (minimal-redundancy–maximal-relevance) with Parzen window optimization for airborne LiDAR filtering, aiming at finding out the optimal and appropriate feature subsets for different scenes.
In this paper, we first generated nineteen candidate features based on the geometric information of the point cloud neighborhood and used AdaBoost as the filtering classifier. Second, because of the high height precision of airborne LiDAR data, we created a grid-based DEM (Digital Elevation Model) from the filtered LiDAR points through interpolation. The standard (reference) DEM and the filtered DEM were resampled to the same resolution for spatial alignment. Then, for the same region, we calculated the height mutual information between the standard DEM and the DEM generated using each feature in turn, and eliminated the features with the lowest mutual information values. Experimental results show that our method is effective and reliable.

2. Materials and Methods

2.1. Features Generation

From the perspective of data attributes, a single point in a point cloud lacks a meaningful physical interpretation. In contrast, a cluster of points better represents the underlying physical and geometric properties. Therefore, converting the individual points into clusters or patches of point sets facilitates a more effective description and classification of the point cloud itself. Due to the inherently unordered and randomly distributed nature of LiDAR point clouds, it is necessary to construct local neighborhoods to characterize the spatial and geometric relationships between points. The commonly used neighborhoods, including (a) a cylinder-based neighborhood and (b) a sphere-based neighborhood, are listed in Figure 1. As shown in Figure 1, the red point represents the center point, and the black points are its neighbor points. In general, a cylinder-based neighborhood is defined to include all points whose horizontal Euclidean distance (i.e., within the XY-plane) to the center point is less than a specified radius r. A sphere-based neighborhood is defined as the set of all points whose three-dimensional distance to the center point is less than a specified radius r. The grid-based neighborhood construction operates in two steps: (1) separate the original points into a grid of square cells with side length r, and (2) select the neighbor points in the corresponding grids as required.
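The two neighborhood definitions above can be sketched in a few lines of Python. This is only an illustration: the function names, the toy coordinates and the brute-force search are assumptions made here, and a practical implementation would normally use a spatial index rather than scanning every point.

```python
import math

def cylinder_neighbors(points, center, r):
    """Indices of points whose horizontal (XY-plane) distance to center is < r."""
    cx, cy, _ = center
    return [i for i, (x, y, _z) in enumerate(points)
            if math.hypot(x - cx, y - cy) < r]

def sphere_neighbors(points, center, r):
    """Indices of points whose three-dimensional distance to center is < r."""
    return [i for i, p in enumerate(points) if math.dist(p, center) < r]

# Toy points (x, y, z) in metres; values are illustrative only.
points = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.2), (0.3, 0.4, 5.0), (2.0, 2.0, 0.1)]
center = points[0]

cyl = cylinder_neighbors(points, center, 1.0)  # ignores height differences
sph = sphere_neighbors(points, center, 1.0)    # height differences count
```

In this toy case the third point lies directly above the center, so it falls inside the cylinder but outside the sphere. Note that the center point itself is returned by both functions in this sketch.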
Based on geometrical information of airborne LiDAR data, nineteen features to be selected are generated as follows (the center point p is given).
Grid-Based Feature: Step-off count (feature 1) was first proposed in 2015; details can be found in [29]. This grid-based feature counts the number of directions with a significant height discontinuity. According to our experimental results, this feature value is eight for almost half of the non-ground points, while most ground points have a value of one or two.
Density-Based Feature: Point density (feature 2) and point density ratio (feature 3). The details can be found in [30]. Note that point density is in the cylinder-based neighborhood, and point density ratio is computed on the basis of the cylindrical and spherical neighborhoods.
Profile-Based Features: These features are based on a cylinder neighborhood. The profile of a center point and its neighborhood is divided into several bins of fixed height h (0.75 m in this paper, according to the point density), as shown in Figure 1a, where the length of the bins is twice the radius of the neighborhood. Profile-based features represent the height distribution of these adjacent points. If a height discontinuity appears in the profile, it indicates that the neighborhood of the center point may be located at the edge of a ground object. The details can be found in [8]. Accordingly, nine features are generated: the count of non-empty bins (feature 4), the count of continuous non-empty bins (feature 5), the count of continuous empty bins (feature 6), the biggest height deviation (feature 11), the signed biggest height deviation (feature 12), the positive biggest height deviation (feature 13), the negative biggest height deviation (feature 14), the max point number deviation (feature 15), and the count of height classification (feature 16).
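As a rough illustration of the bin-based profile features, the sketch below slices the heights of a neighborhood into bins of height h and derives three counts. The exact definitions are given in [8] and are not reproduced in the text, so two choices here are assumptions: the first bin starts at the lowest point of the neighborhood, and "continuous" bins are interpreted as the longest run of adjacent bins. The function name is hypothetical.

```python
def profile_bin_features(heights, h=0.75):
    """Return (non-empty bins, longest non-empty run, longest empty run)."""
    z0 = min(heights)                              # assumed origin of the first bin
    n_bins = int((max(heights) - z0) // h) + 1
    occupied = [False] * n_bins
    for z in heights:
        occupied[int((z - z0) // h)] = True        # mark the bin containing z

    def longest_run(flag):
        # Length of the longest run of adjacent bins whose state equals flag.
        best = run = 0
        for state in occupied:
            run = run + 1 if state == flag else 0
            best = max(best, run)
        return best

    return sum(occupied), longest_run(True), longest_run(False)

# Neighborhood straddling an object edge: points near 0 m and near 3 m.
edge = profile_bin_features([0.0, 0.1, 0.5, 3.2, 3.4])
# Neighborhood on flat ground: all heights fall into one bin.
flat = profile_bin_features([10.0, 10.1, 10.2])
```

Under these assumptions the edge neighborhood shows a gap of three empty bins, which is the kind of height discontinuity the paragraph above describes, while the flat neighborhood has none.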
Eigenvalue-Based Features: The eigenvalues are calculated from the covariance matrix between x, y and z of three-dimensional coordinate points in the neighborhood. There are four features, including anisotropy (feature 7), linearity (feature 8), planarity (feature 9) and sphericity (feature 10). Details can be found in [30].
Surface-Based Features: Calculated on the basis of the estimation surface fitted by center point p and its neighborhood, including plane slope (feature 17), surface roughness (feature 18) and distance to surface (feature 19) [30].
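A possible reading of the surface-based features is sketched below. The text does not specify the surface model or the formulas, so everything here is an assumption: the estimated surface is taken to be a least-squares plane z = ax + by + c fitted to the center point and its neighbors, slope is the angle between that plane and the horizontal, roughness is the root mean square of the vertical residuals, and distance to surface is the perpendicular distance from the center point to the plane. The definitions in [30] may differ, and the function names are hypothetical.

```python
import math

def det3(m):
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def fit_plane(points):
    """Least-squares plane z = a*x + b*y + c (normal equations, Cramer's rule)."""
    n = len(points)
    sx = sum(p[0] for p in points)
    sy = sum(p[1] for p in points)
    sxy = sum(p[0] * p[1] for p in points)
    normal = [[sum(p[0] ** 2 for p in points), sxy, sx],
              [sxy, sum(p[1] ** 2 for p in points), sy],
              [sx, sy, float(n)]]
    rhs = [sum(p[0] * p[2] for p in points),
           sum(p[1] * p[2] for p in points),
           sum(p[2] for p in points)]
    d = det3(normal)               # zero if the points are collinear in XY
    coeffs = []
    for k in range(3):
        m = [row[:] for row in normal]
        for i in range(3):
            m[i][k] = rhs[i]       # replace column k with the right-hand side
        coeffs.append(det3(m) / d)
    return coeffs

def surface_features(center, neighbors):
    pts = [center] + list(neighbors)
    a, b, c = fit_plane(pts)
    slope_deg = math.degrees(math.atan(math.hypot(a, b)))
    residuals = [a * x + b * y + c - z for x, y, z in pts]
    roughness = math.sqrt(sum(r * r for r in residuals) / len(pts))
    distance = abs(residuals[0]) / math.sqrt(a * a + b * b + 1.0)
    return slope_deg, roughness, distance

# Center point raised 1 m above four neighbors on a horizontal plane.
slope_deg, roughness, distance = surface_features(
    (0.0, 0.0, 1.0),
    [(1.0, 1.0, 0.0), (1.0, -1.0, 0.0), (-1.0, 1.0, 0.0), (-1.0, -1.0, 0.0)])
```

In this toy case the fitted plane stays horizontal but is pulled up to z = 0.2 by the raised center point, which therefore shows a large distance to the surface relative to its neighbors.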
The details of the above-generated features are listed in Table 1.

2.2. Classifier

As we aim at selecting redundant features for filtering and the feature selection method is filter-based, AdaBoost is adopted as the classification algorithm for its stability. The model is trained on a labeled sample set and subsequently applied to classify (filter) unlabeled LiDAR point clouds. AdaBoost, introduced by Freund and Schapire [31] in 1996, is an iterative ensemble learning method that operates by combining a series of simple, weakly performing classifiers (“weak learners”) into a single, highly accurate strong classifier. The final ensemble classifier H is constructed as a weighted linear combination of these weak learners. The core strength of AdaBoost lies in its ability to sequentially focus on misclassified examples from previous iterations, thereby producing a composite model that typically outperforms any individual weak learner and demonstrates robust generalization to new, unseen data [7]. This characteristic makes it particularly suitable for filtering tasks where distinguishing ground points from complex non-ground objects is challenging. The basic workflow of AdaBoost is shown in Table 2.
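To make the weighted combination of weak learners concrete, the following is a minimal sketch of the standard discrete AdaBoost procedure. It is not the authors' implementation. The weak learner (a single-feature threshold rule, often called a decision stump), the two toy features and the label convention (+1 for ground, −1 for non-ground) are all assumptions chosen for illustration.

```python
import math

def train_stump(X, y, w):
    """Best single-feature threshold rule under sample weights w."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                pred = [sign if row[j] > thr else -sign for row in X]
                err = sum(wi for wi, pi, yi in zip(w, pred, y) if pi != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def stump_predict(stump, row):
    _, j, thr, sign = stump
    return sign if row[j] > thr else -sign

def adaboost_fit(X, y, rounds=10):
    n = len(X)
    w = [1.0 / n] * n                              # start with uniform weights
    model = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = min(max(stump[0], 1e-12), 1.0 - 1e-12)   # avoid log(0)
        alpha = 0.5 * math.log((1.0 - err) / err)      # weight of this learner
        model.append((alpha, stump))
        # Increase the weight of misclassified samples, decrease the others.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, yi, row in zip(w, y, X)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def adaboost_predict(model, row):
    """Sign of the weighted vote of all weak learners."""
    score = sum(alpha * stump_predict(s, row) for alpha, s in model)
    return 1 if score >= 0 else -1

# Hypothetical features: [height above local minimum (m), roughness (m)].
X = [[0.1, 0.02], [0.2, 0.05], [0.0, 0.01], [3.5, 0.40], [6.0, 0.10], [2.8, 0.60]]
y = [1, 1, 1, -1, -1, -1]                          # +1 ground, -1 non-ground
model = adaboost_fit(X, y)
train_pred = [adaboost_predict(model, row) for row in X]
```

The toy data are separable by a single height threshold, so boosting converges immediately; on real point clouds the reweighting step is what lets later weak learners concentrate on the points that earlier ones misclassified.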

2.3. Mutual Information

2.3.1. Definition of Mutual Information

This paper adopts height mutual information (MI) between the filtered DEM using a certain feature and the standard DEM as an evaluation criterion. MI is a measure of the mutual dependence between the two variables, which quantifies the “amount of information” obtained about one random variable through the other random variable [32].
Entropy is used to represent the “amount of information” and is commonly referred to as the Shannon entropy. The information entropy of a discrete random variable X with probability distribution p(x) is defined as
$$H(X) = \sum_{x} p(x) \log \frac{1}{p(x)} = -\sum_{x} p(x) \log p(x) = -E[\log p(x)] \qquad (1)$$
As Equation (1) shows, entropy measures the expected uncertainty of X. In other words, H(X) approximately equals the “amount of information” obtained from the random variable X.
For joint distribution probabilities p ( x , y ) , joint entropy and conditional entropy are defined as follows:
Joint entropy is an uncertainty measure when two random variables X and Y occur simultaneously. It reflects the correlation between X and Y .
$$H(X, Y) = -\sum_{x, y} p(x, y) \log p(x, y) \qquad (2)$$
If Y is given, the conditional entropy of X is an uncertainty measure of X when Y occurs.
$$H(X \mid Y) = -\sum_{x, y} p(x, y) \log p(x \mid y) = -E[\log p(x \mid y)] \qquad (3)$$
MI reflects the information correlation between two random variables. Given two random variables X and Y, which are often multi-dimensional, with marginal probability distributions p(x) and p(y), respectively, their MI I(X;Y) is defined as
$$I(X; Y) = \int_{y} \int_{x} p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)} \, dx \, dy \qquad (4)$$
where p ( x , y ) is the joint probability distribution of random variables X and Y . It can be inferred from the definition that if X and Y are irrelevant and independent, their MI is zero. Similarly, the higher the correlation between X and Y is, the bigger their MI value is, and the more similar the information is. MI can also be written as follows:
$$I(X; Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X) = H(X) + H(Y) - H(X, Y) \qquad (5)$$
where H ( X ) and H ( Y ) represent the Shannon entropy of X and Y , respectively.
Equation (5) shows that the MI between two variables can be interpreted as the reduction in uncertainty of one variable X (or Y) when the other variable Y (or X) is given. Additionally, MI is symmetric, I(X;Y) = I(Y;X), and it is bounded above by the smaller of the two entropies, I(X;Y) ≤ min(H(X), H(Y)).
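The relations among entropy, joint entropy and MI can be checked numerically on a small discrete example. The sketch below uses the discrete (summation) form of Equation (4) and base-2 logarithms; the base, the function names and the toy joint distribution are assumptions made for illustration.

```python
import math

def entropy(probs):
    """Shannon entropy in bits, as in Equation (1)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """Discrete form of Equation (4) for a joint probability table."""
    px = [sum(row) for row in joint]                # marginal of X (rows)
    py = [sum(col) for col in zip(*joint)]          # marginal of Y (columns)
    return sum(p * math.log2(p / (px[i] * py[j]))
               for i, row in enumerate(joint)
               for j, p in enumerate(row) if p > 0)

# Toy joint distribution of two correlated binary variables.
joint = [[0.4, 0.1],
         [0.1, 0.4]]
h_x = entropy([sum(row) for row in joint])
h_y = entropy([sum(col) for col in zip(*joint)])
h_xy = entropy([p for row in joint for p in row])

mi_direct = mutual_information(joint)               # Equation (4)
mi_from_entropy = h_x + h_y - h_xy                  # Equation (5)
mi_independent = mutual_information([[0.25, 0.25], [0.25, 0.25]])
```

Both routes give the same value (about 0.278 bits here), the value does not exceed min(H(X), H(Y)) = 1 bit, and the independent table yields zero MI, as stated above.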

2.3.2. MI Calculation

In this paper, a height histogram is used to calculate the height MI between two DEMs. The histogram of the grid DEM is treated as a height histogram, which depicts the overall height distribution of the DEM. Firstly, the height probability distribution of each DEM is calculated. Assuming the histograms of DEM A and DEM B are h(a) and h(b), their height probability densities P_A(a) and P_B(b), which correspond to the entropy terms H(X) and H(Y) in Equation (5), are:
$$P_A(a) = \frac{h(a)}{\sum_{A} h(a)}, \qquad P_B(b) = \frac{h(b)}{\sum_{B} h(b)} \qquad (6)$$
Secondly, h(a,b) represents the joint height histogram of DEM A and DEM B, which counts the number of points with the same height value at the same position. The joint height probability density P_{A,B}(a,b), which corresponds to the joint entropy term H(X,Y) in Equation (5), is calculated from the joint histogram:
$$P_{A,B}(a, b) = \frac{h(a, b)}{\sum_{A} \sum_{B} h(a, b)} \qquad (7)$$
Thirdly, Equations (6) and (7) are substituted into Equation (5). The resulting Equation (8) is a transformation of Equation (5) and is used to calculate the height MI of two DEMs.
$$I(A; B) = P_A(a) + P_B(b) - P_{A,B}(a, b) \qquad (8)$$
This paper calculates the height MI value between the standard DEM and the DEM filtered using each feature, and eliminates the features that most often yield the minimum MI values.
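The histogram-based calculation can be sketched for two aligned DEM grids as follows. This is one plausible reading rather than the authors' code: the joint histogram is built by pairing co-located cells after binning their heights, and the terms of Equation (8) are interpreted as the entropies of the distributions in Equations (6) and (7), in line with Equation (5). The function name, the base-2 logarithm, the bin width and the toy heights are assumptions, and the grids are assumed to contain no missing cells.

```python
import math
from collections import Counter

def height_mi(dem_a, dem_b, bin_width):
    """Height MI of two aligned DEM grids (lists of rows) via histograms."""
    # Pair co-located cells and assign each height to a histogram bin.
    pairs = [(math.floor(a / bin_width), math.floor(b / bin_width))
             for row_a, row_b in zip(dem_a, dem_b)
             for a, b in zip(row_a, row_b)]
    n = len(pairs)
    h_a = Counter(a for a, _ in pairs)      # height histogram h(a)
    h_b = Counter(b for _, b in pairs)      # height histogram h(b)
    h_ab = Counter(pairs)                   # joint height histogram h(a, b)

    def hist_entropy(hist):
        # Entropy of the normalized histogram (Equations (6) and (7)).
        return -sum((c / n) * math.log2(c / n) for c in hist.values())

    return hist_entropy(h_a) + hist_entropy(h_b) - hist_entropy(h_ab)

# Toy 2 x 3 reference DEM (heights in metres) and two candidate DEMs.
reference = [[100.0, 100.4, 101.2],
             [102.7, 103.1, 104.9]]
flat = [[100.2, 100.2, 100.2],
        [100.2, 100.2, 100.2]]

mi_same = height_mi(reference, reference, 1.0)   # identical DEMs
mi_flat = height_mi(reference, flat, 1.0)        # constant DEM carries no information
```

A DEM compared with itself reaches the maximum possible value, its own entropy, whereas a constant DEM shares no height information with the reference and gives zero. A feature whose filtered DEM scores low against the reference is therefore a candidate for removal.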

2.3.3. Determination of Histogram Interval

Based on the methodological framework outlined above, this study quantifies the statistical dependency between the standard DEM and filtered DEM datasets by computing their MI values. We employ a simple and effective height histogram for this calculation. In the histogram-based computation process, both the probability density and the joint probability density are derived from the number of points sharing identical height values at corresponding spatial locations. Consequently, the determination of histogram interval width is critical, as it substantially influences the resulting MI calculation and its reliability.
To ensure reliable results from the histogram-based mutual information calculation, we selected four datasets from the experimental data and calculated the height mutual information with different interval lengths (50 m, 100 m, 150 m and 200 m), as shown in Figure 2.
The experimental results above indicate that as the calculation interval widens, the mutual information values increase, reflecting a greater relative change in mutual information value. This phenomenon can be attributed to the fact that a larger interval in the height histogram encompasses richer height information. The calculated values of height mutual information exhibit a consistent and predictable pattern. Although there is a significant difference in mutual information values across different height intervals, for different features within the same set of data, the feature with the smaller mutual information value remains unchanged regardless of the interval variation. It is noteworthy that the trend of height MI value changes remained consistent across different datasets despite the change in height intervals. Thus, the above experimental results provide the guideline for selecting features with lower height MI values in the same height interval.

3. Experimental Results

The test dataset was provided by the International Society for Photogrammetry and Remote Sensing (ISPRS) Working Group III/3 commission. It is located in the center of Vaihingen/Enz and the city of Stuttgart in southern Germany. The dataset was obtained by an Optech ALTM scanner and includes various feature objects (vegetation, buildings, roads, railways, rivers, bridges, power lines, water, etc.), and both the first and last returns are available. It is composed of 15 sample areas with an average point spacing of 1–1.5 m in urban areas (Samples s11 to s42) and 2–3.5 m in rural areas (Samples s51 to s71). For each sample, the reference data, used to generate the standard DEM, was obtained by semi-automatic filtering and manual editing with knowledge of the landscape and available aerial images. A detailed description of the dataset can be found in [6]. The experiment was implemented in MATLAB R2025a, and results were displayed in LiDAR_Suite v3.4, a commercial point cloud processing software developed by the authors.
Each feature described in Section 2.1, along with the training samples, was sequentially input into the AdaBoost classifier to construct nineteen distinct classification models. These models were subsequently applied to classify the unlabeled point clouds into ground and non-ground points. Based on the classification results, nineteen corresponding DEMs were generated (Figure 3b–d and Figure 4b–d) and compared against a standard DEM (Figure 3a and Figure 4a) to calculate the height-based MI values. From the statistical analysis, features that most frequently corresponded to the lowest MI values across tests were identified as redundant and subsequently removed. To validate the effectiveness of the proposed feature selection method, kappa coefficients and three error metrics—Type I error, Type II error, and total error—were employed for evaluation.
$$\text{Type I} = \frac{b}{a + b}, \qquad \text{Type II} = \frac{c}{c + d}, \qquad \text{Total} = \frac{b + c}{e} \qquad (9)$$
where a represents the number of ground points that are correctly classified as ground points, b represents the number of ground points that are incorrectly classified as non-ground points, c represents the number of non-ground points that are incorrectly classified as ground points, d represents the number of non-ground points that are correctly classified as non-ground points, and e represents the total number of points.
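The three error metrics follow directly from the four counts a, b, c and d. The sketch below also computes the kappa coefficient; its formula is not given in the text, so the standard Cohen's kappa for a two-class confusion matrix is assumed. The function name and the counts are illustrative and are not results from the paper.

```python
def filtering_metrics(a, b, c, d):
    """Type I, Type II, total error and Cohen's kappa from confusion counts.

    a: ground classified as ground        b: ground classified as non-ground
    c: non-ground classified as ground    d: non-ground classified as non-ground
    """
    e = a + b + c + d                       # total number of points
    type_1 = b / (a + b)                    # rejected ground points
    type_2 = c / (c + d)                    # accepted non-ground points
    total = (b + c) / e
    p_observed = (a + d) / e
    # Agreement expected by chance from the row and column totals.
    p_expected = ((a + b) * (a + c) + (c + d) * (b + d)) / (e * e)
    kappa = (p_observed - p_expected) / (1.0 - p_expected)
    return type_1, type_2, total, kappa

# Illustrative counts only.
type_1, type_2, total, kappa = filtering_metrics(a=90, b=10, c=5, d=95)
```

For these counts the Type I error is 10%, the Type II error is 5%, the total error is 7.5% and kappa is 0.85.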
To eliminate redundant features, we first ranked the mutual information values of the 19 features for each dataset. Secondly, we marked the three or five features with the smallest height MI values in each sample dataset. Finally, across all fifteen sample datasets, we counted how often each feature was among the three or five features with the smallest height MI values. The statistical results are shown in Table 3 and Figure 5 and Figure 6. As seen in Table 3, feature 7 performs worst, most frequently yielding the lowest MI values, followed by feature 10 and feature 6, which are presented in bold.
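The three-step counting procedure above can be sketched as follows. The MI values, sample names and feature labels in this example are invented purely to show the mechanics and do not reproduce Table 3; the function name is hypothetical.

```python
from collections import Counter

def lowest_mi_frequency(mi_by_sample, k):
    """Count how often each feature is among the k lowest MI values per sample."""
    counts = Counter()
    for mi_values in mi_by_sample.values():
        ranked = sorted(mi_values, key=mi_values.get)   # ascending MI
        counts.update(ranked[:k])                        # mark the k smallest
    return counts

# Made-up MI values for four placeholder features on three placeholder samples.
mi_by_sample = {
    "sample_a": {"A": 0.9, "B": 0.3, "C": 0.1, "D": 0.2},
    "sample_b": {"A": 0.8, "B": 0.5, "C": 0.2, "D": 0.3},
    "sample_c": {"A": 0.2, "B": 0.6, "C": 0.3, "D": 0.7},
}
counts = lowest_mi_frequency(mi_by_sample, k=2)
candidates = [name for name, _ in counts.most_common()]  # most frequent first
```

In this toy case feature C appears among the two lowest values in every sample, so it would be the first candidate for elimination, followed by D.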
As shown in Table 3 and Figure 5 and Figure 6, feature 7 (anisotropy) yields the poorest performance, followed by feature 10 (sphericity) and feature 6 (count of continuous empty bins), indicating that these three features possess lower discriminative power compared to others in airborne LiDAR filtering. The filtering accuracy after sequentially eliminating feature 7, features 7/10, and features 7/10/6 was evaluated against the baseline (using all 19 features) using kappa coefficients and three error metrics, as summarized in Figure 7. The optimal result for each sample and each error type obtained through feature selection is highlighted in bold in Table 4.
As Table 3 and Figure 7 show, the results of feature selection are reliable and acceptable. During the feature selection process, the filtering accuracy of each sample dataset using the remaining features fluctuates only slightly, especially for Sample s51, which verifies the efficiency of our proposed method. After eliminating features 7/10/6, three evaluation criteria (kappa coefficient, Type I error, total error) improve noticeably for most sample datasets, whereas Type II errors deteriorate to a certain extent. For the average values, the kappa coefficient, Type I error and total error all improve after eliminating the three features (7/10/6), whereas the Type II error increases by 0.15%. It is noteworthy that Type II errors represent non-ground points that are misclassified as ground points. In general, except for areas such as cliffs and steep walls, the terrain exhibits continuous elevation changes, and non-ground points are visually more noticeable, so eliminating misclassified non-ground points in post-processing is relatively easy. Moreover, Type II errors are typically reduced at the cost of an increase in Type I errors, and vice versa [33]. Overall, the variations in average accuracies are acceptable, ranging from −0.18% (Type I error) to 0.15% (Type II error). Table 5 depicts the statistical results of sample datasets for the changes in each evaluation criterion (kappa coefficient and three types of errors) during the feature selection process.
As listed in Table 3, the maximum values (maximums and averages) of variations (better and worse) in evaluation criteria during each feature selection process are displayed in boldface. After eliminating feature 7, the variations in the three types of errors range from −0.38% to 0.55%. After eliminating features 7/10, the variations range more widely, from −2.04% to 1.45%. Finally, after eliminating features 7/10/6, the variations range from −4.27% to 2.02%. Note that a negative value indicates a reduction in error, while a positive value indicates an increase in error. The statistical results show that most datasets perform better or remain unchanged after eliminating the selected features, which further verifies the redundancy of these features. According to the average variation of each evaluation criterion, all criteria except the Type II error improve after eliminating the three selected redundant features. It should be noted that related studies indicate that repairing Type II errors is considerably easier than repairing Type I errors [8,10]. The observed variations in each criterion are expected and acceptable, thereby confirming the robustness and reliability of our feature selection method.

4. Discussion

Filtering is a critical and foundational step in airborne LiDAR data processing. Its primary objective is to separate laser returns reflected from the earth’s terrain (ground points) from those reflected by above-ground objects such as vegetation, buildings, and infrastructure (non-ground points). From the pattern recognition point of view, filtering is fundamentally a binary classification. The filtering accuracy is essential for generating an accurate DEM, which serves as a cornerstone for numerous applications in topography, forestry, and urban planning. With the rapid advancement of machine learning techniques, researchers have increasingly explored and adapted a diverse array of related algorithms, including supervised, unsupervised, and deep learning methods, to automate filtering and improve its accuracy. In contrast to the classification of optical images, these algorithms learn to discriminate between classes based on generated features calculated for each point or its local neighborhood. In general, features derived from LiDAR data are based on geometric information, mainly height information. Consequently, many commonly used features, such as those based on elevation statistics, slopes, or curvatures within a neighborhood, are often highly correlated with each other. This feature redundancy can introduce multicollinearity, which may not only impede the interpretability of the model but also potentially destabilize the learning process and obscure the true discriminative signals, ultimately limiting classification performance.
To address this challenge and enhance filtering accuracy by mitigating the effects of feature redundancy, this paper proposes a novel feature selection methodology based on the analysis of height mutual information (MI). Mutual information serves as a robust measure of the general dependence between two variables, capable of capturing both linear and non-linear relationships. Our method evaluates the height MI values between the DEM filtered by a certain feature and the standard DEM, which can be considered as the shared information content between each candidate feature and a simplified representation of the local height distribution. Because the sample data remain unchanged for each dataset, the impact of sample quality on the correlation judgment can be excluded. By quantifying and analyzing these MI relationships, the method identifies and recommends the removal of features that contribute minimal unique information for the filtering task.
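To make the notion of height MI concrete, the following is a minimal sketch of a histogram-based MI estimate between two height grids. It assumes that both DEMs are co-registered arrays of the same shape and that heights are discretized with a fixed interval in metres. The function name, the `interval` parameter, and the plain histogram estimator are assumptions made for illustration and are not the authors' implementation.

```python
import numpy as np


def height_mutual_information(dem_a, dem_b, interval=1.0):
    """Histogram estimate of MI (in bits) between two co-registered height grids.

    Heights are binned with a fixed `interval` (metres). This is an
    illustrative estimator, not the one used in the paper.
    """
    a = np.asarray(dem_a, dtype=float).ravel()
    b = np.asarray(dem_b, dtype=float).ravel()
    valid = np.isfinite(a) & np.isfinite(b)      # skip cells without data
    a, b = a[valid], b[valid]

    lo = min(a.min(), b.min())
    hi = max(a.max(), b.max())
    n_bins = max(1, int(np.ceil((hi - lo) / interval)))
    edges = lo + interval * np.arange(n_bins + 1)
    edges[-1] = max(edges[-1], hi)               # guard against rounding

    joint, _, _ = np.histogram2d(a, b, bins=[edges, edges])
    p_ab = joint / joint.sum()                   # joint distribution
    p_a = p_ab.sum(axis=1, keepdims=True)        # marginal of dem_a
    p_b = p_ab.sum(axis=0, keepdims=True)        # marginal of dem_b
    nonzero = p_ab > 0
    return float(np.sum(p_ab[nonzero] * np.log2(p_ab[nonzero] / (p_a @ p_b)[nonzero])))


# Identical grids share all their information; unrelated grids share none.
same = height_mutual_information([0, 0, 10, 10], [0, 0, 10, 10], interval=5.0)
unrelated = height_mutual_information([0, 0, 10, 10], [0, 10, 0, 10], interval=5.0)
```

Under these assumptions, a feature whose filtered DEM yields a low MI value against the standard DEM shares little height information with it.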
Experimental validation on standard benchmark datasets demonstrates the efficacy of the proposed approach. The method identified and eliminated three redundant features from an initial set of nineteen commonly used features. Notably, after removing these features (7, 10, and 6 in our study), the overall filtering accuracy, as measured by average statistics such as the kappa coefficient and Type I errors, was either maintained or improved. An expected exception was observed in the behavior of Type II errors (where non-ground points are incorrectly classified as ground), whose average increased slightly. The variations in accuracy, which range from −4.27% (Type I error for Sample s41) to 2.02% (Type II error for Sample s41), are nonetheless acceptable and plausible. These results confirm that the proposed feature selection method based on height mutual information can streamline the feature set without compromising, and sometimes even enhancing, filtering robustness.

5. Conclusions

This paper proposes a novel feature selection methodology for airborne LiDAR filtering based on height mutual information. The statistical results from the ISPRS datasets show that there is redundancy among the generated features. After eliminating the three selected features (anisotropy, sphericity, and count of continuous empty bins), most filtering results remain unchanged or even improve. This indicates the effectiveness of the proposed feature selection method. In future studies, we plan to expand the feature set beyond geometric descriptors by integrating additional data attributes, including full-waveform parameters, intensity returns, and even spectral characteristics. While contemporary deep learning approaches, particularly Transformer-based models, are designed to automatically learn relevant representations, these hand-crafted features can still provide complementary information that improves classification performance. Consequently, our ongoing research will focus on combining deep learning techniques with systematic feature selection methods to further optimize point cloud classification accuracy.

Author Contributions

Conceptualization, X.X. and Z.C.; methodology, Z.C.; software, Z.C. and L.Z.; validation, X.X., Z.C. and L.Z.; formal analysis, Z.C.; investigation, X.X.; resources, Z.C.; data curation, Z.C.; writing—original draft preparation, Z.C.; writing—review and editing, L.Z. and X.X.; visualization, L.Z. and Z.C.; supervision, X.X.; project administration, Q.C., S.B. and Z.H.; funding acquisition, X.X. and Z.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (U25D8020, 42304032) and the PhD Research Start-up Foundation of Hubei University of Science and Technology, China (L07903/170428).

Data Availability Statement

The original data presented in the study are openly available from the International Society for Photogrammetry and Remote Sensing Working Group III-3 as a public filtering-test dataset at https://www.itc.nl/isprs/wgIII-3/filtertest/downloadsites/ (accessed on 18 March 2026).

Acknowledgments

The test datasets were provided by the International Society for Photogrammetry and Remote Sensing (ISPRS) Working Group and OpenTopography. The authors would like to thank them and the anonymous reviewers for their constructive comments on the manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Chen, R.; Wu, J.; Zhao, X.; Luo, Y.; Xu, G. SC-CNN: LiDAR point cloud filtering CNN under slope and copula correlation constraint. ISPRS-J. Photogramm. Remote Sens. 2024, 212, 381–395.
  2. Cao, D.; Wang, C.; Du, M.; Xi, X. A Multiscale Filtering Method for Airborne LiDAR Data Using Modified 3D Alpha Shape. Remote Sens. 2024, 16, 1443.
  3. Yan, W.Y.; Shaker, A.; El-Ashmawy, N. Urban land cover classification using airborne LiDAR data: A review. Remote Sens. Environ. 2015, 158, 295–310.
  4. Meng, X.; Wang, L.; Silván-Cárdenas, J.L.; Currit, N. A multi-directional ground filtering algorithm for airborne LIDAR. ISPRS-J. Photogramm. Remote Sens. 2009, 64, 117–124.
  5. Shan, J.; Sampath, A. Urban DEM generation from raw lidar data: A labeling algorithm and its performance. Photogramm. Eng. Remote Sens. 2005, 71, 217–226.
  6. Sithole, G.; Vosselman, G. Experimental comparison of filter algorithms for bare-Earth extraction from airborne laser scanning point clouds. ISPRS-J. Photogramm. Remote Sens. 2004, 59, 85–101.
  7. Lodha, S.K.; Fitzpatrick, D.M.; Helmbold, D.P. Aerial Lidar Data Classification using AdaBoost. In Proceedings of the Sixth International Conference on 3-D Digital Imaging and Modeling (3DIM 2007), Montreal, QC, Canada, 21–23 August 2007; pp. 435–442.
  8. Cai, Z.; Ma, H.; Zhang, L. Feature selection for airborne LiDAR data filtering: A mutual information method with Parzon window optimization. GIScience Remote Sens. 2020, 57, 323–337.
  9. Zhuo, S.D.; Qiu, J.J.; Wang, C.D.; Huang, S.Q. Online Feature Selection With Varying Feature Spaces. IEEE Trans. Knowl. Data Eng. 2024, 36, 4806–4819.
  10. Luo, W.; Ma, H.; Yuan, J.; Zhang, L.; Ma, H.; Cai, Z.; Zhou, W. High-Accuracy Filtering of Forest Scenes Based on Full-Waveform LiDAR Data and Hyperspectral Images. Remote Sens. 2023, 15, 3499.
  11. Huan, L.; Lei, Y. Toward integrating feature selection algorithms for classification and clustering. IEEE Trans. Knowl. Data Eng. 2005, 17, 491–502.
  12. Wu, Y.; Li, P.; Zou, Y. Partial multi-label feature selection with feature noise. Pattern Recognit. 2025, 162, 111310.
  13. Duan, Z.; Li, T.; Ling, Z.; Wu, X.; Yang, J.; Jia, Z. Fair streaming feature selection. Neurocomputing 2025, 624, 129394.
  14. Hassan, A.; Paik, J.H.; Khare, S.R.; Hassan, S.A. A wrapper feature selection approach using Markov blankets. Pattern Recognit. 2025, 158, 111069.
  15. Yin, L.; Ge, Y.; Xiao, K.; Wang, X.; Quan, X. Feature selection for high-dimensional imbalanced data. Neurocomputing 2013, 105, 3–11.
  16. Herman, G.; Zhang, B.; Wang, Y.; Ye, G.; Chen, F. Mutual information-based method for selecting informative feature sets. Pattern Recognit. 2013, 46, 3315–3327.
  17. Zhang, P.; Li, T.; Yuan, Z.; Luo, C.; Liu, K.; Yang, X. Heterogeneous Feature Selection Based on Neighborhood Combination Entropy. IEEE Trans. Neural Netw. Learn. Syst. 2024, 35, 3514–3527.
  18. Bouchene, M.M.; Fatima, M. Normalized mean difference (NMD): A novel filter-based feature selection method. Int. J. Mach. Learn. Cybern. 2025, 16, 6837–6855.
  19. Kohavi, R.; John, G.H. Wrappers for feature subset selection. Artif. Intell. 1997, 97, 273–324.
  20. De Stefano, C.; Fontanella, F.; Marrocco, C.; Scotto di Freca, A. A GA-based feature selection approach with an application to handwritten character recognition. Pattern Recognit. Lett. 2014, 35, 130–141.
  21. Mafarja, M.; Mirjalili, S. Whale optimization approaches for wrapper feature selection. Appl. Soft. Comput. 2018, 62, 441–453.
  22. Chaudhuri, A. Search space division method for wrapper feature selection on high-dimensional data classification. Knowl.-Based Syst. 2024, 291, 111578.
  23. Jiménez-Navarro, M.J.; Martínez-Ballesteros, M.; Brito, I.S.; Martínez-Álvarez, F.; Asencio-Cortés, G. Embedded feature selection for neural networks via learnable drop layer. Log. J. IGPL 2025, 33, jzae062.
  24. Wang, S.; Tang, J.; Liu, H. Embedded unsupervised feature selection. Proc. AAAI Conf. Artif. Intell. 2015, 29, 470–476.
  25. Shiri, M.A.; Omidi, M.; Mansouri, N. A New Hybrid Filter-Wrapper Feature Selection using Equilibrium Optimizer and Simulated Annealing. J. Mahani Math. Res. 2023, 13, 293–332.
  26. Hu, P.; Zhu, J. A filter-wrapper model for high-dimensional feature selection based on evolutionary computation. Appl. Intell. 2025, 55, 581.
  27. Li, B.; Zhang, P.-l.; Tian, H.; Mi, S.-s.; Liu, D.-s.; Ren, G.-q. A new feature extraction and selection scheme for hybrid fault diagnosis of gearbox. Expert Syst. Appl. 2011, 38, 10000–10009.
  28. Xu, M.; Lin, S.; Wang, J.; Chen, Z. A LiDAR SLAM System With Geometry Feature Group-Based Stable Feature Selection and Three-Stage Loop Closure Optimization. IEEE Trans. Instrum. Meas. 2023, 72, 8504810.
  29. Guo, B.; Huang, X.; Zhang, F.; Sohn, G. Classification of airborne laser scanning data using JointBoost. ISPRS-J. Photogramm. Remote Sens. 2015, 100, 71–83.
  30. Kim, H.B.; Sohn, G. 3D classification of power-line scene from airborne laser scanning data using random forests. In Proceedings of the International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences (IAPRS), Vienna, Austria, 5–7 July 2010; pp. 126–132.
  31. Freund, Y.; Schapire, R.E. Experiments with a new boosting algorithm. In Proceedings of the Thirteenth International Conference on Machine Learning, Bari, Italy, 3–6 July 1996; pp. 148–156.
  32. Pingel, T.J.; Clarke, K.C.; McBride, W.A. An improved simple morphological filter for the terrain classification of airborne LIDAR data. ISPRS-J. Photogramm. Remote Sens. 2013, 77, 21–30.
  33. Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423.
Figure 1. Three types of neighborhoods, including (a) cylinder-based neighborhood, (b) sphere-based neighborhood and (c) grid-based neighborhood, where the red point is the center point, and the black points are its neighbor points.
Figure 2. Changes in MI values with different height intervals (units: m) for (a) Sample 12, (b) Sample 31, (c) Sample 41 and (d) Sample 42.
Figure 3. DEM visualization for Sample 12 (urban area), with color indicating elevation (blue—low, red—high).
Figure 4. DEM visualization for Sample 53 (rural area), with color indicating elevation (blue—low, red—high).
Figure 5. Frequency with which each feature is among the three features with the minimum height MI values in the sample datasets; the ratios of features 7/10/6 are the largest.
Figure 6. Frequency with which each feature is among the five features with the minimum height MI values in each sample dataset; the ratios of features 7/10/12/9/6 are the largest.
Figure 7. Comparison of statistical results: kappa coefficients and three types of errors for AdaBoost using the original nineteen features and remaining features after feature selection (eliminating feature 7, features 7/10, features 7/10/6).
Table 1. The details of nineteen generated features.
| Feature Number | Feature Name | Description | Reference |
|---|---|---|---|
| 1 | step-off counts | The number of step-off directions. | [29] |
| 2 | point density | The number of points within a given neighborhood. | [30] |
| 3 | point density ratio | The ratio between the number of points in the sphere-based neighborhood and the number of points in the cylinder-based neighborhood. | [30] |
| 4 | count of non-empty bins | The number of non-empty bins. | [8] |
| 5 | count of continuous non-empty bins | The maximum number of continuous non-empty bins. | |
| 6 | count of continuous empty bins | The maximum number of continuous empty bins. | |
| 7 | anisotropy | This value is equal to (1 − sphericity). | [30] |
| 8 | linearity | This value is large if the points in the neighborhood are linear. | |
| 9 | planarity | This value is large if the points in the neighborhood are coplanar. | |
| 10 | sphericity | This value is large if the points in the neighborhood are discrete. | |
| 11 | the biggest height deviation | Absolute value of maximum height difference. | [8] |
| 12 | signed biggest height deviation | True value of maximum height difference. | |
| 13 | positive biggest height deviation | The maximum positive height difference between the center point and the points in its neighborhood that are higher than it. | |
| 14 | negative biggest height deviation | The maximum negative height difference between the center point and the points in its neighborhood that are lower than it. | |
| 15 | max point number deviation | This feature considers all bins except the lowest one. The maximum difference between the average point count and the point count in each bin is recorded. | |
| 16 | count of height classification | Through clustering analysis, the heights in the neighborhood can be roughly categorized into several classes. | |
| 17 | plane slope | The slope of the fitted plane. | [30] |
| 18 | surface roughness | The standard variance of the distance between the points and the fitted surface. | |
| 19 | distance to surface | The distance between the current point and the fitted surface. | |
Table 2. The workflow of AdaBoost method.
| Process | Description |
|---|---|
| Input | Training samples with labels $(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)$; component learning algorithm; number of cycles $T$. |
| Initiation | Weights of training samples are $w_i^1 = \frac{1}{N}$, for all $i = 1, \ldots, N$. |
| Loop | for $t = 1, \ldots, T$ |
| | (1) Use the component learning algorithm to train a component classifier, $h_t$, on the updated training samples. |
| | (2) Calculate the training error of $h_t$: $e_t = \sum_{i=1}^{N} w_i^t,\ y_i \neq h_t(x_i)$. |
| | (3) if $e_t > 0.5$, then break. |
| | (4) Calculate the weight for component classifier $h_t$: $\alpha_t = \frac{1}{2} \ln \frac{1 - e_t}{e_t}$. |
| | (5) Update the weights of training samples: $w_i^{t+1} = \frac{w_i^t \exp\left(-\alpha_t y_i h_t(x_i)\right)}{C_t}$, $i = 1, \ldots, N$, where $C_t$ is a normalization constant such that $\sum_{i=1}^{N} w_i^{t+1} = 1$. |
| | (6) end for |
| Output | $f(x) = \mathrm{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$ |
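The workflow in Table 2 can be sketched in a few lines of Python. This is a minimal illustration, assuming labels in {−1, +1} and a weighted decision stump as the component learning algorithm. The stump, all function names, and the small guard against a zero training error are assumptions added for the example and are not specified in the table.

```python
import numpy as np


def train_stump(X, y, w):
    """Hypothetical component learner: best weighted one-feature threshold rule."""
    best, best_err = None, np.inf
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for pol in (1, -1):
                pred = np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
                err = w[pred != y].sum()
                if err < best_err:
                    best, best_err = (j, thr, pol), err
    return best


def stump_predict(stump, X):
    j, thr, pol = stump
    return np.where(pol * (X[:, j] - thr) >= 0, 1, -1)


def adaboost_fit(X, y, T=10):
    N = len(y)
    w = np.full(N, 1.0 / N)                  # Initiation: w_i^1 = 1/N
    ensemble = []
    for _ in range(T):
        h = train_stump(X, y, w)             # step (1)
        pred = stump_predict(h, X)
        e = w[pred != y].sum()               # step (2): weighted training error
        if e > 0.5:                          # step (3)
            break
        e = max(e, 1e-12)                    # guard for e = 0 (not in Table 2)
        alpha = 0.5 * np.log((1 - e) / e)    # step (4)
        w = w * np.exp(-alpha * y * pred)    # step (5): reweight samples
        w /= w.sum()                         # division by C_t
        ensemble.append((alpha, h))
    return ensemble


def adaboost_predict(ensemble, X):
    score = sum(alpha * stump_predict(h, X) for alpha, h in ensemble)
    return np.sign(score)                    # Output: sign of the weighted vote


X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([-1, -1, 1, 1])
model = adaboost_fit(X, y, T=5)
labels = adaboost_predict(model, X)
```

Misclassified samples receive larger weights in step (5), so later component classifiers concentrate on the points that earlier ones got wrong.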
Table 3. The frequency with which each feature is among the three or five features with the smallest height MI values across all fifteen sample datasets.
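| Features | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Frequency (three) | 3 | 1 | 1 | 1 | 3 | 4 | 9 | 3 | 3 | 4 | 2 | 1 | 1 | 3 | 0 | 2 | 0 | 3 | 0 |
| Frequency (five) | 4 | 2 | 5 | 3 | 4 | 5 | 11 | 3 | 6 | 6 | 3 | 6 | 1 | 4 | 0 | 3 | 2 | 4 | 0 |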
The bold values highlight the performance of features 7/10/6.
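As a small illustration of how the "three" row of Table 3 can be read, the sketch below ranks the features by frequency and picks the three most frequent ones. The tie-breaking by feature number is an assumption made only for this example; the paper eliminates the features in the order 7/10/6.

```python
# Frequencies from the "three" row of Table 3, for features 1..19
freq_three = [3, 1, 1, 1, 3, 4, 9, 3, 3, 4, 2, 1, 1, 3, 0, 2, 0, 3, 0]

# Rank feature numbers by descending frequency; ties broken by feature number
# (hypothetical tie-break, chosen only to make the ordering deterministic).
ranked = sorted(range(1, 20), key=lambda f: (-freq_three[f - 1], f))
top_three = set(ranked[:3])
```

Features 6 and 10 are tied at a frequency of 4, so only the resulting set, not the order within it, follows from this row alone.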
Table 4. Kappa coefficients and three types of errors corresponding to AdaBoost using original features and remaining features after feature selection (eliminating features 7/10/6 successively).
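| Sample Dataset | Kappa / Type of Error (%) | Original Features | Eliminating Feature 7 | Eliminating Features 7/10 | Eliminating Features 7/10/6 |
|---|---|---|---|---|---|
| s11 | kappa | 0.5566 * | 0.5566 | 0.5566 | 0.5433 |
| | I | 11.03 | 11.03 | 11.03 | 10.93 |
| | II | 34.72 | 34.72 | 34.72 | 36.23 |
| | Total | 21.14 | 21.14 | 21.14 | 21.73 |
| s12 | kappa | 0.8313 | 0.8313 | 0.8248 | 0.8258 |
| | I | 1.69 | 1.69 | 1.72 | 1.55 |
| | II | 15.44 | 15.44 | 16.07 | 16.16 |
| | Total | 8.40 | 8.40 | 8.72 | 8.67 |
| s21 | kappa | 0.9563 | 0.9541 | 0.9553 | 0.9588 |
| | I | 0.41 | 0.34 | 0.34 | 0.37 |
| | II | 5.29 | 5.84 | 5.67 | 5.04 |
| | Total | 1.49 | 1.56 | 1.52 | 1.40 |
| s22 | kappa | 0.7402 | 0.7401 | 0.7395 | 0.7350 |
| | I | 8.03 | 8.04 | 9.12 | 9.40 |
| | II | 18.02 | 18.01 | 16.11 | 16.19 |
| | Total | 11.14 | 11.15 | 11.30 | 11.52 |
| s23 | kappa | 0.7168 | 0.7168 | 0.7164 | 0.7164 |
| | I | 13.50 | 13.50 | 13.73 | 13.73 |
| | II | 14.82 | 14.82 | 14.61 | 14.61 |
| | Total | 14.12 | 14.12 | 14.14 | 14.14 |
| s24 | kappa | 0.7622 | 0.7622 | 0.7678 | 0.7678 |
| | I | 3.31 | 3.31 | 3.94 | 3.94 |
| | II | 24.10 | 24.10 | 22.06 | 22.06 |
| | Total | 9.02 | 9.02 | 8.92 | 8.92 |
| s31 | kappa | 0.8650 | 0.8650 | 0.8653 | 0.8653 |
| | I | 7.83 | 7.83 | 7.25 | 7.25 |
| | II | 5.43 | 5.43 | 6.06 | 6.06 |
| | Total | 6.73 | 6.73 | 6.70 | 6.70 |
| s41 | kappa | 0.5637 | 0.5637 | 0.5663 | 0.5861 |
| | I | 29.24 | 29.24 | 27.53 | 24.97 |
| | II | 14.41 | 14.41 | 15.86 | 16.43 |
| | Total | 21.81 | 21.81 | 21.68 | 20.69 |
| s42 | kappa | 0.8458 | 0.8441 | 0.8403 | 0.8403 |
| | I | 16.79 | 16.80 | 17.61 | 17.61 |
| | II | 1.75 | 1.84 | 1.69 | 1.69 |
| | Total | 6.15 | 6.22 | 6.35 | 6.35 |
| s51 | kappa | 0.8986 | 0.8986 | 0.8986 | 0.8986 |
| | I | 0.72 | 0.72 | 0.72 | 0.72 |
| | II | 12.68 | 12.68 | 12.68 | 12.71 |
| | Total | 3.33 | 3.33 | 3.33 | 3.33 |
| s52 | kappa | 0.7330 | 0.7336 | 0.7396 | 0.7396 |
| | I | 2.18 | 2.24 | 2.01 | 2.01 |
| | II | 27.35 | 26.97 | 27.39 | 27.39 |
| | Total | 4.83 | 4.84 | 4.68 | 4.68 |
| s53 | kappa | 0.7666 | 0.7666 | 0.7638 | 0.7638 |
| | I | 0.56 | 0.56 | 0.56 | 0.56 |
| | II | 28.22 | 28.22 | 28.73 | 28.73 |
| | Total | 1.68 | 1.68 | 1.70 | 1.70 |
| s54 | kappa | 0.8770 | 0.8770 | 0.8773 | 0.8773 |
| | I | 6.95 | 6.95 | 6.58 | 6.58 |
| | II | 5.38 | 5.38 | 5.69 | 5.69 |
| | Total | 6.11 | 6.11 | 6.10 | 6.10 |
| s61 | kappa | 0.9148 | 0.9148 | 0.9124 | 0.9124 |
| | I | 0.28 | 0.28 | 0.26 | 0.26 |
| | II | 8.46 | 8.46 | 9.37 | 9.37 |
| | Total | 0.56 | 0.56 | 0.58 | 0.58 |
| s71 | kappa | 0.7979 | 0.7979 | 0.8000 | 0.8020 |
| | I | 0.49 | 0.49 | 0.38 | 0.40 |
| | II | 28.19 | 28.19 | 28.53 | 28.14 |
| | Total | 3.62 | 3.62 | 3.57 | 3.54 |
| Minimum | kappa | 0.5566 | 0.5566 | 0.5566 | 0.5433 |
| | I | 0.28 | 0.28 | 0.26 | 0.26 |
| | II | 1.75 | 1.84 | 1.69 | 1.69 |
| | Total | 0.56 | 0.56 | 0.58 | 0.58 |
| Maximum | kappa | 0.9563 | 0.9541 | 0.9553 | 0.9588 |
| | I | 29.24 | 29.24 | 27.53 | 24.97 |
| | II | 34.72 | 34.72 | 34.72 | 36.23 |
| | Total | 21.81 | 21.81 | 21.68 | 21.73 |
| Average | kappa | 0.7884 | 0.7882 | 0.7883 | 0.7888 |
| | I | 6.87 | 6.87 | 6.86 | 6.68 |
| | II | 16.28 | 16.30 | 16.35 | 16.43 |
| | Total | 8.01 | 8.02 | 8.03 | 8.00 |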
* The bold values are the optimal result for each sample and each error type obtained through feature selection.
Table 5. Statistical results of sample datasets for the changes in kappa coefficient and three types of errors during the feature selection process.
| Elimination Stage | Evaluation Criterion | Better | Unchanged | Worse | Maximum of Better (Sample) | Maximum of Worse (Sample) | Average of Better | Average of Worse |
|---|---|---|---|---|---|---|---|---|
| Feature 7 | Kappa | 1 | 11 | 3 | 0.06% (s52) | 0.22% (s21) | 0.06% | 0.13% |
| | Type I errors | 1 | 11 | 3 | 0.07% (s21) | 0.06% (s52) | 0.07% | 0.03% |
| | Type II errors | 2 | 11 | 2 | 0.38% (s52) | 0.55% (s21) | 0.20% | 0.32% |
| | Total errors | 0 | 11 | 4 | 0% | 0.07% (s21) | 0% | 0.04% |
| Features 7/10 | Kappa | 6 | 2 | 7 | 0.66% (s52) | 0.65% (s12) | 0.29% | 0.28% |
| | Type I errors | 7 | 3 | 5 | 1.71% (s41) | 1.09% (s22) | 0.43% | 0.56% |
| | Type II errors | 4 | 2 | 9 | 2.04% (s24) | 1.45% (s41) | 1.06% | 0.58% |
| | Total errors | 6 | 2 | 7 | 0.15% (s52) | 0.32% (s12) | 0.08% | 0.11% |
| Features 7/10/6 | Kappa | 7 | 1 | 7 | 2.24% (s41) | 1.33% (s11) | 0.60% | 0.50% |
| | Type I errors | 9 | 2 | 4 | 4.27% (s41) | 1.37% (s22) | 0.64% | 0.76% |
| | Type II errors | 6 | 0 | 9 | 2.04% (s24) | 2.02% (s41) | 0.74% | 0.74% |
| | Total errors | 7 | 1 | 7 | 1.12% (s41) | 0.59% (s11) | 0.23% | 0.21% |
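The entries of Table 5 follow directly from Table 4. As a worked example, the sketch below reproduces the kappa row for the first stage (eliminating feature 7) from the kappa values of the fifteen samples in Table 4. The function and variable names are illustrative only.

```python
samples = ["s11", "s12", "s21", "s22", "s23", "s24", "s31", "s41",
           "s42", "s51", "s52", "s53", "s54", "s61", "s71"]
# Kappa coefficients from Table 4
kappa_original = [0.5566, 0.8313, 0.9563, 0.7402, 0.7168, 0.7622, 0.8650, 0.5637,
                  0.8458, 0.8986, 0.7330, 0.7666, 0.8770, 0.9148, 0.7979]
kappa_without_7 = [0.5566, 0.8313, 0.9541, 0.7401, 0.7168, 0.7622, 0.8650, 0.5637,
                   0.8441, 0.8986, 0.7336, 0.7666, 0.8770, 0.9148, 0.7979]


def summarize_changes(names, before, after, higher_is_better=True):
    """Count better/unchanged/worse samples and find the largest changes."""
    sign = 1 if higher_is_better else -1
    # Round to the four decimals reported in Table 4 to avoid float noise
    gains = {n: round(sign * (a - b), 4) for n, b, a in zip(names, before, after)}
    better = {n: g for n, g in gains.items() if g > 0}
    worse = {n: -g for n, g in gains.items() if g < 0}
    return {
        "better": len(better),
        "unchanged": len(gains) - len(better) - len(worse),
        "worse": len(worse),
        "max_better": max(better.items(), key=lambda kv: kv[1]) if better else None,
        "max_worse": max(worse.items(), key=lambda kv: kv[1]) if worse else None,
    }


stage_1_kappa = summarize_changes(samples, kappa_original, kappa_without_7)
```

For the error criteria, where a lower value is better, the same function can be called with `higher_is_better=False`.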
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Cai, Z.; Zhao, L.; Chen, Q.; He, Z.; Bi, S.; Xu, X. Feature Selection Based on Height Mutual Information in Airborne LiDAR Filtering. Remote Sens. 2026, 18, 1031. https://doi.org/10.3390/rs18071031
