4.2. Ablation Study of the Normalized-Gradient-Entropy-Guided Dynamic Window
To verify the effectiveness of the normalized-gradient-entropy-guided adaptive sampling-window strategy, fixed-window ablation experiments were conducted. The comparison was performed only within the ERF fitting framework. Except for the sampling-window selection strategy, the Canny initialization, gradient-direction estimation, gray-level sampling, parameter initialization, and nonlinear fitting process were kept identical. This design isolates the effect of adaptive window selection from other algorithmic factors.
Fixed ERF fitting windows with 5, 7, 9, and 11 sampling points were compared with the proposed normalized-gradient-entropy-guided dynamic window strategy. For the dynamic-window method, normalized gradient entropy was first calculated within an initial 11-point window at each Canny edge point. The final ERF fitting window was then determined according to the calibrated thresholds in
Section 3. To analyze the influence of ERF parameter optimization, both two-parameter and four-parameter ERF fitting forms were tested. In the two-parameter ERF form, the gray-level amplitude and offset were initialized from local extrema and then fixed; only the subpixel offset and blur-scale parameter were optimized. In the four-parameter ERF form,
A, B, the subpixel offset, and the blur-scale parameter were optimized simultaneously. The following ten methods were compared: ERF2P-fixed5, ERF2P-fixed7, ERF2P-fixed9, ERF2P-fixed11, ProposedDynamicERF_2P, ERF4P-fixed5, ERF4P-fixed7, ERF4P-fixed9, ERF4P-fixed11, and ProposedDynamicERF_4P.
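To make the two-parameter form concrete, the following minimal sketch fixes the amplitude and offset from the local extrema and searches only over the subpixel offset and blur scale. The profile parameterization (base + amp/2 · (1 + erf((x − x0)/(√2·σ)))), the coarse grid search used here in place of a nonlinear optimizer, the search ranges, and all function names are assumptions made for illustration; they are not taken from the experiments.

```python
import math

def erf_edge(x, amp, base, x0, sigma):
    """Blurred step edge: base + amp/2 * (1 + erf((x - x0) / (sqrt(2) * sigma)))."""
    return base + 0.5 * amp * (1.0 + math.erf((x - x0) / (math.sqrt(2.0) * sigma)))

def fit_erf_2p(xs, ys, x0_range=(-1.0, 1.0), sigma_range=(0.3, 3.0), steps=81):
    """Fit only (x0, sigma); amp and base are fixed from the local extrema."""
    base = min(ys)
    amp = max(ys) - base
    if ys[0] > ys[-1]:  # falling edge: start from the maximum and step down
        base, amp = max(ys), min(ys) - max(ys)
    best = (float("inf"), 0.0, 1.0)
    for i in range(steps):
        x0 = x0_range[0] + (x0_range[1] - x0_range[0]) * i / (steps - 1)
        for j in range(steps):
            sigma = sigma_range[0] + (sigma_range[1] - sigma_range[0]) * j / (steps - 1)
            sse = sum((erf_edge(x, amp, base, x0, sigma) - y) ** 2
                      for x, y in zip(xs, ys))
            if sse < best[0]:
                best = (sse, x0, sigma)
    return best[1], best[2]

# Synthetic 9-point profile sampled along the gradient direction (illustrative values).
xs = [float(k) for k in range(-4, 5)]
ys = [erf_edge(x, 150.0, 40.0, 0.30, 1.2) for x in xs]
x0_hat, sigma_hat = fit_erf_2p(xs, ys)
```

In a four-parameter variant, amp and base would simply join x0 and sigma as free variables of the same residual.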
4.2.1. Dynamic Window Selection Results
Table 3 lists the selection ratios of sampling points for the dynamic window strategy in the four synthetic scenarios. The statistics are computed over Canny candidate edge points participating in subpixel localization.
Table 3 and
Figure 4 show that the dynamic window selection ratios differ considerably among scenarios. In the Ideal scenario, all four window sizes are selected with non-negligible proportions, and the 11-point window accounts for 41.185%, indicating that the sampling window changes with the local transition width under different blur levels. In the Slope and Texture scenarios, the 9-point and 11-point windows have relatively high proportions. Specifically, the combined proportions of the 9-point and 11-point windows are 67.855% and 68.210% in the Slope and Texture scenarios, respectively. This suggests that background gray-level slopes and local texture disturbances disperse the gradient distribution around edges and that larger windows help cover the complete transition region. In the Asymmetric scenario, the 7-point and 9-point windows dominate, accounting for 80.586% in total, while the 11-point window accounts for only 11.065%. This indicates that an excessively large window is not always beneficial for asymmetric transitions; instead, a medium-sized window can better balance effective transition coverage and redundant-sample suppression.
These results demonstrate that the normalized-gradient-entropy-guided dynamic window does not simply select a fixed sampling-point number. Rather, it adjusts the ERF fitting window according to the local edge condition.
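The selection logic can be sketched as follows. The entropy is computed here as the Shannon entropy of the normalized finite-difference gradient magnitudes within the initial window, divided by its maximum possible value so that it lies in [0, 1]. The finite-difference gradient, the zero-entropy convention for flat profiles, the threshold values, and the function names are placeholders assumed for illustration; the calibrated thresholds are those described in Section 3.

```python
import math

def normalized_gradient_entropy(profile):
    """Shannon entropy of the normalized gradient magnitudes, scaled to [0, 1]."""
    grads = [abs(b - a) for a, b in zip(profile, profile[1:])]
    total = sum(grads)
    if total == 0.0 or len(grads) < 2:
        return 0.0  # flat profile: convention chosen for this sketch
    probs = [g / total for g in grads if g > 0.0]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(len(grads))

def select_window(entropy, thresholds=(0.45, 0.65, 0.85)):
    """Map entropy to a window size; the threshold values are placeholders."""
    t1, t2, t3 = thresholds
    if entropy < t1:
        return 5
    if entropy < t2:
        return 7
    if entropy < t3:
        return 9
    return 11

# A sharp transition concentrates the gradient; a wide ramp spreads it out.
sharp = [40.0] * 5 + [115.0] + [190.0] * 5
ramp = [40.0 + 15.0 * k for k in range(11)]
window_sharp = select_window(normalized_gradient_entropy(sharp))
window_ramp = select_window(normalized_gradient_entropy(ramp))
```

A concentrated gradient yields low entropy and a short window, whereas a dispersed gradient yields high entropy and a long window.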
4.2.2. Sensitivity Analysis of Entropy Thresholds
To further evaluate the stability of the entropy thresholds used for switching among the 5-, 7-, 9-, and 11-point fitting windows, an additive threshold sensitivity analysis was conducted. Since the normalized gradient entropy is bounded within $[0, 1]$, additive perturbations were used instead of proportional scaling to avoid invalid threshold values near the upper bound. Let $T_1$, $T_2$, and $T_3$ denote the calibrated entropy thresholds. The perturbed thresholds were defined as
$$T_k' = T_k + \delta, \quad k = 1, 2, 3,$$
where $\delta$ is the additive threshold offset. For each perturbed threshold setting, the proposed dynamic-window ERF method was re-evaluated using the same synthetic test samples. The MAE, RMSE, standard deviation, relative RMSE change, and selected-window distribution were calculated. The relative RMSE change was computed with respect to the calibrated thresholds at $\delta = 0$:
$$\Delta_{\mathrm{RMSE}}(\delta) = \frac{\mathrm{RMSE}(\delta) - \mathrm{RMSE}(0)}{\mathrm{RMSE}(0)} \times 100\%.$$
The results in
Table 4 and
Figure 5 show that the calibrated thresholds achieve the lowest RMSE of 0.13501 pixel. When the threshold offset varies from
to
, the maximum relative RMSE change is 4.81%. Within the range of
, the maximum relative RMSE change is only 3.83%. These results indicate that the proposed entropy-guided window selection strategy is not overly sensitive to moderate additive perturbations of the entropy thresholds.
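The perturbation and the relative RMSE change can be sketched as below. The threshold values, the perturbed RMSE of 0.1400 pixel, and the clipping to [0, 1] are hypothetical and added here only for illustration; only the reference RMSE of 0.13501 pixel comes from the reported results.

```python
def perturb_thresholds(thresholds, delta):
    """Add the same offset to every threshold, clipped to the entropy range [0, 1]."""
    return tuple(min(1.0, max(0.0, t + delta)) for t in thresholds)

def relative_rmse_change(rmse, rmse_ref):
    """Relative RMSE change in percent with respect to the calibrated setting."""
    return (rmse - rmse_ref) / rmse_ref * 100.0

calibrated = (0.45, 0.65, 0.85)   # placeholder thresholds, not the calibrated ones
shifted = perturb_thresholds(calibrated, 0.05)

rmse_ref = 0.13501                # RMSE at the calibrated thresholds (from the text)
rmse_perturbed = 0.1400           # hypothetical RMSE under a perturbed setting
change_percent = relative_rmse_change(rmse_perturbed, rmse_ref)
```

Because every threshold receives the same offset, the ordering of the thresholds is preserved, so the mapping from entropy to window size stays monotonic.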
An asymmetric trend can also be observed. Positive threshold offsets lead to a faster increase in RMSE than negative offsets. This is because increasing the thresholds makes the method less likely to select larger fitting windows. As a result, some blurred edges that originally require 9- or 11-point windows may be assigned to smaller windows, leading to insufficient coverage of the gray-level transition and increased fitting bias. In contrast, decreasing the thresholds tends to select slightly larger windows. Although larger windows may introduce additional background or noise, they still preserve the complete edge-transition region, and therefore cause only limited degradation within the tested perturbation range.
It should be noted that the entropy thresholds are calibrated parameters rather than universal constants. Their transferability depends on whether the local gradient-entropy distribution remains similar under different imaging conditions. Changes in camera resolution, lens point-spread function, defocus, or motion blur may alter the edge-transition width in pixel units and thus shift the distribution of the normalized gradient entropy. In general, stronger optical blur or defocus tends to produce more spatially dispersed gradients, resulting in higher entropy values and a higher probability of selecting larger fitting windows. Illumination and image contrast can also affect the reliability of the entropy measurement. Under sufficient illumination and high contrast, the normalized gradient distribution is mainly determined by the edge transition itself, so the calibrated thresholds are expected to remain relatively stable. In contrast, low illumination, low contrast, high sensor gain, or strong image noise may introduce additional gradient fluctuations, which can increase the measured entropy and reduce the reliability of the original thresholds. Object type and surface material may have similar effects: matte high-contrast targets usually produce stable edge profiles, whereas reflective, textured, or low-contrast surfaces may change the local gradient distribution and require threshold recalibration. Therefore, when the camera, lens, illumination condition, image contrast, or target material changes substantially, recalibration using the same procedure is recommended.
4.2.3. Comparison Between Fixed and Dynamic Windows
To further verify whether the dynamic window strategy improves localization accuracy, it was compared with fixed sampling windows. The overall results are shown in
Table 5.
For the two-parameter ERF methods, ProposedDynamicERF_2P achieved a mean RMSE of 0.16436 pixel, which is approximately 5.23% lower than the best fixed-window method. Its mean MAE was 0.14114 pixel, also lower than those of all fixed two-parameter ERF methods. This indicates that, in the ERF fitting framework with fixed A and B, the dynamic window effectively reduces localization errors caused by fixed-window mismatch. For the four-parameter ERF methods, ERF4P-fixed11 achieved the best fixed-window RMSE of 0.15604 pixel, whereas ProposedDynamicERF_4P achieved a mean RMSE of 0.14646 pixel, corresponding to an improvement of about 6.14%. Its mean MAE was also the lowest among the four-parameter methods. Notably, the average computation time of the dynamic methods was lower than that of the maximum fixed window. This shows that the accuracy improvement is not achieved by always selecting the largest window, but by choosing an appropriate sampling range according to local edge conditions.
4.3. Accuracy and Efficiency Comparison with Other Subpixel Edge Localization Methods
To further evaluate the comprehensive performance of the proposed method, it was compared with several representative subpixel edge localization methods. The comparison methods include the pixel-level Canny baseline, the Canny–Devernay method based on interpolated non-maximum suppression [
23], moment-based methods, the partial-area-effect model, curve-fitting methods, and a recent stable-region-based subpixel localization baseline.
Different original methods often include their own preprocessing operations, pixel-level edge extraction strategies, threshold settings, and edge-selection rules. Directly comparing full systems may confound the effects of coarse localization, edge-point quantity, preprocessing strength, and subpixel correction models. To ensure fairness, a unified experimental framework was adopted. First, the same Canny detector was applied to each synthetic image to obtain pixel-level candidate edge points. Then, different subpixel models were used to refine the same candidate points. Therefore, this experiment compares the localization capability of different subpixel correction models under unified Canny candidates, rather than the complete edge detection systems proposed in the original literature.
The evaluated methods were as follows. CannyPixel directly uses integer-pixel edge points obtained by the Canny detector [
1]. Canny–Devernay refines the position by quadratic interpolation of the gradient-response peak along the gradient direction [
23]. GLM1984 follows the gray-level moment idea of Tabatabai and Mitchell and estimates edge position from the first three moments of a local gray-level sequence [
11]. Canny–Zernike2020 represents a Zernike-moment-based subpixel edge localization method [
14]. HagaraAEF2011 approximates the edge gray-level profile with an error function and estimates the subpixel position by parameter fitting [
13]. PAE2013 uses the partial area effect to model gray-level formation when an edge passes through a pixel region [
15]. GaussianFit2014 fits the gradient profile using a Gaussian function [
20]. ArctanFit2016 uses an arctangent edge model [
19], and SigmoidLogisticFit2023 uses a logistic function to describe gray-level transition [
24].
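As an illustration of the interpolation step used by the Canny–Devernay baseline, the sketch below computes the vertex of the parabola through three gradient-magnitude samples taken along the gradient direction. The function name and the handling of the degenerate case are assumptions of this sketch rather than details from the cited method.

```python
def parabolic_peak_offset(g_prev, g_peak, g_next):
    """Offset of the vertex of the parabola through (-1, g_prev), (0, g_peak), (1, g_next)."""
    denom = g_prev - 2.0 * g_peak + g_next
    if denom == 0.0:
        return 0.0  # collinear samples: no curvature, keep the integer position
    return 0.5 * (g_prev - g_next) / denom

# Samples of the parabola 10.04 - (x - 0.2)^2 at x = -1, 0, 1.
offset = parabolic_peak_offset(8.6, 10.0, 9.4)
```

The subpixel edge position is then the integer edge point displaced by this offset along the unit gradient direction.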
To further strengthen the benchmark, SER-CIS was added as a recent stable-region-based, region-adaptive subpixel localization baseline [
25]. SER-CIS introduces converted intensity summation and stable edge regions for robust subpixel edge localization. In this study, a SER-CIS-style benchmark implementation was constructed according to the CIS and stable-region parameter-estimation principles described in the original paper. The proposed methods include ProposedDynamicERF_2P and ProposedDynamicERF_4P, corresponding to two-parameter and four-parameter ERF fitting under the normalized-gradient-entropy-guided dynamic window. Deep-learning-based edge detectors, such as recent transformer-based edge detectors [
26,
27], were not directly included in the quantitative comparison because most of them are designed to predict pixel-level edge probability maps or semantic/structural object boundaries, whereas this study focuses on metric-level subpixel edge localization and its propagation to monocular ranging error. In addition, deep models usually require task-specific training data, and their output resolution, thresholding strategy, and post-processing procedure may introduce additional factors beyond the subpixel correction model itself. Therefore, this study focuses on classical and recent model-based subpixel localization methods under a unified Canny-candidate framework, while deep-learning-assisted subpixel localization will be considered in future work.
Table 6 reports the overall performance on all synthetic edge images. In addition to MAE and RMSE, the standard deviation (SD), median error, interquartile range (IQR), and maximum error were reported to provide a more complete statistical evaluation of localization accuracy, variation, and reliability. The computation time represents the average post-Canny time required for subpixel correction of one edge point. The computation times in
Table 5 and
Table 6 were recorded in different experimental batches and are therefore used only as relative indicators within each table, rather than for direct cross-table comparison.
As shown in
Table 6, the pixel-level CannyPixel method produced an overall RMSE of 0.41897 pixel, which is much larger than those of the subpixel methods. This confirms that integer-pixel edge localization is insufficient for high-precision edge measurement. Canny–Devernay improved the localization accuracy through gradient-peak interpolation, but its RMSE remained 0.30547 pixel, indicating that local peak interpolation is vulnerable to noise, blur, and gray-level disturbance in complex edge conditions.
Among the moment-based and analytical methods, Canny–Zernike2020, GLM1984, and PAE2013 produced RMSE values of 0.20864, 0.22587, and 0.17720 pixel, respectively. PAE2013 outperformed the other two, indicating that the partial-area-effect edge-formation model has relatively good adaptability under the synthetic conditions used in this study. However, these methods still depend on local edge orientation, gray-level distribution, and model assumptions, and may degrade under texture interference and asymmetric transitions.
Fitting-based methods generally performed well. ArctanFit2016 and SigmoidLogisticFit2023 produced RMSE values of 0.17044 and 0.17141 pixel, respectively. HagaraAEF2011 achieved an RMSE of 0.17334 pixel, showing that ERF-based fitting is effective for blurred step-edge localization. SER-CIS, as a recent stable-region-based subpixel localization baseline, achieved an RMSE of 0.22030 pixel. Its maximum error was relatively small, which may be partly related to the rejection or suppression of unstable edge regions by the stable-region selection mechanism. However, its MAE, RMSE, SD, median error, and IQR were higher than those of the proposed four-parameter dynamic-window ERF method.
ProposedDynamicERF_4P achieved the lowest MAE and RMSE among all compared methods, namely 0.12368 pixel and 0.16312 pixel. It also achieved the lowest SD, median error, and IQR, indicating that its localization errors were more concentrated and less affected by large fluctuations. Compared with SER-CIS, ArctanFit2016, SigmoidLogisticFit2023, PAE2013, and HagaraAEF2011, the RMSE was reduced by approximately 25.96%, 4.29%, 4.84%, 7.95%, and 5.90%, respectively. Although ProposedDynamicERF_4P did not achieve the smallest maximum error, the distribution-based metrics in
Table 6 and the box plots in
Figure 6 show that it provides better overall accuracy and reliability. ProposedDynamicERF_2P achieved an RMSE of 0.18067 pixel and an average time of 0.49560 ms, making it suitable as a lightweight solution for scenarios with stricter efficiency requirements.
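The quoted RMSE reductions follow from the relative difference (baseline − proposed) / baseline. The short check below recomputes them from the RMSE values stated in the text; the function and variable names are illustrative only.

```python
def rmse_reduction_percent(rmse_baseline, rmse_proposed):
    """Relative RMSE reduction of the proposed method with respect to a baseline."""
    return (rmse_baseline - rmse_proposed) / rmse_baseline * 100.0

proposed_rmse = 0.16312  # ProposedDynamicERF_4P, overall RMSE in pixels
baseline_rmse = {
    "SER-CIS": 0.22030,
    "ArctanFit2016": 0.17044,
    "SigmoidLogisticFit2023": 0.17141,
    "PAE2013": 0.17720,
    "HagaraAEF2011": 0.17334,
}
reductions = {name: rmse_reduction_percent(value, proposed_rmse)
              for name, value in baseline_rmse.items()}
```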
To further examine whether the observed accuracy differences were statistically reliable, a Wilcoxon signed-rank test was performed on the paired sample-level MAE between ProposedDynamicERF_4P and each baseline method. Each synthetic image was treated as one paired sample, and the MAE of each method on the same synthetic image was used for paired comparison. The null hypothesis was that there was no difference in the median paired MAE between the two methods. A significance level of
was used. This non-parametric test was selected because the localization-error distributions were not assumed to be strictly Gaussian and contained outliers, as shown in
Figure 6.
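For readers unfamiliar with the test, a minimal standard-library sketch of a two-sided Wilcoxon signed-rank test is given below. It uses the large-sample normal approximation without tie or continuity correction, which is a simplification assumed here and not necessarily the variant used to produce the reported p-values. The per-image MAE values are hypothetical.

```python
import math

def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test (normal approximation).

    Zero differences are dropped and tied |d| share average ranks.
    Returns (W, p) with W = min(W+, W-).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2.0 + 1.0  # average 1-based rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    mean = n * (n + 1) / 4.0
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail probability
    return w, min(1.0, p)

# Hypothetical per-image MAE values (pixels) for two methods on the same images.
proposed_mae = [0.120, 0.115, 0.131, 0.127, 0.118, 0.122, 0.125, 0.119, 0.128, 0.121]
baseline_mae = [0.150, 0.139, 0.160, 0.149, 0.138, 0.155, 0.151, 0.141, 0.163, 0.142]
w_stat, p_value = wilcoxon_signed_rank(proposed_mae, baseline_mae)
```

In practice, a library routine such as `scipy.stats.wilcoxon` would normally be preferred, since it also provides exact p-values for small samples.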
Table 7 reports the statistical significance results. The proposed method showed statistically significant differences compared with all baseline methods at the
level. The differences were especially evident for the pixel-level, interpolation-based, moment-based, Gaussian-fitting, and stable-region-based methods. For strong fitting-based baselines such as HagaraAEF2011 and SigmoidLogisticFit2023, the median paired improvements were relatively small, but the paired tests still indicated statistically significant differences. These results suggest that the overall advantage of ProposedDynamicERF_4P is reflected in the paired sample-level error distributions rather than being caused by accidental fluctuations in a small number of samples.
Table 8 further reports the RMSE results under the four synthetic scenarios, and
Figure 7 visualizes the scene-wise comparison.
The per-scenario results show that different methods have distinct sensitivities to edge degradation. In the Ideal scenario, HagaraAEF2011 achieved the lowest RMSE of 0.13737 pixel because the edge profile was mainly determined by standard Gaussian blur and noise, and the gray-level transition was relatively regular. Therefore, fixed long-window AEF fitting could use the complete transition information effectively. SigmoidLogisticFit2023 and ProposedDynamicERF_4P also achieved low errors, with RMSE values of 0.13913 and 0.13923 pixel, respectively.
In the Slope scenario, HagaraAEF2011 remained highly accurate with an RMSE of 0.14329 pixel. The background gray-level slope is a low-frequency degradation and has limited influence on the overall shape of the standard blurred transition. Thus, fixed long-window fitting can still maintain good accuracy. ProposedDynamicERF_4P achieved an RMSE of 0.14504 pixel, which was slightly higher than those of HagaraAEF2011 and SigmoidLogisticFit2023 but remained comparable to other strong fitting-based methods.
In the Texture scenario, ProposedDynamicERF_4P achieved the lowest RMSE of 0.18656 pixel among the compared methods. Local texture introduces additional gradient responses around the edge, making gradient-peak interpolation, local moments, and fixed-window fitting more susceptible to interference. By characterizing the dispersion of the local gradient distribution and adjusting the window size accordingly, the proposed method covers the effective transition region while reducing the influence of texture disturbance on the fitted center. Compared with SigmoidLogisticFit2023, ArctanFit2016, and HagaraAEF2011, the RMSE reductions were approximately 0.95%, 3.22%, and 0.71%, respectively.
In the Asymmetric scenario, ProposedDynamicERF_4P again achieved the lowest RMSE of 0.18978 pixel. Compared with PAE2013, ArctanFit2016, SigmoidLogisticFit2023, HagaraAEF2011, and SER-CIS, the RMSE reductions were approximately 1.11%, 5.71%, 13.88%, 17.51%, and 43.98%, respectively. This result indicates that the proposed method has good adaptability to asymmetric gray-level transitions. Since the blur scales on the two sides of an asymmetric edge differ, a fixed long window may include unbalanced gray-level information and shift the fitted center. In contrast, the dynamic window actively adjusts the sampling range according to local gradient entropy, reducing model mismatch caused by an excessively large fixed window.
Overall, ProposedDynamicERF_4P achieved the best overall performance in terms of MAE, RMSE, SD, median error, and IQR over the complete synthetic test set. Although some fixed-window or curve-fitting methods performed slightly better under regular Ideal and Slope conditions, their errors increased under Texture or Asymmetric scenarios. The proposed method provides a more balanced performance across different edge conditions by adapting the ERF fitting window to local edge characteristics.
4.4. Real Monocular Distance Measurement Experiment and Result Analysis
To further verify the practical applicability of the proposed normalized-gradient-entropy-guided dynamic-window ERF subpixel edge localization method, a real monocular distance-measurement experiment was conducted. Unlike synthetic edge images, real images contain camera imaging noise, non-uniform illumination, lens distortion, edge blur, and background interference. Therefore, they can more directly reflect the stability and applicability of the algorithm in a practical visual measurement system. Monocular visual ranging usually estimates the target distance according to the geometric relationship between the physical target size and its image size. This type of method has a simple structure and low cost and has been used in target distance-measurement applications [
28]. Recent studies on monocular machine vision and subpixel visual measurement for mechanical-part measurement also show that low-cost visual measurement systems have promising engineering potential in dimensional measurement [
8,
29]. Hu et al. proposed a cubic-spline-interpolation-based subpixel edge detection method for O-ring dimensional measurement, indicating that subpixel edge localization can reduce the accuracy limitations of conventional integer-pixel methods in dimensional measurement [
30]. Ye et al. constructed a mobile-vision-based measurement system for precision measurement of flat-screen gaps and combined image processing with coordinate transformation to complete planar target dimension measurement, further demonstrating the application value of visual edge extraction in industrial dimensional measurement [
31]. Therefore, a black square target was first used as the standard measurement object to evaluate the influence of subpixel edge localization accuracy on monocular ranging. Additional real-image robustness experiments with different imaging conditions and target appearances were further conducted in
Section 4.4.6 and
Section 4.4.7.
4.4.1. Experimental Platform and Camera Calibration
The real distance-measurement platform mainly consisted of a monocular camera, a fixed-focus lens, a black target, a supporting platform, and a laser rangefinder. The monocular camera was used to capture target images at different distances, whereas the laser rangefinder provided the reference distance between the camera and the target for distance-error calculation. During image acquisition, the camera position was fixed, the target plane was kept approximately parallel to the camera imaging plane, and the images were captured under relatively stable indoor illumination to reduce the influence of environmental variations on edge extraction.
Before the distance-measurement experiment, camera calibration was performed to obtain the intrinsic parameters and distortion coefficients. Zhang’s calibration method was adopted because it estimates the camera intrinsic matrix and distortion coefficients from several images of a planar checkerboard target at different poses and is convenient and accurate in practice [
32]. After calibration, the captured target images were undistorted to reduce the influence of lens distortion on edge position and image-size calculation.
The intrinsic matrix obtained by camera calibration can be expressed as
$$K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix},$$
where $f_x$ and $f_y$ denote the equivalent focal lengths in the horizontal and vertical directions, respectively, and $c_x$ and $c_y$ are the principal-point coordinates. The equivalent focal length obtained from calibration was used in the subsequent distance calculation. Because the real ranging experiment in this study focuses on comparing the relative performance of different edge localization methods under the same visual ranging model, all methods used the same camera intrinsic parameters, distortion-correction results, and distance-measurement model.
4.4.2. Monocular Distance Model and Edge-Size Extraction
A black target with a known physical size was used in the experiment. Let the physical side length of the target be $L$, and let its image width be $l$ pixels. When the target plane is approximately parallel to the camera imaging plane, the pinhole camera model gives
$$Z = \frac{f_x L}{l},$$
where $Z$ is the distance between the camera and the target, $f_x$ is the equivalent focal length in the horizontal direction, $L$ is the physical target size, and $l$ is the target width in pixels. Equation (33) shows that the estimation error of the image size $l$ directly affects the distance result. As the target distance increases, the target occupies fewer pixels in the image, and the same edge localization error leads to a more pronounced ranging deviation. Therefore, improving edge localization accuracy is important for medium- and long-distance monocular ranging.
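The growth of the ranging deviation with distance can be made explicit by a first-order sensitivity analysis of the pinhole relation $Z = f_x L / l$, where $f_x$ is the horizontal focal length in pixels, $L$ the physical target size, and $l$ the image width in pixels. The derivation below is a standard linearization given as a sketch; it ignores calibration and alignment errors.

$$\frac{\partial Z}{\partial l} = -\frac{f_x L}{l^{2}} = -\frac{Z^{2}}{f_x L}, \qquad |\Delta Z| \approx \frac{Z^{2}}{f_x L}\,|\Delta l|, \qquad \frac{|\Delta Z|}{Z} \approx \frac{|\Delta l|}{l}.$$

For a fixed width error $\Delta l$, the absolute ranging deviation therefore grows with the square of the distance: doubling $Z$ roughly quadruples $|\Delta Z|$.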
During image-size extraction, the Canny operator was first used to obtain pixel-level target-edge candidates. Different edge localization methods were then applied to refine the candidate edge positions and obtain the subpixel positions of the left and right target boundaries. Let the average subpixel positions of the left and right edges be $x_L$ and $x_R$, respectively. The horizontal image width of the target can be expressed as
$$l = x_R - x_L.$$
In this experiment, the extracted horizontal target width was substituted into Equation (33) for distance calculation.
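The width extraction and distance calculation can be sketched as follows. The focal length of 1200 pixels and the edge coordinates are hypothetical values chosen for illustration; only the 13 cm target size comes from the experiment, and the function names are not from the original implementation.

```python
def mean(values):
    return sum(values) / len(values)

def target_width_px(left_edge_x, right_edge_x):
    """Horizontal width from the averaged subpixel x-positions of the two boundaries."""
    return mean(right_edge_x) - mean(left_edge_x)

def monocular_distance_cm(focal_px, size_cm, width_px):
    """Pinhole relation Z = f_x * L / l for a fronto-parallel target."""
    if width_px <= 0:
        raise ValueError("image width must be positive")
    return focal_px * size_cm / width_px

left_x = [100.2, 100.3, 100.1]    # hypothetical refined left-boundary points
right_x = [230.2, 230.3, 230.1]   # hypothetical refined right-boundary points
width = target_width_px(left_x, right_x)
distance = monocular_distance_cm(1200.0, 13.0, width)
```

Averaging the refined points before differencing reduces the effect of random localization noise on the width, although a systematic bias shared by all points on one boundary would still propagate to the distance.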
4.4.3. Experimental Settings and Evaluation Metrics
The real monocular distance-measurement experiment was conducted using the calibrated camera system and the prepared target images. The reference distance of each image was measured using a laser rangefinder. For each image, the target image size was extracted using different edge localization methods, and the monocular distance was then calculated using the same ranging model. To ensure a fair comparison, all methods used the same input images, the same distortion-correction procedure, the same target-size-to-distance model, and the same reference-distance annotations.
The standard real ranging experiment used a square target with a known physical side length of 13 cm, placed at different distances within the range of 50–300 cm. The calibrated focal length obtained from camera calibration was used directly in the monocular ranging model. For each test image, the target image size was first extracted using the corresponding edge localization method, and the distance was then calculated by
$$\hat{Z}_i = \frac{f_x S}{s_i},$$
where $\hat{Z}_i$ is the estimated distance of the $i$-th image, $f_x$ is the calibrated focal length in the horizontal direction, $S$ is the physical size of the measured target dimension, and $s_i$ is the corresponding target image size in pixels. For square targets, $S$ is the physical side length and $s_i$ denotes the estimated side length. For the circular target, $S$ is the physical diameter and $s_i$ denotes the fitted circle diameter.
To further evaluate the robustness of the proposed method under more practical imaging conditions, two additional real-image experiments were conducted. The first experiment evaluated the influence of different SNR-related imaging conditions. In this experiment, the exposure setting was kept fixed, while the illumination intensity and camera gain were adjusted to form four imaging-quality groups. Since the actual SNR value was not directly measured, these groups are referred to as SNR-related imaging-condition groups rather than absolute SNR levels. The second experiment evaluated the influence of target diversity by using targets with different shapes, gray-level contrasts, and surface materials. These additional experiments were used to examine whether the proposed dynamic-window ERF method can maintain stable edge-size extraction under variations in image quality and target appearance.
For all additional experiments, the same camera, lens, image resolution, distortion-correction procedure, edge-size extraction pipeline, and monocular ranging model were used. The exposure setting, focus state, and camera installation geometry were kept unchanged during each group of experiments. The acquisition distances covered 50–300 cm and were divided into five distance intervals: 50–100 cm, 100–150 cm, 150–200 cm, 200–250 cm, and 250–300 cm. For each experimental group, 10 images were collected in each distance interval, resulting in 50 images per group. Four representative methods were compared: the pixel-level Canny baseline, a fixed-window ERF fitting method, SER-CIS, and the proposed dynamic-window ERF method. The real-image acquisition settings for robustness evaluation are summarized in
Table 9.
The distance sampling scheme used in the additional real-image experiments is summarized in
Table 10. This interval-based acquisition was used to evaluate whether the ranging errors remain stable over near-, medium-, and long-distance conditions.
The SNR-related imaging-condition groups are listed in
Table 11. G1 was acquired under strong illumination and low camera gain and therefore corresponds to the highest image quality among the tested groups. From G1 to G4, the illumination intensity was gradually reduced and the camera gain was increased to maintain target visibility. This setting introduces stronger noise amplification and lower edge-image quality, thereby providing a practical test of the robustness of different edge localization methods under degraded imaging conditions.
The target-diversity groups are listed in
Table 12. These groups were designed to evaluate the influence of target appearance changes, including shape, gray-level contrast, and surface reflectance, on edge-size extraction and monocular ranging accuracy. The square targets had a physical side length of 13 cm, and the circular target had a physical diameter of 13 cm. The image size was defined as the estimated side length for square targets and as the fitted circle diameter for the circular target. To avoid bias caused by these different geometric definitions, the matching physical size (side length or diameter) was used in the same focal-length-based ranging model.
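For each image, the reference distance $Z_i^{\mathrm{ref}}$ was measured by the laser rangefinder, and the estimated distance $\hat{Z}_i$ was calculated using the monocular ranging model. The signed ranging error and absolute ranging error were defined as
$$e_i = \hat{Z}_i - Z_i^{\mathrm{ref}}, \qquad |e_i| = \left|\hat{Z}_i - Z_i^{\mathrm{ref}}\right|.$$
Based on the errors of all $N$ valid images, the mean absolute error (MAE), root mean square error (RMSE), standard deviation (SD), median error, interquartile range (IQR), maximum error, and mean relative error (MRE) were calculated for each method:
$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N} |e_i|, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} e_i^{2}}, \qquad \mathrm{MRE} = \frac{1}{N}\sum_{i=1}^{N} \frac{|e_i|}{Z_i^{\mathrm{ref}}} \times 100\%,$$
where $\mathrm{MAE}$ denotes the mean absolute error over all valid images. The SD is the standard deviation of the absolute errors $|e_i|$ about $\mathrm{MAE}$, and the median error, IQR, and maximum error are the corresponding order statistics of $|e_i|$. The SD and IQR were used to characterize the fluctuation and concentration of the absolute-error distribution, while the median error was reported to reduce the influence of extreme values.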
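These statistics can be computed with the Python standard library as sketched below. The sample standard deviation (N − 1 denominator), the "inclusive" quartile method, the sign convention (estimate minus reference), and the example distances are assumptions of this sketch; the conventions used for the reported tables may differ.

```python
import math
import statistics

def ranging_metrics(estimated, reference):
    """Error statistics of estimated distances against reference distances."""
    signed = [z_hat - z_ref for z_hat, z_ref in zip(estimated, reference)]
    abs_err = [abs(e) for e in signed]
    q1, _, q3 = statistics.quantiles(abs_err, n=4, method="inclusive")
    return {
        "MAE": statistics.fmean(abs_err),
        "RMSE": math.sqrt(statistics.fmean([e * e for e in signed])),
        "SD": statistics.stdev(abs_err),          # sample SD of absolute errors
        "Median": statistics.median(abs_err),
        "IQR": q3 - q1,
        "Max": max(abs_err),
        "MRE_percent": 100.0 * statistics.fmean(
            [a / r for a, r in zip(abs_err, reference)]),
    }

# Hypothetical estimated and reference distances in cm.
estimated_cm = [101.0, 148.0, 203.0, 247.0, 300.0]
reference_cm = [100.0, 150.0, 200.0, 250.0, 300.0]
metrics = ranging_metrics(estimated_cm, reference_cm)
```

Because the SD, median, and IQR are taken over absolute errors, they describe the spread of error magnitudes rather than any systematic over- or under-estimation, which is visible only in the signed errors.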
4.4.4. Qualitative Visual Analysis on Real Images
To provide a more intuitive evaluation of the proposed edge-localization procedure on real images, this subsection presents representative qualitative visual examples. In addition to the quantitative ranging errors reported in the following subsections, the visual results are used to show how edge fragments are extracted from real target images and how the proposed subpixel refinement improves the localization of the target boundary. Two representative target shapes, namely the black matte square target and the black matte circular target, are selected for visualization because they contain straight and curved boundaries, respectively.
Figure 8 shows the edge-localization process and method comparison on real images. For each target shape, six visual results are presented, including the original ROI, the cropped local edge region, the pixel-level Canny edge fragments, the fixed-window ERF subpixel edge points, the SER-CIS subpixel edge points, and the proposed dynamic-window ERF subpixel edge points.
The qualitative results in
Figure 8 show that the proposed method can provide stable edge localization on real images with different boundary geometries. In the square-target example, the cropped edge region contains a visible gray-level transition band between the bright background and the dark target. The fixed-window ERF points are slightly shifted toward the dark target side, whereas the SER-CIS points are slightly closer to the bright background side. In comparison, the proposed dynamic-window ERF points are located near the center of the transition band and show better consistency with the apparent target boundary. This suggests that the entropy-guided dynamic window can reduce the localization bias caused by an unsuitable fixed sampling range. In addition, the square target mainly contains straight edge segments, and the refined edge points follow the local edge direction more consistently than the pixel-level edge fragments. For the circular target, the boundary contains continuous curvature, and the refined edge points still maintain a stable distribution along the curved edge. These observations indicate that the proposed dynamic-window ERF refinement is applicable to both straight and curved target boundaries.
The pixel-level edge fragments provide only a coarse description of the target boundary, and local discontinuities or fluctuations may directly affect the estimated target size. In contrast, the subpixel-refined edge points better describe the actual edge transition and provide a more reliable basis for subsequent line fitting, circle fitting, and edge-size extraction. This visual observation is consistent with the quantitative results in the following distance-measurement analysis, where subpixel edge-localization methods achieve lower ranging errors than the pixel-level CannyPixel method. The qualitative examples therefore further support the applicability of the proposed method to real monocular distance measurement. It should be noted that the visual differences among the subpixel methods are not prominent in Figure 8, because the localized edge points are drawn as small red markers and the differences are often below the pixel scale in clean edge regions. Accordingly, the qualitative examples are mainly used to illustrate the edge-localization process, the continuity of the refined edge points, and the difference between pixel-level edge fragments and subpixel-refined edge points. The detailed performance differences among fixed-window ERF, SER-CIS, and the proposed method are evaluated using the quantitative error statistics in the following subsections.
4.4.5. Distance-Measurement Results and Analysis
Table 13 reports the overall error statistics of different edge localization methods in the real monocular distance-measurement experiment. In addition to MAE and RMSE, the standard deviation (SD), median error, interquartile range (IQR), and maximum error were also reported to provide a more comprehensive evaluation of ranging accuracy, error variation, and reliability. The pixel-level CannyPixel method achieved an MAE of 4.694 cm, an RMSE of 7.500 cm, and a mean relative error of 2.291%, which were considerably larger than those of the subpixel edge localization methods. This indicates that when integer-pixel edge positions are used to calculate the target image size, edge quantization errors are directly propagated to the monocular ranging results, especially when the target image size is small.
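The error indicators listed above can be computed from paired measured and reference distances as in the following minimal sketch. The function name `error_summary`, the use of the sample standard deviation of the absolute errors, and the inclusive quartile rule are illustrative assumptions; the exact definitions used for Table 13 are not stated in this section.

```python
import math
import statistics


def error_summary(measured_cm, reference_cm):
    """Summarize the ranging errors of one method (all distances in cm)."""
    abs_err = [abs(m - r) for m, r in zip(measured_cm, reference_cm)]
    rel_err = [100.0 * e / r for e, r in zip(abs_err, reference_cm)]
    # Quartiles of the absolute error; the "inclusive" rule is an assumption.
    q1, _, q3 = statistics.quantiles(abs_err, n=4, method="inclusive")
    return {
        "MAE": statistics.fmean(abs_err),
        "RMSE": math.sqrt(statistics.fmean([e * e for e in abs_err])),
        # Assumption: SD is the sample standard deviation of the absolute errors.
        "SD": statistics.stdev(abs_err),
        "median": statistics.median(abs_err),
        "IQR": q3 - q1,
        "max": max(abs_err),
        "mean_rel_percent": statistics.fmean(rel_err),
    }


if __name__ == "__main__":
    # Hypothetical measurements, not data from the experiment.
    summary = error_summary([101.0, 149.0, 203.0, 248.0],
                            [100.0, 150.0, 200.0, 250.0])
    for name, value in summary.items():
        print(f"{name}: {value:.3f}")
```

Reporting the median and IQR alongside MAE and RMSE makes the summary less sensitive to a few large outliers, which is why the two groups of indicators can rank methods differently.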
Canny–Devernay achieved subpixel correction through gradient-response interpolation and improved the ranging result compared with pixel-level Canny. Its MAE decreased to 3.420 cm, and its RMSE decreased to 5.678 cm. However, this method mainly depends on the local gradient peak position. When edge blur, non-uniform gray-level transitions, or local noise perturbations are present in real images, the gradient peak may shift, and the overall ranging error remains relatively large.
In contrast, HagaraAEF2011, SigmoidLogisticFit2023, SER-CIS, and ProposedDynamicERF_4P use gray-level transition or stable-region information for subpixel localization and can reduce integer-pixel quantization errors to some extent. HagaraAEF2011 achieved an MAE of 1.034 cm and an RMSE of 1.523 cm, while SigmoidLogisticFit2023 achieved an MAE of 1.056 cm and an RMSE of 1.545 cm. SER-CIS, as a recent stable-region-based subpixel localization baseline, achieved an MAE of 1.267 cm and an RMSE of 1.833 cm. ProposedDynamicERF_4P achieved the lowest MAE, RMSE, SD, median error, maximum error, and mean relative error, namely 0.976 cm, 1.475 cm, 1.119 cm, 0.394 cm, 4.291 cm, and 0.504%, respectively. These results indicate that the proposed method can extract the target edge size more stably in real monocular ranging tasks, thereby reducing the overall distance-measurement error.
It should be noted that real distance-measurement errors are not determined only by edge localization accuracy. They may also be affected by camera calibration error, imperfect parallelism between the target plane and the camera imaging plane, and the offset between the laser rangefinder reference point and the camera optical center. Therefore, the real experiment in this study is mainly used to compare the relative performance of different edge localization methods under the same acquisition conditions, the same distortion-correction results, and the same distance-measurement model.
Table 13 further shows that, compared with CannyPixel, Canny–Devernay, HagaraAEF2011, SigmoidLogisticFit2023, and SER-CIS, ProposedDynamicERF_4P reduced the MAE by approximately 79.21%, 71.48%, 5.63%, 7.65%, and 22.97%, respectively. In terms of RMSE, the corresponding reductions were approximately 80.33%, 74.02%, 3.14%, 4.51%, and 19.53%, respectively. These results demonstrate that the proposed method has a clear advantage over pixel-level localization and gradient-interpolation-based subpixel correction, while also providing a certain improvement over conventional gray-level fitting-based and stable-region-based subpixel methods.
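The relative reductions quoted above follow from the ratio (baseline − proposed) / baseline. The short sketch below recomputes them from the rounded MAE and RMSE values given in the text; the helper name `relative_reduction` is illustrative. Because the inputs are rounded to three decimals, the recomputed percentages may differ from the reported ones in the second decimal place.

```python
def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` lowers an error relative to `baseline`."""
    return 100.0 * (baseline - proposed) / baseline


# (MAE, RMSE) in cm, as quoted in the text for Table 13.
BASELINES = {
    "CannyPixel": (4.694, 7.500),
    "Canny-Devernay": (3.420, 5.678),
    "HagaraAEF2011": (1.034, 1.523),
    "SigmoidLogisticFit2023": (1.056, 1.545),
    "SER-CIS": (1.267, 1.833),
}
PROPOSED_MAE, PROPOSED_RMSE = 0.976, 1.475

if __name__ == "__main__":
    for name, (mae, rmse) in BASELINES.items():
        mae_cut = relative_reduction(mae, PROPOSED_MAE)
        rmse_cut = relative_reduction(rmse, PROPOSED_RMSE)
        print(f"{name}: MAE reduced by {mae_cut:.2f}%, RMSE by {rmse_cut:.2f}%")
```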
Figure 9 further illustrates the distribution of absolute ranging errors. The proposed method shows the lowest median error and a relatively compact error distribution. Although SigmoidLogisticFit2023 obtained a slightly smaller IQR, the proposed method achieved the best overall balance among MAE, RMSE, SD, median error, maximum error, and mean relative error. This indicates that the normalized-gradient-entropy-guided dynamic-window ERF method provides improved overall accuracy and reliability in real monocular ranging.
To further analyze the variation of distance-measurement error over different ranges, Table 14 lists the mean absolute errors with standard deviation in different distance intervals. The interval-based statistics are used mainly to observe the error trend with distance, whereas the overall indicators in Table 13 are used to evaluate the comprehensive ranging performance over all samples. Because image quality, edge blur, local illumination, and target image size may differ among distance ranges in real experiments, local fluctuations may occur for different fitting models in individual intervals. Therefore, this study does not use the best result in a single interval as the only evaluation criterion; instead, the overall indicators and interval-based trends are considered together. The valid test images were divided into five distance intervals according to their reference distances.
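The interval-based statistics can be obtained by grouping samples according to their reference distance, as in the sketch below. The interval bounds (≤100, 100–150, 150–200, 200–250, and 250–300 cm), the right-closed boundary convention, and the function names are assumptions made for illustration; the actual bounds are those of Table 14.

```python
from collections import defaultdict

# Assumed right-closed upper bounds in cm; the first interval is "<=100".
UPPER_BOUNDS_CM = (100, 150, 200, 250, 300)


def interval_label(reference_cm):
    """Return the label of the distance interval containing `reference_cm`."""
    lower = None
    for upper in UPPER_BOUNDS_CM:
        if reference_cm <= upper:
            return f"<={upper}" if lower is None else f"{lower}-{upper}"
        lower = upper
    raise ValueError("reference distance exceeds the assumed range")


def interval_statistics(samples):
    """samples: iterable of (reference_cm, measured_cm) pairs for one method."""
    groups = defaultdict(list)
    for reference_cm, measured_cm in samples:
        groups[interval_label(reference_cm)].append(
            (abs(measured_cm - reference_cm), reference_cm)
        )
    stats = {}
    for label, pairs in groups.items():
        n = len(pairs)
        stats[label] = {
            "n": n,
            "mean_abs_cm": sum(e for e, _ in pairs) / n,
            "mean_rel_percent": sum(100.0 * e / r for e, r in pairs) / n,
        }
    return stats


if __name__ == "__main__":
    # Hypothetical samples, not data from the experiment.
    print(interval_statistics([(90.0, 91.0), (100.0, 98.0), (220.0, 224.4)]))
```

Grouping by reference distance rather than by acquisition position matches the statement that images were not acquired at fixed distance steps, so the number of samples per interval may differ.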
As shown in Table 14, the mean absolute errors of all methods generally increased as the reference distance increased. This phenomenon is consistent with the error-propagation characteristic of the monocular distance model. According to this model, as the target distance increases, the target image size l decreases, and the same magnitude of edge localization error produces a larger distance error. It should be emphasized that the distance intervals in Table 14 and Table 15 are used only to summarize the error trend over different ranges and do not indicate that the images were acquired at fixed distance intervals. In the distance ranges above 200 cm, the errors of CannyPixel and Canny–Devernay increased substantially, indicating that pixel-level localization and local-gradient-interpolation-based subpixel correction are less stable when the target occupies fewer pixels at longer distances.
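The error-propagation argument can be illustrated numerically. The sketch below assumes the standard pinhole similar-triangle relation d = f·W / l, where f is the focal length in pixels, W is the physical target width, and l is the target image size in pixels; this form, the focal length, the target width, and the 0.5-pixel size error are all illustrative assumptions and may not match the distance model or hardware used in the experiment. Under this relation, a fixed error in l produces a distance error that grows roughly with the square of the distance, since |Δd| ≈ d²·|Δl| / (f·W).

```python
def distance_cm(focal_px, target_width_cm, image_width_px):
    """Assumed pinhole model: distance = focal length * real width / image width."""
    return focal_px * target_width_cm / image_width_px


def distance_error_cm(focal_px, target_width_cm, true_distance_cm, size_error_px):
    """Distance error caused by a fixed error in the measured image size."""
    true_width_px = focal_px * target_width_cm / true_distance_cm
    measured = distance_cm(focal_px, target_width_cm, true_width_px + size_error_px)
    return abs(measured - true_distance_cm)


FOCAL_PX = 1500.0        # hypothetical focal length in pixels
TARGET_WIDTH_CM = 10.0   # hypothetical physical target width
SIZE_ERROR_PX = 0.5      # hypothetical error in the extracted image size

if __name__ == "__main__":
    for d in (100.0, 200.0, 300.0):
        err = distance_error_cm(FOCAL_PX, TARGET_WIDTH_CM, d, SIZE_ERROR_PX)
        print(f"distance {d:.0f} cm -> error {err:.3f} cm")
```

Doubling the distance approximately quadruples the error in this sketch, which is consistent with the faster error growth of the pixel-level methods beyond 200 cm.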
By contrast, the errors of HagaraAEF2011, SigmoidLogisticFit2023, SER-CIS, and ProposedDynamicERF_4P increased more slowly, suggesting that subpixel methods based on gray-level transition or stable-region information can more effectively reduce edge-position quantization error. In the three distance intervals up to 200 cm, ProposedDynamicERF_4P achieved the lowest mean absolute error, indicating that dynamic-window ERF fitting can extract the target edge size more accurately under near- and medium-distance conditions. In the 200–300 cm range, the proposed method produced errors close to those of HagaraAEF2011 and SigmoidLogisticFit2023 and remained lower than SER-CIS. Although it was not the best in every individual long-distance interval, it remained substantially better than CannyPixel and Canny–Devernay. This result indicates that the advantage of the proposed method is mainly reflected in overall error control and stable performance over most distance intervals, rather than in a local optimum in a single interval.
Figure 10 further shows the trend of ranging RMSE with standard deviation over different reference-distance intervals. The error curves of CannyPixel and Canny–Devernay increase rapidly with distance, especially beyond 200 cm. In comparison, the gray-level-model-based subpixel fitting methods and SER-CIS show flatter error curves. ProposedDynamicERF_4P maintains the lowest error in the near- and medium-distance ranges and remains close to the other fitting-based methods beyond 200 cm. This suggests that the normalized-gradient-entropy-guided dynamic window can adjust the sampling range according to local edge-transition characteristics in real images, improving the adaptability of ERF fitting under different distance conditions.
In addition to absolute error, relative error reflects the proportion of the ranging error with respect to the reference distance and is useful for comparing error stability under different distance conditions. Table 15 reports the mean relative errors in different distance intervals.
As shown in Table 15, the proposed method achieved the lowest mean relative error in the distance intervals up to 200 cm, indicating that it reduced not only the absolute error but also the error proportion relative to the reference distance in the near- and medium-distance ranges. In the 200–300 cm range, the mean relative error of ProposedDynamicERF_4P was close to those of HagaraAEF2011 and SigmoidLogisticFit2023 and was substantially lower than those of CannyPixel and Canny–Devernay. It was also lower than SER-CIS in all distance intervals. This indicates that gray-level-model-based fitting methods generally have better error stability when the target image size decreases at longer distances, while the proposed method still achieved the lowest mean relative error in the overall statistics.
Taken together, Table 13, Table 14, and Table 15 show that ProposedDynamicERF_4P achieved the lowest MAE, RMSE, SD, median error, maximum error, and mean relative error over all real ranging samples, indicating the lowest comprehensive distance-measurement error in an overall statistical sense. The interval-based results further show that the proposed method has clear advantages in the near- and medium-distance ranges and remains close to other fitting-based subpixel methods in the long-distance range, while clearly outperforming pixel-level Canny and Canny–Devernay. These results demonstrate that the subpixel localization advantage obtained in synthetic edge experiments can be effectively translated into reduced distance-measurement error in real monocular ranging.
In summary, the real monocular distance-measurement experiment further verifies the practical application value of the proposed method. Compared with pixel-level edge localization, gradient-interpolation-based correction, conventional gray-level fitting methods, and the stable-region-based SER-CIS baseline, the proposed normalized-gradient-entropy-guided dynamic-window ERF fitting method can adaptively adjust the sampling range according to local edge-transition characteristics in real images, thereby improving the stability and accuracy of target edge-size extraction. The results indicate that the proposed method is not only accurate and robust in synthetic edge localization tasks but also has potential for application in low-cost monocular visual ranging systems.
4.4.6. Robustness Analysis Under Different SNR-Related Imaging Conditions
To evaluate the robustness of different methods under varying image quality, additional real-image experiments were conducted under four SNR-related imaging conditions. The imaging conditions were adjusted by changing illumination intensity and camera gain, while the camera position, focus, exposure setting, target type, target distance, and monocular ranging model were kept unchanged. This setting allows the influence of image quality and noise level on edge-size extraction to be analyzed independently.
Table 16 reports the ranging error statistics under different SNR-related imaging conditions. The MAE, RMSE, SD, median error, IQR, and maximum error were calculated for each method in each group. The SD and IQR were used to characterize the error fluctuation and distribution concentration under each imaging condition. In the additional real-image experiments, Fixed-window ERF denotes the fixed 11-point four-parameter ERF fitting method.
Figure 11 further illustrates the RMSE variation of different methods under the four SNR-related imaging conditions.
As shown in Table 16 and Figure 11, the ranging errors of all methods are affected by the change in SNR-related imaging conditions. The pixel-level Canny method shows the largest RMSE values in all groups, with RMSE values of 6.058 cm, 6.389 cm, 7.256 cm, and 8.477 cm from G1 to G4, respectively. Its SD and maximum errors are also relatively large, indicating that integer-pixel edge localization is unstable when gray-level fluctuation and edge degradation are present. Since the monocular ranging model directly depends on the estimated target image size, pixel-level edge quantization and edge-detection fluctuations can be amplified into large ranging errors.
Compared with CannyPixel, the subpixel methods substantially reduce the ranging error under all SNR-related imaging conditions. In G1, Fixed-window ERF, SER-CIS, and ProposedDynamicERF_4P achieve RMSE values of 1.655 cm, 1.760 cm, and 1.354 cm, respectively. ProposedDynamicERF_4P also obtains the lowest MAE, RMSE, SD, median error, IQR, and maximum error in this group, indicating that the proposed dynamic-window ERF fitting provides more accurate and concentrated edge-size estimation under high-quality imaging conditions.
As the imaging condition changes from G1 to G4, the errors of all methods generally increase. The RMSE of ProposedDynamicERF_4P increases from 1.354 cm in G1 to 2.166 cm in G4, while the RMSE of Fixed-window ERF increases from 1.655 cm to 2.854 cm, and that of SER-CIS increases from 1.760 cm to 2.205 cm. This trend shows that image-quality degradation affects all subpixel edge localization methods, but the proposed method maintains the lowest RMSE in all four groups. The results suggest that the normalized-gradient-entropy-guided window selection can adapt to changes in the local edge-transition profile and reduce the influence of gray-level fluctuation on the fitted edge position.
In G2 and G3, ProposedDynamicERF_4P achieves RMSE values of 1.514 cm and 1.689 cm, respectively, which are lower than those of CannyPixel, Fixed-window ERF, and SER-CIS. This indicates that the proposed method remains robust under moderate SNR-related image degradation. In contrast, Fixed-window ERF uses a fixed sampling window, and its performance becomes more sensitive when the local edge-transition width and gray-level distribution change. SER-CIS uses stable edge regions to suppress local interference, and its error growth is relatively moderate; however, its RMSE values remain slightly higher than those of the proposed method in all groups.
In the most degraded condition G4, all methods show increased errors. The RMSE values of CannyPixel, Fixed-window ERF, SER-CIS, and ProposedDynamicERF_4P are 8.477 cm, 2.854 cm, 2.205 cm, and 2.166 cm, respectively. Although the difference between SER-CIS and ProposedDynamicERF_4P becomes small in this group, the proposed method still achieves the lowest RMSE and remains substantially better than CannyPixel and Fixed-window ERF. This result indicates that the dynamic-window ERF strategy remains competitive even under low-quality imaging conditions, while the stable-region-based SER-CIS method also shows good robustness in the severely degraded case.
Overall, the SNR-related experiment shows that subpixel edge localization is necessary for stable monocular ranging under changing imaging quality. ProposedDynamicERF_4P achieves the lowest RMSE in G1–G4 and maintains a relatively compact error distribution, as reflected by its lower SD and IQR values. These results indicate that the proposed dynamic-window ERF strategy improves the robustness of edge-size extraction under practical SNR-related imaging variations.
4.4.7. Target Diversity Analysis Under Different Shapes and Materials
To further evaluate the applicability of the proposed method to different target appearances, additional real-image experiments were conducted using targets with different shapes, gray-level contrasts, and surface materials. In this experiment, the camera, focus, exposure setting, illumination/gain condition, distance-measurement model, and evaluation metrics were kept consistent. Only the target appearance was changed. This experiment was designed to examine whether the proposed method can maintain stable edge localization and target-size extraction when the edge contrast, surface reflectance, or target geometry changes. The four target groups are defined as follows: T1 denotes the black matte square target, T2 denotes the black matte circular target, T3 denotes the gray matte square target, and T4 denotes the glossy black square target.
Table 17 reports the ranging error statistics for different target types. The compared methods include CannyPixel, fixed-window ERF, SER-CIS, and ProposedDynamicERF_4P. The target-diversity experiment focuses on representative baselines rather than all methods, because its purpose is to evaluate the robustness of the proposed dynamic-window strategy under target appearance changes.
Figure 12 further illustrates the RMSE variation of different methods under the four target types.
As shown in Table 17 and Figure 12, target appearance has a clear influence on ranging accuracy. The pixel-level CannyPixel method shows the largest error in all target groups. Its RMSE values are 6.063 cm, 6.545 cm, 6.850 cm, and 9.656 cm for T1–T4, respectively. The maximum error also increases from 14.234 cm for the black matte square target to 34.745 cm for the glossy black square target. This indicates that integer-pixel edge extraction is highly sensitive to weak edge contrast, local edge discontinuity, and specular reflection. Since the monocular ranging model is inversely related to the extracted target image size, even a small error in pixel-level edge localization can be amplified into a larger distance error, especially when the target becomes smaller at longer distances.
Compared with CannyPixel, the three subpixel methods substantially reduce the ranging error under all target types. For the black matte square target T1, the RMSE values of Fixed-window ERF, SER-CIS, and ProposedDynamicERF_4P are 1.738 cm, 1.746 cm, and 1.496 cm, respectively. The proposed method also obtains the lowest MAE, median error, IQR, and maximum error in this group. This result is consistent with the characteristics of T1: the black matte square target provides a high-contrast and weak-reflection edge, and its straight boundaries produce relatively stable one-dimensional gray-level transitions along the edge-normal direction. Under this condition, the normalized-gradient-entropy-guided dynamic window can select a suitable fitting range and reduce the influence of unnecessary background or target-side samples on the ERF fitting result.
For the black matte circular target T2, SER-CIS and ProposedDynamicERF_4P show close performance. SER-CIS achieves an RMSE of 1.576 cm, while ProposedDynamicERF_4P achieves an RMSE of 1.674 cm. Although the proposed method is slightly higher in RMSE in this group, the difference is small. This behavior is reasonable because the circular target uses edge points distributed around the full circumference for diameter estimation. The circle-fitting process can average local edge localization errors over many radial directions, which benefits both SER-CIS and the proposed method. In contrast to square targets, where the final size is determined by four fitted side lines, the circular target has stronger global averaging in the geometric fitting stage. Therefore, the advantage of the dynamic ERF window is less pronounced for T2 than for T1.
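The averaging effect of circle fitting can be illustrated with a small simulation. The sketch below uses an algebraic least-squares (Kåsa) circle fit, which is an assumption chosen for simplicity and is not necessarily the fitting procedure used in the experiment; the circle parameters, noise level, and function names are likewise hypothetical. With many edge points around the circumference, the error of the fitted radius is typically much smaller than the radial noise of an individual edge point.

```python
import math
import random


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_circle(points):
    """Algebraic (Kasa) fit of x^2 + y^2 + D*x + E*y + F = 0; returns (cx, cy, r)."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    syy = sum(y * y for _, y in points)
    sxy = sum(x * y for x, y in points)
    sz = sum(x * x + y * y for x, y in points)
    sxz = sum(x * (x * x + y * y) for x, y in points)
    syz = sum(y * (x * x + y * y) for x, y in points)
    a = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    b = [-sxz, -syz, -sz]
    d = det3(a)
    solution = []
    for col in range(3):  # Cramer's rule for the 3x3 normal equations
        m = [row[:] for row in a]
        for row in range(3):
            m[row][col] = b[row]
        solution.append(det3(m) / d)
    big_d, big_e, big_f = solution
    cx, cy = -big_d / 2.0, -big_e / 2.0
    return cx, cy, math.sqrt(cx * cx + cy * cy - big_f)


def noisy_circle(cx, cy, radius, n, sigma, rng):
    """Edge points on a circle with Gaussian radial localization noise."""
    pts = []
    for k in range(n):
        t = 2.0 * math.pi * k / n
        r = radius + rng.gauss(0.0, sigma)
        pts.append((cx + r * math.cos(t), cy + r * math.sin(t)))
    return pts


if __name__ == "__main__":
    rng = random.Random(0)
    pts = noisy_circle(120.0, 80.0, 50.0, 360, 0.3, rng)  # hypothetical values
    cx, cy, r = fit_circle(pts)
    print(f"fitted radius error: {abs(r - 50.0):.4f} px (point noise sigma 0.3 px)")
```

For a square target, each side line is fitted from the points on one edge only, so the final size depends on fewer independent directions, which is in line with the weaker averaging described above.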
For the gray matte square target T3, all methods show increased errors compared with T1. The RMSE of CannyPixel increases to 6.850 cm, and the RMSE values of Fixed-window ERF, SER-CIS, and ProposedDynamicERF_4P are 2.135 cm, 1.983 cm, and 1.888 cm, respectively. The degradation is mainly caused by the lower gray-level contrast between the target and the background. A weaker edge contrast reduces the gradient magnitude around the boundary and makes the edge-transition region more vulnerable to image noise and local gray-level fluctuation. Nevertheless, ProposedDynamicERF_4P still achieves the lowest RMSE in T3. This suggests that the entropy-guided window selection can partly compensate for the instability caused by weak edge transitions by adapting the ERF fitting window to the local gradient distribution.
The most challenging case is the glossy black square target T4. In this group, the RMSE values of CannyPixel, Fixed-window ERF, SER-CIS, and ProposedDynamicERF_4P are 9.656 cm, 2.935 cm, 2.294 cm, and 2.355 cm, respectively. The error increase is mainly caused by specular reflection and non-uniform surface brightness. These effects may introduce additional local gradients inside or near the target boundary and make the edge gray-level transition deviate from an ideal monotonic ERF profile. As a result, both pixel-level edge detection and intensity-based subpixel localization become more difficult. Compared with the fixed-window ERF method, SER-CIS and ProposedDynamicERF_4P are more stable in this group. SER-CIS obtains a slightly lower RMSE than the proposed method, while the proposed method remains very close and achieves a comparable error distribution. This indicates that the proposed method is not always the lowest for every individual appearance condition, but it maintains competitive robustness under the most difficult reflective case.
Overall, the target-diversity experiment confirms that subpixel edge localization is necessary for stable monocular ranging under changes in target geometry, gray-level contrast, and surface material. The proposed method achieves the lowest RMSE for T1 and T3 and comparable performance to SER-CIS for T2 and T4. Across all target types, ProposedDynamicERF_4P provides better overall accuracy than CannyPixel and fixed-window ERF, demonstrating that normalized-gradient-entropy-guided window selection improves the adaptability of ERF fitting under non-uniform edge-transition conditions. Meanwhile, the results also show that strong specular reflection remains a challenging factor, because it can alter the local gray-level profile and reduce the reliability of edge-size extraction even for subpixel methods.