Article
Peer-Review Record

Joint Modeling of Pixel-Wise Visibility and Fog Structure for Real-World Scene Understanding

Atmosphere 2025, 16(10), 1161; https://doi.org/10.3390/atmos16101161
by Jiayu Wu 1,2,†, Jiaheng Li 1,†, Jianqiang Wang 3, Xuezhe Xu 3, Sidan Du 1,* and Yang Li 1,2,*
Reviewer 1: Anonymous
Reviewer 3: Anonymous
Submission received: 10 September 2025 / Revised: 29 September 2025 / Accepted: 2 October 2025 / Published: 4 October 2025
(This article belongs to the Section Atmospheric Techniques, Instruments, and Modeling)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

I have some general questions first:

1) Have you conducted any sensitivity study on image color choice, since it affects the segmentation results?

2) Have you made any comparison between monocular and stereo images using your framework?

3) What was your approach to sampling images? Was there a balance between foggy cases and clear ones?

4) How do you justify the choice of camera placement?

Lines 81-85: How was the threshold selection for the confidence score assessed?

Lines 184-196: How does your algorithm distinguish a cloudy background with no fog from fog cases?

Lines 200-209: How do the formulas used take weather conditions into account?

 

 

Author Response

Comments 1: Have you conducted any sensitivity study on image color choice, since it affects the segmentation results?

 

Response 1: Thank you for raising this important point. We appreciate the opportunity to clarify.

 

While we did not conduct a dedicated sensitivity study on image color variations, we would like to clarify that our training pipeline includes color jittering as part of the data augmentation strategy to improve robustness to color differences. Specifically, we randomly adjust brightness, contrast, saturation, and hue during training to simulate diverse lighting and imaging conditions as described in Section 6.1.1 of the revised manuscript (pp. 14, lines 472–474). By exposing the model to a wide range of color variations, this augmentation helps enhance its generalization capability under real-world scenarios where color and illumination may vary significantly. We believe this provides a practical mitigation against potential color-related biases in segmentation and downstream visibility estimation, even without a formal ablation on color sensitivity.
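For illustration, the following is a minimal sketch of the kind of color jittering described above, assuming a PyTorch/torchvision training pipeline; the jitter ranges are illustrative choices, not values taken from the manuscript.

```python
import torchvision.transforms as T

# Illustrative augmentation: randomly perturb brightness, contrast, saturation, and hue
# so the model is exposed to a wide range of color and illumination conditions.
train_augment = T.Compose([
    T.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3, hue=0.05),
    T.ToTensor(),
])

# augmented = train_augment(pil_image)  # applied to each training RGB frame
```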

 

Comments 2: Have you made any comparison between monocular and stereo images using your framework?

 

Response 2: We have conducted comparative evaluations of monocular and stereo-based visibility estimation in outdoor scenarios and observed that performance with monocular inputs is significantly inferior to that with stereo inputs. We provide a detailed analysis below.

 

Our visibility estimation framework fundamentally relies on depth maps with absolute metric scale as a critical input. The accuracy of the estimated visibility is directly dependent on the accuracy of the depth input. In principle, any method capable of providing reliable absolute-scale depth could be used. However, in practice, stereo vision systems offer geometric constraints that enable the recovery of physically accurate depth with true scale, ensuring consistency across diverse scenes. In contrast, monocular depth estimation methods typically rely on semantic priors and learned scale assumptions, often resulting in scale ambiguity and a tendency to overfit to training data distributions. Therefore, our system is designed around stereo image inputs to ensure the reliability and metric accuracy of the depth estimation.

 

Comments 3: What was your approach to sampling images? Was there a balance between foggy cases and clear ones?

 

Response 3: In constructing the dataset, we collected a diverse set of clear-weather RGB images from multiple sources, including road surveillance cameras, publicly available internet datasets, and simulation platforms as noted in Section 4.2.1 of the revised manuscript (pp. 7, lines 240-244). These clear-weather images were assumed to represent conditions of “infinite visibility” and served as the baseline for fog synthesis.

 

We applied the Koschmieder atmospheric model to synthesize foggy images. To ensure balanced coverage across visibility conditions, we uniformly sampled visibility values within the 0–5 km range, which covers both severe fog (e.g., ~50 m) and moderate conditions (~5 km). This sampling strategy ensures a uniform distribution of samples across different visibility levels, thereby achieving a balanced representation of both clear and foggy scenes in the training and evaluation sets.
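As an illustration of this synthesis scheme, the sketch below renders fog with the Koschmieder model under uniform visibility sampling. The function name, the 3.912 constant (corresponding to a 2% contrast threshold), and the default airlight range (taken from the grayscale range mentioned in Response 7) are illustrative assumptions, not the exact implementation in the manuscript.

```python
import numpy as np

def synthesize_fog(rgb, depth_m, visibility_m, airlight=None, rng=None):
    """Render synthetic fog with the Koschmieder model: I = J*t + A*(1 - t), t = exp(-beta*d).

    rgb:          clear-weather image as a float array in [0, 1], shape (H, W, 3)
    depth_m:      metric depth map in meters, shape (H, W)
    visibility_m: target meteorological visibility in meters
    """
    rng = rng or np.random.default_rng()
    beta = 3.912 / visibility_m                      # extinction coefficient (2% contrast threshold)
    if airlight is None:
        airlight = rng.uniform(192, 202) / 255.0     # daylight airlight range (see Response 7)
    t = np.exp(-beta * depth_m)[..., None]           # per-pixel transmission, broadcast over RGB
    return rgb * t + airlight * (1.0 - t)

# Uniformly sample visibility within the 0-5 km range (lower-bounded so beta stays finite):
# visibility = np.random.uniform(50, 5000)
# foggy = synthesize_fog(clear_rgb, depth_map, visibility)
```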

 

Comments 4: How do you justify the choice of camera placement?

 

Response 4: In the actual deployment of our system, as described in Fig. 10 and Section 6.3.1 of the revised manuscript (pp. 20, lines 636-641), the stereo camera was mounted on the rooftop of a building. This position allows the camera to capture a long-range horizontal scene while minimizing occlusions, which is critical for visibility estimation in outdoor environments. To enable direct comparison with the co-located FSVM (forward scatter visibility meter), the camera was installed in close proximity to it. This spatial colocation avoids mutual interference between sensors while ensuring both operate under similar atmospheric conditions. Furthermore, we selected the camera's pointing direction based on site-specific conditions, favoring orientations with minimal occlusions, stable background structures, and low dynamic interference. This deployment configuration simultaneously meets the requirements of observation coverage, spatial comparability with ground-based instruments, and long-term operational stability.

 

We have enhanced Section 6.3.1 of the revised manuscript (pp. 20, lines 636-641) by adding information on the placement of the equipment.

 

Comments 5: Lines 81-85: How was the threshold selection for the confidence score assessed?

 

Response 5: After training the fog detection network, we conducted an evaluation on the validation set and computed the corresponding precision and recall values. Based on this analysis, we plotted the precision-recall (PR) curve and selected the threshold that maximizes the F1 score, which provides an optimal trade-off between precision and recall, as stated in Section 6.2.1 of the revised manuscript (pp. 17, lines 562–563). This threshold is used as the decision boundary for determining the presence of patchy fog in the final system.
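A minimal sketch of this data-driven threshold selection, assuming binary fog/no-fog labels and per-sample confidence scores from the validation set; the helper name is hypothetical.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

def f1_optimal_threshold(y_true, y_score):
    """Return the confidence threshold that maximizes F1 on the validation set."""
    precision, recall, thresholds = precision_recall_curve(y_true, y_score)
    # precision/recall have one more entry than thresholds; drop the final point.
    f1 = 2 * precision[:-1] * recall[:-1] / np.clip(precision[:-1] + recall[:-1], 1e-12, None)
    return thresholds[int(np.argmax(f1))]
```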

 

This data-driven strategy ensures our detector achieves balanced performance, effectively identifying localized fog regions while minimizing false alarms, making it suitable for real-world deployment.

 

Comments 6: Lines 184-196: How does your algorithm distinguish a cloudy background with no fog from fog cases?

 

Response 6: Our algorithm does not explicitly perform semantic differentiation between clouds and fog based on visual appearance alone. Instead, it leverages depth-aware modeling to implicitly distinguish these phenomena based on their spatial and geometric properties. Specifically, our visibility estimation framework operates on the observation that fog occurs close to the ground and affects visibility in the near-to-mid range (i.e., within the lower depth layers), whereas clouds are typically located at very large depths (approaching infinity in the depth map).

 

As illustrated in Section 5.3 of the revised manuscript (pp. 9, lines 324–326), the visibility calculation is performed per pixel in conjunction with its metric depth. Therefore, even if a sky region appears bright and hazy, its large depth value prevents it from being interpreted as a low-visibility area. In contrast, true fog patches are spatially correlated with foreground or midground regions and exhibit depth-dependent attenuation.

 

This depth-based reasoning allows our method to implicitly distinguish between cloud-covered skies and actual near-ground fog without requiring additional semantic segmentation modules.
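To make this depth-aware reasoning concrete, here is a simplified sketch of how a per-pixel extinction-coefficient map and a metric depth map could be combined; the sky-depth cut-off, the 5 km cap, and the Koschmieder constant are illustrative assumptions rather than the exact implementation in the manuscript.

```python
import numpy as np

SKY_DEPTH_M = 10_000.0   # illustrative cut-off: pixels this far away are treated as sky/cloud

def pixelwise_visibility(beta_map, depth_m, max_visibility_m=5_000.0):
    """Convert a per-pixel extinction-coefficient map into a capped visibility map.

    beta_map: predicted extinction coefficient per pixel (1/m)
    depth_m:  metric depth per pixel (m); very large depths correspond to sky or clouds
    """
    visibility = 3.912 / np.clip(beta_map, 1e-6, None)     # Koschmieder inversion
    visibility = np.minimum(visibility, max_visibility_m)  # cap at the effective sensing range
    # Bright, hazy sky regions sit at near-infinite depth, so they are masked out of the
    # low-visibility statistics instead of being read as near-ground fog.
    valid = depth_m < SKY_DEPTH_M
    return visibility, valid
```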

 

Comments 7: Lines 200-209: How do the formulas used take weather conditions into account?

 

Response 7: The visibility estimation in our method is based on the classical Koschmieder atmospheric model, which is primarily applicable to fog conditions under stable illumination. This model does not explicitly account for the physical dynamics of other complex weather phenomena such as rain, snow, or dust storms. Instead, it focuses on modeling the relationship between image appearance and visibility distance under foggy conditions.

 

To improve robustness across varying daylight conditions, we set the airlight intensity within a fixed range of 192-202 in grayscale values as mentioned in Section 4.2.3 of the revised manuscript (pp. 8, line 268), which corresponds to typical daylight illumination levels observed in real-world foggy scenes. Furthermore, during training, we apply data augmentation including random adjustments of brightness and contrast to improve the model’s adaptability to illumination variations.

 

We acknowledge that in extreme weather conditions, such as heavy rain or snowfall, or when the camera lens is contaminated, the image quality may degrade significantly, potentially affecting the visibility prediction as noted in Section 6.3.2 of the revised manuscript (pp. 21, lines 661–663). This represents a current limitation of our system. In future work, we plan to investigate algorithmic enhancements and multi-modal sensor fusion to improve robustness under adverse weather.

 

In summary, although the formula we use is primarily targeted at foggy scenes, the realistic airlight settings and illumination-related augmentation help broaden its adaptability to different weather conditions.

Reviewer 2 Report

Comments and Suggestions for Authors

Please refer to the attachment.

Comments for author File: Comments.pdf

Comments on the Quality of English Language

Reduce the use of long phrases and opt for more direct constructions.
Avoid unnecessary repetition of previously explained concepts.
Maintain consistency in technical terminology throughout the document.
Simpler language in results sections: currently, highly technical phrases ("quantile-specific heterogeneity," "distributional asymmetries") are used, which could be alternated with more accessible explanations so that the findings can be understood without the need for a strong econometric background.
A general review of the language by a native speaker is recommended.

Author Response

Comments 1: Style and Clarity

 

Some sentences are too long and dense, which can make reading difficult. I would recommend breaking complex sentences into shorter statements to improve flow.

 

It is suggested that the transition between some methodological sections be improved, especially between the explanation of the mathematical model and the description of the neural network.

 

Response 1: Thank you for these valuable suggestions on improving the clarity and readability of our manuscript. We have revised the text accordingly.

 

We have identified and split some long and complex sentences, especially in Sections 1 to 3, into shorter and clearer statements to improve readability and logical fluency. For example, we reworked the description of the significance of visibility estimation in the Introduction, the limitations of monocular visibility estimation in the Related Work, the description of atmospheric scattering models in the Problem Setup, and the description of some design aspects of the visibility estimation module in the Method. These revisions better separate conceptual explanations from technical implementation details.

 

Regarding the transition between the mathematical model and the neural network description, we have inserted a dedicated transitional paragraph in Section 3 of the revised manuscript (pp. 5, lines 201-205) to explicitly highlight how the atmospheric scattering model informs key aspects of our network and system design, including (i) the use of extinction-coefficient prediction as the primary supervision target, detailed in Section 5.3.4 of the revised manuscript (pp. 12, lines 378-385), (ii) the use of depth context to estimate visibility, mentioned in Section 1 of the revised manuscript (pp. 2, lines 69-72), and (iii) the training strategy for visibility estimation, described in Section 5.5 of the revised manuscript (pp. 13, lines 437-450). This revision makes clear that while the internal modules are learned in a data-driven manner, the system-level architecture and design choices are explicitly guided by the underlying visibility physics.

 

We have made the following modifications in the revised manuscript:

  • Section 1 (pp. 1-3, lines 21-26, 39-40, 46-49, 95-99).
  • Section 2 (pp. 3-4, lines 127-128, 153-158).
  • Section 3 (pp. 5, lines 187-189, 201-205).
  • Section 5.3.4 (pp. 12, lines 373-376).
  • Section 5.3.5 (pp. 12, lines 389-390, line 396).

 

Comments 2:

Results and Discussion

 

While the quantitative results are solid, the comparative discussion section could delve deeper into the limitations compared to other approaches (e.g., cases where the model fails or extreme weather conditions such as heavy rain).

 

The error graphs (RMSE, AbsRel) are correct, but adding confidence intervals or statistical variability would make the conclusions more robust.

 

Response 2: Regarding the analysis of model limitations, we have conducted a further in-depth investigation. We include an analysis of the classification performance reported in Table 4 and Section 6.2.1 of the revised manuscript (pp. 16, lines 546-559) that discusses failure cases such as false positives and false negatives. Furthermore, in Section 6.3.2 of the revised manuscript (pp. 21, lines 661–663), we explicitly discuss performance degradation under extreme weather conditions, particularly when heavy rain or dust contaminates the lens, which may significantly degrade image quality and affect the visibility prediction. These analyses highlight the practical boundaries of current methods in real-world deployment.

 

Concerning the suggestion to include confidence intervals or statistical variability, we appreciate this important point. We adopt stochastic inference at test time to estimate prediction stability. Specifically, during evaluation, we introduce lightweight input perturbations, including small Gaussian noise and color jitter, across five forward passes per sample. The standard deviation of the resulting predictions serves as a proxy for uncertainty.
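A minimal sketch of this test-time perturbation scheme, assuming a PyTorch model that maps an image tensor in [0, 1] to a visibility estimate; the noise level is an illustrative choice, and the color-jitter perturbation is omitted for brevity.

```python
import torch

@torch.no_grad()
def stochastic_inference(model, image, n_passes=5, noise_std=0.01):
    """Run several perturbed forward passes and use their spread as an uncertainty proxy."""
    preds = []
    for _ in range(n_passes):
        noisy = (image + noise_std * torch.randn_like(image)).clamp(0.0, 1.0)
        preds.append(model(noisy))
    preds = torch.stack(preds)
    return preds.mean(dim=0), preds.std(dim=0)   # mean prediction and its standard deviation
```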

 

We have made the following modifications in the revised manuscript:

  • We include an analysis of the precision and recall reported in Table 4 and Section 6.2.1 of the revised manuscript (pp. 16, lines 546-559) that discusses failure cases such as false positives and false negatives.
  • We report the mean and standard deviation of each metric over the full test set in Section 6.2.1 of the revised manuscript (pp. 15, lines 507-514) to demonstrate the consistency of the algorithm's performance.

 

Comments 3:

Practical Implementation Aspects

 

The description of the embedded device is interesting, but it would be valuable to add a brief analysis of energy consumption and response times for real-time traffic scenarios.

 

In the actual validation section, the comparison with the FSVM (Forward Scatter Visibility Meter) is adequate, but a more detailed analysis of discrepancies in long measurements (>5 km) would be helpful.

 

Response 3: Regarding the efficiency of the system, as described in Section 6.3.1 of the revised manuscript (pp. 19, lines 616–624), our network is deployed using lightweight optimization techniques on an RK3588 edge device equipped with a GPU and an NPU. During inference, a single forward pass takes approximately 200 ms. Although the system outputs visibility data once per minute in actual deployment, it can monitor at higher frequencies, faster than typical weather changes, enabling real-time visibility detection and rapid fog warning.

 

In terms of power consumption, the entire visual system, including stereo cameras, consumes 0.3–0.7 W in the idle state and 5.7–6.5 W in the active state, demonstrating excellent energy efficiency suitable for long-term outdoor deployments. 

 

Regarding the performance of long-range measurements, we clarify that, due to inherent limitations in camera sensor resolution and depth-cue availability under extremely clear conditions, we cap the maximum estimated visibility at 5 km. This is a practical design choice that reflects the physical constraints of the device and aligns with the operational focus of the fog warning system. Our method is primarily designed for medium-to-low visibility conditions, which are most critical for traffic safety and operational decision-making.

 

Although the output is set with an upper limit, the algorithm still ensures robustness and avoids unreliable inference in situations with high visual contrast gradients. As shown in Figure 11(b), in real-world scenes with very high visibility (>5 km), our system consistently outputs the capped value. This demonstrates stable and predictable performance: rather than producing erratic or divergent predictions when visibility exceeds the effective sensing range, our method saturates gracefully. In future work, we expect to use higher-resolution cameras and longer stereo baselines to extend the effective sensing range, thereby increasing the upper bound of reliable visibility estimation.

 

We have enhanced Section 6.3.1 of the revised manuscript (pp. 19-20, lines 624–627, 631-635) by adding detailed discussions on inference latency and power consumption, thereby providing a more comprehensive view of the system's engineering applicability.

 

Comments 4:

Language and presentation

 

The language is adequate, but there are minor editorial issues that could be refined (e.g., "expensive experiments" could be replaced with "extensive experiments" to avoid ambiguity).

 

Review by a native speaker is recommended.

 

Some tables and figures could benefit from improved graphic design (e.g., uniform scales, improved legibility of axis labels).

 

The article is solid, innovative, and has been improved in its writing, with clear contributions to visibility estimation and fog detection in intelligent transportation applications.

 

It only requires minor revisions in style, clarity of writing, and some practical aspects of the discussion. With these improvements, the manuscript is fully publishable.

 

Response 4: We fully agree with the reviewer’s suggestions regarding language refinement, clarity, and graphical presentation. In response to these comments, we have carefully revised the manuscript as follows:

 

We have thoroughly reviewed the entire manuscript to address minor editorial issues. For instance, as suggested, the phrase “expensive experiments” has been replaced with “extensive experiments” to avoid ambiguity. Additional improvements include enhanced sentence clarity (as detailed in Response 1) and consistent terminology (e.g., uniform reference to FSVM throughout the paper). All language refinements have been carried out by the authors through careful revision, ensuring both technical accuracy and improved readability without external assistance.

 

Regarding the graphical presentation, we have conducted a comprehensive review of all tables and figures to improve their visual clarity and consistency. Specifically, we have ensured uniform scales across all plots within the same experiment or comparative analysis to facilitate accurate visual comparison. For figures where axis label legibility was affected by layout constraints, such as Figure 9 and Figure 12 of the revised manuscript, we have increased the size of the subfigures and adjusted the layout to enhance readability.

 

Comments 5:

Quality of English Language

 

Reduce the use of long phrases and opt for more direct constructions.

 

Avoid unnecessary repetition of previously explained concepts.

 

Maintain consistency in technical terminology throughout the document.

 

Simpler language in results sections: currently, highly technical phrases ("quantile-specific heterogeneity," "distributional asymmetries") are used, which could be alternated with more accessible explanations so that the findings can be understood without the need for a strong econometric background.

 

A general review of the language by a native speaker is recommended.

 

Response 5: We thank the reviewer for the language feedback. We have revised the manuscript to use shorter, more direct sentence structures, reduce repetition of previously explained concepts, and ensure consistent technical terminology throughout.

Reviewer 3 Report

Comments and Suggestions for Authors
  • In the process of explaining Figure 7, there is a section stating, “After removing these erroneous outliers, on the patchy fog dataset, we obtain an AbsRel of 0.086, SqRel of 0.012, RMSE of 236.273, with a maximum distance error of 1166.941m” I would like to ask how many percent of the data were removed and what numerical criteria were used to identify the outliers in that section.
  • The paper explains that patchy fog is synthesized by introducing “variations in the form of a normal distribution (adding an extinction coefficient)” to uniform fog. However, the proposed patchy fog detection function has only been validated on the synthetic dataset, and no results for this are presented in the actual field test (Section 6.3). As this is a feature highlighted as one of the main contributions, I would like to ask whether there are experimental results from real-world environments.
  • This paper conducted a comparative evaluation by installing the proposed camera-based approach and FSVM side-by-side. However, the two devices fundamentally differ in their spatial coverage range. While FSVM measures local atmospheric scattering around the sensor, the camera-based method reflects the entire field of view. This difference raises questions about comparing the two methods using the same metric (visibility, m). I would like to ask how you accounted for synchronization at the same location and time, correction for differences in field of view angle and distance, variations in observation altitude or background brightness, and what points require caution when interpreting the results from both systems.
  • The proposed method in this paper was evaluated using regression metrics such as AbsRel, SqRel, and RMSE, as well as classification metrics like the PR curve, AUC, and F1-score. However, considering real-world traffic and aviation applications, metrics such as accuracy based on critical visibility distance, real-time processing performance (fps, latency), and power efficiency are also considered important. I would like to inquire whether results have been calculated for these metrics.
  • In the paper, you stated, “The system is intended for deployment in intelligent transportation and aviation monitoring scenarios, providing real-time visibility information under adverse weather.” However, for traffic and aviation monitoring, I believe that not only accuracy but also the detection failure rate (false alarms and omissions) in sections below a specific critical visibility distance is crucial. I would like to inquire whether there are any experimental results regarding this.

Author Response

Comments 1: In the process of explaining Figure 7, there is a section stating, “After removing these erroneous outliers, on the patchy fog dataset, we obtain an AbsRel of 0.086, SqRel of 0.012, RMSE of 236.273, with a maximum distance error of 1166.941m” I would like to ask how many percent of the data were removed and what numerical criteria were used to identify the outliers in that section.

 

Response 1: Thank you for this thoughtful comment. We appreciate the opportunity to clarify an important aspect regarding outlier handling in our evaluation.

 

Since our data are collected automatically, some unreasonable scenes remain in the dataset despite careful filtering, such as images dominated by nearby buildings or large areas of open sky. Direct visibility estimation on such ambiguous regions, which lack reliable depth context, can lead to erroneous results. We retained these samples for further analysis; in our model, they typically produce very low visibility outputs.

 

These failure cases are primarily caused by suboptimal camera placement rather than model deficiencies. To reflect the model's capability fairly and accurately, we manually removed these outlier samples during quantitative evaluation, without applying a strict numerical threshold.

 

In total, 69 of the 5,300 test samples were excluded as outliers, approximately 1.3% of the total. We believe this has a negligible impact on the reported quantitative metrics and does not affect the validity of our dataset generation pipeline or the proposed model.

 

We explicitly mark these outlier cases in Figure 7 and provide a clearer textual explanation in Section 6.2.1 of the revised manuscript (pp. 15, lines 499–504).

 

Comments 2: The paper explains that patchy fog is synthesized by introducing “variations in the form of a normal distribution (adding an extinction coefficient)” to uniform fog. However, the proposed patchy fog detection function has only been validated on the synthetic dataset, and no results for this are presented in the actual field test (Section 6.3). As this is a feature highlighted as one of the main contributions, I would like to ask whether there are experimental results from real-world environments.

 

Response 2: In the original manuscript, the quantitative evaluation of patchy fog detection was primarily conducted on the synthetic dataset, because the original deployment site did not experience patchy fog conditions, limiting the opportunity for validation. To address this limitation, we have established an additional deployment on a high-altitude expressway section in Guizhou Province, China, a region known for frequent, naturally occurring patchy fog due to its unique topography and climate. Although this new site is not equipped with an FSVM and lacks reference data, we can still provide algorithm outputs and corresponding analyses from real traffic environments.

 

In the revised manuscript, we present representative image samples covering various scenarios, including uniform fog (high/low visibility) and patchy fog (high/low visibility), along with the corresponding visibility maps, visibility distribution curves, and detection responses from the patchy fog module. These cases demonstrate that the system is capable of identifying patchy fog regions in complex real-world traffic scenes and producing spatially coherent and reasonable visibility estimates.

 

We include these illustrations and analysis in Figure 12 and Section 6.3.3 of the revised manuscript (pp. 21, lines 672–683).

 

Comments 3: This paper conducted a comparative evaluation by installing the proposed camera-based approach and FSVM side-by-side. However, the two devices fundamentally differ in their spatial coverage range. While FSVM measures local atmospheric scattering around the sensor, the camera-based method reflects the entire field of view. This difference raises questions about comparing the two methods using the same metric (visibility, m). I would like to ask how you accounted for synchronization at the same location and time, correction for differences in field of view angle and distance, variations in observation altitude or background brightness, and what points require caution when interpreting the results from both systems.

 

Response 3: You have correctly highlighted a fundamental challenge in evaluating vision-based visibility estimation systems: the inherent differences in measurement principles and spatial coverage between FSVM and camera-based methods as noted in Section 6.3.1 of the revised manuscript (pp. 20, lines 645–649). We fully agree that this distinction must be carefully considered when interpreting comparison results.

 

Visibility is inherently a macroscopic parameter that characterizes the large-scale transparency of the atmosphere over a given path. Vision-based methods estimate visibility directly through observed image degradation across natural scenes, such as contrast loss, color shift, and haze accumulation. This perceptual proxy aligns closely with human visual experience, offering a direct, scene-level assessment of visibility.

 

In contrast, FSVM provides an indirect, point-based inference of visibility by measuring the angular scattering of light over a short baseline. While widely adopted in meteorology due to standardization and reliability, FSVMs do not measure visibility precisely; instead, they infer it under assumed atmospheric uniformity. This introduces several limitations: (i) their narrow sampling volume may not represent path-averaged conditions, especially in non-uniform fog; (ii) their output is sensitive to local turbulence, contamination, and installation height; and (iii) they lack spatial context, making them blind to scene structure and distant visibility variations.

 

Despite these differences, and the fact that the FSVM itself is an indirect method, we use its results as a practical reference because it remains the most widely accepted instrument in operational settings and no direct, path-integrated ground-truth sensor currently exists for outdoor visibility. Accordingly, we perform a trend and temporal consistency analysis, rather than comparing absolute accuracy, to evaluate how well our vision-based estimates align with the temporal dynamics of atmospheric visibility. As demonstrated in our results in Figure 11 and Section 6.3.2 (pp. 21, lines 653-670), our method benefits from a larger field of view, spatial integration over extended paths, and direct sensitivity to perceptual degradation, enabling it to produce more stable and responsive visibility estimates, particularly in dynamic or spatially heterogeneous fog conditions.
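As one concrete way to quantify such trend agreement (the manuscript describes this analysis qualitatively, so the statistic below is an illustrative assumption), rank correlation between the time-aligned camera and FSVM series emphasizes temporal dynamics over absolute values.

```python
import numpy as np
from scipy.stats import spearmanr

def trend_consistency(vis_camera, vis_fsvm):
    """Rank correlation between time-aligned camera-based and FSVM visibility series."""
    rho, p_value = spearmanr(np.asarray(vis_camera), np.asarray(vis_fsvm))
    return rho, p_value
```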

 

We have made some modifications in Section 6.3.1 of the revised manuscript (pp. 20, lines 648–652) to add explanations of the comparability between our method and the FSVM.

 

Comments 4: The proposed method in this paper was evaluated using regression metrics such as AbsRel, SqRel, and RMSE, as well as classification metrics like the PR curve, AUC, and F1-score. However, considering real-world traffic and aviation applications, metrics such as accuracy based on critical visibility distance, real-time processing performance (fps, latency), and power efficiency are also considered important. I would like to inquire whether results have been calculated for these metrics.

 

Response 4: We understand that "critical visibility distance" refers to the threshold used in applications such as traffic management or aviation to trigger safety decisions. While our method is primarily designed for continuous visibility estimation rather than discrete classification, we have added an accuracy evaluation across standard visibility levels in Table 4 and Section 6.2.1 of the revised manuscript (pp. 16-17, lines 546–559). Because our method outputs regression-based visibility estimates, it allows flexible threshold setting, adapting to different application scenarios.

 

Regarding real-time performance, as described in Section 6.3.1 of the revised manuscript (pp. 19, lines 617–624), our network is deployed on an RK3588 edge device equipped with a GPU and an NPU using lightweight optimization techniques. During inference, a single forward pass takes approximately 200 ms. Although the system outputs visibility data once per minute in actual deployment, it can monitor at higher frequencies, faster than typical weather changes, enabling real-time visibility detection.

 

Regarding power efficiency, the entire visual system, including the stereo cameras, consumes 0.3–0.7 W in the idle state and 5.7–6.5 W in the active state, demonstrating excellent energy efficiency suitable for long-term outdoor deployments.

 

In summary, our method not only performs well in academic metrics but also shows practical potential in terms of critical response capabilities, real-time performance, and power efficiency.

 

We have added an accuracy evaluation across standard visibility levels in Table 4 and Section 6.2.1 of the revised manuscript (pp. 16-17, lines 546–559), and we have enhanced Section 6.3.1 of the revised manuscript (pp. 19-20, lines 624–627, 631-635) by adding detailed discussions on inference latency and power consumption.

 

Comments 5: In the paper, you stated, “The system is intended for deployment in intelligent transportation and aviation monitoring scenarios, providing real-time visibility information under adverse weather.” However, for traffic and aviation monitoring, I believe that not only accuracy but also the detection failure rate (false alarms and omissions) in sections below a specific critical visibility distance is crucial. I would like to inquire whether there are any experimental results regarding this.

 

Response 5: It is important to note that our work focuses on continuous, regression-based visibility estimation rather than modeling whether visibility falls below a certain threshold as a classification problem. Therefore, we did not directly report detection failure rates in the original manuscript. Nonetheless, we have additionally evaluated performance in a graded setting, as illustrated in Table 4 and Section 6.2.1 (pp. 16-17, lines 546-559). By dividing visibility into intervals, we computed precision and recall per level, showing strong discriminative ability across operational ranges.
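For illustration, a sketch of how continuous visibility estimates can be evaluated per level; the interval boundaries below are hypothetical placeholders, not the bins used in Table 4.

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

BIN_EDGES = [200, 500, 1000, 2000]   # hypothetical level boundaries in meters

def graded_precision_recall(vis_true, vis_pred):
    """Bin continuous visibility values into levels and compute precision/recall per level."""
    y_true = np.digitize(vis_true, BIN_EDGES)
    y_pred = np.digitize(vis_pred, BIN_EDGES)
    precision, recall, _, _ = precision_recall_fscore_support(
        y_true, y_pred, labels=list(range(len(BIN_EDGES) + 1)), zero_division=0)
    return precision, recall
```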

 

Additionally, we have already analyzed typical failure cases in the manuscript. For example, some outliers in Figure 7 arise from scenarios where the scene is too close or dominated by sky views, due to suboptimal camera placement limiting effective observation range. Figure 11 (c) illustrates how raindrops or lens dirt can degrade image quality, leading to abnormal fluctuations in visibility estimates. These analyses indicate that the primary sources of error stem from deployment constraints and extreme imaging disturbances, rather than inherent flaws in the model itself.

 

We extend the analysis in Table 4 and Section 6.2.1 of the revised manuscript (pp. 16-17, lines 546-559) to include detailed examination of missed detections and false positives within specified visibility intervals, providing a comprehensive view of the system's real-world performance.

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

The authors addressed all observations
