Robust Visual Recognition in Poor Visibility Conditions: A Prior Knowledge-Guided Adversarial Learning Approach

Abstract: Deep learning has achieved remarkable success in numerous computer vision tasks. However, recent research reveals that deep neural networks are vulnerable to natural perturbations from poor visibility conditions, limiting their practical applications. While several studies have focused on enhancing model robustness in poor visibility conditions through techniques such as image restoration, data augmentation, and unsupervised domain adaptation, these efforts are predominantly confined to specific scenarios and fail to address multiple poor visibility scenarios encountered in real-world settings. Furthermore, the valuable prior knowledge inherent in poor visibility images is seldom utilized to aid in resolving high-level computer vision tasks. In light of these challenges, we propose a novel deep learning paradigm designed to bolster the robustness of object recognition across diverse poor visibility scenes. By observing the prior information in diverse poor visibility scenes, we integrate a feature matching module based on this prior knowledge into our proposed learning paradigm, aiming to facilitate deep models in learning more robust generic features at shallow levels. Moreover, to further enhance the robustness of deep features, we employ an adversarial learning strategy based on mutual information. This strategy combines the feature matching module to extract task-specific representations from low visibility scenes in a more robust manner, thereby enhancing the robustness of object recognition. We evaluate our approach on self-constructed datasets containing diverse poor visibility scenes, including visual blur, fog, rain, snow, and low illuminance. Extensive experiments demonstrate that our proposed method yields significant improvements over existing solutions across various poor visibility conditions.


Introduction
Recent advances in deep learning have led to remarkable success in computer vision tasks, including object recognition [1], object detection [2], and semantic segmentation [3]. Despite these achievements, deep learning models still face significant challenges when applied to real-world scenarios. One of the most critical issues is the presence of poor visibility conditions, which introduce natural perturbations that degrade image quality in various ways, including the loss of texture information, object shape distortion, and partial occlusion. Studies have shown that deep models suffer significant performance degradation under poor visibility conditions [4,5].
To address this problem, previous studies utilize image restoration techniques [6][7][8][9] to recover the damaged visual content, which is then fed into deep discriminative models for high-level tasks. These restoration methods are often limited in their generalization due to their specificity to certain types of scenes, such as image de-raining for rainy scenes. Another popular solution is unsupervised domain adaptation. In this setting, labeled clear images are employed as the source domain, whereas unlabeled images from scenes with poor visibility are utilized as the target domain. By learning domain-invariant features, unsupervised domain adaptation can improve the model performance in the target domain. While some studies have investigated unsupervised domain adaptation in a limited number of poor visibility scenarios, such as day-to-night variation [10], the majority of poor visibility scenes remain largely unexplored. Additionally, other works have designed various image augmentation strategies [11][12][13] to improve the model robustness against multiple visual perturbations.
The above-mentioned studies have demonstrated the efficacy of image restoration and unsupervised domain adaptation in enhancing the performance of object recognition in single poor visibility scenarios. However, there is a lack of research exploring the integration of key ingredients from these two directions. For instance, the prior knowledge widely employed in image restoration tasks, such as the dark channel prior used for deblurring and dehazing [14,15], is seldom utilized in high-level semantic tasks. Most unsupervised domain adaptation research focuses on the consistency of deep semantic representations across domains while ignoring the essential prior information embedded in shallow features. Therefore, we aim to leverage traditional prior knowledge effectively within the paradigm of unsupervised domain adaptation, thereby elevating the robustness of object recognition models across diverse types of poor visibility scenarios.
In this paper, we propose a new deep learning-based approach called Prior Knowledge-guided Adversarial Learning (PKAL) to enhance visual recognition in multiple poor visibility conditions. Based on our observations, the prior knowledge of intermediate features differs between clean and blurry images, and the dark and bright channel priors [15] of the intermediate features lose much meaningful content due to visual blur, as shown in Figure 1. Hence, it is difficult to extract sufficient semantic representations, which leads to a performance drop in high-level tasks. This phenomenon is widely observed in other poor visibility conditions as well. Moreover, prior studies have attempted to capture prior knowledge of low-level features and align these pivotal visual cues, ultimately enhancing the performance of downstream tasks [16][17][18]. To this end, we design a Feature Priors Matching Module (FPMM) to discern the discrepancies of prior knowledge-based features between clean and low-quality images and to suppress them during training. Under the constraints imposed by FPMM, deep models preserve more meaningful information in the shallow layers for low-quality data, thus enhancing model robustness in poor visibility conditions.
Considering that the FPMM works in the shallow layers for robust and generic features, we propose a novel Mutual Information-based Robust Training (MIRT) strategy to improve the robustness of task-specific features. Concretely, MIRT establishes an adversarial learning mechanism between two feature generators and one class discriminator to enhance robust deep representations and to refine decision boundaries simultaneously. At the beginning of training, one feature generator is equipped with FPMM while the other is not. Thus, the former generator extracts more robust features than the latter. The class discriminator accepts the robust features and rejects the others. Mutual information [19] is employed to quantify the receptive strength of deep features. Under the adversarial mechanism, MIRT encourages the generator with FPMM to generate robust features continuously, whereas the discriminator refines decision boundaries to reject sensitive features. Ultimately, the feature generator with FPMM and the class discriminator comprise a robust model for visual recognition under poor visibility conditions.

Figure 1. From left to right for each image, we show the raw data, dark channel features, and bright channel features. We can observe that the reduction of bright and dark channel features induced by visual blur may remove significant object details and thus lose semantic information.
To validate the efficacy of our proposed approach, we build a comprehensive dataset of poor visibility scenes with several common perturbations, including visual blur, fog, rain, snow, and low illuminance. The dataset comprises both real-world samples and synthetically generated data that simulate realistic settings. Furthermore, we conduct a comparative analysis between our approach and 16 established methods renowned for their effectiveness in image restoration, domain adaptation, and model robustness. Through extensive experimentation, we demonstrate that our approach outperforms the majority of existing methods in various poor visibility scenarios, while achieving comparable performance to the remaining ones. In summary, the main contributions of this paper are as follows:
• We propose a novel deep learning-based approach, PKAL, to enhance model robustness for visual recognition under various poor visibility conditions.
• The proposed feature matching module, FPMM, transfers typical prior knowledge widely used in low-level tasks to high-level ones.
• We design an adversarial min-max optimization strategy to enhance robust task-specific representations and to refine decision boundaries simultaneously.
• We evaluate our proposed approach on a diversity of poor visibility scenarios, including visual blur, fog, rain, snow, and low illuminance. The experiments demonstrate the efficacy and adaptability of our approach.

Image Restoration for Poor Visibility Conditions
Image restoration is a long-standing research interest aimed at enhancing visual data in poor visibility conditions. This includes image deblurring [20][21][22], de-raining [23,24], defogging [8,25], and other related areas. While these approaches focus on improving visual effects for human perception, they do not necessarily consider the needs of high-level machine vision. Early approaches mainly rely on seeking image prior knowledge [6,14,15,[26][27][28], which represents special salient features in poor visibility scenes. These image priors are commonly incorporated as key regularization terms into the optimization process for image restoration. Despite achieving better visual results, these algorithms require high computation costs during testing and do not generalize well to similar poor visibility scenarios, thus limiting their application.
Recent advances in deep learning-based image restoration have aimed to address the aforementioned limitations by employing various loss functions, learning paradigms, and network architectures [20][21][22][29]. With the explosion of visual data, deep learning-based restoration algorithms have demonstrated superior performance and efficiency compared to prior-based algorithms. In addition to improving the visual quality of images, several studies investigate the potential benefits of image restoration for high-level vision tasks. For instance, restoration algorithms are evaluated as pre-processing modules in object recognition, detection, and segmentation [5,30].

Data Augmentation for Poor Visibility Conditions
Image augmentation has become a popular technique for training deep neural networks [31]. Early data augmentation methods were designed to improve model performance and alleviate overfitting. In more recent years, image augmentation has also been used to enhance model robustness by modifying visual attributes in various ways while preserving semantic content. One way of doing this is by altering patch-level pixel regions. For instance, Cutout [32] randomly masks out square regions of the inputs to simulate occlusions that occur in real-world scenarios. CutMix [33], on the other hand, replaces the masked regions with patches of other images to mitigate information loss. Another approach is to augment the training examples with varying image styles. Ref. [34] shows that applying AdaIN [35] to add stylized training data can promote robust deep models. DeepAugment [12] randomly adjusts the weights in the image-to-image network to produce the same content under distinct textural variations. In addition, various augmentation techniques, such as AutoAugment [13] and RandAugment [36], have been developed to identify the optimal combination of image processing operations to train deep models. AugMix [37], a new combination, randomly selects data processing operations and their severity levels in a fixed and parallel pipeline. It has shown great potential in enhancing model robustness against synthetic image corruptions [11].

Unsupervised Domain Adaptation
Machine vision under poor visibility conditions inevitably captures a large number of unlabeled, low-quality images. These low-quality images are often accompanied by visual perturbations, leading to a distribution shift from clean data. Unsupervised Domain Adaptation (UDA) is well suited to handle such distribution shift problems, aiming to improve the generalization of deep models trained in the label-abundant domain to the unlabeled domain with different data distributions. Based on their learning paradigm, UDA methods can primarily be categorized into two groups. The first one focuses on aligning feature statistics to learn invariant representations across domains. DDC [38] minimizes high-dimensional distribution discrepancies by measuring Maximum Mean Discrepancy (MMD) on deep representations. DAN [39], an advanced version of DDC, estimates feature discrepancies on multiple layers using MMD with a non-linear kernel. Deep CORAL [40] aligns the second-order statistics of deep features between the source and target domains. The second group pursues domain-invariant features through the adversarial learning principle. DANN [41], the pioneer in this area, uses an adversarial learning framework to force the discriminator to fail in recognizing the domain label of deep features. MCD [42] introduces a novel adversarial training strategy between two label classifiers to refine the task-specific decision boundary. CDAN [43] incorporates a conditioning strategy into adversarial learning to achieve better discriminability and transferability. VADA and DIRT-T [44] implement a two-stage adversarial learning pipeline to penalize violations of the cluster assumption [45]. Our proposed approach adheres to the UDA settings, seeking to maximize the use of unlabeled data.

Setup
We consider a source domain D_s = {(x_i^s, y_i^s)} containing N_s labeled clean images and a target domain D_t = {x_i^t} containing N_t unlabeled low-quality images. Both domains are associated with K categories and sampled from the joint distributions P(x^s, y^s) and Q(x^t, y^t), respectively. The ground-truth label y_i^s belongs to the set {1, 2, ..., K}. We define a deep classifier F for the K-class classification problem. Given an input x, the deep classifier produces a K-dimensional vector such that F(x) ∈ R^K. Using the softmax operation, we can compute the prediction probability for the k-th category as follows:
$$p_k(x) = \frac{\exp(F_k(x))}{\sum_{j=1}^{K} \exp(F_j(x))}. \quad (1)$$
In this paper, we decompose the deep classifier F into a feature generator G and a label discriminator D, such that F = G ∘ D. Therefore, we can rewrite (1) as follows:
$$p_k(x) = \frac{\exp(D_k(G(x)))}{\sum_{j=1}^{K} \exp(D_j(G(x)))}. \quad (2)$$
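As a minimal illustration of this decomposition, the sketch below composes a hypothetical linear feature generator G and label discriminator D and applies the softmax to obtain per-class probabilities. All weights and dimensions here are illustrative placeholders, not the paper's architecture:

```python
import numpy as np

def softmax(z):
    # numerically stable softmax over the last axis
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
W_g = rng.normal(size=(8, 4))   # G: 8-dim input -> 4-dim feature (toy weights)
W_d = rng.normal(size=(4, 3))   # D: 4-dim feature -> K = 3 logits (toy weights)

def G(x):
    return x @ W_g

def D(h):
    return h @ W_d

x = rng.normal(size=(2, 8))
p = softmax(D(G(x)))            # p_k(x), the prediction probability per category
```

Each row of `p` is a valid probability distribution over the K categories.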

Dark and Bright Channel Priors
In this paper, we revisit two types of classic image priors: the Dark Channel Prior (DCP) and the Bright Channel Prior (BCP), both of which are employed in our proposed approach. The DCP and BCP are derived from the observation that image perturbations can significantly affect the number of dark pixels (the smallest value in an image patch) and bright pixels (the largest value in an image patch). As illustrated in Figure 2, the DCP and BCP are clearly visible in the image histogram, where the number of dark and bright pixels decreases as natural perturbations occur. The DCP and BCP have been extensively utilized in image deblurring and dehazing [15,46]. For a given image I, we extract the dark channel information T_d at a pixel location q as follows:
$$T_d(I)(q) = \min_{p \in \Omega(q)} \min_{c \in \{r,g,b\}} I^c(p), \quad (3)$$
where p, q are pixel locations and c denotes the color channel. The local patch centered at q is represented by Ω(q). The bright channel prior, denoted by T_b, can be defined in a similar manner:
$$T_b(I)(q) = \max_{p \in \Omega(q)} \max_{c \in \{r,g,b\}} I^c(p). \quad (4)$$
Furthermore, prior research combines the bright channel prior (BCP) and the dark channel prior (DCP) into the Extreme Channels Prior (ECP), which serves as the regularization term in both optimization-based deblurring [26,28] and deep deblurring [6,14].
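The dark and bright channel extraction described above can be sketched in a few lines of NumPy. This is a straightforward reference implementation; the patch size is an illustrative choice:

```python
import numpy as np

def dark_channel(img, patch=15):
    # T_d(q): minimum over the color channels, then over the patch Omega(q)
    h, w = img.shape[:2]
    chan_min = img.min(axis=2)                  # per-pixel minimum over channels
    pad = patch // 2
    padded = np.pad(chan_min, pad, mode='edge')
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + patch, j:j + patch].min()
    return out

def bright_channel(img, patch=15):
    # T_b is the symmetric maximum; max(x) = -min(-x)
    return -dark_channel(-img, patch)
```

On a constant image both channels equal that constant, and in general the dark channel never exceeds the bright channel.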

The Estimation of Mutual Information
Mutual information is an entropy-based metric used to assess the mutual dependence between variables. Recent research demonstrates the effectiveness of mutual information as a regularization term in representation learning [19,47]. Let (X, Y) represent a pair of random variables. Its general form can be expressed as follows:
$$I(X; Y) = H(Y) - H(Y|X), \quad (5)$$
where H(Y) is the marginal entropy of the variable Y and H(Y|X) is the conditional entropy of Y given the variable X. Due to the infeasibility of obtaining an analytical solution for mutual information, recent studies employ deep learning techniques to estimate its lower or upper bounds [48,49]. In our approach, we present a straightforward method to estimate the mutual information between the inputs and their predicted outputs, which has demonstrated its efficacy in semi-supervised and unsupervised learning problems [47,50].
To provide a better understanding of this method, let us consider a K-class (K ≥ 2) object recognition task, where x represents the input and y denotes its corresponding ground-truth label. We assume that F denotes the deep classifier. By using (1), we first calculate the conditional entropy of the prediction outputs given the inputs:
$$H(Y|X) = -\frac{1}{N} \sum_{i=1}^{N} \sum_{k=1}^{K} p_k(x_i) \log p_k(x_i), \quad (6)$$
where H is the symbol of entropy and N denotes the sample number. The marginal entropy of the prediction outputs can be estimated as:
$$H(Y) = -\sum_{k=1}^{K} \bar{p}_k \log \bar{p}_k, \quad \bar{p}_k = \frac{1}{N} \sum_{i=1}^{N} p_k(x_i). \quad (7)$$
The conditional entropy gives an indication of the average uncertainty degree of the deep classifier on each input sample. In contrast, the marginal entropy reflects the total uncertainty of the deep classifier on the entire input distribution. By utilizing (5)-(7), we can estimate the mutual information between the inputs and their corresponding predicted outputs.
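Under these estimates, the mutual information between inputs and predictions reduces to simple operations on the softmax outputs. A minimal NumPy sketch:

```python
import numpy as np

def mutual_information(probs, eps=1e-12):
    # probs: N x K softmax outputs p_k(x_i) of the deep classifier
    cond_h = -np.mean((probs * np.log(probs + eps)).sum(axis=1))  # H(Y|X)
    marginal = probs.mean(axis=0)                                 # average prediction
    marg_h = -(marginal * np.log(marginal + eps)).sum()           # H(Y)
    return marg_h - cond_h                                        # I(X; Y)
```

Confident, class-balanced predictions yield the maximum value log K, while uniform predictions yield zero, matching the intuition that a higher marginal entropy and a lower conditional entropy indicate more informative predictions.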

Prior Knowledge-Guided Adversarial Learning
This section provides a detailed description of our proposed approach, Prior Knowledgeguided Adversarial Learning (PKAL), along with its crucial components: the Feature Priors Matching Module (FPMM) and Mutual Information-based Robust Training (MIRT).

Feature Priors Matching Module
Image priors consist of latent image attributes that are imperceptible to human vision but capture significant differences between visual domains (e.g., clean vs. foggy images). In typical image processing, image priors are commonly integrated as regularization terms in optimization. However, as illustrated in Figure 1, our analysis shows that mismatches in image priors between clean and low-quality data persist not only in the raw pixel domain but also in the intermediate features of deep models. If ignored, these mismatches can lead to deep representations with incorrect semantic information, ultimately resulting in erroneous predictions. Thus, mitigating this feature-level mismatch is essential for enhancing the robustness of deep models in high-level discriminative tasks.
Based on our observations, we propose the Feature Priors Matching Module (FPMM) as a plug-and-play solution to address feature-level mismatches in the dark and bright channel priors on shallow features. FPMM constrains the statistical discrepancies between these priors during the training process. As shown in Figure 3, FPMM initially extracts the dark channel prior T_d and the bright channel prior T_b from the shallow layer φ_0 of the feature generator G. By using (3) and (4), we formulate T_d and T_b on the intermediate features as follows:
$$T_d(\varphi_0(x))(q) = \min_{p \in \Omega(q)} \min_{1 \le c \le C} \varphi_0^c(x)(p), \quad (8)$$
$$T_b(\varphi_0(x))(q) = \max_{p \in \Omega(q)} \max_{1 \le c \le C} \varphi_0^c(x)(p), \quad (9)$$
where φ_0(x) represents the intermediate feature map with C channels for the given input x, and φ_0^c(x) denotes the c-th channel feature map. After extracting the feature priors, we employ the Maximum Mean Discrepancy (MMD) as the high-dimensional distribution distance metric to quantify the statistical discrepancy of these feature-level priors between the clean and low-quality domains. The empirical approximation of this distance metric can be expressed as follows:
$$d_k\big(T(\varphi_0(x^s)), T(\varphi_0(x^t))\big) = \left\| \frac{1}{N_s} \sum_{i=1}^{N_s} \psi\big(T(\varphi_0(x_i^s))\big) - \frac{1}{N_t} \sum_{j=1}^{N_t} \psi\big(T(\varphi_0(x_j^t))\big) \right\|_{\mathcal{H}_k}^2, \quad (10)$$
where H_k refers to the Reproducing Kernel Hilbert Space (RKHS) with a characteristic kernel k and ψ denotes its feature map. Here, T(·) can be replaced with either T_d(·) or T_b(·) to indicate the extraction of either dark or bright prior-based features. Therefore, the final objective for FPMM can be formally defined as follows:
$$\mathcal{L}_{fp} = d_k\big(T_d(\varphi_0(x^s)), T_d(\varphi_0(x^t))\big) + d_k\big(T_b(\varphi_0(x^s)), T_b(\varphi_0(x^t))\big). \quad (11)$$
This objective measures the statistical mismatch of the two feature-level priors between the clean and low-quality domains. By minimizing it, we mitigate the negative effects of this mismatch and encourage deep models to learn robust features during the decision-making process.
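A simplified sketch of the FPMM objective follows, assuming a 1×1 patch for the feature-level priors (i.e., channel-wise min/max) and a single-bandwidth RBF kernel for a biased empirical MMD estimate; the paper's exact patch size and kernel choice may differ:

```python
import numpy as np

def channel_prior(feat, mode):
    # feat: N x C x H x W; dark/bright channel over the C feature maps
    return feat.min(axis=1) if mode == 'dark' else feat.max(axis=1)

def mmd2(a, b, sigma=1.0):
    # biased empirical MMD^2 with an RBF kernel; a, b are N x d arrays
    def k(x, y):
        d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    return k(a, a).mean() + k(b, b).mean() - 2.0 * k(a, b).mean()

def fpmm_loss(feat_clean, feat_lq):
    # sum the prior mismatch over the dark and bright channel features
    flat = lambda t: t.reshape(t.shape[0], -1)
    return sum(mmd2(flat(channel_prior(feat_clean, m)),
                    flat(channel_prior(feat_lq, m)))
               for m in ('dark', 'bright'))
```

The loss vanishes when the two domains share identical prior statistics and grows with the feature-level mismatch, which is exactly the quantity FPMM suppresses during training.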

Mutual Information-Based Robust Training
To enhance the discriminability of high-level semantic representations, we propose a novel min-max optimization strategy called Mutual Information-based Robust Training (MIRT), which complements the robustness of generic features in the shallow layers achieved by FPMM. Figure 3 provides details on the implementation of MIRT. Our approach relies on the intuition that, with the aid of FPMM, the feature generator G can obtain more robust representations from low-quality inputs than G_adv. For these inputs, the class discriminator D is motivated to differentiate between deep representations generated by G and those generated by G_adv. We employ the mutual information between the inputs and their corresponding predicted outputs as the judgment criterion. Using (6) and (7), the estimation of mutual information over G and D can be calculated as:
$$\mathcal{L}_{mi}(G, D; \mathcal{D}_t) = H\left(\frac{1}{N_t} \sum_{i=1}^{N_t} D(G(x_i^t))\right) - \frac{1}{N_t} \sum_{i=1}^{N_t} H\big(D(G(x_i^t))\big), \quad (12)$$
where D(G(·)) denotes the softmax prediction. Similarly, we denote the estimation of mutual information over G_adv and D as L_mi(G_adv, D; D_t). For low-quality inputs, G and D learn to maximize the mutual information in the same direction. As discussed earlier, this entails maximizing the marginal entropy and minimizing the conditional entropy. A lower conditional entropy compels D to widen the margin of the decision boundaries for the relatively robust features from G and encourages G to produce deep features far from these boundaries. A higher marginal entropy promotes a uniform distribution over the predictions of G ∘ D. Additionally, the discriminator D engages in an adversarial game with the feature generator G_adv. Since G_adv extracts deep features that are more sensitive to low visibility conditions, D minimizes the mutual information to refine the decision boundaries close to these sensitive features from G_adv. Consequently, D can successfully differentiate between deep features generated by G and G_adv, while G_adv optimizes in the opposite direction. For labeled clean inputs, we train G, G_adv, and D by minimizing the cross-entropy loss.
The cross-entropy loss over G with D can be expressed as follows:
$$\mathcal{L}_y(G, D; \mathcal{D}_s) = -\mathbb{E}_{(x^s, y^s) \sim \mathcal{D}_s} \sum_{k=1}^{K} \mathbb{1}[y^s = k] \log p_k(x^s). \quad (13)$$
Similarly, L_y(G_adv, D; D_s) denotes the cross-entropy loss over G_adv with D. Therefore, the final objective in MIRT can be formulated as:
$$\min_{G, D}\; \mathcal{L}_y(G, D; \mathcal{D}_s) + \alpha \mathcal{L}_y(G_{adv}, D; \mathcal{D}_s) - \beta \mathcal{L}_{mi}(G, D; \mathcal{D}_t) + \gamma \mathcal{L}_{mi}(G_{adv}, D; \mathcal{D}_t),$$
$$\max_{G_{adv}}\; \gamma \mathcal{L}_{mi}(G_{adv}, D; \mathcal{D}_t) - \mathcal{L}_y(G_{adv}, D; \mathcal{D}_s), \quad (14)$$
where α, β, and γ are the weight factors. The detailed PKAL procedure is provided in Algorithm 1. We note that G and D comprise the final model for inference in poor visibility conditions.
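The weighting below is an illustrative scalar sketch of the two optimization directions described above, not the paper's exact objective; the sign conventions follow the stated update directions (G and D maximize their mutual information term, D suppresses the term from G_adv, and G_adv pushes the other way):

```python
def mirt_objectives(ly_g, ly_gadv, lmi_g, lmi_gadv,
                    alpha=0.1, beta=0.2, gamma=0.1):
    # ly_*: cross-entropy losses on labeled clean data;
    # lmi_*: mutual-information estimates on unlabeled low-quality data.
    loss_main = ly_g + alpha * ly_gadv - beta * lmi_g + gamma * lmi_gadv  # for G, D
    loss_adv = ly_gadv - gamma * lmi_gadv                                 # for G_adv
    return loss_main, loss_adv
```

In an alternating scheme, one step descends `loss_main` for G and D, and the next descends `loss_adv` for G_adv, realizing the adversarial game over the mutual information terms.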

Experiments
In this section, we introduce six self-constructed datasets that include various poor visibility scenes. These datasets cover natural and commonly encountered perturbations, such as visual blur, fog, rain, snow, and low illuminance. To simulate low visibility conditions, we collected datasets from real-world scenarios or generated them using state-of-the-art synthesis algorithms. We then evaluate the performance of our proposed approach against fifteen existing solutions on these low-quality datasets.

Training Strategies
In our experiments, we utilize three backbone networks, namely, AlexNet, VGG19, and ResNet-18 [51][52][53]. For training our proposed approach, we use the SGD optimizer with an initial learning rate of 0.1 and a Cosine Annealing learning rate schedule. The batch size is set to 128, and the training epochs are set to 90 for all experiments. The hyper-parameter α is kept at 0.1 throughout the training process. β and γ are set to zero in the first 15 epochs and then increased to 0.2 and 0.1, respectively, for the remaining training process.
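The learning-rate and loss-weight schedules described above can be sketched as plain functions, using the values stated in this section:

```python
import math

def cosine_lr(epoch, total_epochs=90, base_lr=0.1):
    # cosine annealing from base_lr down to zero over the training run
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * epoch / total_epochs))

def loss_weights(epoch, warmup=15, alpha=0.1, beta=0.2, gamma=0.1):
    # alpha is fixed throughout; beta and gamma stay at zero during the
    # first `warmup` epochs and are then switched on
    if epoch < warmup:
        return alpha, 0.0, 0.0
    return alpha, beta, gamma
```

Delaying β and γ lets the networks first fit the labeled clean data before the adversarial mutual information terms take effect.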

Comparison Methods
We compare our proposed approach with fifteen existing solutions, which can be divided into four categories: image restoration, statistical alignment, adversarial domain adaptation, and data augmentation. Image restoration aims to improve the visual quality of low-quality inputs before they are applied to high-level tasks [20][21][22]25,28,[54][55][56][57]. Statistical feature alignment belongs to typical unsupervised domain adaptation techniques, which seek invariant representations between the clean and low-quality domains by aligning the statistics of deep features [39,40]. Adversarial domain adaptation integrates adversarial learning and domain adaptation in a two-layer game, in which the feature generator and domain discriminator are adversarially trained to learn invariant representations [41][42][43][44]58]. We also include two data augmentation strategies that apply multiple operations of image transformation and stylization [36,37].

Visual Blur
Visual blur is a common perturbation encountered in real-world scenarios that can be caused by various factors, such as a shaky or out-of-focus camera, low exposure time, and fast-moving objects. To account for this, we construct two blurry datasets: REDS-BLUR and ImageNet-BLUR, each serving a distinct purpose. The REDS-BLUR dataset consists of 6170 clean images and 2155 blurry images in 11 classes, where the blurry subset is manually collected from the REDS dataset [59], a video deblurring dataset that utilizes deep learning techniques to simulate real-world motion blur. This dataset is designed to be comparable in size to typical unsupervised domain adaptation benchmarks, such as ImageCLEF, Office-31, and Office-Home.
In addition to motion blur, we introduce the ImageNet-BLUR dataset, which includes three additional types of blur: defocus blur, zoom blur, and glass blur. The clean subset consists of 129,377 high-quality images from 200 classes of ImageNet [60]. We generate an unlabeled blurry subset of 129,381 images using the synthesis pipeline in [11] to simulate the different types of blur. The testing set of ImageNet-BLUR includes five severity levels for each blur type, resulting in a total of 4 × 5 × 5000 testing samples.

Rain
Rain is a common outdoor weather condition that degrades the visibility due to its scattering and blurry effects. To create a synthetic rain dataset, we generated rain streaks using a classic rain rendering algorithm [61] and selected background scenes from ImageNet. The rain rendering model can simulate real-world rain scenes by capturing the interactions between the lighting direction, viewing direction, and the oscillating shape of the rain streaks. We name this dataset ImageNet-RAIN, which contains 129,377 clean images and 129,381 unlabeled rain images in 200 classes. Examples of the rainy images from ImageNet-RAIN are displayed in Figure 4e.

Snow
Snow particles, such as snowflakes, often cause severe occlusion in images. To address this, we created the ImageNet-SNOW dataset, which synthesizes snow images from clean backgrounds from ImageNet and 2000 snow templates varying in transparency, size, and location from the CSD snow scene dataset [9]. Each snow image is generated by combining a clean image with a randomly selected snow template. The ImageNet-SNOW dataset is of a similar magnitude to ImageNet-RAIN. We present some samples of ImageNet-SNOW in Figure 4f.
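A minimal sketch of the compositing step, assuming simple alpha blending of a white snow template over the clean background; the actual template set and blending details come from the CSD-based pipeline:

```python
import numpy as np

def composite_snow(clean, template, alpha=0.8):
    # clean: H x W x 3 image in [0, 1]; template: H x W snow mask in [0, 1];
    # alpha controls the transparency of the pasted snow particles
    snow = alpha * template[..., None]
    return np.clip(clean * (1.0 - snow) + snow, 0.0, 1.0)
```

With alpha set to zero the clean image is returned unchanged, and increasing alpha makes the snow occlusion more opaque.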

Low Illuminance
Low-light conditions often occur due to inadequate light or under-exposed cameras, resulting in images that preserve object shapes but discard local details, such as image texture. Texture information plays a key role in semantic-level tasks, such as object recognition [34]. To evaluate our method in low-light conditions, we designed the ImageNet-DARK dataset. To mimic realistic low-light conditions, we use a two-stage synthesis strategy that adjusts the low-light distribution at both local and global levels. Following the generation of low-light data [55], we retrain the ZeroDCE enhancement model [55] with a low-exposure parameter for local low-light adjustment. The original ZeroDCE with the normal exposure parameter restores low-light images, whereas the revised ZeroDCE generates low-light images through the reverse process. After the local adjustment, we globally manipulate the exposure intensity to form the final, visually realistic version. The clean backgrounds of ImageNet-DARK are selected from ImageNet. We present some low-light examples in Figure 4g.
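The global stage of this pipeline can be approximated by a simple exposure-and-gamma adjustment; the sketch below is an illustrative stand-in for the global exposure manipulation, not the retrained ZeroDCE model used for the local stage:

```python
import numpy as np

def global_darken(img, exposure=0.4, gamma=2.2):
    # img in [0, 1]; reduce the exposure, then apply a gamma curve that
    # crushes shadow regions, mimicking an under-exposed camera
    return np.clip((img * exposure) ** gamma, 0.0, 1.0)
```

For inputs in [0, 1] with exposure < 1 and gamma > 1, every pixel is darkened, so shapes survive while local texture contrast is suppressed.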

Evaluation on Visual Blur
We evaluate the performance of our proposed approach and comparison methods on the REDS-BLUR dataset, and the results are presented in Table 1. Our approach outperforms the baselines, achieving gains of 13.9%, 11.4%, and 12.6% in AlexNet, ResNet-18, and VGG19, respectively. Among the deblurring algorithms, most of them have a positive effect on blurry object recognition, except for RL. Specifically, SRN achieves the largest increase of 14.7% in ResNet-18. For the UDA methods, their performance varies across different network structures. For example, CDAN obtains a gain of 7.1% in AlexNet but a drop of 6.5% in VGG19. In contrast, W-DANN and DIRT-T are the most stable approaches among them, improving 4.6% and 2.9% on average, respectively. Table 2 shows the comparison of our approach and existing solutions on the ImageNet-BLUR dataset. Our PKAL approach demonstrates great efficacy in all blur severity levels and types, with an average improvement of 16%. The best performance of 42.4% for defocus blur and 49.0% for zoom blur show the great transferability of PKAL.

Evaluation on Fog
We evaluate the performance of our approach and comparison methods on the Web-FOG dataset, and the results are presented in Table 3. The proposed PKAL achieves the largest gain of 31.4% compared with the baseline, and even the single FPMM achieves a leading increase of 19.1%, which is comparable to the performance of RandAug and AugMix. The UDA methods improve by above 18% on the foggy data, especially with a large positive margin of 26.0% for DIRT-T and 24.4% for DANN. However, FFA-Net, which has a dehazing effect, shows only a slight improvement of 2.3% over the baseline model.

Evaluation on Snow

Table 3 presents the results on the ImageNet-SNOW dataset. Both our proposed PKAL and the single FPMM demonstrate their efficacy in handling snow occlusion, achieving remarkable gains of 34.5% and 16.3%, respectively. RandAug and AugMix also exhibit improvements of 4.0% and 8.3%, respectively, consistent with their performance on the ImageNet-RAIN dataset. For domain adaptation, DIRT-T and CDAN outperform other domain adaptation approaches, with improvements of 32.4% and 30.4%, respectively. With its de-snowing effect, DesnowNet also improves model robustness under snow occlusion, with a gain of 11.7% compared to the baseline.

Evaluation on Low Illuminance
We present all results on the ImageNet-DARK dataset in Table 3. Consistent with the previous results, our proposed PKAL and the single FPMM exhibit leading performance in low illuminance conditions, achieving gains of 32.6% and 23.5%, respectively, compared to the baseline. UDA methods also improve model robustness, with positive gains ranging from 22.3% to 33.1%; DIRT-T exhibits the largest improvement among these approaches. RandAug and AugMix improve by 4.0% and 6.5%, respectively, in low illuminance situations, similar to their performance under rain and snow occlusion. In addition to improving the visual effect, the positive margin of 15.3% demonstrates the effectiveness of ZeroDCE in low illuminance situations.

Evaluation on Rain

Table 3 summarizes the results on the ImageNet-RAIN dataset. Our proposed PKAL approach achieves the best performance, improving by 33.4% compared to the baseline. The single FPMM also significantly improves model robustness against rain occlusion, with a gain of 19.4%. RandAug and AugMix achieve moderate gains of 4.5% and 8.4%, respectively. For UDA methods, adversarial-based adaptation outperforms statistical feature alignment; CDAN and DAN exhibit improvements of 30.7% and 25.8%, respectively. MPRNet achieves a remarkable gain of 29.5% through its de-raining effect.

The Benefit of PKAL
We validate the discriminative ability of the deep representations learned from our approach using t-Distributed Stochastic Neighbor Embedding (t-SNE) [62], a non-linear visualization for high-dimensional features. In Figure 5, we show the t-SNE visualizations of deep features from different methods on Web-FOG. While FPMM and UDA methods show a similar effect by separating easily identified categories such as people, tree, and bird, they still struggle to distinguish between several classes with similar semantic content, such as car, bus, and truck, which remain close to each other. In contrast, the t-SNE result of PKAL demonstrates its ability to further increase the intra-class compactness and inter-class discrepancy of hard-to-classify examples in the deep feature space.

We apply Grad-CAM to visualize the informative regions of our proposed approach by projecting learned deep representations back onto the raw pixels. Figure 6 shows that the baseline model learns salient features mainly from the background rather than the object, resulting in inaccurate predictions due to the fog occlusion. In contrast, the attention of FPMM successfully focuses on the object to be recognized, highlighting its informative features. Furthermore, our results suggest that deep features learned from PKAL better reflect the informative content of the image.

The Effect of FPMM
In order to assess the effectiveness of FPMM in our approach, we utilize the Corresponding Angle (CA) [63] to verify the alignment effect on the shallow features between the clean and corrupted data. Through Singular Value Decomposition (SVD), we obtain the eigenvectors $U_s^k$ of the $k$-th channel low-level feature from the clean image and $U_t^k$ from the low-quality image. The cosine value of CA can be computed as follows:

$$\cos(\psi_i^k) = \frac{\langle u_{s,i}^k, u_{t,i}^k \rangle}{\|u_{s,i}^k\| \, \|u_{t,i}^k\|},$$

where $u_{s,i}^k$ and $u_{t,i}^k$ denote the $i$-th eigenvectors in $U_s^k$ and $U_t^k$, respectively, associated with the $i$-th largest singular value. To improve the visualization of our results, we plot $1 + \cos(\psi_i^k)$. This value indicates the feature similarity between the clean and low-quality domains, with a larger cosine value implying greater feature similarity. Our experiments compute the channel-wise cosine similarities for the top 10 CAs in the baseline model with and without the FPMM. Figure 7a shows that the deep model with FPMM exhibits larger CA values than the baseline model without FPMM, indicating that FPMM helps deep models learn domain-invariant shallow features from low-quality data. Tables 1-3 further demonstrate the effectiveness of FPMM, which improves model performance on both clean and low-quality data. Figure 7b depicts the change in FPMM loss and testing accuracy during training: as the FPMM loss decreases, the testing accuracy on both clean and corrupted data increases, demonstrating the benefit of reducing the prior knowledge-based feature discrepancy. Moreover, FPMM adds little training time, since it updates only a few parameters of the shallow layers during back-propagation.
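The channel-wise CA computation can be sketched with NumPy as follows. This is an illustrative sketch rather than our exact implementation; `feat_clean` and `feat_low` are random placeholder maps standing in for shallow-layer activations of a clean and a degraded image.

```python
import numpy as np

rng = np.random.default_rng(0)
feat_clean = rng.normal(size=(8, 16, 16))  # C x H x W shallow features, clean domain
feat_low = rng.normal(size=(8, 16, 16))    # same layer, low-quality domain

def ca_cosines(a, b, top=10):
    """Cosine of the Corresponding Angle between the i-th left singular
    vectors (ordered by singular value) of each channel's feature matrix."""
    out = []
    for k in range(a.shape[0]):
        U_s, _, _ = np.linalg.svd(a[k])    # eigenvectors of the clean feature
        U_t, _, _ = np.linalg.svd(b[k])    # eigenvectors of the degraded feature
        # Singular vectors are unit-norm, so the inner product is the cosine.
        out.append(np.sum(U_s[:, :top] * U_t[:, :top], axis=0))
    return np.stack(out)                   # shape: (C, top)

cos_psi = ca_cosines(feat_clean, feat_low)
similarity = 1.0 + cos_psi                 # shifted to [0, 2] for visualization
```

Averaging `similarity` over channels for each of the top 10 singular directions yields the kind of per-CA curves compared in Figure 7a.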

The Effect of MIRT
In this section, we explore the effect of MIRT on the norm of deep features, which has been shown to reflect discriminability and transferability across domains [63,64]. We compare the $L_2$ norm of the final output of the feature generator G across different approaches. Figure 7c illustrates the $L_2$ feature norm in the clean domain (left) and the low-quality domain (right). Notably, MIRT achieves a larger $L_2$ feature norm than the other approaches, indicating the better adaptability of the deep features it learns. Furthermore, Figure 7d shows the change in the mutual information estimate and classification accuracy during training. We observe that performance on the low-quality data improves as the mutual information estimate increases, verifying the benefit of the mutual information-based adversarial learning in MIRT.
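The quantity compared in this section reduces to a simple batch statistic. The sketch below, with random placeholder matrices in place of the generator G's real outputs, shows the per-sample feature norm averaged over a batch.

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder final-layer outputs of the feature generator G for a batch
# from each domain (batch_size x feature_dim); values are random stand-ins.
feat_clean = rng.normal(size=(16, 256))
feat_low = rng.normal(size=(16, 256))

def mean_l2_norm(f):
    # Per-sample L2 norm of the feature vector, averaged over the batch.
    return float(np.linalg.norm(f, axis=1).mean())

norm_clean = mean_l2_norm(feat_clean)
norm_low = mean_l2_norm(feat_low)
# In Figure 7c, a larger mean norm for MIRT in either domain is read as a
# sign of more transferable and discriminative deep features.
```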

Conclusions
In this study, we propose PKAL, a novel approach for visual recognition in poor visibility conditions. PKAL integrates FPMM, a feature matching module that reduces the feature discrepancy between the clean and low-quality domains, and MIRT, a robust learning strategy that refines discriminative semantic features and task-specific decision boundaries for low-quality data through adversarial learning based on mutual information. We evaluate our approach on five typical low visibility conditions, namely visual blur, fog, rain, snow, and low illuminance. The experimental results demonstrate consistent performance gains across these conditions, underscoring the effectiveness of our approach.