Design of an FPGA-Based High-Quality Real-Time Autonomous Dehazing System

Abstract: Image dehazing, as a common solution to weather-related degradation, holds great promise for photography, computer vision, and remote sensing applications. Diverse approaches have been proposed throughout decades of development, and deep-learning-based methods are currently predominant. Despite their excellent performance, such computationally intensive methods amount to overkill, because image dehazing is merely a preprocessing step. In this paper, we analyze a non-deep autonomous image dehazing algorithm and then present a corresponding FPGA design for high-quality real-time vision systems. We also conduct extensive experiments to verify the efficacy of the proposed design across different facets. Finally, we introduce a method for synthesizing cloudy images (loosely referred to as hazy images) to facilitate future aerial surveillance research.


Introduction
Image acquisition, for example outdoor imaging and remote sensing, is highly problematic owing to numerous natural factors, notably bad weather conditions. Under these adverse effects, acquired images are subject to various types of degradation, ranging from color distortion to visibility reduction. Consequently, high-level computer vision algorithms, which generally assume clean input images, may incur a sharp drop in performance, creating great demand for visibility restoration, as can be seen by the rapid development of myriad algorithms for image dehazing, deraining, and desnowing over the past two decades. In this paper, we restrict the discussion to image dehazing because haze (or equivalently fog) appears to be more prevalent than rain and snow. Furthermore, as haze and cloud originate from atmospheric scattering and absorption, image dehazing algorithms also find applications in remote sensing.

Image Dehazing in Remote Sensing
Remote sensing applications such as aerial surveillance, battlefield monitoring, and resource management fundamentally impact many aspects of modern society, including transportation, security, agriculture, and so on. Despite their crucial importance, these applications are prone to failure in areas of cloud cover, because light waves are subject to atmospheric scattering and absorption when traversing cloud banks. As a result, remotely sensed images become unfavorable for subsequent high-level applications, rendering image dehazing highly relevant for visibility restoration.
For example, Figure 1 demonstrates the negative effects of cloud and the beneficial effects of image dehazing on an aerial surveillance application. Specifically, Figure 1a is a clean image from the Aerial Image Dataset (AID) [1], and Figure 1b is its corresponding synthetic cloudy image. Cloud is synthesized herein due to the sheer impracticality of remotely sensing the same area in two different weather conditions. We will discuss synthetic cloud generation in detail in Section 4.2.2. Figure 1c is the result of dehazing Figure 1b using a recent algorithm developed by Cho et al. [2]. The three images on the second row are the final outcomes of processing Figure 1a-c with a YOLOv4-based object recognition algorithm [3]. In addition, it is noteworthy that the haziness degree evaluator (HDE) [4] serves as a basis for discriminating Figure 1a as a clean image. It can be observed that the recognition algorithm detected nine airplanes in the clean image in Figure 1a. In contrast, the number of detected airplanes in Figure 1e was significantly lower: the detection rate dropped 66.67%, from nine to three detected airplanes. This observation implies that bad weather conditions such as cloud and haze have a negative impact on high-level remote sensing applications.
To address this problem, we preprocessed the synthetic cloudy image using a dehazing algorithm developed by Cho et al. [2]. As Figure 1c shows, the visibility improved; however, the airplane under the dense veil of cloud remains obscured. The corresponding detection result in Figure 1f demonstrates a considerable increase (133.33%) in detection rate, from three (in Figure 1e) to seven detected airplanes. This observation, in turn, implies the crucial importance of image dehazing in remote sensing applications.
However, another issue arises regarding whether to apply image dehazing, because cloud occurs only occasionally, while most image dehazing algorithms assume a hazy/cloudy input. Obviously, dehazing a clean image results in untoward degradation, as Figure 2 demonstrates. Although the dehazed image in Figure 2b appears to be passable, without any noticeable distortion, its corresponding detection results in Figure 2d exhibit a sharp drop (66.67%) in detection rate, from nine to three detected airplanes. The algorithm also misrecognized two airplanes as birds, compared to only one misrecognition in Figure 2c. This example, coupled with the previous one, emphasizes the need for an autonomous image dehazing algorithm.

Real-Time Processing
Remotely sensed images usually possess high resolution, leading to a computationally heavy burden for subsequent algorithms. For example, the S-65A35 camera of the SAPPHIRE series, widely available on aerial surveillance systems, can deliver a superb resolution of 9344 × 7000 pixels at 35.00 frames per second (fps) [5]. As a result, virtually every embedded surveillance system downscales the acquired image sequence to a reasonable size before supplying the sequence to other algorithms, for computational efficiency and to enable real-time processing. A good example of this is an aerial surveillance system known as ShuffleDet [6], which downscales the input image to a resolution of 512 × 512 to achieve a processing speed of 14.00 fps.
Regarding the implementation of image dehazing, a software implementation per se usually fails to meet the real-time processing requirement. To support this claim, we adopt Table 1 from Ngo et al. [7]. The authors measured the processing time of nine algorithms [2,7-14] whose source code is publicly available, for different image resolutions. The simulation environment in this study was MATLAB R2019a, and the host computer was equipped with an Intel Core i9-9900K (3.6 GHz) CPU, 64 GB RAM, and an Nvidia TITAN RTX graphics processing unit (GPU). The run-time evaluation in Table 1 demonstrates that none of the nine algorithms could deliver real-time processing. Even at such a small resolution as 640 × 480, the fastest algorithm, developed by Zhu et al. [11], exhibited a processing speed of 4.55 fps (≈1/0.22), approximately one fifth of the required speed of 25.00 fps.
Hence, there are currently two main approaches toward real-time processing. The first approach aims to reduce the development time by focusing on flexibility, portability, and programming abstraction. Under this approach, the embedded system usually needs to be equipped with powerful computing platforms such as GPUs and low-power GPUs. In the previous example of ShuffleDet, Azimi [6] presented an implementation on the Nvidia Jetson TX2 board, which includes a low-power GPU named Tegra X2 [15]. Although this approach can meet the growing demand for high computing performance, it is not the best choice compared with field-programmable gate arrays (FPGAs), which are at the center of the second approach toward real-time processing. To support the preceding claim, Wielage et al. [16] verified that a Xilinx Virtex UltraScale+ FPGA was 6.5× faster and consumed 4.3× less power than the Tegra X2 GPU. For this reason, we present herein an FPGA implementation of an autonomous dehazing system for aerial surveillance. The proposed system features autonomy, real-time processing, and support for synthetic cloudy image generation. The first is attributed to self-calibration on haze conditions, which results from the utilization of the HDE. The second is achieved through a pipelined architecture for improving throughput and a number of design techniques for reducing propagation delay. The third is the desired result of simulating haze/cloud using the low-frequency parts of a random distribution, with the density of synthetic haze/cloud controlled by the HDE. Thus far, it can be observed that the HDE plays an essential role in the proposed system, and therein lies the cause of its limitations, as discussed later in Section 4.3.

Literature Review
Image dehazing is a fundamental problem in computer vision, and is rooted in studies on atmospheric scattering and absorption phenomena. As witnessed by the work of Vincent [17] and Chavez [18], early research on image dehazing started five decades ago. Throughout this long history of development, there have been various approaches to restoring the scene radiance; polarimetric dehazing [19,20], image fusion [21,22], and image enhancement [7,10] are cases in point. It is also noteworthy that each approach has resulted in hundreds of papers, and therein lies the sheer impracticality of reviewing them all. Consequently, we focus our discussion on the single-image approach that relies on an acquired red-green-blue (RGB) image.
To facilitate understanding of the review, we first briefly formalize the image dehazing problem. Given a hazy RGB image I ∈ R^(H×W×3) of size H × W, the atmospheric scattering model (ASM) [23] decomposes it into two terms, known as the direct attenuation and the airlight, as Equation (1) shows:

I(x) = J(x)t(x) + A[1 − t(x)],    (1)

where J ∈ R^(H×W×3) is the scene radiance, t ∈ [0, 1]^(H×W) is the transmission map, A ∈ R^(1×1×3) is the global atmospheric light, and x represents the spatial coordinates of pixels. Direct attenuation and airlight correspond to Jt and A(1 − t), respectively. The former signifies the multiplicative attenuation of reflected light waves in the transmission medium, while the latter represents the additive influence of the illumination.
Based on the ASM, most image dehazing algorithms develop two mapping functions f_A : R^(H×W×3) → R^(1×1×3) and f_t : R^(H×W×3) → R^(H×W) that estimate the global atmospheric light and the transmission map, given the input image I. Researchers usually denote these two estimates as Â and t̂, and they restore the scene radiance J by rearranging Equation (1) as follows:

Ĵ(x) = [I(x) − Â] / max[t̂(x), t_0] + Â,    (2)

where a small positive t_0 helps avoid division by zero. Recently, deep learning models have also found an application in image dehazing. Some early models [13,14] also learned the mapping functions f_A and f_t, whereas recently developed models [24,25] learned an end-to-end mapping function f_J : R^(H×W×3) → R^(H×W×3). Although image dehazing is achievable in various ways, it is worth recalling that this operation is a preprocessing step, which imposes strict requirements on its implementation. A crucial requirement is real-time processing, as discussed in Section 1.2.
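To make the two directions of the ASM concrete, the following sketch synthesizes a hazy observation with Equation (1) and recovers the radiance with its rearrangement. The scene, transmission map, and atmospheric light are toy values chosen for illustration, not estimates produced by any particular algorithm.

```python
import numpy as np

def synthesize_haze(J, t, A):
    """Apply the ASM: I = J*t + A*(1 - t), broadcasting t over color channels."""
    t3 = t[..., np.newaxis]                 # H x W -> H x W x 1
    return J * t3 + A * (1.0 - t3)

def recover_radiance(I, t_hat, A_hat, t0=0.1):
    """Rearranged ASM: J_hat = (I - A_hat) / max(t_hat, t0) + A_hat."""
    t3 = np.maximum(t_hat, t0)[..., np.newaxis]   # t0 avoids division by zero
    return (I - A_hat) / t3 + A_hat

# Toy example: a 2x2 scene with a uniform transmission map.
J = np.full((2, 2, 3), 0.5)                 # scene radiance in [0, 1]
t = np.full((2, 2), 0.8)                    # transmission map
A = np.array([1.0, 1.0, 1.0])               # global atmospheric light
I = synthesize_haze(J, t, A)                # hazy observation: 0.5*0.8 + 1.0*0.2 = 0.6
J_hat = recover_radiance(I, t, A)           # perfect estimates recover J exactly
```

With exact t̂ and Â the round trip is lossless, which is why estimation quality of the two mapping functions dominates dehazing quality.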
According to a recent systematic review [26], image dehazing algorithms in the literature fall into three categories: image processing, machine learning, and deep learning. Table 2 summarizes essential information on each category, and we exemplify each with one or two representative methods in the following sections.

Image processing: Uses traditional computer vision techniques and only the input RGB image [7][8][9][10].
Machine learning: Additionally uses machine learning techniques to exploit the hidden regularities in relevant image datasets [11,12,27,28].
Deep learning: Uses deep neural networks with powerful representation capability to learn relevant mapping functions [13,14,24,25].

Representative Single-Image Dehazing Algorithms
The categorization in Table 2, based on the primary technique employed to restore the scene radiance and on how the algorithm exploits image data, can give an early indication of the real-time processing capability of an image dehazing method. Generally, the first two categories, image processing and machine learning, can handle the input image sequence or video in real time. Conversely, the third category, deep learning, suffers from some practical difficulties in achieving real-time processing.

Image Processing
Image dehazing methods founded on traditional computer vision techniques usually favor human perception [29] because they are rooted in hand-engineered image features such as contrast and saturation, which greatly influence the perceptual image quality. Perhaps the most well-known research in this category is the dark channel prior of He et al. [9], inspired by the dark-object subtraction method of Chavez [18]. He et al. [9] developed f_t : R^(H×W×3) → R^(H×W) from the following two assumptions:

• The scene radiance J exhibits an extremely dark channel whose intensities approach zero in non-sky patches;
• The transmission map t is locally homogeneous.

The first is based on the colorfulness of objects, i.e., one of the color channels should be very low for the color to manifest itself. The second is based on the depth-dependent characteristic of the transmission map. Depth information is mostly smooth except at discontinuities in an image, and so is the transmission map. Mathematically, the equivalent expressions are:

• min_{y∈Ω(x)} {min_{c∈{R,G,B}} [J^c(y)]} = 0, where Ω(x) denotes an image patch centered at x, and c denotes a color channel;
• min_{y∈Ω(x)} [t(y)] = t(x).
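A minimal sketch of the dark channel and the transmission estimate it yields is given below. The patch size, the clipped window at image borders, and the omega parameter are illustrative choices, with a brute-force patch minimum standing in for an optimized minimum filter.

```python
import numpy as np

def dark_channel(img, patch=3):
    """Per-pixel minimum over {R, G, B}, then a local minimum over a
    patch x patch neighborhood (borders use a clipped window)."""
    mins = img.min(axis=2)
    H, W = mins.shape
    r = patch // 2
    out = np.empty_like(mins)
    for y in range(H):
        for x in range(W):
            y0, y1 = max(0, y - r), min(H, y + r + 1)
            x0, x1 = max(0, x - r), min(W, x + r + 1)
            out[y, x] = mins[y0:y1, x0:x1].min()
    return out

def estimate_transmission(img, A, omega=0.95, patch=3):
    """He et al.-style estimate: t = 1 - omega * dark_channel(I / A)."""
    return 1.0 - omega * dark_channel(img / A, patch)

# A pure-red patch is colorful, so its dark channel is zero and t stays near 1.
clear = np.zeros((5, 5, 3)); clear[..., 0] = 0.9
A = np.array([1.0, 1.0, 1.0])
t_clear = estimate_transmission(clear, A)
```

A uniformly bright patch (high in all channels, as haze tends to be) instead yields a large dark channel and hence a low transmission estimate, which is exactly the behavior the prior exploits.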
A transmission map estimate resulting from these two assumptions suffers from block artifacts, rendering a refinement step essential. Accordingly, He et al. [9] utilized soft matting [30]. Despite its excellent dehazing performance, the method of He et al. [9] has two main drawbacks: failures in sky regions and high computational cost. These shortcomings have resulted in a series of follow-up studies [31][32][33].
Regarding the mapping function f_A : R^(H×W×3) → R^(1×1×3), He et al. [9] developed a robust approach that remains widely used. Under this approach, the top 0.1% of the brightest pixels in the dark channel of the input image serve as candidates for singling out the atmospheric light. From among these, the pixel with the highest intensity in the RGB color space is chosen. Consequently, this approach is fairly robust against the problem of incorrectly selecting white objects as the atmospheric light.
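The candidate-selection procedure described above can be sketched as follows. For brevity, the dark channel here is the pixel-wise channel minimum without patch filtering, and breaking ties by channel sum is an illustrative choice.

```python
import numpy as np

def estimate_atmospheric_light(img, top_fraction=0.001):
    """Pick the brightest `top_fraction` of dark-channel pixels as candidates,
    then return the candidate with the highest overall RGB intensity."""
    dark = img.min(axis=2)                  # pixel-wise dark channel (no patch, for brevity)
    flat = dark.reshape(-1)
    n = max(1, int(len(flat) * top_fraction))
    idx = np.argsort(flat)[-n:]             # indices of the brightest dark-channel pixels
    candidates = img.reshape(-1, 3)[idx]
    return candidates[candidates.sum(axis=1).argmax()]

# A mostly dark scene with one haze-like pixel, bright in all channels:
# the estimate should come from that pixel, not from any colorful object.
img = np.zeros((10, 10, 3))
img[0, 0] = [0.9, 0.9, 0.9]
A_hat = estimate_atmospheric_light(img)
```

Selecting via the dark channel first is what gives the method its robustness: a white car is bright in the image but not necessarily bright in a patch-filtered dark channel.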

Machine Learning
As image dehazing methods from the first category are based on hand-engineered features, they may fail in particular circumstances. A prime example is the fact that the dark channel prior proposed by He et al. [9] does not hold for sky regions. Therefore, hidden regularities learned from relevant image datasets can improve performance in those cases.
Zhu et al. [11] developed the color attenuation prior in that manner. Through extensive observations on outdoor images, they discovered that the scene depth correlated with saturation and brightness. They then assumed that a linear model sufficed for expressing that correlation and devised a simple expression for f_t : R^(H×W×3) → R^(H×W). Next, they utilized maximum likelihood estimation to find the model's parameters. The input data consisted of a synthetic dataset with haze-free and corresponding synthesized hazy images. The dehazing method of Zhu et al. [11] was relatively fast and efficient, as were the methods in some of the follow-up studies [28,34,35].
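The spirit of the color attenuation prior can be sketched with a linear depth model on HSV brightness and saturation. The coefficients and the scattering parameter beta below are hypothetical stand-ins, not the values Zhu et al. learned by maximum likelihood.

```python
import numpy as np

# Hypothetical coefficients for illustration only; Zhu et al. learned theirs
# from a synthetic haze-free/hazy dataset.
THETA0, THETA1, THETA2 = 0.12, 0.96, -0.78

def transmission_from_color_attenuation(img, beta=1.0):
    """Linear depth model d = theta0 + theta1*v + theta2*s on HSV value v and
    saturation s, followed by t = exp(-beta * d)."""
    v = img.max(axis=2)                                       # HSV value (brightness)
    s = np.where(v > 0, (v - img.min(axis=2)) / v, 0.0)       # HSV saturation
    d = THETA0 + THETA1 * v + THETA2 * s
    return np.exp(-beta * np.clip(d, 0.0, None))

# Bright but desaturated pixels (haze-like) should get a larger depth,
# hence lower transmission, than bright saturated ones.
hazy = np.full((4, 4, 3), 0.8)
vivid = np.zeros((4, 4, 3)); vivid[..., 0] = 0.8
t_hazy = transmission_from_color_attenuation(hazy)
t_vivid = transmission_from_color_attenuation(vivid)
```

The sign pattern (positive on brightness, negative on saturation) captures the observation behind the prior: haze raises brightness while washing out saturation.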
Another notable approach is the learning framework proposed by Tang et al. [27]. This framework comprises two main steps: feature extraction and transmission map inference. Tang et al. [27] implemented the former in a multi-scale manner, and they utilized random forest regression to realize the latter. Many deep learning models developed thereafter bear a fundamental similarity to this framework. Despite its excellent dehazing performance, the implementation of Tang et al. [27] incurs a heavy computational burden, hindering its broad application in practice.

Deep Learning
An early attempt at applying deep learning models to image dehazing can be traced back to the DehazeNet developed by Cai et al. [13]. They adopted a similar approach to that of He et al. [9] to devise the mapping function f_A : R^(H×W×3) → R^(1×1×3). To estimate the transmission map, they utilized a convolutional neural network (CNN). The CNN's functionality is similar to that of the learning framework of Tang et al. [27]. The main steps include: (i) feature extraction, (ii) feature augmentation, and (iii) transmission map inference, corresponding to: (i) feature extraction and multi-scale mapping, (ii) local extrema, and (iii) the nonlinear regression presented by Cai et al. [13].
Recently, end-to-end networks that learn the mapping function f_J : R^(H×W×3) → R^(H×W×3) have been gaining popularity. These networks are usually based on the encoder-decoder architecture, which has proven highly efficient due to its ability to learn a robust representation of image features from a low to a high level of abstraction. The FAMED-Net approach developed by Zhang and Tao [24] is a prime example. FAMED-Net is a densely connected CNN whose architecture is designed based upon multi-scale encoders and image fusion. It is also one of the few deep models that can fulfill the real-time processing requirement. Zhang and Tao [24] realized FAMED-Net using a powerful Nvidia Titan Xp, yielding a processing speed of 35.00 fps at a 620 × 460 image resolution.

Summary
Image dehazing has a long development history, dating back to the early 1970s. As a result, hundreds of studies have been recorded in the literature, and it is impractical to review all of them. A recent systematic review [26] collated information from influential studies and categorized the results into image processing, machine learning, and deep learning approaches. This categorization can serve as an early indication of the real-time processing capability of image dehazing algorithms. The first two categories are generally capable, whereas the last one rarely is.
Moreover, most image dehazing methods assume a hazy input image, but this assumption is uncertain in practice, rendering an autonomous dehazing method highly relevant. Therefore, we present herein an FPGA-based autonomous dehazing system to fulfill the aforementioned requirements: real-time processing and autonomy.

Autonomous Dehazing System
To achieve autonomous dehazing, it is necessary to answer the following questions:
• How can the haze condition be determined from a single input image?
• How can an input image be dehazed according to its haze condition?
Regarding the first question, a practical solution is to use a metric such as the HDE. This no-reference metric proportionally quantifies the haze density of the input image and can be considered as the mapping function f_HDE : R^(H×W×3) → R. Because the HDE yields a normalized score between zero and unity, it is highly appropriate for controlling the dehazing process. Hence, an elegant answer to the second question is to exploit the HDE score to adjust the dehazing power in proportion to the haze condition of the input image.
This idea is the underlying principle of the autonomous dehazing algorithm in [7], which nonetheless fails to meet the real-time processing requirement, as Table 1 demonstrates. Based on this algorithm, the following first introduces the autonomous dehazing process and then discusses the major hindrances to real-time processing. After that, Section 3.2 describes in detail the proposed FPGA implementation for surmounting those hindrances, enabling real-time processing for even high-quality (DCI 4K) images.

Base Algorithm
Figure 3 illustrates the main steps constituting the autonomous dehazing algorithm, which accepts and handles arbitrary images. The fundamental idea is to combine the input image with its corresponding dehazed result according to the HDE score. More specifically, the algorithm first senses the haze condition of the input image and then adjusts the dehazing power correspondingly. If the condition is haze-free, the dehazing power becomes zero to keep the input image intact, because it is unnecessary to dehaze a haze-free image. Otherwise, the dehazing power varies in proportion to the sensed haze condition (thin, moderate, or dense haze). This haze-condition-appropriate processing scheme is robust against image distortion caused by excessive dehazing, as the evaluation results in [7] demonstrated.
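The haze-condition-appropriate scheme can be sketched as HDE-guided blending. The piecewise-linear mapping from the HDE score to the weight below is a placeholder; the actual self-calibrating factor in [7] uses a different mapping with additional user-defined parameters.

```python
import numpy as np

def weight_from_hde(rho, rho1=0.3, rho2=0.7):
    """Placeholder mapping from HDE score rho to blending weight omega:
    haze-free scores (rho <= rho1) keep the input intact (omega = 1),
    dense-haze scores (rho >= rho2) fully trust the dehazed result (omega = 0),
    with a linear ramp in between. Thresholds rho1/rho2 are illustrative."""
    return float(np.clip((rho2 - rho) / (rho2 - rho1), 0.0, 1.0))

def autonomous_blend(I, J, rho):
    """Combine input I and dehazed result J according to the haze condition."""
    omega = weight_from_hde(rho)
    return omega * I + (1.0 - omega) * J

I = np.full((2, 2, 3), 0.7)                  # input image
J = np.full((2, 2, 3), 0.4)                  # its dehazed result
clean_out = autonomous_blend(I, J, rho=0.1)  # haze-free: output equals input
dense_out = autonomous_blend(I, J, rho=0.9)  # dense haze: output equals dehazed
```

The haze-free branch (omega = 1) is what protects clean inputs from the untoward degradation shown in Figure 2.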
According to [4], Equation (3) gives the HDE score ρ_I of an RGB image I, where Ψ represents the whole image domain, and hence |Ψ| denotes the total number of pixels. The variable B keeps Equation (3) from growing too lengthy; its expression is given in Equation (4), where κ is a user-defined parameter that was set to −1 in [7], I_mc is the difference between the two extremum channels, and σ_I is the standard deviation of the image luminance. Finally, I_mΩ and Â denote the dark channel and the global atmospheric light estimate discussed earlier in Section 2.
Based on the HDE score ρ_I, the self-calibrating factor calculation block utilizes four additional user-defined parameters (ρ_1, ρ_2, α, and θ) to compute a weighting factor for the image blending and adaptive tone remapping blocks. The self-calibrating factor calculation follows Equations (7) and (8).
where ω weights the contribution of the input image I in the image blending block, and ρ̃_I (a mapped version of ρ_I) results from applying the mapping function f : R → R. Provided that J is the dehazed result of I, Equation (9) shows the restored image R, which is the output of the adaptive tone remapping block. This post-processing block first enhances the luminance and then emphasizes the chrominance accordingly, lest color distortion occur. Equation (9) displays this as P_ω{•} to imply that it is also guided by the self-calibrating factor.
The algorithm in [7] computes the dehazed result J based on multi-scale image fusion. This image dehazing approach belongs to the image processing category and is based on underexposure. Because this phenomenon occurs when inadequate incoming light hits the camera sensor, a postulation exists in the literature that underexposure can alleviate the negative effects of atmospheric scattering and absorption [36]. Therefore, fusing images at different exposure degrees is analogous to image dehazing. Additionally, to adapt this idea to the single-image approach, researchers have widely utilized gamma correction to artificially underexpose an input image. Readers interested in a detailed treatment of this dehazing approach are referred to [7,36]. Meanwhile, Algorithm 1 below provides a corresponding formal description.

Algorithm 1 Multi-scale image dehazing
Input: An RGB image I ∈ R^(H×W×3) of size H × W, the number of artificially underexposed images K ∈ Z_0^+, and the corresponding gamma values.

The algorithm first generates the K artificially underexposed images via gamma correction. After that, there follows the computation of the Laplacian and guidance pyramids ({L_k^n} and {G_k^n}). It is noteworthy that Algorithm 1 computes the guidance pyramid according to the dark channel prior [9], due to its strong correlation with haze density. Before performing multi-scale fusion, it is essential to normalize the guidance pyramid to prevent the out-of-range problem. Finally, the fifth step performs multi-scale fusion, beginning at the last scale and finishing at the first, whose result is the restored image J. Figure 4 depicts an example where K = 3 and N = 3. Substituting the restored image J into Equation (9) yields the final result R. Despite its excellent performance, the autonomous dehazing algorithm in [7] fails to deliver real-time processing, as shown by the run-time comparison in Table 1. A major reason is the multi-scale fusion scheme, because this algorithm sets N = ⌊log_2 min(H, W)⌋. This setting is beneficial to the restored image's quality, but it carries a heavy memory burden, thus prolonging the processing time. The problem worsens from the perspective of hardware implementation because multi-scale fusion requires multiple frame buffers for upsampling and downsampling.
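A single-scale analog of Algorithm 1 can be sketched as follows: gamma correction produces the underexposed versions, and a dark-channel-based guidance map weights a pixel-wise fusion. The gamma values and the guidance normalization are illustrative simplifications; the actual algorithm fuses Laplacian pyramids across N scales.

```python
import numpy as np

def underexpose(img, gamma):
    """Gamma correction with gamma > 1 darkens the image (artificial underexposure)."""
    return np.power(img, gamma)

def fuse_underexposed(img, gammas=(1.0, 2.0, 3.0)):
    """Fuse gamma-underexposed versions, weighting each pixel of each version
    by (1 - dark channel), normalized across versions so the output stays
    in range. Stands in for the pyramid-based fusion of Algorithm 1."""
    versions = [underexpose(img, g) for g in gammas]
    weights = [1.0 - v.min(axis=2) + 1e-6 for v in versions]  # guidance per version
    total = sum(weights)
    weights = [w / total for w in weights]                    # normalization step
    return sum(w[..., np.newaxis] * v for w, v in zip(weights, versions))

hazy = np.full((4, 4, 3), 0.8)   # uniformly bright, haze-like input
fused = fuse_underexposed(hazy)  # darker than the input, bounded by the versions
```

Because darker versions receive larger guidance weights, the fused output is pulled below the bright, haze-like input, which mirrors the underexposure-based restoration idea.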
Furthermore, the minimum filtering operation is also at the root of the failure to achieve real-time processing. From the perspective of software implementation, the ideal complexity of filtering operations is O(H × W), which comprises two for loops to traverse an H × W image. Consequently, the processing time increases in proportion to the image size, hindering high-quality real-time processing. The following presents an FPGA implementation whose computing capability suffices for handling DCI 4K images in real time, surmounting the aforementioned challenges.
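One standard way to make a minimum filter's cost independent of the window size, and thus keep the overall complexity at O(H × W), is a monotonic-deque sliding minimum applied separably along rows and then columns. The 1D routine below is illustrative and is not the specific filter used in [7].

```python
from collections import deque

def sliding_min(values, k):
    """O(n) sliding-window minimum over windows of size k (monotonic deque).
    Returns len(values) - k + 1 outputs, one per full window."""
    dq, out = deque(), []
    for i, v in enumerate(values):
        while dq and values[dq[-1]] >= v:
            dq.pop()                      # drop entries that can never be the minimum
        dq.append(i)
        if dq[0] <= i - k:
            dq.popleft()                  # drop indices that left the window
        if i >= k - 1:
            out.append(values[dq[0]])
    return out

# Since min filtering is separable, running this routine over every row and
# then every column gives a 2D minimum filter in O(H * W), independent of k.
mins = sliding_min([4, 2, 5, 1, 3, 6], 3)   # windows: [4,2,5], [2,5,1], [5,1,3], [1,3,6]
```

Each element enters and leaves the deque at most once, which is where the amortized O(1) cost per pixel comes from.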

FPGA Implementation
The challenges of improving computing performance are rooted in software implementation, and parallelization is often a practical solution. In parallel computing, a task is divided into several sub-tasks, which central processors can execute independently, combining the results upon completion. For example, Figure 5 illustrates a naive parallelization of the autonomous dehazing algorithm discussed above, in which multi-scale image dehazing and the haziness degree evaluator run simultaneously. In contrast, self-calibrating factor calculation, image blending, and adaptive tone remapping are dependent and thus occur sequentially. This computation flow consists of four stages, and the first accounts for most of the heavy computations. Accordingly, we assume that it is responsible for nine tenths of the entire algorithm and, fortunately, supports parallelization. Following Amdahl's law [37], it is theoretically possible to achieve at most a 10× speedup in processing time [= 1/(1 − 0.9)]. The run-time comparison results in Table 1 demonstrate that it took 0.65 s to handle a 640 × 480 image. Hence, even if we apply parallelization with the maximum 10× speedup, the corresponding processing speed of 15.38 fps (≈1/0.065) would still be less than required. Consequently, FPGA implementation is essential for real-time processing, and the following techniques play key roles in the proposed design.
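The figures quoted above can be checked directly with Amdahl's law; the numbers below mirror the 0.65 s per-frame measurement and the assumed 90% parallelizable portion.

```python
def amdahl_speedup(parallel_fraction, n_processors=float("inf")):
    """Amdahl's law: speedup = 1 / ((1 - p) + p / n)."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_processors)

max_speedup = amdahl_speedup(0.9)           # unbounded processors: at most 10x
fps_before = 1.0 / 0.65                     # 0.65 s per 640 x 480 frame
fps_after = 1.0 / (0.65 / max_speedup)      # ~15.38 fps, still below 25 fps
```

Even the idealized upper bound falls short of the 25.00 fps target, which is the quantitative case for moving to an FPGA.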

Pipelined Architecture
Figure 6 illustrates the pipelined architecture for a real-time FPGA implementation of the base algorithm. The three primary components are the main logic, arithmetic macros, and memories. The first realizes the computation flow depicted in Figure 5, in which computation-intensive operations (such as multiplication, division, and taking square roots) are offloaded onto the second. Meanwhile, the third is analogous to a cache, consisting of SPRAMs for the temporary storage of data.
Input data include an RGB image I and timing signals, namely the clock, reset, and horizontal and vertical active video (denoted as clk, rstb, hav, and vav in Figure 6). The image I simultaneously enters the following three blocks: stalling, single-scale image dehazing, and the haziness degree evaluator. It is noteworthy that single-scale image dehazing is a special case of Algorithm 1 where N = 1 and K = 5. We restricted the proposed FPGA implementation to single-scale dehazing to circumvent the heavy burden of frame buffers. In addition, to avoid race conditions when combining the input I and its dehazed result J, we utilized stalling to delay I until J is available. After that, image blending combines I and J to produce the blended image B, which, in turn, undergoes adaptive tone remapping for luminance enhancement and chrominance emphasis. The proposed FPGA implementation then outputs the restored image R, together with its corresponding horizontal and vertical active video signals.
As briefly mentioned, the arithmetic macros are responsible for heavy computations. Thus, the design of all modules in the main logic becomes straightforward because they only account for lightweight operations (such as addition, subtraction, and data routing). However, to avoid digression, we set out the discussion of arithmetic macros in Appendices A and B, except for the split multipliers. These circuits are aimed at reducing the propagation delay of large multiplications, and we explain their operating principle later in Section 3.2.3.
Regarding the haziness degree evaluator, Equation (3) demonstrates that its calculation involves global average pooling. Therefore, we exploited the high similarity between video frames to design this block. As a result, its output ρ_I becomes available during the vertical blank period, and the calculation of the self-calibrating factor ω takes place immediately thereafter. Hence, the ω value of one frame self-calibrates the next frame, thus enabling real-time processing of video data. Meanwhile, for processing still images, the proposed FPGA implementation needs a rerun to correctly self-calibrate the image blending and adaptive tone remapping blocks.

To implement this hardware architecture, we utilized the Verilog hardware description language (IEEE Standard 1364-2005) [38] and register-transfer level (RTL) design abstraction. The former supports generality, portability, and plug-and-play capability, while the latter eases the hardware design burden. For example, as the RTL methodology focuses on modeling the signal flow, it is simple and convenient to describe all modules in the main logic following the description in Section 3.1. In particular, the plug-and-play capability allows the reuse of existing RTL designs, and the adaptive tone remapping is a case in point. Cho et al. [39] implemented and packaged this module as intellectual property, facilitating its integration into the proposed implementation.
The pipelined architecture in Figure 6 improves the system's throughput, whereas the processing speed depends on the propagation delay of combinational logic circuits (CLCs). Accordingly, the following describes two techniques for reducing the propagation delay:
• Fixed-point design for minimizing the signal's word length to reduce the size of CLCs;
• Split multiplying for breaking large multiplications (represented by a large CLC) into smaller ones and inserting pipeline registers (PRs) between them, thus reducing propagation delay.

Fixed-Point Design
Fixed-point representation is a concept in computing that represents fractional numbers using only a fixed number of digits. Consequently, it sacrifices accuracy to reduce the representational burden. The fixed-point representation Q_f of a real number Q is given below, where U denotes the number of fractional digits (or fractional bits when dealing with binary numbers):

Q_f = round(Q · 2^U).
Fixed-point design refers to a method of finding the optimal fixed-point representation of all system signals, and an error tolerance ∆ is a prerequisite for that purpose. Specifically, given Q, its integer part determines the number of integer bits. Meanwhile, the absolute difference |Q_f − Q · 2^U| is compared with ∆ to determine and adjust the number of fractional bits. Herein, given the eight-bit input image data, we determined the word lengths of the signals in Figure 6 based on an error tolerance of ±1 least significant bit. The results were {12, 13, 13, 12, 12} bits for {J, ρ_I, ω, B, R}, respectively.
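The word-length search can be sketched as follows. The sample values and the loop that grows the number of fractional bits until the tolerance holds are an illustrative reconstruction, not the exact procedure used to obtain the bit widths above.

```python
def quantize(q, frac_bits):
    """Fixed-point value of q: round to the nearest multiple of 2**-frac_bits."""
    return round(q * 2**frac_bits) / 2**frac_bits

def min_frac_bits(samples, tol):
    """Smallest number of fractional bits U such that every sample is
    representable within the error tolerance tol; an illustrative sketch
    of the word-length search described in the text."""
    u = 0
    while any(abs(quantize(q, u) - q) > tol for q in samples):
        u += 1
    return u

# For 8-bit pixel data scaled to [0, 1], a tolerance of 1 LSB is 1/255.
bits = min_frac_bits([0.1, 0.333, 0.777], tol=1 / 255)
```

In a real design the search runs over representative signal traces for each wire in Figure 6 rather than a handful of constants, but the tolerance-driven loop is the same.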

Customized Split Multiplier
Split multiplying is analogous to the grid method that is often taught at primary school. Under this approach, the S_M-bit multiplicand M and the S_E-bit multiplier E are arbitrarily divided into upper and lower parts, M = M_1 · 2^(S_M2) + M_2 and E = E_1 · 2^(S_E2) + E_2. The product P can then be expressed as follows:

P = M · E = M_1 E_1 · 2^(S_M2 + S_E2) + M_1 E_2 · 2^(S_M2) + M_2 E_1 · 2^(S_E2) + M_2 E_2.

Hence, a large multiplication M · E divides into four smaller ones: M_1 E_1, M_1 E_2, M_2 E_1, and M_2 E_2. By inserting four additional PRs to store the results of these multiplications, the latency increases by one clock cycle. However, the propagation delay incurred for computing each of M_1 E_1, M_1 E_2, M_2 E_1, and M_2 E_2 is significantly smaller than that for computing the original multiplication M · E.
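The decomposition can be checked in software; the function below mirrors the four partial products, with the split points sm2 and se2 playing the roles of S_M2 and S_E2.

```python
def split_multiply(m, e, sm2, se2):
    """Grid-method split: M = M1*2^sm2 + M2 and E = E1*2^se2 + E2, so
    M*E = M1E1*2^(sm2+se2) + M1E2*2^sm2 + M2E1*2^se2 + M2E2.
    In hardware, each partial product feeds a pipeline register, trading one
    cycle of latency for a much shorter combinational path."""
    m1, m2 = m >> sm2, m & ((1 << sm2) - 1)   # upper / lower parts of M
    e1, e2 = e >> se2, e & ((1 << se2) - 1)   # upper / lower parts of E
    return (((m1 * e1) << (sm2 + se2)) +
            ((m1 * e2) << sm2) +
            ((m2 * e1) << se2) +
            (m2 * e2))

p = split_multiply(0xABCD, 0x1234, 8, 8)      # split each 16-bit operand into bytes
```

The shifts are free in hardware (wiring), so only the four narrow multipliers contribute to the critical path.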
As described thus far, the proposed FPGA implementation is the final result of a sophisticated design process.We adopted pipelining and fixed-point design to improve the throughput and processing speed, respectively.In addition, we also utilized split multiplying to break large multiplications into smaller ones, further reducing the propagation delay until achieving real-time processing for DCI 4K resolution.

Evaluation
This section provides the hardware implementation results and compares the proposed FPGA implementation with existing benchmark designs to verify its efficacy. A performance evaluation then follows to demonstrate the autonomous dehazing capability on outdoor and aerial images.

Implementation Results
Table 3 summarizes the implementation results of the proposed autonomous dehazing system. Given the total hardware resources available in the mid-size FPGA device mentioned above, less than one third was required to realize the proposed system. More precisely, it took 53,216 slice registers, 49,799 slice look-up tables (LUTs), 45 RAM36E1s, and 22 RAM18E1s out of the corresponding 437,200, 218,600, 545, and 1090. The minimum period reported in Table 3 is equivalent to the maximum propagation delay among all CLCs of the system. This specifies the minimum interval at which the system produces new output data; thus, its reciprocal is the maximum frequency. As reported, the proposed system can handle at most 271.37 Mpixels per second.
Let f_max denote that maximum frequency. Then, the following equation gives the maximum processing speed (MPS) in fps:

MPS = f_max / [(H + B_ver) · (W + B_hor)],
where H and W are the image height and width, and B_ver and B_hor denote the vertical and horizontal blank periods. Herein, the three variables f_max, B_ver, and B_hor are design-dependent. Accordingly, if hardware designers fail to consider the blank periods, a design with an impressive f_max may deliver a slow MPS. In this study, we implemented the proposed system to operate correctly with minimum blank periods of one clock cycle (B_hor = 1) and one image line (B_ver = 1). Table 4 summarizes the MPS values for different image resolutions, ranging from Full HD to DCI 4K. Thus, the proposed FPGA implementation can handle DCI 4K images/videos at 30.65 fps, which satisfies the real-time processing requirement. In the literature on image dehazing, a few real-time implementations exist, and those developed by Park and Kim [43] and Ngo et al. [35,42] are cases in point. The first design realizes the well-known algorithm of He et al. [9], in which Park and Kim [43] improve the atmospheric light estimation for video processing. The second design [42] improves the dehazing method of Tarel and Hautiere [8] by devising an excellent edge-preserving smoothing filter to replace the standard median filter. Finally, the third design [35] is an improved version of the method of Zhu et al. [11]; it remedies several visually unpleasant problems such as background noise, color distortion, and post-dehazing false enlargement of bright objects.
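As a sanity check of the figures above, the following Python snippet (variable names are ours) evaluates MPS = f_max / ((H + B_ver) · (W + B_hor)) with f_max = 271.37 MHz and the blank periods B_ver = B_hor = 1:

```python
# Maximum processing speed (frames per second) from the pixel clock and
# the blanking overhead: one extra clock cycle per line (B_hor) and one
# extra line per frame (B_ver).

def mps(f_max_hz: float, h: int, w: int, b_ver: int = 1, b_hor: int = 1) -> float:
    """Frames per second sustainable at pixel clock f_max_hz."""
    return f_max_hz / ((h + b_ver) * (w + b_hor))

F_MAX = 271.37e6  # 271.37 Mpixels/s reported in Table 3
print(round(mps(F_MAX, 2160, 4096), 2))  # DCI 4K → 30.65
```

The DCI 4K result reproduces the 30.65 fps figure quoted above, confirming that the blank periods, not f_max alone, set the achievable frame rate.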
Table 5 below summarizes the implementation results of the four designs. A conspicuous observation is that the proposed autonomous dehazing system requires the least hardware resources. Despite its compact size, its processing speed is virtually the same as that of the fastest implementation in [35]. Finally, the proposed system is equipped with the unique feature of autonomous dehazing, as demonstrated in the following.

Performance
This section evaluates the dehazing performance of the proposed system against five state-of-the-art methods, including those proposed by He et al. [9], Zhu et al. [11], Cai et al. [13], Berman et al. [12], and Cho et al. [2]. The evaluation is performed on two types of images, outdoor and aerial, to demonstrate the breadth of applications of the proposed system. An essential difference between these two is the area of inspection. Outdoor images depict an area close to the camera, and they serve as data for understanding the environment within which the camera operates. In contrast, aerial images depict a larger inspection area, and they serve as data for monitoring a changing situation.

Outdoor Images
Because the aforementioned methods usually deliver satisfactory performance, the images demonstrated hereinafter are those for which dehazing-related artifacts are easily noticeable. Figure 7 shows four representative outdoor images and the corresponding results of applying six dehazing methods, in which the haze condition is determined based on the HDE score. Following [7], we adopt two thresholds {ρ_1, ρ_2} = {0.8811, 0.9344} to discriminate the haze condition. Let ρ_I be the input image's HDE score; its haze condition is then determined by comparing ρ_I against these thresholds. What emerges from Figure 7 is that the five benchmark methods could not handle haze-free images correctly, as can be seen from the severe color distortion (dark-blue sky). The exception is the method of Cai et al. [13], whose powerful CNN is versatile enough to adapt to various haze conditions; even so, slight degradation is noticeable in the near-field plants. The proposed system, in contrast, successfully discriminates this image as haze-free and zeroes the dehazing power through ω = 1 in Equation (9). Consequently, it leaves the haze-free image intact and thus free of any visually unpleasant artifacts.
In addition, except for the deep CNN of Cai et al. [13], the benchmark methods exhibit post-dehazing artifacts in thin, moderate, and dense haze. Their dehazing power is too strong and not well adapted to the local content of images, as can be seen in the excess haze removal in the upper half and the persistence of haze in the lower half. For the same reason as that mentioned above, the results of Cai et al. [13] demonstrate a less severe problem. The proposed system takes a step forward and displays more satisfactory results than the benchmark methods. It automatically adjusts the dehazing power lest excess haze removal occur. This desirable behavior is attributed to the elegant use of HDE scores to guide the image blending and adaptive tone remapping blocks. Furthermore, we utilized three full-reference metrics, namely, mean squared error (MSE), structural similarity (SSIM) [44], and feature similarity extended to color images (FSIMc) [45], to assess the dehazing performance quantitatively. For these three metrics, the smaller the MSE the better, whereas the opposite applies to SSIM and FSIMc. In addition, as these are full-reference metrics, we employed the following fully annotated datasets: FRIDA2 [46], D-HAZY [47], O-HAZE [48], I-HAZE [49], and Dense-Haze [50]. FRIDA2 consists of 66 graphics-generated images of road scenes, based on which Tarel et al. [46] synthesized four hazy image groups (in total, 66 haze-free and 264 hazy images). Similarly, D-HAZY is composed of 1472 indoor images whose corresponding hazy images are synthesized with scene depths captured by a Microsoft Kinect camera. In contrast, O-HAZE, I-HAZE, and Dense-Haze comprise 45, 30, and 55 pairs of real hazy/haze-free images depicting outdoor, indoor, and both indoor and outdoor scenes, respectively. Another facet to consider is that input images to a dehazing system are not necessarily hazy. Hence, we employed both the haze-free and hazy images of those datasets and an additional 500IMG dataset [35] consisting of 500 haze-free images collected in our previous work.
Table 6 summarizes the quantitative evaluation results, where the top three results are boldfaced in red, green, and blue, respectively, for ease of interpretation. It is clearly seen that the proposed system demonstrates the best performance regardless of haze conditions. In particular, it attains virtually perfect scores for haze-free images, attributable to the excellent performance of HDE in haze condition discrimination. In addition, even the results on hazy images per se show a clear gap between the proposed system and the second-best method.
Overall, the methods of He et al. [9] and Cai et al. [13] occupy the next two positions. Table 6 shows that the former is situational. On the one hand, it exhibits the top scores on D-HAZY due to its well-known excellence in indoor dehazing. On the other hand, its inherent failure to handle sky regions results in poor performance on FRIDA2. Conversely, the latter is versatile, as it performs relatively well on all datasets. It is also noteworthy that SSIM does not account for chrominance information; hence, the method of He et al. [9] is ranked second overall under this metric. However, under FSIMc, which accounts for chrominance, the DehazeNet of Cai et al. [13] is ranked second, consistent with the qualitative evaluation results in Figure 7.
The remaining three methods of Berman et al. [12], Cho et al. [2], and Zhu et al. [11] occupy the last three positions. Quantitative results on Dense-Haze demonstrate that the two methods of Berman et al. [12] and Cho et al. [2] are effective for haze removal. However, as the qualitative evaluation shows, they are susceptible to severe post-dehazing artifacts. The method of Zhu et al. [11] suffers from several problems such as color distortion and background noise (as pointed out by Ngo et al. [34]), resulting in its poor performance.

Aerial Images
In the aerial surveillance literature, no real datasets exist comprising pairs of hazy (or cloudy) images and their corresponding ground-truth references. This is due to the sheer impracticality of capturing the same area under different weather conditions. Therefore, we propose a method to synthesize hazy images for evaluating image dehazing algorithms in aerial surveillance.
According to Equation (1), the global atmospheric light A and transmission map t are prerequisites for hazy image synthesis. As A remains constant across the entire image domain, it is common practice to draw A from a uniform distribution. In contrast, synthesizing t is a difficult task. On the one hand, Zhu et al. [11] proposed creating a pixel-wise random transmission map whose values were uniformly distributed. On the other hand, Jiang et al. [28] added a constant haze layer to a clean image by utilizing a scene-wise random transmission map. These two approaches are unrealistic because they do not reflect the true distribution of haze. To address this problem, we propose synthesizing haze/cloud as a set of low-frequency, randomly distributed values, as shown in Algorithm 2. Using the random haze/cloud distribution discussed above, we synthesized hazy/cloudy images from their clean counterparts based on Equation (1), as shown in Algorithm 3. For customization, we exploited the HDE [4] to guide the generation toward an image possessing a desirable HDE score. In Algorithm 3, the haze density control D_ρ ∈ R_0^+ and its step δ are responsible for varying the haze density to meet the predetermined HDE score. In addition, to avoid an infinite loop, we adopted the HDE tolerance ∆_ρ and a maximum number of iterations M_I. An example of this synthetic hazy/cloudy image generation is shown in Figure 1b.
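A minimal numpy sketch of this low-frequency haze generation, assuming an ideal circular low-pass filter in the Fourier domain (the paper's exact filter L(X, F_c) may differ), is:

```python
# Hedged sketch of low-frequency random haze generation: white Gaussian
# noise N(H, W) is low-pass filtered in the Fourier domain (forward
# transform F, filter L with cut-off F_c in [0, pi], inverse transform I)
# and rescaled to a transmission map t in [0, 1]. An ideal circular
# low-pass mask stands in for the paper's unspecified filter shape.
import numpy as np

def random_transmission(h: int, w: int, f_c: float, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal((h, w))          # N(H, W)
    spec = np.fft.fftshift(np.fft.fft2(noise))   # F(noise), DC centered
    # Normalized radial frequency, scaled so the image edge maps to pi:
    yy, xx = np.mgrid[0:h, 0:w]
    radius = np.hypot((yy - h / 2) / (h / 2), (xx - w / 2) / (w / 2))
    spec[radius * np.pi > f_c] = 0               # L(X, F_c): ideal low-pass
    low = np.real(np.fft.ifft2(np.fft.ifftshift(spec)))  # I(spec)
    # Rescale to [0, 1] so the result can serve as a transmission map:
    return (low - low.min()) / (low.max() - low.min() + 1e-12)

t = random_transmission(128, 128, f_c=0.15)
print(t.shape, float(t.min()) >= 0.0, float(t.max()) <= 1.0)
```

A hazy image then follows from Equation (1) as I = J · t + A · (1 − t), with A drawn from a uniform distribution as described above.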
Figures 9 and 10 demonstrate the dehazing performance of the proposed system and the benchmark methods on synthetic aerial hazy images, whose corresponding haze-free images are from AID [1]. As with the assessment of outdoor images, the benchmark methods suffered from color distortion and halo artifacts, causing a marked difference between their results and the corresponding haze-free reference at the top left. Table 7 summarizes the MSE, SSIM, and FSIMc scores on the synthetic aerial images in Figures 9 and 10. It can be observed that the proposed system shares the top performance with the two methods of Cai et al. [13] and He et al. [9]. More specifically, its performance is within the top two for images with thin and moderate haze as well as for haze-free images. However, for densely hazy images, the performance is slightly worse than that of the aforementioned two benchmark methods. This is because the benchmark methods often suffer from severe color distortion in the sky, whereas aerial images generally cover territorial areas. Therefore, the reduced performance for aerial images with dense haze is explicable.
Finally, we assessed the performance of a YOLOv4-based high-level object recognition algorithm (mentioned in Section 1.1) on the dehazed results depicted in Figure 9. Table 8 summarizes the detection results, while Figure 11 illustrates them visually. The term Failure in Table 8 denotes the number of incorrectly detected objects. It is also noteworthy that the detection results reported in the table were aggregated based on the confidence level. The results for the method of Zhu et al. [11] on a moderately hazy image in Figure 11 can be taken as an example. The recognition algorithm yielded two detection results for the airplane near the center of the image: bird with 40% confidence and airplane with 31% confidence. Therefore, the final result for that airplane was the label with the higher confidence level, i.e., bird. Obviously, the algorithm incurred a Failure in this case, and the underlying reason was probably color distortion occurring due to excess haze removal.
Based on Table 8 and Figure 11, the proposed system is clearly superior to the benchmark methods because it does not cause any additional Failures compared with the input image. The two Failures for the haze-free and thin haze images are inherent in the input image itself. In contrast, the benchmark methods are prone to excess haze removal, and therein lies the cause of many Failures.

Conclusions
This paper presented an FPGA-based autonomous dehazing system that handles DCI 4K images/videos in real time. Starting from the position that the currently predominant deep-learning approach represents overkill, we analyzed a non-deep approach for autonomous image dehazing. Under this approach, the fundamental idea is to combine the input image and its dehazed result according to the haze condition. After that, we adopted pipelining, fixed-point design, and split multiplying to devise a 4K-capable FPGA implementation. We then conducted a comparative evaluation with other benchmark hardware designs to verify its efficacy. In addition, we presented a performance evaluation on outdoor and aerial images to demonstrate its effectiveness in various circumstances, rendering the proposed implementation highly relevant to real-life systems (such as autonomous vehicles and aerial surveillance).
Furthermore, we pointed out two inherent limitations of the proposed system: handling haze-free images with a broad and homogeneous background and handling hazy night-time images. Because the adopted HDE discriminates the haze condition of such images incorrectly, the self-calibration feature does not function as intended. Such limitations notwithstanding, the proposed system remains dependable in practice, given the HDE's otherwise high accuracy in haze condition discrimination.

Appendix A
This appendix discusses the design of serial and parallel dividers in arithmetic macros. Figure A1 depicts the datapath and state machine for realizing the former type, which is appropriate for dividing user-defined parameters. The datapath consists of three main registers: the (M + N)-bit holder, the N-bit divisor, and the Q-bit quotient. There is also an implicit counter to signify the completion of division. Upon the transition from IDLE to OPERATION, the holder is loaded with the M-bit dividend at the least significant positions and zero-padded to (M + N) bits.
According to the state machine, the operation is relatively straightforward. Upon reset, the serial divider is in the IDLE state. When the start signal occurs, it transitions to the OPERATION state and loads the dividend and divisor into the holder and divisor registers. In this state, if the divisor is equal to zero, the divider changes to the ERROR state and raises a flag to signify division by zero. After that, it returns to the IDLE state. Otherwise, it starts the implicit counter and compares the divisor with the dividend bit by bit, beginning with the most significant bit and proceeding according to the comparison result. It also generates quotient bits and shifts them into the quotient register at the least significant position. When the quotient register captures all Q bits, the counter produces a signal to trigger a transition to the DONE state. The divider then returns to the IDLE state and waits for the next call.
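The OPERATION loop can be modeled behaviorally in Python as a standard restoring (shift-and-compare) division; register names follow the description above, but the code is a software sketch rather than the RTL:

```python
# Behavioral model of the serial divider's OPERATION state: a classic
# shift-and-compare (restoring) loop producing one quotient bit per
# clock cycle. Register widths and the division-by-zero flag mirror the
# description above; this is a software sketch, not the hardware itself.

def serial_divide(dividend: int, divisor: int, m_bits: int, q_bits: int):
    """Return (quotient, error_flag) after q_bits iterations."""
    if divisor == 0:
        return 0, True                      # ERROR state: division by zero
    holder = dividend                       # dividend in the LS positions,
                                            # zero-padded to (M + N) bits
    quotient = 0
    for _ in range(q_bits):                 # driven by the implicit counter
        holder <<= 1                        # shift toward the MSB
        bit = 0
        if (holder >> m_bits) >= divisor:   # compare upper part with divisor
            holder -= divisor << m_bits     # restoring subtraction
            bit = 1
        quotient = (quotient << 1) | bit    # shift bit into quotient LSB
    return quotient, False                  # DONE, then back to IDLE

q, err = serial_divide(100, 7, m_bits=8, q_bits=8)
print(q, err)  # → 14 False, i.e. 100 // 7 with no error flag
```

With q_bits equal to the dividend width, the loop reproduces integer division; fractional quotient bits can be obtained simply by running additional iterations.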

Figure 1 .
Figure 1. Illustration of the negative effects of cloud and the beneficial effects of image dehazing on an aerial surveillance application. First row: (a) a clean image and its corresponding (b) synthetic cloudy image and (c) dehazed result. Second row: (d-f) results obtained after processing (a-c) using a YOLOv4-based high-level object recognition algorithm. Notes: cyan labels represent airplanes, and navy-blue labels represent birds.

Figure 2 .
Figure 2. Illustration of the negative effects of image dehazing on an aerial surveillance application when the input image is clean. (a,b) A clean image and its corresponding dehazed result. (c,d) Detection results obtained after processing of (a,b) by a YOLOv4-based high-level object recognition algorithm. Notes: (a,c) were adopted from Figure 1a,d. Cyan labels represent airplanes, and navy-blue labels represent birds.

Algorithm 1. Input: … and the number of scales N ∈ Z_0^+, N ≤ ⌊log_2 min(H, W)⌋. Output: the restored image J ∈ R^(H×W×3). Auxiliary functions: u_2(·) and d_2(·) denote upsampling and downsampling by a factor of two. BEGIN: create the input pyramid.

Figure 4 .
Figure 4. Illustration of the multi-scale image dehazing in Algorithm 1 with K = 3 and N = 3.

Figure 5 .
Figure 5. Illustration of a naive parallelization of the autonomous dehazing algorithm.

Figure 6 .
Figure 6. Pipelined architecture of the proposed FPGA implementation.

Table 1 .
Processing time in seconds of different image dehazing methods for different image resolutions.
• An FPGA-based implementation of an autonomous dehazing algorithm that can satisfactorily handle high-quality clean and hazy/cloudy images in real time.
• An in-depth discussion of FPGA implementation techniques to achieve real-time processing on high-resolution images (DCI 4K in particular).
• An efficient method for synthesizing cloudy images from a clean dataset (AID).

Table 2 .
Summary of image dehazing categories.
and the number of scales N ∈ Z_0^+. The representation Z_0^+ denotes the set of non-negative integers, and thus k ∈ Z_0^+ ∩ [1, K] means that k is a non-negative integer lying between 1 and K. Based on the image size, N must be smaller than its maximum value of ⌊log_2 min(H, W)⌋. Two auxiliary functions, u_2(·) and d_2(·), denote upsampling and downsampling by a factor of two. The first step is to create an input pyramid {I_k^n

Table 3 .
Hardware implementation results for the proposed autonomous dehazing system. LUT stands for look-up table, and the symbol # denotes quantities.

Table 4 .
Maximum processing speeds in frames per second for different image resolutions. The symbol # denotes quantities.

Table 5 .
Comparison with existing benchmark designs. The symbol # denotes quantities.

Table 6 .
Average mean squared error (MSE), structural similarity (SSIM), and feature similarity extended to color images (FSIMc) scores on different datasets. Top three results are boldfaced in red, green, and blue.

Algorithm 2. Input: image size H, W ∈ Z_0^+ and cut-off frequency F_c ∈ [0, π]. Output: transmission map t ∈ [0, 1]^(H×W). Auxiliary functions: N(H, W) generates an H × W image of random Gaussian noise, {F(·), I(·)} denote the forward and inverse Fourier transforms, and L(X, F_c) denotes low-pass filtering the image X with the cut-off frequency F_c.

Table 7 .
Average MSE, SSIM, and FSIMc scores on synthetic aerial hazy images.Top three results for each image are boldfaced in red, green, and blue.