Article

Unsupervised Restoration of Underwater Structural Crack Images via Physics-Constrained Image Translation and Multi-Scale Feature Retention

Xianfeng Zeng, Wenji Ai, Zongchao Liu and Xianling Wang
1 College of Transportation and Logistics, Guangzhou Railway Polytechnic, Guangzhou 511300, China
2 School of Built Environment Engineering, Zhengzhou University of Light Industry, Zhengzhou 450000, China
* Author to whom correspondence should be addressed.
Buildings 2025, 15(13), 2150; https://doi.org/10.3390/buildings15132150
Submission received: 23 May 2025 / Revised: 17 June 2025 / Accepted: 19 June 2025 / Published: 20 June 2025
(This article belongs to the Special Issue Advances in Building Structure Analysis and Health Monitoring)

Abstract

Accurate visual inspection of underwater infrastructure, such as bridge piers and retaining walls, is often hindered by severe image degradation caused by light attenuation and scattering. This paper introduces an unsupervised enhancement framework tailored to restoring underwater images containing structural cracks. The method combines physical modeling of underwater light transmission with a deep image translation architecture that operates without paired training samples. To address the loss of fine structural details, this paper incorporates a multi-scale feature integration module and a region-focused discriminator that jointly guide the enhancement process. Moreover, a physics-guided loss formulation is designed to promote optical consistency and texture fidelity during training. The proposed approach is validated on a real-world dataset collected from submerged structures under varying turbidity and illumination levels. Both objective evaluations and visual results show substantial improvements over baseline models, with better preservation of crack boundaries and overall visual quality. This work provides a robust solution for preprocessing underwater imagery in structural inspection tasks.

1. Introduction

Underwater infrastructure inspection plays a vital role in ensuring the safety and longevity of marine and civil engineering structures, such as bridge piers, offshore platforms, and dams [1]. Cracks are one of the most common and critical forms of underwater structural damage, as they may indicate material degradation, corrosion, or structural failure [2]. However, due to the challenging underwater environment, characterized by light attenuation, scattering, and color distortion, captured images often suffer from low contrast, blurring, and severe color degradation [3,4,5,6]. These issues not only hinder visual interpretation but also affect the accuracy of automated defect detection systems [7,8,9].
Conventional image enhancement methods, such as histogram equalization [10] or Retinex-based approaches [11,12,13], often fail to adapt to complex and variable underwater conditions. Recent advances in deep learning have enabled data-driven enhancement techniques [14], including convolutional neural networks (CNNs) and generative adversarial networks (GANs), to improve underwater image quality. In recent years, CNN-based underwater image enhancement methods have shown significant advantages. Unlike traditional imaging models that rely on complex parameter estimation, end-to-end training frameworks represented by UWCNN [15], UD-Net [16], and UIR-Net [17] automatically extract multi-level representations by learning feature mappings from reference images. To improve performance, scholars have proposed various innovative architectures: Lyu et al. [18] designed a lightweight network that combines residual groups with a channel attention mechanism for feature extraction and optimizes brightness in the YUV color space to enhance contrast; Wu et al. [19] proposed a two-stage enhancement framework that decomposes an image into high- and low-frequency components via the discrete cosine transform, enhancing the high-frequency information directly with a CNN and correcting low-frequency color through joint component map estimation. The UIEC^2-Net developed by Wang et al. [20] integrates RGB/HSV dual color-space features and optimizes visual quality through pixel-level enhancement, global brightness-saturation adjustment, and attention-based fusion modules. The perception-driven dehazing network constructed by Li's research group [21] adopts a dual-network architecture in which a refinement network optimized with a multi-objective loss effectively improves color restoration. Despite this progress, existing CNN methods still have clear limitations: because accurately aligned degraded-clear image pairs are difficult to obtain in underwater environments, the strong dependence on high-quality paired data severely restricts generalization in practical scenarios.
Underwater image enhancement based on generative adversarial networks breaks the dependence on paired data through unsupervised domain-mapping learning. The GAN framework generates high-quality enhancement results without supervised signals by establishing an adversarial mapping between the degraded image domain and the clear image domain [22]. On this basis, a series of innovative architectures has emerged: Jiang et al. [23] constructed a perception-driven enhancement network that integrates natural-image prior constraints with a deep quality-ranking mechanism to jointly optimize perceptual attributes such as brightness and contrast. Liu et al. [24] proposed a multi-expert learning model that extracts features independently for the differential attenuation characteristics of the RGB channels; through cross-channel fusion, the decoder is guided toward collaborative optimization of color correction and detail preservation. Notably, the UW-GAN designed by Hambarde et al. [25] adopts a cascaded deep network architecture and embeds spatial-channel dual attention modules in single-image depth estimation, significantly improving depth prediction accuracy and supporting the enhancement effect. Current research also shows a trend toward multidimensional technology integration: the multi-scale dense GAN developed by Guo's research group [26] combines residual multi-scale dense blocks with spectral normalization, effectively enhancing color-space conversion ability; the cross-domain adversarial mechanism of Li et al. [27] improves image contrast and color richness through a dual-channel discriminator and a chromaticity-distance loss function; and the Sea-Pix-GAN proposed by Chaurasia et al. [28] integrates three modules for color correction, contrast enhancement, and style transfer, achieving significant gains in visual presentation.
Although CNNs have shown advantages in underwater image enhancement, their practical application is still limited by data bottlenecks. CNN models typically rely on large-scale paired datasets for training [29]; however, the scarcity of high-quality reference images in underwater environments makes data acquisition costly and severely limits deployment efficiency. In this context, GANs stand out as unsupervised learning frameworks, achieving strong enhancement performance through image domain-transfer modeling [30] without paired data. However, existing GAN-based methods mostly focus on global chromaticity mapping and contrast optimization and lack explicit modeling capability for fine-grained structures, such as cracks. The technical challenges posed by the complex underwater imaging environment mainly include the following: (1) light scattering and suspended-particle interference blur crack boundaries and significantly reduce target-background contrast; (2) variations in imaging depth and shooting angle cause uneven illumination and local contrast fluctuations, producing abnormal visual characteristics of cracks; and (3) improper enhancement can easily introduce boundary distortion or over-enhancement artifacts, increasing the risk of false crack detection. These characteristics make it difficult for traditional global enhancement strategies to extract discriminative crack feature representations.
Conventional enhancement algorithms, such as histogram equalization and Retinex-based models, typically rely on empirical formulations or manual parameter adjustment, which limits their robustness across variable underwater imaging conditions. In contrast, CNN-based algorithms leverage data-driven learning to automatically extract features and perform enhancement, achieving superior results but requiring extensive paired training datasets that are often unavailable in underwater environments. To address the limitations of both approaches, this study proposes a physics-aware deep learning model that integrates underwater optical theory into a CNN-based architecture, enabling robust, unsupervised enhancement of underwater structural crack images. The approach is built upon the UNIT framework and augmented with a physics-aware architecture that incorporates an underwater light propagation model. A multi-scale feature preservation module is introduced to retain the fine-scale texture of cracks, and a local PatchGAN discriminator is used to enhance structural realism. Additionally, a composite loss function with physical perception constraints ensures that the enhanced images are both visually appealing and physically consistent with underwater imaging principles.
By leveraging both domain knowledge and deep learning capabilities, the proposed method achieves high-quality enhancement of underwater crack images without the need for paired datasets. This work aims to provide a robust and interpretable solution for underwater inspection applications, especially in scenarios where structural integrity assessments depend on the accurate restoration of degraded visual information.

2. Methods

This study proposes an unsupervised, physics-aware image enhancement method that combines underwater light propagation theory with the UNIT network structure to achieve realistic restoration and detail enhancement of underwater crack images. The method comprises four main components: underwater light propagation modeling, an improved UNIT structure, a multi-scale feature preservation module, and an optical consistency loss function.

2.1. Overview of the Overall Framework

This method takes unlabeled real underwater crack images as input and introduces image degradation information based on physical models to guide the UNIT network to learn unsupervised domain transfer and enhancement of underwater images. This framework integrates an image translation network and a physical perception module, providing physical consistency constraints while maintaining the network’s adaptive enhancement capability, further improving image quality and structural fidelity. After completing network training, low-quality underwater images can be converted into enhanced images with high contrast, high-color reproduction, and clear crack details.
The proposed enhancement pipeline is depicted in Figure 1. The raw underwater image $J$ is first processed by a physically inspired light propagation module, which estimates degradation factors such as the background light $B_c$ and the per-channel transmission $t_c$. The result is passed to a multi-scale feature extraction module with frequency-aware attention, which guides the decoder to recover high-resolution details. Meanwhile, a region discriminator provides fine-grained supervision by focusing on texture realism and crack edge sharpness. The final restored image exhibits both global visibility and structural integrity of crack features.

2.2. Underwater Light Propagation Model

To simulate the real-world degradation of underwater images, this paper introduces light propagation theory based on the Jaffe–McGlamery model [31,32]. The degradation of underwater images mainly consists of three parts: absorption, scattering (as shown in Figure 2), and backscattered light, with degradation most severe in the red and green bands. As given in Equation (1), the observed underwater image $I_c(x)$ in channel $c \in \{R, G, B\}$ can be expressed as:

$$I_c(x) = J_c(x) \cdot t_c(x) + B_c \cdot \left(1 - t_c(x)\right) \quad (1)$$

where $J_c(x)$ is the scene radiance, $t_c(x) = e^{-\beta_c \cdot d(x)}$ is the transmission map that depends on the attenuation coefficient $\beta_c$ and the scene depth $d(x)$, and $B_c$ is the global background light in channel $c$. By modeling the three RGB channels separately, this study constructs an optical consistency loss term as a training constraint, guiding the image enhancement process to minimize physical deviations.
To estimate the underwater physical parameters critical for image enhancement, this paper employs a light propagation model based on the Jaffe–McGlamery formulation. The light absorption and scattering coefficients are estimated from environmental conditions, such as water turbidity and depth, while the background light intensity is determined using a global estimation technique. The accuracy of the parameter estimation was quantitatively evaluated by comparing the estimated values with ground-truth measurements obtained through controlled underwater experiments. The results indicate that the estimation process achieves a high degree of accuracy, with mean absolute errors of 5–8% for the light attenuation coefficients, an RMSE of 0.02 for background light intensity, and a correlation coefficient of 0.92 for depth estimation.
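As a concrete reference, the following sketch applies the image-formation model of Equation (1) to synthesize a degraded observation from a clean scene. It is a minimal illustration, not the paper's estimation pipeline; the attenuation coefficients and background light values are illustrative placeholders.

```python
import numpy as np

def degrade(J: np.ndarray, depth: np.ndarray, beta, B) -> np.ndarray:
    """Apply I_c = J_c * t_c + B_c * (1 - t_c) with t_c = exp(-beta_c * d(x)).

    J: clean image, float in [0, 1], shape (H, W, 3), RGB order.
    depth: per-pixel scene distance d(x) in metres, shape (H, W).
    beta: per-channel attenuation coefficients, length 3.
    B: global background light per channel, length 3.
    """
    t = np.exp(-np.asarray(beta)[None, None, :] * depth[..., None])  # transmission t_c(x)
    return J * t + np.asarray(B)[None, None, :] * (1.0 - t)

# Illustrative values only: red attenuates fastest, background light is blue-green.
J = np.random.rand(256, 256, 3).astype(np.float32)
d = np.full((256, 256), 2.0, dtype=np.float32)  # 2 m camera-to-structure distance
I = degrade(J, d, beta=[0.60, 0.25, 0.10], B=[0.05, 0.35, 0.45])
```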

2.3. Improved UNIT Structure

The proposed method builds upon the foundational UNIT architecture, as shown in Figure 3, which integrates Variational Autoencoders (VAEs) and GANs under the assumption of a shared latent space between source and target domains. While UNIT is effective for general unsupervised domain translation tasks, it is not specifically tailored to the challenges of underwater imaging, particularly when preserving high-frequency crack features under severe degradation. To this end, this paper proposes a physics-guided and detail-aware variant of UNIT with three significant architectural improvements:

2.3.1. Shared Encoder with Physically-Constrained Latent Representation

The encoder network, denoted as $E_X$ for domain X and $E_Y$ for domain Y, is designed to extract domain-invariant structural representations while embedding domain-specific degradations into the latent space $Z$ (as shown in Figure 1). Unlike the original UNIT, this paper incorporates a physics-guided regularization term into the encoder's objective, enforcing consistency with underwater light attenuation models (based on the Jaffe–McGlamery formulation). This promotes representations that are not only compact but also physically plausible under varying scattering and absorption conditions. (1) Weight-sharing strategy: the last few convolutional layers of the encoders are shared across domains to encourage learning of common geometric structures, such as cracks and edges. (2) Auxiliary depth-light maps: the encoder optionally integrates side-channel depth priors or turbidity estimates to modulate feature extraction via adaptive normalization layers.
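A minimal PyTorch sketch of the weight-sharing idea is given below: each domain keeps its own early convolutions, while the final layers (and hence the latent space $Z$) are shared. The layer widths, depths, and helper names are assumptions for illustration, not the exact configuration used in this paper.

```python
import torch
import torch.nn as nn

def conv_block(cin, cout, stride=2):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, stride, 1),
                         nn.InstanceNorm2d(cout), nn.ReLU(inplace=True))

class SharedLatentEncoders(nn.Module):
    """Domain-specific early layers; shared final layers define the latent space Z."""
    def __init__(self):
        super().__init__()
        self.enc_x = nn.Sequential(conv_block(3, 64), conv_block(64, 128))  # domain X (degraded)
        self.enc_y = nn.Sequential(conv_block(3, 64), conv_block(64, 128))  # domain Y (clear)
        self.shared = nn.Sequential(conv_block(128, 256), conv_block(256, 256, stride=1))

    def forward(self, img, domain: str):
        h = self.enc_x(img) if domain == "x" else self.enc_y(img)
        return self.shared(h)  # z in the shared latent space

enc = SharedLatentEncoders()
z = enc(torch.randn(1, 3, 256, 256), domain="x")  # (1, 256, 32, 32)
```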

2.3.2. Decoder with Multi-Scale Feature Fusion and Skip Connections

The decoder networks $D_X$ and $D_Y$ are responsible for reconstructing enhanced images from the latent representation $Z$ (as shown in Figure 1). To recover fine crack-level details and avoid the common problem of oversmoothing, this paper embeds a Multi-Scale Feature Enhancement Module (MSFEM) into the decoder. This module collects hierarchical feature maps from early and middle layers and aggregates them via channel attention and upsampling operations. (1) Skip connections: inspired by U-Net, lateral connections are established between corresponding encoder and decoder layers, facilitating the preservation of spatial localization and edge sharpness. (2) MSFEM block: each decoding stage includes an MSFEM that combines global contextual information and local details; this is crucial for reconstructing high-frequency crack patterns that are easily lost in underwater scenes. Mathematically, let $F_l$ denote the feature map at level $l$. The fused output $F_{fused}$ is obtained by:

$$F_{fused} = \sigma\left(\mathrm{Concat}\left(\mathrm{Up}(F_{l-1}),\, F_l,\, \mathrm{Down}(F_{l+1})\right)\right) \odot \mathrm{Attention}(F_l)$$

where $\odot$ denotes element-wise multiplication and $\mathrm{Attention}(\cdot)$ is a learnable channel-weighting mechanism.
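The following sketch implements this fusion rule in PyTorch. Since the element-wise product requires matched channel counts, a 1 × 1 projection after the concatenation is assumed here (the text does not specify how channels are aligned), and all levels are assumed to share the same channel width; the class name is ours, for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MSFEMFusion(nn.Module):
    """sigmoid(Concat(Up(F_{l-1}), F_l, Down(F_{l+1}))) modulated by Attention(F_l)."""
    def __init__(self, c: int):
        super().__init__()
        self.proj = nn.Conv2d(3 * c, c, 1)  # assumed 1x1 projection to align channels
        self.attn = nn.Sequential(          # learnable channel-weighting mechanism
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(c, c // 4, 1), nn.ReLU(inplace=True),
            nn.Conv2d(c // 4, c, 1), nn.Sigmoid())

    def forward(self, f_prev, f_l, f_next):
        size = f_l.shape[-2:]
        up = F.interpolate(f_prev, size=size, mode="bilinear", align_corners=False)
        down = F.interpolate(f_next, size=size, mode="bilinear", align_corners=False)
        fused = torch.sigmoid(self.proj(torch.cat([up, f_l, down], dim=1)))
        return fused * self.attn(f_l)  # element-wise modulation by channel attention

m = MSFEMFusion(64)
out = m(torch.randn(1, 64, 16, 16), torch.randn(1, 64, 32, 32), torch.randn(1, 64, 64, 64))
```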

2.3.3. Local-Region Discriminator Based on PatchGAN

Standard discriminators often evaluate the entire image globally, which may fail to emphasize small yet critical structural features like cracks. This paper adopts a PatchGAN-based local discriminator $D_{patch}$ that focuses on $N \times N$ patches (typically 70 × 70), treating each patch as an independent real/fake classification task. This improves the model's ability to preserve texture consistency and edge sharpness in enhanced images. (1) Fine-grained feedback: the discriminator provides more granular feedback, penalizing synthetic textures or blurred transitions introduced during enhancement. (2) Adversarial loss formulation: this paper uses a least-squares GAN loss to stabilize training and mitigate gradient vanishing:

$$\mathcal{L}_{adv} = \mathbb{E}_x\left[\left(D(x) - 1\right)^2\right] + \mathbb{E}_z\left[D\left(G(z)\right)^2\right]$$
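Below is a hedged sketch of a PatchGAN discriminator with this least-squares objective. The layer widths follow the common pix2pix recipe for a roughly 70 × 70 receptive field and are assumptions rather than the exact configuration used here; the helper names are illustrative.

```python
import torch
import torch.nn as nn

def patch_discriminator(cin: int = 3) -> nn.Sequential:
    """PatchGAN: the output is a grid of real/fake scores, one per ~70x70 patch."""
    layers, c = [], cin
    for cout in (64, 128, 256):  # three stride-2 stages
        layers += [nn.Conv2d(c, cout, 4, 2, 1), nn.LeakyReLU(0.2, inplace=True)]
        c = cout
    layers += [nn.Conv2d(c, 512, 4, 1, 1), nn.LeakyReLU(0.2, inplace=True),
               nn.Conv2d(512, 1, 4, 1, 1)]
    return nn.Sequential(*layers)

def lsgan_d_loss(D, real, fake):
    """Least-squares discriminator objective: E[(D(x)-1)^2] + E[D(G(z))^2]."""
    return ((D(real) - 1.0) ** 2).mean() + (D(fake.detach()) ** 2).mean()

def lsgan_g_loss(D, fake):
    """Corresponding generator objective: E[(D(G(z))-1)^2]."""
    return ((D(fake) - 1.0) ** 2).mean()

D = patch_discriminator()
loss_d = lsgan_d_loss(D, torch.randn(2, 3, 256, 256), torch.randn(2, 3, 256, 256))
```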

2.4. Design of Optical Consistency Loss Function

In order to effectively integrate underwater optical physics models with image enhancement networks, this paper designs a composite loss function containing multiple optical constraints, which is expressed as follows:
$$\mathcal{L}_{total} = \lambda_1 \mathcal{L}_{recon} + \lambda_2 \mathcal{L}_{adv} + \lambda_3 \mathcal{L}_{phy} + \lambda_4 \mathcal{L}_{ssim} + \lambda_5 \mathcal{L}_{grad}$$

where $\mathcal{L}_{recon}$ is the reconstruction loss, measuring the consistency between the input and output images in pixel space; $\mathcal{L}_{adv}$ is the adversarial loss, guiding the network to generate realistic images; $\mathcal{L}_{phy}$ is the optical consistency loss, constraining the plausibility of the color channels using $t_c$ and $B_c$ from the physical model; $\mathcal{L}_{ssim}$ is the structural similarity loss, used to maintain the overall shape of crack structures; and $\mathcal{L}_{grad}$ is the image gradient loss, enhancing the clarity and continuity of crack edges. Through this multi-objective joint training strategy, the network achieves not only globally natural enhancement but also collaborative optimization of structural fidelity and optical consistency.
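As a concrete reference, a minimal sketch of the weighted combination is given below, using the coefficient setting (10, 1, 5, 2, 1) reported in Section 3.2; the individual loss values are placeholders standing in for the terms defined above.

```python
import torch

# Coefficients (lambda_1 ... lambda_5) = (10, 1, 5, 2, 1), as reported in Section 3.2.
LAMBDAS = {"recon": 10.0, "adv": 1.0, "phy": 5.0, "ssim": 2.0, "grad": 1.0}

def total_loss(terms: dict) -> torch.Tensor:
    """terms maps each loss name to its scalar tensor; returns the weighted sum."""
    return sum(LAMBDAS[name] * value for name, value in terms.items())

# Placeholder values standing in for the loss terms defined above.
terms = {k: torch.rand(()) for k in LAMBDAS}
loss = total_loss(terms)
```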
Real underwater scenes often exhibit non-uniform lighting due to varying water turbidity and sunlight attenuation. To address this, we introduce an illumination-adaptive constraint that penalizes spatially inconsistent enhancement across low-light regions. Specifically, we define an inter-patch consistency loss across crack-relevant image regions to ensure local contrast is enhanced proportionally:
$$\mathcal{L}_{consist} = \frac{1}{N} \sum_{i=1}^{N} \left| \Delta I_i^{\,input} - \Delta I_i^{\,enhanced} \right|$$

where $\Delta I_i$ denotes the local contrast variation within region $i$, and $N$ is the total number of crack-adjacent patches. This loss encourages uniform enhancement while preserving crack-texture modulation across uneven backgrounds.
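One plausible realization of this consistency term is sketched below: local contrast is measured as the per-patch standard deviation (an assumption, since the text does not define $\Delta I_i$ precisely), and the loss averages the absolute contrast change over crack-adjacent patches selected by a mask.

```python
import torch
import torch.nn.functional as F

def local_contrast(img: torch.Tensor, patch: int = 32) -> torch.Tensor:
    """Per-patch standard deviation as a contrast proxy.
    img: (B, 1, H, W) grayscale; returns (B, 1, H // patch, W // patch)."""
    mean = F.avg_pool2d(img, patch)
    sq_mean = F.avg_pool2d(img ** 2, patch)
    return (sq_mean - mean ** 2).clamp_min(1e-8).sqrt()

def consistency_loss(inp, enh, mask, patch: int = 32):
    """mask: 1 for crack-adjacent patches, 0 elsewhere; shape matches the contrast map."""
    diff = (local_contrast(inp, patch) - local_contrast(enh, patch)).abs() * mask
    return diff.sum() / mask.sum().clamp_min(1.0)

inp, enh = torch.rand(1, 1, 256, 256), torch.rand(1, 1, 256, 256)
mask = torch.ones(1, 1, 8, 8)  # 256 / 32 = 8 patches per side
loss = consistency_loss(inp, enh, mask)
```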

2.5. Multi-Scale Feature Preservation Module

Considering the diverse forms and varying scales of underwater cracks, this paper introduces the Multi-Scale Feature Preservation Module (MSFPM) to enhance the modeling of cracks at different scales (as shown in Figure 4). The module is built on dilated convolution and embedded in the UNIT decoder to extract features at coarse, medium, and fine scales. Its structure includes the following: (1) three parallel dilated convolution branches with different dilation rates (d = 1, 2, 4) extract contextual information; (2) a channel attention mechanism weights the feature responses at each scale; and (3) the fused features are sent to the decoder to improve detail restoration and crack continuity.
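A compact PyTorch sketch of this module is given below: three parallel dilated convolutions (d = 1, 2, 4), channel attention over the concatenated responses, and a 1 × 1 fusion back to the decoder width. The exact attention design is an assumption, and the class name is ours.

```python
import torch
import torch.nn as nn

class MSFPM(nn.Module):
    """Parallel dilated convolutions (d = 1, 2, 4) + channel attention + 1x1 fusion."""
    def __init__(self, c: int):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(c, c, 3, padding=d, dilation=d) for d in (1, 2, 4))
        self.attn = nn.Sequential(          # weights feature responses at each scale
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(3 * c, 3 * c, 1), nn.Sigmoid())
        self.fuse = nn.Conv2d(3 * c, c, 1)  # back to c channels for the decoder

    def forward(self, x):
        multi = torch.cat([branch(x) for branch in self.branches], dim=1)
        return self.fuse(multi * self.attn(multi))

y = MSFPM(64)(torch.randn(1, 64, 128, 128))  # coarse, medium, and fine context
```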
To further retain high-frequency crack information, this paper extends the multi-scale module with a frequency attention mechanism. Instead of relying on plain dilated convolutions alone, intermediate features are first decomposed using a single-level Discrete Wavelet Transform (DWT) with the Haar (db1) basis function, a choice that offers computational efficiency and strong spatial localization, well suited to preserving sharp transitions such as crack edges. The one-level decomposition yields four sub-bands (LL, LH, HL, and HH) representing the low-frequency and directional high-frequency components, while preserving spatial structure and keeping computational complexity low. A channel-wise attention mechanism is then applied over the sub-bands, weighted by spectral energy, to emphasize components carrying meaningful edge and texture information:

$$A_k = \sigma\left(\mathrm{MLP}\left(\mathrm{Concat}_{f \in F_k}(f)\right)\right)$$

where $F_k$ represents the set of DWT sub-band components at scale $k$, and $\sigma$ is the sigmoid activation. The learned attention $A_k$ modulates the fusion weights in the decoder to favor high-frequency crack contours.
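The sketch below illustrates this frequency-aware attention under stated assumptions: a one-level Haar DWT implemented with strided slicing (sub-band sign conventions vary), sub-band weights produced by an MLP over pooled sub-band energies, and the weighted sub-bands returned for fusion in the decoder.

```python
import torch
import torch.nn as nn

def haar_dwt(x: torch.Tensor):
    """One-level Haar DWT via strided slicing. x: (B, C, H, W), H and W even.
    Returns LL, LH, HL, HH sub-bands, each (B, C, H/2, W/2)."""
    a, b = x[..., 0::2, 0::2], x[..., 0::2, 1::2]
    c, d = x[..., 1::2, 0::2], x[..., 1::2, 1::2]
    ll = (a + b + c + d) / 2
    lh = (-a - b + c + d) / 2   # horizontal-edge detail
    hl = (-a + b - c + d) / 2   # vertical-edge detail
    hh = (a - b - c + d) / 2    # diagonal detail
    return ll, lh, hl, hh

class SubbandAttention(nn.Module):
    """MLP over pooled sub-band energies -> sigmoid weights A_k for the four bands."""
    def __init__(self, c: int):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(4 * c, c), nn.ReLU(inplace=True),
                                 nn.Linear(c, 4 * c), nn.Sigmoid())

    def forward(self, x):
        bands = haar_dwt(x)
        energy = torch.cat([b.pow(2).mean(dim=(-2, -1)) for b in bands], dim=1)
        weights = self.mlp(energy).chunk(4, dim=1)  # one weight vector per sub-band
        return [w[..., None, None] * b for w, b in zip(weights, bands)]

weighted = SubbandAttention(64)(torch.randn(1, 64, 128, 128))
```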
The module can more effectively preserve crack edges and texture information during the enhancement process, improving the visual quality and subsequent recognition accuracy of the overall image. In summary, this method combines physical modeling with deep image translation networks, balancing physical consistency, structural details, and image realism, providing an innovative and high-performance solution for enhancing underwater crack images.

2.6. Proposed Framework

Figure 5 shows the overall structure of the proposed image enhancement method, which fuses the underwater light propagation model with the improved UNIT network. The entire system consists of four main modules: an input image preprocessing module, an underwater light propagation modeling module, the improved UNIT image translation network, and a multi-scale feature preservation and enhancement module. The modules cooperate through specific information flows, ensuring maximum preservation of crack structure features while improving image quality.
The method includes an unsupervised image translation network with physical modeling constraints, an improved UNIT structure, a multi-scale feature enhancement module, and a joint optimization strategy based on the optical consistency loss. The inputs are original low-quality underwater crack images, which typically exhibit strong color distortion, blurring, and low contrast. Before entering the enhancement network, each image is normalized and its physical degradation parameters are estimated by a preprocessing module. This estimation is based on the underwater light propagation model, which extracts the attenuation factor, background light intensity, and preliminary scattering information for each color channel. This information guides training as part of the physical perceptual loss, ensuring that the enhanced results remain physically plausible.
Next, the image is fed into the improved UNIT network, which consists of two encoders (for the source and target domains, respectively), a shared latent space module, and two decoders. To accommodate the sparsity and locality of crack features, the encoder introduces a multi-scale extraction structure and maintains key spatial textures through residual connections. The shared latent space ensures structural consistency between the image domains, thereby enabling unsupervised domain transformation learning.
In the decoder, this paper introduces the multi-scale feature preservation module (MSFPM), which adopts a parallel dilated convolution structure (with dilation rates of 1, 2, and 4) to effectively expand the receptive field and enhance the network's ability to model cracks of different scales. The branch outputs are weighted and fused through an attention mechanism before entering the decoder backbone, ultimately generating the enhanced image. This module significantly improves the expression of detail in crack regions, especially the restoration of edge information under complex background interference.
During training, the network is jointly driven by multiple loss terms. The physics-aware loss enforces consistency of physical parameters (such as per-channel transmittance and the scattering model fit) between the enhanced output and the input image, preventing the network output from deviating from the physical laws of underwater imaging; the structural similarity losses (SSIM and edge gradient loss) maintain the continuity and clarity of crack texture features; and the adversarial loss ensures that the generated images are globally perceived as similar in style to real high-quality images.
Overall, the method framework shown in Figure 5 not only achieves unsupervised transformation from degraded images to enhanced images, but also introduces the fusion of underwater physical laws and visual perception mechanisms, improving the interpretability and generalization ability of the model. This design is of great significance for solving the problem of “excessive enhancement” or “enhancement distortion” in current underwater crack image enhancement.

3. Results

3.1. Experimental Setup

The dataset used in this study consists of 1000 underwater crack images, 500 of which are low-quality images captured under diverse underwater conditions (collected by underwater robots, as shown in Figure 6). These conditions include the following: water turbidity levels ranging from clear (0–5 NTU) to highly turbid (30–50 NTU); illumination conditions covering both natural ambient light (e.g., outdoor daylight at depths of 0.5–1.5 m) and artificial lighting using LED arrays in dark or shaded environments (up to 5 m depth); and depth variations from 0.3 m (shallow surface conditions) to 5 m, covering both near-surface and semi-deep underwater environments. The remaining 500 images are relatively high-quality reference images obtained under controlled conditions in clear water with optimized lighting. All images are resized to 512 × 512 resolution and stored in JPG format for model training and evaluation. The dataset is available upon request for academic research purposes. Data acquisition covered estuarine (Pearl River Delta) and tidal scenarios to ensure generalization. For benchmarking, this paper compares the proposed method with five state-of-the-art methods (Retinex-Net, UWCNN, WaterGAN, U-Former, and Restormer) and the original UNIT model.
To assess the practical deployability of the proposed method, this paper reports the model's size and inference speed. The model occupies only 17.8 MB, thanks to its lightweight encoder-decoder architecture, and inference takes 18.6 ms per image on an RTX 4090, enabling near real-time processing for low-latency underwater visual inspection. This supports the method's suitability for real-world deployment in autonomous underwater systems and robotic inspection pipelines. For fair comparison, the input resolution was uniformly resized to 512 × 512, and each method was trained for 100 epochs using the Adam optimizer. The evaluation metrics include the Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index Measure (SSIM), Underwater Image Quality Measure (UIQM), and Edge Preservation Index (EPI).
PSNR measures the ratio between the maximum possible pixel value and the mean squared error (MSE) between the original and enhanced images:
$$PSNR = 10 \cdot \log_{10}\left(\frac{MAX^2}{MSE}\right)$$

where $MAX$ is typically 255 for 8-bit images.
SSIM assesses the perceptual similarity between two images based on luminance, contrast, and structure:
$$SSIM = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$$

where $\mu$ and $\sigma$ denote the mean and standard deviation, $\sigma_{xy}$ is the covariance, and $C_1$, $C_2$ are constants for numerical stability.
UIQM is a non-reference metric for underwater images, defined as a weighted combination of colorfulness (UICM), sharpness (UISM), and contrast (UIConM):
$$UIQM = c_1 \cdot UICM + c_2 \cdot UISM + c_3 \cdot UIConM$$

where the typical weights are $c_1 = 0.0282$, $c_2 = 0.2953$, and $c_3 = 3.5753$.
EPI evaluates the sharpness and preservation of edges between the input and enhanced images:
$$EPI = \frac{\left\| \nabla I_{enh} \right\|}{\left\| \nabla I_{input} \right\|}$$

where $\nabla I$ denotes the image gradient (e.g., computed with the Sobel operator) and $\|\cdot\|$ is the norm of the gradient magnitude.
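For reference, hedged implementations of PSNR, SSIM, and EPI as defined above are sketched below. SSIM is delegated to scikit-image, EPI is computed as the ratio of Sobel gradient-magnitude norms (one common reading of the definition), and UIQM is omitted because it requires the UICM/UISM/UIConM component measures.

```python
import numpy as np
from scipy.ndimage import sobel
from skimage.metrics import structural_similarity

def psnr(ref: np.ndarray, out: np.ndarray, max_val: float = 255.0) -> float:
    mse = np.mean((ref.astype(np.float64) - out.astype(np.float64)) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

def epi(inp: np.ndarray, enh: np.ndarray) -> float:
    """Ratio of the norms of Sobel gradient magnitudes (enhanced over input)."""
    def grad_norm(img):
        g = np.hypot(sobel(img.astype(np.float64), axis=0),
                     sobel(img.astype(np.float64), axis=1))
        return np.linalg.norm(g)
    return grad_norm(enh) / grad_norm(inp)

ref = np.random.rand(128, 128) * 255.0                 # grayscale image in [0, 255]
out = np.clip(ref + np.random.randn(128, 128), 0, 255)
print(psnr(ref, out), structural_similarity(ref, out, data_range=255.0), epi(ref, out))
```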

3.2. Weight Selection of Loss Function

The loss function includes five weighted components with empirically chosen coefficients: λ1–λ5. These weights were determined through grid search and performance tuning on a validation set. A brief sensitivity analysis showed that the model’s performance is most sensitive to the optical consistency λ3 and structural similarity λ4 terms, confirming their critical roles in preserving physical realism and crack detail. The selected configuration (10, 1, 5, 2, 1) offers a balanced trade-off between visual fidelity and structural accuracy.
To further support the credibility of the training setup, this paper conducted a brief sensitivity analysis of the weighting terms: one parameter was varied at a time while the others were kept fixed, and the resulting changes in PSNR and SSIM were recorded, as shown in Table 1.

3.3. Training and Testing

The training losses of the proposed method and the baseline model are shown in Figure 7. During the training phase, the total loss consists of multiple components, including the reconstruction loss, adversarial loss, style consistency loss, and cycle consistency loss. From the training curves, the following points can be observed: (1) Rapid convergence: within the first 20 epochs, the reconstruction loss and total loss decrease rapidly, indicating that the improved encoder-decoder structure has good initialization and convergence behavior. (2) Enhanced stability: thanks to the PatchGAN discriminator and the multi-scale feature preservation module, the training process fluctuates less and model optimization is smoother. (3) Stable adversarial learning: the absence of severe oscillations in the adversarial loss indicates a good equilibrium between the generator and discriminator, validating the effectiveness of the shared latent space and the reconstruction consistency losses in the network structure. Optimization stability is particularly important in underwater crack image enhancement, as the original images are often affected by light scattering and blurring; the loss curves show that the proposed method has strong anti-interference ability and good adaptability.
For Figure 8a, the initial PSNR is about 16.5 dB and gradually increases to over 24 dB within 50 epochs; in the later stage (60–100 epochs) it converges, remaining at 24.5–25 dB, reflecting reduced overall image distortion and clearer reconstruction. From the perspective of image restoration, a higher PSNR means that the improved model effectively reduces the blurring and degradation of underwater imaging, which is especially important for faithful restoration of crack features. For Figure 8b, in the early stage of training (0–30 epochs), SSIM rapidly improves from 0.55 to around 0.82, reflecting the model's ability to recover the main structural features of the image early on; after 50 epochs, the curve stabilizes and SSIM fluctuates within 0.88–0.90, indicating that the enhanced image is highly consistent with the reference image in structural information. Compared with the baseline model, the improved UNIT model restores detail regions significantly better, especially at crack edges.
These trends clearly demonstrate the effectiveness of the proposed model in addressing underwater image degradation. The smooth loss convergence indicates stable training dynamics, while the SSIM and PSNR metrics reflect strong structural and visual fidelity. Especially under the challenging conditions of underwater crack detection, where scattering and low contrast hinder visual clarity, our model preserves both geometric consistency and perceptual quality, making it particularly suitable for practical deployment in underwater inspection tasks, such as bridge pier crack monitoring.
Fifty newly collected low-quality images were used to test the enhancement performance of the proposed model. The evaluation metrics PSNR, SSIM, UIQM, and EPI on this test set were 22.93, 0.821, 3.67, and 0.712, respectively, showing excellent results. Figure 9 shows representative enhancement cases, demonstrating that the proposed method improves the color and contrast of low-quality images and increases the visibility of cracks.

3.4. Quantitative Evaluation

As shown in Table 2, the proposed method achieves the best performance across all evaluation metrics. The PSNR improves by approximately 2.4 dB over UNIT and by over 7.3 dB over Retinex-Net. The SSIM index, which evaluates structural fidelity, is also highest for our method, indicating superior preservation of crack textures. Notably, the UIQM score demonstrates improved color and contrast correction, while the EPI validates that our approach retains more edge detail, which is crucial for crack analysis. Compared to recent transformer-based methods (U-Former, Restormer), the proposed method shows superior performance on structure-sensitive metrics, such as SSIM and EPI. This highlights its advantage in tasks requiring detailed defect localization, despite the simplicity and lower computational cost of our architecture.

3.5. Qualitative Results

Figure 10 visually compares enhancement results from different methods on three representative underwater crack images. The figure illustrates a visual comparison of enhancement results across multiple methods: (a) represents the original image, while (b) to (h) showcase the outcomes of Retinex-Net, UWCNN, WaterGAN, UNIT, Restormer, U-Former and the proposed method, respectively. An analysis of each method’s performance in terms of noise reduction, detail preservation, and image quality enhancement is as follows:
Retinex-Net exhibits moderate noise reduction capabilities, with some residual noise still perceptible in the enhanced image; although it retains details to a certain extent, the overall image sharpness is not markedly improved, and the overall image quality enhancement is relatively limited.
UWCNN demonstrates proficient noise reduction, effectively minimizing noise artifacts; it also excels in detail preservation, resulting in enhanced texture and edge definition in the processed image, and significantly elevates image quality, yielding a visually pleasing outcome.
WaterGAN displays average noise reduction performance, with some noise persisting in the enhanced image; its detail preservation is moderate, with slight detail loss observed, and while image quality is somewhat improved, the effect is less pronounced compared to UWCNN.
UNIT performs well in noise reduction, effectively suppressing noise; however, its detail preservation is comparatively average, with some details appearing slightly blurred, and although overall image quality is enhanced, the improvement is less significant than that of UWCNN.
The enhancement effects of Restormer and U-Former are close to those of the proposed method.
Figure 11 highlights localized zoomed-in views. Red arrows indicate regions of improved crack clarity and edge recovery, especially at branching points and junction discontinuities.
The proposed method excels in noise reduction, achieving a nearly noise-free image; it also demonstrates superior detail preservation, meticulously maintaining texture and edge information, and substantially enhances image quality, delivering the most visually appealing result. In conclusion, the proposed method outperforms the other evaluated methods in noise reduction, detail preservation, and image quality enhancement, with UWCNN also showing strong performance, particularly in noise reduction and detail preservation. Retinex-Net and WaterGAN offer relatively average overall performance, while UNIT, despite its effective noise reduction, has room for improvement in detail preservation.
In general, the proposed method successfully restores realistic color balance, removes scattering effects, and preserves fine-scale cracks that are lost by other methods. In contrast, UWCNN and Retinex-Net tend to oversmooth the cracks, and WaterGAN produces artifacts under uneven lighting. The introduction of physics-aware constraints and multi-scale enhancement helps the network generate images that are not only visually superior but also structurally informative for downstream defect detection tasks. In both quantitative and qualitative evaluations, Restormer and U-Former performed similarly to the proposed method, though their quantitative scores were slightly lower.

3.6. Ablation Study

To validate the contribution of each component, this paper conducted an ablation study by removing one module at a time from the full model: (1) Ours-OnlyPhys: a variant in which only the physics-guided loss term is retained during training; (2) Ours-Phys.Loss: without the physics-aware loss; (3) Ours-MSFEM: without the multi-scale feature enhancement module; (4) Ours (full): the complete model.
Results in Table 3 and Figure 12 confirm that both the physics-aware loss and the multi-scale feature enhancement module contribute significantly to the final performance. The absence of either module leads to a noticeable drop in PSNR and UIQM scores, highlighting their effectiveness in guiding physically consistent learning and enhancing crack texture details, respectively.

3.7. Extended Applications

To demonstrate the practical applicability of the proposed method, this paper tested it on real-world underwater inspection tasks. In a bridge pier inspection (as shown in Figure 13), the enhancement method significantly improved the visibility of cracks in images captured under high turbidity and shallow depth. These examples illustrate the applicability of the method in diverse real-world scenarios, highlighting its potential for improving underwater inspection in civil engineering, maritime safety, and environmental monitoring.

3.8. Post-Enhancement Crack Detection Performance

To evaluate the utility of enhanced images for real-world inspection, this paper conducted a crack detection experiment using a UNet-based segmentation model. The model was trained on high-quality reference crack masks and then tested on three input types: raw underwater images, images enhanced by baseline methods, and images enhanced by the proposed framework. As shown in Table 4, our method achieved the highest scores in mIoU, F1-score, and pixel accuracy. This confirms that the enhancement not only improves visual quality but also retains the discriminative features necessary for downstream analysis, such as structural crack detection. Sample visual comparisons are provided in Figure 14.
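For clarity, the detection metrics in Table 4 can be computed for binary crack masks as sketched below; treating mIoU as the mean over the crack and background classes is an assumption, and the helper name is ours.

```python
import numpy as np

def seg_metrics(pred: np.ndarray, gt: np.ndarray):
    """pred, gt: binary masks (1 = crack). Returns (mIoU, F1-score, pixel accuracy)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = (pred & gt).sum()
    fp = (pred & ~gt).sum()
    fn = (~pred & gt).sum()
    iou_crack = tp / max(tp + fp + fn, 1)
    iou_bg = (~pred & ~gt).sum() / max((~pred | ~gt).sum(), 1)
    miou = (iou_crack + iou_bg) / 2  # mean over crack and background classes
    f1 = 2 * tp / max(2 * tp + fp + fn, 1)
    pix_acc = float((pred == gt).mean())
    return miou, f1, pix_acc

pred = np.random.rand(256, 256) > 0.5
gt = np.random.rand(256, 256) > 0.5
miou, f1, acc = seg_metrics(pred, gt)
```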

4. Discussion

Although our method demonstrates competitive performance against established CNN- and GAN-based models, such as Retinex-Net, UWCNN, WaterGAN, and UNIT, this paper acknowledges that more recent transformer-based designs have reported strong performance on general underwater scenes. However, most of these models are not specifically designed to retain fine structural features, such as cracks, under heavy degradation. Moreover, the added complexity of transformer-based methods often results in slower inference and higher data requirements. By embedding physical priors into both the network architecture and the loss functions, the proposed method balances interpretability, performance, and application specificity, making it particularly suitable for real-time underwater crack inspection tasks.
The experimental results demonstrate that the proposed method effectively addresses common underwater image degradation issues, including low contrast, color shift, and detail loss. Unlike prior models that rely purely on data-driven learning, the integration of physical modeling ensures better generalization across varying underwater conditions. Moreover, the multi-scale feature enhancement module plays a key role in refining crack edges without introducing noise or artifacts. Limitations of the current implementation include the reliance on estimated water parameters during preprocessing, which may introduce bias in extreme environments (e.g., muddy estuaries). Future work could explore end-to-end learning of physical parameters and improved generalization via synthetic-to-real domain adaptation.

5. Conclusions

This paper presents a novel unsupervised image enhancement method for underwater crack images, which integrates underwater optical modeling with an improved deep learning framework. By incorporating the Jaffe–McGlamery light propagation model into the UNIT architecture, the proposed approach ensures physically consistent and perceptually accurate enhancement results. The introduction of a multi-scale feature preservation module enables effective reconstruction of fine crack details across varying spatial scales, while the PatchGAN-based local discriminator further improves structural realism and texture clarity. Extensive experiments on a self-collected underwater crack dataset demonstrate the superiority of the proposed method over existing state-of-the-art approaches in terms of both quantitative metrics (PSNR, SSIM, UIQM, and EPI) and qualitative visual results. Ablation studies confirm the critical role of physics-aware loss and multi-scale enhancement in achieving accurate restoration. Furthermore, the model exhibits strong generalization and robustness to diverse underwater environments, making it suitable for real-world inspection tasks, such as bridge pier monitoring.
(1) This paper proposed a physics-aware unsupervised image enhancement method that effectively integrates underwater light propagation theory with an improved UNIT network.
(2) A multi-scale feature preservation module and a local PatchGAN discriminator were introduced to enhance structural details, especially fine crack textures.
(3) Experimental results demonstrate that the method outperforms state-of-the-art approaches in both visual quality and structural fidelity.
(4) The designed optical consistency loss ensures that enhanced images remain consistent with underwater imaging principles.
(5) The proposed framework is robust and generalizable, making it suitable for practical underwater inspection tasks, such as bridge pier crack detection.
While previous studies, such as Retinex-based methods and UWCNN, focus on enhancing global image properties without considering the underlying physical degradation process, the proposed method integrates a physical degradation model based on underwater light propagation theory. This modeling of optical phenomena, such as scattering and absorption, helps maintain color consistency and crack visibility under challenging underwater conditions. Furthermore, the multi-scale feature preservation module enables the retention of fine crack details across different spatial scales. This is complemented by a PatchGAN-based discriminator, which ensures enhanced texture realism by focusing on small image patches. Together, these innovations yield a more detailed and physically accurate restoration than existing models.
In summary, this work offers a physically grounded, detail-preserving, and unsupervised solution to underwater image enhancement, advancing the reliability of vision-based underwater structural defect detection. Future research will explore end-to-end learning of physical parameters and domain adaptation techniques to further improve the adaptability of the model under complex real-world conditions. Regarding the gradient loss term, this paper employs the Sobel operator to compute image gradients along the horizontal and vertical directions; this operator balances edge localization and noise robustness and is widely used in structural loss computation. However, alternative gradient operators could align better with the characteristics of underwater crack edges: the Scharr operator provides improved rotational symmetry and better gradient estimation, particularly useful where crack edges are shallow or noisy; the Laplacian operator captures second-order derivative information, emphasizing transitions and junctions, which is potentially useful for branching or curved crack structures; and the Canny edge detector, though non-differentiable, could inspire hybrid or auxiliary supervision setups that use its output as pseudo-labels for edge guidance. Exploring these alternative operators in the loss formulation is a promising direction for future research, particularly for enhancing edge preservation under extreme degradation or turbidity.

Author Contributions

Conceptualization, X.Z. and W.A.; methodology, Z.L.; software, X.Z. and Z.L.; validation, X.Z., W.A. and X.W.; formal analysis, X.W.; investigation, X.Z.; resources, X.Z. and Z.L.; data curation, X.Z.; writing—original draft preparation, X.Z., W.A. and Z.L.; writing—review and editing, X.W.; visualization, X.Z.; supervision, Z.L.; project administration, Z.L.; funding acquisition, Z.L. and X.W. All authors have read and agreed to the published version of the manuscript.

Funding

This paper was supported by the Department of Science and Technology in Henan Province (No. 242102241025).

Data Availability Statement

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Yuan, X.; Li, W.; Chen, G.; Yin, X.; Li, X.; Liu, J.; Zhao, J.; Zhao, J. Visual and Intelligent Identification Methods for Defects in Underwater Structure Using Alternating Current Field Measurement Technique. IEEE Trans. Ind. Inform. 2022, 18, 3853–3862. [Google Scholar] [CrossRef]
  2. Yi, X.; Jiang, Q.; Zhou, W. No-reference quality assessment of underwater image enhancement. Displays 2024, 81, 102586. [Google Scholar] [CrossRef]
  3. Zhang, W.D.; Pan, X.P.; Xie, X.W.; Li, L.Q.; Wang, Z.M.; Han, C. Color correction and adaptive contrast enhancement for underwater enhancement. Comput. Electr. Eng. 2021, 91, 106981. [Google Scholar] [CrossRef]
  4. Hassan, N.; Ullah, S.; Bhatti, N.; Mahmood, H.; Zia, M. The Retinex based improved underwater image enhancement. Multimed. Tools Appl. 2021, 80, 1839–1857. [Google Scholar] [CrossRef]
  5. Zhou, J.C.; Pang, L.; Zhang, D.H.; Zhang, W.S. Underwater image enhancement method via multi-interval subhistogram perspective equalization. IEEE J. Ocean. Eng. 2023, 48, 474–488. [Google Scholar] [CrossRef]
  6. Zhang, S.; Wang, T.; Dong, J.Y.; Yu, H. Underwater image enhancement via extended multi-scale Retinex. Neurocomputing 2017, 245, 1–9. [Google Scholar] [CrossRef]
  7. Teng, S.; Liu, A.R.; Chen, B.C.; Wang, J.L.; Wu, Z.H.; Fu, J.Y. Unsupervised learning method for underwater concrete crack image enhancement and augmentation based on cross domain translation strategy. Eng. Appl. Artif. Intell. 2024, 136, 108884. [Google Scholar] [CrossRef]
  8. Zheng, Z.; Huang, X.; Wang, L. Underwater low-light enhancement network based on bright channel prior and attention mechanism. PLoS ONE 2023, 18, e0281093. [Google Scholar] [CrossRef]
  9. Jian, M.; Liu, X.; Luo, H.; Lu, X.; Yu, H.; Dong, J. Underwater image processing and analysis: A review. Signal Process. Image Commun. 2021, 91, 116088. [Google Scholar] [CrossRef]
  10. Ulutas, G.; Ustubioglu, B. Underwater image enhancement using contrast limited adaptive histogram equalization and layered difference representation. Multimed. Tools Appl. 2021, 80, 15067–15091. [Google Scholar] [CrossRef]
  11. Lin, H.N.; Shi, Z.W. Multi-scale retinex improvement for nighttime image enhancement. Optik 2014, 125, 7143–7148. [Google Scholar] [CrossRef]
  12. Zhang, W.D.; Dong, L.L.; Xu, W.H. Retinex-inspired color correction and detail preserved fusion for underwater image enhancement. Comput. Electron. Agric. 2022, 192, 106585. [Google Scholar] [CrossRef]
  13. Parihar, A.S.; Singh, K. A study on retinex based method for image enhancement. In Proceedings of the 2nd International Conference on Inventive Systems and Control (ICISC), Coimbatore, India, 19–20 January 2018; pp. 619–624. [Google Scholar]
  14. Tolie, H.F.; Ren, J.C.; Elyan, E. DICAM: Deep inception and channel-wise attention modules for underwater enhancement. Neurocomputing 2024, 584, 127585. [Google Scholar] [CrossRef]
  15. Li, C.Y.; Anwar, S.; Porikli, F. Underwater scene prior inspired deep underwater image and video enhancement. Pattern Recognit. 2020, 98, 107038. [Google Scholar] [CrossRef]
  16. Zheng, M.C.; Luo, W.L. Underwater image enhancement using improved CNN based defogging. Electronics 2022, 11, 150. [Google Scholar] [CrossRef]
  17. Mei, X.K.; Ye, X.F.; Zhang, X.F.; Liu, Y.S.; Wang, J.T.; Hou, J.; Wang, X.L. UIR-Net: A simple and effective baseline for underwater image restoration and enhancement. Remote Sens. 2023, 15, 39. [Google Scholar] [CrossRef]
  18. Lyu, Z.; Peng, A.; Wang, Q.W.; Ding, D.D. An efficient learning-based method for underwater image enhancement. Displays 2022, 74, 102174. [Google Scholar] [CrossRef]
  19. Wu, S.C.; Luo, T.; Jiang, G.Y.; Yu, M.; Xu, H.Y.; Zhu, Z.J.; Song, Y. A two-stage underwater enhancement network based on structure decomposition and characteristics of underwater imaging. IEEE J. Ocean. Eng. 2021, 46, 1213–1227. [Google Scholar] [CrossRef]
  20. Wang, Y.D.; Guo, J.C.; Gao, H.; Yue, H.H. UIEC^2-Net: CNN-based underwater image enhancement using two color space. Signal Process. Image Commun. 2021, 96, 116250. [Google Scholar] [CrossRef]
  21. Li, C.Y.; Guo, C.L.; Guo, J.C.; Han, P.; Fu, H.Z.; Cong, R.M. PDR-Net: Perception-inspired single image dehazing network with refinement. IEEE Trans. Multimed. 2020, 22, 704–716. [Google Scholar] [CrossRef]
  22. Hu, K.; Weng, C.H.; Shen, C.W.; Wang, T.Y.; Weng, L.G.; Xia, M. A multi-stage underwater image aesthetic enhancement algorithm based on a generative adversarial network. Eng. Appl. Artif. Intell. 2023, 123, 106196. [Google Scholar] [CrossRef]
  23. Jiang, Q.P.; Kang, Y.Z.; Wang, Z.H.; Ren, W.Q.; Li, C.Y. Perception-driven deep underwater image enhancement without paired supervision. IEEE Trans. Multimed. 2024, 26, 4884–4897. [Google Scholar] [CrossRef]
  24. Teng, S.; Liu, A.; Ye, X.; Wang, J.; Fu, J.; Wu, Z.; Chen, B.; Liu, C.; Zhou, H.; Zeng, Y.; et al. Review of intelligent detection and health assessment of underwater structures. Eng. Struct. 2024, 308, 117958. [Google Scholar] [CrossRef]
  25. Hambarde, P.; Murala, S.; Dhall, A. UW-GAN: Single-image depth estimation and image enhancement for underwater images. IEEE Trans. Instrum. Meas. 2021, 70, 5018412. [Google Scholar] [CrossRef]
  26. Guo, Y.C.; Li, H.Y.; Zhuang, P.X. Underwater image enhancement using a multiscale dense generative adversarial network. IEEE J. Ocean. Eng. 2020, 45, 862–870. [Google Scholar] [CrossRef]
  27. Li, F.; Zheng, J.B.; Zhang, Y.F.; Jia, W.J.; Wei, Q.R.; He, X.J. Cross-domain learning for underwater image enhancement. Signal Process. Image Commun. 2023, 110, 116890. [Google Scholar] [CrossRef]
  28. Chaurasia, D.; Chhikara, P. Sea-Pix-GAN: Underwater image enhancement using adversarial neural network. J. Vis. Commun. Image Represent. 2024, 98, 104021. [Google Scholar] [CrossRef]
  29. Li, C.Y.; Guo, C.L.; Ren, W.Q.; Cong, R.M.; Hou, J.H.; Kwong, S.; Tao, D.C. An underwater image enhancement benchmark dataset and beyond. IEEE Trans. Image Process. 2020, 29, 4376–4389. [Google Scholar] [CrossRef] [PubMed]
  30. Zhu, J.-Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2223–2232. [Google Scholar]
  31. Zhou, J.; Wei, X.; Shi, J.; Chu, W.; Lin, Y. Underwater image enhancement via two-level wavelet decomposition maximum brightness color restoration and edge refinement histogram stretching. Opt. Express 2022, 30, 17290–17306. [Google Scholar] [CrossRef]
  32. Lin, S.; Ning, Z.; Zhang, R. Modified optical model and optimized contrast for underwater image restoration. Opt. Commun. 2025, 574, 130942. [Google Scholar] [CrossRef]
Figure 1. Diagram of the unsupervised framework for underwater structural crack image restoration.
Figure 2. Schematic illustration of underwater light propagation. The diagram shows the main components affecting image formation: direct transmission from the object to the camera, backscatter from suspended particles, and forward scatter along the line of sight. The ambient light source and suspended particle distribution are also illustrated.
Figure 3. UNIT basic architecture.
Figure 4. Multi-scale feature preservation module.
Figure 5. The framework of the underwater crack image enhancement model.
Figure 6. Underwater image acquisition setup using a submersible robot. (a) Robot control; (b) robot underwater work; (c) robot. The imaging system supports maximum depths of 200 m and operation radius up to 400 m, equipped with a 4K-resolution camera (also supporting 1080p@120fps and 720p@240fps). High-intensity 8000-lumen LED lights with a 150° illumination angle and adjustable brightness provide consistent lighting in turbid water. The system ensures ≥2 m visibility in 5 NTU water and ≥1.5 m in 60 NTU water, enabling reliable high-resolution capture under realistic inspection conditions.
Figure 7. Training curve comparison.
Figure 8. Training curve comparison of PSNR and SSIM. (a) PSNR curve comparison; (b) SSIM curve comparison.
Figure 9. Enhancement cases of the proposed model. (a) Raw images; (b) enhanced images.
Figure 10. Visual comparison of enhancement results. (a) Original image, (b) Retinex-Net, (c) UWCNN, (d) WaterGAN, (e) UNIT, (f) Restormer, (g) U-Former and (h) the proposed method.
Figure 11. The localized crack magnification views.
Figure 12. Ablation study results.
Figure 13. Real-world underwater enhanced cases. (a) Original image; (b) enhanced image.
Figure 14. Comparison of detection cases.
Table 1. Sensitivity analysis of weight.
Setting | PSNR ↑ | SSIM ↑
Baseline (λ1–λ5 = 10/1/5/2/1) | 22.93 | 0.821
λ1 = 5 | 21.78 | 0.802
λ2 = 2 | 22.14 | 0.809
λ3 = 0 | 21.35 | 0.784
λ4 = 0 | 21.51 | 0.773
λ5 = 3 | 22.06 | 0.812
Table 2. Quantitative comparison of image enhancement methods.
Method | PSNR ↑ | SSIM ↑ | UIQM ↑ | EPI ↑
Retinex-Net | 15.63 | 0.523 | 2.39 | 0.531
UWCNN | 19.21 | 0.741 | 3.04 | 0.654
WaterGAN | 20.14 | 0.768 | 3.26 | 0.672
Restormer | 21.62 | 0.789 | 3.48 | 0.687
U-Former | 22.01 | 0.802 | 3.57 | 0.692
Original UNIT | 20.55 | 0.772 | 3.31 | 0.679
Ours (Phys-aware UNIT) | 22.93 | 0.821 | 3.67 | 0.712
Table 3. Ablation study results.
Configuration | PSNR ↑ | SSIM ↑ | UIQM ↑
Ours-OnlyPhys | 20.41 | 0.763 | 3.08
Ours-Phys.Loss | 21.12 | 0.788 | 3.31
Ours-MSFEM | 21.64 | 0.794 | 3.43
Ours (full) | 22.93 | 0.821 | 3.67
Table 4. Comparison of crack detection performance.
Method | mIoU ↑ | F1-Score ↑ | Pixel Accuracy ↑
Non-enhanced method | 0.427 | 0.581 | 0.752
Proposed method | 0.632 | 0.741 | 0.861
