Search Results (71)

Search Parameters:
Keywords = PatchGAN

36 pages, 13404 KiB  
Article
A Multi-Task Deep Learning Framework for Road Quality Analysis with Scene Mapping via Sim-to-Real Adaptation
by Rahul Soans, Ryuichi Masuda and Yohei Fukumizu
Appl. Sci. 2025, 15(16), 8849; https://doi.org/10.3390/app15168849 - 11 Aug 2025
Viewed by 219
Abstract
Robust perception of road surface conditions is a critical challenge for the safe deployment of autonomous vehicles and the efficient management of transportation infrastructure. This paper introduces a synthetic data-driven deep learning framework designed to address this challenge. We present a large-scale, procedurally generated 3D synthetic dataset created in Blender, featuring a diverse range of road defects—including cracks, potholes, and puddles—alongside important road features such as manhole covers and patches. Crucially, our dataset provides dense, pixel-perfect annotations for segmentation masks, depth maps, and camera parameters (intrinsic and extrinsic). Our proposed model leverages these rich annotations in a multi-task learning framework that jointly performs road defect segmentation and depth estimation, enabling a comprehensive geometric and semantic understanding of the road environment. A core contribution is a two-stage domain adaptation strategy to bridge the synthetic-to-real gap. First, we employ a modified CycleGAN with a segmentation-aware loss to translate synthetic images into a realistic domain while preserving defect fidelity. Second, during model training, we utilize a dual-discriminator adversarial approach, applying alignment at both the feature and output levels to minimize domain shift. Benchmarking experiments validate our approach, demonstrating high accuracy and computational efficiency. Our model excels in detecting subtle or occluded defects, attributed to an occlusion-aware loss formulation. The proposed system shows significant promise for real-time deployment in autonomous navigation, automated infrastructure assessment, and Advanced Driver-Assistance Systems (ADAS).
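The dual-discriminator alignment described in this abstract can be sketched in a few lines of PyTorch: one discriminator judges encoder feature maps, another judges segmentation outputs. This is a minimal sketch of the general technique, not the authors' code; the architecture, channel counts, and class count are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ConvDiscriminator(nn.Module):
    """Small PatchGAN-style discriminator: outputs a per-location
    real/fake score map rather than a single scalar."""
    def __init__(self, in_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 128, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(128, 1, 4, stride=1, padding=1),
        )

    def forward(self, x):
        return self.net(x)

# Assumed shapes: 256-channel encoder features, 8 segmentation classes.
d_feat = ConvDiscriminator(in_ch=256)  # feature-level alignment
d_out = ConvDiscriminator(in_ch=8)     # output-level alignment (seg logits)
bce = nn.BCEWithLogitsLoss()

def generator_alignment_loss(feat_syn, seg_logits_syn):
    """Generator side of the minimax game: synthetic-domain features and
    predictions should be scored as 'real' by both discriminators."""
    pf = d_feat(feat_syn)
    po = d_out(seg_logits_syn)
    return bce(pf, torch.ones_like(pf)) + bce(po, torch.ones_like(po))

def discriminator_loss(d, real, fake):
    """Discriminator side: real-domain inputs -> 1, synthetic -> 0."""
    pr, pf = d(real), d(fake.detach())
    return bce(pr, torch.ones_like(pr)) + bce(pf, torch.zeros_like(pf))
```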

42 pages, 6539 KiB  
Article
Multimodal Sparse Reconstruction and Deep Generative Networks: A Paradigm Shift in MR-PET Neuroimaging
by Krzysztof Malczewski
Appl. Sci. 2025, 15(15), 8744; https://doi.org/10.3390/app15158744 - 7 Aug 2025
Viewed by 447
Abstract
A novel multimodal super-resolution framework is introduced, combining GAN-based synthesis, perceptual constraints, and joint low-rank sparsity regularization to noticeably enhance MR-PET image quality. The architecture integrates modality-specific ResNet encoders, a transformer-based attention fusion block, and a multi-scale PatchGAN discriminator. Training is guided by a hybrid loss function incorporating adversarial, pixel-wise, perceptual (VGG19), and structured Hankel constraints. The proposed method outperforms all baselines in PSNR, SSIM, LPIPS, and diagnostic confidence metrics. Clinical PET metrics, such as SUV recovery and lesion detectability, show substantial improvement. A thorough analysis of computational complexity, dataset composition, training reproducibility, and motion compensation is provided. These findings are visually supported by processed scan panels and benchmark tables. This framework advances reproducible and interpretable hybrid neuroimaging with strong clinical and technical validation.
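Since the multi-scale PatchGAN discriminator recurs throughout these results, a minimal PyTorch sketch of one is given below: a fully convolutional discriminator applied to the input at several resolutions. Layer widths, normalization, and the number of scales are illustrative assumptions, not the paper's settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchGAN(nn.Module):
    """PatchGAN-style discriminator: strided convolutions ending in a
    1-channel score map; each score judges one receptive-field patch."""
    def __init__(self, in_ch=3, base=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, base, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base, base * 2, 4, stride=2, padding=1),
            nn.InstanceNorm2d(base * 2),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base * 2, base * 4, 4, stride=2, padding=1),
            nn.InstanceNorm2d(base * 4),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base * 4, 1, 4, stride=1, padding=1),
        )

    def forward(self, x):
        return self.net(x)

class MultiScalePatchGAN(nn.Module):
    """Applies identical PatchGANs to the image at full, 1/2, and 1/4 scale,
    so real/fake statistics are enforced at several levels of detail."""
    def __init__(self, in_ch=3, n_scales=3):
        super().__init__()
        self.discs = nn.ModuleList(PatchGAN(in_ch) for _ in range(n_scales))

    def forward(self, x):
        scores = []
        for d in self.discs:
            scores.append(d(x))
            x = F.avg_pool2d(x, kernel_size=2)  # halve resolution per scale
        return scores  # one score map per scale

disc = MultiScalePatchGAN(in_ch=1)  # e.g., a single-channel fused image
maps = disc(torch.randn(2, 1, 128, 128))
print([m.shape for m in maps])
```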

19 pages, 23096 KiB  
Article
GAN-Based Super-Resolution in Linear R-SAM Imaging for Enhanced Non-Destructive Semiconductor Measurement
by Thi Thu Ha Vu, Tan Hung Vo, Trong Nhan Nguyen, Jaeyeop Choi, Le Hai Tran, Vu Hoang Minh Doan, Van Bang Nguyen, Wonjo Lee, Sudip Mondal and Junghwan Oh
Appl. Sci. 2025, 15(12), 6780; https://doi.org/10.3390/app15126780 - 17 Jun 2025
Viewed by 560
Abstract
The precise identification and non-destructive measurement of structural features and defects in semiconductor wafers are essential for ensuring process integrity and sustaining high yield in advanced manufacturing environments. Unlike conventional measurement techniques, scanning acoustic microscopy (SAM) is an advanced method that provides detailed visualizations of both surface and internal wafer structures. However, in practical industrial applications, the scanning time and image quality of SAM significantly impact its overall performance and utility. Prolonged scanning durations can lead to production bottlenecks, while suboptimal image quality can compromise the accuracy of defect detection. To address these challenges, this study proposes LinearTGAN, an improved generative adversarial network (GAN)-based model designed to enhance the resolution of linear acoustic wafer images acquired by the rotary scanning acoustic microscopy (R-SAM) system. Empirical evaluations demonstrate that the proposed model significantly outperforms conventional GAN-based approaches, achieving a Peak Signal-to-Noise Ratio (PSNR) of 29.479 dB, a Structural Similarity Index Measure (SSIM) of 0.874, a Learned Perceptual Image Patch Similarity (LPIPS) of 0.095, and a Fréchet Inception Distance (FID) of 0.445. To assess the measurement aspect of LinearTGAN, a lightweight defect segmentation module was integrated and tested on annotated wafer datasets. The super-resolved images produced by LinearTGAN significantly enhanced segmentation accuracy and improved the sensitivity of microcrack detection. Furthermore, the deployment of LinearTGAN within the R-SAM system yielded a 92% improvement in scanning performance for 12-inch wafers while simultaneously enhancing image fidelity. The integration of super-resolution techniques into R-SAM significantly advances the precision, robustness, and efficiency of non-destructive measurements, highlighting their potential for transformative impact in semiconductor metrology and quality assurance.
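For reference, two of the fidelity metrics reported here (PSNR and SSIM) can be computed with scikit-image as in the sketch below; the arrays and value range are placeholders standing in for super-resolved and reference wafer images, not the paper's data.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(sr_img, hr_img):
    """PSNR and SSIM between a super-resolved image and its high-resolution
    reference; both arrays in [0, 1], shape (H, W) or (H, W, C)."""
    psnr = peak_signal_noise_ratio(hr_img, sr_img, data_range=1.0)
    ssim = structural_similarity(
        hr_img, sr_img, data_range=1.0,
        channel_axis=-1 if hr_img.ndim == 3 else None)
    return psnr, ssim

# Stand-in data: a reference image and a slightly perturbed "reconstruction".
hr = np.random.rand(256, 256, 3)
sr = np.clip(hr + 0.01 * np.random.randn(*hr.shape), 0, 1)
print(evaluate_pair(sr, hr))
```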

17 pages, 1829 KiB  
Article
Research on Improved Occluded-Face Restoration Network
by Shangzhen Pang, Tzer Hwai Gilbert Thio, Fei Lu Siaw, Mingju Chen and Li Lin
Symmetry 2025, 17(6), 827; https://doi.org/10.3390/sym17060827 - 26 May 2025
Viewed by 400
Abstract
The natural features of the face exhibit significant symmetry. In practical applications, faces may be partially occluded due to factors such as masks, glasses, or other objects. Occluded-face restoration has broad application prospects in fields such as augmented reality, virtual reality, healthcare, and security, and is of significant practical importance in enhancing public safety and providing efficient services. This research establishes an improved occluded-face restoration network based on facial feature points and Generative Adversarial Networks. A facial landmark prediction network is constructed based on an improved MobileNetV3-small network. On the foundation of U-Net, dilated convolutions and residual blocks are introduced to form an enhanced generator network. Additionally, an improved discriminator network is built based on PatchGAN. Compared to the Contextual Attention network, under various occlusions, the improved face restoration network achieves maximum gains of 24.47% in Peak Signal-to-Noise Ratio (PSNR) and 24.39% in Structural Similarity Index (SSIM), and a maximum decrease of 81.1% in Fréchet Inception Distance (FID). Compared to the EdgeConnect network, it achieves maximum gains of 7.89% in PSNR and 10.34% in SSIM, and a maximum decrease of 27.2% in FID. Compared to the LaFIn network, it achieves maximum gains of 3.4% in PSNR and 3.31% in SSIM, and a maximum decrease of 9.19% in FID. These experiments show that the improved face restoration network yields better restoration results.
(This article belongs to the Section Physics)
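A rough sketch of the kind of building block this abstract describes, a residual block with dilated convolutions as might sit inside a U-Net generator, is shown below. The normalization choice and dilation rate are illustrative assumptions, not the paper's verified configuration.

```python
import torch
import torch.nn as nn

class DilatedResidualBlock(nn.Module):
    """Residual block with dilated convolutions: enlarges the receptive
    field without downsampling, useful inside a U-Net generator."""
    def __init__(self, channels, dilation=2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=dilation, dilation=dilation),
            nn.InstanceNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=dilation, dilation=dilation),
            nn.InstanceNorm2d(channels),
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        # Skip connection preserves fine detail through the block.
        return self.act(x + self.body(x))

x = torch.randn(1, 64, 128, 128)
print(DilatedResidualBlock(64)(x).shape)  # torch.Size([1, 64, 128, 128])
```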

13 pages, 1111 KiB  
Article
Data Augmentation for Enhanced Fish Detection in Lake Environments: Affine Transformations, Neural Filters, SinGAN
by Kidai Watanabe, Thao Nguyen-Nhu, Saya Takano, Daisuke Mori and Yasufumi Fujimoto
Animals 2025, 15(10), 1466; https://doi.org/10.3390/ani15101466 - 19 May 2025
Viewed by 409
Abstract
Understanding fish habitats is essential for fisheries management, habitat restoration, and species protection. Automated fish detection is a key tool in these applications, enabling real-time monitoring and quantitative analysis. Recent advancements in high-resolution cameras and machine learning technologies have facilitated image analysis automation, promoting remote fish tracking. However, many of these detection methods require large volumes of annotated data, which involve considerable effort and time, and their practical implementation remains challenging in environments with limited data. Hence, this study proposes an anomaly-based fish detection approach by integrating Patch Distribution Modeling with data augmentation techniques, including Affine Transformations, Neural Filters, and SinGAN. Field experiments were conducted in Lake Izunuma-Uchinuma, Japan, using an electrofishing boat to acquire data. Detection performance was assessed using metrics such as AUROC and F1-score. The results indicate that, compared to the original dataset (AUROC: 0.836, F1-score: 0.483), Neural Filters (AUROC: 0.940, F1-score: 0.879) and Affine Transformations (AUROC: 0.942, F1-score: 0.766) improve anomaly detection. However, SinGAN exhibited no measurable enhancement, indicating the necessity of further optimization. These results show the potential of the proposed approach to enhance automated fish detection in limited-data environments, supporting aquatic ecosystem sustainability.
(This article belongs to the Special Issue Conservation and Restoration of Aquatic Animal Habitats)
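Of the three augmentation strategies compared here, affine transformations are the simplest to reproduce; a torchvision sketch follows. The parameter ranges and the synthetic stand-in frame are assumptions, not the study's exact settings.

```python
import numpy as np
import torchvision.transforms as T
from PIL import Image

# Illustrative affine-augmentation pipeline; rotation, shift, scale, and
# shear ranges are assumed values, not the paper's configuration.
augment = T.Compose([
    T.RandomAffine(degrees=15, translate=(0.1, 0.1),
                   scale=(0.9, 1.1), shear=5),
    T.RandomHorizontalFlip(p=0.5),
])

# Random stand-in for an underwater video frame.
img = Image.fromarray((np.random.rand(240, 320, 3) * 255).astype("uint8"))
augmented = [augment(img) for _ in range(8)]  # 8 synthetic variants per frame
```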

17 pages, 5627 KiB  
Article
A Generative Model-Based Method for Inverse Design of Microstrip Filters
by Haipeng Wang, Chenchen Nie, Zhongfang Ren and Yunbo Li
Electronics 2025, 14(10), 1989; https://doi.org/10.3390/electronics14101989 - 13 May 2025
Viewed by 652
Abstract
In the area of microstrip filter design and optimization, deep learning (DL) algorithms have become much more attractive and powerful in recent years. Here, we propose a method to realize the inverse design of passive microstrip filters using generative adversarial networks (GANs). The proposed DL-assisted framework is composed of three components: a compositional pattern-producing network GAN-based graphic generator, a convolutional neural network (CNN)-based electromagnetic (EM) response predictor, and a genetic algorithm optimizer. The filter adopts a square patch resonator structure with an irregular-graphic slot and corner-cuts introduced at diagonal positions. By constructing a hybrid model of pixelated patterns in the filter structures and the corresponding EM response S-parameters, we can obtain customized filter solutions with wideband and dual-band magnitude responses in the 3–8 GHz and 1–6 GHz frequency ranges, respectively. Each inverse design took 3.6 min on average to execute 1000 iterations. Numerical simulations and experimental results show that the S-parameters of the generated filters are in excellent agreement with the self-defined targets.

8 pages, 3697 KiB  
Proceeding Paper
Pansharpening Remote Sensing Images Using Generative Adversarial Networks
by Bo-Hsien Chung, Jui-Hsiang Jung, Yih-Shyh Chiou, Mu-Jan Shih and Fuan Tsai
Eng. Proc. 2025, 92(1), 32; https://doi.org/10.3390/engproc2025092032 - 28 Apr 2025
Viewed by 328
Abstract
Pansharpening is a remote sensing image fusion technique that combines a high-resolution (HR) panchromatic (PAN) image with a low-resolution (LR) multispectral (MS) image to produce an HR MS image. The primary challenge in pansharpening lies in preserving the spatial details of the PAN image while maintaining the spectral integrity of the MS image. To address this, this article presents a generative adversarial network (GAN)-based approach to pansharpening. The GAN discriminator facilitated matching the generated image’s intensity to the HR PAN image and preserving the spectral characteristics of the LR MS image. The performance in generating images was evaluated using the peak signal-to-noise ratio (PSNR). For the experiment, original LR MS and HR PAN satellite images were partitioned into smaller patches, and the GAN model was validated using an 80:20 training-to-testing data ratio. The results illustrated that the super-resolution images generated by the SRGAN model achieved a PSNR of 31 dB. These results demonstrated the developed model’s ability to reconstruct the geometric, textural, and spectral information from the images.
(This article belongs to the Proceedings of 2024 IEEE 6th Eurasia Conference on IoT, Communication and Engineering)
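The data preparation described above, partitioning paired satellite images into small patches and splitting them 80:20 for training and testing, might look like the following NumPy sketch. Patch size, channel counts, and the random tiles are stand-ins, not the experiment's actual data.

```python
import numpy as np

def to_patches(img, size=64):
    """Split an (H, W, C) array into non-overlapping size x size patches."""
    h = img.shape[0] // size * size
    w = img.shape[1] // size * size
    img = img[:h, :w]  # crop so the grid divides evenly
    patches = img.reshape(h // size, size, w // size, size, -1)
    return patches.transpose(0, 2, 1, 3, 4).reshape(-1, size, size, img.shape[-1])

# Stand-in tiles: a 4-band MS image (assumed upsampled to PAN size) and a PAN image.
ms = np.random.rand(512, 512, 4)
pan = np.random.rand(512, 512, 1)
ms_patches, pan_patches = to_patches(ms), to_patches(pan)

# 80:20 training-to-testing split over the paired patches.
idx = np.random.permutation(len(ms_patches))
cut = int(0.8 * len(idx))
train_idx, test_idx = idx[:cut], idx[cut:]
print(len(train_idx), len(test_idx))
```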

23 pages, 57584 KiB  
Article
Pix2Next: Leveraging Vision Foundation Models for RGB to NIR Image Translation
by Youngwan Jin, Incheol Park, Hanbin Song, Hyeongjin Ju, Yagiz Nalcakan and Shiho Kim
Technologies 2025, 13(4), 154; https://doi.org/10.3390/technologies13040154 - 11 Apr 2025
Cited by 1 | Viewed by 989
Abstract
This paper proposes Pix2Next, a novel image-to-image translation framework designed to address the challenge of generating high-quality Near-Infrared (NIR) images from RGB inputs. Our method leverages a state-of-the-art Vision Foundation Model (VFM) within an encoder–decoder architecture, incorporating cross-attention mechanisms to enhance feature integration. This design captures detailed global representations and preserves essential spectral characteristics, treating RGB-to-NIR translation as more than a simple domain transfer problem. A multi-scale PatchGAN discriminator ensures realistic image generation at various detail levels, while carefully designed loss functions couple global context understanding with local feature preservation. We performed experiments on the RANUS and IDD-AW datasets to demonstrate Pix2Next’s advantages in quantitative metrics and visual quality, substantially improving the FID score compared to existing methods. Furthermore, we demonstrate the practical utility of Pix2Next by showing improved performance on a downstream object detection task using generated NIR data to augment limited real NIR datasets. The proposed method enables the scaling up of NIR datasets without additional data acquisition or annotation efforts, potentially accelerating advancements in NIR-based computer vision applications.
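The cross-attention fusion mentioned above can be sketched with PyTorch's built-in multi-head attention: decoder tokens query features produced by the vision foundation model. Embedding size, token counts, and the residual-plus-norm arrangement are illustrative assumptions, not the Pix2Next implementation.

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Minimal cross-attention: decoder tokens (queries) attend to
    encoder/VFM tokens (keys and values). Inputs are (B, N, D) sequences."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, queries, context):
        attended, _ = self.attn(queries, context, context)
        return self.norm(queries + attended)  # residual connection + norm

q = torch.randn(2, 196, 256)   # decoder feature tokens (assumed 14x14 grid)
kv = torch.randn(2, 196, 256)  # foundation-model feature tokens
print(CrossAttentionFusion()(q, kv).shape)  # torch.Size([2, 196, 256])
```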

35 pages, 6560 KiB  
Article
Adversarial Content–Noise Complementary Learning Model for Image Denoising and Tumor Detection in Low-Quality Medical Images
by Teresa Abuya, Richard Rimiru and George Okeyo
Signals 2025, 6(2), 17; https://doi.org/10.3390/signals6020017 - 3 Apr 2025
Viewed by 1283
Abstract
Medical imaging is crucial for disease diagnosis, but noise in CT and MRI scans can obscure critical details, making accurate diagnosis challenging. Traditional denoising methods and deep learning techniques often produce overly smooth images that lack vital diagnostic information. GAN-based approaches also struggle to balance noise removal and content preservation. Existing research has not explored tumor detection after image denoising; instead, it has concentrated on content and noise learning. To address these challenges, this study proposes the Adversarial Content–Noise Complementary Learning (ACNCL) model, which enhances image denoising and tumor detection. Unlike conventional methods focusing solely on content or noise learning, ACNCL simultaneously learns both through dual predictors, ensuring the complementary reconstruction of high-quality images. The model integrates multiple denoising techniques (DnCNN, U-Net, DenseNet, CA-AGF, and DWT) within a GAN framework, using PatchGAN as a local discriminator to preserve fine image textures. The ACNCL separates anatomical details and noise into distinct pathways, ensuring stable noise reduction while maintaining structural integrity. Evaluated on CT and MRI datasets, ACNCL demonstrated exceptional performance compared to traditional models both qualitatively and quantitatively. It exhibited strong generalization across datasets, improving medical image clarity and enabling earlier tumor detection. These findings highlight ACNCL’s potential to enhance diagnostic accuracy and support improved clinical decision-making.
(This article belongs to the Special Issue Recent Development of Signal Detection and Processing)
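Stripped to its core, the content–noise complementary idea pairs two predictors whose outputs must add back up to the observed image. A toy PyTorch sketch follows: the tiny CNNs stand in for the paper's full denoisers (DnCNN, U-Net, etc.), and the adversarial terms and loss weighting are omitted.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in predictors; the actual model uses full denoising
# networks here, plus a PatchGAN discriminator not shown in this sketch.
content_net = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
                            nn.Conv2d(32, 1, 3, padding=1))
noise_net = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(32, 1, 3, padding=1))
l1 = nn.L1Loss()

def complementary_loss(noisy, clean):
    """Content and noise predictions must (a) match their respective targets
    and (b) jointly reconstruct the observation: content + noise ~= noisy."""
    content = content_net(noisy)
    noise = noise_net(noisy)
    return (l1(content, clean)
            + l1(noise, noisy - clean)
            + l1(content + noise, noisy))

noisy = torch.randn(1, 1, 64, 64)
clean = torch.randn(1, 1, 64, 64)
print(complementary_loss(noisy, clean))
```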

14 pages, 743 KiB  
Article
AD-VAE: Adversarial Disentangling Variational Autoencoder
by Adson Silva and Ricardo Farias
Sensors 2025, 25(5), 1574; https://doi.org/10.3390/s25051574 - 4 Mar 2025
Viewed by 1044
Abstract
Face recognition (FR) is a less intrusive biometrics technology with various applications, such as security, surveillance, and access control systems. FR remains challenging, especially when there is only a single image per person as a gallery dataset and when dealing with variations like pose, illumination, and occlusion. Deep learning techniques have shown promising results in recent years using VAE and GAN, with approaches such as patch-VAE, VAE-GAN for 3D Indoor Scene Synthesis, and hybrid VAE-GAN models. However, in Single Sample Per Person Face Recognition (SSPP FR), the challenge of learning robust and discriminative features that preserve the subject’s identity persists. To address these issues, we propose a novel framework called AD-VAE, specifically for SSPP FR, using a combination of variational autoencoder (VAE) and Generative Adversarial Network (GAN) techniques. The proposed AD-VAE framework is designed to learn how to build representative identity-preserving prototypes from both controlled and wild datasets, effectively handling variations like pose, illumination, and occlusion. The method uses four networks: an encoder and decoder similar to VAE, a generator that receives the encoder output plus noise to generate an identity-preserving prototype, and a discriminator that operates as a multi-task network. AD-VAE outperforms all tested state-of-the-art face recognition techniques, demonstrating its robustness. The proposed framework achieves superior results on four controlled benchmark datasets—AR, E-YaleB, CAS-PEAL, and FERET—with recognition rates of 84.9%, 94.6%, 94.5%, and 96.0%, respectively, and achieves remarkable performance on the uncontrolled LFW dataset, with a recognition rate of 99.6%. The AD-VAE framework shows promising potential for future research and real-world applications.
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

16 pages, 4771 KiB  
Article
Untargeted Evasion Attacks on Deep Neural Networks Using StyleGAN
by Hyun Kwon
Electronics 2025, 14(3), 574; https://doi.org/10.3390/electronics14030574 - 31 Jan 2025
Cited by 3 | Viewed by 813
Abstract
In this study, we propose a novel method for generating untargeted adversarial examples using a Generative Adversarial Network (GAN) in an unrestricted black-box environment. The proposed approach produces adversarial examples that are classified into random classes distinct from their original labels, while maintaining high visual similarity to the original samples from a human perspective. This is achieved by leveraging the capabilities of StyleGAN to manipulate the latent space representation of images, enabling precise control over visual distortions. To evaluate the efficacy of the proposed method, we conducted experiments using the CelebA-HQ dataset and TensorFlow as the machine learning framework, with ResNet18 serving as the target classifier. The experimental results demonstrate the effectiveness of the method, achieving a 100% attack success rate in a black-box environment after 3000 iterations. Moreover, the adversarial examples generated by our approach exhibit a distortion value of 0.069 based on the Learned Perceptual Image Patch Similarity (LPIPS) metric, highlighting the balance between attack success and perceptual similarity. These findings underscore the potential of GAN-based approaches in crafting robust adversarial examples while preserving visual fidelity.

19 pages, 4528 KiB  
Article
Grounding Grid Electrical Impedance Imaging Method Based on an Improved Conditional Generative Adversarial Network
by Ke Zhu, Donghui Luo, Zhengzheng Fu, Zhihang Xue and Xianghang Bu
Algorithms 2025, 18(1), 48; https://doi.org/10.3390/a18010048 - 15 Jan 2025
Viewed by 1005
Abstract
The grounding grid is an important piece of equipment for ensuring the safety of a power system, and thus research on detecting its corrosion status is of great significance. Electrical impedance tomography (EIT) is an effective method for grounding grid corrosion imaging. However, the inverse problem of image reconstruction is ill-posed, which leads to unstable imaging results. This paper proposes a grounding grid electrical impedance imaging method based on an improved conditional generative adversarial network (CGAN), aiming to improve imaging precision and accuracy. Its generator combines a preprocessing module and a U-Net model with a convolutional block attention module (CBAM); the discriminator adopts a PatchGAN structure. First, a grounding grid forward problem model was built to calculate the boundary voltage. Then, the image was initialized through the preprocessing module, and the important features of grounding grid corrosion were extracted through the encoder, decoder, and attention modules. Finally, the generator and discriminator continuously optimized the objective function through adversarial training to achieve grounding grid electrical impedance imaging. Imaging was performed on grounding grids with different corrosion conditions. The results showed a final average peak signal-to-noise ratio of 20.04, an average structural similarity of 0.901, an accuracy of corrosion position judgment of 94.3%, and an error of corrosion degree judgment of 9.8%. This method effectively mitigates the ill-posedness of grounding grid imaging and improves precision and accuracy, while offering a degree of noise resistance and generality.
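The CBAM attached to the U-Net generator is a standard attention module (channel attention followed by spatial attention, after Woo et al., 2018). A compact PyTorch sketch is given below, with the reduction ratio and kernel size as commonly used defaults rather than the paper's verified settings.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Convolutional Block Attention Module: channel attention followed
    by spatial attention, applied as a residual-friendly gating block."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        b, c, _, _ = x.shape
        # Channel attention: avg- and max-pooled descriptors share one MLP.
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        # Spatial attention: 7x7 conv over stacked channel-wise avg/max maps.
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))

print(CBAM(64)(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```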

17 pages, 3986 KiB  
Article
Efficient Image Inpainting for Handwritten Text Removal Using CycleGAN Framework
by Somanka Maiti, Shabari Nath Panuganti, Gaurav Bhatnagar and Jonathan Wu
Mathematics 2025, 13(1), 176; https://doi.org/10.3390/math13010176 - 6 Jan 2025
Viewed by 2107
Abstract
With the recent rise in the development of deep learning techniques, image inpainting—the process of restoring missing or corrupted regions in images—has witnessed significant advancements. Although state-of-the-art models are effective, they often fail to inpaint complex missing areas, especially when handwritten occlusions are present in the image. To address this issue, an image inpainting model based on a residual CycleGAN is proposed. The generator takes as input the image occluded by handwritten missing patches and generates a restored image, which the discriminator then compares with the original ground truth image to determine whether it is real or fake. An adversarial trade-off between the generator and discriminator motivates the model to improve its training and produce a superior reconstructed image. Extensive experiments and analyses confirm that the proposed method generates inpainted images with superior visual quality and outperforms state-of-the-art deep learning approaches.

21 pages, 66390 KiB  
Article
Photorealistic Texture Contextual Fill-In
by Radek Richtr
Heritage 2025, 8(1), 9; https://doi.org/10.3390/heritage8010009 - 27 Dec 2024
Cited by 1 | Viewed by 1463
Abstract
This paper presents a comprehensive study of the application of AI-driven inpainting techniques to the restoration of historical photographs of the Czech city of Most, with a focus on restoring and reconstructing lost architectural heritage. The project combines state-of-the-art methods, including generative adversarial networks (GANs), patch-based inpainting, and manual retouching, to restore and enhance severely degraded images. The reconstructed and restored photographs of Most offer an invaluable visual representation of a city that was largely destroyed for industrial purposes in the 20th century. Through a series of blind and informed user tests, we assess the subjective quality of the restored images and examine how knowledge of edited areas influences user perception. Additionally, this study addresses the technical challenges of inpainting, including computational demands, interpretability, and bias in AI models. Ethical considerations, particularly regarding historical authenticity and speculative reconstruction, are also discussed. The findings demonstrate that AI techniques can significantly contribute to the preservation of cultural heritage but must be applied with careful oversight to maintain transparency and cultural integrity. Future work will focus on improving the interpretability and efficiency of these methods, while ensuring that reconstructions remain historically and culturally sensitive.
(This article belongs to the Section Cultural Heritage)

18 pages, 6401 KiB  
Article
Continuous Satellite Image Generation from Standard Layer Maps Using Conditional Generative Adversarial Networks
by Arminas Šidlauskas, Andrius Kriščiūnas and Dalia Čalnerytė
ISPRS Int. J. Geo-Inf. 2024, 13(12), 448; https://doi.org/10.3390/ijgi13120448 - 11 Dec 2024
Cited by 3 | Viewed by 1606
Abstract
Satellite image generation has a wide range of applications. For example, parts of images must be restored in areas obscured by clouds or cloud shadows, or in areas that must be anonymized. Covering a large area with generated images poses the challenge that separately generated images must maintain structural and color continuity with adjacent generated images as well as with the actual ones. This study presents a modified architecture of the generative adversarial network (GAN) pix2pix that ensures the integrity of the generated remote sensing images. The pix2pix model comprises a U-Net generator and a PatchGAN discriminator. The generator was modified by expanding the input set with images representing the known parts of the ground truth and the respective mask. Data used for the generative model consist of Sentinel-2 (S2) RGB satellite imagery as the target data and OpenStreetMap mapping data as the input. Since forested areas and fields dominate the images, a Kneedle-based clustering method was applied to create datasets that better represent the other classes, such as buildings and roads. The original and updated models were trained on different datasets, and their results were evaluated using gradient magnitude (GM), Fréchet inception distance (FID), structural similarity index measure (SSIM), and multiscale structural similarity index measure (MS-SSIM) metrics. The models with the updated architecture show improvement in gradient magnitude, SSIM, and MS-SSIM values for all datasets. The average GMs of the junction region and the full image are similar (differing by no more than 7%) for images generated using the modified architecture, whereas the junction-area GM is more than 13% higher for images generated using the original architecture. The importance of class balancing is demonstrated by the fact that, for both architectures, models trained on the dataset with a higher ratio of classes representing buildings and roads have more than 10% lower FID (162.673 vs. 190.036 for pix2pix; 173.408 vs. 195.621 for the modified architecture), more than 5% higher SSIM (0.3532 vs. 0.3284 for pix2pix; 0.3575 vs. 0.3345 for the modified architecture), and higher MS-SSIM (0.3532 vs. 0.3284 for pix2pix; 0.3575 vs. 0.3345 for the modified architecture) than models trained on the dataset without clustering.
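The described generator modification amounts to widening the pix2pix input from the map rendering alone to map plus known image content plus mask. A minimal sketch of the input assembly is shown below; channel counts and resolution are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

# Assumed conditioning inputs: a 3-channel map rendering, the 3-channel
# known parts of the target S2 image, and a 1-channel validity mask.
map_rgb = torch.randn(1, 3, 256, 256)   # OpenStreetMap rendering
known = torch.randn(1, 3, 256, 256)     # known parts of the ground truth
mask = torch.ones(1, 1, 256, 256)       # 1 = known pixel, 0 = to generate

# Expanded input set: the U-Net stem is widened to accept 7 channels.
x = torch.cat([map_rgb, known * mask, mask], dim=1)  # (1, 7, 256, 256)
first_conv = nn.Conv2d(7, 64, 4, stride=2, padding=1)
print(first_conv(x).shape)  # torch.Size([1, 64, 128, 128])
```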
