Search Results (558)

Search Parameters:
Keywords = semantic preservation

20 pages, 4264 KB  
Article
Skeleton-Guided Diffusion for Font Generation
by Li Zhao, Shan Dong, Jiayi Liu, Xijin Zhang, Xiaojiao Gao and Xiaojun Wu
Electronics 2025, 14(19), 3932; https://doi.org/10.3390/electronics14193932 - 3 Oct 2025
Abstract
Generating non-standard fonts, such as running script (e.g., XingShu), poses significant challenges due to their high stroke continuity, structural flexibility, and stylistic diversity, which traditional component-based prior knowledge methods struggle to model effectively. While diffusion models excel at capturing continuous feature spaces and stroke variations through iterative denoising, they face critical limitations: (1) style leakage, where large stylistic differences lead to inconsistent outputs due to noise interference; (2) structural distortion, caused by the absence of explicit structural guidance, resulting in broken strokes or deformed glyphs; and (3) style confusion, where similar font styles are inadequately distinguished, producing ambiguous results. To address these issues, we propose a novel skeleton-guided diffusion model with three key innovations: (1) a skeleton-constrained style rendering module that enforces semantic alignment and balanced energy constraints to amplify critical skeletal features, mitigating style leakage and ensuring stylistic consistency; (2) a cross-scale skeleton preservation module that integrates multi-scale glyph skeleton information through cross-dimensional interactions, effectively modeling macro-level layouts and micro-level stroke details to prevent structural distortions; (3) a contrastive style refinement module that leverages skeleton decomposition and recombination strategies, coupled with contrastive learning on positive and negative samples, to establish robust style representations and disambiguate similar styles. Extensive experiments on diverse font datasets demonstrate that our approach significantly improves the generation quality, achieving superior style fidelity, structural integrity, and style differentiation compared to state-of-the-art diffusion-based font generation methods. Full article
21 pages, 3715 KB  
Article
SPIRIT: Symmetry-Prior Informed Diffusion for Thangka Segmentation
by Yukai Xian, Yurui Lee, Liang Yan, Te Shen, Ping Lan, Qijun Zhao and Yi Zhang
Symmetry 2025, 17(10), 1643; https://doi.org/10.3390/sym17101643 - 3 Oct 2025
Abstract
Thangka paintings, as intricate forms of Tibetan Buddhist art, present unique challenges for image segmentation due to their densely arranged symbolic elements, complex color patterns, and strong structural symmetry. To address these difficulties, we propose SPIRIT, a structure-aware and prompt-guided diffusion segmentation framework tailored for Thangka images. Our method incorporates a support-query-encoding scheme to exploit limited labeled samples and introduces semantic guided attention fusion to integrate symbolic knowledge into the denoising process. Moreover, we design a symmetry-aware refinement module to explicitly preserve bilateral and radial symmetries, enhancing both accuracy and interpretability. Experimental results on our curated Thangka dataset and the artistic ArtBench benchmark demonstrate that our approach achieves 88.3% mIoU on Thangka and 86.1% mIoU on ArtBench, outperforming the strongest baseline by 6.1% and 5.6% mIoU, respectively. These results confirm that SPIRIT not only captures fine-grained details, but also excels in segmenting structurally complex regions of artistic imagery. Full article
(This article belongs to the Special Issue Symmetry/Asymmetry in Image Processing and Computer Vision)
33 pages, 4190 KB  
Article
Preserving Songket Heritage Through Intelligent Image Retrieval: A PCA and QGD-Rotational-Based Model
by Nadiah Yusof, Nazatul Aini Abd. Majid, Amirah Ismail and Nor Hidayah Hussain
Computers 2025, 14(10), 416; https://doi.org/10.3390/computers14100416 - 1 Oct 2025
Abstract
Malay songket motifs are a vital component of Malaysia’s intangible cultural heritage, characterized by intricate visual designs and deep cultural symbolism. However, the practical digital preservation and retrieval of these motifs present challenges, particularly due to the rotational variations typical in textile imagery. This study introduces a novel Content-Based Image Retrieval (CBIR) model that integrates Principal Component Analysis (PCA) for feature extraction and Quadratic Geometric Distance (QGD) for measuring similarity. To evaluate the model’s performance, a curated dataset comprising 413 original images and 4956 synthetically rotated songket motif images was utilized. The retrieval system featured metadata-driven preprocessing, dimensionality reduction, and multi-angle similarity assessment to address the issue of rotational invariance comprehensively. Quantitative evaluations using precision, recall, and F-measure metrics demonstrated that the proposed PCAQGD + Rotation technique achieved a mean F-measure of 59.72%, surpassing four benchmark retrieval methods. These findings confirm the model’s capability to accurately retrieve relevant motifs across varying orientations, thus supporting cultural heritage preservation efforts. The integration of PCA and QGD techniques effectively narrows the semantic gap between machine perception and human interpretation of motif designs. Future research should focus on expanding motif datasets and incorporating deep learning approaches to enhance retrieval precision, scalability, and applicability within larger national heritage repositories. Full article
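A minimal sketch of the rotation-invariant retrieval idea described above, assuming PCA via SVD and plain Euclidean distance as a stand-in for the paper's QGD measure (all function names are illustrative, not the authors' code):

```python
import numpy as np

def pca_fit(X, k):
    """Fit PCA on row-vector features X of shape (n_samples, n_features)."""
    mu = X.mean(axis=0)
    # SVD of the centered data; rows of Vt are the principal axes
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k]

def pca_project(X, mu, comps):
    return (X - mu) @ comps.T

def rotation_invariant_distance(query_img, db_feats, mu, comps):
    """Minimum distance over the four 90-degree rotations of the query."""
    best = np.full(db_feats.shape[0], np.inf)
    for r in range(4):
        q = np.rot90(query_img, r).ravel()[None, :]
        qf = pca_project(q, mu, comps)
        d = np.linalg.norm(db_feats - qf, axis=1)  # Euclidean stand-in for QGD
        best = np.minimum(best, d)
    return best
```

The multi-angle assessment here is coarse (90-degree steps only); the paper evaluates finer synthetic rotations, but the min-over-rotations structure is the same.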
25 pages, 13955 KB  
Article
Adaptive Energy–Gradient–Contrast (EGC) Fusion with AIFI-YOLOv12 for Improving Nighttime Pedestrian Detection in Security
by Lijuan Wang, Zuchao Bao and Dongming Lu
Appl. Sci. 2025, 15(19), 10607; https://doi.org/10.3390/app151910607 - 30 Sep 2025
Abstract
In security applications, visible-light pedestrian detectors are highly sensitive to changes in illumination and fail under low-light or nighttime conditions, while infrared sensors, though resilient to lighting, often produce blurred object boundaries that hinder precise localization. To address these complementary limitations, we propose a practical multimodal pipeline—Adaptive Energy–Gradient–Contrast (EGC) Fusion with AIFI-YOLOv12—that first fuses infrared and low-light visible images using per-pixel weights derived from local energy, gradient magnitude and contrast measures, then detects pedestrians with an improved YOLOv12 backbone. The detector integrates an AIFI attention module at high semantic levels, replaces selected modules with A2C2f blocks to enhance cross-channel feature aggregation, and preserves P3–P5 outputs to improve small-object localization. We evaluate the complete pipeline on the LLVIP dataset and report Precision, Recall, mAP@50, mAP@50–95, GFLOPs, FPS and detection time, comparing against YOLOv8, YOLOv10–YOLOv12 baselines (n and s scales). Quantitative and qualitative results show that the proposed fusion restores complementary thermal and visible details and that the AIFI-enhanced detector yields more robust nighttime pedestrian detection while maintaining a competitive computational profile suitable for real-world security deployments. Full article
(This article belongs to the Special Issue Advanced Image Analysis and Processing Technologies and Applications)
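The per-pixel weighting the EGC abstract outlines can be illustrated roughly as follows; the energy, gradient, and contrast measures here are simple image-wide stand-ins for the paper's local-window statistics, and the unweighted sum of the three cues is an assumption:

```python
import numpy as np

def egc_salience(img):
    """Per-pixel salience from energy, gradient magnitude, and contrast."""
    energy = img ** 2
    gy, gx = np.gradient(img)
    grad = np.hypot(gx, gy)
    contrast = np.abs(img - img.mean())
    return energy + grad + contrast

def egc_fuse(ir, vis, eps=1e-12):
    """Fuse infrared and visible images with per-pixel EGC weights."""
    w_ir, w_vis = egc_salience(ir), egc_salience(vis)
    w = w_ir / (w_ir + w_vis + eps)       # weight in [0, 1) per pixel
    return w * ir + (1.0 - w) * vis       # convex combination per pixel
```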
15 pages, 3046 KB  
Article
Enhancing Semantic Interoperability of Heritage BIM-Based Asset Preservation
by Karol Argasiński and Artur Tomczak
Heritage 2025, 8(10), 410; https://doi.org/10.3390/heritage8100410 - 30 Sep 2025
Abstract
Preservation of Cultural Heritage (CH) demands precise and comprehensive information representation to document, analyse, and manage assets effectively. While Building Information Modelling (BIM) facilitates as-is state documentation, challenges in semantic interoperability of complex cultural data often limit its potential in heritage contexts. This study investigates the integration of BIM tools with the buildingSMART Data Dictionary (bSDD) platform to enhance semantic interoperability for heritage assets. Using a proof-of-concept approach, the research focuses on a historic tenement house in Tarnów, Poland, modelled with the IFC schema standard and enriched with the MIDAS heritage classification system. The methodology includes transforming the classification system into bSDD data dictionary, publishing thesauri for components, materials, and monument types, and semantic enrichment of the model using Bonsai (formerly BlenderBIM) plugin for Blender. Results demonstrate improved consistency, accuracy, and usability of BIM data for heritage preservation. The integration ensures detailed documentation and facilitates interoperability across platforms, addressing preservation challenges with enriched narratives of cultural significance. This method supports future predictive models for heritage asset conservation, emphasizing the importance of data quality and interoperability in safeguarding shared cultural heritage for future generations. Full article
26 pages, 1089 KB  
Review
Collecting, Integrating and Processing IoT Sensor Data on Edge Devices for PD Monitoring: A Scoping Review
by Eleftherios Efkleidis Stefanou, Pavlos Bitilis, Georgios Bouchouras and Konstantinos Kotis
Appl. Sci. 2025, 15(19), 10541; https://doi.org/10.3390/app151910541 - 29 Sep 2025
Abstract
Bradykinesia and tremor are critical motor symptoms in diagnosing and monitoring Parkinson’s disease (PD), a progressive neurodegenerative disorder. The integration of IoT sensors, smartwatch technology, and edge computing has facilitated real-time collection, processing, and analysis of data related to these impairments, enabling continuous monitoring of PD beyond traditional clinical settings. This survey provides a comprehensive review of recent technological advancements in data collection from wearable IoT sensors and its semantic integration and processing on edge devices, emphasizing methods optimized for efficient and low-latency processing. Additionally, this survey explores AI-driven techniques for detecting and analyzing bradykinesia and tremor symptoms on edge devices. By leveraging localized computation on edge devices, these approaches facilitate energy efficiency, data privacy, and scalability, making them suitable for deployment in real environments. This paper also examines related open-source tools and datasets, assessing their roles in improving reproducibility and integration into these environments. Furthermore, key challenges, including variability in real environments, model generalization, and computational constraints, are discussed, along with potential strategies to enhance detection accuracy and system robustness. By bridging the gap between sensor data collection and integration, and AI-based detection of bradykinesia and tremor on edge devices, this survey intends to contribute to the development of efficient, scalable, and privacy-preserving healthcare solutions for continuous PD monitoring. Full article
40 pages, 19754 KB  
Article
Trans-cVAE-GAN: Transformer-Based cVAE-GAN for High-Fidelity EEG Signal Generation
by Yiduo Yao, Xiao Wang, Xudong Hao, Hongyu Sun, Ruixin Dong and Yansheng Li
Bioengineering 2025, 12(10), 1028; https://doi.org/10.3390/bioengineering12101028 - 26 Sep 2025
Abstract
Electroencephalography signal generation remains a challenging task due to its non-stationarity, multi-scale oscillations, and strong spatiotemporal coupling. Conventional generative models, including VAEs and GAN variants such as DCGAN, WGAN, and WGAN-GP, often yield blurred waveforms, unstable spectral distributions, or lack semantic controllability, limiting their effectiveness in emotion-related applications. To address these challenges, this research proposes a Transformer-based conditional variational autoencoder–generative adversarial network (Trans-cVAE-GAN) that combines Transformer-driven temporal modeling, label-conditioned latent inference, and adversarial learning. A multi-dimensional structural loss further constrains generation by preserving temporal correlation, frequency-domain consistency, and statistical distribution. Experiments on three SEED-family datasets—SEED, SEED-FRA, and SEED-GER—demonstrate high similarity to real EEG, with representative mean ± SD correlations of Pearson ≈ 0.84 ± 0.08/0.74 ± 0.12/0.84 ± 0.07 and Spearman ≈ 0.82 ± 0.07/0.72 ± 0.12/0.83 ± 0.08, together with low spectral divergence (KL ≈ 0.39 ± 0.15/0.41 ± 0.20/0.37 ± 0.18). Comparative analyses show consistent gains over classical GAN baselines, while ablations verify the indispensable roles of the Transformer encoder, label conditioning, and cVAE module. In downstream emotion recognition, augmentation with generated EEG raises accuracy from 86.9% to 91.8% on SEED (with analogous gains on SEED-FRA and SEED-GER), underscoring enhanced generalization and robustness. These results confirm that the proposed approach simultaneously ensures fidelity, stability, and controllability across cohorts, offering a scalable solution for affective computing and brain–computer interface applications. Full article
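The similarity metrics reported above (Pearson and Spearman correlation, KL divergence between spectra) can be computed with a short NumPy sketch; the rank transform below ignores ties, a simplification that is adequate for continuous EEG samples:

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation between two 1-D signals."""
    x, y = x - x.mean(), y - y.mean()
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

def spearman(x, y):
    """Spearman correlation = Pearson on rank-transformed signals (no ties)."""
    rank = lambda v: np.argsort(np.argsort(v)).astype(float)
    return pearson(rank(x), rank(y))

def spectral_kl(x, y, eps=1e-12):
    """KL divergence between normalized power spectra of two signals."""
    px = np.abs(np.fft.rfft(x)) ** 2
    py = np.abs(np.fft.rfft(y)) ** 2
    px, py = px / px.sum(), py / py.sum()
    return float(np.sum(px * np.log((px + eps) / (py + eps))))
```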
19 pages, 4247 KB  
Article
Dynamic Visual Privacy Governance Using Graph Convolutional Networks and Federated Reinforcement Learning
by Chih Yang, Wei-Xun Lu and Ray-I Chang
Electronics 2025, 14(19), 3774; https://doi.org/10.3390/electronics14193774 - 24 Sep 2025
Abstract
The proliferation of image sharing on social media poses significant privacy risks. Although some previous works have proposed to detect privacy attributes in image sharing, they suffer from the following shortcomings: (1) reliance only on legacy architectures, (2) failure to model the label correlations (i.e., semantic dependencies and co-occurrence patterns among privacy attributes) between privacy attributes, and (3) adoption of static, one-size-fits-all user preference models. To address these, we propose a comprehensive framework for visual privacy protection. First, we establish a new state-of-the-art (SOTA) architecture using modern vision backbones. Second, we introduce Graph Convolutional Networks (GCN) as a classifier head to counter the failure to model label correlations. Third, to replace static user models, we design a dynamic personalization module using Federated Learning (FL) for privacy preservation and Reinforcement Learning (RL) to continuously adapt to individual user preferences. Experiments on the VISPR dataset demonstrate that our approach can outperform the previous work by a substantial margin of 6% in mAP (52.88% vs. 46.88%) and improve the Overall F1-score by 10% (0.770 vs. 0.700). This provides more meaningful and personalized privacy recommendations, setting a new standard for user-centric privacy protection systems. Full article
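A GCN classifier head propagates label correlations with the standard normalized-adjacency rule H' = σ(D^-1/2 (A+I) D^-1/2 H W). A minimal NumPy illustration of one such layer (not the authors' implementation; A would encode attribute co-occurrence):

```python
import numpy as np

def normalized_adjacency(A):
    """Symmetric normalization with self-loops: D^-1/2 (A+I) D^-1/2."""
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def gcn_layer(H, A_norm, W):
    """One graph-convolution layer with ReLU activation."""
    return np.maximum(A_norm @ H @ W, 0.0)
```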
25 pages, 471 KB  
Article
Mitigating Membership Inference Attacks via Generative Denoising Mechanisms
by Zhijie Yang, Xiaolong Yan, Guoguang Chen and Xiaoli Tian
Mathematics 2025, 13(19), 3070; https://doi.org/10.3390/math13193070 - 24 Sep 2025
Abstract
Membership Inference Attacks (MIAs) pose a significant threat to privacy in modern machine learning systems, enabling adversaries to determine whether a specific data record was used during model training. Existing defense techniques often degrade model utility or rely on heuristic noise injection, which fails to provide a robust, mathematically grounded defense. In this paper, we propose Diffusion-Driven Data Preprocessing (D3P), a novel privacy-preserving framework leveraging generative diffusion models to transform sensitive training data before learning, thereby reducing the susceptibility of trained models to MIAs. Our method integrates a mathematically rigorous denoising process into a privacy-oriented diffusion pipeline, which ensures that the reconstructed data maintains essential semantic features for model utility while obfuscating fine-grained patterns that MIAs exploit. We further introduce a privacy–utility optimization strategy grounded in formal probabilistic analysis, enabling adaptive control of the diffusion noise schedule to balance attack resilience and predictive performance. Experimental evaluations across multiple datasets and architectures demonstrate that D3P significantly reduces MIA success rates by up to 42.3% compared to state-of-the-art defenses, with a less than 2.5% loss in accuracy. This work provides a theoretically principled and empirically validated pathway for integrating diffusion-based generative mechanisms into privacy-preserving AI pipelines, which is particularly suitable for deployment in cloud-based and blockchain-enabled machine learning environments. Full article
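The forward-diffusion transform underlying such preprocessing has a closed form. In practice D3P would predict the noise with a trained denoiser to obtain an obfuscated reconstruction; this sketch passes the true noise back in purely to show the algebra of the noising/denoising pair:

```python
import numpy as np

def diffuse(x0, alpha_bar, rng):
    """Forward diffusion step: keep coarse structure, obfuscate fine detail."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps
    return xt, eps

def reconstruct(xt, eps_hat, alpha_bar):
    """Invert the forward step given a (predicted) noise estimate."""
    return (xt - np.sqrt(1.0 - alpha_bar) * eps_hat) / np.sqrt(alpha_bar)
```

Lower `alpha_bar` injects more noise, which is the knob a privacy–utility schedule like the one described above would adapt.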
25 pages, 12087 KB  
Article
MSHEdit: Enhanced Text-Driven Image Editing via Advanced Diffusion Model Architecture
by Mingrui Yang, Jian Yuan, Jiahui Xu and Weishu Yan
Electronics 2025, 14(19), 3758; https://doi.org/10.3390/electronics14193758 - 23 Sep 2025
Abstract
To address limitations in structural preservation and detail fidelity in existing text-driven image editing methods, we propose MSHEdit—a novel editing framework built upon a pre-trained diffusion model. MSHEdit is designed to achieve high semantic alignment during image editing without the need for additional training or fine-tuning. The framework integrates two key components: the High-Order Stable Diffusion Sampler (HOS-DEIS) and the Multi-Scale Window Residual Bridge Attention Module (MS-WRBA). HOS-DEIS enhances sampling precision and detail recovery by employing high-order integration and dynamic error compensation, while MS-WRBA improves editing region localization and edge blending through multi-scale window partitioning and dual-path normalization. Extensive experiments on public datasets including DreamBench-v2 and DreamBench++ demonstrate that compared to recent mainstream models, MSHEdit reduces structural distance by 2% and background LPIPS by 1.2%. These results demonstrate its ability to achieve natural transitions between edited regions and backgrounds in complex scenes while effectively mitigating object edge blurring. MSHEdit exhibits excellent structural preservation, semantic consistency, and detail restoration, providing an efficient and generalizable solution for high-quality text-driven image editing. Full article
19 pages, 6027 KB  
Article
An Improved HRNetV2-Based Semantic Segmentation Algorithm for Pipe Corrosion Detection in Smart City Drainage Networks
by Liang Gao, Xinxin Huang, Wanling Si, Feng Yang, Xu Qiao, Yaru Zhu, Tingyang Fu and Jianshe Zhao
J. Imaging 2025, 11(10), 325; https://doi.org/10.3390/jimaging11100325 - 23 Sep 2025
Abstract
Urban drainage pipelines are essential components of smart city infrastructure, supporting the safe and sustainable operation of underground systems. However, internal corrosion in pipelines poses significant risks to structural stability and public safety. In this study, we propose an enhanced semantic segmentation framework based on High-Resolution Network Version 2 (HRNetV2) to accurately identify corroded regions in traditional closed-circuit television (CCTV) images. The proposed method integrates a Convolutional Block Attention Module (CBAM) to strengthen the feature representation of corrosion patterns and introduces a Lightweight Pyramid Pooling Module (LitePPM) to improve multi-scale context modeling. By preserving high-resolution details through HRNetV2’s parallel architecture, the model achieves precise and robust segmentation performance. Experiments on a real-world corrosion dataset show that our approach attains a mean Intersection over Union (mIoU) of 95.92 ± 0.03%, Recall of 97.01 ± 0.02%, and an overall Accuracy of 98.54%. These results demonstrate the method’s effectiveness in supporting intelligent infrastructure inspection and provide technical insights for advancing automated maintenance systems in smart cities. Full article
(This article belongs to the Section Computer Vision and Pattern Recognition)
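For reference, the mIoU metric reported above is computed as per-class intersection over union, averaged over the classes present; a minimal sketch:

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Mean intersection-over-union across classes present in either mask."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union:  # skip classes absent from both prediction and ground truth
            ious.append(inter / union)
    return float(np.mean(ious))
```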
18 pages, 1694 KB  
Article
FAIR-Net: A Fuzzy Autoencoder and Interpretable Rule-Based Network for Ancient Chinese Character Recognition
by Yanling Ge, Yunmeng Zhang and Seok-Beom Roh
Sensors 2025, 25(18), 5928; https://doi.org/10.3390/s25185928 - 22 Sep 2025
Abstract
Ancient Chinese scripts—including oracle bone carvings, bronze inscriptions, stone steles, Dunhuang scrolls, and bamboo slips—are rich in historical value but often degraded due to centuries of erosion, damage, and stylistic variability. These issues severely hinder manual transcription and render conventional OCR techniques inadequate, as they are typically trained on modern printed or handwritten text and lack interpretability. To tackle these challenges, we propose FAIR-Net, a hybrid architecture that combines the unsupervised feature learning capacity of a deep autoencoder with the semantic transparency of a fuzzy rule-based classifier. In FAIR-Net, the deep autoencoder first compresses high-resolution character images into low-dimensional, noise-robust embeddings. These embeddings are then passed into a Fuzzy Neural Network (FNN), whose hidden layer leverages Fuzzy C-Means (FCM) clustering to model soft membership degrees and generate human-readable fuzzy rules. The output layer uses Iteratively Reweighted Least Squares Estimation (IRLSE) combined with a Softmax function to produce probabilistic predictions, with all weights constrained as linear mappings to maintain model transparency. We evaluate FAIR-Net on CASIA-HWDB1.0, HWDB1.1, and ICDAR 2013 CompetitionDB, where it achieves a recognition accuracy of 97.91%, significantly outperforming baseline CNNs (p < 0.01, Cohen’s d > 0.8) while maintaining the tightest confidence interval (96.88–98.94%) and lowest standard deviation (±1.03%). Additionally, FAIR-Net reduces inference time to 25 s, improving processing efficiency by 41.9% over AlexNet and up to 98.9% over CNN-Fujitsu, while preserving >97.5% accuracy across evaluations. 
To further assess generalization to historical scripts, FAIR-Net was tested on the Ancient Chinese Character Dataset (9233 classes; 979,907 images), achieving 83.25% accuracy—slightly higher than ResNet101 but 2.49% lower than SwinT-v2-small—while reducing training time by over 5.5× compared to transformer-based baselines. Fuzzy rule visualization confirms enhanced robustness to glyph ambiguities and erosion. Overall, FAIR-Net provides a practical, interpretable, and highly efficient solution for the digitization and preservation of ancient Chinese character corpora. Full article
(This article belongs to the Section Sensing and Imaging)
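The soft membership degrees that FAIR-Net's fuzzy layer derives from FCM clustering follow the standard update rule u_ic ∝ d_ic^(-2/(m-1)); a small NumPy sketch of that step alone (fuzzifier m = 2 assumed, centers taken as given rather than iterated):

```python
import numpy as np

def fcm_memberships(X, centers, m=2.0, eps=1e-12):
    """Soft membership of each sample to each cluster center (one FCM update)."""
    # Pairwise distances, shape (n_samples, n_clusters); eps avoids div-by-zero
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + eps
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)  # rows sum to 1
```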
18 pages, 1070 KB  
Article
Saliency-Guided Local Semantic Mixing for Long-Tailed Image Classification
by Jiahui Lv, Jun Lei, Jun Zhang, Chao Chen and Shuohao Li
Mach. Learn. Knowl. Extr. 2025, 7(3), 107; https://doi.org/10.3390/make7030107 - 22 Sep 2025
Abstract
In real-world visual recognition tasks, long-tailed distributions pose a widespread challenge, with extreme class imbalance severely limiting the representational learning capability of deep models. In practice, due to this imbalance, deep models often exhibit poor generalization performance on tail classes. To address this issue, data augmentation through the synthesis of new tail-class samples has become an effective method. One popular approach is CutMix, which explicitly mixes images from tail and other classes, constructing labels based on the ratio of the regions cropped from both images. However, region-based labels completely ignore the inherent semantic information of the augmented samples. To overcome this problem, we propose a saliency-guided local semantic mixing (LSM) method, which uses differentiable block decoupling and semantic-aware local mixing techniques. This method integrates head-class backgrounds while preserving the key discriminative features of tail classes and dynamically assigns labels to effectively augment tail-class samples. This results in efficient balancing of long-tailed data distributions and significant improvements in classification performance. The experimental validation shows that this method demonstrates significant advantages across three long-tailed benchmark datasets, improving classification accuracy by 5.0%, 7.3%, and 6.1%, respectively. Notably, the LSM framework is highly compatible, seamlessly integrating with existing classification models and providing significant performance gains, validating its broad applicability. Full article
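The region-ratio labeling the abstract critiques is the classic CutMix rule: the mixed label is weighted purely by pasted area, regardless of what the pasted pixels actually depict. A minimal sketch of that baseline (for contrast with LSM's semantic-aware labels):

```python
import numpy as np

def cutmix(img_a, img_b, label_a, label_b, box):
    """Classic CutMix: paste a crop of img_b into img_a; label by area ratio."""
    y0, y1, x0, x1 = box
    out = img_a.copy()
    out[y0:y1, x0:x1] = img_b[y0:y1, x0:x1]
    # lam = fraction of the image still belonging to img_a
    lam = 1.0 - (y1 - y0) * (x1 - x0) / (img_a.shape[0] * img_a.shape[1])
    return out, lam * label_a + (1.0 - lam) * label_b
```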
30 pages, 10206 KB  
Article
Evaluation and Improvement of Image Aesthetics Quality via Composition and Similarity
by Xinyu Cui, Guoqing Tu, Guoying Wang, Senjun Zhang and Lufeng Mo
Sensors 2025, 25(18), 5919; https://doi.org/10.3390/s25185919 - 22 Sep 2025
Abstract
The evaluation and enhancement of image aesthetics play a pivotal role in the development of visual media, impacting fields including photography, design, and computer vision. Composition, a key factor shaping visual aesthetics, significantly influences an image’s vividness and expressiveness. However, existing image optimization methods face practical challenges: compression-induced distortion, imprecise object extraction, and cropping-caused unnatural proportions or content loss. To tackle these issues, this paper proposes an image aesthetic evaluation with composition and similarity (IACS) method that harmonizes composition aesthetics and image similarity through a unified function. When evaluating composition aesthetics, the method calculates the distance between the main semantic line (or salient object) and the nearest rule-of-thirds line or central line. For images featuring prominent semantic lines, a modified Hough transform is utilized to detect the main semantic line, while for images containing salient objects, a salient object detection method based on luminance channel salience features (LCSF) is applied to determine the salient object region. In evaluating similarity, edge similarity measured by the Canny operator is combined with the structural similarity index (SSIM). Furthermore, we introduce a Framework for Image Aesthetic Evaluation with Composition and Similarity-Based Optimization (FIACSO), which uses semantic segmentation and generative adversarial networks (GANs) to optimize composition while preserving the original content. Compared with prior approaches, the proposed method improves both the aesthetic appeal and fidelity of optimized images. Subjective evaluation involving 30 participants further confirms that FIACSO outperforms existing methods in overall aesthetics, compositional harmony, and content integrity. 
Beyond methodological contributions, this study also offers practical value: it supports photographers in refining image composition without losing context, assists designers in creating balanced layouts with minimal distortion, and provides computational tools to enhance the efficiency and quality of visual media production. Full article
(This article belongs to the Special Issue Recent Innovations in Computational Imaging and Sensing)
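The similarity term combining SSIM with edge agreement can be sketched as below; a single-window SSIM and a gradient-magnitude map stand in for the paper's windowed SSIM and Canny edges, and the 50/50 weighting is an assumption:

```python
import numpy as np

def global_ssim(x, y, L=1.0):
    """Single-window SSIM over the whole image (dynamic range L)."""
    c1, c2 = (0.01 * L) ** 2, (0.03 * L) ** 2
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    return float(((2 * mx * my + c1) * (2 * cov + c2)) /
                 ((mx ** 2 + my ** 2 + c1) * (x.var() + y.var() + c2)))

def edge_similarity(x, y):
    """Cosine similarity of gradient-magnitude maps (a Canny stand-in)."""
    def mag(img):
        gy, gx = np.gradient(img)
        return np.hypot(gx, gy).ravel()
    ex, ey = mag(x), mag(y)
    return float(ex @ ey / (np.linalg.norm(ex) * np.linalg.norm(ey) + 1e-12))

def combined_similarity(x, y, alpha=0.5):
    """Weighted blend of structural and edge similarity."""
    return alpha * global_ssim(x, y) + (1.0 - alpha) * edge_similarity(x, y)
```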
23 pages, 3485 KB  
Article
MSGS-SLAM: Monocular Semantic Gaussian Splatting SLAM
by Mingkai Yang, Shuyu Ge and Fei Wang
Symmetry 2025, 17(9), 1576; https://doi.org/10.3390/sym17091576 - 20 Sep 2025
Abstract
With the iterative evolution of SLAM (Simultaneous Localization and Mapping) technology in the robotics domain, the SLAM paradigm based on three-dimensional Gaussian distribution models has emerged as the current state-of-the-art technical approach. This research proposes a novel MSGS-SLAM system (Monocular Semantic Gaussian Splatting SLAM), which innovatively integrates monocular vision with three-dimensional Gaussian distribution models within a semantic SLAM framework. Our approach exploits the inherent spherical symmetries of isotropic Gaussian distributions, enabling symmetric optimization processes that maintain computational efficiency while preserving geometric consistency. Current mainstream three-dimensional Gaussian semantic SLAM systems typically rely on depth sensors for map reconstruction and semantic segmentation, which not only significantly increases hardware costs but also limits the deployment potential of systems in diverse scenarios. To overcome this limitation, this research introduces a depth estimation proxy framework based on Metric3D-V2, which effectively addresses the inherent deficiency of monocular vision systems in depth information acquisition. Additionally, our method leverages architectural symmetries in indoor environments to enhance semantic understanding through symmetric feature matching. Through this approach, the system achieves robust and efficient semantic feature integration and optimization without relying on dedicated depth sensors, thereby substantially reducing the dependency of three-dimensional Gaussian semantic SLAM systems on depth sensors and expanding their application scope. Furthermore, this research proposes a keyframe selection algorithm based on semantic guidance and proxy depth collaborative mechanisms, which effectively suppresses pose drift errors accumulated during long-term system operation, thereby achieving robust global loop closure correction. 
Through systematic evaluation on multiple standard datasets, MSGS-SLAM achieves comparable technical performance to existing three-dimensional Gaussian model-based semantic SLAM systems across multiple key performance metrics including ATE RMSE, PSNR, and mIoU. Full article
(This article belongs to the Section Engineering and Materials)
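Of the metrics listed, ATE RMSE is the root-mean-square of per-pose translation errors between estimated and ground-truth trajectories (the usual similarity alignment step is omitted in this sketch):

```python
import numpy as np

def ate_rmse(est, gt):
    """Absolute trajectory error (RMSE) over per-pose position errors.

    est, gt: arrays of shape (n_poses, 3), assumed already aligned.
    """
    err = np.linalg.norm(est - gt, axis=1)
    return float(np.sqrt(np.mean(err ** 2)))
```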