Search Results (374)

Search Parameters:
Keywords = video colorization

18 pages, 2545 KiB  
Article
Reliable Indoor Fire Detection Using Attention-Based 3D CNNs: A Fire Safety Engineering Perspective
by Mostafa M. E. H. Ali and Maryam Ghodrat
Fire 2025, 8(7), 285; https://doi.org/10.3390/fire8070285 - 21 Jul 2025
Viewed by 534
Abstract
Despite recent advances in deep learning for fire detection, much of the current research prioritizes model-centric metrics over dataset fidelity, particularly from a fire safety engineering perspective. Commonly used datasets are often dominated by fully developed flames, mislabel smoke-only frames as non-fire, or lack intra-video diversity due to redundant frames from limited sources. Some works treat smoke detection alone as early-stage detection, even though many fires (e.g., electrical or chemical) begin with visible flames and no smoke. Additionally, attempts to improve model applicability through mixed-context datasets—combining indoor, outdoor, and wildland scenes—often overlook the unique false alarm sources and detection challenges specific to each environment. To address these limitations, we curated a new video dataset comprising 1108 annotated fire and non-fire clips captured via indoor surveillance cameras. Unlike existing datasets, ours emphasizes early-stage fire dynamics (pre-flashover) and includes varied fire sources (e.g., sofa, cupboard, and attic fires), realistic false alarm triggers (e.g., flame-colored objects, artificial lighting), and a wide range of spatial layouts and illumination conditions. This collection enables robust training and benchmarking for early indoor fire detection. Using this dataset, we developed a spatiotemporal fire detection model based on the mixed-convolution ResNet (MC3_18) architecture, augmented with Convolutional Block Attention Modules (CBAM). The proposed model achieved 86.11% accuracy, 88.76% precision, and 84.04% recall, along with low false positive (11.63%) and false negative (15.96%) rates. Compared to its CBAM-free baseline, the model exhibits notable improvements in F1-score and interpretability, as confirmed by Grad-CAM++ visualizations highlighting attention to semantically meaningful fire features. These results demonstrate that effective early fire detection is inseparable from high-quality, context-specific datasets. Our work introduces a scalable, safety-driven approach that advances the development of reliable, interpretable, and deployment-ready fire detection systems for residential environments.
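As a concrete illustration of the architecture named above, the sketch below wires torchvision's MC3_18 backbone to a channel-attention block in the spirit of CBAM; the attention placement, clip size, and two-class head are illustrative assumptions, not the authors' exact design.

```python
# Hedged sketch: a binary fire/non-fire video classifier built from
# torchvision's MC3_18 backbone plus a channel-attention block (one
# half of CBAM). Layer placement and sizes are illustrative only.
import torch
import torch.nn as nn
from torchvision.models.video import mc3_18

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):  # x: (N, C, T, H, W)
        avg = self.mlp(x.mean(dim=(2, 3, 4)))   # average-pooled descriptor
        mx = self.mlp(x.amax(dim=(2, 3, 4)))    # max-pooled descriptor
        scale = torch.sigmoid(avg + mx).view(x.size(0), -1, 1, 1, 1)
        return x * scale

class FireClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = mc3_18(weights=None)
        # Keep stem + residual stages, drop the original pool and fc head.
        self.features = nn.Sequential(*list(backbone.children())[:-2])
        self.attn = ChannelAttention(512)
        self.pool = nn.AdaptiveAvgPool3d(1)
        self.head = nn.Linear(512, 2)  # fire / non-fire

    def forward(self, clips):  # clips: (N, 3, T, H, W)
        f = self.attn(self.features(clips))
        return self.head(self.pool(f).flatten(1))

model = FireClassifier()
logits = model(torch.randn(2, 3, 16, 112, 112))  # two 16-frame clips
```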

21 pages, 1115 KiB  
Article
Non-Contact Oxygen Saturation Estimation Using Deep Learning Ensemble Models and Bayesian Optimization
by Andrés Escobedo-Gordillo, Jorge Brieva and Ernesto Moya-Albor
Technologies 2025, 13(7), 309; https://doi.org/10.3390/technologies13070309 - 19 Jul 2025
Viewed by 385
Abstract
Peripheral Oxygen Saturation (SpO2) is an important vital sign to monitor in Intensive Care Units (ICUs), during surgery and convalescence, and as part of remote medical consultations following the COVID-19 pandemic. This has made the development of new SpO2-measurement tools an area of active research and opportunity. In this paper, we present a new Deep Learning (DL) combined strategy to estimate SpO2 without contact, using pre-magnified facial videos to reveal subtle color changes related to blood flow, with no per-subject calibration required. We applied the Eulerian Video Magnification technique using the Hermite Transform (EVM-HT) as a feature detector to feed a Three-Dimensional Convolutional Neural Network (3D-CNN). Additionally, Bayesian optimization of parameters and hyperparameters and an ensemble technique over the magnified dataset were applied. We tested the method on 18 healthy subjects, acquiring facial videos along with automatic detection of the reference from a contact pulse oximeter device. As performance metrics for the SpO2-estimation proposal, we calculated the Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and other parameters from the Bland–Altman (BA) analysis with respect to the reference. A significant improvement was observed when the ensemble technique was added on top of optimization alone: 14.32% in RMSE (a reduction from 0.6204 to 0.5315) and 13.23% in MAE (a reduction from 0.4323 to 0.3751). Regarding the Bland–Altman analysis, the upper and lower limits of agreement for the Mean of Differences (MOD) between the estimation and the ground truth were 1.04 and −1.05, with an MOD (bias) of −0.00175; therefore, MOD ± 1.96σ = −0.00175 ± 1.04. Thus, by leveraging Bayesian optimization for hyperparameter tuning and integrating a Bagging Ensemble, we achieved a significant reduction in training error (bias), better generalization over the test set, and lower variance in comparison with the baseline model for SpO2 estimation.
(This article belongs to the Section Assistive Technologies)
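The variance-reduction step described above can be pictured as a small bagging ensemble of 3D-CNN regressors whose predictions are averaged; the member architecture, ensemble size, and clip shape below are assumptions for illustration only.

```python
# Hedged sketch: bagging over K small 3D-CNN regressors for SpO2
# estimation from magnified face-video clips; predictions are averaged.
# Network shape, K, and clip size are illustrative assumptions.
import torch
import torch.nn as nn

def make_regressor():
    return nn.Sequential(
        nn.Conv3d(3, 16, 3, padding=1), nn.ReLU(),
        nn.MaxPool3d(2),
        nn.Conv3d(16, 32, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool3d(1), nn.Flatten(),
        nn.Linear(32, 1),  # scalar SpO2 estimate
    )

class BaggedSpO2(nn.Module):
    def __init__(self, k=5):
        super().__init__()
        # Each member would be trained on its own bootstrap resample.
        self.members = nn.ModuleList(make_regressor() for _ in range(k))

    def forward(self, clips):  # clips: (N, 3, T, H, W)
        preds = torch.stack([m(clips) for m in self.members])
        return preds.mean(dim=0)  # variance-reducing average

spo2 = BaggedSpO2()(torch.randn(4, 3, 32, 64, 64))
```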

21 pages, 9571 KiB  
Article
Performance Evaluation of Real-Time Image-Based Heat Release Rate Prediction Model Using Deep Learning and Image Processing Methods
by Joohyung Roh, Sehong Min and Minsuk Kong
Fire 2025, 8(7), 283; https://doi.org/10.3390/fire8070283 - 18 Jul 2025
Viewed by 518
Abstract
Heat release rate (HRR) is a key indicator for characterizing fire behavior, and it is conventionally measured under laboratory conditions. However, this measurement is limited in its widespread application to various fire conditions, due to its high cost, operational complexity, and lack of real-time predictive capability. Therefore, this study proposes an image-based HRR prediction model that uses deep learning and image processing techniques. The flame region in a fire video was segmented using the YOLO-YCbCr model, which integrates YCbCr color-space-based segmentation with YOLO object detection. For comparative analysis, the YOLO segmentation model was used. Furthermore, the fire diameter and flame height were determined from the spatial information of the segmented flame, and the HRR was predicted based on the correlation between flame size and HRR. The proposed models were applied to various experimental fire videos, and their prediction performances were quantitatively assessed. The results indicated that the proposed models accurately captured the HRR variations over time, and applying the average flame height calculation enhanced the prediction performance by reducing fluctuations in the predicted HRR. These findings demonstrate that the image-based HRR prediction model can be used to estimate real-time HRR values in diverse fire environments.
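The abstract does not state which flame-size-to-HRR correlation is used, but a standard fire-engineering choice is Heskestad's flame-height relation, L = 0.235·Q^(2/5) − 1.02·D (L and D in m, Q in kW), which can be inverted to estimate HRR from measured flame geometry; the sketch below uses that relation purely as an illustration, not as the paper's actual correlation.

```python
# Hedged sketch: estimating HRR from measured flame geometry by
# inverting Heskestad's flame-height correlation
#   L = 0.235 * Q**(2/5) - 1.02 * D
# (L, D in metres; Q in kW), a standard relation used here only as
# a stand-in for the paper's unspecified correlation.
def hrr_from_flame(height_m: float, diameter_m: float) -> float:
    q_two_fifths = (height_m + 1.02 * diameter_m) / 0.235
    return q_two_fifths ** 2.5  # kW

# ~143 kW for a 1.2 m flame over a 0.5 m base
print(hrr_from_flame(height_m=1.2, diameter_m=0.5))
```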

27 pages, 6541 KiB  
Article
Multi-Object-Based Efficient Traffic Signal Optimization Framework via Traffic Flow Analysis and Intensity Estimation Using UCB-MRL-CSFL
by Zainab Saadoon Naser, Hend Marouane and Ahmed Fakhfakh
Vehicles 2025, 7(3), 72; https://doi.org/10.3390/vehicles7030072 - 11 Jul 2025
Viewed by 437
Abstract
Traffic congestion has increased significantly in today's rapidly urbanizing world, influencing people's daily lives. Traffic signal control systems (TSCSs) play an important role in alleviating congestion by optimizing traffic light timings and improving road efficiency. Yet traditional TSCSs neglect pedestrians, cyclists, and other non-monitored road users, degrading traffic signal optimization (TSO). Therefore, this framework proposes a multi-object-based traffic flow analysis and intensity estimation model for efficient TSO using Upper Confidence Bound Multi-agent Reinforcement Learning Cubic Spline Fuzzy Logic (UCB-MRL-CSFL). Initially, the real-time traffic videos undergo frame conversion and redundant frame removal, followed by preprocessing. Then, the lanes are detected and the objects are detected using Temporal Context You Only Look Once (TC-YOLO). Object counting in each lane is carried out using the Cumulative Vehicle Motion Kalman Filter (CVMKF), followed by queue detection using Vehicle Density Mapping (VDM). Next, the traffic flow is analyzed by Feature Variant Optical Flow (FVOF), followed by traffic intensity estimation. Emergency vehicles are then separated based on the colors of their siren lights. Lastly, UCB-MRL-CSFL optimizes the Traffic Signals (TSs) based on the separated emergency vehicles, pedestrian information, and traffic intensity. The proposed framework thus outperforms conventional TSO methodologies by accounting for pedestrians, cyclists, and other road users, achieving higher computational efficiency (94.45%).
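As a minimal picture of the exploration rule underlying the UCB-based controller, the sketch below runs a plain UCB1 bandit over candidate signal phases; the phase set and the queue-length-style reward are stand-in assumptions, not the authors' formulation.

```python
# Hedged sketch: a plain UCB1 bandit choosing among signal phases,
# the exploration rule underlying UCB-based signal control. The
# reward design here is a stand-in, not the paper's formulation.
import math
import random

phases = ["NS_green", "EW_green", "pedestrian"]
counts = {p: 0 for p in phases}
values = {p: 0.0 for p in phases}

def choose_phase(t: int) -> str:
    for p in phases:                     # play each arm once first
        if counts[p] == 0:
            return p
    return max(phases, key=lambda p: values[p]
               + math.sqrt(2 * math.log(t) / counts[p]))

def update(p: str, reward: float) -> None:
    counts[p] += 1
    values[p] += (reward - values[p]) / counts[p]  # running mean

for t in range(1, 101):
    p = choose_phase(t)
    update(p, reward=-random.uniform(0, 10))  # e.g. negative queue length
```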

22 pages, 7735 KiB  
Article
Visual Perception of Peripheral Screen Elements: The Impact of Text and Background Colors
by Snježana Ivančić Valenko, Marko Čačić, Ivana Žiljak Stanimirović and Anja Zorko
Appl. Sci. 2025, 15(14), 7636; https://doi.org/10.3390/app15147636 - 8 Jul 2025
Viewed by 383
Abstract
Visual perception of screen elements depends on their color, font, and position in the user interface design. Objects in the central part of the screen are perceived more easily than those in the peripheral areas. However, the peripheral space is valuable for applications like advertising and promotion and should not be overlooked. Optimizing the design of elements in this area can improve user attention to peripheral visual stimuli during focused tasks. This study aims to evaluate how different combinations of text and background color affect the visibility of moving textual stimuli in the peripheral areas of the screen, while attention is focused on a central task. This study investigates how background color, combined with white or black text, affects the attention of participants. It also identifies which background color makes a specific word most noticeable in the peripheral part of the screen. We designed quizzes to present stimuli with black or white text on various background colors in the peripheral regions of the screen. The background colors tested were blue, red, yellow, green, white, and black. While saturation and brightness were kept constant, the color tone was varied. Among ten combinations of background and text color, we aimed to determine the most noticeable combination in the peripheral part of the screen. The combination of white text on a blue background resulted in the shortest detection time (1.376 s), while black text on a white background achieved the highest accuracy rate at 79%. The results offer valuable insights for improving peripheral text visibility in user interfaces across various visual communication domains such as video games, television content, and websites, where peripheral information must remain noticeable despite centrally focused user attention and complex viewing conditions.

23 pages, 3946 KiB  
Article
The Impact of Color Blindness on Player Engagement and Emotional Experiences: A Multimodal Study in a Game-Based Environment
by Merve Tillem and Ahmet Gün
Multimodal Technol. Interact. 2025, 9(6), 62; https://doi.org/10.3390/mti9060062 - 13 Jun 2025
Viewed by 589
Abstract
Color blindness can create challenges in recognizing visual cues, potentially affecting players' performance, emotional involvement, and overall gaming experience. This study examines the impact of color blindness on player engagement and emotional experiences in digital games. The research aims to analyze how color-blind individuals engage with and emotionally respond to games, offering insights into more inclusive and accessible game design. An experiment-based study was conducted using a between-group design with a total of 13 participants, including 5 color-blind and 8 non-color-blind participants (aged 18–30). The sample was carefully selected to ensure participants had similar levels of digital gaming experience and familiarity with digital games, reducing potential biases related to skill or prior exposure. A custom-designed game, "Color Quest," was developed to assess engagement and emotional responses. Emotional responses were measured through Emotion AI analysis, video recordings, and self-reported feedback forms. Participants were also asked to rate their engagement and emotional experience on a 1 to 5 scale, with additional qualitative feedback collected for deeper insights. The results indicate that color-blind players generally reported lower engagement levels compared to non-color-blind players. Although the quantitative data did not reveal a direct correlation between color blindness and visual experience, self-reported feedback suggests that color-related design choices negatively impact emotional involvement and player immersion. Furthermore, in the survey responses, color-blind individuals rated their experiences lower than individuals with normal vision. Participants emphasized that certain visual elements created difficulties in gameplay and that alternative sensory cues, such as audio feedback, helped mitigate these challenges. This study presents an experimental evaluation of color blindness in gaming, emphasizing how sensory adaptation strategies can support player engagement and emotional experience, and it contributes to game accessibility research by highlighting the importance of perceptual diversity and inclusive sensory design in enhancing player engagement for color-blind individuals.

17 pages, 3120 KiB  
Article
LAAVOS: A DeAOT-Based Approach for Medaka Larval Ventricular Video Segmentation
by Kai Rao, Minghao Wang and Shutan Xu
Appl. Sci. 2025, 15(12), 6537; https://doi.org/10.3390/app15126537 - 10 Jun 2025
Viewed by 429
Abstract
Accurate segmentation of the ventricular region in embryonic heart videos of medaka fish (Oryzias latipes) holds significant scientific value for research on heart development mechanisms. However, existing medaka ventricular datasets are overly simplistic and fail to meet practical application requirements. Moreover, the video frames contain multiple complex interfering factors, including optical interference from the filming environment, dynamic color changes caused by blood flow, significant diversity in ventricular scales, image blurring in certain video frames, high similarity in organ structures, and indistinct boundaries between the ventricles and atria. These challenges mean existing methods still face notable technical difficulties in medaka embryonic ventricular segmentation tasks. To address these challenges, this study first constructs a medaka embryonic ventricular video dataset containing 4200 frames with pixel-level annotations. Building upon this, we propose a semi-supervised video segmentation model based on the hierarchical propagation feature decoupling framework (DeAOT) and innovatively design an architecture that combines the LA-ResNet encoder with the AFPViS decoder, significantly improving the accuracy of medaka ventricular segmentation. Experimental results demonstrate that, compared to the traditional U-Net model, our method achieves a 13.48% improvement in the mean Intersection over Union (mIoU) metric. Additionally, compared to the state-of-the-art DeAOT method, it achieves a notable 4.83% enhancement in the comprehensive evaluation metric Jaccard and F-measure (J&F), providing reliable technical support for research on embryonic heart development.
(This article belongs to the Special Issue Pattern Recognition in Video Processing)
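For reference, the two quoted metrics can be computed per frame as below; the F-measure here is a pixel-level stand-in for the boundary F used in video-segmentation benchmarks, so this is a simplified illustration rather than the benchmark's exact protocol.

```python
# Hedged sketch: region similarity J (IoU) and a pixel-level F-measure
# on binary masks, averaged into a J&F-style score. Benchmarks compute
# F on boundaries; this simplification uses whole-mask pixels.
import numpy as np

def jaccard(pred: np.ndarray, gt: np.ndarray) -> float:
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 1.0

def f_measure(pred: np.ndarray, gt: np.ndarray) -> float:
    tp = np.logical_and(pred, gt).sum()
    prec = tp / pred.sum() if pred.sum() else 0.0
    rec = tp / gt.sum() if gt.sum() else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

pred = np.random.rand(256, 256) > 0.5   # placeholder predicted mask
gt = np.random.rand(256, 256) > 0.5     # placeholder ground truth
jf = (jaccard(pred, gt) + f_measure(pred, gt)) / 2  # J&F-style average
```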

10 pages, 1365 KiB  
Article
Elastographic Histogram Analysis as a Non-Invasive Tool for Detecting Early Intestinal Remodeling in Experimental IBD
by Rareș Crăciun, Marcel Tanțău and Cristian Tefas
J. Clin. Med. 2025, 14(11), 3992; https://doi.org/10.3390/jcm14113992 - 5 Jun 2025
Viewed by 456
Abstract
Background/Objectives: Inflammatory bowel disease (IBD), encompassing Crohn's disease and ulcerative colitis, is characterized by cycles of inflammation and tissue remodeling that can culminate in fibrosis. Differentiating between early inflammatory and fibrotic bowel wall changes remains a diagnostic challenge due to overlapping imaging features. This study aimed to assess the potential of elastography, specifically pixel histogram analysis, as a non-invasive method to identify acute inflammatory changes in a rat model of 2,4,6-trinitrobenzenesulfonic acid (TNBS)-induced colitis. Methods: Female CRL:Wi rats were randomized into control and experimental groups, with the latter receiving intracolonic TNBS to induce acute colitis. On day 7 post-induction, all animals underwent ultrasonographic and strain elastographic assessment of the distal colon using a standardized protocol. Histogram-based analysis of red, green, and blue pixel distributions was performed on elastographic video frames. Results were compared with histologic grading of inflammation and fibrosis using hematoxylin-eosin and Masson's trichrome staining. Results: Rats with TNBS-induced colitis exhibited significant weight loss, increased bowel wall thickness (31.5% vs. controls, p < 0.01), and elevated elastographic pixel intensity across all color channels (p < 0.05). Histologically, experimental animals showed severe inflammation and early submucosal fibrosis. A strong positive correlation was found between elastographic histogram values and histologic fibrosis scores (r = 0.86, p < 0.01), confirming the technique's diagnostic relevance. Conclusions: Elastographic pixel histogram analysis is a reproducible, non-invasive approach capable of distinguishing acute inflammatory changes and early fibrotic remodeling in experimental colitis. These findings support its potential application as a diagnostic adjunct in the early assessment and monitoring of IBD-related bowel wall changes.
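A minimal sketch of the per-channel pixel-histogram analysis described, assuming an RGB elastogram frame and an arbitrary bin count; the study's exact histogram parameters are not stated in the abstract.

```python
# Hedged sketch: per-channel pixel-histogram statistics from an
# elastography frame, the kind of red/green/blue distribution summary
# the study correlates with histologic scores (bin count assumed).
import numpy as np

def channel_histograms(frame: np.ndarray, bins: int = 32):
    """frame: (H, W, 3) uint8 RGB elastogram."""
    stats = {}
    for i, name in enumerate(("red", "green", "blue")):
        chan = frame[..., i].ravel()
        hist, _ = np.histogram(chan, bins=bins, range=(0, 255))
        stats[name] = {"hist": hist, "mean": chan.mean(), "std": chan.std()}
    return stats

frame = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
print({k: round(v["mean"], 1) for k, v in channel_histograms(frame).items()})
```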

29 pages, 6716 KiB  
Article
Mitigating Transmission Errors: A Forward Error Correction-Based Framework for Enhancing Objective Video Quality
by Muhammad Babar Imtiaz and Rabia Kamran
Sensors 2025, 25(11), 3503; https://doi.org/10.3390/s25113503 - 1 Jun 2025
Viewed by 776
Abstract
In video transmission, maintaining high visual quality under variable network conditions, including bandwidth and efficiency, is essential for an optimal viewer experience. Channel errors or malicious attacks during transmission can degrade video quality, affecting its secure transmission and putting its confidentiality and integrity at risk. This paper presents a novel approach to enhancing objective video quality by integrating an energy-efficient forward error correction (FEC) technique into the video encoding and transmission processes. Moreover, it ensures that the video contents remain secure and unintelligible to unauthorized parties. This is achieved by combining H.264/AVC syntax-based encryption and decryption algorithms with error correction during the video coding process to provide end-to-end confidentiality. Unlike traditional error correction strategies, our approach dynamically adjusts redundancy levels based on real-time network conditions, optimizing bandwidth utilization without compromising quality. The proposed framework is evaluated across full-reference objective video quality metrics, demonstrating significant improvements in the peak signal-to-noise ratio (PSNR) and PSNR611 of the recovered videos. Experiments carried out on multiple test video sequences with different resolutions and varied characteristics (colors, motion, and structure) confirm that the FEC-based solution effectively detects and corrects packet loss and transmission errors without the need for retransmission, reducing the impact of channel noise and accidental disruptions on visual quality in challenging network environments. This study contributes to the development of resilient video transmission systems with reduced codec computational complexity and provides insights into the role of FEC in addressing quality degradation in modern multimedia applications where low latency is crucial.
(This article belongs to the Section Sensor Networks)
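One simple way to realize the dynamic redundancy adjustment described is to size each block's parity from the currently measured loss rate; the block size, safety margin, and loss rates below are illustrative assumptions, not the paper's scheme.

```python
# Hedged sketch: picking a per-block FEC redundancy level from the
# currently measured packet-loss rate, the kind of dynamic adaptation
# the framework describes. All constants are illustrative assumptions.
def fec_parity_packets(data_packets: int, loss_rate: float) -> int:
    """Return a parity count covering expected losses with a margin."""
    expected_losses = data_packets * loss_rate
    margin = 2.0  # safety factor; an assumption, not from the paper
    return max(1, int(round(expected_losses * margin)))

for loss in (0.01, 0.05, 0.15):
    k = 20                                  # data packets per block
    r = fec_parity_packets(k, loss)
    print(f"loss={loss:.0%}: send {k}+{r} packets "
          f"({r / (k + r):.0%} overhead)")
```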

22 pages, 20735 KiB  
Article
High-Throughput ORB Feature Extraction on Zynq SoC for Real-Time Structure-from-Motion Pipelines
by Panteleimon Stamatakis and John Vourvoulakis
J. Imaging 2025, 11(6), 178; https://doi.org/10.3390/jimaging11060178 - 28 May 2025
Viewed by 615
Abstract
This paper presents a real-time system for feature detection and description, the first stage in a structure-from-motion (SfM) pipeline. The proposed system leverages an optimized version of the ORB algorithm (oriented FAST and rotated BRIEF) implemented on the Digilent Zybo Z7020 FPGA board equipped with the Xilinx Zynq-7000 SoC. The system accepts real-time video input (60 fps, 1920 × 1080 resolution, 24-bit color) via HDMI or a camera module. In order to support high frame rates for full-HD images, a double-data-rate pipeline scheme was adopted for the Harris functions. Gray-scale video with features identified in red is exported through a separate HDMI port. Feature descriptors are calculated inside the FPGA by Zynq's programmable logic and verified using Xilinx's ILA IP block on a connected computer running Vivado. The implemented system achieves a latency of 192.7 microseconds, which is suitable for real-time applications. The proposed architecture is evaluated in terms of repeatability, matching retention, and matching accuracy under several image transformations, and it achieves satisfactory accuracy and performance given that changes between successive frames are slight. This work paves the way for future research on the implementation of the remaining stages of a real-time SfM pipeline on the proposed hardware platform.
(This article belongs to the Special Issue Recent Techniques in Image Feature Extraction)
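A software reference for the same detect-and-describe stage can be built with OpenCV's ORB, which is useful for cross-checking an FPGA implementation's keypoints; this models only the algorithmic output, not the hardware pipeline, and the file names are hypothetical.

```python
# Hedged sketch: a software reference of the ORB detect-and-describe
# stage using OpenCV, for comparing against hardware output. It does
# not model the double-data-rate FPGA design. File names are made up.
import cv2

orb = cv2.ORB_create(nfeatures=1000)
frame = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input
assert frame is not None, "frame.png not found"

keypoints, descriptors = orb.detectAndCompute(frame, None)
vis = cv2.drawKeypoints(frame, keypoints,
                        None, color=(0, 0, 255))       # features in red
cv2.imwrite("features.png", vis)
```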

15 pages, 671 KiB  
Article
A Simultaneous Decomposition for a Quaternion Tensor Quaternity with Applications
by Jia-Wei Huo, Yun-Ze Xu and Zhuo-Heng He
Mathematics 2025, 13(10), 1679; https://doi.org/10.3390/math13101679 - 20 May 2025
Viewed by 306
Abstract
Quaternion tensor decompositions have recently attracted considerable attention due to their wide potential applications in color data processing. In this paper, we establish a simultaneous decomposition for a quaternion tensor quaternity under the Einstein product. The decomposition brings the quaternity of four quaternion tensors into a canonical form, which has only 0 and 1 entries. The structure of the canonical form is discussed in detail. Moreover, the proposed decomposition is applied to a new framework of color video encryption and decryption based on the discrete wavelet transform. This new approach can realize simultaneous encryption and compression with high security.
(This article belongs to the Special Issue Advanced Numerical Linear Algebra)
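The Einstein product under which the decomposition is stated contracts a fixed number of trailing modes of one tensor with the leading modes of the next; for real tensors it reduces to a tensordot, as sketched below (quaternion entries would need four such real arrays, one per component).

```python
# Hedged sketch: the Einstein product *_N, contracting the last N modes
# of A with the first N modes of B, shown for real tensors. Quaternion
# tensors would carry four real component arrays through the same rule.
import numpy as np

def einstein_product(A: np.ndarray, B: np.ndarray, n: int) -> np.ndarray:
    """Contract the last n modes of A with the first n modes of B."""
    return np.tensordot(A, B, axes=n)

A = np.random.rand(2, 3, 4, 5)   # I1 x I2 x K1 x K2
B = np.random.rand(4, 5, 6, 7)   # K1 x K2 x J1 x J2
C = einstein_product(A, B, 2)    # shape (2, 3, 6, 7)
print(C.shape)
```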

26 pages, 11273 KiB  
Article
DREFNet: Deep Residual Enhanced Feature GAN for VVC Compressed Video Quality Improvement
by Tanni Das and Kiho Choi
Mathematics 2025, 13(10), 1609; https://doi.org/10.3390/math13101609 - 14 May 2025
Viewed by 436
Abstract
In recent years, the use of video content has grown exponentially, leading to an increased reliance on various video codecs for efficient compression and transmission. However, several challenges are associated with codecs such as H.265/High Efficiency Video Coding and H.266/Versatile Video Coding (VVC) that can impact video quality and performance. One significant challenge is the trade-off between compression efficiency and visual quality. While advanced codecs can significantly reduce file sizes, they introduce artifacts such as blocking, blurring, and color distortion, particularly in high-motion scenes. Different compression tools in modern video codecs are vital for minimizing artifacts that arise during the encoding and decoding processes, but even these advanced algorithms frequently find it challenging to eliminate artifacts entirely. Post-processing applied after the initial decoding can significantly improve visual clarity and restore details that may have been compromised during compression. In this paper, we introduce a Deep Residual Enhanced Feature Generative Adversarial Network as a post-processing method aimed at further improving the quality of reconstructed frames from the advanced codec VVC. By utilizing the benefits of Deep Residual Blocks and Enhanced Feature Blocks, the generator network aims to make the reconstructed frame as similar as possible to the original frame. The discriminator network, a crucial element of our proposed method, guides the generator by evaluating the authenticity of generated frames: by distinguishing between fake and original frames, it enables the generator to improve the quality of its output. This feedback mechanism ensures that the generator learns to create more realistic frames, ultimately enhancing the overall performance of the model. The proposed method shows significant gains for Random Access (RA) and All Intra (AI) configurations while improving Video Multimethod Assessment Fusion (VMAF) and the Multi-Scale Structural Similarity Index Measure (MS-SSIM). Considering VMAF, our proposed method obtains 13.05% and 11.09% Bjøntegaard Delta Rate (BD-Rate) gains for the RA and AI configurations, respectively. For the luma-component MS-SSIM, the RA and AI configurations achieve 5.00% and 5.87% BD-Rate gains, respectively, with the proposed network.
(This article belongs to the Special Issue Intelligent Computing with Applications in Computer Vision)
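A minimal sketch of the residual post-processing idea: a generator predicts a correction that is added back to the decoded frame. Block count and width are assumptions, and the Enhanced Feature Blocks and discriminator are omitted, so this is not DREFNet's actual configuration.

```python
# Hedged sketch: a minimal residual post-processing generator of the
# kind used to enhance decoded frames. Widths and depths are
# illustrative; DREFNet's real architecture differs.
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)  # residual learning

class Enhancer(nn.Module):
    def __init__(self, blocks=8):
        super().__init__()
        self.head = nn.Conv2d(3, 64, 3, padding=1)
        self.body = nn.Sequential(*[ResBlock() for _ in range(blocks)])
        self.tail = nn.Conv2d(64, 3, 3, padding=1)

    def forward(self, decoded):
        # Predict a correction and add it back to the decoded frame.
        return decoded + self.tail(self.body(self.head(decoded)))

out = Enhancer()(torch.randn(1, 3, 256, 256))
```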

18 pages, 3964 KiB  
Article
Region-of-Interest Extraction Method to Increase Object-Detection Performance in Remote Monitoring System
by Hyeong-GI Jeon and Kyoung-Hee Lee
Appl. Sci. 2025, 15(10), 5328; https://doi.org/10.3390/app15105328 - 10 May 2025
Viewed by 546
Abstract
This study proposes an image data preprocessing method to improve the efficiency of transmitting and processing images for object detection in distributed IoT systems such as digital CCTV. The proposed method prepares a background image using Gaussian Mixture-based modeling with a series of continuous images in the video. This background image is then used as the reference against which a target image is compared to extract ROIs with our DSSIM-based area-filtering algorithm. The background areas outside the ROIs in the image are filled with a single color: either black or white to reduce data size, or a highly saturated color to improve object detection performance. Our implementation results confirm that the proposed method can considerably reduce the network overhead and the processing time at the server side. From additional experiments, we found that the model's inference time and accuracy for object detection can be significantly improved when our two new ideas are applied: expanding ROI areas to improve the objectness of each object in the image and filling the background with a highly saturated color.
(This article belongs to the Special Issue Communication Technology for Smart Mobility Systems)
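A rough sketch of the preprocessing idea, assuming OpenCV's Gaussian-mixture background subtractor as the background model; the DSSIM-based area filtering is approximated here by the subtractor's foreground mask, and the video path is hypothetical.

```python
# Hedged sketch: Gaussian-mixture background modeling, ROI expansion,
# and flat-color background filling, mirroring the preprocessing idea.
# The paper's DSSIM area filtering is approximated by the MOG2 mask.
import cv2
import numpy as np

subtractor = cv2.createBackgroundSubtractorMOG2(history=200)
cap = cv2.VideoCapture("cctv.mp4")  # hypothetical input video

while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)              # foreground = candidate ROI
    mask = cv2.dilate(mask, np.ones((15, 15), np.uint8))  # expand ROIs
    out = np.full_like(frame, (255, 0, 255))    # saturated fill color
    out[mask > 0] = frame[mask > 0]             # keep only ROI pixels
cap.release()
```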

19 pages, 5898 KiB  
Article
Multi-Module Combination for Underwater Image Enhancement
by Zhe Jiang, Huanhuan Wang, Gang He, Jiawang Chen, Wei Feng and Gaosheng Luo
Appl. Sci. 2025, 15(9), 5200; https://doi.org/10.3390/app15095200 - 7 May 2025
Viewed by 501
Abstract
Underwater observation and operation by divers and underwater robots still largely depend on optical methods such as cameras and video. However, due to the poor quality of images captured in murky waters, underwater operations in such areas are greatly hindered. To address degraded images, this paper proposes a multi-module combination method (UMMC) for underwater image enhancement, a new solution for processing a single image. With five separate modules working in tandem, UMMC provides the flexibility to address key challenges such as color distortion, haze, and low contrast. The UMMC framework starts with a color deviation detection module that intelligently separates images with and without color deviation, followed by a color and white balance correction module to restore accurate color. Effective defogging is then performed using a rank-one prior matrix-based approach, while a reference curve transformation adaptively enhances the contrast. Finally, the fusion module combines the visibility and contrast functions with reference to two weights to produce clear and natural results. Extensive experimental results demonstrate the effectiveness of the proposed method, which performs well compared to existing algorithms on both real and synthetic data.
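As one classical realization of the white-balance correction module mentioned above, the sketch below applies a gray-world correction; the paper's actual correction method may differ.

```python
# Hedged sketch: gray-world white balance, a classical stand-in for
# the color/white-balance correction module described in the paper.
import numpy as np

def gray_world(img: np.ndarray) -> np.ndarray:
    """img: float32 RGB in [0, 1]; scale channels to a common mean."""
    means = img.reshape(-1, 3).mean(axis=0)       # per-channel means
    gain = means.mean() / np.maximum(means, 1e-6) # equalize channels
    return np.clip(img * gain, 0.0, 1.0)

img = np.random.rand(480, 640, 3).astype(np.float32)
balanced = gray_world(img)
```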

15 pages, 3719 KiB  
Article
Enhancing Human Pose Transfer with Convolutional Block Attention Module and Facial Loss Optimization
by Hsu-Yung Cheng, Chun-Chen Chiang, Chi-Lun Jiang and Chih-Chang Yu
Electronics 2025, 14(9), 1855; https://doi.org/10.3390/electronics14091855 - 1 May 2025
Viewed by 519
Abstract
Pose transfer methods often struggle to simultaneously preserve fine-grained clothing textures and facial details, especially under large pose variations. To address these limitations, we propose a model based on the Multi-scale Attention Guided Pose Transfer (MAGPT) model, modifying its residual block by integrating the convolutional block attention module and changing the activation function from ReLU to Mish to capture more features related to clothing and skin color. Additionally, as the generated images had facial features differing from the original image, we propose two different facial feature loss functions to help the model learn more precise image features. According to the experimental results, the proposed method demonstrates superior performance compared to MAGPT on the DeepFashion dataset, achieving a 3.41% reduction in FID, a 0.65% improvement in SSIM, a 2% decrease in LPIPS, and a 2.7% decrease in LPIPS. Ultimately, the proposed system architecture requires only a single reference image to enable users to transform into different action videos.
(This article belongs to the Special Issue Machine Learning Techniques for Image Processing)
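The abstract does not define the two facial feature losses, so the sketch below shows only the general shape such a loss could take: crop the face region from generated and target images and penalize their difference. The L1 form and the face box are purely illustrative assumptions, not the authors' loss functions.

```python
# Hedged sketch: a facial-region reconstruction loss that crops the
# face box from generated and target images and compares them with L1.
# The loss form and box coordinates are illustrative assumptions only.
import torch
import torch.nn.functional as F

def face_loss(generated, target, box):
    """box = (top, left, height, width) of the face region."""
    t, l, h, w = box
    gen_face = generated[..., t:t + h, l:l + w]
    tgt_face = target[..., t:t + h, l:l + w]
    return F.l1_loss(gen_face, tgt_face)

gen = torch.rand(1, 3, 256, 176)
tgt = torch.rand(1, 3, 256, 176)
loss = face_loss(gen, tgt, box=(20, 60, 64, 56))
```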
