Search Results (539)

Search Parameters:
Keywords = perceptual features

23 pages, 6315 KiB  
Article
A Kansei-Oriented Morphological Design Method for Industrial Cleaning Robots Integrating Extenics-Based Semantic Quantification and Eye-Tracking Analysis
by Qingchen Li, Yiqian Zhao, Yajun Li and Tianyu Wu
Appl. Sci. 2025, 15(15), 8459; https://doi.org/10.3390/app15158459 - 30 Jul 2025
Abstract
In the context of Industry 4.0, user demands for industrial robots have shifted toward diversification and experience-orientation. Effectively integrating users’ affective imagery requirements into industrial-robot form design remains a critical challenge. Traditional methods rely heavily on designers’ subjective judgments and lack objective data on user cognition. To address these limitations, this study develops a comprehensive methodology grounded in Kansei engineering that combines Extenics-based semantic analysis, eye-tracking experiments, and user imagery evaluation. First, we used web crawlers to harvest user-generated descriptors for industrial floor-cleaning robots and applied Extenics theory to quantify and filter key perceptual imagery features. Second, eye-tracking experiments captured users’ visual-attention patterns during robot observation, allowing us to identify pivotal design elements and assemble a sample repository. Finally, the semantic differential method collected users’ evaluations of these design elements, and correlation analysis mapped emotional needs onto stylistic features. Our findings reveal strong positive correlations between four core imagery preferences—“dignified,” “technological,” “agile,” and “minimalist”—and their corresponding styling elements. By integrating qualitative semantic data with quantitative eye-tracking metrics, this research provides a scientific foundation and novel insights for emotion-driven design in industrial floor-cleaning robots.
(This article belongs to the Special Issue Intelligent Robotics in the Era of Industry 5.0)
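The correlation step described in the abstract above (mapping semantic-differential ratings of imagery words onto styling elements) reduces to an ordinary Pearson coefficient. A minimal sketch; the rating values below are invented for illustration and are not the study's data:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equally long rating series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical 5-point semantic-differential scores for one imagery word
# ("technological") against one styling element, across six raters.
imagery = [4, 5, 3, 4, 5, 2]
styling = [4, 5, 2, 4, 4, 2]
print(round(pearson_r(imagery, styling), 3))
```

A value near 1 would indicate the strong positive association the authors report between an imagery preference and its styling element.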

21 pages, 2267 KiB  
Article
Dual-Branch Network for Blind Quality Assessment of Stereoscopic Omnidirectional Images: A Spherical and Perceptual Feature Integration Approach
by Zhe Wang, Yi Liu and Yang Song
Electronics 2025, 14(15), 3035; https://doi.org/10.3390/electronics14153035 - 30 Jul 2025
Abstract
Stereoscopic omnidirectional images (SOIs) have gained significant attention for their immersive viewing experience by providing binocular depth with panoramic scenes. However, evaluating their visual quality remains challenging due to their unique spherical geometry, binocular disparity, and viewing conditions. To address these challenges, this paper proposes a dual-branch deep learning framework that integrates spherical structural features and perceptual binocular cues to assess the quality of SOIs without reference. Specifically, the global branch leverages spherical convolutions to capture wide-range spatial distortions, while the local branch utilizes a binocular difference module based on discrete wavelet transform to extract depth-aware perceptual information. A feature complementarity module is introduced to fuse global and local representations for final quality prediction. Experimental evaluations on two public SOIQA datasets—NBU-SOID and SOLID—demonstrate that the proposed method achieves state-of-the-art performance, with PLCC/SROCC values of 0.926/0.918 and 0.918/0.891, respectively. These results validate the effectiveness and robustness of our approach in stereoscopic omnidirectional image quality assessment tasks.
(This article belongs to the Special Issue AI in Signal and Image Processing)

24 pages, 1300 KiB  
Article
That Came as No Surprise! The Processing of Prosody–Grammar Associations in Danish First and Second Language Users
by Sabine Gosselke Berthelsen and Line Burholt Kristensen
Languages 2025, 10(8), 181; https://doi.org/10.3390/languages10080181 - 28 Jul 2025
Abstract
In some languages, prosodic cues on word stems can be used to predict upcoming suffixes. Previous studies have shown that second language (L2) users can process such cues predictively in their L2 from approximately intermediate proficiency. This ability may depend on the mapping of the L2 prosody onto first language (L1) perceptual and functional prosodic categories. Taking as an example the Danish stød, a complex prosodic cue, we investigate an acquisition context of a predictive cue where L2 users are unfamiliar with both its perceptual correlates and its functionality. This differs from previous studies on predictive prosodic cues in Swedish and Spanish, where L2 users were only unfamiliar with either the perceptual make-up or functionality of the cue. In a speeded number judgement task, L2 users of Danish with German as their L1 (N = 39) and L1 users of Danish (N = 40) listened to noun stems with a prosodic feature (stød or non-stød) that either matched or mismatched the inflectional suffix (singular vs. plural). While L1 users efficiently utilised stød predictively for rapid and accurate grammatical processing, L2 users showed no such behaviour. These findings underscore the importance of mapping between L1 and L2 prosodic categories in second language acquisition.

20 pages, 3386 KiB  
Article
Design of Realistic and Artistically Expressive 3D Facial Models for Film AIGC: A Cross-Modal Framework Integrating Audience Perception Evaluation
by Yihuan Tian, Xinyang Li, Zuling Cheng, Yang Huang and Tao Yu
Sensors 2025, 25(15), 4646; https://doi.org/10.3390/s25154646 - 26 Jul 2025
Abstract
The rise of virtual production has created an urgent need for both efficient and high-fidelity 3D face generation schemes for cinema and immersive media, but existing methods are often limited by lighting–geometry coupling, multi-view dependency, and insufficient artistic quality. To address this, this study proposes a cross-modal 3D face generation framework based on single-view semantic masks. It utilizes Swin Transformer for multi-level feature extraction and combines it with NeRF for illumination-decoupled rendering. We utilize physical rendering equations to explicitly separate surface reflectance from ambient lighting to achieve robust adaptation to complex lighting variations. In addition, to address geometric errors across illumination scenes, we construct geometric prior constraint networks by mapping 2D facial features to 3D parameter space as regularization terms with the help of semantic masks. On the CelebAMask-HQ dataset, this method achieves a leading score of SSIM = 0.892 (37.6% improvement from baseline) with FID = 40.6. The generated faces excel in symmetry and detail fidelity, with realism and aesthetic scores of 8/10 and 7/10, respectively, in a perceptual evaluation with 1000 viewers. By combining physical-level illumination decoupling with semantic geometric priors, this paper establishes a quantifiable feedback mechanism between objective metrics and human aesthetic evaluation, providing a new paradigm for aesthetic quality assessment of AI-generated content.
(This article belongs to the Special Issue Convolutional Neural Network Technology for 3D Imaging and Sensing)

10 pages, 678 KiB  
Article
Do Rare Genetic Conditions Exhibit a Specific Phonotype? A Comprehensive Description of the Vocal Traits Associated with Crisponi/Cold-Induced Sweating Syndrome Type 1
by Federico Calà, Elisabetta Sforza, Lucia D’Alatri, Lorenzo Frassineti, Claudia Manfredi, Roberta Onesimo, Donato Rigante, Marika Pane, Serenella Servidei, Guido Primiano, Giangiorgio Crisponi, Laura Crisponi, Chiara Leoni, Antonio Lanatà and Giuseppe Zampino
Genes 2025, 16(8), 881; https://doi.org/10.3390/genes16080881 - 26 Jul 2025
Abstract
Background: Perceptual analysis has highlighted that the voice characteristics of patients with rare congenital genetic syndromes differ from those of normophonic subjects. In this paper, we describe the voice phenotype, also called the phonotype, of patients with Crisponi/cold-induced sweating syndrome type 1 (CS/CISS1). Methods: We conducted an observational study at the Department of Life Sciences and Public Health, Rome. Thirteen patients were included in this study (five males; mean age: 16 years; SD: 10.63 years; median age: 12 years; age range: 6–44 years), and five were adults (38%). We prospectively recorded and analyzed acoustical features of three corner vowels [a], [i], and [u]. For perceptual analysis, the GIRBAS (grade, instability, roughness, breathiness, asthenia, and strain) scale was utilized. Acoustic analysis was performed through BioVoice software. Results: We found that CS/CISS1 patients share a common phonotype characterized by articulation disorders and hyper-rhinophonia. Conclusions: This study contributes to delineating the voice of CS/CISS1 syndrome. The phonotype can represent one of the earliest indicators for detecting rare congenital conditions, enabling specialists to reduce diagnosis time and better define a spectrum of rare and ultra-rare diseases.
(This article belongs to the Section Human Genomics and Genetic Diseases)

28 pages, 3794 KiB  
Article
A Robust System for Super-Resolution Imaging in Remote Sensing via Attention-Based Residual Learning
by Rogelio Reyes-Reyes, Yeredith G. Mora-Martinez, Beatriz P. Garcia-Salgado, Volodymyr Ponomaryov, Jose A. Almaraz-Damian, Clara Cruz-Ramos and Sergiy Sadovnychiy
Mathematics 2025, 13(15), 2400; https://doi.org/10.3390/math13152400 - 25 Jul 2025
Abstract
Deep learning-based super-resolution (SR) frameworks are widely used in remote sensing applications. However, existing SR models still face limitations, particularly in recovering contours, fine features, and textures, as well as in effectively integrating channel information. To address these challenges, this study introduces a novel residual model named OARN (Optimized Attention Residual Network) specifically designed to enhance the visual quality of low-resolution images. The network operates on the Y channel of the YCbCr color space and integrates LKA (Large Kernel Attention) and OCM (Optimized Convolutional Module) blocks. These components can restore large-scale spatial relationships and refine textures and contours, improving feature reconstruction without significantly increasing computational complexity. The performance of OARN was evaluated using satellite images from WorldView-2, GaoFen-2, and Microsoft Virtual Earth. Evaluation was conducted using objective quality metrics, such as Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index Measure (SSIM), Edge Preservation Index (EPI), and Learned Perceptual Image Patch Similarity (LPIPS), demonstrating superior results compared to state-of-the-art methods in both objective measurements and subjective visual perception. Moreover, OARN achieves this performance while maintaining computational efficiency, offering a balanced trade-off between processing time and reconstruction quality.

24 pages, 4226 KiB  
Article
Digital Signal Processing of the Inharmonic Complex Tone
by Tatjana Miljković, Jelena Ćertić, Miloš Bjelić and Dragana Šumarac Pavlović
Appl. Sci. 2025, 15(15), 8293; https://doi.org/10.3390/app15158293 - 25 Jul 2025
Abstract
In this paper, a set of digital signal processing (DSP) procedures tailored for the analysis of complex musical tones with prominent inharmonicity is presented. These procedures are implemented within a MATLAB-based application and organized into three submodules. The application follows a structured DSP chain: basic signal manipulation; spectral content analysis; estimation of the inharmonicity coefficient and the number of prominent partials; design of a dedicated filter bank; signal decomposition into subchannels; subchannel analysis and envelope extraction; and, finally, recombination of the subchannels into a wideband signal. Each stage in the chain is described in detail, and the overall process is demonstrated through representative examples. The concept and the accompanying application are initially intended for rapid post-processing of recorded signals, offering a tool for enhanced signal annotation. Additionally, the built-in features for subchannel manipulation and recombination enable the preparation of stimuli for perceptual listening tests. The procedures have been tested on a set of recorded tones from various string instruments, including those with pronounced inharmonicity, such as the piano, harp, and harpsichord.
(This article belongs to the Special Issue Musical Acoustics and Sound Perception)
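The inharmonicity coefficient estimated in the abstract above is conventionally defined by the stiff-string model f_n = n·f0·sqrt(1 + B·n²), in which partials drift progressively sharp of the harmonic series. A minimal sketch; the f0 and B values are assumed for illustration, not taken from the paper:

```python
from math import sqrt

def partial_frequencies(f0, b, n_partials):
    """Stiff-string partial frequencies: f_n = n * f0 * sqrt(1 + B * n^2),
    where B is the inharmonicity coefficient (B = 0 gives pure harmonics)."""
    return [n * f0 * sqrt(1 + b * n * n) for n in range(1, n_partials + 1)]

# A piano-like tone: f0 = 220 Hz, B = 4e-4 (assumed, illustrative values).
freqs = partial_frequencies(220.0, 4e-4, 8)
for n, f in enumerate(freqs, start=1):
    # partial index, frequency, sharpness relative to the harmonic n * f0
    print(n, round(f, 2), round(f - n * 220.0, 2))
```

Fitting measured partial frequencies against this model is one common way to estimate B from a recorded tone.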

25 pages, 6911 KiB  
Article
Image Inpainting Algorithm Based on Structure-Guided Generative Adversarial Network
by Li Zhao, Tongyang Zhu, Chuang Wang, Feng Tian and Hongge Yao
Mathematics 2025, 13(15), 2370; https://doi.org/10.3390/math13152370 - 24 Jul 2025
Abstract
To address the challenges of image inpainting in scenarios with extensive or irregular missing regions—particularly detail oversmoothing, structural ambiguity, and textural incoherence—this paper proposes an Image Structure-Guided (ISG) framework that hierarchically integrates structural priors with semantic-aware texture synthesis. The proposed methodology advances a two-stage restoration paradigm: (1) Structural Prior Extraction, where adaptive edge detection algorithms identify residual contours in corrupted regions, and a transformer-enhanced network reconstructs globally consistent structural maps through contextual feature propagation; (2) Structure-Constrained Texture Synthesis, wherein a multi-scale generator with hybrid dilated convolutions and channel attention mechanisms iteratively refines high-fidelity textures under explicit structural guidance. The framework introduces three innovations: (1) a hierarchical feature fusion architecture that synergizes multi-scale receptive fields with spatial-channel attention to preserve long-range dependencies and local details simultaneously; (2) a spectral-normalized Markovian discriminator with gradient-penalty regularization, enabling adversarial training stability while enforcing patch-level structural consistency; and (3) a dual-branch loss formulation combining perceptual similarity metrics with edge-aware constraints to align synthesized content with both semantic coherence and geometric fidelity. Our experiments on two benchmark datasets (Places2 and CelebA) demonstrate that our framework achieves more unified textures and structures, bringing the restored images closer to their original semantic content.

27 pages, 8957 KiB  
Article
DFAN: Single Image Super-Resolution Using Stationary Wavelet-Based Dual Frequency Adaptation Network
by Gyu-Il Kim and Jaesung Lee
Symmetry 2025, 17(8), 1175; https://doi.org/10.3390/sym17081175 - 23 Jul 2025
Abstract
Single image super-resolution is the inverse problem of reconstructing a high-resolution image from its low-resolution counterpart. Although recent Transformer-based architectures leverage global context integration to improve reconstruction quality, they often overlook frequency-specific characteristics, resulting in the loss of high-frequency information. To address this limitation, we propose the Dual Frequency Adaptive Network (DFAN). DFAN first decomposes the input into low- and high-frequency components via Stationary Wavelet Transform. In the low-frequency branch, Swin Transformer layers restore global structures and color consistency. In contrast, the high-frequency branch features a dedicated module that combines Directional Convolution with Residual Dense Blocks, precisely reinforcing edges and textures. A frequency fusion module then adaptively merges these complementary features using depthwise and pointwise convolutions, achieving a balanced reconstruction. During training, we introduce a frequency-aware multi-term loss alongside the standard pixel-wise loss to explicitly encourage high-frequency preservation. Extensive experiments on the Set5, Set14, BSD100, Urban100, and Manga109 benchmarks show that DFAN achieves up to +0.64 dB peak signal-to-noise ratio (PSNR), +0.01 structural similarity index measure (SSIM), and −0.01 learned perceptual image patch similarity (LPIPS) over the strongest frequency-domain baselines, while also delivering visibly sharper textures and cleaner edges. By unifying spatial and frequency-domain advantages, DFAN effectively mitigates high-frequency degradation and enhances SISR performance.
(This article belongs to the Section Computer)
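The low/high-frequency split that DFAN builds on can be illustrated in one dimension. The paper uses a 2-D Stationary Wavelet Transform; the Haar-style split below is only a simplified analogue showing how a signal separates into a smooth part plus a detail part that sum back exactly:

```python
def haar_split(x):
    """One-level Haar-style analysis without downsampling:
    low[i] is the mean of neighbouring samples (smooth structure),
    high[i] is their half-difference (edges/detail),
    so x[i] = low[i] + high[i] exactly."""
    low = [(a + b) / 2 for a, b in zip(x, x[1:])]
    high = [(a - b) / 2 for a, b in zip(x, x[1:])]
    return low, high

signal = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]
low, high = haar_split(signal)
merged = [l + h for l, h in zip(low, high)]
print(merged)  # reconstructs signal[:-1] sample-for-sample
```

In DFAN the two components are processed by separate branches (Transformer layers vs. directional convolutions) before being fused back, mirroring this decompose-then-recombine structure.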

23 pages, 1580 KiB  
Article
Elucidating White Matter Contributions to the Cognitive Architecture of Affective Prosody Recognition: Evidence from Right Hemisphere Stroke
by Meyra S. Jackson, Yuto Uchida, Shannon M. Sheppard, Kenichi Oishi, Ciprian Crainiceanu, Argye E. Hillis and Alexandra Z. Durfee
Brain Sci. 2025, 15(7), 769; https://doi.org/10.3390/brainsci15070769 - 19 Jul 2025
Abstract
Background/Objectives: Successful discourse relies not only on linguistic but also on prosodic information. Difficulty recognizing emotion conveyed through prosody (receptive affective aprosodia) following right hemisphere stroke (RHS) significantly disrupts communication participation and personal relationships. Growing evidence suggests that damage to white matter in addition to gray matter structures impairs affective prosody recognition. The current study investigates lesion–symptom associations in receptive affective aprosodia during RHS recovery by assessing whether disruptions in distinct white matter structures impact different underlying affective prosody recognition skills. Methods: Twenty-eight adults with RHS underwent neuroimaging and behavioral testing at acute, subacute, and chronic timepoints. Fifty-seven healthy matched controls completed the same behavioral testing, which comprised tasks targeting affective prosody recognition and underlying perceptual, cognitive, and linguistic skills. Linear mixed-effects models and multivariable linear regression were used to assess behavioral performance recovery and lesion–symptom associations. Results: Controls outperformed RHS participants on behavioral tasks earlier in recovery, and RHS participants’ affective prosody recognition significantly improved from acute to chronic testing. Affective prosody and emotional facial expression recognition were affected by external capsule and inferior fronto-occipital fasciculus lesions, while sagittal stratum lesions impacted prosodic feature recognition. Accessing semantic representations of emotions implicated the superior longitudinal fasciculus. Conclusions: These findings replicate previously observed associations between right white matter tracts and affective prosody recognition and further identify lesion–symptom associations of underlying prosodic recognition skills throughout recovery. Investigation into prosody’s behavioral components and how they are affected by injury can help further intervention development and planning.
(This article belongs to the Special Issue Language, Communication and the Brain—2nd Edition)

26 pages, 7178 KiB  
Article
Super-Resolution Reconstruction of Formation MicroScanner Images Based on the SRGAN Algorithm
by Changqiang Ma, Xinghua Qi, Liangyu Chen, Yonggui Li, Jianwei Fu and Zejun Liu
Processes 2025, 13(7), 2284; https://doi.org/10.3390/pr13072284 - 17 Jul 2025
Abstract
Formation MicroScanner Image (FMI) technology is a key method for identifying fractured reservoirs and optimizing oil and gas exploration, but its inherent insufficient resolution severely constrains the fine characterization of geological features. This study innovatively applies a Super-Resolution Generative Adversarial Network (SRGAN) to the super-resolution reconstruction of FMI logging images to address this bottleneck problem. By collecting FMI logging images of glutenite from a well in Xinjiang, a training set containing 24,275 images was constructed, and preprocessing strategies such as grayscale conversion and binarization were employed to optimize input features. Leveraging SRGAN’s generator-discriminator adversarial mechanism and perceptual loss function, high-quality mapping from low-resolution FMI logging images to high-resolution images was achieved. This study yields significant results: in RGB image reconstruction, SRGAN achieved a Peak Signal-to-Noise Ratio (PSNR) of 41.39 dB, surpassing the optimal traditional method (bicubic interpolation) by 61.6%; its Structural Similarity Index (SSIM) reached 0.992, representing a 34.1% improvement; in grayscale image processing, SRGAN effectively eliminated edge blurring, with the PSNR (40.15 dB) and SSIM (0.990) exceeding the suboptimal method (bilinear interpolation) by 36.6% and 9.9%, respectively. These results fully confirm that SRGAN can significantly restore edge contours and structural details in FMI logging images, with performance far exceeding traditional interpolation methods. This study not only systematically verifies, for the first time, SRGAN’s exceptional capability in enhancing FMI resolution, but also provides a high-precision data foundation for reservoir parameter inversion and geological modeling, holding significant application value for advancing the intelligent exploration of complex hydrocarbon reservoirs.
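The PSNR figures quoted in this abstract (and in several others on this page) follow the standard definition 10·log10(peak²/MSE). A minimal sketch on toy pixel data, not the paper's images:

```python
from math import log10

def psnr(ref, test, peak=255.0):
    """Peak Signal-to-Noise Ratio between two same-sized images,
    represented here as flat lists of 8-bit pixel values."""
    mse = sum((r - t) ** 2 for r, t in zip(ref, test)) / len(ref)
    if mse == 0:
        return float("inf")  # identical images
    return 10 * log10(peak ** 2 / mse)

ref = [52, 55, 61, 59, 79, 61, 76, 41]
out = [50, 55, 60, 59, 80, 60, 77, 41]
print(round(psnr(ref, out), 2))
```

Values around 40 dB, as reported for SRGAN here, correspond to a very small mean-squared error relative to the 8-bit dynamic range.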

23 pages, 396 KiB  
Article
Navigating Hybrid Work: An Optimal Office–Remote Mix and the Manager–Employee Perception Gap in IT
by Milos Loncar, Jovanka Vukmirovic, Aleksandra Vukmirovic, Dragan Vukmirovic and Ratko Lasica
Sustainability 2025, 17(14), 6542; https://doi.org/10.3390/su17146542 - 17 Jul 2025
Abstract
The transition to hybrid work has become a defining feature of the post-pandemic IT sector, yet organizations lack empirical benchmarks for balancing flexibility with performance and well-being. This study addresses this gap by identifying an optimal hybrid work structure and exposing systematic perception gaps between employees and managers. Grounded in Self-Determination Theory and the Job Demands–Resources model, our research analyses survey data from 1003 employees and 252 managers across 46 countries. The findings identify a hybrid “sweet spot” of 6–10 office days per month. Employees in this window report significantly higher perceived efficiency (Odds Ratio (OR) ≈ 2.12) and marginally lower office-related stress. Critically, the study uncovers a significant perception gap: contrary to the initial hypothesis, managers are nearly twice as likely as employees to rate hybrid work as most efficient (OR ≈ 1.95) and consistently evaluate remote-work resources more favourably (OR ≈ 2.64). This “supervisor-optimism bias” suggests a disconnect between policy design and frontline experience. The study concludes that while a light-to-moderate hybrid model offers clear benefits, organizations must actively address this perceptual divide and remedy resource shortages to realize the potential of hybrid work fully. This research provides data-driven guidelines for creating sustainable, high-performance work environments in the IT sector.
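For a plain 2×2 contingency table, the odds ratios this abstract reports reduce to a cross-ratio of counts. A minimal sketch; the counts below are invented for illustration and are not the study's data:

```python
def odds_ratio(exposed_yes, exposed_no, control_yes, control_no):
    """OR for a 2x2 table: odds of the outcome in the exposed group
    divided by the odds of the outcome in the control group."""
    return (exposed_yes / exposed_no) / (control_yes / control_no)

# Hypothetical counts: employees inside the 6-10 office-day "sweet spot"
# reporting high efficiency vs. employees outside the band.
print(round(odds_ratio(120, 80, 83, 117), 2))
```

An OR near 2 would mean the odds of reporting high efficiency are roughly doubled inside the band, which is the scale of effect the study describes.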

29 pages, 1184 KiB  
Article
Perception-Based H.264/AVC Video Coding for Resource-Constrained and Low-Bit-Rate Applications
by Lih-Jen Kau, Chin-Kun Tseng and Ming-Xian Lee
Sensors 2025, 25(14), 4259; https://doi.org/10.3390/s25144259 - 8 Jul 2025
Abstract
With the rapid expansion of Internet of Things (IoT) and edge computing applications, efficient video transmission under constrained bandwidth and limited computational resources has become increasingly critical. In such environments, perception-based video coding plays a vital role in maintaining acceptable visual quality while minimizing bit rate and processing overhead. Although newer video coding standards have emerged, H.264/AVC remains the dominant compression format in many deployed systems, particularly in commercial CCTV surveillance, due to its compatibility, stability, and widespread hardware support. Motivated by these practical demands, this paper proposes a perception-based video coding algorithm specifically tailored for low-bit-rate H.264/AVC applications. By targeting regions most relevant to the human visual system, the proposed method enhances perceptual quality while optimizing resource usage, making it particularly suitable for embedded systems and bandwidth-limited communication channels. In general, regions containing human faces and those exhibiting significant motion are of primary importance for human perception and should receive higher bit allocation to preserve visual quality. To this end, macroblocks (MBs) containing human faces are detected using the Viola–Jones algorithm, which leverages AdaBoost for feature selection and a cascade of classifiers for fast and accurate detection. This approach is favored over deep learning-based models due to its low computational complexity and real-time capability, making it ideal for latency- and resource-constrained IoT and edge environments. Motion-intensive macroblocks are identified by comparing their motion intensity against the average motion level of preceding reference frames. Based on these criteria, a dynamic quantization parameter (QP) adjustment strategy is applied to assign finer quantization to perceptually important regions of interest (ROIs) in low-bit-rate scenarios. The experimental results show that the proposed method achieves superior subjective visual quality and objective Peak Signal-to-Noise Ratio (PSNR) compared to the standard JM software and other state-of-the-art algorithms under the same bit rate constraints. Moreover, the approach introduces only a marginal increase in computational complexity, highlighting its efficiency. Overall, the proposed algorithm offers an effective balance between visual quality and computational performance, making it well suited for video transmission in bandwidth-constrained, resource-limited IoT and edge computing environments.
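The ROI-driven QP strategy described above can be sketched as a simple per-macroblock rule: lower (finer) QP for face and motion macroblocks, slightly higher QP for background to hold the bit budget. The delta values and the background offset below are assumptions for illustration, not the paper's tuned parameters; only the H.264 QP range of 0-51 is standard:

```python
def adjust_qp(base_qp, is_face, is_motion, delta=4, qp_min=0, qp_max=51):
    """Assign a finer (lower) QP to perceptually important macroblocks:
    face MBs get the full reduction, motion-intensive MBs half of it,
    and background MBs are coarsened by one step. The result is clamped
    to the H.264 QP range [0, 51]."""
    if is_face:
        qp = base_qp - delta
    elif is_motion:
        qp = base_qp - delta // 2
    else:
        qp = base_qp + 1
    return max(qp_min, min(qp_max, qp))

# face MB, motion MB, background MB at a base QP of 30
print([adjust_qp(30, f, m) for f, m in [(True, False), (False, True), (False, False)]])
```

A real rate-control loop would also rebalance these offsets frame by frame so the total bit rate stays on target.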

21 pages, 33500 KiB  
Article
Location Research and Picking Experiment of an Apple-Picking Robot Based on Improved Mask R-CNN and Binocular Vision
by Tianzhong Fang, Wei Chen and Lu Han
Horticulturae 2025, 11(7), 801; https://doi.org/10.3390/horticulturae11070801 - 6 Jul 2025
Abstract
With the advancement of agricultural automation technologies, apple-harvesting robots have gradually become a focus of research. As their “perceptual core,” machine vision systems directly determine picking success rates and operational efficiency. However, existing vision systems still exhibit significant shortcomings in target detection and positioning accuracy in complex orchard environments (e.g., uneven illumination, foliage occlusion, and fruit overlap), which hinders practical applications. This study proposes a visual system for apple-harvesting robots based on improved Mask R-CNN and binocular vision to achieve more precise fruit positioning. The binocular camera (ZED2i) carried by the robot acquires dual-channel apple images. An improved Mask R-CNN is employed to implement instance segmentation of apple targets in binocular images, followed by a template-matching algorithm with parallel epipolar constraints for stereo matching. Four pairs of feature points from corresponding apples in binocular images are selected to calculate disparity and depth. Experimental results demonstrate average coefficients of variation and positioning accuracy of 5.09% and 99.61%, respectively, in binocular positioning. During harvesting operations with a self-designed apple-picking robot, the single-image processing time was 0.36 s, the average single harvesting cycle duration reached 7.7 s, and the comprehensive harvesting success rate achieved 94.3%. This work presents a novel high-precision visual positioning method for apple-harvesting robots.
(This article belongs to the Section Fruit Production Systems)

24 pages, 6164 KiB  
Article
Transformer–GCN Fusion Framework for Mineral Prospectivity Mapping: A Geospatial Deep Learning Approach
by Le Gao, Gnanachandrasamy Gopalakrishnan, Adel Nasri, Youhong Li, Yuying Zhang, Xiaoying Ou and Kele Xia
Minerals 2025, 15(7), 711; https://doi.org/10.3390/min15070711 - 3 Jul 2025
Viewed by 460
Abstract
Mineral prospectivity mapping (MPM) is a pivotal technique in geoscientific mineral resource exploration. To address three critical challenges in current deep convolutional neural network applications for geoscientific mineral resource prediction—(1) model bias induced by imbalanced distribution of ore deposit samples, (2) deficiency in [...] Read more.
Mineral prospectivity mapping (MPM) is a pivotal technique in geoscientific mineral resource exploration. To address three critical challenges in current deep convolutional neural network applications for geoscientific mineral resource prediction—(1) model bias induced by imbalanced distribution of ore deposit samples, (2) deficiency in global feature extraction due to excessive reliance on local spatial correlations, and (3) diminished discriminative capability caused by feature smoothing in deep networks—this study innovatively proposes a T-GCN model integrating Transformer with graph convolutional neural networks (GCNs). The model achieves breakthrough performance through three key technological innovations: firstly, constructing a global receptive field via Transformer's self-attention mechanism to effectively capture long-range geological relationships; secondly, combining GCNs' advantages in topological feature extraction to realize multi-scale feature fusion; and thirdly, designing a feature enhancement module to mitigate deep network degradation. In practical application to the PangXD ore district, the T-GCN model achieved a prediction accuracy of 97.27%, representing a 3.76 percentage point improvement over the best comparative model, and successfully identified five prospective mineralization zones, demonstrating its superior performance and application value under complex geological conditions. Full article
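The two feature paths the T-GCN abstract pairs, local topology via graph convolution and global context via self-attention, can be illustrated with a toy sketch. The mean-aggregation GCN step, identity projections in the attention, and concatenation as the fusion rule are simplifying assumptions for illustration, not the paper's architecture.

```python
import math

def gcn_step(features, adjacency):
    """One mean-aggregation graph convolution: each node averages its
    own features with its neighbors' (A + I normalization, no weights)."""
    n, dim = len(features), len(features[0])
    out = []
    for i in range(n):
        neigh = [j for j in range(n) if adjacency[i][j]] + [i]
        out.append([sum(features[j][k] for j in neigh) / len(neigh)
                    for k in range(dim)])
    return out

def self_attention(features):
    """Single-head scaled dot-product self-attention with identity
    Q/K/V projections: every node attends to every other node."""
    n, dim = len(features), len(features[0])
    out = []
    for i in range(n):
        scores = [sum(a * b for a, b in zip(features[i], features[j])) / math.sqrt(dim)
                  for j in range(n)]
        m = max(scores)                      # stabilized softmax
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        w = [x / z for x in w]
        out.append([sum(w[j] * features[j][k] for j in range(n))
                    for k in range(dim)])
    return out

def fuse(features, adjacency):
    """Concatenate local (GCN) and global (attention) features per node."""
    local = gcn_step(features, adjacency)
    global_ = self_attention(features)
    return [l + g for l, g in zip(local, global_)]
```

The GCN path only mixes information along graph edges, while the attention path mixes all node pairs regardless of distance; concatenating the two gives each node both a local-topology and a long-range view, which is the complementarity the abstract claims.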