MDPI - Publisher of Open Access Journals

17 pages, 8857 KB

Open Access Article

An Interpretable Deep Learning System for Fine-Grained Classification and Longitudinal Tracking of Neonatal Auricular Deformities

by Yihui Feng, Xujun Hu, Xiwen Zhang, Xiaobao Ma, Jialin Xie, Jianyong Chen and Yangyang Yuan

Biology 2026, 15(13), 985; https://doi.org/10.3390/biology15130985 (registering DOI) - 23 Jun 2026

Abstract

Early non-invasive correction of neonatal auricular deformities is highly dependent on timely and precise diagnosis. However, clinical practice is often compromised by the subjectivity of visual assessments and the lack of objective tracking metrics, which frequently leads to missed optimal treatment windows. To address these challenges, we developed an interpretable deep learning-based diagnostic system for the automated screening and fine-grained classification of these deformities. Methodologically, a large-scale, multi-source dataset (n = 4644) was curated to support model training. The system pairs an automated object detector (YOLOv11) for background-reduced region-of-interest isolation with a cascaded classification pipeline optimized via ConvNeXt-Tiny. Crucially, we introduced a supervised contrastive learning module to project high-dimensional morphological features into a continuous severity score, enabling quantitative longitudinal tracking of therapeutic efficacy. To evaluate generalization and robustness, the framework underwent rigorous evaluation across three independent real-world cohorts and one controlled synthetic stress test. The system achieved 88.2% accuracy (Area Under the Curve (AUC): 0.949) in binary screening and 87.4% accuracy (macro-AUC: 0.976) in multi-class subtyping on the internal baseline. To enhance interpretability and build clinical trust, Gradient-weighted Class Activation Mapping (Grad-CAM) was utilized to explore the spatial distribution of the model’s attention, which frequently aligned with key anatomical landmarks. Furthermore, the learned severity scores robustly quantified post-intervention improvements (p = 0.0004), effectively capturing subtle anatomical normalization. While validation for rare subtypes remains underpowered, and the severity score currently functions mainly as a learned morphological similarity index requiring future clinical calibration, this study ultimately provides an objective and standardized web-based tool to facilitate the early intervention and precision management of neonatal auricular anomalies. Full article

(This article belongs to the Special Issue AI Deep Learning Approach to Study Biological Questions (3rd Edition))

26 pages, 477 KB

Open Access Article

A Low-Cost RGB-D Sensing Front-End for Stable 3D Hand Landmark Reconstruction Using MediaPipe and ZED2 Stereo Depth

by Laixin Peng, Tiansheng Liu and Bingwei He

Sensors 2026, 26(12), 3730; https://doi.org/10.3390/s26123730 - 11 Jun 2026

Viewed by 213

Abstract

Stable three-dimensional hand landmark reconstruction using low-cost RGB-D sensors is important for human–computer interaction, robot teleoperation, and vision-based motion analysis. RGB-based hand landmark detectors provide stable semantic 2D landmarks, but their depth output is not a metric measurement in the physical camera coordinate system. Stereo cameras can provide metric depth, but direct landmark-level back-projection is sensitive to invalid pixels, local depth holes, boundary noise, and partial occlusion. To address these problems, this paper presents a lightweight RGB-D sensing front-end that combines MediaPipe semantic hand landmarks with ZED2 stereo depth. The proposed pipeline detects 21 semantic hand landmarks in the RGB image, obtains landmark-level metric depth from the aligned ZED2 depth map using local median sampling, reconstructs 3D landmarks by camera back-projection, and further applies exponential moving average filtering and a bone-length consistency constraint. Experiments were conducted on a self-collected SVO dataset containing 13 hand actions and 26 recorded sequences, and an additional checkerboard-based reference-distance validation was performed to evaluate the metric depth sampling and 3D back-projection component. Compared with single-pixel sampling, the 5 × 5 local median strategy slightly increased the valid-depth ratio from 0.9731 to 0.9738 and reduced the temporal smoothness metric from 1.7163 mm to 1.6902 mm. To further justify the temporal filtering choice, an additional comparison with the 1 Euro Filter was conducted using the reconstructed win5 trajectories. The 1 Euro Filter produced stronger smoothing, reducing the temporal smoothness metric to 0.196 mm, but also reduced the path-length ratio to 0.484, indicating substantial motion attenuation. EMA0.7 was therefore retained as a more balanced setting, reducing the temporal smoothness metric to 0.826 mm while maintaining a path-length ratio of 0.803. The BL0.5 bone-length constraint reduced the bone-length standard deviation from 2.0727 mm to 1.1995 mm with limited trajectory modification. The final configuration provides a practical low-cost RGB-D front-end for stable 3D hand landmark reconstruction under controlled indoor conditions. Full article

(This article belongs to the Section Physical Sensors)
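
The pipeline stages named in this abstract (local median depth sampling, camera back-projection, exponential moving average filtering, and a bone-length constraint) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the pinhole intrinsics `fx, fy, cx, cy`, the reading of "EMA0.7" as a weight of 0.7 on the current frame, and the reading of "BL0.5" as a 0.5 blend toward a reference bone length are all assumptions made for illustration.

```python
import math
from statistics import median


def median_depth(depth, u, v, win=5):
    """Median of the valid depths in a win x win window centred on pixel (u, v).

    `depth` is a row-major 2D list; zeros, NaN, inf, and None count as invalid.
    Returns None when the window holds no valid depth.
    """
    h, w = len(depth), len(depth[0])
    r = win // 2
    vals = []
    for y in range(max(0, v - r), min(h, v + r + 1)):
        for x in range(max(0, u - r), min(w, u + r + 1)):
            z = depth[y][x]
            if z is not None and math.isfinite(z) and z > 0:
                vals.append(z)
    return median(vals) if vals else None


def back_project(u, v, z, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) at depth z into camera coordinates."""
    return ((u - cx) * z / fx, (v - cy) * z / fy, z)


def ema(prev, cur, alpha=0.7):
    """Exponential moving average of a 3D point.

    Assumption: alpha weights the current frame; the abstract does not say
    which side the 0.7 applies to.
    """
    if prev is None:
        return cur
    return tuple(alpha * c + (1 - alpha) * p for p, c in zip(prev, cur))


def constrain_bone(parent, child, ref_len, weight=0.5):
    """Move `child` along the bone direction so its length blends toward ref_len.

    Hypothetical form of a bone-length constraint: weight=0 leaves the bone
    untouched, weight=1 forces it to the reference length.
    """
    d = [c - p for p, c in zip(parent, child)]
    length = math.sqrt(sum(x * x for x in d))
    if length == 0:
        return child
    scale = ((1 - weight) * length + weight * ref_len) / length
    return tuple(p + x * scale for p, x in zip(parent, d))
```

A median over the window ignores isolated depth holes that would invalidate a single-pixel lookup, which is consistent with the small gain in valid-depth ratio reported above.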

20 pages, 2215 KB

Open Access Article

Frame Selection Strategies for Video Deepfake Detection: Benchmarking Accuracy and Runtime Trade-Offs

by Artūras Serackis, Mindaugas Jankauskas, Anastasija Grubinskienė and Vytautas Abromavičius

Appl. Sci. 2026, 16(11), 5364; https://doi.org/10.3390/app16115364 - 27 May 2026

Viewed by 339

Abstract

This study evaluates frame selection during inference as an independent factor in video deepfake detection while keeping the downstream detectors fixed. We compare twelve frame selection strategies, ranging from simple temporal and quality baselines to landmark aware policies, using four validated pretrained detectors: Self-Blended Images (SBIs), Frequency-Enhanced Self-Blended Images (FSBIs), Generative Convolutional Vision Transformer (GenConViT), and GenD. The primary experiment is a complete factorial benchmark with 300 videos and five frame budgets (2, 4, 8, 16, and 32 selected frames), which provides the reference results at 32 frames. To address sample size limitations, an additional validation experiment uses a deduplicated split of 1180 Celeb-DF++ and FaceForensics++ videos, with complete results for 2, 4, and 8 selected frames and a reported subset for 16 selected frames. In the complete 300-video benchmark, 32 frames achieved the strongest average AUC, while 8 and 16 frames recovered most of the attainable performance with lower runtime. The best single validated configuration was GenD with Shot-aware sampling at 32 frames, yielding an AUC of 0.9607 and a balanced accuracy of 0.9133. The study therefore does not claim that smaller budgets universally outperform 32 frames; instead, it quantifies the tradeoff between accuracy and runtime and shows that frame selection remains a meaningful design variable under constrained inference budgets. Full article

(This article belongs to the Special Issue Integration of AI in Signal and Image Processing)
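
As a rough illustration of what a simple temporal baseline and the reported balanced-accuracy metric might look like, the sketch below picks evenly spaced frame indices for a given budget and computes balanced accuracy from binary labels. The function names are hypothetical and the segment-centre sampling rule is an assumption; the abstract does not specify how any of the twelve strategies is implemented.

```python
def uniform_frame_indices(n_frames, budget):
    """Pick up to `budget` frame indices at the centres of equal-length segments."""
    if n_frames <= 0 or budget <= 0:
        return []
    k = min(budget, n_frames)
    return [int((i + 0.5) * n_frames / k) for i in range(k)]


def balanced_accuracy(y_true, y_pred):
    """Mean of the true-positive rate and true-negative rate for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = sum(1 for t in y_true if t == 0)
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    return 0.5 * (tp / pos + tn / neg)
```

For a 100-frame video and a budget of 4, `uniform_frame_indices(100, 4)` selects frames 12, 37, 62, and 87.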

32 pages, 25709 KB

Open Access Article

Landmark-Based Features for Vehicle Trajectory Anomaly Detection from Traffic Video in Urban Intersections—A Case Study

by Nicolae Cleju and Constantin Catargiu

Sensors 2026, 26(10), 3027; https://doi.org/10.3390/s26103027 - 11 May 2026

Viewed by 1114

Abstract

We study trajectory feature representations in the context of detecting spatially anomalous vehicle trajectories in urban intersections, using trajectory data from video streams captured by camera monitoring systems. These trajectories are extracted using an object detection pipeline and have particular characteristics like short lengths, variable endpoints, and other viewpoint-dependent detection artifacts, which make existing spatial feature approaches less effective. We introduce two feature representations adapted for intersection-level trajectories, based on distances to a fixed set of landmark points, which provide fixed-length vectors compatible with common tabular anomaly detector algorithms. We evaluate using a dataset of 5378 labeled trajectories collected from camera recordings in one deployment site, as well as on other existing city-wide benchmark datasets, showing that, in the evaluated setting, the proposed feature representations improve upon several existing spatial features and enable better detection of both shape and placement anomalies. Full article

(This article belongs to the Special Issue Sensors and Sensing Technologies for Traffic, Driving and Transportation)
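
One way to picture a landmark-distance representation is the sketch below, which maps a variable-length trajectory to a fixed-length vector holding the minimum distance from the trajectory to each landmark. This is only an assumed instance of the general idea; the abstract does not define the two proposed representations, and `landmark_features` is a hypothetical name.

```python
import math


def landmark_features(trajectory, landmarks):
    """Fixed-length feature vector: one minimum distance per landmark.

    `trajectory` is a non-empty list of (x, y) points of any length;
    `landmarks` is a fixed list of (x, y) reference points.
    """
    if not trajectory:
        raise ValueError("trajectory must contain at least one point")
    return [min(math.dist(p, lm) for p in trajectory) for lm in landmarks]
```

Because the vector length depends only on the number of landmarks, short and long trajectories become directly comparable rows for a tabular anomaly detector.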

50 pages, 1347 KB

Open Access Review

Sensory Neuroimmunology: Bidirectional Neuro-Immune Circuits Governing Pain, Itch, Inflammation, and Host Defense at Barrier Surfaces

by Reza Mosaddeghi-Heris, Nasrin Forghani, Negin Safari Dehnavi, Maryam Saberivand, Amir Tahavvori, Sohrab Azin, Niloofar Taheri and Paolo Martelletti

Biology 2026, 15(10), 756; https://doi.org/10.3390/biology15100756 - 9 May 2026

Viewed by 613

Abstract

Sensory neurons at barrier tissues were once seen as passive detectors of environmental stimuli. However, in the last five years, increasing evidence has challenged this view, redefining these cells as active immune sentinels that directly affect tissue immunity in the skin, lungs, and gastrointestinal tract. Nociceptors and pruriceptors express various immune-sensing receptors, including Toll-like receptors, cytokine receptors, and alarmin sensors, which allow them to directly detect pathogens, allergens, and tissue damage. When activated, sensory neurons quickly release neuropeptides such as calcitonin gene-related peptide (CGRP), substance P, vasoactive intestinal peptide (VIP), and PACAP (pituitary adenylate cyclase-activating polypeptide), which guide immune cell recruitment, activation, and resolution. Reciprocally, immune-derived mediators, including IL-33, IL-31, thymic stromal lymphopoietin (TSLP), IL-4/IL-13, and TNF-α, modulate neuronal excitability and plasticity, forming bidirectional neuroimmune circuits that control inflammation, host defense, pain, and itch. Landmark studies published in 2024–2025, including neuronal control of gut Treg function and the identification of sensory nerve immune niches, have further refined this framework and revealed tissue-specific circuit specialization. This review synthesizes recent insights from molecular, cellular, and systems levels into the sensory neuroimmune axis, emphasizes its protective versus pathogenic roles, and critically evaluates emerging therapeutic strategies and safety concerns, positioning sensory neuroimmunology as a unifying framework for tissue barrier homeostasis and disease. Full article

(This article belongs to the Special Issue Paper Collection: Understanding Immune Systems)

19 pages, 3945 KB

Open Access Article

LiDAR-Free 3D Auto-Labeling via Radar–Visual Spatio-Temporal Consistency

by Boning Zhu, Zhiqun Hu and Zhaoming Lu

Sensors 2026, 26(10), 2956; https://doi.org/10.3390/s26102956 - 8 May 2026

Viewed by 665

Abstract

Vision foundation models (VFMs) enable high-quality 2D instance masks, yet their lifted pseudo-point clouds suffer from scale ambiguity, structural noise, and temporal inconsistency, limiting their utility in 3D annotation. Existing automatic labeling methods either rely on expensive light detection and ranging (LiDAR) sensors or fail to enforce physical plausibility in dynamic roadside scenes. This study proposes a LiDAR-free radar–visual auto-labeling framework that leverages cross-modal spatio-temporal consistency between millimeter-wave radar trajectories and visual pseudo-point clouds to self-correct 3D geometry. The method first associates radar points, 2D masks, and pseudo-point clouds into object-centric sequences. Then, an uncertainty-aware pose fusion module combines motion-derived and structure-derived orientations using automatically solved road priors. Finally, the pseudo-point cloud is refined in canonical space by optimizing stable semantic landmarks from temporally consistent masks and propagating their corrections globally. Evaluated on a real-world roadside dataset, the method achieves 49.1% bird’s-eye-view (BEV) intersection over union (IoU) and 43.0% 3D IoU, outperforming a radar–camera fusion baseline by 5.5/5.9 points. Downstream experiments further show that the generated pseudo-labels and semantic enhancement are useful under the evaluated detector configurations, while broader validation remains future work. Full article

(This article belongs to the Section Vehicular Sensing)

21 pages, 22338 KB

Open Access Article

Nighttime Driver Fatigue Detection Based on Real-Time Joint Face and Facial Landmarks Detection

by Zhuofan Huang, Shangkun Liu, Jingli Huang and Jie Huang

Modelling 2026, 7(2), 60; https://doi.org/10.3390/modelling7020060 - 21 Mar 2026

Viewed by 874

Abstract

Driver fatigue detection (DFD) in low-light nighttime driving environments is crucial for road safety, but it remains challenging due to degraded image quality and computational constraints. This paper proposes a real-time three-stage framework specifically designed for nighttime driver fatigue detection, integrating low-light image enhancement, joint face and facial landmark detection, and geometry-based fatigue judgment. In the initial stage, the framework utilizes the Zero-Reference Deep Curve Estimation (Zero-DCE) algorithm to improve the visual quality of input images under low-light conditions. Subsequently, a novel lightweight single-stage detector, You Only Look Once for Joint Face and Facial Landmark Detection (YOLOJFF), is introduced for efficient joint localization. Finally, fatigue judgment is performed in real-time by calculating the Eye Aspect Ratio (EAR) and Mouth Aspect Ratio (MAR) from the detected landmarks and using a sliding time window strategy. Experimental results demonstrate that the enhancement module significantly improves detection performance. The YOLOJFF model achieves a favorable balance, with 90.9% precision, 87.6% mean Average Precision (mAP), and 5.2 Normalized Mean Error (NME), while requiring only 3.7 million (M) parameters and running at 107.5 FPS. The proposed framework provides a robust and efficient solution for real-time DFD in nighttime scenarios. Full article
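
The geometry-based judgment stage can be sketched as follows. The Eye Aspect Ratio uses the common six-point formulation, EAR = (|p2 - p6| + |p3 - p5|) / (2 |p1 - p4|); whether the paper's landmark layout matches this six-point ordering is an assumption, and the window size, EAR threshold, and closed-frame ratio in `FatigueWindow` are hypothetical values chosen only for illustration.

```python
import math
from collections import deque


def eye_aspect_ratio(p1, p2, p3, p4, p5, p6):
    """EAR from six eye landmarks: p1/p4 are the corners, p2/p3 upper lid, p6/p5 lower lid."""
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    horizontal = 2.0 * math.dist(p1, p4)
    return vertical / horizontal


class FatigueWindow:
    """Sliding window over per-frame EAR values (hypothetical thresholds)."""

    def __init__(self, size=30, ear_thresh=0.2, closed_ratio=0.5):
        self.flags = deque(maxlen=size)
        self.ear_thresh = ear_thresh
        self.closed_ratio = closed_ratio

    def update(self, ear):
        """Add one frame; return True when the full window is mostly closed-eye frames."""
        self.flags.append(ear < self.ear_thresh)
        if len(self.flags) < self.flags.maxlen:
            return False
        return sum(self.flags) / len(self.flags) >= self.closed_ratio
```

The Mouth Aspect Ratio can be computed the same way from mouth landmarks, with a yawn flagged when the ratio stays above a threshold rather than below it.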

25 pages, 2630 KB

Open Access Article

Lightweight and Real-Time Driver Fatigue Detection Based on MG-YOLOv8 with Facial Multi-Feature Fusion

by Chengming Chen, Xinyue Liu, Meng Zhou, Zhijian Li, Zhanqi Du and Yandan Lin

J. Imaging 2025, 11(11), 385; https://doi.org/10.3390/jimaging11110385 - 1 Nov 2025

Cited by 2 | Viewed by 2439

Abstract

Driver fatigue is a primary factor in traffic accidents and poses a serious threat to road safety. To address this issue, this paper proposes a multi-feature fusion fatigue detection method based on an improved YOLOv8 model. First, the method uses an enhanced YOLOv8 model to achieve high-precision face detection. Then, it crops the detected face regions. Next, the lightweight PFLD (Practical Facial Landmark Detector) model performs keypoint detection on the cropped images, extracting 68 facial feature points and calculating key indicators related to fatigue status. These indicators include the eye aspect ratio (EAR), eyelid closure percentage (PERCLOS), mouth aspect ratio (MAR), and head posture ratio (HPR). To mitigate the impact of individual differences on detection accuracy, the paper introduces a novel sliding window model that combines a dynamic threshold adjustment strategy with an exponential weighted moving average (EWMA) algorithm. Based on this framework, blink frequency (BF), yawn frequency (YF), and nod frequency (NF) are calculated to extract time-series behavioral features related to fatigue. Finally, the driver’s fatigue state is determined using a comprehensive fatigue assessment algorithm. Experimental results on the WIDER FACE and YAWDD datasets demonstrate this method’s significant advantages in improving detection accuracy and computational efficiency. By striking a better balance between real-time performance and accuracy, the proposed method shows promise for real-world driving applications. Full article

(This article belongs to the Special Issue Towards Deeper Understanding of Image and Video Processing and Analysis)
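
The combination of an exponentially weighted moving average with a per-driver threshold can be sketched as below. This is an illustrative guess at the general idea, not the paper's algorithm: the smoothing factor, the rule "threshold = ratio × smoothed baseline EAR", and the function names are all assumptions. PERCLOS is computed here simply as the fraction of frames whose EAR falls below the threshold.

```python
def ewma(values, alpha=0.3):
    """Exponentially weighted moving average; alpha weights the newest value."""
    out, s = [], None
    for v in values:
        s = v if s is None else alpha * v + (1 - alpha) * s
        out.append(s)
    return out


def dynamic_threshold(baseline_ears, alpha=0.3, ratio=0.75):
    """Hypothetical per-driver closed-eye threshold: a fraction of the smoothed open-eye EAR."""
    if not baseline_ears:
        raise ValueError("need at least one baseline EAR value")
    return ratio * ewma(baseline_ears, alpha)[-1]


def perclos(ears, thresh):
    """Fraction of frames in which the eye counts as closed (EAR below thresh)."""
    if not ears:
        return 0.0
    return sum(1 for e in ears if e < thresh) / len(ears)
```

Deriving the threshold from each driver's own baseline, instead of using a fixed constant, is one way to reduce the effect of individual differences in eye shape.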

10 pages, 1132 KB

Open Access Article

Photon-Counting Computed Tomography of the Paranasal Sinuses Improves Intraoperative Accuracy of Image-Guided Surgery

by Benjamin Philipp Ernst, Iris Burck, Stefanie Schliwa, Sven Becker, Tobias Albrecht, Thomas J. Vogl, Jan-Erik Scholtz, Anna Levi, Andreas German Loth, Friederike Bärhold, Sebastian Strieth, Matthias F. Froelich, Alexander Hertel, Yannik Christian Layer, Daniel Kuetting and Jonas Eckrich

Diagnostics 2025, 15(21), 2777; https://doi.org/10.3390/diagnostics15212777 - 31 Oct 2025

Viewed by 1398

Abstract

Background: Computed tomography (CT)-based image-guided surgery (IGS) is of great importance in functional endoscopic sinus surgery (FESS) and requires IGS-specific imaging protocols to ensure high intraoperative accuracy. This study aimed to compare photon-counting CT (PCCT), dual-energy dual-source CT (DECT), and spectral detector CT (SDCT) of the paranasal sinuses with respect to image quality, IGS accuracy and radiation dose. Methods: A formalin-fixed cadaver skull was examined using PCCT, DECT and SDCT at 100 kV tube voltage with descending tube currents (mAs). The setup of electromagnetic IGS was evaluated using a visual analog scale. Accuracy was analyzed endoscopically using defined anatomical landmarks. Diagnostic image quality as well as bone and soft tissue noise were assessed qualitatively using a 5-point Likert scale and quantitatively by determination of signal-to-noise ratio. Radiation dose was evaluated using the dose length product. Results: While PCCT datasets could be registered and navigated accurately down to 10 mAs (1.5 mm error at 10 mAs), both DECT and SDCT exhibited significantly increased inaccuracies below 40 mAs (4.35/5.15 mm for DECT/SDCT at 25 mAs). PCCT therefore enabled a 45% radiation dose reduction at the minimally required dose length product. Quantitative and qualitative image quality were superior for PCCT compared to DECT and SDCT. Conclusions: PCCT provides excellent accuracy of anatomical landmarks in IGS with superior image quality of the paranasal sinuses in low-mA scans and substantially reduced radiation exposure. Full article

(This article belongs to the Special Issue Innovations in Medical Imaging for Precision Diagnostics)

18 pages, 5145 KB

Open Access Article

Spatio-Temporal Patterns and Sentiment Analysis of Ting, Tai, Lou, and Ge Ancient Chinese Architecture Buildings

by Jinghan Xie, Jinghang Wu and Zhongyong Xiao

Buildings 2025, 15(10), 1652; https://doi.org/10.3390/buildings15101652 - 14 May 2025

Cited by 2 | Viewed by 1402

Abstract

Ting, Tai, Lou, and Ge are types of ancient buildings that represent traditional Chinese architecture and culture. They are primarily constructed using mortise and tenon joints, complemented by brick and stone foundations, showcasing traditional architectural craftsmanship. However, research aimed at conserving, inheriting, and rejuvenating these buildings is limited, despite their status as Provincial Cultural Relic Protection Units of China. Therefore, the aim of this study was to reveal the spatial distribution of Ting, Tai, Lou, and Ge buildings across China, as well as the factors driving differences in their spatial distribution. Tourist experiences and building popularity were also explored. Spatial analysis methods (e.g., standard deviation ellipse and geographic detector), word cloud generation, and sentiment analysis, which uses natural language processing techniques to identify subjective emotions in text, were applied to investigate these research issues. The key findings of this study are as follows. The ratio of Ting, Tai, Lou, and Ge buildings in Southeast China to those in Northwest China, as divided by the “Heihe–Tengchong” Line, was 94.6:5.4; this line is an important demographic boundary in China, across which the ratio of permanent residents in the two areas has remained stable at 94:6. Geographic detector analysis revealed that six of the seven natural and socioeconomic factors (topography, waterways, roads, railways, population, and carbon dioxide emissions) had a significant influence on the spatial heterogeneity of these cultural heritage buildings in China, with socioeconomic factors, particularly population, having a greater influence on building spatial distributions. All seven factors (including the normalized difference vegetation index, an indicator used to assess vegetation health and coverage) were significant in Southeast China, whereas all factors were non-significant in Northwest China, which may be explained by the small number of buildings in the latter region. The average rating scores and heat scores for Ting, Tai, Lou, and Ge buildings were 4.35 (out of 5) and 3 (out of 10), respectively, reflecting an imbalance between service quality and popularity. According to the percentages of positive and negative reviews, Lou buildings have much better tourism services than other buildings, indicating a need to improve services to attract more tourists to Ting, Tai, and Ge buildings. Four main types of words were used with high frequency in the tourism reviews collected from Ctrip, a popular online travel platform in China: (1) historical stories; (2) tourism; (3) culture; and (4) cities/provinces. Ting and Tai buildings showed similar word clouds, as did Lou and Ge buildings, with only the former including historical stories. Conversely, landmark was a high-frequency word only in the reviews of Lou and Ge buildings. Specific suggestions were proposed based on the above findings to promote tourism and revive ancient Chinese architecture. Full article

(This article belongs to the Section Architectural Design, Urban Science, and Real Estate)

23 pages, 34671 KB

Open Access Article

SSN: Scale Selection Network for Multi-Scale Object Detection in Remote Sensing Images

by Zhili Lin and Biao Leng

Remote Sens. 2024, 16(19), 3697; https://doi.org/10.3390/rs16193697 - 4 Oct 2024

Cited by 5 | Viewed by 2732

Abstract

The rapid growth of deep learning technology has made object detection in remote sensing images an important aspect of computer vision, finding applications in military surveillance, maritime rescue, and environmental monitoring. Nonetheless, the capture of remote sensing images at high altitudes causes significant scale variations, resulting in a heterogeneous range of object scales. These varying scales pose significant challenges for detection algorithms. To solve the scale variation problem, traditional detection algorithms compute multi-layer feature maps. However, this approach introduces significant computational redundancy. Inspired by how cognitive scaling mechanisms handle multi-scale information, we propose a novel Scale Selection Network (SSN) to eliminate computational redundancy through scale attentional allocation. In particular, we have devised a lightweight Landmark Guided Scale Attention Network, which is capable of predicting potential scales in an image. The detector only needs to focus on the selected scale features, which greatly reduces the inference time. Additionally, a fast Reversible Scale Semantic Flow Preserving strategy is proposed to directly generate multi-scale feature maps for detection. Experiments demonstrate that our method accelerates image pyramid-based detectors by approximately 5.3 times on widely utilized remote sensing object detection benchmarks. Full article

(This article belongs to the Section Remote Sensing Image Processing)

27 pages, 2648 KB

Open Access Review

Enigma of Pyramidal Neurons: Chirality-Centric View on Biological Evolution. Congruence to Molecular, Cellular, Physiological, Cognitive, and Psychological Functions

by Victor Vasilyevich Dyakin and Nika Viktorovna Dyakina-Fagnano

Symmetry 2024, 16(3), 355; https://doi.org/10.3390/sym16030355 - 15 Mar 2024

Cited by 3 | Viewed by 4731

Abstract

The mechanism of brain information processing unfolds within spatial and temporal domains inherently linked to the concept of space–time symmetry. Biological evolution, beginning with the prevalent molecular chirality, results in the handedness of human cognitive and psychological functions (the phenomenon known as biochirality). The key element in the chain of chirality transfer from the downstream to upstream processes is the pyramidal neuron (PyrN) morphology–function paradigm (archetype). The most apparent landmark of PyrNs is the geometry of the cell soma. However, “why/how PyrN’s soma gains the shape of quasi-tetrahedral symmetry” has never been explicitly articulated. Resolving this question is only possible based on the broad-view assumption that encoding 3D space requires specific 3D geometry of the neuronal detector and corresponding network. Accordingly, our hypothesis states that if the primary function of PyrNs, at the organism level, is sensory space symmetry perception, then the pyramidal shape of soma is the best evolutionary-selected geometry to support sensory-motor coupling. The biological system’s non-equilibrium (NE) state is fundamentally linked to an asymmetric, non-racemic, steady state of molecular constituents. The chiral theory of pyramidal soma shape conceptually agrees that living systems have evolved as non-equilibrium systems that exchange energy with the environment. The molecular mechanism involved in developing PyrN’s soma has been studied in detail. However, the crucial missing element, the reference to the fundamental link between molecular chirality and the function of spatial navigation, is the main obstacle to resolving the question at hand: why did PyrNs’ soma gain the shape of quasi-tetrahedral symmetry? Full article

(This article belongs to the Section Life Sciences)

15 pages, 2169 KB

Open Access Article

FPIRST: Fatigue Driving Recognition Method Based on Feature Parameter Images and a Residual Swin Transformer

by Weichu Xiao, Hongli Liu, Ziji Ma, Weihong Chen and Jie Hou

Sensors 2024, 24(2), 636; https://doi.org/10.3390/s24020636 - 19 Jan 2024

Cited by 8 | Viewed by 2451

Abstract

Fatigue driving is a serious threat to road safety, which is why accurately identifying fatigue driving behavior and warning drivers in time are of great significance in improving traffic safety. However, accurately recognizing fatigue driving is still challenging due to large intra-class variations in facial expression, continuity of behaviors, and illumination conditions. A fatigue driving recognition method based on feature parameter images and a residual Swin Transformer is proposed in this paper. First, the face region is detected through spatial pyramid pooling and a multi-scale feature output module. Then, a multi-scale facial landmark detector is used to locate 23 key points on the face. The aspect ratios of the eyes and mouth are calculated based on the coordinates of these key points, and a feature parameter matrix for fatigue driving recognition is obtained. Finally, the feature parameter matrix is converted into an image, and the residual Swin Transformer network is presented to recognize fatigue driving. Experimental results on the HNUFD dataset show that the proposed method achieves an accuracy of 96.512%, thus outperforming state-of-the-art methods. Full article

(This article belongs to the Section Sensing and Imaging)

14 pages, 3807 KB

Open AccessArticle

Region-Aware Deep Feature-Fused Network for Robust Facial Landmark Localization

by Xuxin Lin and Yanyan Liang

Mathematics 2023, 11(19), 4026; https://doi.org/10.3390/math11194026 - 22 Sep 2023

Cited by 2 | Viewed by 2165

Abstract

In facial landmark localization, facial region initialization usually plays an important role in guiding the model to learn critical face features. Most facial landmark detectors assume a well-cropped face as input and may underperform in real applications if the input is unexpected. To alleviate this problem, we present a region-aware deep feature-fused network (RDFN). The RDFN consists of a region detection subnetwork and a region-wise landmark localization subnetwork to explicitly solve the input initialization problem and derive the landmark score maps, respectively. To exploit the association between tasks, we develop a cross-task feature fusion scheme to extract multi-semantic region features while trading off their importance in different dimensions via global channel attention and global spatial attention. Furthermore, we design a within-task feature fusion scheme to capture the multi-scale context and improve the gradient flow for the landmark localization subnetwork. At the inference stage, a location reweighting strategy is employed to transform the score maps into 2D landmark coordinates. Extensive experimental results demonstrate that our method has competitive performance compared to recent state-of-the-art methods, achieving NMEs of 3.28%, 1.48%, and 3.43% on the 300W, AFLW, and COFW datasets, respectively. Full article
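Two ideas in this abstract lend themselves to a short sketch: turning a score map into a 2D coordinate, and the normalized mean error (NME) used to report results. The abstract does not specify the location reweighting strategy, so the code below substitutes a plain score-weighted centroid as a stand-in; the NME follows the usual definition (mean point-to-point error divided by a normalizing distance such as the inter-ocular distance). All names and sample values are hypothetical.

```python
import math


def weighted_centroid(score_map):
    """Return (x, y) as the score-weighted average location of a 2D map.

    A stand-in for the paper's location reweighting, which is not
    described in the abstract. Columns index x, rows index y.
    """
    total = sum(sum(row) for row in score_map)
    if total <= 0:
        raise ValueError("score map must have positive total mass")
    x = sum(v * c for row in score_map for c, v in enumerate(row)) / total
    y = sum(v * r for r, row in enumerate(score_map) for v in row) / total
    return x, y


def nme(pred, truth, norm_dist):
    """Mean Euclidean landmark error divided by a normalizing distance."""
    errors = [
        math.hypot(px - tx, py - ty)
        for (px, py), (tx, ty) in zip(pred, truth)
    ]
    return sum(errors) / (len(errors) * norm_dist)


# Hypothetical 3x3 score map with equal mass on two neighbouring cells.
score_map = [
    [0.0, 0.0, 0.0],
    [0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0],
]
landmark = weighted_centroid(score_map)

# Hypothetical predictions and ground truth, normalized by a distance of 10.
error = nme([landmark, (4.0, 1.0)], [(1.0, 1.0), (4.0, 2.0)], norm_dist=10.0)
```

The centroid gives sub-pixel coordinates without an arg-max, which is one common reason to decode score maps this way; whether the paper's strategy behaves similarly is not stated in the abstract.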

(This article belongs to the Special Issue Application of Advanced Computing and Artificial Intelligence in Engineering and Science)

20 pages, 5367 KB

Open AccessArticle

Deep Learning in Sign Language Recognition: A Hybrid Approach for the Recognition of Static and Dynamic Signs

by Ahmed Mateen Buttar, Usama Ahmad, Abdu H. Gumaei, Adel Assiri, Muhammad Azeem Akbar and Bader Fahad Alkhamees

Mathematics 2023, 11(17), 3729; https://doi.org/10.3390/math11173729 - 30 Aug 2023

Cited by 90 | Viewed by 16948

Abstract

A speech impairment limits a person’s capacity for oral and auditory communication. A real-time sign language detector would greatly improve communication between deaf people and the general public. This work proposes a deep learning-based algorithm that detects a person’s gestures and identifies the words they represent. There have been many studies on this topic, but the development of static and dynamic sign language recognition models is still a challenging area of research. The difficulty lies in obtaining a model that handles continuous signs independently of the signer. Differences in signers’ speeds, durations, and many other factors make it challenging to create a model with high accuracy and continuity. For the accurate and effective recognition of signs, this study uses two different deep learning-based approaches. In the first, we create a real-time American Sign Language detector using the skeleton model, which reliably categorizes continuous signs in most cases. In the second, we create a sign language detector for static signs using YOLOv6. This application helps sign language users and learners practice sign language in real time. After training both algorithms separately for static and continuous signs, we combine them into a single algorithm using a hybrid approach. The proposed model, consisting of LSTM with MediaPipe holistic landmarks, achieves around 92% accuracy for different continuous signs, and the YOLOv6 model achieves 96% accuracy over different static signs. Throughout this study, we determine which approach is best for sequential movement detection and for the classification of different signs according to sign language, and we find that the proposed model shows remarkable accuracy in real time. Full article

(This article belongs to the Special Issue New Advances in Computer Vision and Deep Learning)

Search Results (43)
