Search Results (240)

Search Parameters:
Keywords = distributed video processing

20 pages, 3850 KB  
Article
Optimization of Indoor Pedestrian Counting Based on Target Detection and Tracking
by Laihao Song, Litao Han, Jiayan Wang, Hengjian Feng and Ran Ji
ISPRS Int. J. Geo-Inf. 2026, 15(3), 136; https://doi.org/10.3390/ijgi15030136 - 21 Mar 2026
Viewed by 314
Abstract
Real-time, precise monitoring of the number and distribution of indoor personnel is crucial for building safety management, operational optimization, and personnel scheduling. However, narrow entrances and high-density passageways often lead to missed detections, false positives, and tracking failures in pedestrian detection, thereby reducing cross-line counting accuracy. Additionally, edge devices deployed in practical scenarios frequently process multiple video streams simultaneously, resulting in computational resource constraints. To address these challenges, this paper proposes a lightweight, enhanced multi-object pedestrian tracking and counting method tailored for indoor scenarios by optimizing deep learning models. Firstly, modular optimizations are applied to the YOLOv8n model to construct a more lightweight detector, RL_YOLOv8, reducing computational overhead while maintaining accuracy. Secondly, correlated pedestrian auxiliary prediction and pedestrian position change constraints are employed to mitigate ID switching, tracking interruptions, and trajectory jumps in dense scenes. Finally, a buffer zone auxiliary counting strategy is designed to further reduce missed detections of pedestrians crossing lines. Experimental results demonstrate that compared to the original detection-and-tracking-based line-crossing counting method, the improved approach effectively enhances counting accuracy and real-time performance, better meeting the requirements of practical intelligent security and crowd monitoring systems. Full article
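The buffer-zone counting strategy mentioned in the abstract can be illustrated with a minimal toy sketch (a hypothetical 1-D model for illustration only, not the paper's RL_YOLOv8 pipeline): a track registers a crossing only after it fully traverses a band around the counting line, so detections that jitter back and forth near the line are not double-counted. The buffer width is an assumed value.

```python
BUFFER = 15  # half-width of the buffer band in pixels (assumed value)

def zone(y, line_y, buffer=BUFFER):
    """Classify a y-coordinate as above (-1), inside (0), or below (+1) the band."""
    if y < line_y - buffer:
        return -1
    if y > line_y + buffer:
        return 1
    return 0

def count_crossings(track_ys, line_y, buffer=BUFFER):
    """Count full traversals of the buffer band for one track's y positions."""
    crossings = 0
    last_side = None  # last zone observed outside the band (-1 or +1)
    for y in track_ys:
        z = zone(y, line_y, buffer)
        if z == 0:
            continue  # still inside the band: defer the decision
        if last_side is not None and z != last_side:
            crossings += 1
        last_side = z
    return crossings
```

A track that oscillates inside the band (e.g. detector jitter around the line) contributes no count until it emerges on the opposite side.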

45 pages, 2842 KB  
Article
A Taxonomy of Generative Models with a Focus on Diffusion Models and Denoising Techniques
by Aditi Singh, Nikhil Kumar Chatta, Yuvaraj Vagula, Abul Ehtesham, Saket Kumar and Tala Talaei Khoei
Electronics 2026, 15(6), 1293; https://doi.org/10.3390/electronics15061293 - 19 Mar 2026
Viewed by 501
Abstract
Diffusion models have emerged as a powerful class of generative models, demonstrating impressive results across visual domains such as image and video synthesis. This survey provides a comprehensive taxonomy of generative models, with a particular focus on diffusion models and their applications in enhancing visual fidelity for text-to-image and text-to-video generation. We discuss the theoretical foundations of diffusion models, including their formulation through stochastic differential equations, and analyze the forward noising and reverse denoising processes that enable stable training and high-quality generation. The survey further categorizes diffusion architectures, including pixel-space and latent-space models, and examines their design choices, training strategies, and trade-offs across different resolution regimes. In addition, we review noise characteristics in real-world imaging domains and discuss their implications for diffusion-based models. Denoising strategies are analyzed by distinguishing between in-model denoising mechanisms and external denoising techniques used in preprocessing and post-processing pipelines. The survey also summarizes commonly used datasets and evaluation metrics for generative modeling, providing a practical perspective on benchmarking and model comparison. Finally, we discuss current challenges, including computational efficiency, scalability, and robustness to diverse noise distributions, and outline potential directions for future research. This survey aims to provide a structured reference for understanding diffusion models and their applications in visual generation tasks. Full article
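The forward noising process the survey analyzes has a well-known closed form: with a variance schedule β_t and ᾱ_t = ∏(1 − β_s), a noisy sample is x_t = √ᾱ_t·x₀ + √(1 − ᾱ_t)·ε. Here is a minimal numerical sketch under a standard DDPM-style linear schedule (the schedule values are illustrative assumptions, not taken from the article):

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)    # linear variance schedule (assumed values)
alphas_bar = np.cumprod(1.0 - betas)  # signal-retention factor alpha_bar_t

def forward_noise(x0, t, rng):
    """Sample x_t ~ q(x_t | x_0) directly, without iterating through t steps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps

rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 8))
x_noisy = forward_noise(x0, T - 1, rng)  # by the final step, nearly pure noise
```

By the last step almost no signal remains (`alphas_bar[-1]` is on the order of 1e-5 under this schedule), which is what lets the reverse denoising process start from pure Gaussian noise.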
(This article belongs to the Special Issue Autonomous Intelligence: Concepts and Applications of Agentic AI)

18 pages, 1050 KB  
Article
Research on Fire Smoke Recognition Algorithm with Image Enhancement for Unconventional Scenarios in Under-Construction Nuclear Power Plants
by Tingren Wang, Guangwei Liu, Kai Yu and Baolin Yao
Fire 2026, 9(3), 128; https://doi.org/10.3390/fire9030128 - 17 Mar 2026
Viewed by 499
Abstract
Accurate identification of fire smoke is a key link in realizing early fire prevention and control. Traditional intelligent video and image processing technologies are significantly restricted by environmental factors, with weak anti-interference capabilities and limited ability to distinguish fire smoke, leading to a high false alarm rate. To address this problem, this paper proposes an unconventional-visual-field smoke detection method based on image enhancement. The method improves the Retinex algorithm by integrating improved guided filtering, adaptive brightness correction, and joint CLAHE-WWGIF processing, achieving targeted optimization for interference factors unique to under-construction nuclear power plants, such as water mist, low illumination, and equipment occlusion. First, the improved Retinex algorithm is used to raise image brightness and contrast, retain edge details while avoiding halo artifacts, reduce the impact of noise, and optimize visual features. Then, the sample dataset is integrated, and the YOLOv11 target detection algorithm is used to accurately identify and localize smoke targets. Experimental data show that the method achieves accuracy rates of 93.6% and 92.3% for fire smoke identification in interference-prone scenarios such as dark nights and water mist, respectively, with response times of only 1.8 s and 2.1 s. In practical on-site applications at nuclear power plant construction sites, the method is integrated into an "edge computing + distributed deployment" hardware system, realizing real-time smoke detection in core areas such as nuclear islands and conventional islands with a false alarm rate below 5% and a detection delay of ≤300 ms, meeting the ultra-strict safety monitoring requirements of nuclear power projects. Experiments show that the method can be effectively applied to smoke detection under unconventional visual fields, accurately identifying smoke, significantly reducing the false alarm rate of fire detection, and providing reliable technical support for the safety of under-construction nuclear power plants. Full article
(This article belongs to the Special Issue Fire Risk Management and Emergency Prevention)

22 pages, 327 KB  
Article
From Participants to Community Partners: A Novel Community-Based Participatory Research (CBPR) Approach to Autistic-Led Inquiry in Digital and Virtual Environments
by Vivian Darlene Grillo, Margherita Zani, Vittoria Veronesi and Paola Venuti
Healthcare 2026, 14(6), 702; https://doi.org/10.3390/healthcare14060702 - 10 Mar 2026
Viewed by 452
Abstract
Background/Objectives: Autism research has often interpreted autistic sociality through neurotypical norms, limiting ecological accounts of autistic meaning-making and context-sensitive support needs. Social virtual environments (SVEs), such as VRChat, allow modulation of sensory exposure, social distance, and participation pace, potentially enabling autistic-led interaction with greater autonomy and predictability. This study examined how autistic young adults co-construct meanings around social interaction, identity, and self-regulation in peer-led discussions within an SVE; identified context-sensitive processes relevant to well-being; and evaluated the feasibility and acceptability of SVEs as a participatory research setting. Methods: Sixteen autistic young adults (18–38 years; DSM-5-TR, Level 1) participated in nine remote sessions conducted in VRChat, coordinated via a co-designed Discord server. The peer-led discussions were audio-video recorded, transcribed, and anonymized. Data were analyzed using reflexive thematic analysis, combining inductive session-level coding, cross-session thematic clustering, and participatory refinement with community partners. Results: Autistic experience was framed as a context-dependent negotiation of interpretive risk, interactional workload, masking-related energy costs, and epistemic injustice, alongside future-oriented accounts emphasizing access, dignity, and systemic redesign. Observational memos documented multimodal participation, distributed peer facilitation, and accessibility-relevant sensitivities to environmental stability. Community partners reported positive experiences and supported the acceptability of private-world VRChat sessions. Conclusions: Peer-led discussions in an SVE can support ecologically grounded, participant-centered qualitative research, offering methodological opportunities to study autistic meaning-making under conditions that reduce demands and risks. Full article
20 pages, 9856 KB  
Article
Dynamic Characteristics Analysis of the Slumping-Disintegrated Evolution Process of a Tower-Column Unstable Rock Mass: A Case Study of the Large-Scale Collapse of Zengziyan in Jinfo Mountain
by Fuchuan Zhou, Xinrong Liu, Dandan Zuo, Hongmei Tang, Yuntao Zhou and Xueyan Guo
Appl. Sci. 2026, 16(5), 2282; https://doi.org/10.3390/app16052282 - 26 Feb 2026
Viewed by 250
Abstract
Studying the slumping disintegration, movement speed, impact intensity, accumulation characteristics, and energy conversion laws of tower-column unstable rock masses (TCURM) is crucial for high-altitude rockfall hazard risk evaluation. Existing PFC-based rockfall simulations rarely target the unique “top-hard-bottom-weak” structural characteristics of TCURM and lack in-depth integration of on-site monitoring videos to verify dynamic evolution processes. Taking the large-scale collapse of W12# unstable rock mass at Zengziyan, Jinfo Mountain in Chongqing as an example, a combination method of orthogonal test and PFC3D discrete element simulation is used. Mesoscopic parameters are calibrated via comparison with on-site video and investigation data, accurately reproducing the entire slumping disintegration process and revealing its dynamic characteristics. Results confirm the simulation is basically consistent with field data, verifying the model and parameter rationality. The total duration from instability to stagnation is 121 s (15 s to impact the secondary steep cliff base, 106 s for debris accumulation). Movement speed time-histories of deteriorated and non-deteriorated zones are generally consistent, both exhibiting a “double-peak” feature. Rockfall impact force first increases, stabilizes in the middle, and declines to stability afterward, with a maximum of 2.1 × 109 N. The kinetic energy curve also shows a “double-peak” distribution, closely related to the on-site two-level steep cliff morphology. The findings provide important references for analyzing the dynamic evolution of such rockfalls and designing disaster prevention/mitigation engineering. Full article
(This article belongs to the Special Issue Dynamics of Geohazards)

28 pages, 2555 KB  
Article
Deep Learning-Based Video Watermarking: A Robust Framework for Spatial–Temporal Embedding and Retrieval
by Antonio Cedillo-Hernandez, Lydia Velazquez-Garcia, Francisco Javier Garcia-Ugalde and Manuel Cedillo-Hernandez
Future Internet 2026, 18(2), 104; https://doi.org/10.3390/fi18020104 - 16 Feb 2026
Cited by 1 | Viewed by 669
Abstract
This paper introduces a deep learning-based framework for video watermarking that achieves robust, imperceptible, and fast embedding under a wide range of visual and temporal conditions. The proposed method is organized into seven modules that collaboratively perform frame encoding, semantic region analysis, block selection, watermark transformation, and spatiotemporal injection, followed by decoding and multi-objective optimization. A key component of the framework is its ability to learn a visual importance map, which guides a saliency-based block selection strategy. This allows the model to embed the watermark in perceptually redundant regions while minimizing distortion. To enhance resilience, the watermark is distributed across multiple frames, leveraging temporal redundancy to improve recovery under frame loss, insertion, and reordering. Experimental evaluations conducted on a large-scale video dataset demonstrate that the proposed method achieves high fidelity while preserving low decoding error rates under compression, noise, and temporal distortions. The proposed method processes 38 video frames per second on a standard GPU. Additional ablation studies confirm the contribution of each module to the system's robustness. This framework offers a promising solution for watermarking in streaming, surveillance, and content verification applications. Full article
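The temporal-redundancy idea — distributing the watermark across multiple frames so the payload survives frame loss — can be sketched as a toy model (an assumed round-robin placement with majority-vote recovery, for illustration only; it is not the authors' learned embedding scheme):

```python
from collections import defaultdict

def embed(bits, n_frames):
    """Map frame index -> list of (bit_index, bit_value) carried by that frame."""
    frames = defaultdict(list)
    for f in range(n_frames):
        i = f % len(bits)          # round-robin bit assignment across frames
        frames[f].append((i, bits[i]))
    return frames

def recover(frames, n_bits):
    """Majority-vote each payload bit over whichever frames survived."""
    votes = defaultdict(list)
    for carried in frames.values():
        for i, b in carried:
            votes[i].append(b)
    out = []
    for i in range(n_bits):
        v = votes[i]
        out.append(round(sum(v) / len(v)) if v else 0)
    return out

payload = [1, 0, 1, 1]
frames = embed(payload, 12)   # each bit lands in 3 of the 12 frames
del frames[0], frames[5]      # simulate two lost frames
recovered = recover(frames, len(payload))
```

Because each bit is carried by several frames, dropping or reordering a minority of them still yields the original payload.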
(This article belongs to the Section Big Data and Augmented Intelligence)

17 pages, 318 KB  
Entry
Artificial Intelligence and the Transformation of the Media System
by Georgiana Camelia Stănescu
Encyclopedia 2026, 6(2), 45; https://doi.org/10.3390/encyclopedia6020045 - 10 Feb 2026
Viewed by 1606
Definition
Artificial intelligence is increasingly being used in all branches of the media system and has transformed the way specialists in this field work in recent years. Currently, applications of artificial intelligence are used across a range of processes involved in the production, editing, distribution, and consumption of media content. These include technologies such as generative chatbots, automated transcription, writing, translation, and editing tools, as well as applications for image and video creation. All of these types of applications have taken over a significant portion of the traditional activities carried out by media professionals. From a technological point of view, these uses primarily rely on machine learning, natural language processing, and computer vision techniques, complemented by generative models that automatically analyze, generate, and interpret text, sound, and images. Although these technologies contribute to increased efficiency, faster work, and reduced operating costs, they also pose significant risks, particularly regarding the spread of false information. From a theoretical perspective, artificial intelligence goes beyond the status of a technological tool, being conceptualized as a communicational actor that actively intervenes in the generation, structuring, and circulation of messages, influencing the relationships between producers, content, and audiences in the current media environment. Full article
(This article belongs to the Collection Encyclopedia of Social Sciences)

30 pages, 12207 KB  
Article
Automatic Identification and Segmentation of Diffuse Aurora from Untrimmed All-Sky Auroral Videos
by Qian Wang, Peiqi Hao and Han Pan
Remote Sens. 2026, 18(3), 402; https://doi.org/10.3390/rs18030402 - 25 Jan 2026
Viewed by 534
Abstract
Diffuse aurora is a widespread and long-lasting auroral emission that plays an important role in diagnosing magnetosphere-ionosphere coupling and magnetospheric plasma transport. Despite its scientific significance, diffuse aurora remains challenging to identify automatically in all-sky imager (ASI) observations due to its weak optical intensity, indistinct boundaries, and gradual temporal evolution. These characteristics, together with frequent cloud contamination, limit the effectiveness of conventional keogram-based or morphology-driven detection approaches and hinder large-scale statistical analyses based on long-term optical datasets. In this study, we propose an automated framework for the identification and temporal segmentation of diffuse aurora from untrimmed all-sky auroral videos. The framework consists of a frame-level coarse identification module that combines weak morphological information with inter-frame temporal dynamics to detect candidate diffuse-auroral intervals, and a snippet-level segmentation module that dynamically aggregates temporal information to capture the characteristic gradual onset-plateau-decay evolution of diffuse aurora. Bidirectional temporal modeling is employed to improve boundary localization, while an adaptive mixture-of-experts mechanism reduces redundant temporal variations and enhances discriminative features relevant to diffuse emission. The proposed method is evaluated using multi-year 557.7 nm ASI observations acquired at the Arctic Yellow River Station. Quantitative experiments demonstrate state-of-the-art performance, achieving 96.3% frame-wise accuracy and an Edit score of 87.7%. Case studies show that the method effectively distinguishes diffuse aurora from cloud-induced pseudo-diffuse structures and accurately resolves gradual transition boundaries that are ambiguous in keograms. Based on the automated identification results, statistical distributions of diffuse aurora occurrence, duration, and diurnal variation are derived from continuous observations spanning 2003–2009. The proposed framework enables robust and fully automated processing of large-scale all-sky auroral images, providing a practical tool for remote sensing-based auroral monitoring and supporting objective statistical studies of diffuse aurora and related magnetospheric processes. Full article

13 pages, 1497 KB  
Article
A Spatio-Temporal Model for Intelligent Vehicle Navigation Using Big Data and SparkML LSTM
by Imad El Mallahi, Jamal Riffi, Hamid Tairi, Mostafa El Mallahi and Mohamed Adnane Mahraz
World Electr. Veh. J. 2026, 17(1), 54; https://doi.org/10.3390/wevj17010054 - 22 Jan 2026
Viewed by 402
Abstract
The rapid development of autonomous driving systems has increased the demand for scalable frameworks capable of modeling vehicle motion patterns in complex traffic environments. This paper proposes a big data spatio-temporal modeling architecture that integrates Apache Spark version 4.0.1 (SparkML) with Long Short-Term Memory (LSTM) networks to analyze and classify vehicle trajectory patterns. The proposed SparkML–LSTM framework exploits Spark's distributed processing capabilities and LSTM's strength in sequential learning to handle large-scale traffic trajectory data efficiently. Experiments were conducted using the DETRAC dataset, a large-scale benchmark for vehicle detection and multi-object tracking consisting of more than 10 h of video captured at 24 different locations. The videos were recorded at 25 frames per second with a resolution of 960 × 540 pixels and annotated across more than 140,000 frames, covering 8,250 vehicles and approximately 1.21 million bounding box annotations. The dataset provides detailed annotations, including vehicle categories (Car, Bus, Van, Others), weather conditions (Sunny, Cloudy, Rainy, Night), occlusion ratio, truncation ratio, and vehicle scale. Based on the extracted trajectory features, vehicle motion patterns were categorized into predefined movement classes derived from trajectory dynamics. The experimental results demonstrate strong classification performance. These findings suggest that the proposed SparkML–LSTM architecture is effective for large-scale spatio-temporal trajectory modeling and traffic behavior analysis, and can serve as a foundation for higher-level decision-making modules in intelligent transportation systems. Full article
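As a toy illustration of the kind of per-step trajectory features such a pipeline might feed to a sequence model (the feature choice is an assumption for illustration, not the paper's actual feature set; the 25 fps frame rate is taken from the abstract):

```python
import math

FPS = 25  # DETRAC videos are recorded at 25 frames per second

def motion_features(centers, fps=FPS):
    """Per-step (speed in px/s, heading in radians) from (x, y) box centers."""
    feats = []
    for (x0, y0), (x1, y1) in zip(centers, centers[1:]):
        dx, dy = x1 - x0, y1 - y0
        feats.append((math.hypot(dx, dy) * fps,  # pixel speed per second
                      math.atan2(dy, dx)))       # direction of travel
    return feats

# A short straight-then-turning track: 4 px/frame rightward, then a bend.
track = [(100, 200), (104, 200), (108, 200), (112, 204)]
feats = motion_features(track)
```

A fixed-length window of such (speed, heading) pairs is the natural sequential input shape for an LSTM-style classifier.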
(This article belongs to the Section Automated and Connected Vehicles)

21 pages, 75033 KB  
Article
From Stones to Screen: Open-Source 3D Modeling and AI Video Generation for Reconstructing the Coëby Necropolis
by Jean-Baptiste Barreau and Philippe Gouézin
Heritage 2026, 9(1), 24; https://doi.org/10.3390/heritage9010024 - 10 Jan 2026
Viewed by 1095
Abstract
This study presents a comprehensive digital workflow for the archaeological investigation and heritage enhancement of the Coëby megalithic necropolis (Brittany, France). Dating to the Middle Neolithic, between the 4th and 3rd millennia BC, this chronology is established through stratigraphy, material culture, and radiocarbon dating. Focusing on cairns TRED 8 and TRED 9, which are two excavation units, we combined field archaeology, photogrammetry, and topographic data with open-source 3D geometric modeling to reconstruct the monuments’ original volumes and test construction hypotheses. The methodology leveraged the free software Blender (version 3.0.1) and its Bagapie extension for the procedural simulation of lithic block distribution within the tumular masses, ensuring both metric accuracy and realistic texturing. Beyond static reconstruction, the research explores innovative dynamic and narrative visualization techniques. We employed the FILM model for smooth video interpolation of the construction sequences and utilized the Wan 2.1 AI model to generate immersive video scenes of Neolithic life based on archaeologically informed prompts. The entire process, from data acquisition to final visualization, was conducted using free and open-source tools, guaranteeing full methodological reproducibility and alignment with open science principles. Our results include detailed 3D reconstructions that elucidate the complex architectural sequences of the cairns, as well as dynamic visualizations that enhance the understanding of their construction logic. This study demonstrates the analytical potential of open-source 3D modelling and AI-based visualisation for megalithic archaeology. Full article
(This article belongs to the Topic 3D Documentation of Natural and Cultural Heritage)

16 pages, 235 KB  
Entry
Popular Culture in a Digital Society: Nine Paradoxes
by Sue Spaid
Encyclopedia 2026, 6(1), 12; https://doi.org/10.3390/encyclopedia6010012 - 6 Jan 2026
Viewed by 1070
Definition
This entry, which identifies nine paradoxes particular to popular culture in a digital society, begins by distinguishing art and culture, since scholars have historically relied on these terms to differentiate popular culture, mass culture, and mass art. Digital societies, which exist both online and offline, are awash in digital products such as LED signs, digital imagery, video games, film, podcasts, and social media. In a digital society, popular culture is effectively “mass art,” which exhibits five properties: (1) digital media’s low-cost products and low-skill tools are (2) created and distributed to appeal to as broad a cultural sector as possible (qualitative) and thus aim to (3) attract consumers (quantitative) who capably enjoy and deploy cultural content both (4) offline and online, yet “popularity” ultimately depends on (5) efforts to maximize unity and minimize fragmentation. Except for localized events, popular culture has largely disappeared, while mass art will likely flourish until human beings clamor once again for firsthand experiences or go extinct. The next frontier will be finding ways to prevent artificial intelligence from producing cultural products, not because they will be terrible, undesirable, or fake, but because the culture-making process itself engenders human wellbeing. Full article
(This article belongs to the Section Social Sciences)
17 pages, 3550 KB  
Article
Edge Intelligence-Based Rail Transit Equipment Inspection System
by Lijia Tian, Hongli Zhao, Li Zhu, Hailin Jiang and Xinjun Gao
Sensors 2026, 26(1), 236; https://doi.org/10.3390/s26010236 - 30 Dec 2025
Cited by 1 | Viewed by 720
Abstract
The safe operation of rail transit systems relies heavily on the efficient and reliable maintenance of their equipment, as any malfunction or abnormal operation may pose serious risks to transportation safety. Traditional manual inspection methods are often characterized by high costs, low efficiency, and susceptibility to human error. To address these limitations, this paper presents a rail transit equipment inspection system based on Edge Intelligence (EI) and 5G technology. The proposed system adopts a cloud–edge–end collaborative architecture that integrates Computer Vision (CV) techniques to automate inspection tasks; specifically, a fine-tuned YOLOv8 model is employed for object detection of personnel and equipment, while a ResNet-18 network is utilized for equipment status classification. By implementing an ETSI MEC-compliant framework on edge servers (NVIDIA Jetson AGX Orin), the system enhances data processing efficiency and network performance, while further strengthening security through the use of a 5G private network that isolates critical infrastructure data from the public internet, and improving robustness via distributed edge nodes that eliminate single points of failure. The proposed solution has been deployed and evaluated in real-world scenarios on Beijing Metro Line 6. Experimental results demonstrate that the YOLOv8 model achieves a mean Average Precision (mAP@0.5) of 92.7% ± 0.4% for equipment detection, and the ResNet-18 classifier attains 95.8% ± 0.3% accuracy in distinguishing normal and abnormal statuses. Compared with a cloud-centric architecture, the EI-based system reduces the average end-to-end latency for anomaly detection tasks by 45% (28.5 ms vs. 52.1 ms) and significantly lowers daily bandwidth consumption by approximately 98.1% (from 40.0 GB to 0.76 GB) through an event-triggered evidence upload strategy involving images and short video clips, highlighting its superior real-time performance, security, robustness, and bandwidth efficiency. Full article
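The relative reductions quoted in the abstract can be verified with the standard before/after formula (the latency and bandwidth figures below are copied from the text; the helper function name is ours):

```python
def reduction_pct(before, after):
    """Relative reduction from `before` to `after`, as a percentage."""
    return (before - after) / before * 100.0

latency_cut = reduction_pct(52.1, 28.5)    # ms: cloud-centric vs. edge
bandwidth_cut = reduction_pct(40.0, 0.76)  # GB/day: raw vs. event-triggered

print(f"latency:   {latency_cut:.1f}%")    # ~45.3%, reported as 45%
print(f"bandwidth: {bandwidth_cut:.1f}%")  # 98.1%, matching the abstract
```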

24 pages, 3319 KB  
Article
NovAc-DL: Novel Activity Recognition Based on Deep Learning in the Real-Time Environment
by Saksham Singla, Sheral Singla, Karan Singla, Priya Kansal, Sachin Kansal, Alka Bishnoi and Jyotindra Narayan
Big Data Cogn. Comput. 2026, 10(1), 11; https://doi.org/10.3390/bdcc10010011 - 29 Dec 2025
Viewed by 843
Abstract
Real-time fine-grained human activity recognition (HAR) remains a challenging problem due to rapid spatial–temporal variations, subtle motion differences, and dynamic environmental conditions. Addressing this difficulty, we propose NovAc-DL, a unified deep learning framework designed to accurately classify short human-like actions, specifically, “pour” and “stir” from sequential video data. The framework integrates adaptive time-distributed convolutional encoding with temporal reasoning modules to enable robust recognition under realistic robotic-interaction conditions. A balanced dataset of 2000 videos was curated and processed through a consistent spatiotemporal pipeline. Three architectures, LRCN, CNN-TD, and ConvLSTM, were systematically evaluated. CNN-TD achieved the best performance, reaching 98.68% accuracy with the lowest test loss (0.0236), outperforming the other models in convergence speed, generalization, and computational efficiency. Grad-CAM visualizations further confirm that NovAc-DL reliably attends to motion-salient regions relevant to pouring and stirring gestures. These results establish NovAc-DL as a high-precision real-time-capable solution for deployment in healthcare monitoring, industrial automation, and collaborative robotics. Full article
30 pages, 10600 KB  
Article
Edge-to-Cloud Continuum Orchestrator Based on Heterogeneous Nodes for Urban Traffic Monitoring
by Pietro Ruiu, Andrea Lagorio, Claudio Rubattu, Matteo Anedda, Michele Sanna and Mauro Fadda
Future Internet 2025, 17(12), 574; https://doi.org/10.3390/fi17120574 - 13 Dec 2025
Viewed by 1130
Abstract
This paper presents an edge-to-cloud orchestrator capable of supporting services running at the edge on heterogeneous nodes based on general-purpose processing units and a Field Programmable Gate Array (FPGA) platform (i.e., the AMD Kria K26 SoM) in an urban environment, integrated with a series of cloud-based services and capable of minimizing energy consumption. A use case of vehicle traffic monitoring is considered in a mobility scenario involving computing nodes equipped with video acquisition systems to evaluate the feasibility of the system. Since the use case concerns the monitoring of vehicular traffic through AI-based image and video processing, specific support for application orchestration in the form of containers was required. The development addressed the feasibility of managing containers with hardware acceleration derived from the Vitis AI design flow, leveraged to accelerate AI inference on the AMD Kria K26 SoM. A Kubernetes-based controller node was designed to facilitate the tracking and monitoring of specific vehicles. These vehicles may either be flagged by law enforcement authorities due to legal concerns or identified by the system itself through detection mechanisms deployed in the computing nodes. Strategically distributed across the city, these nodes continuously analyze traffic, identifying vehicles that match the search criteria. Using containerized microservices and Kubernetes orchestration, the infrastructure ensures that tracking operations remain uninterrupted even in high-traffic scenarios. Full article
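The distributed tracking logic described above — edge nodes filtering their local detections against a watchlist and reporting only hits upward — can be sketched in a few lines. All names here (`EdgeNode`, `Controller`, the plate strings) are illustrative assumptions, not the paper's implementation, which runs as containerized microservices under Kubernetes:

```python
# Sketch: edge nodes match locally detected vehicles against a
# watchlist and report sightings to a central controller, so only
# relevant detections travel from edge to cloud.

class Controller:
    """Cloud-side collector of sightings reported by edge nodes."""
    def __init__(self, watchlist):
        self.watchlist = set(watchlist)
        self.sightings = []  # list of (node_id, plate) tuples

    def report(self, node_id, plate):
        self.sightings.append((node_id, plate))

class EdgeNode:
    """An edge node that filters its own detections before reporting."""
    def __init__(self, node_id, controller):
        self.node_id = node_id
        self.controller = controller

    def process_detections(self, plates):
        # Forward only watchlisted vehicles, keeping uplink traffic low.
        for plate in plates:
            if plate in self.controller.watchlist:
                self.controller.report(self.node_id, plate)

controller = Controller(watchlist=["AB123CD"])
node_a = EdgeNode("crossing-a", controller)
node_b = EdgeNode("crossing-b", controller)

node_a.process_detections(["XY999ZZ", "AB123CD"])
node_b.process_detections(["AB123CD"])
print(controller.sightings)
# [('crossing-a', 'AB123CD'), ('crossing-b', 'AB123CD')]
```

Filtering at the edge is what lets the inference workload (and the FPGA acceleration behind it) scale with the number of nodes rather than with total traffic volume.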
(This article belongs to the Special Issue Convergence of IoT, Edge and Cloud Systems)

17 pages, 3112 KB  
Article
Predicting Axillary Lymph Node Metastasis of Breast Cancer Using Joint Pre-Trained Fine-Tuning and Contrastive Learning for Contrast-Enhanced Ultrasound
by Rong Huang, Mengshi Tang, Lin Pan, Shaohua Zheng, Shu Chen and Yijie Chen
Bioengineering 2025, 12(12), 1335; https://doi.org/10.3390/bioengineering12121335 - 8 Dec 2025
Viewed by 604
Abstract
Objectives: Breast cancer is one of the most common malignant tumors among women worldwide, and accurate assessment of axillary lymph node metastasis (ALNM) is crucial for determining treatment strategies. Compared to conventional ultrasound, contrast-enhanced ultrasound (CEUS) can observe blood perfusion and microcirculation changes in primary breast tumors, making it a more suitable diagnostic method for ALNM. Methods: Interpreting CEUS video sequences requires a high level of diagnostic experience from clinicians, and the process is time-consuming and labor-intensive, making it challenging to assemble large datasets for deep learning models. To address these issues, we proposed a method for predicting breast cancer ALNM that combines pre-trained fine-tuning with contrastive learning. First, within a text–video contrastive learning framework, we fine-tuned pre-trained weights from a large general dataset using a small-scale proprietary dataset. Second, during the fine-tuning phase, we employed random prompt optimization to specifically adjust the text encoder according to the characteristics of breast CEUS videos, and optimized the extracted text and video representations through an adaptive fine-tuning optimizer to better fit the current data distribution. Results: Experimental results demonstrated that our method achieved a sensitivity of 0.792 and a specificity of 0.800. Conclusions: The study demonstrates that the proposed method effectively leverages CEUS to aid in ALNM diagnosis, highlighting its potential to improve the accuracy of early breast cancer screening and to facilitate the development of more personalized treatment plans for patients. Full article
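The text–video contrastive framework mentioned above rests on a similarity score between a text embedding and candidate video embeddings: training pulls matched pairs together and pushes mismatched pairs apart. A minimal sketch of the usual cosine-similarity scoring follows; the toy embeddings are illustrative assumptions, not values from the paper:

```python
# Sketch: cosine-similarity scoring used in text-video contrastive
# learning. A matched (text, video) pair should score higher than a
# mismatched pair; a contrastive loss would maximize that gap.
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def contrastive_scores(text_emb, video_embs):
    """Score one text embedding against a batch of video embeddings."""
    return [cosine_similarity(text_emb, v) for v in video_embs]

text = [1.0, 0.0, 1.0]          # toy text embedding
videos = [[0.9, 0.1, 1.1],      # matching video clip (similar direction)
          [0.0, 1.0, 0.0]]      # non-matching clip (orthogonal)

scores = contrastive_scores(text, videos)
assert scores[0] > scores[1]    # the matched pair scores higher
```

Fine-tuning on a small proprietary dataset then only has to reshape these similarities for the CEUS domain, rather than learn video representations from scratch.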
(This article belongs to the Special Issue Advances in Medical 3D Vision: Voxels and Beyond)
