Search Results (3,346)

Search Parameters:
Keywords = video image

10 pages, 426 KiB  
Proceeding Paper
Guiding or Misleading: Challenges of Artificial Intelligence-Generated Content in Heuristic Teaching: ChatGPT
by Ping-Kuo A. Chen
Eng. Proc. 2025, 103(1), 1; https://doi.org/10.3390/engproc2025103001 - 4 Aug 2025
Abstract
Artificial intelligence (AI)-generated content (AIGC) is an innovative technology that utilizes machine learning, AI models, reward modeling, and natural language processing (NLP) to create diverse digital content such as videos, images, and text. It has the potential to support various human activities, with significant implications for teaching and learning, and can facilitate heuristic teaching for educators. By using AIGC, teachers can create extensive knowledge content and effectively design instructional strategies to guide students, in line with heuristic teaching. However, incorporating AIGC into heuristic teaching raises controversies and concerns, since generated content can mislead outcomes. Nevertheless, leveraging AIGC greatly benefits teachers in enhancing heuristic teaching. When integrating AIGC to support heuristic teaching, challenges and risks must be acknowledged and addressed: users need sufficient knowledge reserves to identify incorrect information and content generated by AIGC; excessive reliance on AIGC must be avoided; users must remain in control of their actions rather than being driven by AIGC; and the accuracy of information and knowledge generated by AIGC must be scrutinized and verified to preserve its effectiveness.

12 pages, 480 KiB  
Article
A Novel Deep Learning Model for Predicting Colorectal Anastomotic Leakage: A Pioneer Multicenter Transatlantic Study
by Miguel Mascarenhas, Francisco Mendes, Filipa Fonseca, Eduardo Carvalho, Andre Santos, Daniela Cavadas, Guilherme Barbosa, Antonio Pinto da Costa, Miguel Martins, Abdullah Bunaiyan, Maísa Vasconcelos, Marley Ribeiro Feitosa, Shay Willoughby, Shakil Ahmed, Muhammad Ahsan Javed, Nilza Ramião, Guilherme Macedo and Manuel Limbert
J. Clin. Med. 2025, 14(15), 5462; https://doi.org/10.3390/jcm14155462 - 3 Aug 2025
Viewed by 56
Abstract
Background/Objectives: Colorectal anastomotic leak (CAL) is one of the most severe postoperative complications in colorectal surgery, impacting patient morbidity and mortality. Current risk assessment methods rely on clinical and intraoperative factors, but no real-time predictive tool exists. This study aimed to develop an artificial intelligence model based on intraoperative laparoscopic recording of the anastomosis for CAL prediction. Methods: A convolutional neural network (CNN) was trained with annotated frames from colorectal surgery videos across three international high-volume centers (Instituto Português de Oncologia de Lisboa, Hospital das Clínicas de Ribeirão Preto, and Royal Liverpool University Hospital). The dataset included a total of 5356 frames from 26 patients, 2007 with CAL and 3349 showing normal anastomosis. Four CNN architectures (EfficientNetB0, EfficientNetB7, ResNet50, and MobileNetV2) were tested. The models’ performance was evaluated using their sensitivity, specificity, accuracy, and area under the receiver operating characteristic (AUROC) curve. Heatmaps were generated to identify key image regions influencing predictions. Results: The best-performing model achieved an accuracy of 99.6%, AUROC of 99.6%, sensitivity of 99.2%, specificity of 100.0%, PPV of 100.0%, and NPV of 98.9%. The model reliably identified CAL-positive frames and provided visual explanations through heatmaps. Conclusions: To our knowledge, this is the first AI model developed to predict CAL using intraoperative video analysis. Its accuracy suggests the potential to redefine surgical decision-making by providing real-time risk assessment. Further refinement with a larger dataset and diverse surgical techniques could enable intraoperative interventions to prevent CAL before it occurs, marking a paradigm shift in colorectal surgery.
(This article belongs to the Special Issue Updates in Digestive Diseases and Endoscopy)
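As a rough illustration of the training setup described above, the following is a minimal sketch of frame-level binary classification with a pretrained EfficientNetB0, assuming PyTorch and torchvision; the folder layout, batch size, and learning rate are hypothetical, not the authors' exact configuration:

```python
# Minimal sketch: frame-level binary classification (CAL vs. normal anastomosis)
# with a pretrained EfficientNetB0 backbone. Dataset layout is hypothetical.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
# Hypothetical folder layout: frames/{cal,normal}/*.png
train_set = datasets.ImageFolder("frames", transform=transform)
loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

model = models.efficientnet_b0(weights=models.EfficientNet_B0_Weights.DEFAULT)
model.classifier[1] = nn.Linear(model.classifier[1].in_features, 2)  # CAL / normal

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()
for images, labels in loader:  # one pass shown; real training runs many epochs
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```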

10 pages, 1055 KiB  
Article
Artificial Intelligence and Hysteroscopy: A Multicentric Study on Automated Classification of Pleomorphic Lesions
by Miguel Mascarenhas, Carla Peixoto, Ricardo Freire, Joao Cavaco Gomes, Pedro Cardoso, Inês Castro, Miguel Martins, Francisco Mendes, Joana Mota, Maria João Almeida, Fabiana Silva, Luis Gutierres, Bruno Mendes, João Ferreira, Teresa Mascarenhas and Rosa Zulmira
Cancers 2025, 17(15), 2559; https://doi.org/10.3390/cancers17152559 - 3 Aug 2025
Viewed by 126
Abstract
Background/Objectives: The integration of artificial intelligence (AI) in medical imaging is rapidly advancing, yet its application in gynecology remains limited. This proof-of-concept study presents the development and validation of a convolutional neural network (CNN) designed to automatically detect and classify endometrial polyps. Methods: A multicenter dataset (n = 3) comprising 65 hysteroscopies was used, yielding 33,239 frames and 37,512 annotated objects. Still frames were extracted from full-length videos and annotated for the presence of histologically confirmed polyps. A YOLOv1-based object detection model was used with a 70–20–10 split for training, validation, and testing. Primary performance metrics included recall, precision, and mean average precision at an intersection over union (IoU) ≥ 0.50 (mAP50). Frame-level classification metrics were also computed to evaluate clinical applicability. Results: The model achieved a recall of 0.96 and precision of 0.95 for polyp detection, with a mAP50 of 0.98. At the frame level, mean recall was 0.75, precision 0.98, and F1 score 0.82, confirming high detection and classification performance. Conclusions: This study presents a CNN trained on multicenter, real-world data that detects and classifies polyps simultaneously with high diagnostic and localization performance, supported by explainable AI features that enhance its clinical integration and technological readiness. Although currently limited to binary classification, this study demonstrates the feasibility and potential of AI to reduce diagnostic subjectivity and inter-observer variability in hysteroscopy. Future work will focus on expanding the model’s capabilities to classify a broader range of endometrial pathologies, enhance generalizability, and validate performance in real-time clinical settings.
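The mAP50 figure above rests on the IoU ≥ 0.50 match criterion; the following is a short sketch of that computation in plain Python, with illustrative box coordinates:

```python
# Sketch of the IoU >= 0.50 match criterion behind the reported mAP50.
def iou(box_a, box_b):
    """Boxes as (x1, y1, x2, y2). Returns intersection-over-union."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# A predicted polyp box counts as a true positive at mAP50 only if it
# overlaps a ground-truth annotation with IoU >= 0.50.
print(iou((10, 10, 50, 50), (20, 20, 60, 60)))  # ~0.39 -> not a match
```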

24 pages, 23817 KiB  
Article
Dual-Path Adversarial Denoising Network Based on UNet
by Jinchi Yu, Yu Zhou, Mingchen Sun and Dadong Wang
Sensors 2025, 25(15), 4751; https://doi.org/10.3390/s25154751 - 1 Aug 2025
Viewed by 197
Abstract
Digital image quality is crucial for reliable analysis in applications such as medical imaging, satellite remote sensing, and video surveillance. However, traditional denoising methods struggle to balance noise removal with detail preservation and lack adaptability to various types of noise. We propose a novel three-module architecture for image denoising, comprising a generator, a dual-path-UNet-based denoiser, and a discriminator. The generator creates synthetic noise patterns to augment training data, while the dual-path-UNet denoiser uses multiple receptive field modules to preserve fine details and dense feature fusion to maintain global structural integrity. The discriminator provides adversarial feedback to enhance denoising performance. This dual-path adversarial training mechanism addresses the limitations of traditional methods by simultaneously capturing both local details and global structures. Experiments on the SIDD, DND, and PolyU datasets demonstrate superior performance. We compare our architecture with the latest state-of-the-art GAN variants through comprehensive qualitative and quantitative evaluations. These results confirm effective noise removal with minimal loss of critical image details. By adapting to various types of noise while maintaining structural integrity, the proposed architecture enhances denoising in complex noise scenarios and provides a robust, detail-preserving solution for applications that require high image fidelity.
(This article belongs to the Section Sensing and Imaging)
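The adversarial interplay between denoiser and discriminator can be sketched as below, assuming PyTorch; the tiny conv stacks stand in for the paper's dual-path UNet and discriminator, and the loss weighting is an assumption:

```python
# Minimal sketch of the adversarial denoising objective: the denoiser is trained
# with a reconstruction loss plus feedback from a discriminator. All modules are
# placeholders, not the paper's architecture.
import torch
import torch.nn as nn

denoiser = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(16, 3, 3, padding=1))  # stands in for the dual-path UNet
discriminator = nn.Sequential(nn.Conv2d(3, 8, 3, stride=2), nn.ReLU(),
                              nn.Flatten(), nn.LazyLinear(1))

opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
opt_g = torch.optim.Adam(denoiser.parameters(), lr=2e-4)
bce, l1 = nn.BCEWithLogitsLoss(), nn.L1Loss()

noisy, clean = torch.rand(4, 3, 64, 64), torch.rand(4, 3, 64, 64)  # toy batch

# Discriminator step: distinguish clean images from denoised outputs.
denoised = denoiser(noisy).detach()
loss_d = bce(discriminator(clean), torch.ones(4, 1)) + \
         bce(discriminator(denoised), torch.zeros(4, 1))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Denoiser step: reconstruct the clean image while fooling the discriminator.
denoised = denoiser(noisy)
loss_g = l1(denoised, clean) + 0.01 * bce(discriminator(denoised), torch.ones(4, 1))
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```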

20 pages, 4569 KiB  
Article
Lightweight Vision Transformer for Frame-Level Ergonomic Posture Classification in Industrial Workflows
by Luca Cruciata, Salvatore Contino, Marianna Ciccarelli, Roberto Pirrone, Leonardo Mostarda, Alessandra Papetti and Marco Piangerelli
Sensors 2025, 25(15), 4750; https://doi.org/10.3390/s25154750 - 1 Aug 2025
Viewed by 205
Abstract
Work-related musculoskeletal disorders (WMSDs) are a leading concern in industrial ergonomics, often stemming from sustained non-neutral postures and repetitive tasks. This paper presents a vision-based framework for real-time, frame-level ergonomic risk classification using a lightweight Vision Transformer (ViT). The proposed system operates directly on raw RGB images without requiring skeleton reconstruction, joint angle estimation, or image segmentation. A single ViT model simultaneously classifies eight anatomical regions, enabling efficient multi-label posture assessment. Training is supervised using a multimodal dataset acquired from synchronized RGB video and full-body inertial motion capture, with ergonomic risk labels derived from RULA scores computed on joint kinematics. The system is validated on realistic, simulated industrial tasks that include common challenges such as occlusion and posture variability. Experimental results show that the ViT model achieves state-of-the-art performance, with F1-scores exceeding 0.99 and AUC values above 0.996 across all regions. Compared to previous CNN-based systems, the proposed model improves classification accuracy and generalizability while reducing complexity and enabling real-time inference on edge devices. These findings demonstrate the model’s potential for unobtrusive, scalable ergonomic risk monitoring in real-world manufacturing environments.
(This article belongs to the Special Issue Secure and Decentralised IoT Systems)
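A minimal sketch of the multi-label, eight-region classification idea, assuming PyTorch/torchvision with a stock ViT-B/16 rather than the paper's lightweight ViT; the labels and decision threshold are illustrative:

```python
# Sketch of frame-level multi-label risk classification: one ViT backbone with
# eight sigmoid outputs, one per anatomical region.
import torch
import torch.nn as nn
from torchvision import models

model = models.vit_b_16(weights=models.ViT_B_16_Weights.DEFAULT)
model.heads = nn.Linear(768, 8)  # eight anatomical regions, one logit each

criterion = nn.BCEWithLogitsLoss()  # multi-label: regions are scored independently
frames = torch.rand(2, 3, 224, 224)                # toy RGB frames
risk_labels = torch.randint(0, 2, (2, 8)).float()  # 1 = at-risk posture per region

logits = model(frames)
loss = criterion(logits, risk_labels)
risk_per_region = torch.sigmoid(logits) > 0.5  # frame-level multi-label decision
```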

28 pages, 5699 KiB  
Article
Multi-Modal Excavator Activity Recognition Using Two-Stream CNN-LSTM with RGB and Point Cloud Inputs
by Hyuk Soo Cho, Kamran Latif, Abubakar Sharafat and Jongwon Seo
Appl. Sci. 2025, 15(15), 8505; https://doi.org/10.3390/app15158505 - 31 Jul 2025
Viewed by 128
Abstract
Recently, deep learning algorithms have been increasingly applied in construction for activity recognition, particularly for excavators, to automate processes and enhance safety and productivity through continuous monitoring of earthmoving activities. These deep learning algorithms analyze construction videos to classify excavator activities for earthmoving purposes. However, previous studies have solely focused on single-source external videos, which limits the activity recognition capabilities of the deep learning algorithm. This paper introduces a novel multi-modal deep learning-based methodology for recognizing excavator activities, utilizing multi-stream input data. It processes point clouds and RGB images using the two-stream convolutional neural network-long short-term memory (CNN-LSTM) method to extract spatiotemporal features, enabling the recognition of excavator activities. A comprehensive dataset comprising 495,000 video frames of synchronized RGB and point cloud data was collected across multiple construction sites under varying conditions. The dataset encompasses five key excavator activities: Approach, Digging, Dumping, Idle, and Leveling. To assess the effectiveness of the proposed method, the performance of the two-stream CNN-LSTM architecture is compared with that of single-stream CNN-LSTM models on the same RGB and point cloud datasets, separately. The results demonstrate that the proposed multi-stream approach achieved an accuracy of 94.67%, outperforming existing state-of-the-art single-stream models, which achieved 90.67% accuracy for the RGB-based model and 92.00% for the point cloud-based model. These findings underscore the potential of the proposed activity recognition method, making it highly effective for automatic real-time monitoring of excavator activities, thereby laying the groundwork for future integration into digital twin systems for proactive maintenance and intelligent equipment management.
(This article belongs to the Special Issue AI-Based Machinery Health Monitoring)
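A toy sketch of the two-stream CNN-LSTM late-fusion pattern in PyTorch; the module sizes, the point-cloud projection to a single-channel image, and the fusion head are assumptions, not the paper's architecture:

```python
# Sketch of two-stream late fusion: per-frame CNN features from RGB and from
# point-cloud projections feed parallel LSTMs whose final states are fused for
# the five-activity classification. Shapes and modules are illustrative only.
import torch
import torch.nn as nn

class TwoStreamCNNLSTM(nn.Module):
    def __init__(self, num_classes=5):  # Approach, Digging, Dumping, Idle, Leveling
        super().__init__()
        self.cnn_rgb = nn.Sequential(nn.Conv2d(3, 8, 3, stride=2), nn.ReLU(),
                                     nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.cnn_pc = nn.Sequential(nn.Conv2d(1, 8, 3, stride=2), nn.ReLU(),
                                    nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.lstm_rgb = nn.LSTM(8, 32, batch_first=True)
        self.lstm_pc = nn.LSTM(8, 32, batch_first=True)
        self.head = nn.Linear(64, num_classes)

    def forward(self, rgb, pc):  # rgb: (B,T,3,H,W), pc: (B,T,1,H,W)
        b, t = rgb.shape[:2]
        f_rgb = self.cnn_rgb(rgb.flatten(0, 1)).view(b, t, -1)  # per-frame features
        f_pc = self.cnn_pc(pc.flatten(0, 1)).view(b, t, -1)
        _, (h_rgb, _) = self.lstm_rgb(f_rgb)   # temporal summary per stream
        _, (h_pc, _) = self.lstm_pc(f_pc)
        return self.head(torch.cat([h_rgb[-1], h_pc[-1]], dim=1))  # late fusion

model = TwoStreamCNNLSTM()
logits = model(torch.rand(2, 16, 3, 64, 64), torch.rand(2, 16, 1, 64, 64))
```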

19 pages, 3130 KiB  
Article
Deep Learning-Based Instance Segmentation of Galloping High-Speed Railway Overhead Contact System Conductors in Video Images
by Xiaotong Yao, Huayu Yuan, Shanpeng Zhao, Wei Tian, Dongzhao Han, Xiaoping Li, Feng Wang and Sihua Wang
Sensors 2025, 25(15), 4714; https://doi.org/10.3390/s25154714 - 30 Jul 2025
Viewed by 210
Abstract
The conductors of high-speed railway OCSs (Overhead Contact Systems) are susceptible to conductor galloping due to the impact of natural elements such as strong winds, rain, and snow, resulting in conductor fatigue damage and significantly compromising train operational safety. Consequently, monitoring the galloping status of conductors is crucial, and instance segmentation techniques, by delineating the pixel-level contours of each conductor, can significantly aid in the identification and study of galloping phenomena. This work expands upon the YOLO11-seg model and introduces an instance segmentation approach for galloping video and image sensor data of OCS conductors. The algorithm, designed for the stripe-like distribution of OCS conductors in the data, employs four-direction Sobel filters to extract edge features in horizontal, vertical, and diagonal orientations. These features are subsequently integrated with the original convolutional branch to form the FDSE (Four Direction Sobel Enhancement) module. It integrates the ECA (Efficient Channel Attention) mechanism for the adaptive augmentation of conductor characteristics and utilizes the FL (Focal Loss) function to mitigate the class-imbalance issue between positive and negative samples, hence enhancing the model’s sensitivity to conductors. Then, segmentation outcomes from neighboring frames are compared, and mask-difference analysis is performed to autonomously detect conductor galloping locations, emphasizing their contours for a clear depiction of galloping characteristics. Experimental results demonstrate that the enhanced YOLO11-seg model achieves 85.38% precision, 77.30% recall, 84.25% AP@0.5, 81.14% F1-score, and a real-time processing speed of 44.78 FPS. When combined with the galloping visualization module, it can issue real-time alerts of conductor galloping anomalies, providing robust technical support for railway OCS safety monitoring.
(This article belongs to the Section Industrial Sensors)
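The four-direction Sobel idea can be sketched as fixed, non-learnable convolution kernels, assuming PyTorch; the FDSE module's fusion with the learnable branch is only noted in a comment:

```python
# Sketch of four-direction Sobel filtering: fixed edge kernels for horizontal,
# vertical, and the two diagonal orientations applied as a non-learnable conv.
# Kernel values are the standard Sobel operators and their diagonal variants.
import torch
import torch.nn.functional as F

sobel = torch.tensor([
    [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]],   # vertical edges (horizontal gradient)
    [[-1, -2, -1], [0, 0, 0], [1, 2, 1]],   # horizontal edges (vertical gradient)
    [[0, 1, 2], [-1, 0, 1], [-2, -1, 0]],   # 45-degree diagonal
    [[-2, -1, 0], [-1, 0, 1], [0, 1, 2]],   # 135-degree diagonal
], dtype=torch.float32).unsqueeze(1)        # (4, 1, 3, 3): 4 filters on 1 channel

gray = torch.rand(1, 1, 128, 128)           # toy single-channel frame
edges = F.conv2d(gray, sobel, padding=1)    # (1, 4, 128, 128) edge responses
# In the FDSE module these responses are fused with the learnable conv branch.
```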

16 pages, 5245 KiB  
Article
Automatic Detection of Foraging Hens in a Cage-Free Environment with Computer Vision Technology
by Samin Dahal, Xiao Yang, Bidur Paneru, Anjan Dhungana and Lilong Chai
Poultry 2025, 4(3), 34; https://doi.org/10.3390/poultry4030034 - 30 Jul 2025
Viewed by 191
Abstract
Foraging behavior in hens is an important indicator of animal welfare. It involves both the search for food and exploration of the environment, which provides necessary enrichment. In addition, it has been inversely linked to damaging behaviors such as severe feather pecking. Conventional studies rely on manual observation to investigate foraging location, duration, timing, and frequency. However, this approach is labor-intensive, time-consuming, and subject to human bias. Our study developed computer vision-based methods to automatically detect foraging hens in a cage-free research environment and compared their performance. A cage-free room was divided into four pens, two larger pens measuring 2.9 m × 2.3 m with 30 hens each and two smaller pens measuring 2.3 m × 1.8 m with 18 hens each. Cameras were positioned vertically, 2.75 m above the floor, recording the videos at 15 frames per second. Out of 4886 images, 70% were used for model training, 20% for validation, and 10% for testing. We trained multiple You Only Look Once (YOLO) object detection models from the YOLOv9, YOLOv10, and YOLO11 series for 100 epochs each. All the models achieved precision, recall, and mean average precision at 0.5 intersection over union (mAP@0.5) above 75%. YOLOv9c achieved the highest precision (83.9%), YOLO11x achieved the highest recall (86.7%), and YOLO11m achieved the highest mAP@0.5 (89.5%). These results demonstrate that computer vision can automatically detect complex poultry behaviors such as foraging, making behavior monitoring more efficient than manual observation.
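A minimal training sketch under the assumption that an ultralytics-style YOLO API is used; the dataset config name "foraging_hens.yaml" and the checkpoint choice are hypothetical:

```python
# Sketch of training a YOLO detector on the hen frames with the ultralytics
# package; the dataset config would encode the 70/20/10 split described above.
from ultralytics import YOLO

model = YOLO("yolo11m.pt")  # pretrained checkpoint; YOLOv9c etc. are analogous
model.train(data="foraging_hens.yaml", epochs=100, imgsz=640)

metrics = model.val()              # reports precision, recall, and mAP@0.5
results = model("test_frame.jpg")  # detect foraging hens in a single image
```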

18 pages, 2688 KiB  
Article
Generalized Hierarchical Co-Saliency Learning for Label-Efficient Tracking
by Jie Zhao, Ying Gao, Chunjuan Bo and Dong Wang
Sensors 2025, 25(15), 4691; https://doi.org/10.3390/s25154691 - 29 Jul 2025
Viewed by 124
Abstract
Visual object tracking is one of the core techniques in human-centered artificial intelligence and is very useful for human–machine interaction. State-of-the-art tracking methods have shown their robustness and accuracy on many challenges. However, a large number of videos with precise, dense annotations are required for fully supervised training of their models. Considering that annotating videos frame-by-frame is a labor- and time-consuming workload, reducing the reliance on manual annotations during the tracking models’ training is an important problem to be resolved. To make a trade-off between annotating costs and tracking performance, we propose a weakly supervised tracking method based on co-saliency learning, which can be flexibly integrated into various tracking frameworks to reduce annotation costs and further enhance the target representation in current search images. Since our method enables the model to explore valuable visual information from unlabeled frames and calculate co-salient attention maps based on multiple frames, it can obtain competitive performance compared to fully supervised baseline trackers using only 3.33% of manual annotations. We integrate our method into two CNN-based trackers and a Transformer-based tracker; extensive experiments on four general tracking benchmarks demonstrate the effectiveness of our method. Furthermore, we also demonstrate the advantages of our method on the egocentric tracking task; our weakly supervised method obtains 0.538 success on TREK-150, outperforming the prior state-of-the-art fully supervised tracker by 7.7%.
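A generic cosine-similarity formulation of cross-frame co-salient attention, in PyTorch; this is a stand-in for the paper's module, not its actual implementation:

```python
# Sketch of the co-salient attention idea: feature maps from multiple frames
# are compared so regions that recur across frames score highly.
import torch
import torch.nn.functional as F

def co_attention(feats):  # feats: (T, C, H, W) features from T frames
    t, c, h, w = feats.shape
    flat = F.normalize(feats.view(t, c, h * w), dim=1)  # unit-norm descriptors
    maps = []
    for i in range(t):
        others = torch.cat([flat[j] for j in range(t) if j != i], dim=1)  # (C, (T-1)HW)
        sim = flat[i].transpose(0, 1) @ others           # per-pixel similarities
        maps.append(sim.max(dim=1).values.view(h, w))    # best cross-frame match
    return torch.stack(maps)                             # (T, H, W) co-saliency maps

attn = co_attention(torch.rand(4, 64, 14, 14))
```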

33 pages, 11684 KiB  
Article
Face Spoofing Detection with Stacking Ensembles in Work Time Registration System
by Rafał Klinowski and Mirosław Kordos
Appl. Sci. 2025, 15(15), 8402; https://doi.org/10.3390/app15158402 - 29 Jul 2025
Viewed by 114
Abstract
This paper introduces a passive face-authenticity detection system, designed for integration into an employee work time registration platform. The system is implemented as a stacking ensemble of multiple models. Each model independently assesses whether a camera is capturing a live human face or a spoofed representation, such as a photo or video. The ensemble comprises a convolutional neural network (CNN), a smartphone bezel-detection algorithm to identify faces displayed on electronic devices, a face context analysis module, and additional CNNs for image processing. The outputs of these models are aggregated by a neural network that delivers the final classification decision. We examined various combinations of models within the ensemble and compared the performance of our approach against existing methods through experimental evaluation.
(This article belongs to the Special Issue Application of Artificial Intelligence in Image Processing)
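The stacking idea, where per-model scores feed a small meta-network, can be sketched as follows in PyTorch; the number of base models matches the ensemble described, but the scores, labels, and meta-network are placeholders:

```python
# Sketch of stacking: each base model emits a live/spoof score, and a small
# neural network aggregates the score vector into the final decision.
import torch
import torch.nn as nn

NUM_BASE_MODELS = 4  # e.g., CNN, bezel detector, face-context module, extra CNN

meta = nn.Sequential(
    nn.Linear(NUM_BASE_MODELS, 16), nn.ReLU(),
    nn.Linear(16, 1),  # logit: live face vs. spoof
)

base_scores = torch.rand(8, NUM_BASE_MODELS)   # stand-in per-model scores
labels = torch.randint(0, 2, (8, 1)).float()   # 1 = live, 0 = spoofed

criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(meta.parameters(), lr=1e-3)
loss = criterion(meta(base_scores), labels)
optimizer.zero_grad(); loss.backward(); optimizer.step()

is_live = torch.sigmoid(meta(base_scores)) > 0.5  # final classification decision
```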

24 pages, 1408 KiB  
Systematic Review
Fear Detection Using Electroencephalogram and Artificial Intelligence: A Systematic Review
by Bladimir Serna, Ricardo Salazar, Gustavo A. Alonso-Silverio, Rosario Baltazar, Elías Ventura-Molina and Antonio Alarcón-Paredes
Brain Sci. 2025, 15(8), 815; https://doi.org/10.3390/brainsci15080815 - 29 Jul 2025
Viewed by 356
Abstract
Background/Objectives: Fear detection through EEG signals has gained increasing attention due to its applications in affective computing, mental health monitoring, and intelligent safety systems. This systematic review aimed to identify the most effective methods, algorithms, and configurations reported in the literature for detecting fear from EEG signals using artificial intelligence (AI). Methods: Following the PRISMA 2020 methodology, a structured search was conducted using the string (“fear detection” AND “artificial intelligence” OR “machine learning” AND NOT “fnirs OR mri OR ct OR pet OR image”). After applying inclusion and exclusion criteria, 11 relevant studies were selected. Results: The review examined key methodological aspects such as algorithms (e.g., SVM, CNN, Decision Trees), EEG devices (Emotiv, Biosemi), experimental paradigms (videos, interactive games), dominant brainwave bands (beta, gamma, alpha), and electrode placement. Non-linear models, particularly when combined with immersive stimulation, achieved the highest classification accuracy (up to 92%). Beta and gamma frequencies were consistently associated with fear states, while frontotemporal electrode positioning and proprietary datasets further enhanced model performance. Conclusions: EEG-based fear detection using AI demonstrates high potential and rapid growth, offering significant interdisciplinary applications in healthcare, safety systems, and affective computing.
(This article belongs to the Special Issue Neuropeptides, Behavior and Psychiatric Disorders)
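A sketch of the band-power-plus-SVM pipeline common to the reviewed studies, using scikit-learn; the features and labels are synthetic stand-ins for real EEG epochs:

```python
# Sketch of a typical pipeline: band-power features (e.g., beta/gamma) feeding
# an SVM classifier. Real work would extract the features from EEG epochs
# (e.g., via Welch's method); here they are random placeholders.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))    # per-trial band powers across electrodes
y = rng.integers(0, 2, size=200)  # 1 = fear, 0 = neutral (synthetic labels)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))  # non-linear model
clf.fit(X_tr, y_tr)
print("accuracy:", clf.score(X_te, y_te))
```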

19 pages, 305 KiB  
Article
Gender Inequalities and Precarious Work–Life Balance in Italian Academia: Emergency Remote Work and Organizational Change During the COVID-19 Lockdown
by Annalisa Dordoni
Soc. Sci. 2025, 14(8), 471; https://doi.org/10.3390/socsci14080471 - 29 Jul 2025
Viewed by 294
Abstract
The COVID-19 pandemic has exposed and intensified structural tensions surrounding work-life balance, precarity, and gender inequalities in academia. This paper examines the spatial, temporal, and emotional disruptions experienced by early-career and precarious researchers in Italy during the first national lockdown (March–April 2020) and their engagement in remote academic work. Adopting an exploratory and qualitative approach, the study draws on ten narrative video interviews and thirty participant-generated images to investigate how structural dimensions—such as gender, class, caregiving responsibilities, and the organizational culture of the neoliberal university—shaped these lived experiences. The findings highlight the implosion of boundaries between paid work, care, family life, and personal space and how this disarticulation exacerbated existing inequalities, particularly for women and caregivers. By interpreting both visual and narrative data through a sociological lens on gender, work, and organizations, the paper contributes to current debates on the transformation of academic labor and the reshaping of temporal work regimes through the everyday use of digital technologies in contemporary neoliberal capitalism. It challenges the individualization of discourses on productivity and flexibility and calls for gender-sensitive, structurally informed policies that support equitable and sustainable transitions in work and family life, in line with European policy frameworks.
11 pages, 556 KiB  
Article
Added Value of SPECT/CT in Radio-Guided Occult Localization (ROLL) of Non-Palpable Pulmonary Nodules Treated with Uniportal Video-Assisted Thoracoscopy
by Demetrio Aricò, Lucia Motta, Giulia Giacoppo, Michelangelo Bambaci, Paolo Macrì, Stefania Maria, Francesco Barbagallo, Nicola Ricottone, Lorenza Marino, Gianmarco Motta, Giorgia Leone, Carlo Carnaghi, Vittorio Gebbia, Domenica Caponnetto and Laura Evangelista
J. Clin. Med. 2025, 14(15), 5337; https://doi.org/10.3390/jcm14155337 - 29 Jul 2025
Viewed by 240
Abstract
Background/Objectives: The extensive use of computed tomography (CT) has led to a significant increase in the detection of small and non-palpable pulmonary nodules, necessitating the use of invasive methods for definitive diagnosis. Video-assisted thoracoscopic surgery (VATS) has become the preferred procedure for nodule resections; however, intraoperative localization remains challenging, especially for deep or subsolid lesions. This study explores whether SPECT/CT improves the technical and clinical outcomes of radio-guided occult lesion localization (ROLL) before uniportal video-assisted thoracoscopic surgery (u-VATS). Methods: This is a retrospective study involving consecutive patients referred for the resection of pulmonary nodules who underwent CT-guided ROLL followed by u-VATS between September 2017 and December 2024. From January 2023, SPECT/CT was systematically added after planar imaging. The cohort was divided into a planar group and a planar + SPECT/CT group. The inclusion criteria involved nodules sized ≤ 2 cm, with ground glass or solid characteristics, located at a depth of <6 cm from the pleural surface. 99mTc-MAA injected activity, timing, the classification of planar and SPECT/CT image findings (focal uptake, multisite with focal uptake, multisite without focal uptake), spillage, and post-procedure complications were evaluated. Statistical analysis was performed, with continuous data expressed as medians and categorical data as counts. Comparisons were made using chi-square tests for categorical variables and the Mann–Whitney U test for procedural duration. Cohen’s kappa coefficient was calculated to assess agreement between imaging modalities. Results: In total, 125 patients were selected for CT-guided radiotracer injection followed by u-VATS. The planar group and planar + SPECT/CT group comprised 60 and 65 patients, respectively. Focal uptake was detected in 68 (54%), multisite with focal uptake in 46 (36.8%), and multisite without focal uptake in 11 patients (8.8%). In comparative analyses between planar and SPECT/CT imaging in 65 patients, 91% exhibited focal uptake, revealing significant differences in classification for 40% of the patients. SPECT/CT corrected the classification of 23 patients initially categorized as multisite with focal uptake to focal uptake, improving localization accuracy. The mean procedure duration was 39 min with SPECT/CT. Pneumothorax was more frequently detected with SPECT/CT (43% vs. 1.6%). The intraoperative localization success rate was 96%. Conclusions: SPECT/CT imaging in the ROLL procedure for detecting pulmonary nodules before u-VATS demonstrates a significant advantage in reclassifying radiotracer positioning compared to planar imaging. Considering its limited impact on surgical success rates and additional procedural time, SPECT/CT should be reserved for technically challenging cases. Larger sample sizes, multicentric and prospective randomized studies, and formal cost–utility analyses are warranted.
(This article belongs to the Section Nuclear Medicine & Radiology)
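The reported statistical tests map onto standard SciPy/scikit-learn calls; the following sketch uses placeholder data, not study values:

```python
# Sketch of the reported statistical comparisons; all numbers are placeholders.
import numpy as np
from scipy.stats import chi2_contingency, mannwhitneyu
from sklearn.metrics import cohen_kappa_score

# Chi-square: categorical outcome (e.g., focal uptake yes/no) by imaging group.
table = np.array([[30, 30],   # planar: focal / not focal (placeholder counts)
                  [59, 6]])   # planar + SPECT/CT
chi2, p_cat, _, _ = chi2_contingency(table)

# Mann-Whitney U: procedure duration (minutes) between the two groups.
dur_planar = [35, 40, 42, 38, 45]  # placeholder values
dur_spect = [39, 44, 41, 47, 43]
u_stat, p_dur = mannwhitneyu(dur_planar, dur_spect)

# Cohen's kappa: agreement between planar and SPECT/CT classifications.
planar_cls = ["focal", "multi_focal", "focal", "multi_no_focal"]
spect_cls = ["focal", "focal", "focal", "multi_no_focal"]
kappa = cohen_kappa_score(planar_cls, spect_cls)
print(p_cat, p_dur, kappa)
```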

17 pages, 1603 KiB  
Perspective
A Perspective on Quality Evaluation for AI-Generated Videos
by Zhichao Zhang, Wei Sun and Guangtao Zhai
Sensors 2025, 25(15), 4668; https://doi.org/10.3390/s25154668 - 28 Jul 2025
Viewed by 285
Abstract
Recent breakthroughs in AI-generated content (AIGC) have transformed video creation, empowering systems to translate text, images, or audio into visually compelling stories. Yet reliable evaluation of these machine-crafted videos remains elusive because quality is governed not only by spatial fidelity within individual frames but also by temporal coherence across frames and precise semantic alignment with the intended message. The foundational role of sensor technologies is critical, as they determine the physical plausibility of AIGC outputs. In this perspective, we argue that multimodal large language models (MLLMs) are poised to become the cornerstone of next-generation video quality assessment (VQA). By jointly encoding cues from multiple modalities such as vision, language, sound, and even depth, the MLLM can leverage its powerful language understanding capabilities to assess the quality of scene composition, motion dynamics, and narrative consistency, overcoming the fragmentation of hand-engineered metrics and the poor generalization ability of CNN-based methods. Furthermore, we provide a comprehensive analysis of current methodologies for assessing AIGC video quality, including the evolution of generation models, dataset design, quality dimensions, and evaluation frameworks. We argue that advances in sensor fusion enable MLLMs to combine low-level physical constraints with high-level semantic interpretations, further enhancing the accuracy of visual quality assessment.
(This article belongs to the Special Issue Perspectives in Intelligent Sensors and Sensing Systems)

20 pages, 2776 KiB  
Article
Automatic 3D Reconstruction: Mesh Extraction Based on Gaussian Splatting from Romanesque–Mudéjar Churches
by Nelson Montas-Laracuente, Emilio Delgado Martos, Carlos Pesqueira-Calvo, Giovanni Intra Sidola, Ana Maitín, Alberto Nogales and Álvaro José García-Tejedor
Appl. Sci. 2025, 15(15), 8379; https://doi.org/10.3390/app15158379 - 28 Jul 2025
Viewed by 213
Abstract
This research introduces an automated 3D virtual reconstruction system tailored for architectural heritage (AH) applications, contributing to the ongoing paradigm shift from traditional CAD-based workflows to artificial intelligence-driven methodologies. It reviews recent advancements in machine learning and deep learning—particularly neural radiance fields (NeRFs) and their successor, Gaussian splatting (GS)—as state-of-the-art techniques in the domain. The study advocates for replacing point cloud data in heritage building information modeling workflows with image-based inputs, proposing a novel “photo-to-BIM” pipeline. A proof-of-concept system is presented, capable of processing photographs or video footage of ancient ruins—specifically, Romanesque–Mudéjar churches—to automatically generate 3D mesh reconstructions. The system’s performance is assessed using both objective metrics and subjective evaluations of mesh quality. The results confirm the feasibility and promise of image-based reconstruction as a viable alternative to conventional methods. The system applies GS and Mip-splatting, which proved superior in noise reduction, before mesh extraction via surface-aligned Gaussian splatting for efficient 3D mesh reconstruction. This photo-to-mesh pipeline marks a viable step towards HBIM.
