Search Results (676)

Search Parameters:
Keywords = multi-scale feature aggregation

23 pages, 4820 KB  
Article
SFE-DETR: An Enhanced Transformer-Based Face Detector for Small Target Faces in Open Complex Scenes
by Chenhao Yang, Yueming Jiang and Chunyan Song
Sensors 2026, 26(1), 125; https://doi.org/10.3390/s26010125 - 24 Dec 2025
Abstract
Face detection is an important task in the field of computer vision and is widely used in practical applications. However, in open and complex scenes with dense faces, occlusions, and image degradation, small face detection still faces significant challenges due to the extremely small target scale, difficult localization, and severe background interference. To address these issues, this paper proposes a small face detector for open complex scenes, SFE-DETR, which aims to simultaneously improve detection accuracy and computational efficiency. The backbone network of the model adopts an inverted residual shift convolution and dilated reparameterization structure, which enhances shallow features and enables deep feature self-adaptation, thereby better preserving small-scale information and reducing the number of parameters. Additionally, a multi-head multi-scale self-attention mechanism is introduced to fuse multi-scale convolutional features with channel-wise weighting, capturing fine-grained facial features while suppressing background noise. Moreover, a redesigned SFE-FPN introduces high-resolution layers and incorporates a novel feature fusion module consisting of local, large-scale, and global branches, efficiently aggregating multi-level features and significantly improving small face detection performance. Experimental results on two challenging small face detection datasets show that SFE-DETR reduces parameters by 28.1% compared to the original RT-DETR-R18 model, achieving an mAP50 of 94.7% and AP-s of 42.1% on the SCUT-HEAD dataset, and an mAP50 of 86.3% on the WIDER FACE (Hard) subset. These results demonstrate that SFE-DETR achieves optimal detection performance among models of the same scale while maintaining efficiency.
(This article belongs to the Section Optical Sensors)
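The multi-head multi-scale self-attention above is described only at a high level. A minimal PyTorch illustration of the general idea, fusing parallel multi-scale convolutional branches with channel-wise weights before standard self-attention, might look as follows (the module name, branch kernels, and gating are invented here, not the authors' design):

```python
import torch
import torch.nn as nn

class MultiScaleChannelFusion(nn.Module):
    """Fuses parallel multi-scale conv features with learned channel weights,
    then applies standard multi-head self-attention over spatial tokens."""
    def __init__(self, dim: int, scales=(3, 5, 7), heads: int = 8):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(dim, dim, k, padding=k // 2, groups=dim) for k in scales
        )
        # Squeeze-and-excite style channel gate, one weight vector per branch
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(dim, len(scales) * dim, 1),
            nn.Sigmoid(),
        )
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                               # x: (B, C, H, W)
        b, c, h, w = x.shape
        weights = self.gate(x).view(b, -1, c, 1, 1)     # (B, S, C, 1, 1)
        fused = sum(wi.squeeze(1) * br(x)               # channel-wise weighting
                    for wi, br in zip(weights.split(1, dim=1), self.branches))
        tokens = fused.flatten(2).transpose(1, 2)       # (B, H*W, C)
        out, _ = self.attn(tokens, tokens, tokens)
        return out.transpose(1, 2).view(b, c, h, w) + x  # residual
```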
26 pages, 1775 KB  
Article
SAR-to-Optical Remote Sensing Image Translation Method Based on InternImage and Cascaded Multi-Head Attention
by Cheng Xu and Yingying Kong
Remote Sens. 2026, 18(1), 55; https://doi.org/10.3390/rs18010055 - 24 Dec 2025
Abstract
Synthetic aperture radar (SAR), with its all-weather and all-day observation capabilities, plays a significant role in the field of remote sensing. However, due to the unique imaging mechanism of SAR, its interpretation is challenging. Translating SAR images into optical remote sensing images has become a research hotspot in recent years as a way to enhance the interpretability of SAR images. This paper proposes a deep learning-based method for SAR-to-optical remote sensing image translation. The network comprises three parts: a global representor, a generator with cascaded multi-head attention, and a multi-scale discriminator. The global representor, built upon InternImage with deformable convolution v3 (DCNv3) as its core operator, leverages its global receptive field and adaptive spatial aggregation capabilities to extract global semantic features from SAR images. The generator follows the classic “encoder-bottleneck-decoder” structure, where the encoder focuses on extracting local detail features from SAR images. The cascaded multi-head attention module within the bottleneck layer refines local detail features and facilitates feature interaction between global semantics and local details. The discriminator adopts a multi-scale structure based on the local-receptive-field PatchGAN, enabling joint global and local discrimination. Furthermore, for the first time in SAR image translation tasks, the structural similarity index measure (SSIM) loss is combined with adversarial loss, perceptual loss, and feature matching loss as the loss function. A series of experiments demonstrate the effectiveness and reliability of the proposed method. Compared to mainstream image translation methods, our method generates higher-quality optical remote sensing images that are semantically consistent, texturally authentic, clearly detailed, and visually plausible.
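The composite objective is stated but its weights are not. A hedged sketch, using a mean-pooled SSIM approximation (published SSIM implementations typically use Gaussian windows) and illustrative loss weights that are not from the paper:

```python
import torch
import torch.nn.functional as F

def ssim_loss(x, y, c1=0.01 ** 2, c2=0.03 ** 2, win=11):
    """Simplified SSIM loss with mean-pooled local statistics.
    Inputs assumed in [0, 1], shape (B, C, H, W)."""
    mu_x = F.avg_pool2d(x, win, 1, win // 2)
    mu_y = F.avg_pool2d(y, win, 1, win // 2)
    var_x = F.avg_pool2d(x * x, win, 1, win // 2) - mu_x ** 2
    var_y = F.avg_pool2d(y * y, win, 1, win // 2) - mu_y ** 2
    cov = F.avg_pool2d(x * y, win, 1, win // 2) - mu_x * mu_y
    ssim = ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
    return 1 - ssim.mean()

def generator_loss(adv, perc, fm, ssim, w_adv=1.0, w_perc=10.0, w_fm=10.0, w_ssim=5.0):
    # Weighted combination as described in the abstract; these weights are invented.
    return w_adv * adv + w_perc * perc + w_fm * fm + w_ssim * ssim
```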
28 pages, 3628 KB  
Article
ADFF-Net: An Attention-Based Dual-Stream Feature Fusion Network for Respiratory Sound Classification
by Bing Zhu, Lijun Chen, Xiaoling Li, Songnan Zhao, Shaode Yu and Qiurui Sun
Technologies 2026, 14(1), 12; https://doi.org/10.3390/technologies14010012 - 24 Dec 2025
Abstract
Deep learning-based respiratory sound classification (RSC) has emerged as a promising non-invasive approach to assist clinical diagnosis. However, existing methods often face challenges such as sub-optimal feature representation and limited model expressiveness. To address these issues, we propose an Attention-based Dual-stream Feature Fusion Network (ADFF-Net). Built upon the pre-trained Audio Spectrogram Transformer, ADFF-Net takes Mel filter bank and Mel-spectrogram features as dual-stream inputs, while an attention-based fusion module with a skip connection is introduced to preserve both the raw energy and the relevant tonal variations within the multi-scale time–frequency representation. Extensive experiments on the ICBHI2017 database with the official train–test split show that, despite a relatively low sensitivity of 42.91%, ADFF-Net achieves state-of-the-art performance in terms of aggregated metrics in the four-class RSC task, with an overall accuracy of 64.95%, specificity of 81.39%, and harmonic score of 62.14%. The results confirm the effectiveness of the proposed attention-based dual-stream acoustic feature fusion module for the RSC task, while also highlighting substantial room for improving the detection of abnormal respiratory events. Furthermore, we outline several promising research directions, including addressing class imbalance, enriching signal diversity, advancing network design, and enhancing model interpretability.
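One plausible reading of the attention-based fusion module with a skip connection is a learned convex combination of the two streams, with the skip preserving the raw-energy stream. A small PyTorch sketch (names and shapes are assumptions, not the published code):

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Gated fusion of two acoustic feature streams with a skip connection."""
    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(2 * dim, dim), nn.Tanh(),
                                   nn.Linear(dim, 2), nn.Softmax(dim=-1))
        self.proj = nn.Linear(dim, dim)

    def forward(self, fbank, melspec):                    # both: (B, T, D)
        a = self.score(torch.cat([fbank, melspec], dim=-1))  # (B, T, 2)
        fused = a[..., :1] * fbank + a[..., 1:] * melspec
        return self.proj(fused) + fbank                   # skip keeps raw energy
```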
25 pages, 7265 KB  
Article
Hazy Aware-YOLO: An Enhanced UAV Object Detection Model for Foggy Weather via Wavelet Convolution and Attention-Based Optimization
by Lin Wang, Binjie Zhang, Qinyan Tan, Dejun Duan and Yulei Wang
Automation 2026, 7(1), 3; https://doi.org/10.3390/automation7010003 - 24 Dec 2025
Abstract
Foggy weather critically undermines the autonomous perception capabilities of unmanned aerial vehicles (UAVs) by degrading image contrast, obscuring object structures, and impairing small target recognition, which often leads to significant performance deterioration in conventional detection models. To address these challenges in automated UAV operations, this study introduces Hazy Aware-YOLO (HA-YOLO), an enhanced detection framework based on YOLO11, specifically engineered for reliable object detection under low-visibility conditions. The proposed model incorporates wavelet convolution to suppress haze-induced noise and enhance multi-scale feature fusion. Furthermore, a novel Context-Enhanced Hybrid Self-Attention (CEHSA) module is developed, which sequentially combines channel attention aggregation (CAA) with multi-head self-attention (MHSA) to capture local contextual cues while mitigating global noise interference. Extensive evaluations demonstrate that HA-YOLO and its variants achieve superior detection precision and robustness compared to the baseline YOLO11 while maintaining model efficiency. In particular, when benchmarked against state-of-the-art detectors, HA-YOLO exhibits a better balance between detection accuracy and complexity, offering a practical and efficient solution for real-world autonomous UAV perception tasks in adverse weather.
(This article belongs to the Section Smart Transportation and Autonomous Vehicles)
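The CEHSA module is described as channel attention aggregation followed by multi-head self-attention. A hedged sketch of that sequential composition, where the CAA part is rendered as a generic squeeze-and-excite style channel gate rather than the paper's exact block:

```python
import torch
import torch.nn as nn

class CEHSA(nn.Module):
    """Channel attention aggregation followed by multi-head self-attention."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.channel_gate = nn.Sequential(          # CAA-style channel reweighting
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(dim, dim // 4, 1), nn.ReLU(),
            nn.Conv2d(dim // 4, dim, 1), nn.Sigmoid())
        self.mhsa = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):                           # x: (B, C, H, W)
        b, c, h, w = x.shape
        x = x * self.channel_gate(x)                # local contextual cue weighting
        t = self.norm(x.flatten(2).transpose(1, 2)) # (B, H*W, C) tokens
        out, _ = self.mhsa(t, t, t)
        return x + out.transpose(1, 2).view(b, c, h, w)  # residual keeps locality
```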
26 pages, 5101 KB  
Article
Cross-Modal Adaptive Fusion and Multi-Scale Aggregation Network for RGB-T Crowd Density Estimation and Counting
by Jian Liu, Zuodong Niu, Yufan Zhang and Lin Tang
Appl. Sci. 2026, 16(1), 161; https://doi.org/10.3390/app16010161 - 23 Dec 2025
Abstract
Crowd counting is a significant task in computer vision. By combining the rich texture information from RGB images with the insensitivity to illumination changes offered by thermal imaging, the applicability of models in real-world complex scenarios can be enhanced. Current research on RGB-T crowd counting primarily focuses on feature fusion strategies, multi-scale structures, and the exploration of novel network architectures such as Vision Transformer and Mamba. However, existing approaches face two key challenges: limited robustness to illumination shifts and insufficient handling of scale discrepancies. To address these challenges, this study aims to develop a robust RGB-T crowd counting framework that remains stable under illumination shifts by introducing two key innovations beyond existing fusion and multi-scale approaches: (1) a cross-modal adaptive fusion module (CMAFM) that actively evaluates and fuses reliable cross-modal features under varying scenarios by simulating a dynamic feature selection and trust allocation mechanism; and (2) a multi-scale aggregation module (MSAM) that unifies features with different receptive fields to an intermediate scale and performs weighted fusion to enhance modeling capability for cross-modal scale variations. The proposed method achieves relative improvements of 1.57% in GAME(0) and 0.78% in RMSE on the DroneRGBT dataset compared to existing methods, and improvements of 2.48% and 1.59% on the RGBT-CC dataset, respectively. It also demonstrates higher stability and robustness under varying lighting conditions. This research provides an effective solution for building stable and reliable all-weather crowd counting systems, with significant application prospects in smart city security and management.
(This article belongs to the Special Issue Advances in Computer Vision and Digital Image Processing)
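The "dynamic feature selection and trust allocation" of CMAFM could be realized as a per-location softmax over the two modalities. A minimal sketch of that interpretation (not the published architecture):

```python
import torch
import torch.nn as nn

class CMAFM(nn.Module):
    """Per-location trust allocation between RGB and thermal feature maps."""
    def __init__(self, dim: int):
        super().__init__()
        self.trust = nn.Sequential(
            nn.Conv2d(2 * dim, dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(dim, 2, 1))                  # two logits: RGB vs. thermal

    def forward(self, rgb, thermal):               # both: (B, C, H, W)
        w = torch.softmax(self.trust(torch.cat([rgb, thermal], 1)), dim=1)
        return w[:, :1] * rgb + w[:, 1:] * thermal # trust-weighted fusion
```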
23 pages, 4147 KB  
Article
GCEA-YOLO: An Enhanced YOLOv11-Based Network for Smoking Behavior Detection in Oilfield Operation Areas
by Qing Liu, Xiaojing Wan, Yuzhou Sheng, Shuo Wang and Bo Wei
Sensors 2026, 26(1), 103; https://doi.org/10.3390/s26010103 - 23 Dec 2025
Abstract
Smoking in oilfield operation areas poses a severe risk of fire and explosion accidents, threatening production safety, workers’ lives, and the surrounding ecological environment. Such behavior represents a typical preventable unsafe human action. Detecting smoking behaviors among oilfield workers can fundamentally prevent such safety incidents. To address the challenges of low detection accuracy for small objects and frequent missed or false detections in extreme industrial environments, this paper proposes a GCEA-YOLO network based on YOLOv11 for smoking behavior detection. First, a CSP-EDLAN module is introduced to enhance fine-grained feature learning. Second, to reduce model complexity while preserving critical spatial information, an ADown module is incorporated. Third, an enhanced feature fusion module is integrated to achieve effective multiscale feature aggregation. Finally, a lightweight EfficientHead module is employed to generate high-precision detection results. The experimental results demonstrate that, compared with YOLOv11n, GCEA-YOLO achieves improvements of 20.8% in precision, 6.9% in recall, and 15.1% in mean average precision (mAP). Overall, GCEA-YOLO significantly outperforms YOLOv11n.
(This article belongs to the Topic AI Sensors and Transducers)
22 pages, 3023 KB  
Article
Enhancing Continuous Sign Language Recognition via Spatio-Temporal Multi-Scale Deformable Correlation
by Yihan Jiang, Degang Yang and Chen Chen
Appl. Sci. 2026, 16(1), 124; https://doi.org/10.3390/app16010124 - 22 Dec 2025
Abstract
Deep learning-based sign language recognition plays a pivotal role in facilitating communication for the deaf community. Current approaches, while effective, often introduce redundant information and incur excessive computational overhead through global feature interactions. To address these limitations, this paper introduces a Deformable Correlation Network (DCA) designed for efficient temporal modeling in continuous sign language recognition. The DCA integrates a Deformable Correlation (DC) module that leverages spatio-temporal driven offsets to adjust the sampling range adaptively, thereby minimizing interference. Additionally, a multi-scale local sampling strategy, guided by motion priors, enhances temporal modeling capability while reducing computational costs. Furthermore, an attention-based Correlation Matrix Filter (CMF) is proposed to suppress interference elements by accounting for feature motion patterns. A long-term temporal enhancement module, based on spatial aggregation, efficiently leverages global temporal information to model the performer’s holistic limb motion trajectories. Extensive experiments on three benchmark datasets demonstrate significant performance improvements, with a reduction in Word Error Rate (WER) of up to 7.0% on the CE-CSL dataset, showcasing the superiority and competitive advantage of the proposed DCA algorithm.
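The DC module's "spatio-temporal driven offsets" suggest deformable sampling along the time axis. A rough PyTorch sketch of temporal sampling at learned fractional offsets (the offset predictor and the mean aggregation are guesses, not the paper's design):

```python
import torch
import torch.nn as nn

class DeformableTemporalSampling(nn.Module):
    """Samples each frame's K temporal neighbours at learned fractional offsets."""
    def __init__(self, dim: int, k: int = 4):
        super().__init__()
        self.k = k
        self.offset = nn.Conv1d(dim, k, 3, padding=1)    # offsets from local context

    def forward(self, x):                                # x: (B, D, T)
        b, d, t = x.shape
        base = torch.arange(t, device=x.device).view(1, 1, t)
        pos = (base + self.offset(x)).clamp(0, t - 1)    # (B, K, T) fractional idx
        lo, hi = pos.floor().long(), pos.ceil().long()
        frac = (pos - lo.float()).unsqueeze(1)           # (B, 1, K, T)
        gather = lambda idx: torch.gather(
            x.unsqueeze(2).expand(b, d, self.k, t), 3,
            idx.unsqueeze(1).expand(b, d, self.k, t))
        samples = (1 - frac) * gather(lo) + frac * gather(hi)  # linear interp
        return samples.mean(dim=2)                       # aggregate the K samples
```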
24 pages, 3597 KB  
Article
Research on HVAC Energy Consumption Prediction Based on TCN-BiGRU-Attention
by Limin Wang, Jiangtao Dai, Jumin Zhao, Wei Gao and Dengao Li
Energies 2025, 18(24), 6603; https://doi.org/10.3390/en18246603 - 17 Dec 2025
Abstract
HVAC (Heating, Ventilation and Air Conditioning) systems in buildings are a major source of energy consumption, and high-precision energy consumption prediction is of great significance for intelligent building management. To address the limited ability of existing prediction methods to model nonlinear features and capture long time-series dependencies, this paper proposes an HVAC energy consumption prediction model that combines a temporal convolutional network (TCN), a bi-directional gated recurrent unit (BiGRU), and an Attention mechanism. The model leverages the TCN’s parallel computing and multi-scale feature extraction, the BiGRU’s bidirectional temporal dependency modeling, and the Attention mechanism’s weighting of key features to effectively improve prediction accuracy. In this work, the HVAC load is represented by the building-level electricity meter readings of office buildings equipped with centralized, electrically driven heating, ventilation, and air-conditioning systems. The proposed method is therefore mainly applicable to building-level HVAC energy consumption prediction scenarios where aggregated hourly electricity or cooling energy measurements are available, rather than to the control of individual terminal units. On one year of hourly data from an office building in the ASHRAE dataset, the proposed model outperforms the baseline by 2.3%, 22.2%, and 34.7% in terms of MAE, RMSE, and MAPE, respectively, and by a significant 54.1% on the MSE metric.
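The described pipeline composes three standard blocks. A compact PyTorch sketch of the TCN-BiGRU-Attention arrangement (layer sizes and the symmetric, non-causal padding are simplifications, not the paper's exact configuration):

```python
import torch
import torch.nn as nn

class TCNBiGRUAttention(nn.Module):
    """Dilated convolutions -> BiGRU -> additive attention -> regression head."""
    def __init__(self, n_features: int, hidden: int = 64):
        super().__init__()
        self.tcn = nn.Sequential(                       # simplified, non-causal TCN
            nn.Conv1d(n_features, hidden, 3, padding=1, dilation=1), nn.ReLU(),
            nn.Conv1d(hidden, hidden, 3, padding=2, dilation=2), nn.ReLU(),
            nn.Conv1d(hidden, hidden, 3, padding=4, dilation=4), nn.ReLU())
        self.bigru = nn.GRU(hidden, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, x):                               # x: (B, T, F)
        h = self.tcn(x.transpose(1, 2)).transpose(1, 2) # (B, T, H)
        h, _ = self.bigru(h)                            # (B, T, 2H)
        w = torch.softmax(self.attn(h), dim=1)          # attention over time steps
        return self.head((w * h).sum(dim=1))            # (B, 1) energy forecast
```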
23 pages, 2619 KB  
Article
LITransformer: Transformer-Based Vehicle Trajectory Prediction Integrating Spatio-Temporal Attention Networks with Lane Topology and Dynamic Interaction
by Yuanchao Zhong, Zhiming Gui, Zhenji Gao, Xinyu Wang and Jiawen Wei
Electronics 2025, 14(24), 4950; https://doi.org/10.3390/electronics14244950 - 17 Dec 2025
Abstract
Vehicle trajectory prediction is a pivotal technology in intelligent transportation systems. Existing methods encounter challenges in effectively modeling lane topology and dynamic interaction relationships in complex traffic scenarios, limiting prediction accuracy and reliability. This paper presents the Lane Interaction Transformer (LITransformer), a lane-informed trajectory prediction framework that builds on spatio-temporal graph attention networks and Transformer-based global aggregation. Rather than introducing entirely new network primitives, LITransformer focuses on two design aspects: (i) a lane topology encoder that fuses geometric and semantic lane features via direction-sensitive, multi-scale dilated graph convolutions, converting vectorized lane data into rich topology-aware representations; and (ii) an Interaction-Aware Graph Attention mechanism (IAGAT) that explicitly models four types of interactions between vehicles and lane infrastructure (V2V, V2N, N2V, N2N), with gating-based fusion of structured road constraints and dynamic spatio-temporal features. The overall architecture employs a Transformer module to aggregate global scene context and a multi-modal decoding head to generate diverse trajectory hypotheses with confidence estimation. Extensive experiments on the Argoverse dataset show that LITransformer achieves a minADE of 0.76 and a minFDE of 1.20, significantly outperforming representative baselines such as LaneGCN and HiVT. These results demonstrate that explicitly incorporating lane topology and interaction-aware spatio-temporal modeling can significantly improve the accuracy and reliability of vehicle trajectory prediction in complex real-world traffic scenarios.
(This article belongs to the Special Issue Autonomous Vehicles: Sensing, Mapping, and Positioning)
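The gating-based fusion of road constraints and dynamic features admits a very small sketch (assumed tensor shapes; not the published code):

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Learned gate blending lane-topology features with dynamic agent features."""
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, h_lane, h_dyn):          # both: (B, N, D)
        g = self.gate(torch.cat([h_lane, h_dyn], dim=-1))
        return g * h_lane + (1 - g) * h_dyn    # per-feature soft selection
```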
20 pages, 8786 KB  
Article
Learning to Count Crowds from Low-Altitude Aerial Views via Point-Level Supervision and Feature-Adaptive Fusion
by Junzhe Mao, Lin Nai, Jinqi Bai, Chang Liu and Liangfeng Xu
Appl. Sci. 2025, 15(24), 13211; https://doi.org/10.3390/app152413211 - 17 Dec 2025
Abstract
Counting small, densely clustered objects from low-altitude aerial views is challenging due to large scale variations, complex backgrounds, and severe occlusion, which often degrade the performance of fully supervised or density-regression methods. To address these issues, we propose a weakly supervised crowd counting framework that leverages point-level supervision and a feature-adaptive fusion strategy to enhance perception under low-altitude aerial views. The network comprises a front-end feature extractor and a back-end fusion module. The front-end adopts the first 13 convolutional layers of VGG16-BN to capture multi-scale semantic features while preserving crucial spatial details. The back-end integrates a Feature-Adaptive Fusion module and a Multi-Scale Feature Aggregation module: the former dynamically adjusts fusion weights across scales to improve robustness to scale variation, and the latter aggregates multi-scale representations to better capture targets in dense, complex scenes. Point-level annotations serve as weak supervision to substantially reduce labeling cost while enabling accurate localization of small individual instances. Experiments on several public datasets, including ShanghaiTech Part A, ShanghaiTech Part B, and UCF_CC_50, demonstrate that our method surpasses existing mainstream approaches, effectively mitigating scale variation, background clutter, and occlusion, and providing an efficient and scalable weakly supervised solution for small-object counting.
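A plausible construction of the VGG16-BN front-end from torchvision, assuming the final max-pool is dropped so that all 13 conv layers are kept while spatial detail is preserved (that last detail is not stated in the abstract):

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16_bn, VGG16_BN_Weights

def make_frontend() -> nn.Sequential:
    """All 13 conv(+BN+ReLU) layers of VGG16-BN; the final max-pool is dropped
    here to preserve spatial detail (an assumption, not confirmed by the paper)."""
    features = vgg16_bn(weights=VGG16_BN_Weights.IMAGENET1K_V1).features
    return nn.Sequential(*list(features.children())[:-1])

frontend = make_frontend()
feat = frontend(torch.randn(1, 3, 384, 384))   # (1, 512, 24, 24): 1/16 resolution
```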
22 pages, 5552 KB  
Article
MSA-UNet: Multiscale Feature Aggregation with Attentive Skip Connections for Precise Building Extraction
by Guobiao Yao, Yan Chen, Wenxiao Sun, Zeyu Zhang, Yifei Tang and Jingxue Bi
ISPRS Int. J. Geo-Inf. 2025, 14(12), 497; https://doi.org/10.3390/ijgi14120497 - 17 Dec 2025
Abstract
Accurate and reliable extraction of building structures from high-resolution (HR) remote sensing images is an important research topic in 3D cartography and smart city construction. However, despite the strong overall performance of recent deep learning models, limitations remain in handling significant variations in building scales and complex architectural forms, which may lead to inaccurate boundaries or difficulties in extracting small or irregular structures. Therefore, the present study proposes MSA-UNet, a reliable semantic segmentation framework that leverages multiscale feature aggregation and attentive skip connections for accurate extraction of building footprints. This framework is constructed on the U-Net architecture, incorporating VGG16 as a replacement for the original encoder, which enhances its ability to capture low-discriminative features. To further improve the representation of buildings with different scales and shapes, a serial coarse-to-fine feature aggregation mechanism was used. Additionally, a novel skip connection with adaptive weights was built between the encoder and decoder layers. Furthermore, a dual-attention mechanism, implemented through the convolutional block attention module, was integrated to enhance the network’s focus on building regions. Extensive experiments conducted on the WHU and Inria building datasets validated the effectiveness of MSA-UNet. On the WHU dataset, the model demonstrated state-of-the-art performance with a mean Intersection over Union (mIoU) of 94.26%, accuracy of 98.32%, F1-score of 96.57%, and mean pixel accuracy (mPA) of 96.85%, corresponding to a gain of 1.41% in mIoU over the baseline U-Net. On the more challenging Inria dataset, MSA-UNet achieved an mIoU of 85.92%, a consistent improvement of up to 1.9% over the baseline U-Net. These results confirm that MSA-UNet markedly improves the accuracy and boundary integrity of building extraction from HR data, outperforming existing classic models in segmentation quality and robustness.
(This article belongs to the Special Issue Spatial Data Science and Knowledge Discovery)
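The dual-attention skip connection can be sketched with a standard CBAM block reweighting encoder features before the usual U-Net concatenation; the exact placement inside MSA-UNet is an assumption:

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Channel attention followed by spatial attention (Woo et al., 2018);
    assumes dim >= r so the bottleneck MLP has at least one channel."""
    def __init__(self, dim: int, r: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(nn.Conv2d(dim, dim // r, 1), nn.ReLU(),
                                 nn.Conv2d(dim // r, dim, 1))
        self.spatial = nn.Conv2d(2, 1, 7, padding=3)

    def forward(self, x):
        ca = torch.sigmoid(self.mlp(x.mean((2, 3), keepdim=True)) +
                           self.mlp(x.amax((2, 3), keepdim=True)))
        x = x * ca                                   # channel attention
        sa = torch.sigmoid(self.spatial(torch.cat(
            [x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)))
        return x * sa                                # spatial attention

def attentive_skip(enc_feat, dec_feat, cbam):
    """Reweight encoder features before the usual U-Net concatenation."""
    return torch.cat([cbam(enc_feat), dec_feat], dim=1)
```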
22 pages, 12312 KB  
Article
ES-YOLO: Multi-Scale Port Ship Detection Combined with Attention Mechanism in Complex Scenes
by Lixiang Cao, Jia Xi, Zixuan Xie, Teng Feng and Xiaomin Tian
Sensors 2025, 25(24), 7630; https://doi.org/10.3390/s25247630 - 16 Dec 2025
Abstract
With the rapid development of remote sensing technology and deep learning, single-stage port ship detection has achieved remarkable results in optical imagery. However, most existing methods are designed and verified in specific scenes, such as a fixed viewing angle, a uniform background, or the open sea, and therefore struggle with ship detection in complex environments involving cloud occlusion, wave fluctuation, complex harbor buildings, and multi-ship aggregation. To this end, the ES-YOLO framework is proposed to address these limitations. A novel edge-perception channel-spatial attention mechanism (EACSA) is proposed to enhance the extraction of edge information and improve the capture of feature details. A lightweight spatial-channel decoupled down-sampling module (LSCD) is designed to replace the original network’s down-sampling structure and reduce the complexity of the down-sampling stage. A new hierarchical scale structure is designed to balance detection across different scales. A remote sensing ship dataset, TJShip, is constructed from Gaofen-2 images, covering multi-scale targets from small fishing boats to large cargo ships. Using TJShip as the data source, ablation and comparison experiments were conducted with ES-YOLO. The results show that introducing the EACSA attention mechanism, LSCD, and the multi-scale structure improves the mAP of ship detection by 0.83%, 0.54%, and 1.06%, respectively, over the baseline model, with strong precision, recall, and F1 as well. Under the same experimental conditions, ES-YOLO improves mAP by 46.87%, 8.14%, 1.85%, 1.75%, and 0.86% over Faster R-CNN, RetinaNet, YOLOv5, YOLOv7, and YOLOv8, respectively, offering useful insights for ship detection research.
(This article belongs to the Section Remote Sensors)
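"Spatial-channel decoupled" down-sampling plausibly separates the stride (spatial) from the channel mixing. One guess at such an LSCD-style module, not the authors' implementation:

```python
import torch.nn as nn

class LSCDown(nn.Module):
    """Depthwise stride-2 conv handles spatial reduction; a pointwise conv
    handles channel mixing. A guess at 'spatial-channel decoupled' down-sampling."""
    def __init__(self, c_in: int, c_out: int):
        super().__init__()
        self.spatial = nn.Conv2d(c_in, c_in, 3, stride=2, padding=1, groups=c_in)
        self.channel = nn.Conv2d(c_in, c_out, 1)
        self.act = nn.Sequential(nn.BatchNorm2d(c_out), nn.SiLU())

    def forward(self, x):
        return self.act(self.channel(self.spatial(x)))
```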
19 pages, 3961 KB  
Article
Risk-Aware Multi-Horizon Forecasting of Airport Departure Flow Using a Patch-Based Time-Series Transformer
by Xiangzhi Zhou, Shanmei Li and Siqing Li
Aerospace 2025, 12(12), 1107; https://doi.org/10.3390/aerospace12121107 - 15 Dec 2025
Abstract
Airport traffic flow prediction is a basic requirement for air traffic management. Building an effective airport traffic flow prediction model helps reveal how traffic demand evolves over time and supports short-term planning. At the same time, the large amount of available air traffic data supports using deep learning to learn traffic patterns with stable and accurate performance. In practice, airports need forecasts at short time intervals and need to know the departure flow and its uncertainty 1–2 h in advance. To meet this need, we treat airport departure flow prediction as a multi-step probabilistic forecasting problem on a multi-airport dataset organized by airport and time. Scheduled departure counts, recent taxi-out time statistics (P50/P90 over 30- and 60-minute windows), and calendar variables are aligned to the same time scale and standardized separately for each airport. Based on these data, we propose an end-to-end multi-step forecasting method built on PatchTST. The method uses patch partitioning and a Transformer encoder to extract temporal features from the past 48 h of multivariate history and directly outputs the 10th, 50th, and 90th percentile forecasts of departure flow for each 10 min step in the next 120 min. In this way, the model provides both point forecasts and prediction intervals. Experiments were conducted on the 80 airports with the highest departure volumes, using April–July for training, August for validation, September for testing, and October for robustness evaluation. The results show that at a 10 min interval, the model achieves an MAE of 0.411 and an RMSE of 0.713 on the test set. The error increases smoothly with the forecast horizon and remains stable within the 60–120 min range. When the forecasts are aggregated to 1 h intervals in time or by airport clusters in space, the point forecast errors decrease further; the average empirical coverage is 0.78 and the width of the percentile-based intervals is 1.29, which meets the risk-awareness requirements of tactical operations management. The proposed method is relatively simple and provides a unified modeling framework for later incorporating external factors such as weather, runway configuration, and operational procedures, and for application across different airports and years.
(This article belongs to the Special Issue AI, Machine Learning and Automation for Air Traffic Control (ATC))
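Forecasting fixed percentiles of this kind is conventionally trained with the pinball (quantile) loss; the paper's exact objective is not stated, so the following is a sketch under that assumption:

```python
import torch

def pinball_loss(pred, target, quantiles=(0.1, 0.5, 0.9)):
    """pred: (B, H, Q) forecasts per horizon step and quantile; target: (B, H).
    Penalizes under-forecasts with weight q and over-forecasts with weight 1-q."""
    q = torch.tensor(quantiles, device=pred.device).view(1, 1, -1)
    err = target.unsqueeze(-1) - pred            # positive when under-forecasting
    return torch.maximum(q * err, (q - 1) * err).mean()
```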
19 pages, 4163 KB  
Article
A Query-Based Progressive Aggregation Network for 3D Medical Image Segmentation
by Wei Peng, Guoqing Hu, Ji Li and Chengzhi Lyu
Appl. Sci. 2025, 15(24), 13153; https://doi.org/10.3390/app152413153 - 15 Dec 2025
Abstract
Accurate 3D medical image segmentation is crucial for knowledge-driven clinical decision-making and computer-aided diagnosis. However, current deep learning methods often fail to effectively integrate local structural details from Convolutional Neural Networks (CNNs) with global semantic context from Transformers due to semantic inconsistency and poor cross-scale feature alignment. To address this, the Progressive Query Aggregation Network (PQAN), a novel framework that incorporates knowledge-guided feature interaction mechanisms, is proposed. PQAN employs two complementary query modules: the Structural Feature Query, which uses anatomical morphology for boundary-aware representation, and the Content Feature Query, which enhances semantic alignment between the encoding and decoding stages. To enhance texture perception, a Texture Attention (TA) module based on Sobel operators adds directional edge awareness and fine-detail enhancement. Moreover, a Progressive Aggregation Strategy with Forward and Backward Cross-Stage Attention gradually aligns and refines multi-scale features, thereby reducing semantic deviations during CNN-Transformer fusion. Experiments on public benchmarks demonstrate that PQAN outperforms state-of-the-art models in both global accuracy and boundary segmentation. On the BTCV and FLARE datasets, PQAN achieved average Dice scores of 0.926 and 0.816, respectively. These results demonstrate PQAN’s ability to capture complex anatomical structures, small targets, and ambiguous organ boundaries, yielding an interpretable and scalable solution for real-world clinical deployment.
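The Sobel-based Texture Attention module might gate features with a fixed-kernel edge-magnitude map. A 2D sketch (the paper operates on 3D volumes, and this gating form is invented):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SobelTextureAttention(nn.Module):
    """Fixed Sobel filters produce an edge magnitude map that gates the features.
    Shown in 2D for brevity; the described method works on 3D volumes."""
    def __init__(self, dim: int):
        super().__init__()
        gx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
        self.register_buffer("kx", gx.view(1, 1, 3, 3).repeat(dim, 1, 1, 1))
        self.register_buffer("ky", gx.t().view(1, 1, 3, 3).repeat(dim, 1, 1, 1))
        self.gate = nn.Conv2d(dim, dim, 1)

    def forward(self, x):                        # x: (B, C, H, W)
        c = x.shape[1]
        ex = F.conv2d(x, self.kx, padding=1, groups=c)   # horizontal gradients
        ey = F.conv2d(x, self.ky, padding=1, groups=c)   # vertical gradients
        edge = torch.sqrt(ex ** 2 + ey ** 2 + 1e-6)
        return x * torch.sigmoid(self.gate(edge))        # edge-aware reweighting
```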
31 pages, 25297 KB  
Article
AET-FRAP—A Periodic Reshape Transformer Framework for Rock Fracture Early Warning Using Acoustic Emission Multi-Parameter Time Series
by Donghui Yang, Zechao Zhang, Zichu Yang, Yongqi Li and Linhuan Jin
Sensors 2025, 25(24), 7580; https://doi.org/10.3390/s25247580 - 13 Dec 2025
Abstract
The timely identification of rock fractures is crucial in deep subterranean engineering. However, it remains necessary to identify reliable warning indicators and establish effective warning levels. This study introduces the Acoustic Emission Transformer for FRActure Prediction (AET-FRAP), a multi-input time series forecasting framework that employs acoustic emission feature parameters. First, Empirical Mode Decomposition (EMD) combined with the Fast Fourier Transform (FFT) is employed to identify and filter periodicities among diverse indicators and to select the most informative input channels, with the aim of predicting cumulative energy. Thereafter, the one-dimensional sequence is transformed into a two-dimensional tensor based on its predominant period via spectral analysis. This is coupled with InceptionNeXt, which combines efficient multiscale convolution with amplitude-spectrum-weighted aggregation, to enhance pattern identification across timeframes. A secondary criterion based on the prediction sequence employs cosine similarity and kurtosis to collaboratively identify abrupt changes, transforming single-point threshold detection into robust identification of sequence behavior patterns with clearly quantifiable trigger criteria. AET-FRAP exhibits improvements in accuracy relative to long short-term memory (LSTM) on uniaxial compression test data, with R2 approaching 1 and reductions in Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Mean Absolute Error (MAE). It accurately delineates energy accumulation spikes in the pre-fracture period and provides advance warning. The collaborative thresholds effectively reduce noise-induced false alarms, demonstrating strong stability and engineering value.
(This article belongs to the Section Electronic Sensors)
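The period-based 1D-to-2D reshape can be illustrated with the FFT amplitude spectrum, in the spirit of the description (the helper name and the simple folding are invented):

```python
import torch

def reshape_by_period(x):
    """Find the dominant period via the FFT amplitude spectrum and fold the
    series into (n_periods, period) so intra/inter-period patterns become 2D."""
    # x: (T,) univariate series
    amp = torch.fft.rfft(x).abs()
    amp[0] = 0                                   # ignore the DC component
    freq = int(amp.argmax())                     # dominant frequency bin
    period = max(1, x.shape[0] // max(freq, 1))
    n = x.shape[0] // period
    return x[: n * period].view(n, period)       # (cycles, phase within cycle)

series = torch.sin(torch.arange(200, dtype=torch.float32) * 2 * torch.pi / 25)
print(reshape_by_period(series).shape)           # torch.Size([8, 25])
```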