Search Results (2,831)

Search Parameters:
Keywords = cloud enhancement

16 pages, 1251 KiB  
Article
Enhanced Detection of Intrusion Detection System in Cloud Networks Using Time-Aware and Deep Learning Techniques
by Nima Terawi, Huthaifa I. Ashqar, Omar Darwish, Anas Alsobeh, Plamen Zahariev and Yahya Tashtoush
Computers 2025, 14(7), 282; https://doi.org/10.3390/computers14070282 (registering DOI) - 17 Jul 2025
Abstract
This study introduces an enhanced Intrusion Detection System (IDS) framework for Denial-of-Service (DoS) attacks based on network traffic inter-arrival time (IAT) analysis. By examining the timing between packets together with other statistical features, we detect patterns of malicious activity, allowing early and effective DoS threat mitigation. We generate real DoS traffic covering normal, Internet Control Message Protocol (ICMP), Smurf attack, and Transmission Control Protocol (TCP) classes, and develop nine predictive algorithms that combine traditional machine learning and deep learning techniques with optimization methods, including the synthetic minority oversampling technique (SMOTE) and grid search (GS). Our findings reveal that traditional machine learning achieved moderate accuracy but struggled with imbalanced datasets. Deep Neural Network (DNN) models improved markedly with optimization, with DNN combined with GS (DNN-GS) reaching 89% accuracy. A Recurrent Neural Network combined with SMOTE and GS (RNN-SMOTE-GS) emerged as the best-performing model, with a precision of 97%. These results demonstrate the effectiveness of combining SMOTE and GS and highlight the critical role of advanced optimization techniques in improving the ability of IDS models to classify different types of network traffic and attacks accurately. Full article
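
For readers who want to reproduce the general recipe, the sketch below shows a SMOTE-plus-grid-search pipeline over hand-crafted IAT statistics using scikit-learn and imbalanced-learn; the features, classifier, and parameter grid are placeholders, not the authors' configuration.

```python
# Minimal sketch of a SMOTE + grid-search pipeline over IAT-style features
# (illustrative only; the paper's own features, models, and grids are not shown here).
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline  # SMOTE-aware pipeline (resampling at fit time only)

rng = np.random.default_rng(0)
X = rng.random((1000, 6))          # placeholder stats, e.g. mean/std/min/max of packet IAT
y = rng.integers(0, 4, 1000)       # 0=normal, 1=ICMP, 2=Smurf, 3=TCP (classes from the abstract)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("smote", SMOTE(random_state=0)),          # oversample minority attack classes
    ("clf", MLPClassifier(max_iter=500)),      # stand-in for the DNN/RNN variants
])
grid = GridSearchCV(pipe, {"clf__hidden_layer_sizes": [(64,), (64, 32)],
                           "clf__alpha": [1e-4, 1e-3]}, cv=3)
grid.fit(X_tr, y_tr)
print("held-out accuracy:", grid.score(X_te, y_te))
```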

25 pages, 3231 KiB  
Article
A Cost-Sensitive Small Vessel Detection Method for Maritime Remote Sensing Imagery
by Zhuhua Hu, Wei Wu, Ziqi Yang, Yaochi Zhao, Lewei Xu, Lingkai Kong, Yunpei Chen, Lihang Chen and Gaosheng Liu
Remote Sens. 2025, 17(14), 2471; https://doi.org/10.3390/rs17142471 - 16 Jul 2025
Abstract
Vessel detection technology based on marine remote sensing imagery is of great importance. However, it often faces challenges, such as small vessel targets, cloud occlusion, insufficient data volume, and severely imbalanced class distribution in datasets. These issues result in conventional models failing to meet the accuracy requirements for practical applications. In this paper, we first construct a novel remote sensing vessel image dataset that includes various complex scenarios and enhance the data volume and diversity through data augmentation techniques. Secondly, we address the class imbalance between foreground (small vessels) and background in remote sensing imagery from two perspectives: the sensitivity of IoU metrics to small object localization errors and the innovative design of a cost-sensitive loss function. Specifically, at the dataset level, we select vessel targets appearing in the original dataset as templates and randomly copy–paste several instances onto arbitrary positions. This enriches the diversity of target samples per image and mitigates the impact of data imbalance on the detection task. At the algorithm level, we introduce the Normalized Wasserstein Distance (NWD) to compute the similarity between bounding boxes. This enhances the importance of small target information during training and strengthens the model’s cost-sensitive learning capabilities. Ablation studies reveal that detection performance is optimal when the weight assigned to the NWD metric in the model’s loss function matches the overall proportion of small objects in the dataset. Comparative experiments show that the proposed NWD-YOLO achieves Precision, Recall, and AP50 scores of 0.967, 0.958, and 0.971, respectively, meeting the accuracy requirements of real-world applications. Full article
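
The abstract's key ingredient is the Normalized Wasserstein Distance between bounding boxes. The snippet below follows the commonly cited NWD formulation (boxes modeled as 2D Gaussians); the constant C and the example boxes are illustrative, and the paper's exact loss weighting is not reproduced here.

```python
# A common formulation of the Normalized Wasserstein Distance (NWD) between two
# axis-aligned boxes modeled as 2D Gaussians; the constant C is dataset-dependent
# and the value below is only a placeholder.
import numpy as np

def nwd(box_a, box_b, C=12.8):
    """box = (cx, cy, w, h); returns a similarity in (0, 1]."""
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    # Squared 2-Wasserstein distance between N([cx,cy], diag(w^2/4, h^2/4)) Gaussians.
    w2_sq = (cxa - cxb) ** 2 + (cya - cyb) ** 2 + ((wa - wb) / 2) ** 2 + ((ha - hb) / 2) ** 2
    return float(np.exp(-np.sqrt(w2_sq) / C))

# A small shift hurts IoU badly for tiny boxes but degrades NWD smoothly:
print(nwd((100, 100, 8, 8), (103, 100, 8, 8)))
```
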
19 pages, 2270 KiB  
Article
IoMT Architecture for Fully Automated Point-of-Care Molecular Diagnostic Device
by Min-Gin Kim, Byeong-Heon Kil, Mun-Ho Ryu and Jong-Dae Kim
Sensors 2025, 25(14), 4426; https://doi.org/10.3390/s25144426 - 16 Jul 2025
Abstract
The Internet of Medical Things (IoMT) is revolutionizing healthcare by integrating smart diagnostic devices with cloud computing and real-time data analytics. The emergence of infectious diseases, including COVID-19, underscores the need for rapid and decentralized diagnostics to facilitate early intervention. Traditional centralized laboratory testing introduces delays, limiting timely medical responses. While point-of-care molecular diagnostic (POC-MD) systems offer an alternative, challenges remain in cost, accessibility, and network inefficiencies. This study proposes an IoMT-based architecture for fully automated POC-MD devices, leveraging WebSockets for optimized communication, enhancing microfluidic cartridge efficiency, and integrating a hardware-based emulator for real-time validation. The system incorporates DNA extraction and real-time polymerase chain reaction functionalities into modular, networked components, improving flexibility and scalability. Although the system itself has not yet undergone clinical validation, it builds upon the core cartridge and detection architecture of a previously validated cartridge-based platform for Chlamydia trachomatis and Neisseria gonorrhoeae (CT/NG). These pathogens were selected due to their global prevalence, high asymptomatic transmission rates, and clinical importance in reproductive health. In a previous clinical study involving 510 patient specimens, the system demonstrated high concordance with a commercial assay with limits of detection below 10 copies/μL, supporting the feasibility of this architecture for point-of-care molecular diagnostics. By addressing existing limitations, this system establishes a new standard for next-generation diagnostics, ensuring rapid, reliable, and accessible disease detection. Full article
(This article belongs to the Special Issue Advances in Sensors and IoT for Health Monitoring)
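
As a rough illustration of the WebSocket-based device-to-cloud channel the abstract mentions, here is a minimal asyncio server using the third-party websockets package (version 10.1 or later, single-argument handler); the message fields and port are assumptions, not the paper's protocol.

```python
# Minimal sketch of a WebSocket channel between a POC-MD device and a cloud service,
# using the third-party `websockets` package; message fields are illustrative only.
import asyncio
import json
import websockets

async def handle_device(ws):                  # websockets >= 10.1: handler takes the connection only
    async for raw in ws:                      # each message: one JSON status update
        msg = json.loads(raw)
        print("device", msg.get("device_id"), "step", msg.get("step"),
              "ct_value", msg.get("ct_value"))
        await ws.send(json.dumps({"ack": True}))   # immediate acknowledgement

async def main():
    async with websockets.serve(handle_device, "0.0.0.0", 8765):
        await asyncio.Future()                # run until cancelled

if __name__ == "__main__":
    asyncio.run(main())
```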

22 pages, 5031 KiB  
Article
Numerical Simulation and Analysis of Micropile-Raft Joint Jacking Technology for Rectifying Inclined Buildings Due to Uneven Settlement
by Ming Xie, Li’e Yin, Zhangdong Wang, Fangbo Xu, Xiangdong Wu and Mengqi Xu
Buildings 2025, 15(14), 2485; https://doi.org/10.3390/buildings15142485 - 15 Jul 2025
Abstract
To address the issue of structural tilting caused by uneven foundation settlement in soft soil areas, this study combined a specific engineering case to conduct numerical simulations of the rectification process for an inclined reinforced concrete building using ABAQUS finite element software. Micropile-raft combined jacking technology was employed, applying staged jacking forces (2400 kN for Axis A, 2200 kN for Axis B, and 1700 kN for Axis C) with precise control through 20 incremental steps. The results demonstrate that this technology effectively halted structural tilting, reducing the maximum inclination rate from 0.51% to 0.05%, significantly below the standard limit. Post-rectification, the peak structural stress decreased by 42%, and displacements were markedly reduced. However, the jacking process led to a notable increase in the column axial forces and directional changes in beam bending moments, reflecting the dynamic redistribution of internal forces. The study confirms that micropile-raft combined jacking technology offers both controllability and safety, while optimized counterforce pile layouts enhance the long-term stability of the rectification system. Based on stress and displacement cloud analysis, a monitoring scheme is proposed, forming an integrated “rectification-monitoring-reinforcement” solution, which provides a technical framework for building rectification in soft soil regions. Full article
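
The staged loading described above lends itself to a trivial schedule computation; the sketch below assumes equal increments over the 20 steps, which the abstract does not state explicitly.

```python
# Tiny sketch of the staged loading schedule implied by the abstract: the target
# jacking forces per axis are applied in 20 increments (the equal-step split is an
# assumption for illustration; the actual load curve is defined in the ABAQUS model).
targets_kN = {"Axis A": 2400, "Axis B": 2200, "Axis C": 1700}
steps = 20

schedule = {axis: [round(total * k / steps, 1) for k in range(1, steps + 1)]
            for axis, total in targets_kN.items()}
print(schedule["Axis A"][:5])   # first five cumulative load levels: [120.0, 240.0, ...]
```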

16 pages, 5188 KiB  
Article
An Object Detection Algorithm for Orchard Vehicles Based on AGO-PointPillars
by Pengyu Ren, Xuyun Qiu, Qi Gao and Yumin Song
Agriculture 2025, 15(14), 1529; https://doi.org/10.3390/agriculture15141529 - 15 Jul 2025
Abstract
With the continuous expansion of orchard planting areas, there is an urgent need for autonomous orchard vehicles that can reduce the labor intensity of fruit farmers and improve operational efficiency. An object detection system that can accurately identify potholes, trees, and other orchard objects is essential for unmanned operation of such vehicles. To address the low recognition accuracy of existing object detection algorithms in orchard operation scenes, we propose an orchard vehicle object detection algorithm based on Attention-Guided Orchard PointPillars (AGO-PointPillars). First, we use an RGB-D camera as the sensing hardware to collect orchard road information and convert the resulting depth images into 3D point cloud data. Then, an Efficient Channel Attention (ECA) module and an Efficient Up-Convolution Block (EUCB) are introduced into PointPillars to enhance feature extraction for orchard objects. Finally, we build an orchard object detection dataset and validate the proposed algorithm. The results show that, compared to PointPillars, AGO-PointPillars improves average detection accuracy by 4.64% for typical orchard objects such as potholes and trees, demonstrating the reliability of our algorithm. Full article
(This article belongs to the Section Agricultural Technology)
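
A compact PyTorch version of the Efficient Channel Attention idea referenced in the abstract is sketched below; the kernel size, feature-map shape, and insertion point in PointPillars are assumptions.

```python
# Sketch of an Efficient Channel Attention (ECA) block of the kind the abstract adds
# to the PointPillars backbone; kernel size and placement in the network are assumptions.
import torch
import torch.nn as nn

class ECA(nn.Module):
    def __init__(self, kernel_size: int = 3):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size, padding=kernel_size // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):                          # x: (B, C, H, W) pseudo-image features
        y = x.mean(dim=(2, 3))                     # global average pool -> (B, C)
        y = self.conv(y.unsqueeze(1)).squeeze(1)   # 1D conv across channels -> (B, C)
        w = self.sigmoid(y).unsqueeze(-1).unsqueeze(-1)
        return x * w                               # reweight channels

feats = torch.randn(2, 64, 248, 216)               # PointPillars-style BEV feature map (example shape)
print(ECA()(feats).shape)                          # torch.Size([2, 64, 248, 216])
```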

26 pages, 3369 KiB  
Article
Zero-Day Threat Mitigation via Deep Learning in Cloud Environments
by Sebastian Ignacio Berrios Vasquez, Pamela Alejandra Hermosilla Monckton, Dante Ivan Leiva Muñoz and Hector Allende
Appl. Sci. 2025, 15(14), 7885; https://doi.org/10.3390/app15147885 - 15 Jul 2025
Abstract
The growing sophistication of cyber threats has increased the need for advanced detection techniques, particularly in cloud computing environments. Zero-day threats pose a critical risk due to their ability to bypass traditional security mechanisms. This study proposes a deep learning model called mixed vision transformer (MVT), which converts binary files into images and applies deep attention mechanisms for classification. The model was trained using the MaLeX dataset in a simulated Docker environment. It achieved an accuracy between 70% and 80%, with better performance in detecting malware compared with benign files. The proposed MVT approach not only demonstrates its potential to significantly enhance zero-day threat detection in cloud environments but also sets a foundation for robust and adaptive solutions to emerging cybersecurity challenges. Full article
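
The binary-to-image preprocessing step the abstract relies on can be approximated as follows; the 256 × 256 target size and zero padding are illustrative choices, not necessarily those used for the MaLeX dataset.

```python
# Sketch of the binary-to-image step the abstract describes: raw bytes of an executable
# are reshaped into a square grayscale image before being fed to the transformer
# (the fixed 256x256 size and zero padding are assumptions for illustration).
import numpy as np
from PIL import Image

def binary_to_image(path: str, side: int = 256) -> Image.Image:
    data = np.frombuffer(open(path, "rb").read(), dtype=np.uint8)
    data = data[: side * side]                         # truncate long files...
    if data.size < side * side:                        # ...and zero-pad short ones
        data = np.pad(data, (0, side * side - data.size))
    return Image.fromarray(data.reshape(side, side), mode="L")

# binary_to_image("sample.exe").save("sample.png")   # hypothetical file; the image then goes to the classifier
```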

21 pages, 1118 KiB  
Review
Integrating Large Language Models into Robotic Autonomy: A Review of Motion, Voice, and Training Pipelines
by Yutong Liu, Qingquan Sun and Dhruvi Rajeshkumar Kapadia
AI 2025, 6(7), 158; https://doi.org/10.3390/ai6070158 - 15 Jul 2025
Abstract
This survey provides a comprehensive review of the integration of large language models (LLMs) into autonomous robotic systems, organized around four key pillars: locomotion, navigation, manipulation, and voice-based interaction. We examine how LLMs enhance robotic autonomy by translating high-level natural language commands into low-level control signals, supporting semantic planning and enabling adaptive execution. Systems like SayTap improve gait stability through LLM-generated contact patterns, while TrustNavGPT achieves a 5.7% word error rate (WER) under noisy voice-guided conditions by modeling user uncertainty. Frameworks such as MapGPT, LLM-Planner, and 3D-LOTUS++ integrate multi-modal data—including vision, speech, and proprioception—for robust planning and real-time recovery. We also highlight the use of physics-informed neural networks (PINNs) to model object deformation and support precision in contact-rich manipulation tasks. To bridge the gap between simulation and real-world deployment, we synthesize best practices from benchmark datasets (e.g., RH20T, Open X-Embodiment) and training pipelines designed for one-shot imitation learning and cross-embodiment generalization. Additionally, we analyze deployment trade-offs across cloud, edge, and hybrid architectures, emphasizing latency, scalability, and privacy. The survey concludes with a multi-dimensional taxonomy and cross-domain synthesis, offering design insights and future directions for building intelligent, human-aligned robotic systems powered by LLMs. Full article

32 pages, 8202 KiB  
Article
A Machine Learning-Based Method for Lithology Identification of Outcrops Using TLS-Derived Spectral and Geometric Features
by Yanlin Shao, Peijin Li, Ran Jing, Yaxiong Shao, Lang Liu, Kunpeng Zhao, Binqing Gan, Xiaolei Duan and Longfan Li
Remote Sens. 2025, 17(14), 2434; https://doi.org/10.3390/rs17142434 - 14 Jul 2025
Abstract
Lithological identification of outcrops in complex geological settings plays a crucial role in hydrocarbon exploration and geological modeling. To address the limitations of traditional field surveys, such as low efficiency and high risk, we proposed an intelligent lithology recognition method, SG-RFGeo, for terrestrial laser scanning (TLS) outcrop point clouds, which integrates spectral and geometric features. The workflow involves several key steps. First, lithological recognition units are created through regular grid segmentation. From these units, spectral reflectance statistics (e.g., mean, standard deviation, kurtosis, and other related metrics), and geometric morphological features (e.g., surface variation rate, curvature, planarity, among others) are extracted. Next, a double-layer random forest model is employed for lithology identification. In the shallow layer, the Gini index is used to select relevant features for a coarse classification of vegetation, conglomerate, and mud–sandstone. The deep-layer module applies an optimized feature set to further classify thinly interbedded sandstone and mudstone. Geological prior knowledge, such as stratigraphic attitudes, is incorporated to spatially constrain and post-process the classification results, enhancing their geological plausibility. The method was tested on a TLS dataset from the Yueyawan outcrop of the Qingshuihe Formation, located on the southern margin of the Junggar Basin in China. Results demonstrate that the integration of spectral and geometric features significantly improves classification performance, with the Macro F1-score increasing from 0.65 (with single-feature input) to 0.82. Further, post-processing with stratigraphic constraints boosts the overall classification accuracy to 93%, outperforming SVM (59.2%), XGBoost (67.8%), and PointNet (75.3%). These findings demonstrate that integrating multi-source features and geological prior constraints effectively addresses the challenges of lithological identification in complex outcrops, providing a novel approach for high-precision geological modeling and exploration. Full article
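
A minimal sketch of the two-layer random-forest idea (Gini-based feature selection, coarse classes, then refinement of mud–sandstone cells) is given below with synthetic data; feature definitions and class labels are placeholders.

```python
# Minimal sketch of the two-stage ("shallow" then "deep") random-forest scheme described
# in the abstract; feature names, thresholds, and class labels are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

rng = np.random.default_rng(0)
X = rng.random((3000, 12))                 # per-grid-cell spectral + geometric statistics
coarse_y = rng.integers(0, 3, 3000)        # 0=vegetation, 1=conglomerate, 2=mud-sandstone
fine_y = rng.integers(0, 2, 3000)          # 0=sandstone, 1=mudstone (only meaningful for class 2)

# Stage 1: Gini-importance-based feature selection + coarse classification.
stage1 = SelectFromModel(RandomForestClassifier(n_estimators=200, random_state=0))
X_sel = stage1.fit_transform(X, coarse_y)
coarse_rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_sel, coarse_y)

# Stage 2: refine only the mud-sandstone cells into thin sandstone/mudstone interbeds.
mask = coarse_rf.predict(X_sel) == 2
fine_rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_sel[mask], fine_y[mask])
print("refined cells:", mask.sum())
```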

25 pages, 12949 KiB  
Article
Enhanced Landslide Visualization and Trace Identification Using LiDAR-Derived DEM
by Jie Lv, Chengzhuo Lu, Minjun Ye, Yuting Long, Wenbing Li and Minglong Yang
Sensors 2025, 25(14), 4391; https://doi.org/10.3390/s25144391 - 14 Jul 2025
Abstract
In response to the inability of traditional remote sensing technology to accurately capture the micro-topographic features of landslide surfaces in vegetated areas under complex terrain conditions, this paper proposes a method for enhanced landslide terrain display and trace recognition based on airborne LiDAR technology. Firstly, a high-precision LiDAR-DEM is constructed using preprocessed LiDAR point cloud data, and visual images are generated using visualization methods, including hillshade, slope, openness, and Sky View Factor (SVF). Secondly, pixel-level image fusion methods are applied to the visual images to obtain enhanced display images of the landslide terrain. Finally, a threshold is determined through a fractal model, and the Mean-Shift algorithm is utilized for clustering and denoising to extract landslide traces. The results indicate that employing pixel-level image fusion technology, which combines the advantageous features of multiple terrain visualization images, effectively enhances the display of landslide micro-topography. Moreover, based on the enhanced display images, the fractal model and the Mean-Shift algorithm are applied for denoising to extract landslide traces. Compared to orthophotos, this method can effectively and accurately extract landslide traces. The findings of this study provide valuable references for the enhanced display and trace recognition of landslide terrain in densely vegetated areas within complex mountainous areas, thereby providing technical support for emergency investigations of landslide disasters. Full article
(This article belongs to the Special Issue Sensor Fusion in Positioning and Navigation)
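
One of the visualization layers named in the abstract, hillshade, can be computed from a DEM as sketched below; the illumination angles and toy surface are conventional defaults rather than the study's settings.

```python
# Sketch of one of the visualization layers used in the abstract (hillshade from a DEM);
# azimuth/altitude values and cell size are conventional defaults, not the paper's.
import numpy as np

def hillshade(dem, cellsize=1.0, azimuth_deg=315.0, altitude_deg=45.0):
    az, alt = np.radians(azimuth_deg), np.radians(altitude_deg)
    dzdy, dzdx = np.gradient(dem, cellsize)             # surface gradients
    slope = np.arctan(np.hypot(dzdx, dzdy))
    aspect = np.arctan2(-dzdx, dzdy)
    shaded = (np.sin(alt) * np.cos(slope) +
              np.cos(alt) * np.sin(slope) * np.cos(az - aspect))
    return np.clip(shaded, 0, 1)                        # 0..1, ready to fuse with slope/SVF layers

dem = np.random.rand(200, 200).cumsum(axis=0)           # toy surface standing in for the LiDAR-DEM
print(hillshade(dem).shape)
```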

20 pages, 1753 KiB  
Article
Hybrid Cloud-Based Information and Control System Using LSTM-DNN Neural Networks for Optimization of Metallurgical Production
by Kuldashbay Avazov, Jasur Sevinov, Barnokhon Temerbekova, Gulnora Bekimbetova, Ulugbek Mamanazarov, Akmalbek Abdusalomov and Young Im Cho
Processes 2025, 13(7), 2237; https://doi.org/10.3390/pr13072237 - 13 Jul 2025
Abstract
A methodology for detecting systematic errors in sets of equally accurate, uncorrelated, aggregate measurements is proposed and applied within the automatic real-time dispatch control system of a copper concentrator plant (CCP) to refine the technical and economic performance indicators (EPIs) computed by the system. This work addresses and solves the problem of selecting and obtaining reliable measurement data by exploiting the redundant measurements of process streams together with the balance equations linking those streams. This study formulates an approach for integrating cloud technologies, machine learning methods, and forecasting into information control systems (ICSs) via predictive analytics to optimize CCP production processes. A method for combining the hybrid cloud infrastructure with an LSTM-DNN neural network model has been developed, yielding a marked improvement in the EPIs for copper concentration operations. The forecasting accuracy for the key process parameters rose from 75% to 95%. Predictive control reduced energy consumption by 10% through more efficient resource use, while the copper losses to tailings fell by 15–20% thanks to optimized reagent dosing and the stabilization of the flotation process. Equipment failure prediction cut the amount of unplanned downtime by 30%. As a result, the control system became adaptive, automatically correcting the parameters in real time and lessening the reliance on operator decisions. The architectural model of an ICS for metallurgical production based on the hybrid cloud and the LSTM-DNN model was devised to enhance forecasting accuracy and optimize the EPIs of the CCP. The proposed model was experimentally evaluated against alternative neural network architectures (DNN, GRU, Transformer, and Hybrid_NN_TD_AIST). The results demonstrated the superiority of the LSTM-DNN in forecasting accuracy (92.4%), noise robustness (0.89), and a minimal root-mean-square error (RMSE = 0.079). The model shows a strong capability to handle multidimensional, non-stationary time series and to perform adaptive measurement correction in real time. Full article
(This article belongs to the Section AI-Enabled Process Engineering)
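
In the spirit of the hybrid forecaster described above, the following Keras sketch stacks an LSTM encoder with a dense head; the window length, feature count, and layer sizes are assumptions, and the random arrays stand in for plant telemetry.

```python
# Rough sketch of an LSTM-DNN style forecaster for multivariate process time series,
# in the spirit of the hybrid model the abstract describes; layer sizes, window length,
# and feature count are assumptions, and the real system runs on plant data, not noise.
import numpy as np
import tensorflow as tf

window, n_features = 32, 8                      # e.g. flows, assays, reagent dosing, power
X = np.random.rand(512, window, n_features).astype("float32")
y = np.random.rand(512, 1).astype("float32")    # next-step KPI (e.g. concentrate grade)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(window, n_features)),
    tf.keras.layers.LSTM(64),                        # temporal encoder
    tf.keras.layers.Dense(64, activation="relu"),    # DNN head
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse",
              metrics=[tf.keras.metrics.RootMeanSquaredError()])
model.fit(X, y, epochs=2, batch_size=64, verbose=0)
print(model.predict(X[:1], verbose=0).shape)    # (1, 1)
```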

22 pages, 1013 KiB  
Article
Leveraging Artificial Intelligence in Social Media Analysis: Enhancing Public Communication Through Data Science
by Sawsan Taha and Rania Abdel-Qader Abdallah
Journal. Media 2025, 6(3), 102; https://doi.org/10.3390/journalmedia6030102 - 12 Jul 2025
Abstract
This study examines the role of AI tools in improving public communication via social media analysis. It reviews five leading platforms—Google Cloud Natural Language, IBM Watson NLU, Hootsuite Insights, Talkwalker Analytics, and Sprout Social—to determine their accuracy in detecting sentiment, predicting trends, optimally timing content, and enhancing messaging engagement. Adopting a structured model approach and Partial Least Squares Structural Equation Modeling (PLS-SEM) via SMART PLS, the research analyzes 500 influencer posts from five Arab countries. The results demonstrate significant relationships between AI tool functions and communication outcomes: text analysis tools significantly improved public engagement (β = 0.62, p = 0.001), trend forecasting tools improved strategic planning decisions (β = 0.74, p < 0.001), and timing optimization tools enhanced message efficacy (β = 0.59, p = 0.004). Beyond these technical dimensions, the study addresses urgent ethical considerations by outlining a five-principle ethical governance model that promotes transparency, fairness, privacy, human oversight of technologies, and institutional accountability, in view of data bias, algorithmic opacity, and over-reliance on automated solutions. The research contributes a multidimensional framework for bringing AI into digital public communication in culturally sensitive and linguistically diverse environments and provides a blueprint for improving AI integration. Full article

20 pages, 10558 KiB  
Article
Spatial–Spectral Feature Fusion and Spectral Reconstruction of Multispectral LiDAR Point Clouds by Attention Mechanism
by Guoqing Zhou, Haoxin Qi, Shuo Shi, Sifu Bi, Xingtao Tang and Wei Gong
Remote Sens. 2025, 17(14), 2411; https://doi.org/10.3390/rs17142411 - 12 Jul 2025
Abstract
High-quality multispectral LiDAR (MSL) data are crucial for land cover (LC) classification. However, the Titan MSL system encounters challenges of inconsistent spatial–spectral information due to its unique scanning and data saving method, restricting subsequent classification accuracy. Existing spectral reconstruction methods often require empirical parameter settings and involve high computational costs, limiting automation and complicating application. To address this problem, we introduce the dual attention spectral optimization reconstruction network (DossaNet), leveraging an attention mechanism and spatial–spectral information. DossaNet can adaptively adjust weight parameters, streamline the multispectral point cloud acquisition process, and integrate it into classification models end-to-end. The experimental results show the following: (1) DossaNet exhibits excellent generalizability, effectively recovering accurate LC spectra and improving classification accuracy. Metrics across the six classification models show some improvements. (2) Compared with the method lacking spectral reconstruction, DossaNet can improve the overall accuracy (OA) and average accuracy (AA) of PointNet++ and RandLA-Net by a maximum of 4.8%, 4.47%, 5.93%, and 2.32%. Compared with the inverse distance weighted (IDW) and k-nearest neighbor (KNN) approach, DossaNet can improve the OA and AA of PointNet++ and DGCNN by a maximum of 1.33%, 2.32%, 0.86%, and 2.08% (IDW) and 1.73%, 3.58%, 0.28%, and 2.93% (KNN). The findings further validate the effectiveness of our proposed method. This method provides a more efficient and simplified approach to enhancing the quality of multispectral point cloud data. Full article
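
For context, the inverse-distance-weighted (IDW) spectral assignment that DossaNet is benchmarked against can be written in a few lines; k, the power parameter, and the random point clouds are illustrative.

```python
# Sketch of the inverse-distance-weighted (IDW) spectral assignment that DossaNet is
# compared against in the abstract: each point missing a channel borrows intensities
# from its spatial neighbours in that channel (power parameter and k are assumptions).
import numpy as np
from sklearn.neighbors import NearestNeighbors

def idw_fill(xyz_known, intensity_known, xyz_query, k=5, power=2.0, eps=1e-9):
    nbrs = NearestNeighbors(n_neighbors=k).fit(xyz_known)
    dist, idx = nbrs.kneighbors(xyz_query)
    w = 1.0 / (dist ** power + eps)                      # closer points weigh more
    return (w * intensity_known[idx]).sum(axis=1) / w.sum(axis=1)

pts_a = np.random.rand(1000, 3)                          # channel-A returns (have intensity)
inten_a = np.random.rand(1000)
pts_b = np.random.rand(200, 3)                           # channel-B returns (missing channel A)
print(idw_fill(pts_a, inten_a, pts_b).shape)             # (200,)
```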

18 pages, 4631 KiB  
Article
Semantic Segmentation of Rice Fields in Sub-Meter Satellite Imagery Using an HRNet-CA-Enhanced DeepLabV3+ Framework
by Yifan Shao, Pan Pan, Hongxin Zhao, Jiale Li, Guoping Yu, Guomin Zhou and Jianhua Zhang
Remote Sens. 2025, 17(14), 2404; https://doi.org/10.3390/rs17142404 - 11 Jul 2025
Abstract
Accurate monitoring of rice-planting areas underpins food security and evidence-based farm management. Recent work has advanced along three complementary lines—multi-source data fusion (to mitigate cloud and spectral confusion), temporal feature extraction (to exploit phenology), and deep-network architecture optimization. However, even the best fusion- and time-series-based approaches still struggle to preserve fine spatial details in sub-meter scenes. Targeting this gap, we propose an HRNet-CA-enhanced DeepLabV3+ that retains the original model’s strengths while resolving its two key weaknesses: (i) detail loss caused by repeated down-sampling and feature-pyramid compression and (ii) boundary blurring due to insufficient multi-scale information fusion. The Xception backbone is replaced with a High-Resolution Network (HRNet) to maintain full-resolution feature streams through multi-resolution parallel convolutions and cross-scale interactions. A coordinate attention (CA) block is embedded in the decoder to strengthen spatially explicit context and sharpen class boundaries. We built a rice dataset of 23,295 images (11,295 rice + 12,000 non-rice) through preprocessing and manual labeling and benchmarked the proposed model against classical segmentation networks. Our approach boosts boundary segmentation accuracy to 92.28% MIOU and raises texture-level discrimination to 95.93% F1, without extra inference latency. Although this study focuses on architecture optimization, the HRNet-CA backbone is readily compatible with future multi-source fusion and time-series modules, offering a unified path toward operational paddy mapping in fragmented sub-meter landscapes. Full article
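
A PyTorch sketch of a coordinate attention block of the kind embedded in the decoder is shown below; the reduction ratio and tensor shapes are assumptions, not the paper's exact module.

```python
# Sketch of a coordinate attention (CA) block of the kind the abstract embeds in the
# DeepLabV3+ decoder; the channel-reduction ratio and placement are assumptions.
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        mid = max(8, channels // reduction)
        self.conv1 = nn.Conv2d(channels, mid, 1)
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.ReLU(inplace=True)
        self.conv_h = nn.Conv2d(mid, channels, 1)
        self.conv_w = nn.Conv2d(mid, channels, 1)

    def forward(self, x):                                  # x: (B, C, H, W)
        b, c, h, w = x.shape
        x_h = x.mean(dim=3, keepdim=True)                  # (B, C, H, 1): pool along width
        x_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)  # (B, C, W, 1): pool along height
        y = self.act(self.bn(self.conv1(torch.cat([x_h, x_w], dim=2))))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))              # (B, C, H, 1)
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))  # (B, C, 1, W)
        return x * a_h * a_w                               # direction-aware reweighting

print(CoordinateAttention(256)(torch.randn(2, 256, 64, 64)).shape)
```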

27 pages, 1889 KiB  
Article
Advancing Smart City Sustainability Through Artificial Intelligence, Digital Twin and Blockchain Solutions
by Ivica Lukić, Mirko Köhler, Zdravko Krpić and Miljenko Švarcmajer
Technologies 2025, 13(7), 300; https://doi.org/10.3390/technologies13070300 - 11 Jul 2025
Abstract
This paper presents an integrated Smart City platform that combines digital twin technology, advanced machine learning, and a private blockchain network to enhance data-driven decision making and operational efficiency in both public enterprises and small and medium-sized enterprises (SMEs). The proposed cloud-based business intelligence model automates Extract, Transform, Load (ETL) processes, enables real-time analytics, and secures data integrity and transparency through blockchain-enabled audit trails. By implementing the proposed solution, Smart City and public service providers can significantly improve operational efficiency, including a 15% reduction in costs and a 12% decrease in fuel consumption for waste management, as well as increased citizen engagement and transparency in Smart City governance. The digital twin component facilitated scenario simulations and proactive resource management, while the participatory governance module empowered citizens through transparent, immutable records of proposals and voting. This study also discusses technical, organizational, and regulatory challenges, such as data integration, scalability, and privacy compliance. The results indicate that the proposed approach offers a scalable and sustainable model for Smart City transformation, fostering citizen trust, regulatory compliance, and measurable environmental and social benefits. Full article
(This article belongs to the Section Information and Communication Technologies)
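
The blockchain-enabled audit trail mentioned above boils down to hash-chained records; the toy sketch below illustrates the principle only and is not the platform's private-blockchain implementation.

```python
# Toy sketch of the blockchain-style audit trail idea from the abstract: each record
# stores the hash of the previous one, so tampering breaks the chain. A real deployment
# would use the platform's private blockchain, not this in-memory list.
import hashlib, json, time

def add_record(chain, payload: dict) -> dict:
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    body = {"ts": time.time(), "payload": payload, "prev": prev_hash}
    body["hash"] = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    chain.append(body)
    return body

def verify(chain) -> bool:
    for i, rec in enumerate(chain):
        expected = {k: rec[k] for k in ("ts", "payload", "prev")}
        if hashlib.sha256(json.dumps(expected, sort_keys=True).encode()).hexdigest() != rec["hash"]:
            return False
        if i and rec["prev"] != chain[i - 1]["hash"]:
            return False
    return True

chain = []
add_record(chain, {"proposal": "new bus route", "votes_for": 1203})   # illustrative payloads
add_record(chain, {"etl_job": "waste-collection", "rows": 58214})
print(verify(chain))   # True; edit any record and verification fails
```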

21 pages, 12122 KiB  
Article
RA3T: An Innovative Region-Aligned 3D Transformer for Self-Supervised Sim-to-Real Adaptation in Low-Altitude UAV Vision
by Xingrao Ma, Jie Xie, Di Shao, Aiting Yao and Chengzu Dong
Electronics 2025, 14(14), 2797; https://doi.org/10.3390/electronics14142797 - 11 Jul 2025
Abstract
Low-altitude unmanned aerial vehicle (UAV) vision is critically hindered by the Sim-to-Real Gap, where models trained exclusively on simulation data degrade under real-world variations in lighting, texture, and weather. To address this problem, we propose RA3T (Region-Aligned 3D Transformer), a novel self-supervised framework that enables robust Sim-to-Real adaptation. Specifically, we first develop a dual-branch strategy for self-supervised feature learning, integrating Masked Autoencoders and contrastive learning. This approach extracts domain-invariant representations from unlabeled simulated imagery to enhance robustness against occlusion while reducing annotation dependency. Leveraging these learned features, we then introduce a 3D Transformer fusion module that unifies multi-view RGB and LiDAR point clouds through cross-modal attention. By explicitly modeling spatial layouts and height differentials, this component significantly improves recognition of small and occluded targets in complex low-altitude environments. To address persistent fine-grained domain shifts, we finally design region-level adversarial calibration that deploys local discriminators on partitioned feature maps. This mechanism directly aligns texture, shadow, and illumination discrepancies which challenge conventional global alignment methods. Extensive experiments on UAV benchmarks VisDrone and DOTA demonstrate the effectiveness of RA3T. The framework achieves +5.1% mAP on VisDrone and +7.4% mAP on DOTA over the 2D adversarial baseline, particularly on small objects and sparse occlusions, while maintaining real-time performance of 17 FPS at 1024 × 1024 resolution on an RTX 4080 GPU. Visual analysis confirms that the synergistic integration of 3D geometric encoding and local adversarial alignment effectively mitigates domain gaps caused by uneven illumination and perspective variations, establishing an efficient pathway for simulation-to-reality UAV perception. Full article
(This article belongs to the Special Issue Innovative Technologies and Services for Unmanned Aerial Vehicles)
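
One half of the dual self-supervised branch, Masked-Autoencoder-style pretraining, hinges on random patch masking; the sketch below uses a common 75% mask ratio, which is an assumption rather than a value taken from the paper.

```python
# Sketch of the random patch masking used in Masked-Autoencoder-style pretraining,
# one half of the dual self-supervised branch the abstract describes; the patch count
# and 75% mask ratio are common defaults, not values from the paper.
import torch

def random_mask(patches: torch.Tensor, mask_ratio: float = 0.75):
    """patches: (B, N, D) patch embeddings -> (kept patches, boolean mask of hidden ones)."""
    b, n, _ = patches.shape
    n_keep = int(n * (1 - mask_ratio))
    noise = torch.rand(b, n)                            # random score per patch
    keep_idx = noise.argsort(dim=1)[:, :n_keep]         # keep the lowest-scored patches
    kept = torch.gather(patches, 1, keep_idx.unsqueeze(-1).expand(-1, -1, patches.shape[-1]))
    mask = torch.ones(b, n, dtype=torch.bool)
    mask.scatter_(1, keep_idx, False)                   # True = masked (to be reconstructed)
    return kept, mask

tokens = torch.randn(4, 196, 768)                       # e.g. 14x14 patches from a UAV frame
kept, mask = random_mask(tokens)
print(kept.shape, mask.float().mean().item())           # torch.Size([4, 49, 768]) ~0.75
```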