Search Results (200)

Search Parameters:
Keywords = bird's eye view

19 pages, 3492 KiB  
Article
Deep Learning-Based Rooftop PV Detection and Techno Economic Feasibility for Sustainable Urban Energy Planning
by Ahmet Hamzaoğlu, Ali Erduman and Ali Kırçay
Sustainability 2025, 17(15), 6853; https://doi.org/10.3390/su17156853 - 28 Jul 2025
Viewed by 229
Abstract
Accurate estimation of available rooftop areas for PV power generation at the city scale is critical for sustainable energy planning and policy development. In this study, rooftop solar energy potential in urban, rural, and industrial areas is estimated with deep learning models using publicly available high-resolution satellite imagery. To identify roof areas, high-resolution open-source images were manually labeled, and a segmentation model was trained on this dataset with the DeepLabv3+ architecture. The developed model detected roof areas with high accuracy. Model outputs are integrated into a user-friendly interface for economic analyses such as cost, profitability, and amortization period. This interface automatically detects roof regions in the bird’s-eye-view images uploaded by users, calculates the total roof area, and classifies regions according to their potential. The system, applied in 81 provinces of Turkey, provides sustainable energy projections such as PV installed capacity, installation cost, annual energy production, energy sales revenue, and amortization period depending on the panel type and region selected. This integrated system consists of a deep learning model that can extract rooftop areas with high accuracy and a user interface that automatically calculates all parameters related to PV installation for energy users. The results show that the DeepLabv3+ architecture and the Adam optimization algorithm provide superior performance in roof area estimation, with accuracy between 67.21% and 99.27% and loss rates between 0.6% and 0.025%. Tests on 100 different regions yielded a maximum roof-estimation IoU of 84.84% and an average of 77.11%. In the economic analysis, the amortization period reaches its lowest value of 4.5 years in high-density roof regions where polycrystalline panels are used, while it increases up to 7.8 years for thin-film panels. In conclusion, this study presents an interactive user interface integrated with a deep learning model capable of high-accuracy rooftop area detection, enabling the assessment of sustainable PV energy potential at the city scale and straightforward economic analysis. This approach is a valuable tool for planning and decision support systems in the integration of renewable energy sources. Full article
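
As a rough illustration of the techno-economic quantities the abstract above reports (installed capacity, annual energy, sales revenue, amortization period), the Python sketch below chains them together from a detected roof area. Every parameter value (module efficiency, usable roof fraction, irradiance, performance ratio, cost, tariff) is an assumption for illustration, not a figure from the paper.

```python
# Back-of-envelope sketch of the techno-economic outputs described in the
# abstract. All default values below are illustrative assumptions.

def pv_economics(roof_area_m2: float,
                 panel_efficiency: float = 0.18,      # assumed module efficiency
                 usable_fraction: float = 0.70,       # assumed usable share of roof
                 irradiance_kwh_m2_yr: float = 1650,  # assumed annual irradiance
                 performance_ratio: float = 0.80,     # assumed system losses
                 cost_per_kw: float = 900.0,          # assumed cost, USD/kW
                 tariff_per_kwh: float = 0.10):       # assumed sale price, USD/kWh
    capacity_kw = roof_area_m2 * usable_fraction * panel_efficiency  # ~1 kW/m2 at STC
    annual_kwh = (roof_area_m2 * usable_fraction * panel_efficiency
                  * irradiance_kwh_m2_yr * performance_ratio)
    cost = capacity_kw * cost_per_kw
    revenue = annual_kwh * tariff_per_kwh
    payback_years = cost / revenue                    # simple amortization period
    return capacity_kw, annual_kwh, cost, revenue, payback_years

if __name__ == "__main__":
    cap, kwh, cost, rev, payback = pv_economics(roof_area_m2=500.0)
    print(f"capacity ~{cap:.1f} kW, yearly energy ~{kwh:.0f} kWh, "
          f"payback ~{payback:.1f} years")
```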

21 pages, 2443 KiB  
Article
Lateralised Behavioural Responses of Chickens to a Threatening Human and a Novel Environment Indicate Fearful Emotions
by Amira A. Goma and Clive J. C. Phillips
Animals 2025, 15(14), 2023; https://doi.org/10.3390/ani15142023 - 9 Jul 2025
Viewed by 344
Abstract
The demeanour of a human during an interaction with an animal may influence the animal’s emotional response. We investigated whether the emotional responses of laying hens to a threatening or neutral human and a novel environment were lateralised, from which their emotional state can be inferred. Twenty-five DeKalb white laying hens reared in furnished cages under environmentally controlled conditions were individually assessed for their responses to these stimuli. They were contained in a box before emerging into an arena with a threatening human, who attempted direct eye contact with the bird and had their hands raised towards it, or a neutral person, who had no eye contact and sat with their hands on their knees. When initially placed in the box adjacent to the test arena, birds that remained in the box used their left eye more than their right eye, and they showed evidence of nervousness, with many head changes, neck stretching, and vocalisation. Birds showed lateralised behaviour in both the box and arena. Birds entering the arena with the threatening person used their left eye (connected to the right brain hemisphere) more than their right eye, usually with their body less vertical, and were more likely to be standing than sitting, compared with those viewing the neutral person. This confirms the bird’s interpretation of the person as threatening, with left eye/right brain hemisphere processing of flight or fight situations. We conclude that lateralised responses of chickens suggest that a threatening person is viewed more fearfully than a neutral person. However, further investigation is required with a larger sample of birds to strengthen these findings and enhance the generalisability of behavioural responses. Full article
(This article belongs to the Special Issue Welfare and Behavior of Laying Hens)

21 pages, 997 KiB  
Review
Decoding Potential Co-Relation Between Endosphere Microbiome Community Composition and Mycotoxin Production in Forage Grasses
by Vijay Chandra Verma and Ioannis Karapanos
Agriculture 2025, 15(13), 1393; https://doi.org/10.3390/agriculture15131393 - 28 Jun 2025
Viewed by 315
Abstract
Cultivated pasture grasses contribute forage to more than 40% of cattle produced in 11 southern states in the USA. In recent years, increasing intoxication of cattle feeding on pasture grasses has raised serious concerns about their palatability. While molecular and metagenomics techniques have revealed the great diversity of microbial composition and functional richness of the grass endosphere microbiome, meta-sequencing techniques enable us to gain a bird’s-eye view of all plant-associated microbiomes as a ‘holobiont’. Plant holobionts provide a more comprehensive approach in which the functions of microbial communities and the feedback between the core and satellite microbiomes of a targeted host can be defined. In the near future we will be able to tailor our grasses and their endosphere microbiomes through the host-directed selection of a ‘modular microbiome’, leading to ‘plant enhanced holobionts’ as a microbiome-driven solution to managing the intoxication of pasture grasses in livestock. The present review aims to understand the potential co-relation between endosphere microbiome community composition and mycotoxin production in forage grasses in the southern United States. Full article
(This article belongs to the Topic Applications of Biotechnology in Food and Agriculture)

18 pages, 1471 KiB  
Article
LST-BEV: Generating a Long-Term Spatial–Temporal Bird’s-Eye-View Feature for Multi-View 3D Object Detection
by Qijun Feng, Chunyang Zhao, Pengfei Liu, Zhichao Zhang, Yue Jin and Wanglin Tian
Sensors 2025, 25(13), 4040; https://doi.org/10.3390/s25134040 - 28 Jun 2025
Viewed by 506
Abstract
This paper presents a novel multi-view 3D object detection framework, Long-Term Spatial–Temporal Bird’s-Eye View (LST-BEV), designed to improve performance in autonomous driving. Traditional 3D detection relies on sensors like LiDAR, but visual perception using multi-camera systems is emerging as a more cost-effective solution. Existing methods struggle to capture long-range dependencies and cross-task information due to limitations in attention mechanisms. To address this, we propose a Long-Range Cross-Task Detection Head (LRCH) to capture these dependencies and integrate cross-task information for accurate predictions. Additionally, we introduce the Long-Term Temporal Perception Module (LTPM), which efficiently extracts temporal features by combining Mamba and linear attention, overcoming challenges in temporal frame extraction. Experimental results on the nuScenes dataset demonstrate that our proposed LST-BEV outperforms its baseline (SA-BEVPool) by 2.1% mAP and 2.7% NDS, indicating a significant performance improvement. Full article
(This article belongs to the Section Vehicular Sensing)

14 pages, 2247 KiB  
Article
Calibration-Free Roadside BEV Perception with V2X-Enabled Vehicle Position Assistance
by Wei Zhang, Yilin Gao, Zhiyuan Jiang, Ruiqing Mao and Sheng Zhou
Sensors 2025, 25(13), 3919; https://doi.org/10.3390/s25133919 - 24 Jun 2025
Viewed by 595
Abstract
Roadside bird’s eye view (BEV) perception can enhance the comprehensive environmental awareness required for autonomous driving systems. Current approaches typically concentrate on BEV perception from the perspective of the vehicle, requiring precise camera calibration or depth estimation, leading to potential inaccuracies. We introduce a calibration-free roadside BEV perception architecture, which utilizes elevated roadside cameras in conjunction with the vehicle position transmitted via cellular vehicle-to-everything (C-V2X) independently of camera calibration parameters. To enhance robustness against practical issues such as V2X communication delay, packet loss, and positioning noise, we simulate real-world uncertainties by injecting random noise into the coordinate input and varying the proportion of vehicles providing location data. Experiments on the DAIR-V2X dataset demonstrate that the architecture achieves superior performance compared to calibration-based and calibration-free baselines, highlighting its effectiveness in roadside BEV perception. Full article
(This article belongs to the Topic Cloud and Edge Computing for Smart Devices)
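
The robustness test described above (injecting positioning noise and varying the share of vehicles that report their position) can be pictured with a minimal sketch; the noise level, reporting fraction, and function name below are illustrative assumptions, not the paper's implementation.

```python
# Sketch of simulating V2X uncertainty: perturb reported vehicle positions with
# Gaussian noise and keep only a fraction of vehicles, mimicking positioning
# error and partial V2X penetration. Values are illustrative assumptions.
import random

def perturb_v2x_positions(positions, noise_std_m=0.5, reported_fraction=0.8, seed=0):
    """positions: list of (x, y) vehicle coordinates in metres."""
    rng = random.Random(seed)
    reported = []
    for x, y in positions:
        if rng.random() > reported_fraction:   # vehicle does not report (no V2X / packet loss)
            continue
        reported.append((x + rng.gauss(0.0, noise_std_m),   # positioning noise on x
                         y + rng.gauss(0.0, noise_std_m)))  # positioning noise on y
    return reported

print(perturb_v2x_positions([(10.0, 3.2), (25.5, -1.0), (40.1, 2.7)]))
```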

29 pages, 21063 KiB  
Article
Perceiving Fifth Facade Colors in China’s Coastal Cities from a Remote Sensing Perspective: A New Understanding of Urban Image
by Yue Liu, Richen Ye, Wenlong Jing, Xiaoling Yin, Jia Sun, Qiquan Yang, Zhiwei Hou, Hongda Hu, Sijing Shu and Ji Yang
Remote Sens. 2025, 17(12), 2075; https://doi.org/10.3390/rs17122075 - 17 Jun 2025
Viewed by 510
Abstract
Urban color represents the visual skin of a city, embodying regional culture, historical memory, and the contemporary spirit. However, while existing studies focus on pedestrian-level facade colors, the “fifth facade” seen from a bird’s-eye view has been largely overlooked. Moreover, color distortions in traditional remote sensing imagery hinder precise analysis. This study targeted 56 Chinese coastal cities, decoding the spatiotemporal patterns of their fifth facade color (FFC). An innovative natural color optimization algorithm was developed to address the oversaturation and color bias of Sentinel-2 imagery. Several color indicators, including dominant colors, hue–saturation–value, color richness, and color harmony, were developed to analyze the spatial variations of FFC. Results revealed that FFC in Chinese coastal cities is dominated by gray, black, and brown, reflecting the commonality of cement jungles. Among them, northern warm grays exude solidity, as in Weifang, while southern cool grays convey modern elegance, as in Shenzhen. Blue PVC rooftops (e.g., Tianjin) and red-brick villages (e.g., Quanzhou) serve as symbols of industrial function and cultural heritage. Economically advanced cities (e.g., Shanghai) lead in color richness, linking vitality to visual diversity, while high-harmony cities (e.g., Lianyungang) foster livability through coordinated colors. The study also warns of color pollution risks: cities like Qingdao expose planning imbalances through color clashes. This research pioneers a systematic, large-scale decoding of the urban fifth facade color from a remote sensing perspective, quantitatively revealing the dilemma of “identical cities” in modernization. The findings inject color rationality into urban planning and help create readable and warm city images. Full article
(This article belongs to the Section Environmental Remote Sensing)
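
A minimal sketch of the kind of color indicators the abstract describes (hue–saturation–value statistics and a color-richness count) computed over roof-pixel RGB values; the quantization and the richness definition here are assumptions for illustration, not the paper's formulas.

```python
# Illustrative per-city color indicators from a list of roof-pixel RGB values.
# The coarse-histogram "richness" measure is an assumption, not the paper's.
import colorsys

def color_indicators(rgb_pixels, bins_per_channel=8):
    hsv = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255) for r, g, b in rgb_pixels]
    n = len(hsv)
    mean_h = sum(h for h, _, _ in hsv) / n   # mean hue
    mean_s = sum(s for _, s, _ in hsv) / n   # mean saturation
    mean_v = sum(v for _, _, v in hsv) / n   # mean value (brightness)
    # "Richness" as the number of occupied cells in a coarse RGB histogram.
    q = 256 // bins_per_channel
    richness = len({(r // q, g // q, b // q) for r, g, b in rgb_pixels})
    return mean_h, mean_s, mean_v, richness

pixels = [(120, 118, 115), (60, 60, 62), (150, 75, 60), (90, 110, 160)]
print(color_indicators(pixels))
```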

39 pages, 2810 KiB  
Review
A Survey of Deep Learning-Driven 3D Object Detection: Sensor Modalities, Technical Architectures, and Applications
by Xiang Zhang, Hai Wang and Haoran Dong
Sensors 2025, 25(12), 3668; https://doi.org/10.3390/s25123668 - 11 Jun 2025
Viewed by 1787
Abstract
This review presents a comprehensive survey on deep learning-driven 3D object detection, focusing on the synergistic innovation between sensor modalities and technical architectures. Through a dual-axis “sensor modality–technical architecture” classification framework, it systematically analyzes detection methods based on RGB cameras, LiDAR, and multimodal fusion. From the sensor perspective, the study reveals the evolutionary paths of monocular depth estimation optimization, LiDAR point cloud processing from voxel-based to pillar-based modeling, and three-level cross-modal fusion paradigms (data-level alignment, feature-level interaction, and result-level verification). Regarding technical architectures, the paper examines structured representation optimization in traditional convolutional networks, spatiotemporal modeling breakthroughs in bird’s-eye view (BEV) methods, voxel-level modeling advantages of occupancy networks for irregular objects, and dynamic scene understanding capabilities of temporal fusion architectures. The applications in autonomous driving and agricultural robotics are discussed, highlighting future directions including depth perception enhancement, open-scene modeling, and lightweight deployment to advance 3D perception systems toward higher accuracy and stronger generalization. Full article
(This article belongs to the Section Remote Sensors)

18 pages, 3976 KiB  
Proceeding Paper
Survey on Comprehensive Visual Perception Technology for Future Air–Ground Intelligent Transportation Vehicles in All Scenarios
by Guixin Ren, Fei Chen, Shichun Yang, Fan Zhou and Bin Xu
Eng. Proc. 2024, 80(1), 50; https://doi.org/10.3390/engproc2024080050 - 30 May 2025
Viewed by 455
Abstract
As an essential part of the low-altitude economy, low-altitude carriers are an important cornerstone of its development and a strategically significant new industry. However, existing two-dimensional autonomous-driving perception schemes struggle to meet the key requirements of all-scene perception for low-altitude vehicles, such as global high-precision map construction in three-dimensional space, recognition of traffic participants in the local environment, and extraction of key visual information under extreme conditions. It is therefore urgent to develop and verify all-scene universal sensing technology for low-altitude intelligent vehicles. This paper reviews the literature on vision-based perception for urban rail transit and low-altitude flight environments and summarizes the research status and innovations from five aspects: environment perception based on visual SLAM, environment perception based on BEV, environment perception based on image enhancement, performance optimization of perception algorithms using cloud computing, and rapid deployment of perception algorithms using edge nodes. Future optimization directions for this topic are also put forward. Full article
(This article belongs to the Proceedings of 2nd International Conference on Green Aviation (ICGA 2024))

19 pages, 3016 KiB  
Article
Attention-Based LiDAR–Camera Fusion for 3D Object Detection in Autonomous Driving
by Zhibo Wang, Xiaoci Huang and Zhihao Hu
World Electr. Veh. J. 2025, 16(6), 306; https://doi.org/10.3390/wevj16060306 - 29 May 2025
Viewed by 1831
Abstract
In multi-vehicle traffic scenarios, achieving accurate environmental perception and motion trajectory tracking through LiDAR–camera fusion is critical for downstream vehicle planning and control tasks. To address the challenges of cross-modal feature interaction in LiDAR–image fusion and the low recognition efficiency/positioning accuracy of traffic participants in dense traffic flows, this study proposes an attention-based 3D object detection network integrating point cloud and image features. The algorithm adaptively fuses LiDAR geometric features and camera semantic features through channel-wise attention weighting, enhancing multi-modal feature representation by dynamically prioritizing informative channels. A center point detection architecture is further employed to regress 3D bounding boxes in bird’s-eye-view space, effectively resolving orientation ambiguities caused by sparse point distributions. Experimental validation on the nuScenes dataset demonstrates the model’s robustness in complex scenarios, achieving a mean Average Precision (mAP) of 64.5% and a 12.2% improvement over baseline methods. Real-vehicle deployment further confirms the fusion module’s effectiveness in enhancing detection stability under dynamic traffic conditions. Full article
(This article belongs to the Special Issue Electric Vehicle Autonomous Driving Based on Image Recognition)
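
A minimal numpy sketch of channel-wise attention fusion in the spirit described above: LiDAR and camera BEV features are concatenated, globally pooled per channel, and reweighted with a sigmoid gate. Layer sizes and the randomly initialized gate weights are illustrative assumptions, not the paper's network.

```python
# Channel-attention fusion of LiDAR and camera BEV feature maps (illustrative).
import numpy as np

def channel_attention_fuse(lidar_feat, cam_feat, rng=np.random.default_rng(0)):
    """lidar_feat, cam_feat: arrays of shape (C, H, W) on a shared BEV grid."""
    fused = np.concatenate([lidar_feat, cam_feat], axis=0)   # (2C, H, W)
    c = fused.shape[0]
    squeeze = fused.mean(axis=(1, 2))                        # global average pool -> (2C,)
    w1 = rng.standard_normal((c // 4, c))                    # bottleneck weights (placeholder)
    w2 = rng.standard_normal((c, c // 4))
    hidden = np.maximum(0.0, w1 @ squeeze)                   # ReLU bottleneck
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))              # sigmoid channel weights
    return fused * gate[:, None, None]                       # channel-reweighted features

out = channel_attention_fuse(np.ones((8, 4, 4)), np.ones((8, 4, 4)))
print(out.shape)  # (16, 4, 4)
```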

15 pages, 2213 KiB  
Article
VirtualPainting: Addressing Sparsity with Virtual Points and Distance-Aware Data Augmentation for 3D Object Detection
by Sudip Dhakal, Deyuan Qu, Dominic Carrillo, Mohammad Dehghani Tezerjani and Qing Yang
Sensors 2025, 25(11), 3367; https://doi.org/10.3390/s25113367 - 27 May 2025
Viewed by 403
Abstract
In recent times, there has been a notable surge in multimodal approaches that decorate raw LiDAR point clouds with camera-derived features to improve object detection performance. However, we found that these methods still grapple with the inherent sparsity of LiDAR point cloud data, primarily because fewer points are enriched with camera-derived features for sparsely distributed objects. We present an innovative approach that involves the generation of virtual LiDAR points using camera images and enhancing these virtual points with semantic labels obtained from image-based segmentation networks to tackle this issue and facilitate the detection of sparsely distributed objects, particularly those that are occluded or distant. Furthermore, we integrate a distance-aware data augmentation (DADA) technique to enhance the model’s capability to recognize these sparsely distributed objects by generating specialized training samples. Our approach offers a versatile solution that can be seamlessly integrated into various 3D frameworks and 2D semantic segmentation methods, resulting in significantly improved overall detection accuracy. Evaluation on the KITTI and nuScenes datasets demonstrates substantial enhancements in both 3D and bird’s eye view (BEV) detection benchmarks. Full article
(This article belongs to the Section Remote Sensors)
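
The idea of camera-derived virtual points can be sketched as back-projecting selected image pixels into 3D with a depth estimate and the camera intrinsics, carrying each pixel's semantic label along. The toy depths, intrinsics, and labels below are assumptions; the paper's pipeline uses learned depth and a segmentation network.

```python
# Back-project labeled image pixels into 3D "virtual points" (illustrative).
import numpy as np

def make_virtual_points(pixels_uv, depths, labels, fx, fy, cx, cy):
    """pixels_uv: (N, 2) pixel coords; depths: (N,) metres; labels: (N,) class ids."""
    u, v = pixels_uv[:, 0], pixels_uv[:, 1]
    x = (u - cx) / fx * depths                        # pinhole back-projection
    y = (v - cy) / fy * depths
    pts = np.stack([x, y, depths], axis=1)            # (N, 3) points in camera frame
    return np.concatenate([pts, labels[:, None]], 1)  # (N, 4): x, y, z, semantic label

uv = np.array([[640.0, 360.0], [700.0, 400.0]])
print(make_virtual_points(uv, np.array([12.0, 20.0]), np.array([1, 2]),
                          fx=1000.0, fy=1000.0, cx=640.0, cy=360.0))
```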

19 pages, 14298 KiB  
Article
BETAV: A Unified BEV-Transformer and Bézier Optimization Framework for Jointly Optimized End-to-End Autonomous Driving
by Rui Zhao, Ziguo Chen, Yuze Fan, Fei Gao and Yuzhuo Men
Sensors 2025, 25(11), 3336; https://doi.org/10.3390/s25113336 - 26 May 2025
Viewed by 771
Abstract
End-to-end autonomous driving demands precise perception, robust motion planning, and efficient trajectory generation to navigate complex and dynamic environments. This paper proposes BETAV, a novel framework that addresses the persistent challenges of low 3D perception accuracy and suboptimal trajectory smoothness in autonomous driving systems through unified BEV-Transformer encoding and Bézier-optimized planning. By leveraging Vision Transformers (ViTs), our approach encodes multi-view camera data into a Bird’s Eye View (BEV) representation using a transformer architecture, capturing both spatial and temporal features to comprehensively enhance scene understanding. For motion planning, a Bézier curve-based planning decoder is proposed, offering a compact, continuous, and parameterized trajectory representation that inherently ensures motion smoothness, kinematic feasibility, and computational efficiency. Additionally, this paper introduces a set of constraints tailored to vehicle kinematics, obstacle avoidance, and directional alignment, further enhancing trajectory accuracy and safety. Experimental evaluations on the nuScenes benchmark dataset and in simulation demonstrate that our framework achieves state-of-the-art performance in trajectory prediction and planning tasks, exhibiting superior robustness and generalization across diverse and challenging Bench2Drive driving scenarios. Full article
(This article belongs to the Section Vehicular Sensing)
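
To see why a Bézier parameterization yields compact, smooth trajectories, the sketch below samples a cubic Bézier curve from four control points. The curve order and control points are illustrative; the paper's planning decoder is not reproduced here.

```python
# Sample a cubic Bezier trajectory from four 2D control points (illustrative).
import numpy as np

def cubic_bezier(p0, p1, p2, p3, n_samples=20):
    """Control points p0..p3 as (x, y); returns (n_samples, 2) trajectory points."""
    t = np.linspace(0.0, 1.0, n_samples)[:, None]
    return ((1 - t) ** 3 * p0 + 3 * (1 - t) ** 2 * t * p1
            + 3 * (1 - t) * t ** 2 * p2 + t ** 3 * p3)

traj = cubic_bezier(np.array([0.0, 0.0]), np.array([5.0, 0.5]),
                    np.array([10.0, 2.0]), np.array([15.0, 5.0]))
print(traj[:3])   # first few sampled waypoints
```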

15 pages, 3014 KiB  
Article
Leveraging Bird Eye View Video and Multimodal Large Language Models for Real-Time Intersection Control and Reasoning
by Sari Masri, Huthaifa I. Ashqar and Mohammed Elhenawy
Safety 2025, 11(2), 40; https://doi.org/10.3390/safety11020040 - 7 May 2025
Viewed by 1210
Abstract
Managing traffic flow through urban intersections is challenging: conflicts involving a mix of different vehicles with blind spots make crashes relatively likely. This paper presents a new framework based on a fine-tuned Multimodal Large Language Model (MLLM), GPT-4o, that can control intersections in real time using bird’s-eye-view videos taken by drones. The fine-tuned GPT-4o model reasons logically and visually about traffic conflicts and provides instructions to drivers, which helps create safer and more efficient traffic flow. To fine-tune and evaluate the model, we labeled a dataset of three months of drone videos and their corresponding trajectories, recorded at a four-way intersection in Dresden, Germany. Preliminary results showed that the fine-tuned GPT-4o achieved an accuracy of about 77%, outperforming zero-shot baselines. Using continuous video-frame sequences, performance increased to about 89% on a time-serialized dataset and about 90% on an unbalanced real-world dataset, demonstrating the model’s robustness under different conditions. Furthermore, experts manually scored the usefulness of the model’s predicted explanations and recommendations; the model achieved average ratings of 8.99 out of 10 for explanations and 9.23 out of 10 for recommendations. The results demonstrate the advantages of combining MLLMs with structured prompts and temporal information for conflict detection and offer a flexible and robust prototype framework to improve the safety and effectiveness of uncontrolled intersections. The code and labeled dataset used in this study are publicly available (see Data Availability Statement). Full article

17 pages, 1557 KiB  
Article
MultiDistiller: Efficient Multimodal 3D Detection via Knowledge Distillation for Drones and Autonomous Vehicles
by Binghui Yang, Tao Tao, Wenfei Wu, Yongjun Zhang, Xiuyuan Meng and Jianfeng Yang
Drones 2025, 9(5), 322; https://doi.org/10.3390/drones9050322 - 22 Apr 2025
Viewed by 663
Abstract
Real-time 3D object detection is a cornerstone for the safe operation of drones and autonomous vehicles (AVs)—drones must avoid millimeter-scale power lines in cluttered airspace, while AVs require instantaneous recognition of pedestrians and vehicles in dynamic urban environments. Although significant progress has been made in detection methods based on point clouds, cameras, and multimodal fusion, the computational complexity of existing high-precision models struggles to meet the real-time requirements of vehicular edge devices. Additionally, during the model lightweighting process, issues such as multimodal feature coupling failure and the imbalance between classification and localization performance often arise. To address these challenges, this paper proposes a knowledge distillation framework for multimodal 3D object detection, incorporating attention guidance, rank-aware learning, and interactive feature supervision to achieve efficient model compression and performance optimization. Specifically: To enhance the student model’s ability to focus on key channel and spatial features, we introduce attention-guided feature distillation, leveraging a bird’s-eye view foreground mask and a dual-attention mechanism. To mitigate the degradation of classification performance when transitioning from two-stage to single-stage detectors, we propose ranking-aware category distillation by modeling anchor-level distribution. To address the insufficient cross-modal feature extraction capability, we enhance the student network’s image features using the teacher network’s point cloud spatial priors, thereby constructing a LiDAR-image cross-modal feature alignment mechanism. Experimental results demonstrate the effectiveness of the proposed approach in multimodal 3D object detection. On the KITTI dataset, our method improves network performance by 4.89% even after reducing the number of channels by half. Full article
(This article belongs to the Special Issue Cooperative Perception for Modern Transportation)
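
A minimal sketch of mask-weighted feature distillation in the spirit of the attention-guided distillation described above: the student's BEV features are pulled toward the teacher's, with squared errors weighted more heavily inside a foreground mask. The mask construction and weighting are illustrative assumptions, not the paper's loss.

```python
# Foreground-mask-weighted feature distillation loss on BEV features (illustrative).
import numpy as np

def masked_feature_distillation_loss(student_bev, teacher_bev, fg_mask, fg_weight=5.0):
    """student_bev, teacher_bev: (C, H, W); fg_mask: (H, W) with values in {0, 1}."""
    weights = 1.0 + (fg_weight - 1.0) * fg_mask                # foreground cells count more
    sq_err = ((student_bev - teacher_bev) ** 2).mean(axis=0)   # per-cell error, (H, W)
    return float((weights * sq_err).sum() / weights.sum())

rng = np.random.default_rng(0)
student = rng.standard_normal((16, 8, 8))
teacher = rng.standard_normal((16, 8, 8))
mask = np.zeros((8, 8)); mask[2:5, 3:6] = 1.0                  # toy object region
print(masked_feature_distillation_loss(student, teacher, mask))
```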

19 pages, 36390 KiB  
Article
TerrAInav Sim: An Open-Source Simulation of UAV Aerial Imaging from Map-Based Data
by Seyedeh Parisa Dajkhosh, Peter M. Le, Orges Furxhi and Eddie L. Jacobs
Remote Sens. 2025, 17(8), 1454; https://doi.org/10.3390/rs17081454 - 18 Apr 2025
Viewed by 893
Abstract
Capturing real-world aerial images for vision-based navigation (VBN) is challenging due to limited availability and conditions that make it nearly impossible to access all desired images from any location. The complexity increases when multiple locations are involved. State-of-the-art solutions, such as deploying UAVs (unmanned aerial vehicles) for aerial imaging or relying on existing research databases, come with significant limitations. TerrAInav Sim offers a compelling alternative by simulating a UAV to capture bird’s-eye view map-based images at zero yaw with real-world visible-band specifications. This open-source tool allows users to specify the bounding box (top-left and bottom-right) coordinates of any region on a map. Without the need to physically fly a drone, the virtual Python UAV performs a raster search to capture images. Users can define parameters such as the flight altitude, aspect ratio, diagonal field of view of the camera, and the overlap between consecutive images. TerrAInav Sim’s capabilities range from capturing a few low-altitude images for basic applications to generating extensive datasets of entire cities for complex tasks like deep learning. This versatility makes TerrAInav a valuable tool for not only VBN but also other applications, including environmental monitoring, construction, and city management. The open-source nature of the tool also allows for the extension of the raster search to other missions. A dataset of Memphis, TN, has been provided along with this simulator. A supplementary dataset is also provided, which includes data from a 3D world generation package for comparison. Full article
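
The raster-search geometry the abstract describes can be estimated with basic pinhole-camera arithmetic: from flight altitude, diagonal field of view, aspect ratio, and desired overlap, compute the ground footprint of one image and the grid spacing between captures. This is a generic sketch, not TerrAInav Sim's API.

```python
# Ground footprint and raster-search spacing from altitude, FOV, aspect ratio,
# and overlap (illustrative pinhole-camera arithmetic).
import math

def footprint_and_spacing(altitude_m, diag_fov_deg, aspect_ratio=4 / 3, overlap=0.25):
    diag = 2 * altitude_m * math.tan(math.radians(diag_fov_deg) / 2)  # ground diagonal
    width = diag * aspect_ratio / math.hypot(aspect_ratio, 1.0)       # ground width
    height = diag / math.hypot(aspect_ratio, 1.0)                     # ground height
    step_x = width * (1 - overlap)    # spacing between image centres along a row
    step_y = height * (1 - overlap)   # spacing between rows
    return (width, height), (step_x, step_y)

print(footprint_and_spacing(altitude_m=120, diag_fov_deg=84))
```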

21 pages, 31401 KiB  
Article
BEV-CAM3D: A Unified Bird’s-Eye View Architecture for Autonomous Driving with Monocular Cameras and 3D Point Clouds
by Daniel Ayo Oladele, Elisha Didam Markus and Adnan M. Abu-Mahfouz
AI 2025, 6(4), 82; https://doi.org/10.3390/ai6040082 - 18 Apr 2025
Viewed by 2353
Abstract
Three-dimensional (3D) visual perception is pivotal for understanding surrounding environments in applications such as autonomous driving and mobile robotics. While LiDAR-based models dominate due to accurate depth sensing, their cost and sparse outputs have driven interest in camera-based systems. However, challenges like cross-domain degradation and depth estimation inaccuracies persist. This paper introduces BEVCAM3D, a unified bird’s-eye view (BEV) architecture that fuses monocular cameras and LiDAR point clouds to overcome single-sensor limitations. BEVCAM3D integrates a deformable cross-modality attention module for feature alignment and a fast ground segmentation algorithm to reduce computational overhead by 40%. Evaluated on the nuScenes dataset, BEVCAM3D achieves state-of-the-art performance, with a 73.9% mAP and a 76.2% NDS, outperforming existing LiDAR-camera fusion methods like SparseFusion (72.0% mAP) and IS-Fusion (73.0% mAP). Notably, it excels in detecting pedestrians (91.0% AP) and traffic cones (89.9% AP), addressing the class imbalance in autonomous driving scenarios. The framework supports real-time inference at 11.2 FPS with an EfficientDet-B3 backbone and demonstrates robustness under low-light conditions (62.3% nighttime mAP). Full article
(This article belongs to the Section AI in Autonomous Systems)
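
As a generic illustration of why ground filtering cuts computation before fusion, the sketch below drops LiDAR points near a per-cell minimum height; this simple heuristic stands in for, and is not, the fast ground segmentation algorithm proposed in the paper.

```python
# Simple per-cell minimum-height ground filter for a LiDAR point cloud (illustrative).
import numpy as np

def remove_ground(points, cell=1.0, tol=0.2):
    """points: (N, 3) LiDAR points (x, y, z). Returns the non-ground subset."""
    cells = np.floor(points[:, :2] / cell).astype(int)   # BEV grid cell per point
    keep = np.ones(len(points), dtype=bool)
    for key in set(map(tuple, cells)):
        idx = np.where((cells == key).all(axis=1))[0]
        ground_z = points[idx, 2].min()                  # local ground estimate
        keep[idx] = points[idx, 2] > ground_z + tol      # keep points above it
    return points[keep]

pts = np.array([[0.2, 0.1, -1.6], [0.4, 0.3, -1.55], [0.3, 0.2, 0.4], [5.0, 2.0, -1.5]])
print(remove_ground(pts))
```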
