Abstract
In current road segmentation tasks, high-frequency road details (such as road edges and pavement textures) tend to become blurred or even lost during feature extraction due to progressive downsampling, leading to imprecise segmentation boundaries. Moreover, existing fusion methods rely predominantly on simple concatenation or summation operations, which struggle to adaptively integrate the rich texture information of the RGB modality with the geometric structural information of the depth modality, limiting fusion efficiency. To address these issues, this paper proposes a novel segmentation model. We design a Cross-scale Wavelet Enhancement Module (CWEM) to compensate for the shortcomings of traditional networks in frequency-domain analysis, explicitly enhancing the representation of edge and texture features. In parallel, a Gated Cross-Modality Fusion module (GCMF) is constructed to fuse RGB and depth features adaptively and efficiently. Additionally, to reduce the high false-detection rates of existing methods and their frequent confusion between sidewalks and opposing lanes, this paper optimizes the loss function to further improve the model's discriminative ability in complex scenarios. Experiments on the public KITTI_Road dataset demonstrate that the proposed method achieves a segmentation accuracy of 97.31% while maintaining a real-time inference speed of 34 FPS, with particularly strong performance on road-edge integrity and shadowed areas.
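To illustrate the idea behind gated cross-modality fusion, the sketch below shows one common form of such a gate: a learned weight, computed from both modalities, blends RGB and depth features element-wise instead of simply concatenating or summing them. This is a minimal NumPy sketch under assumed details (the function name `gated_fusion`, the flattened feature shapes, and the single linear gate layer are all hypothetical, not the paper's actual GCMF architecture):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(rgb_feat, depth_feat, w, b):
    """Hypothetical sketch of a gated cross-modal fusion step.

    rgb_feat, depth_feat: (N, C) feature arrays from the two branches.
    w: (2C, C) gate weights, b: (C,) gate bias -- stand-ins for
    learned parameters; in a real network this would be a conv layer.
    """
    # Gate is conditioned on both modalities jointly.
    joint = np.concatenate([rgb_feat, depth_feat], axis=-1)
    gate = sigmoid(joint @ w + b)          # values in (0, 1), per element
    # Convex combination: gate near 1 favors RGB texture cues,
    # gate near 0 favors depth geometry cues.
    return gate * rgb_feat + (1.0 - gate) * depth_feat
```

The key design point, relative to plain concatenation or summation, is that the blend ratio is input-dependent: in well-lit textured regions the gate can lean on RGB, while in shadowed regions it can lean on depth.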