Author Contributions
X.C.: conceptualization, methodology, investigation, visualization, writing—original draft preparation; N.J.: software, data curation, validation, investigation; Y.L.: characterization, resources, investigation; G.C.: formal analysis, resources; Z.L.: visualization, data curation; Z.Y.: supervision, funding acquisition; Q.Z.: supervision, writing—review and editing, software; R.Z.: project administration, methodology, supervision. All authors have read and agreed to the published version of the manuscript.
Figure 1.
The Cityscapes dataset for autonomous driving semantic segmentation. Distinct regions are delineated by varying colors, each signifying unique object semantic information.
Figure 2.
In the DAFormer work, the GTA5 dataset is employed as the source domain and the Cityscapes dataset as the target domain. Our approach instead uses the Cityscapes dataset as the source domain and the ACDC dataset as the target domain, and introduces the specialized ShuffleFormer network tailored for autonomous driving scenarios.
Figure 3.
The Student Net is employed for supervised learning within the source domain, while the Teacher Net serves the purpose of unsupervised learning in both the intermediate and target domains, thereby generating distinct loss functions for each.
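Figure 3 describes a student–teacher self-training scheme in which the teacher is not trained by backpropagation. A minimal PyTorch sketch of the exponential-moving-average (EMA) teacher update typically used in such schemes; the momentum value, and that this matches the paper's exact update, are assumptions:

```python
import torch

@torch.no_grad()
def update_teacher(teacher: torch.nn.Module, student: torch.nn.Module,
                   momentum: float = 0.999) -> None:
    """Update the teacher as an exponential moving average of the student.

    The momentum value 0.999 is an assumption; DAFormer-style
    self-training pipelines typically use a momentum close to 1.
    """
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        # t = m * t + (1 - m) * s, computed in place on the teacher weights.
        t_param.mul_(momentum).add_(s_param, alpha=1.0 - momentum)
```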
Figure 4.
The black block diagram and lines represent the original method structure of DAFormer, whereas the red block diagram and lines depict our novel approach. Here, the reference images serve as an intermediate domain, contributing to the generation of the loss function.
Figure 5.
(a) DAFormer decoder network structure (image quoted from [20]). (b) ShuffleFormer decoder network structure, a lightweight semantic segmentation decoder.
Figure 6.
Comparing standard convolution and group convolution: with the same input and output dimensions, group convolution requires fewer network parameters.
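As a rough illustration of the parameter saving shown in Figure 6, and of the ShuffleNet-style channel shuffle that we assume gives ShuffleFormer its name, consider the following PyTorch sketch; the channel counts and group number are illustrative, not the paper's configuration:

```python
import torch
import torch.nn as nn

c_in, c_out, k, groups = 256, 256, 3, 8

standard = nn.Conv2d(c_in, c_out, k, padding=1)
grouped = nn.Conv2d(c_in, c_out, k, padding=1, groups=groups)

def n_params(m: nn.Module) -> int:
    return sum(p.numel() for p in m.parameters())

# The weight tensor shrinks from c_out*c_in*k*k to c_out*(c_in/groups)*k*k.
print(n_params(standard))  # 590,080
print(n_params(grouped))   # 73,984

def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    """ShuffleNet-style channel shuffle so information flows across groups."""
    n, c, h, w = x.shape
    return (x.view(n, groups, c // groups, h, w)
             .transpose(1, 2)
             .reshape(n, c, h, w))

y = channel_shuffle(grouped(torch.randn(1, c_in, 64, 64)), groups)
```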
Figure 7.
The comprehensive structure diagram of our semantic segmentation network showcases the use of MiT-B5 architecture for the encoder network, complemented by the ShuffleFormer network structure for the decoder network.
Figure 8.
Fourier-transformed representations of scenes under different atmospheric conditions: clear sky, nighttime, and foggy weather.
Figure 9.
Upon applying the FEDS algorithm, the source domain image acquires the frequency domain characteristics of the target domain image.
Figure 10.
Schematic diagram of frequency-domain sampling in the FEDS algorithm: operation 1 samples low-frequency information from the source domain, operation 2 samples low-frequency information from the target domain, and operation 3 samples high-frequency information from the target domain.
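To make the sampling operations of Figures 9 and 10 concrete, the NumPy sketch below swaps the low-frequency amplitude of a source image for that of a target image, in the spirit of Fourier-based domain adaptation; the window size beta and the amplitude-only swap are assumptions, and the decreasing schedule FEDS applies over training is not shown:

```python
import numpy as np

def low_freq_swap(src: np.ndarray, tgt: np.ndarray, beta: float = 0.1) -> np.ndarray:
    """Give a source image the low-frequency (style) content of a target image.

    src, tgt: float arrays of shape (H, W, C). beta sets the relative size
    of the centered low-frequency window and is an assumption.
    """
    fft_src = np.fft.fft2(src, axes=(0, 1))
    fft_tgt = np.fft.fft2(tgt, axes=(0, 1))

    amp_src, pha_src = np.abs(fft_src), np.angle(fft_src)
    amp_tgt = np.abs(fft_tgt)

    # Center the zero frequency so the low-frequency band is one square window.
    amp_src = np.fft.fftshift(amp_src, axes=(0, 1))
    amp_tgt = np.fft.fftshift(amp_tgt, axes=(0, 1))

    h, w = src.shape[:2]
    b = max(1, int(min(h, w) * beta))
    ch, cw = h // 2, w // 2
    amp_src[ch - b:ch + b, cw - b:cw + b] = amp_tgt[ch - b:ch + b, cw - b:cw + b]

    amp_src = np.fft.ifftshift(amp_src, axes=(0, 1))
    # Recombine the swapped amplitude with the source phase, preserving layout.
    return np.real(np.fft.ifft2(amp_src * np.exp(1j * pha_src), axes=(0, 1)))
```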
Figure 11.
Three possible cases of the FEDS algorithm: (a) the network trains normally; (b) the source-domain training underfits; (c) the network trains normally and high-frequency signals in the target domain are sampled.
Figure 12.
(a) ACDC night image; (b) ACDC reference image. The reference image is taken at a different time but at the same location, so it can serve as an intermediate domain.
Figure 13.
Comparing gray-level co-occurrence matrices of daytime scenes with those of nighttime and foggy scenes reveals notable discrepancies in their standard deviation and entropy values.
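The statistics compared in Figure 13 can be reproduced with scikit-image. A minimal sketch, assuming a single GLCM offset (distance 1, angle 0) and 64 gray levels; both settings are assumptions rather than the paper's exact choices:

```python
import numpy as np
from skimage.feature import graycomatrix

def glcm_stats(gray: np.ndarray, levels: int = 64) -> tuple[float, float]:
    """Standard deviation and entropy of a normalized GLCM.

    gray: uint8 image of shape (H, W), quantized down to `levels` gray
    levels. The single offset (distance 1, angle 0) is an assumption.
    """
    q = (gray.astype(np.float64) / 256 * levels).astype(np.uint8)
    glcm = graycomatrix(q, distances=[1], angles=[0],
                        levels=levels, symmetric=True, normed=True)
    p = glcm[:, :, 0, 0]  # normalized co-occurrence probabilities

    std = p.std()
    entropy = -np.sum(p[p > 0] * np.log2(p[p > 0]))
    return std, entropy
```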
Figure 14.
The reference image serves as the intermediate domain; the region most dissimilar between this intermediate domain and the target domain is extracted and assigned a higher weight during training.
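A minimal sketch of how the dissimilar region from Figure 14 could be given a higher weight in the training loss, assuming a binary region mask and a scalar up-weighting factor alpha; the paper's exact weighting scheme may differ:

```python
import torch
import torch.nn.functional as F

def region_weighted_ce(logits: torch.Tensor, labels: torch.Tensor,
                       region_mask: torch.Tensor, alpha: float = 2.0) -> torch.Tensor:
    """Cross-entropy where pixels inside the proposed region get weight alpha.

    logits: (N, C, H, W); labels: (N, H, W); region_mask: (N, H, W) in {0, 1}.
    Both alpha and the binary mask are assumptions about how the region
    proposal is folded into the loss.
    """
    per_pixel = F.cross_entropy(logits, labels, reduction='none')  # (N, H, W)
    weights = 1.0 + (alpha - 1.0) * region_mask.float()
    return (weights * per_pixel).mean()
```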
Figure 15.
Qualitative results on ACDC night. The yellow rectangular boxes represent the regions where our method has better segmentation performance than the mainstream semantic segmentation methods. Our method improves the segmentation accuracy for both small objects and background.
Figure 16.
Qualitative results on ACDC fog. The yellow rectangular boxes represent the regions where our method has better segmentation performance compared to the mainstream semantic segmentation methods. Our method also improves segmentation accuracy for large homogeneous color regions.
Figure 17.
Qualitative results on ACDC rain. The yellow rectangular boxes represent the regions where our method has better segmentation performance compared to the mainstream semantic segmentation methods.
Figure 18.
Qualitative results on ACDC snow. The yellow rectangular boxes represent the regions where our method has better segmentation performance compared to the mainstream semantic segmentation methods.
Table 1.
Comparison of ShuffleFormer and mainstream decoder network performance (ACDC night).
| Encoder | Decoder | Params (M) | mIoU (%) |
|---|---|---|---|
| ResNet-101 | SegFormer768 | 3.13 | 38.13 |
| ResNet-101 | UperNet256 | 8.33 | 38.26 |
| ResNet-101 | UperNet512 | 29.64 | 39.14 |
| ResNet-101 | DAFormer aspp | 9.97 | 38.27 |
| ResNet-101 | DAFormer sesapp | 3.71 | 39.89 |
| ResNet-101 | ShuffleFormer (Ours) | 2.68 | 40.65 |
| MiT-B5 | SegFormer768 | 3.13 | 41.87 |
| MiT-B5 | UperNet256 | 8.33 | 42.46 |
| MiT-B5 | UperNet512 | 29.64 | 43.73 |
| MiT-B5 | DAFormer aspp | 9.97 | 42.97 |
| MiT-B5 | DAFormer sesapp | 3.71 | 44.27 |
| MiT-B5 | ShuffleFormer (Ours) | 2.68 | 44.34 |
Table 2.
Comparison of ACDC night experimental results. Ref: the reference image is used as an intermediate domain for training. Region Proposal: the training process focuses on candidate regions within the intermediate-domain image. FEDS: the Fourier Index Decreasing Algorithm, used to enhance our method.
| Encoder | Ref | Region Proposal | FEDS | mIoU (%) |
|---|---|---|---|---|
| ResNet-101 | | | | 40.65 |
| ResNet-101 | ✓ | | | 41.66 |
| ResNet-101 | ✓ | ✓ | | 44.27 |
| ResNet-101 | ✓ | ✓ | ✓ | 45.47 |
| MiT-B5 | | | | 44.34 |
| MiT-B5 | ✓ | | | 46.68 |
| MiT-B5 | ✓ | ✓ | | 48.80 |
| MiT-B5 | ✓ | ✓ | ✓ | 51.11 |
Table 3.
Comparison of ACDC fog experimental results. Ref: the reference image is used as an intermediate domain for training. Region Proposal: the training process focuses on candidate regions within the intermediate-domain image. FEDS: the Fourier Index Decreasing Algorithm, used to enhance our method.
| Encoder | Ref | Region Proposal | FEDS | mIoU (%) |
|---|---|---|---|---|
| ResNet-101 | | | | 39.81 |
| ResNet-101 | ✓ | | | 43.36 |
| ResNet-101 | ✓ | ✓ | | 44.61 |
| ResNet-101 | ✓ | ✓ | ✓ | 45.85 |
| MiT-B5 | | | | 50.51 |
| MiT-B5 | ✓ | | | 53.71 |
| MiT-B5 | ✓ | ✓ | | 55.08 |
| MiT-B5 | ✓ | ✓ | ✓ | 55.85 |
Table 4.
Comparison of ACDC rain experimental results. Columns are as in Table 2.
| Encoder | Ref | Region Proposal | FEDS | mIoU (%) |
|---|---|---|---|---|
| ResNet-101 | | | | 45.54 |
| ResNet-101 | ✓ | | | 47.36 |
| ResNet-101 | ✓ | ✓ | | 49.21 |
| ResNet-101 | ✓ | ✓ | ✓ | 51.77 |
| MiT-B5 | | | | 56.32 |
| MiT-B5 | ✓ | | | 58.96 |
| MiT-B5 | ✓ | ✓ | | 59.99 |
| MiT-B5 | ✓ | ✓ | ✓ | 62.68 |
Table 5.
Comparison of ACDC snow experimental results. Columns are as in Table 2.
| Encoder | Ref | Region Proposal | FEDS | mIoU (%) |
|---|---|---|---|---|
| ResNet-101 | | | | 44.27 |
| ResNet-101 | ✓ | | | 45.36 |
| ResNet-101 | ✓ | ✓ | | 47.61 |
| ResNet-101 | ✓ | ✓ | ✓ | 49.28 |
| MiT-B5 | | | | 50.64 |
| MiT-B5 | ✓ | | | 52.91 |
| MiT-B5 | ✓ | ✓ | | 54.96 |
| MiT-B5 | ✓ | ✓ | ✓ | 56.57 |