R-SWTNet: A Context-Aware U-Net-Based Framework for Segmenting Rural Roads and Alleys in China with the SQVillages Dataset
Abstract
1. Introduction
1.1. Background
1.2. Related Applications
1.3. Research Objectives
2. Methodology
2.1. Dataset Preparation
2.1.1. Overview of Study Region
2.1.2. Data Source and Labeling
2.1.3. Dataset Outline
2.2. Network Architecture
2.2.1. General Structure
2.2.2. The Structure of Swin Transformer Module
2.2.3. The Structure of Atrous Spatial Pyramid Pooling
2.2.4. The Structure of CAM-Residual Block
2.3. Loss Function
2.3.1. Focal Loss
2.3.2. Tversky Loss
2.3.3. Dynamic Hybrid Loss
3. Experiment and Results
3.1. Optimization and Learning Rate Management
3.2. Evaluation Metrics
3.3. Comparative Experiment
3.3.1. Experiment Design
3.3.2. Experiment Results
4. Discussion
4.1. Necessity of Transformer-CNN Conjunction
4.2. The Gains and Trade-Offs of DSC Module
4.3. Labeling Noise and Training Ambiguities
5. Conclusions
5.1. Context-Aware Hybrid Architecture: Advancing Rural Road Segmentation
5.2. SQVillages Dataset: Enabling Robust Model Generalization
- (1) Multi-source labeling (cross-validating satellite imagery with sub-meter aerial orthomosaics to reduce spectral confusion errors);
- (2) Village-level stratified splitting (isolating validation/test sets by village to avoid data leakage).

This dataset design supports model generalization to unseen rural scenes, as evidenced by R-SWTNet’s consistent performance across training and validation sets.
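
The village-level splitting described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the tile index format, the function name `split_by_village`, and the choice of which villages are held out are all assumptions made for the example.

```python
from collections import defaultdict


def split_by_village(tiles, val_villages, test_villages):
    """Assign each tile to exactly one subset based on its village code."""
    val_villages, test_villages = set(val_villages), set(test_villages)
    if val_villages & test_villages:
        raise ValueError("a village cannot be in both validation and test sets")
    subsets = defaultdict(list)
    for village, tile_id in tiles:
        if village in val_villages:
            subsets["val"].append((village, tile_id))
        elif village in test_villages:
            subsets["test"].append((village, tile_id))
        else:
            subsets["train"].append((village, tile_id))
    return dict(subsets)


# Hypothetical tile index: (village abbreviation, tile number).
tiles = [(v, i) for v in ("SH", "YD", "MT", "LXS", "CT") for i in range(3)]

# Hypothetical hold-out choice, for illustration only.
splits = split_by_village(tiles, val_villages={"LXS"}, test_villages={"CT"})

train_villages = {v for v, _ in splits["train"]}
val_villages_seen = {v for v, _ in splits["val"]}
test_villages_seen = {v for v, _ in splits["test"]}
```

Because whole villages are held out, no tile in the validation or test subset shares a village with any training tile, which is the leakage the split is meant to prevent.
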
5.3. Limitations and Future Work
- (1) Expanding the SQVillages dataset to incorporate villages from western China’s mountainous regions to test model generalization further;
- (2) Exploring more advanced model compression and quantization techniques to enhance deployment feasibility without significant accuracy loss;
- (3) Investigating semi-supervised learning paradigms to reduce the heavy reliance on costly pixel-wise annotations.
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| Abbreviation | Full Name |
|---|---|
| CNN | Convolutional Neural Network |
| DSC | Depthwise Separable Convolution |
| FCN | Fully Convolutional Network |
| ASPP | Atrous Spatial Pyramid Pooling |
| IoU | Intersection over Union |
| W-MSA | Window Multi-head Self-Attention |
| SW-MSA | Shifted Window Multi-head Self-Attention |

¹ The official API of Tianditu can be found at http://lbs.tianditu.gov.cn/server/MapService.html (accessed from 14 June to 7 August 2025). The API was accessed using a Python 3.6 script.

References
- Zhang, H.; Dong, W.; Fang, X. Road construction and rural household income: Empirical evidence from village road paving in China. Financ. Res. Lett. 2023, 51, 103460.
- World Bank Group. Measuring Rural Access: Using New Technologies; World Bank Group: Washington, DC, USA, 2016; Available online: https://openknowledge.worldbank.org/entities/publication/ba2e6b4d-ea2e-58f0-b54e-326c902169ba (accessed on 30 August 2025).
- Tian, Z.; Xin, Y.; Lin, Y. Do roads help rural populations escape poverty? New evidence from Chinese survey data. Appl. Econ. 2025, 1–14.
- Morán Uriel, J.; Camerin, F.; Córdoba Hernández, R. Urban Horizons in China: Challenges and Opportunities for Community Intervention in a Country Marked by the Heihe-Tengchong Line. In Diversity as Catalyst: Economic Growth and Urban Resilience in Global Cityscapes; Urban Sustainability; Siew, G., Allam, Z., Cheshmehzangi, A., Eds.; Springer: Singapore, 2024.
- National Bureau of Statistics of China; Wang, P.P. Total Population Decline Narrowed, Population Quality Continued to Improve Report; National Bureau of Statistics of China: Beijing, China, 2025. Available online: https://www.stats.gov.cn/xxgk/jd/sjjd2020/202501/t20250117_1958337.html (accessed on 26 August 2025). (In Chinese)
- The Central Committee of the Communist Party of China (CPC); The State Council. Rural Revitalization Strategy Plan (2018–2022) Report; The State Council of the People’s Republic of China: Beijing, China, 2018. Available online: http://www.gov.cn/zhengce/2018-09/26/content_5325534.htm (accessed on 11 February 2021). (In Chinese)
- The Central Committee of the Communist Party of China (CPC); The State Council. Comprehensive Rural Revitalization Plan (2024–2027) Report; The State Council of the People’s Republic of China: Beijing, China, 2025. Available online: https://www.gov.cn/zhengce/202501/content_7000493.htm (accessed on 2 March 2025). (In Chinese)
- Ministry of Transport of the People’s Republic of China. 2024 Statistical Bulletin of the Transportation Industry; Ministry of Transport: Beijing, China, 2024. Available online: https://xxgk.mot.gov.cn/2020/jigou/zhghs/202506/t20250610_4170228.html (accessed on 7 August 2025). (In Chinese)
- Ministry of Transport of the People’s Republic of China. 2014 Statistical Bulletin of the Transportation Industry; Ministry of Transport: Beijing, China, 2014. Available online: https://www.gov.cn/xinwen/2015-04/30/content_2855735.htm (accessed on 7 August 2025). (In Chinese)
- Cai, H.; Yuan, S.; Yang, K.; Wang, F.; Sheng, G. Method for Rural Road Verification and Its Application Based on Chinese High Resolution Remote Sensing Image. Bull. Surv. Mapp. 2020, 03, 91–95. (In Chinese with English Abstract)
- He, H.; Fan, J.; Chen, W.; Zhou, Y.; Zhang, P.; Yu, X. Extraction of Shaded Roads in High-Resolution Remote Sensing Imagery based on Brightness Compensation. J. Geo-Inf. Sci. 2020, 22, 258–267. (In Chinese with English Abstract)
- Mnih, V. Machine Learning for Aerial Image Labeling; University of Toronto: Toronto, ON, Canada, 2013.
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. In Advances in Neural Information Processing Systems; Pereira, F., Burges, C.J., Bottou, L., Weinberger, K.Q., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2012; Volume 25.
- Paisitkriangkrai, S.; Sherrah, J.; Janney, P.; Van-Den Hengel, A. Effective semantic pixel labelling with convolutional networks and Conditional Random Fields. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Boston, MA, USA, 7–12 June 2015; pp. 36–43.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861.
- Chollet, F. Xception: Deep Learning with Depthwise Separable Convolutions. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1800–1807.
- Aoki, Y.; Saito, S. Building and road detection from large aerial imagery. In Proceedings of the SPIE—Image Processing: Machine Vision Applications VIII, San Francisco, CA, USA, 27 February 2015; SPIE: Bellingham, WA, USA, 2015; Volume 9405, p. 94050K.
- Shelhamer, E.; Long, J.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
- Sherrah, J. Fully Convolutional Networks for Dense Semantic Labelling of High-Resolution Aerial Imagery. arXiv 2016, arXiv:1606.02585.
- Noh, H.; Hong, S.; Han, B. Learning Deconvolution Network for Semantic Segmentation. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 1520–1528.
- Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015, Munich, Germany, 5–9 October 2015; Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F., Eds.; Springer International Publishing: Cham, Switzerland, 2015; pp. 234–241.
- Chen, L.-C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 834–848.
- Zhang, Z.; Liu, Q.; Wang, Y. Road Extraction by Deep Residual U-Net. IEEE Geosci. Remote Sens. Lett. 2018, 15, 749–753.
- Oktay, O.; Schlemper, J.; Folgoc, L.I.L.; Lee, M.J.; Heinrich, M.P.; Misawa, K.; Mori, K.; McDonagh, S.G.; Hammerla, N.Y.; Kainz, B.; et al. Attention U-Net: Learning Where to Look for the Pancreas. arXiv 2018, arXiv:1804.03999.
- Zhou, Z.; Rahman Siddiquee, M.M.; Tajbakhsh, N.; Liang, J. UNet++: A Nested U-Net Architecture for Medical Image Segmentation. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support; Stoyanov, D., Taylor, Z., Carneiro, G., Syeda-Mahmood, T., Martel, A., Maier-Hein, L., Tavares, J.M.R.S., Bradley, A., Papa, J.P., Belagiannis, V., Eds.; Springer International Publishing: Cham, Switzerland, 2018; pp. 3–11.
- Wang, R.; Pan, F.; An, Q.; Diao, Q.; Feng, X. Aerial Unstructured Road Segmentation Based on Deep Convolution Neural Network. In Proceedings of the 2019 Chinese Control Conference (CCC), Guangzhou, China, 27–30 July 2019; pp. 8494–8500.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.U.; Polosukhin, I. Attention is All you Need. In Advances in Neural Information Processing Systems; Guyon, I., Von Luxburg, U., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2017; Volume 30.
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16 × 16 Words: Transformers for Image Recognition at Scale. arXiv 2020, arXiv:2010.11929.
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 9992–10002.
- Chen, J.; Lu, Y.; Yu, Q.; Luo, X.; Adeli, E.; Wang, Y.; Lu, L.; Yuille, A.L.; Zhou, Y. TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation. arXiv 2021, arXiv:2102.04306.
- Cao, H.; Wang, Y.; Chen, J.; Jiang, D.; Zhang, X.; Tian, Q.; Wang, M. Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation. In Proceedings of the ECCV 2022 Workshops, Tel Aviv, Israel, 23–27 October 2022.
- Ge, C.; Nie, Y.; Kong, F.; Xu, X. Improving Road Extraction for Autonomous Driving Using Swin Transformer Unet. In Proceedings of the 2022 IEEE 25th International Conference on Intelligent Transportation Systems (ITSC), Macau, China, 8–12 October 2022; pp. 1216–1221.
- Sun, Y.; Gu, X.; Zhou, X.; Yang, J.; Shen, W.; Cheng, Y.; Zhang, J.M.; Chen, Y. DPIF-Net: A dual path network for rural road extraction based on the fusion of global and local information. PeerJ Comput. Sci. 2024, 10, e2079.
- Lyu, S.; Li, J.; A, X.; Yang, C.; Yang, R.; Shang, X. Res_ASPP_UNet++: Building an extraction network from remote sensing imagery combining depthwise separable convolution with atrous spatial pyramid pooling. Natl. Remote Sens. Bull. 2023, 27, 502–519. (In Chinese)
- Sloan, S.; Talkhani, R.R.; Huang, T.; Engert, J.; Laurance, W.F. Mapping Remote Roads Using Artificial Intelligence and Satellite Imagery. Remote Sens. 2024, 16, 839.
- Yang, N.; Di, W.; Wang, Q.; Liu, W.; Feng, T.; Tian, X. Rural Road Extraction in Xiong’an New Area of China Based on the RC-MSFNet Network Model. Sensors 2024, 24, 6672.
- Chen, Z.; Yuan, F.; Zhang, J.; Shen, S.; Li, X.; Li, X.; Huang, M.; Jowitt, S.M. Paleomagnetic evidence for the Gothenburg geomagnetic excursion during the Pleistocene–Holocene transition recorded in the Paleo-Danyang Lake, eastern China. J. Asian Earth Sci. 2020, 201, 104140.
- Hu, X.; Wu, L.; Zhuang, Y.; Wang, X.; Ma, C.; Li, L.; Guan, H.; Lu, S.; Luo, W.; Xu, Z. Evolution of the historical polder landscape in the ancient Danyang wetland, lower Yangtze River, China, during the last 3000 years. J. Geogr. Sci. 2024, 34, 2053–2073.
- Li, M.; Rui, Y.; Wang, C.X. Spatial Distribution and Influencing Factors of Traditional Villages: A Case Study of the Wuyue Cultural Region. Resour. Environ. Yangtze Basin 2018, 27, 1693–1702. Available online: https://yangtzebasin.whlib.ac.cn/CN/Y2018/V27/I08/1693 (accessed on 30 August 2025). (In Chinese)
- Dong, G.; Mou, X.; Zhang, H.; Li, R.; Wu, H.; Jiang, J.; Li, F.; Yu, W. Browsing target extraction and spatiotemporal preference mining from the complex virtual trajectories. Int. J. Appl. Earth Obs. Geoinf. 2024, 129, 103819.
- Kang, T. Administrative Village: Nationalization Governance of Rural Society. In Handbook of Essential Keywords for Understanding Rural China; Jinhai, L., Ed.; Springer Nature: Singapore, 2024; pp. 83–101.
- Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255.
- Zhang, H.; Wu, R.; Chen, H.; Cui, W.; Wu, J.; Yang, J. Symbiosis of the “Old” and the “New” in the Renovation of Existing Rural Buildings: Taking the Practice of Residential Renovation in Gaogang Village, Nanjing City as an Example. Urban Des. 2024, 2024, 54–61. Available online: https://d.wanfangdata.com.cn/periodical/qk_f3eb0e481c6441a6a4e4642f4aebfe4a (accessed on 30 August 2025). (In Chinese with English Abstract)
- Yuan, M.; Zhang, H. The Practical Case of Gaogang in Rural Revitalization. World Archit. 2021, 43–47+127. (In Chinese with English Abstract)
- Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal Loss for Dense Object Detection. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2999–3007.
- Salehi, S.S.M.; Erdogmus, D.; Gholipour, A. Tversky Loss Function for Image Segmentation Using 3D Fully Convolutional Deep Networks. Lect. Notes Comput. Sci. 2017, 10541, 379–387.
- Li, X.; Sun, X.; Meng, Y.; Liang, J.; Wu, F.; Li, J. Dice Loss for Data-imbalanced NLP Tasks. arXiv 2019, arXiv:1911.02855.
- Loshchilov, I.; Hutter, F. Fixing Weight Decay Regularization in Adam. arXiv 2017, arXiv:1711.05101.
- Loshchilov, I.; Hutter, F. Decoupled Weight Decay Regularization. In Proceedings of the 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, 6–9 May 2019.
- Goyal, P.; Dollár, P.; Girshick, R.B.; Noordhuis, P.; Wesolowski, L.; Kyrola, A.; Tulloch, A.; Jia, Y.; He, K. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour. arXiv 2017, arXiv:1706.02677.
- Loshchilov, I.; Hutter, F. SGDR: Stochastic Gradient Descent with Warm Restarts. arXiv 2016, arXiv:1608.03983.
- Huang, G.; Li, Y.; Pleiss, G.; Liu, Z.; Hopcroft, J.E.; Weinberger, K.Q. Snapshot Ensembles: Train 1, get M for free. arXiv 2017, arXiv:1704.00109.
- Hochreiter, S.; Schmidhuber, J. Flat Minima. Neural Comput. 1997, 9, 1–42.
- Zhou, L.; Zhang, C.; Wu, M. D-LinkNet: LinkNet with Pretrained Encoder and Dilated Convolution for High Resolution Satellite Imagery Road Extraction. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Salt Lake City, UT, USA, 18–22 June 2018; pp. 192–1924.

| Name | Abbr. | Location | Natural Landscape | Honors and History | Construction and Development | Foreground Ratio of the 3K image (%) |
|---|---|---|---|---|---|---|
| Shenghong Village | SH | Xinfeng Country, Huangshan District, Huangshan, Anhui Province | Hill-side | Third Batch of Chinese Traditional Village | - | 3.00 |
| Yaduan Village | YD | Jishan Town, Nanling County, Wuhu, Anhui Province | Plain, River-side | - | - | 1.75 |
| Matou Village | MT | Qinxi Town, Jing County, Xuancheng, Anhui Province | Hill-side, River-side | Fourth Batch of Chinese Traditional Village | - | 2.53 |
| Huangtian Village | HT | Langqiao Town, Jing County, Xuancheng, Anhui Province | Hill-side | First Batch of Chinese Traditional Village | - | 2.62 |
| Badushu Village | BDS | Yangliu Town, Wanzhi District, Wuhu, Anhui Province | Hill-side | - | - | 0.91 |
| Suda Village | SD | Wanzhi Town, Wanzhi District, Wuhu, Anhui Province | Hill-side | Third Batch of Anhui Traditional Village | - | 3.51 |
| Qianjia Village | QJ | Yangjiang Town, Gaochun District, Nanjing, Jiangsu Province | Polder | - | - | 3.26 |
| Shuibiqiao Village | SBQ | Zhuanqiang Town, Gaochun District, Nanjing, Jiangsu Province | Polder Dyke, River-side | - | River Channel Widening; Pumping Station Construction; Two Resettlement Housing Complexes; Demolition of the Western Sector | 4.47 |
| Mojiazui Village | MJZ | Zhuqiao Country, Xuanzhou District, Xuancheng, Anhui Province | Polder Dyke | - | - | 4.51 |
| Qiqiao Village | | Qiqiao Sub-district, Gaochun District, Nanjing, Jiangsu Province | Plain | Second Batch of Chinese Traditional Village | - | 12.62 |
| Xiaohu Village | XH | Shuidong Town, Xuanzhou District, Xuancheng, Anhui Province | Plain, Hill-side | Fourth Batch of Chinese Traditional Village | - | 5.32 |
| Gaogang Village | GG | Qiqiao Sub-district, Gaochun District, Nanjing, Jiangsu Province | Plain | - | Collaboration with Tsinghua University, Utilizing Vacant Buildings [45] and Holistic Village Operation [46] | 6.14 |
| Hujiaba Village | HJB | Qiqiao Sub-district, Gaochun District, Nanjing, Jiangsu Province | Plain, Reservoir-side | Reservoir Relocation Village | Construction of Homestays | 5.48 |
| Wangjia Village | WJ | Dongba Sub-district, Gaochun District, Nanjing, Jiangsu Province | River-side, Close to Xu River | - | Resettlement | 8.21 |
| Zhangdu Village | ZD | Yunling Town, Jing County, Xuancheng, Anhui Province | River-side | Third Batch of Chinese Traditional Village | - | 4.39 |
| Baduhe Village | BDH | Gongshan Town, Nanling County, Wuhu, Anhui Province | Hill-side | Anhui Traditional Village | - | 1.85 |
| Yangtian Village | YT | Liqiao Town, Xuanzhou District, Xuancheng, Anhui Province | Hill-side | - | - | 2.73 |
| Pei Village | PC | Feili Town, Langxi County, Xuancheng, Anhui Province | Lake-side | Second Batch of Chinese Traditional Village | - | 1.29 |

| Name | Abbr. | Location | Natural Landscape | Honors and History | Construction and Development |
|---|---|---|---|---|---|
| Lingxiasu Village | LXS | Yongfeng Country, Huangshan District, Huangshan, Anhui Province | Hill-side, River-side | First Batch of Chinese Traditional Village | - |
| Chitan Village | CT | Qinxi Town, Jing County, Xuancheng, Anhui Province | River-side | Fifth Batch of Chinese Traditional Village | - |
| Xuantan Village | XT | Taoxin Town, Wanzhi District, Wuhu, Anhui Province | Polder | - | - |
| Youzhagou Village | YZG | Zhuqiao Country, Xuanzhou District, Xuancheng, Anhui Province | Polder Dyke | - | - |
| Longtan Village | LT | Yangjiang Town, Gaochun District, Nanjing, Jiangsu Province | Polder Dyke | - | - |
| Shanmen Village | SM | Gangkou Town, Ningguo City, Xuancheng, Anhui Province | Plain | Fourth Batch of Chinese Traditional Village | - |
| Hejia Village | HJ | Gucheng Sub-district, Gaochun District, Nanjing, Jiangsu Province | Plain, Lake-side | Fifth Batch of Jiangsu Traditional Village | Renewal of Vacant Buildings |
| Lixi Village | LX | Qiqiao Sub-district, Gaochun District, Nanjing, Jiangsu Province | Plain | - | - |

| Blocks | Shape of Input (C, H, W) | Shape of Output (C, H, W) | Type | Kernel | Stride | Padding | Window Size |
|---|---|---|---|---|---|---|---|
| Encoder 1 | 3, 512, 512 | 64, 256, 256 | Conv | 7 × 7 | 2 | 3 | - |
| Encoder 2 | 64, 256, 256 | 64, 256, 256 | ResNet Conv × 3 | 3 × 3 | 1 | 1 | - |
| Encoder 3 | 64, 256, 256 | 128, 128, 128 | ResNet Downsampling | 3 × 3 | 2 | 1 | - |
| | 128, 128, 128 | 128, 128, 128 | ResNet Conv × 3 | 3 × 3 | 1 | 1 | - |
| Encoder 4 | 128, 128, 128 | 256, 64, 64 | ResNet Downsampling | 3 × 3 | 2 | 1 | - |
| | 256, 64, 64 | 256, 64, 64 | ResNet Conv × 5 | 3 × 3 | 1 | 1 | - |
| | 256, 64, 64 | 256, 64, 64 | Transformer (256) × 2 | - | - | - | 8 |
| Bottleneck | 256, 64, 64 | 512, 32, 32 | Patch Merging | 3 × 3 | 2 | 1 | - |
| | 512, 32, 32 | 512, 32, 32 | ASPP | Multiple | - | - | - |
| | 512, 32, 32 | 512, 32, 32 | Transformer (512) × 2 | - | - | - | 16 |
| Decoder 1 | 512, 32, 32 | 256, 64, 64 | Pixel Shuffle | 2 × 2 | 2 | 0 | - |
| | 256, 64, 64 | 256, 64, 64 | Gated Skip Connection (Encoder 4) | - | - | - | - |
| | 256, 64, 64 | 256, 64, 64 | CAM-Residual | 1 × 1 | 1 | 0 | - |
| | 256, 64, 64 | 256, 64, 64 | Transformer (256) × 2 | - | - | - | 8 |
| Decoder 2 | 256, 64, 64 | 128, 128, 128 | Pixel Shuffle | 2 × 2 | 2 | 0 | - |
| | 128, 128, 128 | 128, 128, 128 | Gated Skip Connection (Encoder 3) | - | - | - | - |
| | 128, 128, 128 | 128, 128, 128 | CAM-Residual | 1 × 1 | 1 | 0 | - |
| Decoder 3 | 128, 128, 128 | 64, 256, 256 | Pixel Shuffle | 2 × 2 | 2 | 0 | - |
| | 64, 256, 256 | 64, 256, 256 | Gated Skip Connection (Encoder 2) | - | - | - | - |
| | 64, 256, 256 | 64, 256, 256 | CAM-Residual | 1 × 1 | 1 | 0 | - |
| Decoder 4 | 64, 256, 256 | 64, 512, 512 | ConvTranspose | 2 × 2 | 2 | 0 | - |
| | 64, 512, 512 | 64, 512, 512 | Conv, ReLU | 3 × 3 | 1 | 1 | - |
| | 64, 512, 512 | 1, 512, 512 | Conv, Sigmoid | 1 × 1 | 1 | 0 | - |
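
The spatial sizes in the table above can be checked with the standard output-size formulas for convolution and transposed convolution. The sketch below is only a sanity check of the listed kernel, stride, and padding values, not the authors' implementation; the helper names are hypothetical, and the 1024-channel input to the pixel shuffle is an assumption, since the table lists only the block's input (512) and output (256) channels.

```python
def conv2d_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial output size of a standard convolution (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_transpose2d_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial output size of a transposed convolution (no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel


def pixel_shuffle_out(channels: int, size: int, factor: int) -> tuple:
    """Pixel shuffle trades channels for resolution: (C, H) -> (C / r^2, H * r)."""
    if channels % (factor * factor) != 0:
        raise ValueError("channels must be divisible by factor squared")
    return channels // (factor * factor), size * factor


# Rows from the table: (name, input size, kernel, stride, padding, listed output size).
conv_rows = [
    ("Encoder 1", 512, 7, 2, 3, 256),
    ("Encoder 3 downsampling", 256, 3, 2, 1, 128),
    ("Encoder 4 downsampling", 128, 3, 2, 1, 64),
    ("Bottleneck patch merging", 64, 3, 2, 1, 32),
]
for name, size, k, s, p, expected in conv_rows:
    assert conv2d_out(size, k, s, p) == expected, name

# Decoder 4: ConvTranspose 2 x 2, stride 2, padding 0 on a 256 x 256 map.
decoder4_size = conv_transpose2d_out(256, kernel=2, stride=2, padding=0)

# Assumption: a layer before the shuffle expands 512 channels to 1024,
# so that a factor-2 shuffle yields the 256 channels listed for Decoder 1.
decoder1_shape = pixel_shuffle_out(1024, 32, factor=2)
```

Each downsampling row halves the feature map and each upsampling row doubles it, so the decoder stages line up with the encoder stages they are skip-connected to.
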

| No. | Model Name | Key Features | Params (M) | FLOPs (G) |
|---|---|---|---|---|
| 1 | U-Net | Foundational encoder–decoder structure | 31.04 | 219.08 |
| 2 | Swin-UNet | As defined in related literature | 23.77 | 105.38 |
| 3 | R-Net | U-Net using ResNet34 as encoder backbone with classic decoder | 24.35 | 117.42 |
| 4 | R-Net-DSC | R-Net with depthwise separable convolution | 4.30 | 25.19 |
| 5 | D-LinkNet [56] | Similar to R-Net, using dilated convolution in upsampling | 23.69 | 110.96 |
| 6 | R-SWTNet | The proposed model prototype | 34.24 | 126.28 |
| 7 | R-SWTNet-DSC | R-SWTNet with depthwise separable convolution | 20.93 | 58.13 |
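
The parameter savings of the DSC variants in the table above follow from how a depthwise separable convolution factorizes a standard one. The sketch below counts weights for a single layer under assumed sizes (256 input and output channels, 3 × 3 kernel, no bias); these sizes are illustrative and not taken from the models. The whole-model reductions in the table differ from this per-layer ratio, presumably because not every layer is a replaceable convolution.

```python
def conv_params(c_in: int, c_out: int, kernel: int) -> int:
    """Weights in a standard convolution: one k x k filter per (input, output) channel pair."""
    return c_in * c_out * kernel * kernel


def dsc_params(c_in: int, c_out: int, kernel: int) -> int:
    """Weights in a depthwise separable convolution (bias terms omitted)."""
    depthwise = c_in * kernel * kernel   # one k x k filter per input channel
    pointwise = c_in * c_out             # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Illustrative layer size (an assumption, not a layer from the paper).
standard = conv_params(256, 256, 3)
separable = dsc_params(256, 256, 3)
ratio = separable / standard  # equals 1 / c_out + 1 / kernel**2
```

For this layer the separable form needs 67,840 weights instead of 589,824, roughly 11.5% of the original.
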

| No. | Model Name | Train Precision (%) | Val Precision (%) | Train Recall (%) | Val Recall (%) | Train IoU (%) | Val IoU (%) | Train F1 Score (%) | Val F1 Score (%) | Overfitting (%) ¹ | Relative Overfitting (%) ² |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | U-Net | 31.69 | 43.97 | 59.43 | 32.07 | 26.05 | 22.76 | 41.34 | 37.08 | 4.26 | 11.49 |
| 2 | Swin-UNet | 55.90 | 46.11 | 74.78 | 70.77 | 47.03 | 38.73 | 63.98 | 55.84 | 8.14 | 14.58 |
| 3 | R-Net | 81.04 | 69.73 | 92.60 | 69.25 | 76.01 | 53.15 | 86.43 | 69.41 | 17.02 | 24.52 |
| 4 | R-Net-DSC | 75.24 | 65.63 | 88.50 | 64.20 | 68.55 | 48.05 | 81.34 | 64.91 | 16.43 | 25.31 |
| 5 | D-LinkNet | 86.40 ³ | 73.81 | 95.72 | 67.79 | 83.18 | 54.64 | 90.82 | 70.67 | 20.15 | 28.51 |
| 6 | R-SWTNet | 79.55 | 69.99 | 91.41 | 71.11 | 74.01 | 54.88 | 85.07 | 70.87 | 14.20 | 20.04 |
| 7 | R-SWTNet-DSC | 77.98 | 66.77 | 90.40 | 65.01 | 72.02 | 49.12 | 83.73 | 65.88 | 17.85 | 27.09 |
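
The two overfitting columns in the table above appear to be derived from the F1 columns: the values are consistent with the train-minus-validation F1 gap and with that gap divided by the validation F1. The sketch below reproduces them under that inferred definition, which is not stated in this section. It also uses the standard identity IoU = F1 / (2 - F1) for binary segmentation; the function names are hypothetical.

```python
def overfitting_gap(train_f1: float, val_f1: float) -> float:
    """Absolute gap between train and validation F1, in percentage points."""
    return train_f1 - val_f1


def relative_overfitting(train_f1: float, val_f1: float) -> float:
    """Gap expressed as a percentage of the validation F1 (inferred definition)."""
    return (train_f1 - val_f1) / val_f1 * 100.0


def iou_from_f1(f1_percent: float) -> float:
    """Convert an F1 score (percent) to IoU (percent) via IoU = F1 / (2 - F1)."""
    return 100.0 * f1_percent / (200.0 - f1_percent)


# F1 values copied from the table: (model, train F1, val F1).
rows = [
    ("U-Net", 41.34, 37.08),
    ("R-SWTNet", 85.07, 70.87),
]
summary = {
    name: (round(overfitting_gap(tr, va), 2), round(relative_overfitting(tr, va), 2))
    for name, tr, va in rows
}
```

With these inputs the sketch gives 4.26 and 11.49 for U-Net and 14.20 and 20.04 for R-SWTNet, matching the table, and `iou_from_f1(70.87)` lands within rounding of the listed validation IoU of 54.88.
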

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Wu, J.; Yang, J.; Xu, X.; Zeng, Y.; Cheng, Y.; Liu, X.; Zhang, H. R-SWTNet: A Context-Aware U-Net-Based Framework for Segmenting Rural Roads and Alleys in China with the SQVillages Dataset. Land 2025, 14, 1930. https://doi.org/10.3390/land14101930

