LDANet: A Lightweight Dynamic Addition Network for Rural Road Extraction from Remote Sensing Images

Abstract: Automatic road extraction from remote sensing images has an important impact on road maintenance and land management. Although many deep-learning-based approaches have been developed in recent years, achieving a suitable trade-off between extraction accuracy, inference speed and model size remains a fundamental challenge for real-time road extraction applications, especially for rural roads. To this end, we developed a lightweight dynamic addition network (LDANet) for rural road extraction. Specifically, considering the narrow, complex and diverse nature of rural roads, we introduce an improved Asymmetric Convolution Block (ACB)-based Inception structure to extend the low-level features in the feature extraction layer. In the deep feature association module, depth-wise separable convolution (DSC) is introduced to reduce the computational complexity of the model, and an adaptation-weighted overlay is designed to capture the salient features. Moreover, we utilize a dynamic weighted combined loss, which better handles sample imbalance and boosts segmentation accuracy. In addition, we constructed a typical remote sensing dataset of rural roads based on the DeepGlobe Land Cover Classification Challenge dataset. Our experiments demonstrate that LDANet performs well in road extraction with few model parameters (<1 MB) and that the accuracy and the mean Intersection over Union reach 98.74% and 76.21% on the test dataset, respectively. Therefore, LDANet has the potential to rapidly extract and monitor rural roads from remote sensing images.


Introduction
As basic geographic information, rural roads are an important part of the transport system and play a significant role in urban planning, traffic navigation and digital map updating. With the advancement of science and technology, geographic mapping requires efficient and high-accuracy road extraction. Traditional road labeling relies mostly on manual and GPS-based methods, but the former is burdensome and the latter usually loses road details such as width and edges [1]. Remote sensing images have the advantages of being large-scale, fast-updating, easy to access and rich in information. With increasing resolution, the value and universality of remote sensing images have been greatly expanded, which provides powerful data support for various Earth observation tasks [2]. Therefore, using remote sensing images for rapid and efficient road extraction has been a key research topic for many scholars in recent years.
There are two main areas of study in remote-sensing-based road network extraction technology. The first is the shallow feature observation method, in which early scholars used the inherent geometric, textural and spectral features of images for road network extraction. However, these features are so simple that the method resulted in low accuracy [3][4][5][6], so some researchers combined it with multi-source data fusion, template matching and model-oriented application methods to improve accuracy and efficiency. Zhao et al. [7] used the Extended Kalman Filter (EKF) and Particle Filter (PF) models for road extraction, achieving acceptable performance in moderately and highly noisy backgrounds. Perciano et al. [8] used a two-layer Markov random field (MRF) to analyze multi-source fused data and to improve accuracy. Zang et al. [9] proposed an aperiodic directional structure measurement (ADSM) method that introduces a representation of road-like features to enhance its effectiveness. Chinnathev et al. [10] used morphological features to develop an automatic road-centerline-extraction field-programmable gate array (FPGA) architecture to meet the demand for real-time road extraction. Although these studies have been effective, they still suffer from low stability, poor adaptability to complex information, gradient loss and overfitting of results.
The second method is the deep feature mining method, which uses multilayer networks to expand the non-linear mapping of image features and extract them. Hinton et al. [11] introduced a neural network that brought widespread attention to the deep learning approach. Compared with traditional machine learning, deep learning focuses on automatic feature learning from huge datasets with multilayer neuron organization. With the development of artificial intelligence, many scholars have become interested in the intelligent extraction of road networks. Zhong et al. [12] used fully convolutional networks (FCNs) for road extraction from remote sensing images and obtained acceptable performance. Varia et al. [13] validated the effectiveness of deep learning methods by using conditional generative adversarial networks (GANs) and FCNs to extract roads from the data of unmanned aerial vehicles (UAVs). Doshi et al. [14] combined the ResNet network and the Inception network to propose a Residual Inception Skip Network and greatly improved the performance of road network extraction. Zhou et al. [15] constructed the D-LinkNet model by introducing dilated convolutional layers into LinkNet, which expanded the receptive field without reducing the resolution of the feature map. Li et al. [16] developed the hybrid convolutional network (HCN) by referring to FCN, Unet and VGG. Boonpook et al. [17] proposed a deep residual deconvolutional network with SegNet, which improves extraction accuracy by enhancing feature relationships to overcome interference from complex scenes. Lu et al. [18] constructed a multi-scale and multi-task deep learning framework based on Unet, which addresses both road detection and centerline extraction and outperformed other deep-learning-based road extraction methods. However, while higher model complexity may achieve better performance, it also imposes greater computational requirements. Therefore, most deep mining models for road extraction are still seeking to overcome problems in training and application.
To meet the requirements of practical applications, many lightweight networks have been proposed, with the primary intention of reducing network size and increasing speed while maintaining as much accuracy as possible. MobileNets [19][20][21] replace standard convolutions with depth-wise separable convolutions (DSC), which effectively reduces the number of parameters by decomposing the convolution operations. ShuffleNets [22,23] introduced a split-shuffle structure to speed up the model training process and enhance inter-channel correlation. ENet [24] was proposed to reduce the large number of floating point operations in the network by compressing the channels, but the resulting loss of spatial information lowers its accuracy. In recent years, multiple networks have been proposed that employ optimized convolution, which can balance the network's number of parameters, inference speed and segmentation accuracy. LiteSeg [25] applies the Atrous Spatial Pyramid Pooling module to MobileNetV2 and has shown strong performance. ERFNet [26] uses residual connections and factorized convolutions to remain efficient while retaining remarkable accuracy. EDANet [27] proposes a bottleneck structure that combines asymmetric convolution with depth-wise convolution. ESPNet [28] and ESPNetv2 [29] introduce atrous convolution into their networks because it can obtain more features over larger areas using the same number of parameters. ELANet [30] uses an attention mechanism to strengthen different levels of information. MADNet [31] proposes a dense lightweight network that combines a residual multiscale module with an attention mechanism and a dual residual path block, which effectively reduces the model parameters and complexity. Despite these achievements, we believe that the balance between the accuracy and efficiency of these design strategies for the extraction of single vulnerable targets still needs to be improved. Designing a reasonable network structure that achieves greater accuracy with few computational resources by making full use of remote sensing image features is important to facilitate the application of deep learning to road extraction.
To achieve lightweight, efficient and high-accuracy rural road extraction in real-world applications, we designed a lightweight dynamic addition network (LDANet) based on the encoder-decoder framework. Firstly, we constructed an improved ACB-based Inception structure to extend the low-level features in the feature extraction layer. Then, we developed a deep feature association module by introducing DSC and an adaptation-weighted overlay to reduce the computational complexity. Finally, we built a typical rural road dataset to evaluate the performance and utilized a dynamically weighted combined loss function to address sample imbalance. The main contributions of this article are as follows:

1. A lightweight rural road extraction model is proposed and shows strong performance on two datasets, enhancing the applicability of remote sensing techniques.

2. We extended shallow features using ACB-based Inception and designed a lightweight deep correlation module by referring to DSC and an adaptation-weighted overlay.

3. We designed a dynamic hybrid loss function to improve the accuracy on unbalanced samples.

Data
In this paper, we evaluate the performance of the LDANet on two datasets: (1) the typical rural road dataset and (2) the Massachusetts roads dataset [32].

The Typical Rural Roads Dataset
This dataset is constructed based on the DeepGlobe Land Cover Classification Challenge dataset [33], which is a publicly available dataset of high-resolution remote sensing images that can be obtained from the website http://Deepglobe.org/challenge.html (accessed on 10 March 2022). The dataset's spatial resolution is 50 cm/pixel and includes 1000 images, each 1024 × 1024 in size. The training set, validation set and test set were divided from the typical rural road dataset in a ratio of 75%, 15% and 10%, respectively, and the types of roads contained in the dataset include unpaved roads, paved roads and dirt roads. In the experiment, we used the data augmentation method which includes geometric transformations (scaling and rotation) and pixel transformations (noise addition and gamma transform) [34] to expand the training set and the validation set to a size of 7500 and 1500 images, respectively, with image sizes of 256 × 256. In the test set, 100 images of size 1024 × 1024 were used. Sample images are shown in Figure 1.
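The split ratios and the pixel-level augmentations described above can be sketched as follows. This is only a minimal illustration and not the pipeline used in the experiments: the function names, the gamma value and the noise level are assumptions, and an image is represented as a flat list of 8-bit grey values for brevity.

```python
import random


def split_counts(total, ratios=(0.75, 0.15, 0.10)):
    """Return (train, val, test) image counts for the given split ratios."""
    train = round(total * ratios[0])
    val = round(total * ratios[1])
    test = total - train - val  # the remainder goes to the test set
    return train, val, test


def gamma_transform(pixels, gamma):
    """Apply a gamma transform to 8-bit values (gamma is a hypothetical setting)."""
    return [round(255 * (v / 255) ** gamma) for v in pixels]


def add_gaussian_noise(pixels, sigma, rng):
    """Add zero-mean Gaussian noise and clip the result to the range [0, 255]."""
    return [min(255, max(0, round(v + rng.gauss(0, sigma)))) for v in pixels]


if __name__ == "__main__":
    print(split_counts(1000))
    rng = random.Random(0)
    print(add_gaussian_noise(gamma_transform([0, 64, 128, 255], 2.0), 5.0, rng))
```

With 1000 source images, the 75%/15%/10% split corresponds to 750, 150 and 100 images, which matches the 100 test images reported above.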

The Massachusetts Roads Dataset
This dataset was downloaded from https://www.cs.toronto.edu/~vmnih/data/ (accessed on 15 December 2021). The dataset's spatial resolution is 1.2 m/pixel and the image size is 1500 × 1500. The Massachusetts roads dataset covers both urban and suburban areas. In this paper, we split the images from the original dataset into multiple 256 × 256 images and expanded the training set and validation set to 6000 and 1200 images, respectively, for the experiment. In the test set, we used 100 images of size 1500 × 1500. Sample images are shown in Figure 2.

Methodology
In this work, we aimed to construct a lightweight dynamic addition network (LDANet) for rural road extraction. LDANet consists of two modules: the feature expansion module based on the improved Inception framework and the deep feature association module based on the DSC and dynamic feature superposition. The model framework is shown in Figure 3.
In the feature expansion module, the model obtains global and multi-scale features from the input image by average pooling and convolution at different scales, and uses 1 × 1 convolution for the fusion and dimensionality enhancement of feature layers. The deep feature association module is divided into two stages: the encoder and the decoder. In the encoder stage, six convolutional layers are constructed to extract the deep features of the expanded feature layers, where every two layers are pooled once to compress the image size, and the two feature layers are dynamically weighted and superimposed. In the decoder stage, the image size of the deep feature layers is expanded by upsampling, then the expanded layers are fused with the upper superimposed layers, and finally, a result map the same size as the input data is produced.

Feature Expansion Module
Based on previous research [35], we have found that road texture, color and geometric features have a positive effect on road extraction. Rural roads are narrow, complex and diverse, so enriching shallow information is essential. To make full use of shallow features, we built a feature expansion module based on the Inception network module (Figure 3a). First, the input layer is up-dimensioned using a 1 × 1 convolution kernel, which not only reduces the convolution parameters but also integrates features across channels to ensure the validity of the features while simplifying the model. Then, we use the asymmetric convolution block (ACB) [36] to expand the feature layers. The ACB (Figure 3b) contains three branches (a 3 × 1 horizontal kernel, a 1 × 3 vertical kernel and a 3 × 3 global kernel); each branch extracts the layer features separately, and the results are then fused. Compared with double-layer convolution, ACB not only reduces the model parameters but also improves the saliency of node features while ensuring feature globalization. As roads have certain geospatial correlations, expanding the receptive field can further enhance the road feature information, so we set the convolutional kernel size to 3 × 3 versus 5 × 5. Finally, the features are up-dimensioned and output by 1 × 1 convolution.
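The three-branch structure of the ACB can be illustrated with a small single-channel sketch. It assumes that the branch outputs are fused by element-wise addition and it ignores batch normalization and bias; whether LDANet fuses the branches in exactly this way is not stated here. The names `conv2d_same`, `acb_forward` and `fuse_kernels` are hypothetical. Under these assumptions, the summed branches equal a single 3 × 3 convolution whose kernel is the sum of the three kernels, because convolution is linear.

```python
def conv2d_same(image, kernel):
    """Cross-correlate a 2D list with an odd-sized kernel using zero padding."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    rr, cc = r + i - ph, c + j - pw
                    if 0 <= rr < h and 0 <= cc < w:
                        acc += image[rr][cc] * kernel[i][j]
            out[r][c] = acc
    return out


def acb_forward(image, k_square, k_row, k_col):
    """Sum the outputs of a 3x3 branch, a one-row branch and a one-column branch."""
    a = conv2d_same(image, k_square)
    b = conv2d_same(image, k_row)
    c = conv2d_same(image, k_col)
    return [[a[r][q] + b[r][q] + c[r][q] for q in range(len(image[0]))]
            for r in range(len(image))]


def fuse_kernels(k_square, k_row, k_col):
    """Add the two thin kernels onto the centre row and column of the 3x3 kernel."""
    fused = [row[:] for row in k_square]
    for j in range(3):
        fused[1][j] += k_row[0][j]  # one-row kernel, shape 1 x 3
    for i in range(3):
        fused[i][1] += k_col[i][0]  # one-column kernel, shape 3 x 1
    return fused


if __name__ == "__main__":
    img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    k3 = [[0, 1, 0], [1, 1, 1], [0, 1, 0]]
    print(acb_forward(img, k3, [[1, 0, -1]], [[1], [0], [-1]]))
```

The thin kernels add weight only along the centre row and column of the square kernel, which is one way to read the claim that the ACB strengthens the saliency of central features at a small parameter cost.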

Deep Feature Association Module
As shown in Figure 4b, we use a multi-layer convolutional network for deep feature extraction and to enhance feature association with dynamic weighted superposition. This module is based on the Unet [37] network framework (Figure 4a). In the encoder stage, we expand the feature space dimension by 3 × 3 convolution to enhance the deep information and then compress the feature layer size by MaxPool to reduce the model's computational complexity and to improve the computational efficiency. In the decoder stage, we recover and output the image by skip connection and upsampling. In this module, we introduce an adaptation-weighted overlay layer. In the encoder stage, the layer superimposes the convolutional output feature layers and dynamically adjusts their weights, which increases the expressiveness of salient features and improves the efficiency of feature extraction. In the decoder stage, dynamically weighted overlay layers and 1 × 1 convolution realize the skip connection in the network. Compared with the complex skip connections in Unet++ [38] and Unet+++ [39], this method can not only strengthen the association of salient features but can also better preserve the structure of the feature space and simplify the model while ensuring extraction accuracy. As shown in Figure 4b, this module takes fewer feature dimensions than Unet because, on the one hand, road extraction is a two-class problem, and too rich a feature space would increase the computational burden, and on the other hand, this module uses DSC instead of traditional convolution to reduce model parameters. Because one convolution kernel is responsible for one channel in DSC, and one channel is only convolved by one convolution kernel, the number of feature maps produced is the same as the number of channels in the input layer [40]. So, we added an intermediate feature layer to the DSC: the input features are first up-dimensioned by 1 × 1 convolution, DSC then generates the intermediate feature layer, and a final 1 × 1 convolution produces the output; the convolution process is shown in Figure 5.
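The parameter saving of DSC can be checked with simple counting. The sketch below compares a standard convolution, a plain depth-wise separable convolution, and a three-step block of the kind described above (1 × 1 expansion, depth-wise convolution, 1 × 1 output). The channel counts are hypothetical, bias terms are ignored, and the middle step is assumed to be a k × k depth-wise convolution; none of these values are taken from LDANet itself.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """One k x k depth-wise kernel per input channel, then a 1 x 1 point-wise convolution."""
    return k * k * c_in + c_in * c_out


def expanded_dsc_params(k, c_in, c_mid, c_out):
    """1 x 1 expansion to c_mid channels, k x k depth-wise step, then 1 x 1 output."""
    return c_in * c_mid + k * k * c_mid + c_mid * c_out


if __name__ == "__main__":
    k, c_in, c_mid, c_out = 3, 32, 64, 64  # hypothetical layer sizes
    print("standard    :", standard_conv_params(k, c_in, c_out))
    print("DSC         :", depthwise_separable_params(k, c_in, c_out))
    print("expanded DSC:", expanded_dsc_params(k, c_in, c_mid, c_out))
```

For these example sizes the counts are 18,432, 2,336 and 6,720 weights, respectively, so the intermediate layer costs extra parameters relative to plain DSC but stays well below the standard convolution.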

Loss Function
Loss functions are often used to measure the extent to which a model's predictions differ from the actual data, and selecting the correct loss function can guide model training in the right direction.
Cross-Entropy Loss evaluates the difference between the predicted probability distribution and the true distribution of the current training set to judge the training effect on the model. The model in this paper performs binary classification, so Binary Cross-Entropy Loss (BCE Loss) was chosen as the loss function. In BCE Loss, $y_i$ is the sample value (1 for the positive class and 0 for the negative class) and $p_i$ is the predicted value, taking values within (0, 1).
Roads appear narrow in the dataset and cover a small area, so training can suffer from severe sample imbalance. Dice Loss addresses the problem of a too-small proportion of foreground by measuring the overlap of two samples, and it has acceptable performance in binary classification problems. It is computed from the same $y_i$ and $p_i$ as BCE Loss.
Dice Loss has acceptable performance for scenarios with a severe imbalance between positive and negative samples, but the loss tends to be unstable when training on small targets, which leads to drastic gradient changes. Therefore, this paper proposes a Combined Weighted Loss (CWL), which sets the weights according to the ratio of BCE Loss to Dice Loss, giving higher weights to functions with larger loss values and lower weights to functions with smaller loss values. This increases the proportion of high-value loss functions while using low-value loss functions to maintain the stability of the model, thereby accelerating the convergence of the model and improving its accuracy.
$$W_{CB} = \frac{|\mathrm{BCE\ Loss}_n|}{|\mathrm{BCE\ Loss}_n| + |\mathrm{Dice\ Loss}_n|} \quad (4)$$
where $W_{CB}$ is the proportion of BCE Loss, and $W_{CD}$ is the proportion of Dice Loss.
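The weighting idea can be sketched in a few lines of plain Python. The sketch rests on several assumptions: BCE Loss and Dice Loss are written in their commonly used forms (mean log-loss, and one minus twice the overlap divided by the sum of label and prediction totals), the weight for Dice Loss mirrors Equation (4), the combined loss is the weighted sum of the two losses, and the weights are computed from the loss values of the current step. The function names and the small constant `EPS` are illustrative only.

```python
import math

EPS = 1e-7  # hypothetical constant that guards against log(0) and division by zero


def bce_loss(y_true, p_pred):
    """Mean binary cross-entropy over all pixels (commonly used form)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, EPS), 1.0 - EPS)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)


def dice_loss(y_true, p_pred):
    """One minus the Dice overlap between labels and predictions (commonly used form)."""
    intersection = sum(y * p for y, p in zip(y_true, p_pred))
    return 1.0 - 2.0 * intersection / (sum(y_true) + sum(p_pred) + EPS)


def combined_weighted_loss(y_true, p_pred):
    """Weight each loss by its share of the summed magnitudes, then add them."""
    bce = bce_loss(y_true, p_pred)
    dice = dice_loss(y_true, p_pred)
    denom = abs(bce) + abs(dice)
    if denom == 0.0:
        return 0.0, 0.5, 0.5
    w_cb = abs(bce) / denom   # proportion of BCE Loss, as in Equation (4)
    w_cd = abs(dice) / denom  # assumed analogous proportion of Dice Loss
    return w_cb * bce + w_cd * dice, w_cb, w_cd


if __name__ == "__main__":
    labels = [1, 0, 1, 0]
    preds = [0.9, 0.1, 0.8, 0.2]
    print(combined_weighted_loss(labels, preds))
```

In an automatic-differentiation framework, the two weights would most likely be treated as constants within each step so that gradients flow only through the two loss terms; that detail is also an assumption.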

Model Evaluation Criteria
In this paper, four metrics, Precision, Recall, F1 Score and IoU (Intersection over Union), were chosen to evaluate the road extraction results, and they are defined as follows.
Precision is the proportion of the area extracted as road in the resulting image that is real road.
Recall is the proportion of correctly identified roads to the total roads in the labeled image.
The F1 Score, a statistical measure of the accuracy of a binary classification model, is often used to determine the overall performance of such a model.
IoU is the degree of overlap between the predicted result and the target label.
Here, TP (true positive) represents a positive label and a positive prediction, FP (false positive) represents a negative label and a positive prediction, and FN (false negative) represents a positive label and a negative prediction.
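As an illustration, the four metrics can be computed from pixel-wise counts as sketched below. The formulas are the standard definitions in terms of TP, FP and FN; the function names and the flat 0/1 list representation of a mask are assumptions made for brevity.

```python
def confusion_counts(label, pred):
    """Count TP, FP and FN for two flat binary masks (1 = road, 0 = background)."""
    tp = sum(1 for y, p in zip(label, pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(label, pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(label, pred) if y == 1 and p == 0)
    return tp, fp, fn


def road_metrics(tp, fp, fn):
    """Standard Precision, Recall, F1 Score and IoU, returning 0.0 for empty denominators."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "iou": iou}


if __name__ == "__main__":
    label = [1, 1, 0, 0, 1]
    pred = [1, 0, 1, 0, 1]
    print(road_metrics(*confusion_counts(label, pred)))
```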

Loss Function Selection
In this section, we use BCE Loss, Dice Loss, BCE Loss + Dice Loss and CWL as model loss functions and test them on the rural road dataset. The experimental results were evaluated using both Precision and IoU, and the results are shown in Table 1. Comparing the single loss functions, BCE Loss performs better than Dice Loss in terms of Precision, whereas Dice Loss performs better in terms of IoU. This is because Dice Loss focuses more on the overlapping area of image samples, while BCE Loss focuses more on global performance. The Precision and IoU of BCE Loss + Dice Loss and CWL are higher than those obtained with a single loss function, indicating that combining loss functions can effectively improve the performance of the model. CWL achieves the highest Precision and IoU, which shows that dynamic weighting gives full play to the advantages of the combination: the model can exploit the strengths of Dice Loss in binary classification while remaining stable, which improves its applicability when positive and negative samples are severely imbalanced. Therefore, CWL was selected as the loss function for road extraction in this paper.

Results and Discussion
To evaluate the performance of our model, we compared it with five other models: Unet, Unet++, Unet+++, MACUnet [41] and MobileNet. Unet++, Unet+++ and MACUnet are all Unet-based networks: Unet++ and Unet+++ refine the skip connections and enhance global feature linkage, and MACUnet improves upon Unet with the ACB module and multi-scale skip connections to enhance network feature acquisition. MobileNet was used as a benchmark for the comparison of lightweight models. All experiments were run on an NVIDIA Tesla P100 GPU with Adam as the optimizer and a learning rate of 0.001. LDANet uses CWL as the loss function, and the other models use a Cross-Entropy Loss function.

Results of the Typical Rural Roads Dataset
In this section, we compare LDANet with five other image segmentation models on the rural road dataset. Table 2 shows the results of four accuracy evaluation metrics, Precision, Recall, F1 Score and IoU, for the six models on the rural road test set. Since we aimed to build fast and efficient lightweight models for rural road extraction, it was also necessary to evaluate model complexity, so we recorded the training time and the number of model parameters for each of the six models. From Table 2, Unet++ has the highest Precision and IoU, scoring 0.9881 and 0.7644, respectively. This is because it deepens the network context and can fully learn the characteristics of the target. However, its complex network structure increases its computational cost, so it does not perform well in terms of model parameters and training time. In terms of training efficiency, MobileNet uses DSC to replace the traditional convolution process, which reduces the cost of convolution and makes it perform best in terms of model parameters and training time. However, because image restoration occurs only through upsampling, the global feature association of the target is insufficient in the modeling process, resulting in low extraction accuracy.
In terms of model accuracy, LDANet has a Precision of 0.9874 and an IoU of 0.7621, which are 1.98% and 2.13% lower than Unet+++, respectively, but LDANet greatly improves training efficiency with a model parameter count of 0.20 M and a training speed of 183 s/epoch, thus having 3375% fewer parameters and being 836% faster than Unet+++. Meanwhile, LDANet performed comparably to MobileNet in terms of training efficiency, but with 1.91% and 1.9% improvements in accuracy and IoU, respectively.
The results show that compared with Unet++, LDANet is relatively lightweight because it introduces ACB and DSC at the cost of slightly reducing model accuracy, which greatly enhances the applicability of the model. Compared with MobileNet, although LDANet has more parameters, it significantly improves the accuracy of the model. This is because LDANet uses the encoder and decoder structure to strengthen the feature association and significantly improves the expression of dominant features by introducing dynamically weighted overlay layers. Figure 6 shows the extraction results of the six models on the rural road test set. From the figure, it can be seen that all six models have some defects in their road extraction results, but compared with the other models, Unet+++ and LDANet show an obvious improvement. In summary, the use of LDANet for rural road extraction has a high application value.

Results of the Massachusetts Roads Dataset
To further validate the superiority and robustness of our LDANet, we also compared the LDANet with other methods on the Massachusetts roads dataset, using Precision, Recall, F1 Score and IoU metrics for evaluation.
The comparison results are shown in Table 3. The table shows that LDANet also performs well on the Massachusetts roads dataset, with Precision, Recall, F1 Score and IoU reaching 0.9755, 0.9707, 0.9731 and 0.6834, respectively. Figure 7 shows the extraction results of LDANet on the Massachusetts roads dataset alongside the ground truth maps, and it can be seen that our method achieves adequate visual results of road extraction.

Discussion
Deep learning methods for road extraction from remote sensing images are important for promoting the application of remote sensing images and the development of cities. In this work, a lightweight model was constructed based on the Unet network structure. The model enhances the shallow feature information by introducing a feature expansion module and uses dynamic weighted superposition to improve the feature representation. Compared with the methods of Boonpook et al. [17] and Lu et al. [18], this method can significantly reduce the number of model parameters by using DSC, making the model lighter and faster. Compared with MobileNets and ShuffleNets, this model can obtain high-accuracy road extraction results because the upsampling and skip connections of the Unet network structure allow it to fully learn contextual features.

Conclusions
We have proposed a lightweight extraction model based on rural road data. The model is composed of a feature expansion module and a deep feature association module. In addition, we used a dynamically weighted loss function to account for the small proportion of rural roads in the images. Compared with complex methods that are expensive to compute, our method focuses on enriching shallow features and strengthening the correlation of deep salient features, which can effectively balance reliability and speed. On the typical rural roads dataset, our model's accuracy was 0.9874, and its IoU was 0.7621. On the Massachusetts roads dataset, our model also performed well. Our model has a small number of parameters and a fast training speed, which can greatly reduce hardware requirements while still ensuring extraction accuracy in practical applications. Therefore, it is of great significance for promoting the portable and rapid application of remote sensing technology.
Future work will explore optimization strategies based on combined model applications to achieve multi-objective learning by enhancing global information interactivity while keeping model complexity constrained.