1. Introduction
Winter wheat is one of the major crops underpinning global food security. Accurate statistics on its planting area are crucial for guiding agricultural production, ensuring food supply, and optimizing planting structure [1]. The timely and accurate acquisition of crop spatial distribution information is of great practical significance for optimizing crop planting layout, enhancing crop monitoring and production management, accurately estimating and predicting yield, and providing decision support for government agencies in formulating scientifically sound agricultural policies. With the advancement of modern agricultural technology, especially the integration of remote sensing image processing and machine learning, crop recognition and planting area statistics techniques have developed rapidly [2,3].
Although traditional machine learning methods have achieved good results in crop semantic segmentation from remote sensing images, they still face many limitations. For example, clustering-based segmentation methods are strongly affected by illumination and seasonal changes [4]; threshold-based segmentation methods are sensitive to initial parameters [5]; and edge-based segmentation methods have difficulty accurately delineating the boundaries of crop planting plots [6]. In recent years, deep learning has made important breakthroughs in the field of crop segmentation. The multi-layer neural networks in deep learning models can automatically learn image features and effectively capture multi-scale crop information. Compared with traditional machine learning methods, deep learning models exhibit higher accuracy and stronger generalization ability [7,8,9].
In crop semantic segmentation tasks based on remote sensing images, the model must accurately recognize the class of each pixel and determine the positions of crops. This requires the model not only to learn high-level semantic information of wheat plots, but also to capture low-level boundary details so as to accurately delineate the contour and shape of a target plot [10,11,12]. The Variable Spatial Attention Module proposed by Yang et al. [13] achieved fine recognition of crop features by computing feature weights in the spatial dimension, improving the model's ability to capture crop details and its adaptability to complex farmland scenes. The multi-task encoder-decoder network proposed by Long et al. [14] helped the classification model learn both local details and the overall structure of crops by jointly perceiving the boundaries and shapes of planting plots. The HRNet model introduced by Zhang et al. [15] maintained high-resolution feature representations through its parallel structure and multi-scale feature fusion, allowing the model to accurately capture the fine structure and spatial distribution of crops and providing accurate spatial information of target objects in crop semantic segmentation tasks. Zhang et al. [16] improved DeepLabV3+ by replacing its backbone with the lightweight MobileNetV2 and introducing a Convolutional Block Attention Module (CBAM), which combines channel and spatial attention, thereby achieving lightweight semantic segmentation extraction for winter wheat.
Although the above methods integrate spatial contextual information at multiple scales through feature fusion, which enhances pixel-level feature representation and improves overall segmentation performance, two main drawbacks remain. First, the independence of the correlation computation process inevitably introduces significant noise and ambiguity into the model [17,18,19], which can seriously affect classification accuracy; in semantic segmentation of winter wheat planting plots, for example, it may lead to fuzzy or erroneous boundary recognition. Second, single-confidence-scale class representations fall short in handling intra-class heterogeneity [17,18,20], making it difficult to adapt to variations in wheat growth status within the same planting plot caused by different environmental and growth factors. As a result, the model performs poorly in distinguishing differences within the same planting plot, which in turn affects the overall assessment of the planting plot distribution.
To address the above issues, a classification model based on an improved HRNet network for the accurate recognition of winter wheat planting plots was proposed in this paper, using Gaofen-1 satellite images as the data source. The main contributions are summarized as follows:
- (1) A diverse semantic segmentation dataset for winter wheat in North China was constructed, encompassing winter wheat together with several background land cover categories. The study area includes both small, fragmented plots in mountainous regions and relatively concentrated farmland in the plains, reflecting a certain degree of regional heterogeneity.
- (2) A semantic domain module was incorporated into the classification model to extend the semantic domain of pixels, and the concept of class confidence was introduced as a scale criterion within the semantic domain to extract multi-confidence scale class representations. As a result, the model's parsing of pixel-level semantic information became significantly more accurate, the discrepancies between pixel-level semantic recognition results and actual classes were effectively reduced, and the semantic segmentation accuracy was enhanced.
- (3) A nested attention module was introduced to enhance the model's sensitivity to local features, strengthening key features in images while suppressing correlations with non-target features. Consequently, the model's ability to recognize crop boundaries in complex agricultural scenes was greatly improved.
2. Materials and Methods
2.1. Overview of the Research Area
The research area of this paper is confined to Shijiazhuang City, Hebei Province, China. Shijiazhuang is situated in the central-southern part of Hebei Province, within 113°30′–115°20′ E and 37°27′–38°47′ N. Located at the northern edge of the Huang-Huai Plain, Shijiazhuang has four distinct seasons (hot summers, cold winters, and relatively short springs and autumns). Our research area covers parts of the mountainous regions and plains of Shijiazhuang, with complex and diverse terrain conditions. Generally speaking, planting plots in the mountainous regions are small and scattered, while plots in the plains are concentrated. A total of 156 villages were sampled from the counties under the jurisdiction of Shijiazhuang City (locations shown in Figure 1), covering different terrain types. The total sampled area is 624.1 km², including several typical agricultural production regions. Overall, the distribution of the selected planting plots reflects the general characteristics of land distribution in Shijiazhuang City. The research area has a temperate monsoon climate, with precipitation concentrated in summer and relatively dry winters. Winter wheat, the main crop grown over the winter season, is an important component of local agricultural production. It is usually sown in early October and, after overwintering, harvested from late May to early June of the following year.
2.2. Data Preprocessing and Sample Construction
In this study, remote sensing images acquired by the PMS sensor aboard the Chinese Gaofen-1 satellite were used as the source data. During image selection, priority was given to images with low cloud cover and a uniform color tone to ensure that the data quality met our research requirements. The original multispectral images have an 8 m resolution and include four bands: red, green, blue, and near-infrared. The panchromatic images have a 2 m resolution; despite their higher spatial resolution, they provide spectral information for only a single band. To correct geometric deformation, a 30 m resolution Digital Elevation Model (DEM) was used to orthorectify both the multispectral and panchromatic images. After orthorectification, the images were fused using the Gram-Schmidt (GS) pan-sharpening technique, which preserves the rich spectral features of the multispectral images while raising the spatial resolution to 2 m, greatly improving the applicability of the images for fine terrain recognition. After these steps, the cloud coverage of the images was significantly reduced and the color tone became more uniform, which helps the model recognize crops, buildings, water bodies, forests, and other background classes contained in the source images. A sample image after these processing steps is shown in Figure 2.
In terms of geographical scope, to ensure the reliability of our research, representative locations that comprehensively cover the different geographical environments of Shijiazhuang City were selected through careful analysis of the remote sensing imagery. The selected samples cover the typical terrain types across the city and include a total of 156 villages. Based on the remote sensing images, the boundaries of the land cover types in the samples were delineated as vector polygons, the specific class of each planting plot was determined through field investigations, and the class information was annotated in the corresponding vector attribute table. The annotated vector data were then converted into raster data in ArcGIS 10.8 and used as label data for model training. To meet the model's input requirements, the source images and the corresponding labels were cropped into 512 × 512 pixel blocks, yielding a dataset of 3488 images. These images were divided into training, validation, and testing sets in a ratio of 7:2:1, used for model training, parameter tuning, and performance evaluation, respectively. To improve the model's robustness, data augmentation was performed on the training and validation sets in three ways: (1) randomly rotating the images by 90°, 180°, or 270° to simulate land cover at different angles and orientations; (2) randomly scaling the images by factors of 0.5, 0.75, 1.25, and 1.5 to simulate objects at different scales; (3) adjusting image contrast (e.g., histogram equalization and gamma correction) to simulate different image qualities or weather conditions.
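To make the augmentation scheme concrete, the sketch below applies the three operations to an image/label pair. It is a minimal NumPy/OpenCV illustration of the described pipeline, not the authors' released code; the gamma range and the use of OpenCV for resizing are our assumptions.

```python
import cv2
import numpy as np

def augment(image: np.ndarray, label: np.ndarray, rng: np.random.Generator):
    """Randomly augment a 512x512 image/label pair with the three operations
    described above (a minimal sketch; parameter ranges are illustrative)."""
    # (1) Random rotation by 90/180/270 degrees; the label rotates with the image.
    k = int(rng.integers(1, 4))
    image = np.rot90(image, k, axes=(0, 1)).copy()
    label = np.rot90(label, k).copy()

    # (2) Random rescaling to simulate objects at different scales.
    scale = float(rng.choice([0.5, 0.75, 1.25, 1.5]))
    h, w = label.shape
    size = (int(w * scale), int(h * scale))          # cv2 expects (width, height)
    image = cv2.resize(image, size, interpolation=cv2.INTER_LINEAR)
    label = cv2.resize(label, size, interpolation=cv2.INTER_NEAREST)  # keep class ids

    # (3) Random gamma correction to simulate varying image quality or weather.
    gamma = float(rng.uniform(0.7, 1.5))             # assumed range
    image = np.clip((image / 255.0) ** gamma * 255.0, 0, 255).astype(np.uint8)
    return image, label
```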
2.3. Overall Structure of the Model
In this study, the HRNet network [21] was used as the backbone feature extraction network of the classification model, and a semantic domain module (SDM) and a nested attention module (NAM) [22] were introduced to improve the model's performance. The overall architecture of the model is illustrated in Figure 3. First, the features extracted by the HRNet backbone are denoted as $R$; meanwhile, a class probability distribution $D$ is obtained through a 1 × 1 convolution to serve as the initial classification. Then, the SDM extracts multi-confidence scale class representations $C_{\mathrm{m}}$ and global class representations $C_{\mathrm{g}}$ from $R$ and $D$. Subsequently, the NAM processes $R$ and $C_{\mathrm{m}}$ to optimize the pixel-class relationships $W$; the optimized relationships $W'$ then interact with $C_{\mathrm{g}}$ to generate the enhanced semantic representations $R'$. Lastly, the image resolution is restored through bilinear interpolation.
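As a reading aid, the following PyTorch-style sketch traces the forward pass just described. All module and variable names are ours; the SDM and NAM internals (sketched in later sections) are assumed to return the shapes indicated in the comments.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WheatSegNet(nn.Module):
    """Schematic of the overall pipeline (Figure 3); a sketch, not the
    authors' code. `backbone`, `sdm`, and `nam` stand in for HRNet,
    the SDM, and the NAM described in the text."""
    def __init__(self, backbone, sdm, nam, channels, num_classes):
        super().__init__()
        self.backbone = backbone
        self.cls_head = nn.Conv2d(channels, num_classes, 1)   # 1x1 conv for D
        self.sdm, self.nam = sdm, nam
        self.fuse = nn.Conv2d(2 * channels, num_classes, 1)   # final classifier

    def forward(self, x):
        R = self.backbone(x)                 # features R: (B, C, h, w)
        D = self.cls_head(R)                 # initial class distribution D
        C_m, C_g = self.sdm(R, D)            # multi-confidence / global class reps
        W_opt = self.nam(R, C_m)             # optimized pixel-class relations (B, h*w, K)
        B, C, h, w = R.shape
        ctx = torch.bmm(W_opt, C_g)          # (B, h*w, C): class-context features R'
        R_aug = torch.cat([R, ctx.transpose(1, 2).view(B, C, h, w)], dim=1)
        logits = self.fuse(R_aug)
        # Restore the input resolution with bilinear interpolation.
        return F.interpolate(logits, size=x.shape[-2:],
                             mode='bilinear', align_corners=False)
```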
2.4. Semantic Domain Module
The background complexity of remote sensing images is an important consideration for researchers. Owing to the diversity of imaging conditions (e.g., geographical location, acquisition time, and viewing angle), the same land cover type may show significant variability across different remote sensing images. For example, winter wheat may exhibit obvious differences in color and texture in images captured in different regions, seasons, and viewing angles. Traditional class context modeling methods typically extract a single global feature center for each class. Such methods often overlook the diversity and complexity within a class; consequently, the classification model may fail to distinguish the internal differences of highly similar classes, increasing the risk of misclassification.
To solve the above problem, a scaling criterion based on class confidence was introduced in this paper to explore class representations at multiple confidence scales. Specifically, class representations of small-scale semantics (including only high confidence levels) help quickly model prominent features, while class representations of large-scale semantics (including all confidence levels) help model global features. By strengthening the interaction between pixels and multi-confidence scale class representations, the perceptual ability of pixels towards different classes can be enhanced and the interference from noise reduced, enabling the model to extract more accurate pixel-class relationships and achieve accurate winter wheat classification. The SDM introduced in this paper is shown in Figure 4. According to the class probability distribution $D \in \mathbb{R}^{N \times K}$ (with $N$ pixels and $K$ classes), the feature representations $R \in \mathbb{R}^{N \times C}$ are grouped into multiple class regions as follows:

$$R_k = \{ R_i \mid \arg\max\nolimits_j D_{ij} = k \},$$

where $R_k$ is an $N_k \times C$ matrix; $k$ refers to a class label; and $N_k$ refers to the number of representations belonging to Class $k$. Similarly, a matrix $D_k$ of size $N_k \times K$ is defined as follows:

$$D_k = \{ D_i \mid \arg\max\nolimits_j D_{ij} = k \}.$$

Let $\Phi_i^j(\cdot)$ represent a function that returns the range from the $i$-th element to the $j$-th element after sorting in descending order, with a starting index of 1. The absolute difference between the highest and second-highest probability values in Class $k$ is defined as the certainty that each pixel belongs to Class $k$; the larger the difference, the higher the certainty that the pixel belongs to Class $k$:

$$E_k = \left| \Phi_1^1(D_k) - \Phi_2^2(D_k) \right|,$$

where $E_k$ is an $N_k \times 1$ matrix representing the confidence that each pixel belongs to Class $k$, with $\Phi$ applied row-wise. In addition, the certainty that a pixel belongs to Scale $m$ of Class $k$ is defined as follows:

$$E_k^m = \Phi_1^{\lceil m N_k / M \rceil}(E_k), \quad m \in [1, M],$$

where $E_k^m$ is a $\lceil m N_k / M \rceil \times 1$ matrix containing the top fraction $m/M$ of the class-$k$ confidences and $M$ is the number of scales. For each Class $k$, the context representations are computed as follows:

$$c_k^m = \frac{(E_k^m)^{\top} R_k^m}{\mathbf{1}^{\top} E_k^m},$$

where $R_k^m$ refers to the part of $R_k$ corresponding to the weight $E_k^m$, while $c_k^m$ refers to the center of Class $k$ at Scale $m$. The output of the SDM is a tensor $C_{\mathrm{m}} \in \mathbb{R}^{K \times M \times C}$, which is considered as the multi-confidence scale class representation:

$$C_{\mathrm{m}} = \left[ c_1^1, c_1^2, \dots, c_K^M \right].$$

Moreover, the large-scale class representations $C_{\mathrm{g}} \in \mathbb{R}^{K \times C}$, composed of the centers at the largest scale as expressed below, are adopted to participate in class context integration:

$$C_{\mathrm{g}} = \left[ c_1^M, c_2^M, \dots, c_K^M \right].$$
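Under the equations reconstructed above, a direct (unvectorized) implementation of the SDM for a single image could look as follows. This is our reading of the formulas, with $M = 8$ scales as selected in Section 3.2; it makes no claim to match the authors' code.

```python
import torch

def semantic_domain_module(R, D, num_scales=8):
    """Sketch of the SDM for one image, following Section 2.4.
    R: (C, N) pixel features; D: (K, N) class logits; N = h*w pixels."""
    C, N = R.shape
    K = D.shape[0]
    P = D.softmax(dim=0)                      # per-pixel class probabilities
    top2 = P.topk(2, dim=0).values            # two highest probabilities per pixel
    E = (top2[0] - top2[1]).abs()             # pixel confidence (top-1 minus top-2)
    assign = P.argmax(dim=0)                  # hard class assignment

    M = num_scales
    C_m = R.new_zeros(K, M, C)                # multi-confidence scale centers c_k^m
    for k in range(K):
        idx = (assign == k).nonzero(as_tuple=True)[0]
        if idx.numel() == 0:
            continue
        order = E[idx].argsort(descending=True)   # class-k pixels by confidence
        for m in range(1, M + 1):
            # Scale m keeps the top m/M fraction of class-k pixels.
            keep = idx[order[: max(1, (m * idx.numel()) // M)]]
            w = E[keep]                           # confidence weights E_k^m
            C_m[k, m - 1] = (R[:, keep] * w).sum(dim=1) / w.sum().clamp_min(1e-6)
    C_g = C_m[:, -1]                          # largest scale = global class centers
    return C_m, C_g
```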
2.5. Nested Attention Module
To seek common features in the original pixel-class relationships, thereby enhancing correct correlations and suppressing erroneous ones, an NAM was introduced into our model to reduce the noise and ambiguity of the pixel-level class relationship weights. The relationship between pixels and global classes can be derived from the relationship between pixels and the multi-confidence scale class representations. The specific network structure is shown in Figure 5.

Firstly, the relationship between pixels and the multi-confidence scale class representations is computed as follows:

$$W = \mathrm{softmax}\!\left( R\, C_{\mathrm{m}}^{\top} \right).$$

Then, the NAM takes $W$ as the input:

$$W' = W + \mathrm{FC}\!\left( \mathrm{Attention}(W) \right).$$

The network structure of the NAM is shown in Figure 6, where $W'$ represents the relationship $W$ after pixel-class optimization, and $\mathrm{FC}(\cdot)$ refers to the fully connected layer used for projecting pixels onto the multi-confidence scale class representations. Finally, the optimized pixel-class relationship $W'$ is obtained through the residual link.
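The NAM can then be sketched as below. The use of torch.nn.MultiheadAttention and the attention over the full pixel sequence (rather than the 16 × 16 local windows reported in Section 3.2) are simplifications of ours; only the initial relationship, the FC projection, and the residual link follow the text.

```python
import torch
import torch.nn as nn

class NestedAttentionModule(nn.Module):
    """Sketch of the NAM: refines the pixel/class-representation relationship
    W with attention and a residual link. Sizes and internals are illustrative."""
    def __init__(self, num_classes, num_scales):
        super().__init__()
        dim = num_classes * num_scales            # per-pixel relationship vector
        self.attn = nn.MultiheadAttention(dim, num_heads=1, batch_first=True)
        self.fc = nn.Linear(dim, dim)             # projection back to class space

    def forward(self, R, C_m):
        # R: (B, C, h, w) features; C_m: (K, M, C) class centers (assumed shared
        # across the batch in this sketch).
        B, C, h, w = R.shape
        K, M = C_m.shape[:2]
        pix = R.flatten(2).transpose(1, 2)                # (B, h*w, C)
        reps = C_m.reshape(K * M, C)                      # flattened class centers
        W0 = torch.softmax(pix @ reps.t(), dim=-1)        # initial relationship W
        # Simplification: attention over all pixels instead of 16x16 windows.
        refined, _ = self.attn(W0, W0, W0)
        return W0 + self.fc(refined)                      # residual link -> W'
```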
2.6. Loss Function
To achieve the best training effect, the model proposed in this paper was trained with the Cross-Entropy Loss (CE Loss) function. An advantage of CE Loss is that it effectively reduces noise and ambiguity in pixel-class relationships and improves the perceptual ability of pixels towards classes, thereby enhancing the model's discriminative ability and overall performance. CE Loss provides stable gradient signals and accelerates model convergence, and it can be combined with soft labeling to handle uncertainty, which helps improve the model's robustness and generalization ability, making it a suitable option for optimizing multi-confidence scale class representations. Mathematically, CE Loss can be expressed as follows:

$$\mathcal{L}_{\mathrm{CE}} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{k=1}^{K} y_{ik} \log p_{ik},$$

where $p_{ik}$ refers to the model output for pixel $i$ and class $k$ after softmax, and $y_{ik}$ refers to the one-hot encoding of the true class. In this way, CE Loss plays an important role in optimizing the multi-confidence scale class representations, ensuring that the model maintains efficient and accurate classification performance under complex backgrounds and varying conditions. In high-complexity, high-diversity tasks such as remote sensing image segmentation, this advantage is particularly prominent: CE Loss not only copes with data imbalance (e.g., via class weighting) but also effectively optimizes class representations at different confidence levels, further improving the model's overall performance and practicality.
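In PyTorch, the loss above corresponds to the standard nn.CrossEntropyLoss applied pixel-wise, as in this minimal example (the ignore_index value for unlabeled pixels is our assumption):

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss(ignore_index=255)   # skip unlabeled pixels

logits = torch.randn(4, 5, 512, 512, requires_grad=True)  # (B, K, H, W) output
target = torch.randint(0, 5, (4, 512, 512))                # per-pixel class ids
loss = criterion(logits, target)    # softmax + negative log-likelihood per pixel
loss.backward()
```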
2.7. Experimental Environment and Configurations
The server used in our experiments is equipped with an Intel Core i9-9820X @ 3.30 GHz CPU, 64 GB of memory, and four 24 GB NVIDIA RTX 3090 GPUs. The CUDA version is 11.7 and the development environment is Python 3.9. Model construction, training, and parameter tuning were based on the PyTorch 2.0.0 deep learning framework with the following settings: optimizer, Adam; initial learning rate, 0.01; weight decay, 0.0005; batch size, 4; and epochs, 100.
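A training loop matching the stated configuration might look like the following sketch; `model`, `train_loader`, and `criterion` are assumed to be the network, data pipeline, and loss from the earlier sketches, and no learning rate schedule is shown because none is reported.

```python
import torch

# Adam with the reported hyperparameters (lr 0.01, weight decay 0.0005).
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=0.0005)

for epoch in range(100):                    # 100 epochs, as reported
    for images, labels in train_loader:     # batches of 4 samples
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```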
2.8. Model Evaluation Metrics
Similar to previous studies on remote sensing semantic segmentation [23,24,25], the mean intersection over union (mIoU), recall, precision, overall accuracy (OA), and F1-score were used in this paper as evaluation metrics to compare the results of different models. In the following, $TP$, $TN$, $FP$, and $FN$ refer to the numbers of true positive, true negative, false positive, and false negative pixels predicted by the model, respectively. The mIoU represents the mean ratio of intersection over union across all $K$ classes:

$$\mathrm{mIoU} = \frac{1}{K} \sum_{k=1}^{K} \frac{TP_k}{TP_k + FP_k + FN_k}.$$

Precision refers to the proportion of samples that actually belong to a certain class among all samples predicted to be of that class:

$$\mathrm{Precision} = \frac{TP}{TP + FP}.$$

Recall is used to evaluate the model's capability of recognizing positive samples:

$$\mathrm{Recall} = \frac{TP}{TP + FN}.$$

OA represents the proportion of correctly classified samples to the total number of samples:

$$\mathrm{OA} = \frac{TP + TN}{TP + TN + FP + FN}.$$

The F1-score is the harmonic mean of recall and precision:

$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}.$$
3. Results and Discussion
3.1. Model Comparison
To verify the effectiveness and accuracy of the proposed method, we compared its performance with several classical deep learning models for remote sensing semantic segmentation: U-Net [26], DeepLabv3+ [27], Segformer [28], and PSPNet [29]. The comparison results are presented in Table 1. The mIoU of our method was higher by 4.5, 3.3, 1.8, and 1.4 percentage points than that of U-Net, DeepLabv3+, Segformer, and PSPNet, respectively, indicating that our model can effectively address the issue of intra-class differences in remote sensing image segmentation tasks. By enhancing the perceptual ability of pixels towards different classes and suppressing the interference from noise, our model can more accurately capture pixel-class relationships, thereby greatly improving the winter wheat segmentation accuracy.
According to the visualization results of semantic segmentation shown in Figure 7, U-Net and Segformer misclassified narrow and elongated winter wheat planting plots, especially smaller ones, and DeepLabv3+ performed poorly in boundary extraction. In terms of river segmentation, our model delivered the best performance in maintaining river continuity; U-Net and PSPNet had problems recognizing water bodies; and DeepLabv3+ and Segformer performed generally poorly. In terms of tree and background segmentation, U-Net tended to classify trees as background; DeepLabv3+ also misclassified some trees that resembled the background; and Segformer produced a small number of misclassifications.
Comprehensive analysis shows that the model proposed in this paper has significant advantages in segmentation continuity and is able to accurately recognize and segment planting plots while significantly reducing noise interference. These findings indicate that the SDM and NAM are highly effective in optimizing semantic segmentation performance. By enhancing the model's capability to recognize internal differences within various land cover classes, its segmentation performance on complex land cover structures was significantly improved and the pixel-level class relationships were optimized, allowing it to achieve higher accuracy and robustness in semantic segmentation tasks.
3.2. Ablation Tests
In order to clarify the effectiveness of our method in improving model accuracy, a series of ablation tests was conducted on the two optimization modules based on the evaluation metrics described above. Table 2 shows the results for each module. With only the SDM, the mIoU increased by 3.37 percentage points; with only the NAM, by 3.29 percentage points; and with both modules, by 4.74 percentage points. These results show that both optimization modules contribute positively to the semantic segmentation of remote sensing images.
While exploring the factors influencing model performance, the number of scales in the SDM and the range of the NAM were considered key parameters, and a series of comparative tests was conducted under identical environment and data conditions. First, the range of the NAM was fixed at 16 × 16 pixels while the number of scales in the SDM was varied. Table 3 shows the trend of model performance as the number of scales increases from 1 to 16. When the number of scales was set to 1 (i.e., a single confidence scale), the model's performance was unsatisfactory. As the number of scales increased, segmentation performance improved gradually, indicating that multi-scale representation is an effective strategy for capturing complex image features. However, when the number of scales became too large, the mIoU tended to decline, possibly due to overfitting or limitations in computing resources. In our tests, the model achieved the best IoU for winter wheat and the best overall mIoU when the number of scales was set to 8; therefore, the number of scales in the SDM of the final model was set to 8.
Subsequently, we investigated the impact of the range of the NAM on model performance. The range was gradually increased from 8 × 8 pixels to 64 × 64 pixels. According to the results summarized in Table 4, the mIoU reached its highest value when the range of the NAM was set to 16 × 16 pixels, indicating that at this range the model most effectively captures local features while maintaining good sensitivity to global contextual information, achieving the optimal segmentation performance. As the range increased further, although the model could cover wider contextual information, the mIoU did not continue to improve but instead declined, probably because an excessively large range reduces the model's sensitivity to local details and increases computational complexity.
3.3. Migration Tests
Based on the test results and analyses in earlier sections, the improved model proposed in this paper delivers excellent performance in extracting winter wheat planting plots. However, due to differences in planting structure, texture features, and crop growth periods across regions, the segmentation performance of the model may degrade in other areas. Such differences could reduce the model's generalization ability across environments, making it particularly important to test its performance in different regions to ensure applicability and stability. To further evaluate the model's generalization to different geographical areas, we selected several regions in Xingtai City, Hebei Province, China, as new validation regions. The climate of Xingtai City is similar to that of Shijiazhuang City, both being a temperate monsoon climate with cold winters that provides a suitable environment for winter wheat. For the selected regions in Xingtai City, the same sample extraction and annotation method as for the training set was applied: a total of 84 villages were sampled, covering a total area of 285.4 km². The collected remote sensing images underwent a series of preprocessing steps, including radiometric correction, atmospheric correction, orthorectification, and image fusion, to ensure data quality. Subsequently, based on field investigations, the classes of planting plots were determined and annotated in the corresponding vector attribute table, which was then converted into raster data for further use. The geographical distribution of the selected validation regions is shown in Figure 8.
The selected regions in Xingtai City were used to test the previously trained model, and the results are summarized in Table 5. The model exhibited good generalization ability in these new regions. Specifically, for winter wheat recognition, the precision, recall, F1-score, and IoU reached 92.71%, 95.21%, 93.95%, and 88.58%, respectively. Compared with the results in Shijiazhuang City, the difference in IoU was only 2.82 percentage points, indicating that our model has high consistency and stability across different regions.
Notably, trees showed the largest IoU difference between the Shijiazhuang and Xingtai regions. In-depth analysis of the image data and field conditions revealed a significant temperature difference between Shijiazhuang and Xingtai in April, which leads to differences in the growth cycle of trees between the two cities. This biological difference manifests as significant spectral and texture changes in the remote sensing images, which affects the model's segmentation accuracy for the tree class.
4. Conclusions
In order to improve the semantic segmentation performance for winter wheat in remote sensing images, we proposed an improved model with HRNet as the backbone network, incorporating an SDM and an NAM as optimization modules. By introducing multi-confidence scale class representations through the SDM, we significantly enhanced the model's pixel-level class perception, effectively reduced noise interference, and achieved more accurate extraction of pixel-class relationships. The NAM uses the original pixel-class relationships as a query to identify common features within its scope, strengthening correct correlations and suppressing erroneous ones. On the test set, the model achieved a mean intersection over union (mIoU) of 80.51%, precision of 88.64%, recall of 89.14%, overall accuracy (OA) of 90.12%, and F1-score of 88.89%. Compared to U-Net, DeepLabv3+, Segformer, and PSPNet, the mIoU of our model was higher by 4.5, 3.3, 1.8, and 1.4 percentage points, respectively. In tests conducted in Xingtai, a region with different spatial heterogeneity, the model achieved a precision of 92.71%, recall of 95.21%, F1-score of 93.95%, and IoU of 88.58% for winter wheat recognition, with an IoU difference of only 2.82 percentage points relative to Shijiazhuang. This demonstrates that the model exhibits high consistency and stability when applied across different regions, providing an effective tool and technical support for accurately measuring the planting area of winter wheat.
Although the model proposed in this paper has achieved significant performance improvements, some limitations remain. First, the relatively high complexity of the model structure imposes a high demand on hardware resources, which may result in long training times. Second, given the environmental differences and differences in image acquisition time across regions, the generalization ability of our model needs to be further enhanced.
In response to these limitations, our future research will focus on the following key directions:
- (1) Model lightweighting: We will develop lightweight neural network models by reducing model parameters through structural optimization and knowledge distillation, lowering overall computational costs while maintaining or even improving segmentation accuracy to meet varying deployment requirements on edge devices.
- (2) Dataset expansion and diversification: To enhance the generalization ability of our model, we will construct or expand remote sensing image datasets with images reflecting winter wheat growth under different geographical and climatic conditions, different growing stages from sowing to maturity, and different planting patterns and degrees of pest and disease influence.
- (3) Innovation in optimization strategies and algorithms: We will explore new training strategies and optimization algorithms to further improve the training efficiency and performance of our model and to reduce training time while maintaining or enhancing model accuracy.
Through the above measures, we expect to maintain high precision and accuracy of our model while reducing resource consumption and improving its feasibility and applicability in practical applications. We hope to make a profound impact on the research field of remote sensing semantic segmentation and promote the development and application of related techniques, especially in precision agriculture and crop monitoring.
Author Contributions
C.W.: Writing—Original draft preparation; P.Z.: Methodology, Software; S.Y.: Data curation, Visualization; L.Z.: Writing—Reviewing and Editing. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by the Natural Science Foundation of Hebei Province of China, grant number F2022204004.
Data Availability Statement
Data are contained within the article; further inquiries may be directed to the corresponding author.
Acknowledgments
We are grateful to our colleagues at the Hebei Key Laboratory of Agricultural Big Data and National Engineering Research Center for Information Technology in Agriculture for their help and input, without which this study would not have been possible.
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Li, F.; Ren, J.; Wu, S.; Zhao, H.; Zhang, N. Comparison of Regional Winter Wheat Mapping Results from Different Similarity Measurement Indicators of NDVI Time Series and Their Optimized Thresholds. Remote Sens. 2021, 13, 1162.
- Yan, S.; Yao, X.; Zhu, D.; Liu, D.; Zhang, L.; Yu, G.; Gao, B.; Yang, J.; Yun, W. Large-scale crop mapping from multi-source optical satellite imageries using machine learning with discrete grids. Int. J. Appl. Earth Obs. Geoinf. 2021, 103, 102485.
- Pittman, K.; Hansen, M.C.; Becker-Reshef, I.; Potapov, P.V.; Justice, C.O. Estimating global cropland extent with multi-year MODIS data. Remote Sens. 2010, 2, 1844–1863.
- Coates, A.; Ng, A.Y. Learning feature representations with k-means. In Neural Networks: Tricks of the Trade, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 2012; pp. 561–580.
- Al-Amri, S.S.; Kalyankar, N.V. Image segmentation by using threshold techniques. arXiv 2010, arXiv:1005.4020.
- Al-Amri, S.S.; Kalyankar, N.V.; Khamitkar, S.D. Image segmentation by using edge detection. Int. J. Comput. Sci. Eng. 2010, 2, 804–807.
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
- Fu, Y.; Zhang, X.; Wang, M. DSHNet: A Semantic Segmentation Model of Remote Sensing Images Based on Dual Stream Hybrid Network. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 4164–4175.
- Wei, P.; Ye, H.C.; Qiao, S.T.; Liu, R.H.; Nie, C.J.; Zhang, B.R.; Song, L.J.; Huang, S.Y. Early Crop Mapping Based on Sentinel-2 Time-Series Data and the Random Forest Algorithm. Remote Sens. 2023, 15, 3212.
- Chen, J.; Zhu, J.; Sun, G.; Li, J.; Deng, M. SMAF-Net: Sharing Multiscale Adversarial Feature for High-Resolution Remote Sensing Imagery Semantic Segmentation. IEEE Geosci. Remote Sens. Lett. 2021, 18, 1921–1925.
- Zhang, G.; Jiang, W. Remote Sensing Image Semantic Segmentation Method Based on a Deep Convolutional Neural Network and Multiscale Feature Fusion. Int. J. Semant. Web Inf. Syst. 2023, 19, 1–16.
- Gao, L.; Qian, Y.R.; Liu, H.; Zhong, X.W.; Xiao, Z.Q. SRANet: Semantic relation aware network for semantic segmentation of remote sensing images. J. Appl. Remote Sens. 2022, 16, 014515.
- Yang, X.; Li, S.; Chen, Z.; Chanussot, J.; Jia, X.; Zhang, B.; Li, B.; Chen, P. An attention-fused network for semantic segmentation of very-high-resolution remote sensing imagery. ISPRS J. Photogramm. Remote Sens. 2021, 177, 238–262.
- Long, J.; Li, M.; Wang, X.; Stein, A. Delineation of agricultural fields using multi-task BsiNet from high-resolution satellite images. Int. J. Appl. Earth Obs. Geoinf. 2022, 112, 102871.
- Zhang, J.; Lin, S.; Ding, L.; Bruzzone, L. Multi-scale context aggregation for semantic segmentation of remote sensing images. Remote Sens. 2020, 12, 701.
- Zhang, Y.; Wang, H.; Liu, J.; Zhao, X.; Lu, Y.; Qu, T.; Tian, H.; Su, J.; Luo, D.; Yang, Y. A Lightweight Winter Wheat Planting Area Extraction Model Based on Improved DeepLabv3+ and CBAM. Remote Sens. 2023, 15, 4156.
- Zhang, F.; Chen, Y.; Li, Z.; Hong, Z.; Liu, J.; Ma, F.; Han, J.; Ding, E. ACFNet: Attentional class feature network for semantic segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6798–6807.
- Yuan, Y.; Chen, X.; Wang, J. Object-contextual representations for semantic segmentation. In Computer Vision – ECCV 2020, 16th European Conference, Glasgow, UK, 23–28 August 2020, Proceedings, Part VI; Springer: Cham, Switzerland, 2020; pp. 173–190.
- Yu, C.; Wang, J.; Gao, C.; Yu, G.; Shen, C.; Sang, N. Context prior for scene segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 12416–12425.
- Ma, X.; Ma, M.; Hu, C.; Song, Z.; Zhao, Z.; Feng, T.; Zhang, W. LoG-CAN: Local-global class-aware network for semantic segmentation of remote sensing images. In Proceedings of the ICASSP 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 4–10 June 2023; pp. 1–5.
- Sun, K.; Xiao, B.; Liu, D.; Wang, J. Deep high-resolution representation learning for human pose estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 5693–5703.
- Ma, X.; Che, R.; Wang, X.; Ma, M.; Wu, S.; Feng, T.; Zhang, W. DOCNet: Dual-Domain Optimized Class-Aware Network for Remote Sensing Image Segmentation. IEEE Geosci. Remote Sens. Lett. 2024, 21, 1–5.
- Liu, Y.; Shi, S.; Wang, J.; Zhong, Y. Seeing beyond the patch: Scale-adaptive semantic segmentation of high-resolution remote sensing imagery based on reinforcement learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023; pp. 16868–16878.
- Peng, C.; Li, Y.; Jiao, L.; Chen, Y.; Shang, R. Densely based multi-scale and multi-modal fully convolutional networks for high-resolution remote-sensing image semantic segmentation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2019, 12, 2612–2626.
- Wang, Z.; Guo, J.X.; Huang, W.Z.; Zhang, S.W. High-resolution remote sensing image semantic segmentation based on a deep feature aggregation network. Meas. Sci. Technol. 2021, 32, 095002.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, 18th International Conference, Munich, Germany, 5–9 October 2015, Proceedings, Part III; Springer: Cham, Switzerland, 2015; pp. 234–241.
- Chen, L.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 801–818.
- Xie, E.; Wang, W.; Yu, Z.; Anandkumar, A.; Alvarez, J.M.; Luo, P. SegFormer: Simple and efficient design for semantic segmentation with transformers. Adv. Neural Inf. Process. Syst. 2021, 34, 12077–12090.
- Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2881–2890.