Semi-/Weakly-Supervised Semantic Segmentation Method and Its Application for Coastal Aquaculture Areas Based on Multi-Source Remote Sensing Images—Taking the Fujian Coastal Area (Mainly Sanduo) as an Example

Abstract: Coastal aquaculture areas are among the main areas for obtaining marine fishery resources and are vulnerable to storm-tide disasters. Obtaining information on coastal aquaculture areas quickly and accurately is important for the scientific management and planning of aquaculture resources. Recently, deep neural networks have been widely used in remote sensing to deal with many problems, such as scene classification and object detection, and with the development of remote sensing technology, there are many data sources with different spatial resolutions and different uses. Thus, using deep learning networks to extract coastal aquaculture areas often encounters the following problems: (1) the difficulty of labeling; (2) the poor robustness of the model; (3) the spatial resolution of the image to be processed being inconsistent with that of the existing samples. To address these problems, this paper proposes a novel semi-/weakly-supervised method, the semi-/weakly-supervised semantic segmentation network (Semi-SSN), and adopts three data sources: GaoFen-2, GaoFen-1 (PMS), and GaoFen-1 (WFV) images with 0.8 m, 2 m, and 16 m spatial resolution, respectively. Through experiments, we analyze the extraction performance of the model comprehensively. After comparing with other state-of-the-art methods and verifying on an open remote sensing dataset, we take the Fujian coastal area (mainly Sanduo) as the experimental area and employ our method to detect the effect of storm-tide disasters on coastal aquaculture areas, monitor production, and make the distribution map of coastal aquaculture areas.


Introduction
Recent successful advances in deep learning have made it an increasingly popular choice in many fields of application. Following this wave of success, and due to the increased availability of data and computational resources, the usage of deep learning is finally taking off in remote sensing as well. Coastal aquaculture areas, as a typical target for remote sensing, are vulnerable to storm-tide disasters and are important for the government's scientific management and planning of aquaculture resources. To obtain information on aquaculture areas, more and more researchers are turning to remote sensing technology and machine learning, and a series of research works has ensued [1][2][3][4][5][6][7]. At present, researchers use expert experience [8][9][10], feature learning [11][12][13][14], threshold segmentation [15,16], and semantic segmentation networks [6] to extract aquaculture areas, and practice has proven that these methods work well in this field. Reference [6] adopted a semantic segmentation network based on hybrid dilated convolution (HDC) [17] to extract aquaculture areas and summed up its four improvements compared to traditional machine learning: (1) the extraction results have clearer boundaries; (2) the impact of sediments in seawater on the extraction results is attenuated; (3) the influence of ships and other floatage is avoided; (4) the misidentification of the internal clearance of the cage culture area is avoided.
Although CNN-based approaches have achieved astonishing performance, they require an enormous amount of training data, and the robustness of the models is often too poor for them to be applied to more scenarios. Unlike image classification and object detection, semantic segmentation requires accurate per-pixel annotations for each training sample, which incurs considerable expense and time. To ease the effort of acquiring high-quality labeled data, semi-supervised and weakly-supervised methods [18][19][20][21][22][23][24][25] have been applied to the task of semantic segmentation, which is significant for the application of deep learning in remote sensing. Both are forms of incomplete supervised learning based on a small amount of labeled training samples, but weakly-supervised methods accept lower-quality labeled training samples that need not be of exactly the same kind as the test and validation samples. Meanwhile, the emergence of the generative adversarial network (GAN) [26] has made semi-/weakly-supervised semantic segmentation more feasible, and many semi-/weakly-supervised semantic segmentation networks are based on GANs. The conditional GAN [27] improves on the GAN by feeding y into both the discriminator and the generator as an additional input layer, so that the generator can generate samples related to y.
In this paper, we construct our network, the semi-/weakly-supervised semantic segmentation network (Semi-SSN), based on conditional generative adversarial nets (CGANs) and, through a self-training method, generate pseudo-labels of unlabeled data with the generator to achieve semi-/weakly-supervised learning. We then employ Semi-SSN to extract aquaculture areas from GF-2 images in a semi-supervised manner, make comparative experiments with other state-of-the-art methods, and explore the scientific quality and practicability of our method on an open remote sensing dataset. Besides, in remote sensing, there are many data sources with different spatial resolutions, and different spatial resolutions serve different purposes. Ten-meter-level remote sensing images can usually be used to obtain information over a large area because of their larger swath width, which is good for mapping the distribution of coastal aquaculture areas. Meter-level and sub-meter-level remote sensing images are convenient for obtaining the spatial distribution and capturing more accurate information, so they are suitable for change detection tasks such as disaster emergency response, production monitoring, etc. However, in practice, the resolution of the image to be processed can be inconsistent with that of the existing samples. Therefore, we employed Semi-SSN to extract aquaculture areas in a weakly-supervised manner with remote sensing images of different spatial resolutions. Taking the Fujian coastal area (mainly Sanduo) as the experimental area, we explore the application effect of Semi-SSN in different scenarios.
In short, we propose a novel method, Semi-SSN, based on conditional adversarial learning to extract aquaculture areas, in order to deal with the following problems: (1) the difficulty in labeling; (2) the poor robustness of the model; (3) the spatial resolution of the image to be processed is inconsistent with that of the existing samples. After comparing with other state-of-the-art methods and verifying on an open remote sensing dataset, we take the Fujian coastal area (mainly Sanduo) as an example and use our method to carry out disaster emergency response, production monitoring, and map making.

Related Work
In 2014, Goodfellow et al. [26] first proposed the generative adversarial network (GAN), which has been widely used in object detection [27], semantic segmentation [27,28], etc. The GAN (Figure 1) is composed of two neural networks: the generator G and the discriminator D. The generator is trained with the objective of maximizing the probability of the discriminator making mistakes, i.e., it builds a mapping function from a generator distribution p_g to the data space as G(z; θ_g) (where θ_g are the parameters of the generator). The discriminator D(x; θ_d) (where θ_d are the parameters of the discriminator) aims to estimate the probability that a sample came from the training data rather than the generator distribution p_g. Both networks are trained simultaneously with the value function V(G, D):

min_G max_D V(D, G) = E_{x∼p_data(x)}[log D(x)] + E_{z∼p_z(z)}[log(1 − D(G(z)))].

Based on this theoretical foundation, M. Mirza et al. [29] proposed conditional generative adversarial nets (CGANs) as an improvement (Figure 2), which conditions the generator and the discriminator on some extra information y. Both the prior noise distribution p_z(z) and y are input into the generator, and in the discriminator, x and y are treated as inputs. The value function becomes:

min_G max_D V(D, G) = E_{x∼p_data(x)}[log D(x|y)] + E_{z∼p_z(z)}[log(1 − D(G(z|y)))].

The methodology of conditional adversarial learning was then applied to a wide range of tasks: discrete labels [30,31], prediction from a normal map [32], future frame prediction [33], semantic segmentation [34,35], and image generation from sparse annotations [36,37].
The semantic segmentation network [38][39][40][41][42] is a method of interpreting images at the pixel level, which requires an enormous amount of labeled data at considerable expense. Pinheiro and Collobert [18] and Pathak et al. [43] employed multiple instance learning (MIL) to generate labels for supervised training. Hong et al. [44] used image-level supervised images and a few fully annotated images to train their semantic segmentation network. To ease the effort of acquiring high-quality labeled data, semi-supervised semantic segmentation is imperative, and the emergence of the GAN makes it more feasible.
Recently, with the rise of the GAN and its improved variants, adversarial learning has been widely used in semantic segmentation. Pauline et al. [28] trained a convolutional semantic segmentation network along with an adversarial network that discriminates segmentation maps coming either from the ground truth or from the segmentation network, in order to detect and correct higher-order inconsistencies between the ground truth and the map generated by the segmentation net. Huaqing Liu et al. [45] proposed semi-cGAN, based on the CGAN, to segment lumbosacral structures on thin-layer computed tomography with few labeled data. Souly et al. [46] leveraged a massive amount of available unlabeled or weakly labeled data and non-real images created through the GAN to achieve semi-supervised learning, and subsequently, Hung et al. [47] made improvements based on it. Konstantinos et al. [48] adapted source-domain images to the target domain with a GAN-based method, which outperformed many unsupervised domain adaptation scenarios and produced plausible samples. Xue et al. [49] proposed a novel semantic segmentation network named SegAN, which used a fully convolutional neural network as the segmenter and adopted a multi-scale L1 loss function to train the critic and segmenter. Cherian et al. [50] presented a semantically-consistent GAN framework based on Cycle-GAN, dubbed Sem-GAN, which significantly improved the quality of the translated images.
Based on the methodology of CGAN and previous research, this paper proposes a novel network, Semi-SSN, that introduces conditional adversarial learning into the semantic segmentation network to realize semi-/weakly-supervised learning. We adopt the confidence maps generated by the discriminator and the predicted maps generated by the generator of the unlabeled data to produce the pseudo-labels for the model's training.

Network and Algorithm
The self-training method in semi-/weakly-supervised learning means that the classifier can be used to generate pseudo-labels of unlabeled data, after it is sufficiently trained on labeled data. If we take the confident predictions and assume that they are correct, we can add the unlabeled data with pseudo-labels into the training. If the noise in the pseudo-labels is sufficiently low, the model can benefit from the additional training data to obtain improved accuracy.
This paper proposes a self-training semi-supervised semantic segmentation method, which is divided into two processes: (1) using labeled data to train the classifier; (2) obtaining pseudo-labels of unlabeled data based on the classifier and then further training the classifier. At the same time, this paper introduces an adversarial loss into the network, which not only improves the accuracy of semantic segmentation, but also reduces the noise of the pseudo-labels, thereby improving the accuracy of the entire model. The specific algorithm and network architecture are as follows.
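As an illustration of the pseudo-label selection step described above, the following minimal numpy sketch (the function name and threshold value are our own, not from the paper) keeps only the unlabeled samples whose top-class probability clears a confidence threshold:

```python
import numpy as np

def select_pseudo_labels(probs, threshold=0.9):
    """Keep only confident predictions as pseudo-labels.

    probs: (n_samples, n_classes) softmax outputs of a classifier
    that was first trained on labeled data.
    Returns (indices, labels) of the samples whose top-class
    probability exceeds the threshold.
    """
    confidence = probs.max(axis=1)
    keep = confidence > threshold
    return np.flatnonzero(keep), probs[keep].argmax(axis=1)

# Toy example: 3 unlabeled samples, 2 classes.
probs = np.array([[0.95, 0.05],   # confident -> pseudo-label 0
                  [0.60, 0.40],   # uncertain -> dropped
                  [0.08, 0.92]])  # confident -> pseudo-label 1
idx, labels = select_pseudo_labels(probs, threshold=0.9)
```

The retained samples would then be added to the training set with their pseudo-labels; lowering the threshold admits more data but more label noise.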

Network Architecture
Based on the methodology of conditional adversarial learning, we propose a semi-/weakly-supervised semantic segmentation network (Semi-SSN), as shown in Figure 3. In this framework, we cast the generator and the discriminator of the GAN as the two components of a semantic segmentation network: the generator-classifier, i.e., S(·), generates the prediction map of the labeled image X or the unlabeled image X̂, and the discriminator, i.e., D(·), takes [X, S(X)], [X, Y], or [X̂, S(X̂)] as input and outputs a confidence map, which infers the regions where the prediction results are close enough to the ground-truth distribution.

Generator-classifier:
According to the training tips proposed by DCGAN [51], we made some improvements to SegNet to obtain the generator-classifier, i.e., the baseline model in this paper:
(1) Use Leaky-ReLU activation for all layers except for the output, which uses Softmax.
(2) Replace deterministic spatial pooling functions (such as max pooling) with strided convolutions.
(3) Replace upsampling with deconvolutions (the difference between the two is that the latter's parameters can be learned during training).
(4) Use batch normalization on all layers except for the output layer.
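The activation choices in the list above can be sketched in plain numpy (the actual model uses framework layers; this is only an illustration of the functions involved, with learnable scale/shift parameters omitted):

```python
import numpy as np

def leaky_relu(x, alpha=0.2):
    # Leaky-ReLU keeps a small slope (alpha * x) for negative inputs
    # instead of zeroing them out like plain ReLU.
    return np.where(x > 0, x, alpha * x)

def batch_norm(x, eps=1e-5):
    # Normalize each feature over the batch (axis 0) to zero mean and
    # unit variance; the learnable gamma/beta are omitted for brevity.
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def softmax(x, axis=-1):
    # Numerically stable softmax for the output layer: subtract the
    # row max before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)
```

In the full network these would appear as, e.g., Keras `LeakyReLU`, `BatchNormalization`, and a `softmax` output activation.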

Discriminator
We chose a simple dilated convolution network, the context network, as the discriminator (Table 1), using Leaky-ReLU activation for all layers.
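To illustrate why a dilated (context) network makes a compact discriminator, the following 1-D numpy sketch (our own toy example, not the paper's implementation) shows how dilation widens the receptive field without adding parameters:

```python
import numpy as np

def dilated_conv1d(x, w, dilation=1):
    """Valid 1-D convolution with dilated kernel taps.

    With kernel size k and dilation d, the receptive field is
    (k - 1) * d + 1, so stacking layers with growing dilation (the
    'context network' idea) aggregates context rapidly without
    pooling away spatial resolution.
    """
    k = len(w)
    span = (k - 1) * dilation + 1
    out = np.empty(len(x) - span + 1)
    for i in range(len(out)):
        out[i] = sum(w[j] * x[i + j * dilation] for j in range(k))
    return out

x = np.arange(8, dtype=float)          # [0, 1, ..., 7]
w = np.array([1.0, 1.0, 1.0])          # 3-tap summing kernel
y1 = dilated_conv1d(x, w, dilation=1)  # receptive field 3
y2 = dilated_conv1d(x, w, dilation=2)  # receptive field 5, same 3 weights
```

The same 3 weights cover 5 input positions at dilation 2, which is how a few dilated layers can judge large structures in the confidence map.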

Algorithm
At the start, we trained our model on the labeled dataset. Given an input image X of size H × W × channels, where channels is the number of bands, with its label map Y of size H × W × C, where C is the number of categories, we can obtain the predicted probability map S(X) and the confidence map D(X, S(X)) or D(X, Y).
After training the discriminator k times, we fix the discriminator and update the generator-classifier by minimizing L_class, which combines the generator-classifier's cross-entropy loss L_seg and the adversarial loss L_adv:

L_class = L_seg + λ_adv · L_adv,

where λ_adv ≤ 1 is a constant for balancing the multi-task training.
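A minimal numpy sketch of this combined loss, assuming L_seg is a standard per-pixel cross-entropy (the function names and the toy values are ours, not the paper's code):

```python
import numpy as np

def cross_entropy(pred, onehot, eps=1e-12):
    # Mean per-pixel cross-entropy; pred and onehot are (H, W, C)
    # arrays of softmax probabilities and one-hot ground truth.
    return -np.mean(np.sum(onehot * np.log(pred + eps), axis=-1))

def classifier_loss(seg_loss, adv_loss, lambda_adv=0.04):
    # L_class = L_seg + lambda_adv * L_adv; lambda_adv <= 1 keeps the
    # adversarial term from dominating the main segmentation task.
    return seg_loss + lambda_adv * adv_loss
```

With the paper's λ_adv = 0.04, the adversarial term contributes only a small fraction of the total gradient signal.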
Iterate the above steps n times. After obtaining the generator-classifier and discriminator trained on the labeled data, we trained the generator-classifier in a semi-/weakly-supervised manner on the unlabeled data. Given an input image X̂ of size H × W × channels, where channels is the number of bands, we utilized the prediction map S(X̂) and the confidence map D(X̂, S(X̂)) generated by the trained discriminator to generate the pseudo-label map Ŷ:

Ŷ = OneHotEncode( I(D(X̂, S(X̂)) > T_semi) · arg max(S(X̂)) ),

where the threshold T_semi is equal to the validation accuracy of the generator-classifier trained with L_class on the labeled data, I(·) is the indicator function, and OneHotEncode(·) denotes one-hot encoding of the vector; i.e., when D(X̂, S(X̂))^(h,w) is greater than T_semi, S(X̂)^(h,w) can be regarded as a true value of X̂^(h,w). The resulting semi-/weakly-supervised loss is the cross-entropy between the prediction and the pseudo-label over the trusted pixels:

L_semi = − Σ_{h,w} Σ_{c∈C} I(D(X̂, S(X̂))^(h,w) > T_semi) · Ŷ^(h,w,c) · log S(X̂)^(h,w,c).

Then, we trained the generator-classifier by minimizing L_semi-class, which combines the semi-/weakly-supervised loss L_semi and the adversarial loss L_adv:

L_semi-class = λ_semi · L_semi + λ_adv · L_adv,
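The pseudo-label rule above can be sketched directly in numpy (an illustrative reading of the formula, not the authors' code):

```python
import numpy as np

def make_pseudo_labels(s_pred, d_conf, t_semi):
    """Per-pixel pseudo-labels: OneHotEncode(I(D > T_semi) * argmax(S)).

    s_pred: (H, W, C) softmax map from the generator-classifier.
    d_conf: (H, W) confidence map from the discriminator.
    Returns the masked one-hot pseudo-label map and the boolean mask
    of pixels trusted enough to contribute to the semi-supervised loss.
    """
    h, w, c = s_pred.shape
    hard = s_pred.argmax(axis=-1)     # per-pixel class index
    onehot = np.eye(c)[hard]          # (H, W, C) one-hot encoding
    mask = d_conf > t_semi            # indicator I(D(X, S(X)) > T_semi)
    return onehot * mask[..., None], mask
```

Only the pixels where the discriminator's confidence exceeds T_semi carry a nonzero pseudo-label; the rest contribute nothing to the loss.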
where λ_adv, λ_semi ≤ 1 are constants for balancing the multi-task training. The specific details of training are given in Algorithm 1.

Algorithm 1.
Minibatch stochastic gradient descent training of our model. The number of steps to apply to the discriminator, k, is a hyperparameter. We used k = 1, the least expensive option, in our experiments.
For number of supervised training iterations do
    For k steps do
        • Select a minibatch {x_1, x_2, . . . , x_m} with ground truth {y_1, y_2, . . . , y_m} from the labeled training set.
        • Obtain the probability maps {S(x_1), S(x_2), . . . , S(x_m)}.
        • Update the discriminator by ascending its stochastic gradient.
    End For
    • Select a minibatch {x_1, x_2, . . . , x_m} with ground truth {y_1, y_2, . . . , y_m} from the labeled training set.
    • Update the generator-classifier by ascending its stochastic gradient.
End For
For number of semi-/weakly-supervised training iterations do
    • Select a minibatch {x̂_1, x̂_2, . . . , x̂_m} from the unlabeled training set.
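The update schedule of Algorithm 1 can be sketched as follows, with hypothetical callables standing in for the actual gradient steps (this only shows the control flow, not the losses):

```python
def train(num_sup_iters, num_semi_iters, k,
          d_update, g_update_sup, g_update_semi):
    """Skeleton of the Algorithm 1 update schedule.

    Each supervised iteration applies k discriminator updates followed
    by one generator-classifier update; the paper uses k = 1, the least
    expensive option. Semi-/weakly-supervised iterations then refine
    the generator-classifier on unlabeled minibatches.
    """
    for _ in range(num_sup_iters):
        for _ in range(k):
            d_update()        # ascend the discriminator's gradient
        g_update_sup()        # minimize L_class on a labeled minibatch
    for _ in range(num_semi_iters):
        g_update_semi()       # minimize the semi-supervised loss on
                              # an unlabeled minibatch
```

The k:1 ratio between discriminator and generator updates is the standard GAN training knob for keeping the two networks balanced.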

Data Source
There are two main types of coastal aquaculture areas: the cage culture area and the raft culture area (Figure 4). Cage culture areas have small grid cages that can be clearly observed in high-resolution remote sensing images. They are mainly made of plastic material and appear as uneven rectangles on the image. The raft culture area is usually arranged in groups and appears as a dark, more uniform rectangle on the image, with small light points along its edge in optical images.
This paper employed four GF-2 images, three GF-1(PMS) images, and three GF-1(WFV) images (Table 2) to explore the scientific quality and practicality of our method, Semi-SSN. The technical indicators of the GF-2, GF-1(PMS), and GF-1(WFV) images are as shown in Table 3.

Results
There were two experiments. In one, we used Semi-SSN to extract coastal aquaculture areas from GF-2 images based on a semi-supervised method with different amounts of labeled GF-2 data. In the other, we employed Semi-SSN to extract coastal aquaculture areas from higher (lower) spatial resolution remote sensing images based on a weakly-supervised method with lower (higher) spatial resolution remote sensing images.

Sample Construction
In deep learning, the dataset mainly includes the training set, validation set, and test set. The training set is used to train the weight parameters of the model. Neither the validation set nor the test set is involved in the training of the model. The former is used to adjust the hyperparameters of the model, to preliminarily evaluate the prediction ability of the model, and to prevent over-fitting during training. The latter is used to evaluate the robustness and generalization ability of the model.
According to Table 2, we chose two GF-2 images, one GF-1(PMS) image, and one GF-1(WFV) image for sample construction. We selected four areas of 5000 × 5000 pixels from the GF-2 images, one area of 2000 × 2000 pixels from the GF-1(PMS) image, and one area of 2000 × 2000 pixels from the GF-1(WFV) image for category marking. We randomly cut these images and labels into patches of 128 × 128 pixels. Then, according to Tables 4 and 5, we constructed the training set and validation set, respectively. Finally, we chose a test area of 2000 × 2000 pixels in the GF-2 image, a test area of 1000 × 1000 pixels in the GF-1(PMS) image, and a test area of 1000 × 1000 pixels in the GF-1(WFV) image, as shown in Figure 5, to further verify the effect of the model.

Implementation Details
We used the mean intersection over union (mIoU) as the evaluation function, as in Equation (11):

mIoU = (1/N) · Σ_{i=1}^{N} IoU_i, with IoU_i = TP / (TP + FP + FN),

where TP is the number of pixels correctly extracted as aquaculture areas, FP is the number of pixels mistakenly extracted as aquaculture areas, and FN is the number of pixels that belong to aquaculture areas in reality but are not extracted; N is the number of categories, and IoU_i is the intersection over union of the ith class. We implemented our network using the Keras and TensorFlow frameworks. We trained our network on eight GPUs with 12 GB of memory each. We used Adam [52] as the optimizer, where the weight decay was 10^-4 and the initial learning rate was set to 10^-7. In our experiment, T_semi is a dynamic value that changes with the validation accuracy (i.e., the mIoU on the validation set) of each iteration, as in Equation (12). λ_semi is determined according to the labeled data amount, as in Equation (13). We set λ_adv to 0.04 (the details are given in Section 4.1).
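A reference implementation of the mIoU metric under these definitions (a sketch, not the authors' evaluation code):

```python
import numpy as np

def miou(pred, truth, num_classes):
    """mIoU = (1/N) * sum_i TP_i / (TP_i + FP_i + FN_i).

    pred, truth: integer label maps of the same shape.
    Classes absent from both maps are skipped so they do not
    contribute an undefined 0/0 term to the mean.
    """
    ious = []
    for c in range(num_classes):
        tp = np.sum((pred == c) & (truth == c))  # correct pixels of class c
        fp = np.sum((pred == c) & (truth != c))  # mistakenly extracted
        fn = np.sum((pred != c) & (truth == c))  # missed pixels of class c
        if tp + fp + fn > 0:
            ious.append(tp / (tp + fp + fn))
    return float(np.mean(ious))
```

For the two aquaculture classes plus background, `num_classes` would be 3; the per-class IoU values can also be reported separately to diagnose which class limits the mean.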

Semi-Supervised Experiment
In the experiment, we randomly divided the GF-2 samples as shown in Table 4, calculated the mIoU on the validation set and test set (Table 6), and obtained the extraction result for the test area (Figure 6).

Weakly-Supervised Experiment
Another currently hot issue is how to use a small amount of existing labeled samples to extract aquaculture areas from remote sensing images of different spatial resolutions, and how to comprehensively analyze practical problems using data from various sources based on the weakly-supervised semantic segmentation method.
In this experiment, we used Semi-SSN to extract coastal aquaculture areas from lower (higher) spatial resolution images based on a weakly-supervised method, with a small amount of labeled higher (lower) spatial resolution data and unlabeled lower (higher) resolution data (Table 5), obtained the extraction results for the unlabeled data source (Figure 7), and calculated the mIoU on the validation set and test set (Table 7).

Hyperparameter Selection
The hyperparameters are mainly determined according to the performance of the model on the validation set. There are three hyperparameters: λ_adv and λ_semi are used to balance the multi-task learning, and T_semi is used to control the sensitivity in the semi-/weakly-supervised learning described in Equation (7). In our experiment, T_semi is a dynamic value that changes with the validation accuracy of each iteration, as in Equation (12). λ_semi is determined according to the labeled data amount, as in Equation (13). We used GF-2 labeled data to evaluate the effect of λ_adv, as shown in Table 8, and chose the final value as 0.04.

Semi-Supervised Results
It can be seen (Table 6) that the adversarial loss and the unlabeled data can improve the mIoU to different degrees: the former helps improve the validation accuracy by 1.9-4.7% and the test accuracy by 2.1-8.0%, and the latter helps improve the validation accuracy by 4.8-9.2% and the test accuracy by 5.7-12.2%, compared to the baselines. The extraction results have clear boundaries with few fragments (Figure 6); in particular, the addition of the adversarial loss makes the model more sensitive to structural information, which greatly reduces the misidentification of floatage on the water surface. To facilitate observation, we zoom in on local details (Figure 8). It can be seen that Semi-SSN can filter out impurities such as floatage on the water surface after adding the adversarial loss.

Weakly-Supervised Results
From an overall point of view, when using higher spatial resolution labeled data to extract coastal aquaculture areas from lower spatial resolution images, both the validation accuracy and the test accuracy can reach about 80%. The extraction results (Figure 7g,k,l) have clear boundaries and can be directly put into practical applications.
When using lower spatial resolution labeled data to extract aquaculture areas from higher spatial resolution images, the situation is relatively complicated; the common point is that the accuracy is not sufficient for practical applications. Using labeled GF-1(PMS) data for extraction from a GF-2 image cannot delineate the aquaculture areas precisely (Figure 7c), but the extraction result can roughly reflect their spatial distribution, which means it can be used for coarse analysis and extraction. However, when using labeled GF-1(WFV) data to extract aquaculture areas from a GF-1(PMS) or GF-2 image (Figure 7d,h), the raft culture area is easily confused with seawater, and the cage culture area cannot be extracted completely.
In short, Semi-SSN is conducive to using the existing labeled data to extract aquaculture areas in different spatial resolution remote sensing images and can help reduce the time cost of labeling. According to our method, labeled higher spatial resolution samples can be used to extract aquaculture areas in lower spatial resolution images, and partially labeled lower spatial resolution data (GF-1(PMS)) can be conducive to roughly extracting the distribution of aquaculture areas in higher spatial resolution images (GF-2). However, in general, using labeled lower spatial resolution samples to extract aquaculture areas in higher spatial resolution images cannot obtain ideal results.

Comparison with Other Methods
We made comparative experiments with FCN8s, UNet, SegNet, and HDCUNet [6] based on 8000 labeled GF-2 samples (Table 9). Note that the validation accuracy and test accuracy of our baseline model are slightly inferior to those of HDCUNet, but they are effectively improved by the addition of the adversarial loss. Besides, it can be seen (Tables 6 and 9) that, after adding L_adv and L_semi to our baseline, our method can reach approximately the effect of FCN8s, UNet, and SegNet based on a smaller amount (1/2, 1/4, and 1/2, respectively) of labeled samples. Tong et al. [53] constructed a large-scale land cover dataset with GaoFen-2 (GF-2) satellite images, named GID, which contains two sub-datasets: a large-scale classification set (LCS) and a fine land cover classification set (FLCS), provided online at http://captain.whu.edu.cn/GID/ (accessed on 1 November 2020). The LCS contains 150 pixel-level annotated GF-2 images (the training set contains 120 images; the validation set contains 30 images), and the FLCS is composed of 30,000 multi-scale image patches (training set) coupled with 10 pixel-level annotated GF-2 images (validation set).
In this paper, we pretrained our baseline on the LCS and generated coarse pixel-level labels based on the patch-level labels of the FLCS training set. Then, we used the FLCS to explore the effectiveness of our method, trained Semi-SSN on labeled and unlabeled samples with different ratios, and made a comparative experiment with Tong et al. [53]. Given that Tong et al. [53] used overall accuracy (OA), the proportion of correctly classified pixels among all pixels, as the evaluation criterion (Equation (14)), we made the comparison on this basis (Table 10). Table 10 shows the evaluation results on the FLCS dataset: the OA of our baseline model trained on fully labeled data can reach above 69%, and the OA can exceed that of Tong et al. [53] after adding L_adv. The adversarial loss brings a consistent performance improvement (2.2-4.8%) over different amounts of training data, and incorporating the proposed semi-supervised learning scheme brings an overall 6.1-13.8% improvement.
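Read as the fraction of correctly classified pixels, the OA criterion can be computed with a one-line numpy sketch (ours, not the authors' evaluation code):

```python
import numpy as np

def overall_accuracy(pred, truth):
    # OA = correctly classified pixels / total pixels; pred and truth
    # are integer label maps of the same shape.
    pred = np.asarray(pred)
    truth = np.asarray(truth)
    return float(np.mean(pred == truth))
```

Unlike mIoU, OA is dominated by the most frequent class, which is why the two metrics can rank methods differently on imbalanced land cover maps.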

Disaster Emergency Response
On 11 July 2018, typhoon "Maria", the eighth typhoon of the year and a super typhoon, made landfall on the Huangqi Peninsula, Lianjiang County, Fujian Province, with a local maximum wind force of level 14, accompanied by heavy rain, and Sanduo suffered the most serious damage. Based on two GF-1(PMS) images (Figure 9a,b), the disaster situation of the area is discussed. We used Semi-SSN to extract coastal aquaculture areas based on the existing labeled GF-2 samples and unlabeled samples collected from these two images (Figure 9a,b) and detected the changes in the intersecting parts (3500 × 3500 pixels).
We found that typhoon "Maria" had a relatively small impact on the cage culture area (Figure 9e), while the raft culture area (Figure 9h) suffered a large loss of area. Besides, for the part added after the typhoon, the patch-like increase was likely caused by broken aquaculture areas being blown away by the typhoon, and the more complete additions may be newly added aquaculture areas for post-disaster reconstruction. According to the statistics, after the typhoon, the raft culture area was reduced by about 174,664 m² with about 16,388 m² added, and the cage culture area was reduced by about 5992 m² with about 4108 m² added.
What is more, from 11 to 13 July 2018, we accompanied the field investigation team from the National Marine Hazard Mitigation Service (NMHMS), which went to the Fujian coastal areas to investigate the impact and destruction of the disaster. As shown in Figure 10, the raft culture area in Sanduo was seriously damaged, and the aquaculture facilities were destroyed and scattered by storm surges and coastal waves, which further supports the results of this experiment. The field investigation showed that the main reason for the loss of the raft culture area was that it was easily pushed away by the storm surge and nearshore waves, which usually causes damage over a large area. The cage culture areas, however, are of relatively stable construction and not easily destroyed; therefore, the specific damage that the typhoon caused to the species cultured in the cages is not easy to measure directly through image interpretation.

Production Monitoring
Aquaculture production activities are mainly carried out from May to September every year. The number and density of aquaculture areas are very important for production quality and the water environment, so it is necessary to monitor production activities. An investigation showed that no super typhoon passed through the area from 22 June to 19 September 2016, so we used the intersection region (10,000 × 10,000 pixels) of two GF-2 images (Figure 11a,b) to discuss the area change of the two kinds of aquaculture areas in the peak season.
We used Semi-SSN to extract coastal aquaculture areas based on the existing labeled GF-2 samples and unlabeled samples collected from these two images (Figure 11a,b) and compared the changes of the intersecting parts.
It can be seen intuitively that the increase and decrease of the raft culture area (Figure 11h) are obvious, while the change in the cage culture area (Figure 11e) is not. According to our investigation, a cage culture area can generally be used for two to three years and does not need to be redeployed every year, so it has a lower probability of large-scale change. The extraction results show that the cage culture area did not change significantly in the peak season of 2016, which suggests that this was very likely not a year in which the cage culture area needed to be replaced and redeployed. The raft culture area, by contrast, needs to be redeployed every year, and its significant increase or decrease indicates that this kind of aquaculture had been actively carried out during this period. At the same time, the density of the raft culture area increased significantly, which is very important to the yield and the biological environment; therefore, it is necessary to monitor and control it in time. According to the statistics, in the peak season, the raft culture area was reduced by about 50,806 m² with 72,485 m² added, and the cage culture area was reduced by about 5045 m² with about 4733 m² added.

Map Making
We chose two GF-1(WFV) images covering Sanduo and carried out image fusion to avoid the interference of clouds; the detailed information is shown in Table 2. Then, we used Semi-SSN to extract coastal aquaculture areas based on the existing labeled GF-2 samples and unlabeled samples collected from these two images and made the distribution map of coastal aquaculture areas in Sanduo, as shown in Figure 12. According to the statistics, in 2016, the raft culture area was 911,872 m², and the cage culture area was 810,496 m².

Conclusions
In this work, we proposed a semi-/weakly-supervised semantic segmentation network (Semi-SSN) based on conditional adversarial learning for extracting aquaculture areas, and after experiments and analysis, we drew the following conclusions:

1. For semi-supervised extraction in GF-2 images, both the adversarial loss and the unlabeled samples are conducive to improving the validation accuracy and test accuracy, and the former especially makes the model more sensitive to structural information.
2. For multi-scale spatial resolution remote sensing images, labeled higher spatial resolution samples are conducive to extracting aquaculture areas in lower spatial resolution images, but not vice versa.
3. Through experiments, Semi-SSN can reach approximately the effect of other state-of-the-art methods based on relatively fewer labeled samples and adversarial loss and performs better on an open remote sensing dataset (FLCS) than Tong et al.'s method [53].
4. Applying Semi-SSN to detect the change before and after typhoon "Maria" showed that the raft culture area was more vulnerable than the cage culture area in this disaster.
5. Employing Semi-SSN to monitor production in the peak season of 2016 showed that the distribution density of the raft culture area increased significantly during this period, while the cage culture area did not change significantly.
6. The distribution map of coastal aquaculture areas in 2016 showed that the raft culture area was 911,872 m² and the cage culture area was 810,496 m².
In short, Semi-SSN is convenient for practical application and provides a new paradigm for solving the following problems: (1) the difficulty of labeling; (2) the poor robustness of the model; (3) the spatial resolution of the image to be processed being inconsistent with that of the existing samples. In the future, we will devote ourselves to improving the transfer capacity and robustness of our model, which is a general dilemma of deep learning models in dealing with remote sensing problems, and further explore the performance of our model on data with different spectral resolutions and radiometric resolutions, on radar data, and on different remote sensing tasks, to make it more suitable for actual needs.