NDFTC: A New Detection Framework of Tropical Cyclones from Meteorological Satellite Images with Deep Transfer Learning

Accurate detection of tropical cyclones (TCs) is important for preventing and mitigating the natural disasters associated with them. Deep transfer learning methods have advantages in detection tasks because they can further improve the stability and accuracy of the detection model. Therefore, on the basis of deep transfer learning, we propose a new detection framework of tropical cyclones (NDFTC) from meteorological satellite images by combining deep convolutional generative adversarial networks (DCGAN) and the You Only Look Once (YOLO) v3 model. The algorithm process of NDFTC consists of three major steps: data augmentation, a pre-training phase, and transfer learning. First, to make better use of limited data, DCGAN is used as the data augmentation method to generate simulated TC images. Second, to extract the salient characteristics of TCs, the images generated by DCGAN are input into the detection model YOLOv3 in the pre-training phase. Furthermore, based on the network-based deep transfer learning method, we train the detection model with real images of TCs, with its initial weights transferred from the YOLOv3 trained with generated images. Training with real images helps to extract universal characteristics of TCs, and using transferred weights as initial weights can improve the stability and accuracy of the model. The experimental results show that NDFTC performs better, with an accuracy (ACC) of 97.78% and an average precision (AP) of 81.39%, than YOLOv3, with an ACC of 93.96% and an AP of 80.64%.


Introduction
A tropical cyclone (TC) is a catastrophic weather system with enormous destructive force [1,2]. TCs encompass hurricanes, typhoons, and cyclone equivalents; they pose a serious threat to people's lives and property and cause huge losses to agricultural production and transportation [3][4][5][6][7]. Therefore, accurate detection of TCs is key to reducing these hazards [8,9].
Traditionally, the mainstream methods for TC detection have been numerical weather prediction (NWP) models, and a great deal of work has gone into developing NWP-based forecast systems that provide guidance for TC prediction using physics parameterizations and modeling techniques [10,11]. For example, in recent years the Met Office has provided objective real-time guidance for TC prediction and detection using its global numerical weather forecast model [12]. However, because numerical dynamical models depend on their initial values, their prediction errors grow as they simulate farther into the future [13].
The significant advantage of machine learning (ML) methods over traditional NWP-based detection methods is that ML methods do not require explicit physical assumptions [14]. For example, decision trees (DT) have been trained to classify TCs of different intensity levels, with an accuracy of about 84.6% for TC prediction 24 h in advance [15]. In addition, a convective initiation algorithm for the Meteorological Imager aboard the Communication, Ocean, and Meteorological Satellite was developed based on DT, random forest (RF), and support vector machines (SVM) [16,17].
Recently, deep learning models, a subset of ML methods, have performed well in detection tasks [18][19][20][21]. For detection in images, deep-learning-based object detection models fall into two main streams according to their processing stages: one-stage and two-stage detection models. The YOLO series [22][23][24], SSD [25], and RetinaNet [26] are typical one-stage detection models, whereas R-CNN [27], Fast R-CNN [28], and Faster R-CNN [29] are classic two-stage detection models. Broadly speaking, two-stage models achieve high accuracy through region proposals at the cost of large-scale computing resources, whereas one-stage models perform better when computing resources are limited.
Deep learning models have also been introduced into TC detection, for example, deep neural networks (DNN) for detecting existing TCs [30], precursor detection of TCs [31], tropical and extratropical cyclone detection [32], TC track forecasting [33], and TC precursor detection using a cloud-resolving global nonhydrostatic atmospheric model [34]. However, deep learning models usually require a large number of training samples, because high accuracy is difficult to achieve with limited training samples in computer vision and other fields [35][36][37]. Transfer learning can effectively alleviate this problem by transferring knowledge from a source domain to a target domain, further improving the accuracy of deep learning models [38][39][40][41].
Deep transfer learning studies how DNNs can make use of knowledge transferred from other domains [42]. According to the transfer techniques used, it falls into four main categories: instance-based, mapping-based, network-based, and adversarial-based deep transfer learning [42][43][44][45][46]. Instance-based deep transfer learning selects some instances from the source domain and adds them to the training set in the target domain [43]. Mapping-based deep transfer learning maps instances from the source domain and the target domain into a new data space [44]. Network-based deep transfer learning reuses part of the network pre-trained in the source domain, including its structure and connection parameters, as part of the DNN used in the target domain [45]. Adversarial-based deep transfer learning introduces adversarial techniques such as generative adversarial nets (GAN) to find transferable representations that apply to both the source domain and the target domain [46]. It is also worth noting that GAN has advantages in image processing and few-shot learning [47][48][49].
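Network-based transfer, the category NDFTC relies on, amounts to initializing some or all parameters of the target network with parameters learned in the source domain. The minimal sketch below assumes parameters are stored in a hypothetical name-to-array dictionary (deep learning frameworks expose similar mappings); the parameter names and shapes are illustrative only. In NDFTC the source and target networks are both single-class YOLOv3 models, so in that setting every parameter would match and be transferred.

```python
import numpy as np

def transfer_weights(source, target):
    """Initialize target parameters from a source model where possible.

    `source` and `target` are hypothetical dicts mapping parameter names to
    NumPy arrays. A parameter is copied only when the name exists in the
    source and the shapes agree; everything else keeps its own initialization.
    """
    initialized, transferred = {}, []
    for name, value in target.items():
        src = source.get(name)
        if src is not None and src.shape == value.shape:
            initialized[name] = src.copy()    # reuse source-domain parameters
            transferred.append(name)
        else:
            initialized[name] = value.copy()  # e.g. a task-specific output layer
    return initialized, transferred

# Usage: the backbone matches, but the output layer differs in shape
# (255 channels for 80 classes vs. 18 for one class), so only the backbone moves.
rng = np.random.default_rng(0)
source = {"backbone.conv1": rng.normal(size=(32, 3, 3, 3)),
          "head.out": rng.normal(size=(255, 256, 1, 1))}
target = {"backbone.conv1": np.zeros((32, 3, 3, 3)),
          "head.out": np.zeros((18, 256, 1, 1))}
init, moved = transfer_weights(source, target)
print(moved)  # ['backbone.conv1']
```

Copying arrays rather than sharing them keeps later fine-tuning of the target model from silently modifying the source model.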
To improve the accuracy of a TC detection model with limited training samples, we propose, on the basis of deep transfer learning, a new detection framework of tropical cyclones (NDFTC) from meteorological satellite images by combining deep convolutional generative adversarial networks (DCGAN) and the You Only Look Once (YOLO) v3 model.
The main contributions of this paper are as follows: (1) In view of the limited data volume and complex backgrounds of meteorological satellite images, a new detection framework of tropical cyclones (NDFTC) is proposed for accurate TC detection. The algorithm process of NDFTC consists of three major steps, namely data augmentation, a pre-training phase, and transfer learning, which ensure the effective detection of different kinds of TCs in complex backgrounds with limited data volume. (2) We use DCGAN as the data augmentation method instead of traditional data augmentation methods such as flipping and cropping. DCGAN generates simulated TC images by learning the salient characteristics of TCs, which improves the utilization of limited data. (3) We use the YOLOv3 model as the detection model in the pre-training phase. The detection model is trained with the images generated by DCGAN, which helps the model learn the salient characteristics of TCs. (4) In the transfer learning phase, YOLOv3 remains the detection model and is trained with real TC images. Most importantly, the initial weights of the model are transferred from the model trained with generated images, which is a typical network-based deep transfer learning method. As a result, the detection model can extract universal characteristics from real TC images and achieve high accuracy.

Materials and Methods
The flowchart of NDFTC is illustrated in Figure 1. The framework can be summarized in the following steps: (1) a dataset of meteorological satellite images of TCs is created; (2) the dataset is divided into three subsets, namely training dataset 1, training dataset 2, and the test dataset; (3) DCGAN is used as the data augmentation method to generate simulated TC images; (4) the images generated by DCGAN are input into the detection model YOLOv3 in the pre-training phase; and (5) the detection model is trained with real images of TCs, with its initial weights transferred from the YOLOv3 trained with generated images.
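The control flow of steps (3) to (5) can be sketched as follows. This is an illustrative skeleton, not the authors' code: `augment` and `train_detector` are hypothetical callables standing in for DCGAN sampling and YOLOv3 training, and the default iteration counts (10,000 pre-training and 40,000 transfer-learning iterations) are taken from the experiment setup of this paper.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

Weights = Dict[str, Any]

def ndftc_pipeline(
    train_set_1: Sequence[Any],
    train_set_2: Sequence[Any],
    augment: Callable[[Sequence[Any]], List[Any]],
    train_detector: Callable[[Sequence[Any], Weights, int], Weights],
    init_weights: Weights,
    pretrain_iters: int = 10_000,
    transfer_iters: int = 40_000,
) -> Tuple[Weights, int]:
    """Steps (3) to (5); creating and splitting the dataset happen beforehand."""
    generated = augment(train_set_1)                                      # step (3)
    pretrained = train_detector(generated, init_weights, pretrain_iters)  # step (4)
    # Step (5): train on real images, starting from the pre-trained weights.
    final = train_detector(train_set_2, pretrained, transfer_iters)
    return final, len(generated)

# Usage with stand-in callables that only record what they receive.
calls = []

def fake_augment(images):
    return [f"generated_{i}" for i in range(2 * len(images))]

def fake_train(data, weights, iters):
    calls.append((len(data), dict(weights), iters))
    return {"iterations": weights.get("iterations", 0) + iters}

final, n_generated = ndftc_pipeline(["a", "b"], ["r1", "r2", "r3"],
                                    fake_augment, fake_train, init_weights={})
print(final, n_generated)  # {'iterations': 50000} 4
```

Passing the training routine in as a parameter keeps the sketch independent of any particular DCGAN or YOLOv3 implementation; the only structural point it encodes is that the second training call starts from the weights returned by the first.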

Deep Convolutional Generative Adversarial Networks
As an active research area in artificial intelligence, generative adversarial networks (GAN) have developed rapidly in recent years and are widely used in image generation [50], image repair [51], visual prediction of typhoon clouds [52], and other fields.
A GAN contains a generator and a discriminator [50]. The generator aims to make the discriminator unable to distinguish between real and generated images, whereas the discriminator aims to distinguish between them as well as possible. The generator takes an n-dimensional vector as input and outputs an image; it can be any model that produces images, such as a simple fully connected neural network. The discriminator takes an image as input and outputs a label for that image (real or generated). Like the generator, the discriminator can be any suitable network, for example, one containing convolutional layers.
Deep convolutional generative adversarial networks (DCGANs) are an improvement on the original GAN [53]. The improvements are empirical rather than backed by strict mathematical proof, and they mainly consist of the following. Both the generator and the discriminator use convolutional neural networks (CNN). Batch normalization is used in both the generator and the discriminator. Neither the generator nor the discriminator uses pooling layers. The generator uses ReLU as the activation function, except for the output layer, which uses tanh. The discriminator retains the CNN structure, whereas the generator replaces convolution layers with fractionally strided convolutions. All layers of the discriminator use Leaky ReLU as the activation function.
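To make these architectural guidelines concrete, the sketch below builds only the layer layout (not trainable weights) of a small DCGAN. It assumes kernel size 4, stride 2, and padding 1, a hypothetical 4 × 4 × 512 seed projected from the noise vector, and single-channel 64 × 64 images; the text does not state the layer sizes used in NDFTC, so all sizes here are illustrative. One detail comes from the original DCGAN paper rather than from this text: batch normalization is omitted on the generator output layer and on the discriminator input layer.

```python
def deconv_out(size, kernel=4, stride=2, padding=1):
    """Spatial size after a fractionally strided (transposed) convolution."""
    return (size - 1) * stride - 2 * padding + kernel

def conv_out(size, kernel=4, stride=2, padding=1):
    """Spatial size after a strided convolution (used instead of pooling)."""
    return (size + 2 * padding - kernel) // stride + 1

def generator_spec(seed_size=4, out_size=64, seed_channels=512, out_channels=1):
    # The noise vector z is assumed to be projected to (seed_channels, seed_size, seed_size).
    layers, size, channels = [], seed_size, seed_channels
    while size < out_size:
        size = deconv_out(size)
        last = size >= out_size
        channels = out_channels if last else channels // 2
        layers.append({"op": "conv_transpose", "out": (channels, size, size),
                       "batch_norm": not last,  # original DCGAN: no BN on output layer
                       "activation": "tanh" if last else "relu"})
    return layers

def discriminator_spec(in_size=64, first_channels=64, final_size=4):
    # A final conv + sigmoid producing one real/fake probability is omitted here.
    layers, size, channels = [], in_size, first_channels
    while size > final_size:
        size = conv_out(size)
        layers.append({"op": "conv", "out": (channels, size, size),
                       "batch_norm": bool(layers),  # original DCGAN: no BN on input layer
                       "activation": "leaky_relu"})
        channels *= 2
    return layers

for layer in generator_spec():
    print(layer["out"], layer["activation"])
# (256, 8, 8) relu
# (128, 16, 16) relu
# (64, 32, 32) relu
# (1, 64, 64) tanh
```

Each fractionally strided convolution doubles the spatial size, and each strided convolution in the discriminator halves it, which is how DCGAN replaces explicit upsampling and pooling layers.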

You Only Look Once (YOLO) v3 Model
The detection model of NDFTC is the YOLOv3 model [24]. YOLOv3 is used because it detects at least twice as fast as SSD, RetinaNet, and Faster R-CNN [24], which enables real-time TC detection and supports disaster prevention and mitigation. In addition, YOLOv3 adopts the idea of feature pyramid networks, which enables accurate detection of both large and small objects.
The backbone network of YOLOv3 is Darknet-53, which uses successive 3 × 3 and 1 × 1 convolutional layers. It has 53 convolutional layers in total, as shown in Figure 1, hence the name Darknet-53. In addition, Darknet-53 contains a large number of residual blocks to counter the gradient problems that arise as the network deepens. In the model, batch normalization is placed before the Leaky ReLU activation function, which alleviates the vanishing gradient problem. Note that the concat operation does not add different feature maps numerically; it concatenates them directly along the channel dimension.
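The difference between the concat operation and the element-wise addition used in residual shortcuts can be shown with NumPy. The feature-map shapes are hypothetical, chosen to resemble a 32 × 32 YOLOv3 scale, and the residual branch is a stand-in function rather than real convolution layers.

```python
import numpy as np

# Hypothetical feature maps in (channels, height, width) layout, e.g. an
# upsampled deep map and a shallower backbone map at the same 32 x 32 scale.
deep = np.ones((256, 32, 32), dtype=np.float32)
shallow = np.full((512, 32, 32), 2.0, dtype=np.float32)

# concat: stack along the channel axis; values are kept side by side.
merged = np.concatenate([deep, shallow], axis=0)

# Residual shortcut: element-wise addition, which requires identical shapes.
def residual(x, block):
    return x + block(x)

summed = residual(shallow, lambda t: 0.5 * t)  # stand-in for conv layers

print(merged.shape, summed.shape)  # (768, 32, 32) (512, 32, 32)
```

Concatenation grows the channel count (256 + 512 = 768) and preserves both inputs unchanged, whereas addition keeps the shape and mixes the values.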
As for the change in image size during TC detection, the input meteorological satellite images have a size of 512 × 512 pixels, and the model outputs feature maps at three scales. The first feature map is obtained by down-sampling by a factor of 32, giving a size of 16 × 16. The second is obtained by down-sampling by a factor of 16, giving 32 × 32, and the third by down-sampling by a factor of 8, giving 64 × 64. These down-sampling factors follow the standard YOLOv3 design of Redmon et al. [24], which extracts TC features at different scales and thus improves the detection accuracy for different kinds of TCs. The third dimension of all three feature maps is 18: there are three anchor boxes per grid cell, and each box has a 1-dimensional confidence value, 4-dimensional prediction values (x, y, w, h), and a 1-dimensional object class value, so the depth is 3 × (4 + 1 + 1) = 18.
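The feature-map sizes and the depth of 18 follow directly from the input size, the three down-sampling factors, and the per-anchor vector length. A minimal sketch of that arithmetic (the function name is hypothetical):

```python
def yolo_output_shapes(input_size=512, strides=(32, 16, 8),
                       anchors_per_scale=3, num_classes=1):
    """Return (grid_h, grid_w, depth) for each YOLOv3 detection scale."""
    if input_size % max(strides):
        raise ValueError("input size must be divisible by the largest stride")
    depth = anchors_per_scale * (4 + 1 + num_classes)  # box + confidence + classes
    return [(input_size // s, input_size // s, depth) for s in strides]

print(yolo_output_shapes())  # [(16, 16, 18), (32, 32, 18), (64, 64, 18)]
```

With the common 416 × 416 input and the 80 COCO classes, the same function gives the familiar 13 × 13 × 255 first-scale output, a quick sanity check of the formula.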
Note that once the number of anchor boxes is determined, the dimensions of the confidence values, prediction values, and object class values are also determined [23]. In general, an anchor box has a 1-dimensional confidence value, which reflects the IOU between the predicted box and the ground-truth box and thus the detection quality of this anchor box [22]. An anchor box has 4-dimensional prediction values, which give the coordinates of the box [22]. An anchor box has only a 1-dimensional object class value, because our study detects only TCs and no other objects.
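Each 18-channel cell vector can be viewed as three anchor vectors of length 6. The sketch below reshapes a feature map accordingly; it assumes the per-anchor channel order [x, y, w, h, confidence, class] used by the Darknet implementation of YOLOv3, which the text does not specify, and `split_predictions` is a hypothetical helper.

```python
import numpy as np

def split_predictions(feature_map, num_anchors=3, num_classes=1):
    """Split an (H, W, num_anchors * (5 + num_classes)) map into named parts.

    Assumes the per-anchor channel order [x, y, w, h, confidence, classes...].
    """
    h, w, depth = feature_map.shape
    per_anchor = 4 + 1 + num_classes
    if depth != num_anchors * per_anchor:
        raise ValueError(f"expected depth {num_anchors * per_anchor}, got {depth}")
    grid = feature_map.reshape(h, w, num_anchors, per_anchor)
    return {"box": grid[..., 0:4],        # (H, W, anchors, 4)
            "confidence": grid[..., 4],   # (H, W, anchors)
            "class": grid[..., 5:]}       # (H, W, anchors, num_classes)

fm = np.arange(16 * 16 * 18, dtype=np.float32).reshape(16, 16, 18)
parts = split_predictions(fm)
print({k: v.shape for k, v in parts.items()})
# {'box': (16, 16, 3, 4), 'confidence': (16, 16, 3), 'class': (16, 16, 3, 1)}
```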

Loss Function
The loss function measures the error between the predicted value and the true value and is one of the key factors determining detection performance. The loss of NDFTC includes the loss of DCGAN and the loss of YOLOv3.

Loss Function of DCGAN
The loss function of DCGAN includes the loss function of generator G and the loss function of discriminator D. When the generator is trained, parameters of the discriminator are fixed. When training the discriminator, parameters of the generator are fixed.
The purpose of the generator is to make the discriminator unable to distinguish between real and generated TC images. First, the adversarial loss is introduced. $G(X)$ represents the TC images generated by the generator, $Y$ represents the corresponding real images, and $D(\cdot)$ represents the probability, output by the discriminator, that its input is a real image. The adversarial loss is as follows:
$$L_{adv} = -\log D\big(G(X)\big) \tag{1}$$
By minimizing Formula (1), the generator learns to fool the discriminator, so that the discriminator cannot distinguish between real images and generated images. Next, the $L_1$ loss function is introduced to measure the distance between generated images and real images:
$$L_{L1} = \frac{1}{WH}\sum_{x=1}^{W}\sum_{y=1}^{H}\big|Y(x,y) - G(X)(x,y)\big| \tag{2}$$
where $(x, y)$ represents pixel coordinates, and $W$ and $H$ are the width and height of TC images, respectively.
The generator's total loss function is as follows:
$$L_G = \lambda_1 L_{adv} + \lambda_2 L_{L1} \tag{3}$$
where $\lambda_1$ and $\lambda_2$ are empirical weight parameters. The generator can generate high-quality images of TCs by minimizing Formula (3).
The purpose of the discriminator $D$ is to distinguish between real and generated TC images. To achieve this goal, the adversarial loss function of the discriminator is as follows:
$$L_D = -\log D(Y) - \log\big(1 - D(G(X))\big) \tag{4}$$
In Formula (4), if a real image is wrongly judged as generated ($D(Y) \to 0$), or a generated image is wrongly judged as real ($D(G(X)) \to 1$), the loss tends to infinity, which means that the discriminator still needs to be optimized. As the value of Formula (4) decreases, the discriminator becomes better trained.
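The generator and discriminator losses can be evaluated numerically with NumPy. This minimal sketch assumes the common forms of these losses: the generator's adversarial term -log D(G(X)), an L1 (mean absolute error) pixel loss, and the cross-entropy discriminator loss -log D(Y) - log(1 - D(G(X))). The weights `lam_adv` and `lam_pix` are hypothetical placeholders for the empirical weight parameters, and `EPS` is added only to keep the logarithms finite.

```python
import numpy as np

EPS = 1e-12  # keeps log() finite; the ideal losses diverge when D is exactly 0 or 1

def generator_loss(d_fake, fake, real, lam_adv=1.0, lam_pix=1.0):
    """Weighted adversarial + L1 loss for the generator (hypothetical weights).

    d_fake: discriminator outputs D(G(X)) in (0, 1) for a batch of generated images.
    fake, real: generated images G(X) and the corresponding real images Y.
    """
    adv = -np.mean(np.log(d_fake + EPS))  # small when D(G(X)) -> 1 (D is fooled)
    pix = np.mean(np.abs(real - fake))    # mean absolute error over all pixels
    return lam_adv * adv + lam_pix * pix, adv, pix

def discriminator_loss(d_real, d_fake):
    """Cross-entropy loss: real images should score 1, generated images 0."""
    return -np.mean(np.log(d_real + EPS) + np.log(1.0 - d_fake + EPS))

real = np.zeros((2, 8, 8))
fake = np.full((2, 8, 8), 0.5)
total, adv, pix = generator_loss(np.array([0.5, 0.5]), fake, real)
confused = discriminator_loss(np.array([0.5]), np.array([0.5]))  # equals 2 * log(2)
```

A discriminator that outputs 0.5 for everything has loss 2 log 2 ≈ 1.386, while confidently wrong outputs push the loss toward the EPS-limited maximum, mirroring the divergence described for Formula (4).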

Loss Function of YOLOv3
The loss function of YOLOv3 includes the bounding box loss, the confidence loss, and the classification loss. The smaller the loss value, the better the performance of the model. The parameters involved in the loss function are introduced below.
The model divides the input image into an $S \times S$ grid. If the center of a TC falls into a grid cell, that grid cell is responsible for detecting the TC. Each grid cell predicts $B$ bounding boxes and their confidence scores, which reflect how confident the model is that a box contains an object.
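Assigning a TC to its responsible grid cell is integer division of the TC center by the cell size. In this sketch, the 512-pixel input and the grid sizes 16, 32, and 64 match the three YOLOv3 scales described earlier; the function name and the clamping of centers lying exactly on the far image border are illustrative assumptions.

```python
def responsible_cell(cx, cy, image_size=512, grid_size=16):
    """Return (row, col) of the grid cell containing the TC center (cx, cy) in pixels."""
    cell = image_size / grid_size               # pixels per grid cell
    col = min(int(cx // cell), grid_size - 1)   # clamp centers on the right/bottom edge
    row = min(int(cy // cell), grid_size - 1)
    return row, col

for s in (16, 32, 64):
    print(s, responsible_cell(300, 100, grid_size=s))
# 16 (3, 9)
# 32 (6, 18)
# 64 (12, 37)
```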
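The first part of the total loss function is the bounding box loss, which measures the difference between the true box and the predicted box:
$$L_{box} = \sum_{i=1}^{N}\mathbb{1}_{i}^{obj}\left[(x_i-\hat{x}_i)^2 + (y_i-\hat{y}_i)^2 + (w_i-\hat{w}_i)^2 + (h_i-\hat{h}_i)^2\right] \tag{5}$$
where $N$ is the number of bounding boxes, $\mathbb{1}_{i}^{obj}$ equals 1 if box $i$ is responsible for a TC and 0 otherwise, and $(x_i, y_i, w_i, h_i)$ are the positional parameters of the predicted box: $x_i$ and $y_i$ are the center point coordinates, and $w_i$ and $h_i$ are the width and height of the predicted box, respectively. Similarly, $(\hat{x}_i, \hat{y}_i, \hat{w}_i, \hat{h}_i)$ are the parameters of the true box.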
The second part of the total loss function is the confidence loss, which reflects how confident the model is that a box contains an object:
$$L_{conf} = -\sum_{i=1}^{N}\left[\hat{C}_i\log C_i + (1-\hat{C}_i)\log(1-C_i)\right] \tag{6}$$
where $N$ is the number of anchor boxes, $C_i$ represents the predicted probability that an object is present in anchor box $i$, and $\hat{C}_i \in \{0, 1\}$ indicates whether an object is actually present in anchor box $i$ (1 means yes and 0 means no).
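The third part of the total loss function is the classification loss:
$$L_{cls} = -\sum_{i=1}^{N}\mathbb{1}_{i}^{obj}\sum_{k=1}^{K}\left[\hat{p}_i(k)\log p_i(k) + \big(1-\hat{p}_i(k)\big)\log\big(1-p_i(k)\big)\right] \tag{7}$$
where $N$ is the number of anchor boxes, $\mathbb{1}_{i}^{obj}$ equals 1 if anchor box $i$ is responsible for a TC, $p_i(k)$ represents the probability of an object of class $k$ in anchor box $i$, and $\hat{p}_i(k) \in \{0, 1\}$ indicates whether an object of class $k$ is present in anchor box $i$ (1 means yes and 0 means no). In this paper, there is only one kind of object, so $K = 1$. To sum up, the total loss function of the YOLOv3 model is as follows:
$$L = \lambda_{box}L_{box} + \lambda_{conf}L_{conf} + \lambda_{cls}L_{cls} \tag{8}$$
where $\lambda_{box}$, $\lambda_{conf}$, and $\lambda_{cls}$ are empirical weight parameters, and $\lambda_{box} = \lambda_{conf} = \lambda_{cls} = 1$ in this paper.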
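The three loss terms and their weighted sum can be computed over N anchor boxes as below. This sketch assumes squared error for the bounding box term and binary cross-entropy for the confidence and class terms, with the class term counted only for anchors responsible for a TC; the weights default to 1 as in this paper, and the toy inputs are invented for illustration.

```python
import numpy as np

def bce(p, t, eps=1e-12):
    """Element-wise binary cross-entropy between probabilities p and targets t."""
    p = np.clip(p, eps, 1.0 - eps)
    return -(t * np.log(p) + (1.0 - t) * np.log(1.0 - p))

def yolo_loss(pred_box, true_box, obj, pred_conf, pred_cls, true_cls,
              weights=(1.0, 1.0, 1.0)):
    """Weighted sum of box, confidence, and class losses over N anchor boxes.

    pred_box, true_box: (N, 4) arrays of (x, y, w, h)
    obj:                (N,) array, 1 where an anchor is responsible for a TC, else 0
    pred_conf:          (N,) predicted object probabilities C_i
    pred_cls, true_cls: (N, K) class probabilities and 0/1 targets (K = 1 for TCs)
    """
    box = np.sum(obj[:, None] * (pred_box - true_box) ** 2)
    conf = np.sum(bce(pred_conf, obj))
    cls = np.sum(obj[:, None] * bce(pred_cls, true_cls))
    w_box, w_conf, w_cls = weights
    return w_box * box + w_conf * conf + w_cls * cls, (box, conf, cls)

# Two anchors: the first is responsible for a TC, the second is background.
pred_box = np.array([[0.5, 0.5, 0.2, 0.2], [0.1, 0.1, 0.1, 0.1]])
true_box = np.array([[0.4, 0.5, 0.2, 0.4], [0.0, 0.0, 0.0, 0.0]])
obj = np.array([1.0, 0.0])
total, (box, conf, cls) = yolo_loss(pred_box, true_box, obj,
                                    pred_conf=np.array([0.9, 0.2]),
                                    pred_cls=np.array([[0.8], [0.3]]),
                                    true_cls=np.array([[1.0], [0.0]]))
```

Masking with `obj` means the background anchor contributes only to the confidence term, where it is penalized for predicting any object probability above zero.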

Algorithm Process
Based on the above description, the algorithm process of NDFTC is summarized in Algorithm 1.

Data Set
The dataset we used includes meteorological satellite observation images of the Southwest Pacific area from 1979 to 2019. These images, provided by the National Institute of Informatics, are 512 × 512 pixels in size. For more details on the meteorological satellite images used in this study, see [54] and the website: http://agora.ex.nii.ac.jp/digital-typhoon/search_date.html.en#id2 (accessed on 29 March 2021).
In this paper, a total of 2400 real TC images were used. Of these, 600 real images were input into the DCGAN model to produce 1440 generated images for training the detection model in the pre-training phase. Of the remaining 1800 real TC images, 80% (1440 images, from 1979 to 2011) were used to train the model, and 20% (360 images, from 2011 to 2019) were used to test it.
In other words, in the transfer learning phase, training and test data were selected according to the time at which the TC was captured by the meteorological satellite: the 80% used for training was historical data from 1979 to 2011, whereas the 20% used for testing was recent data from 2011 to 2019. Training with historical data and testing with recent data has proven effective in applications of deep learning in meteorology [55], so we adopted this data selection method as well.
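A chronological split of this kind can be sketched as sorting by capture time and cutting at 80%. The record format and the synthetic records in the usage example are hypothetical; they only demonstrate that no training image is more recent than any test image.

```python
def chronological_split(records, train_fraction=0.8):
    """Split (capture_time, image_id) records: oldest -> training, newest -> test."""
    ordered = sorted(records, key=lambda r: r[0])
    cut = round(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]

# Hypothetical records: 1800 real images with capture years spread over 1979-2019.
records = [(1979 + i % 41, f"tc_{i:04d}") for i in range(1800)]
train, test = chronological_split(records)
print(len(train), len(test))  # 1440 360
```

Unlike a random split, this ordering prevents the model from seeing later TCs during training, which better reflects how the detector would be used on new observations.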

Experiment Setup
To demonstrate the advantages of NDFTC in both the training process and the detection results, a TC detection model based only on YOLOv3, without NDFTC, was also trained for comparison. To train and test this comparison model, we again used the 2400 real TC images, 80% for training and 20% for testing.
For fairness, both NDFTC and YOLOv3 were trained for a total of 50,000 iterations. NDFTC was first trained for 10,000 iterations on generated TC images and then for 40,000 iterations on real TC images, whereas the detection model based only on YOLOv3 was trained for 50,000 iterations on real TC images. The loss function values of NDFTC and the YOLOv3-only detection model during training are shown in Figure 2. To show the stability of NDFTC during training from another perspective, the region average IOU is also visualized in Figure 3. Region average IOU is the intersection over union (IOU) between the predicted box and the ground truth [22]. It is one of the most important indicators of model stability during training and is commonly reported for deep learning models such as YOLOv1 [22], YOLOv2 [23], YOLOv3 [24], and YOLOv4 [56]. In general, the closer it is to 1, the better the model is trained. In Figure 3, the region average IOU of the models in the training process was generally decreasing. However, the region average IOU of YOLOv3 oscillated more sharply in the later stage of training. Compared with the YOLOv3-only TC detection model, NDFTC oscillated less throughout training, which indicates that NDFTC converged faster and was more stable during training.
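Region average IOU can be computed as the mean IoU over matched pairs of predicted and ground-truth boxes. The sketch below assumes boxes in (x_center, y_center, width, height) form and that each ground-truth box has already been matched to one prediction; how Darknet selects and logs these pairs internally is not reproduced here.

```python
def to_corners(box):
    """Convert (x_center, y_center, width, height) to (x1, y1, x2, y2)."""
    x, y, w, h = box
    return x - w / 2, y - h / 2, x + w / 2, y + h / 2

def iou(a, b):
    """Intersection over union of two boxes in center format."""
    ax1, ay1, ax2, ay2 = to_corners(a)
    bx1, by1, bx2, by2 = to_corners(b)
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def region_average_iou(pred_boxes, true_boxes):
    """Mean IoU over (prediction, ground truth) pairs that were already matched."""
    pairs = list(zip(pred_boxes, true_boxes))
    return sum(iou(p, t) for p, t in pairs) / len(pairs) if pairs else 0.0

print(iou((0, 0, 2, 2), (1, 0, 2, 2)))  # overlap 2, union 6 -> about 0.333
```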

Results and Discussion
In order to evaluate the detection effect of the NDFTC proposed in this paper, ACC and AP were used as evaluation indexes.
ACC refers to accuracy, the proportion of images in which TCs are detected correctly by the model. It is defined as follows:
$$ACC = \frac{TP}{ALL} \tag{9}$$
where TP refers to the number of TC images detected correctly by the model, and ALL refers to the number of all images.
AP refers to average precision, which takes into account both false detections and missed detections; it is a common index for evaluating YOLO series models such as YOLOv1, YOLOv2, and YOLOv3 by Redmon et al. [22][23][24]. AP is defined by precision (P) and recall (R):
$$P = \frac{TP}{TP + FP} \tag{10}$$
$$R = \frac{TP}{TP + FN} \tag{11}$$
where TP refers to the number of TCs correctly recognized as TCs by the detection model, FP refers to the number of other objects recognized as TCs, and FN refers to the number of TCs recognized as other objects [57,58]. The P-R curve is then obtained by plotting the recall of TCs on the x-axis and the precision of TCs on the y-axis [59], and the area under this curve is the AP, which is the index used to evaluate the detection effectiveness of NDFTC. Figure 4 shows the ACC and AP of NDFTC and other models on the test set after 10,000, 20,000, 30,000, 40,000, and 50,000 training iterations; NDFTC performed better than YOLOv3 and the other models for the same number of training iterations. Overall, the experimental results show that NDFTC achieved an ACC of 97.78% and an AP of 81.39%, compared with an ACC of 93.96% and an AP of 80.64% for YOLOv3.
To evaluate the detection effect on different kinds of TCs, all TCs in the test set were divided into five categories. According to the National Standard for Tropical Cyclone Grade (GB/T 19201-2006), TC intensity includes tropical storm (TS), severe tropical storm (STS), typhoon (TY), severe typhoon (STY), and super typhoon (SuperTY). The ACC of NDFTC and the other models on the test set is shown in Table 1: NDFTC generally had a higher ACC, and its best result was an ACC of 98.59% for SuperTY detection. The AP of NDFTC and the other models on the test set is shown in Table 2: NDFTC generally had a higher AP, and its best result was an AP of 91.34% for STY detection.
Finally, an example of TC detection results is shown in Figure 5 for super typhoon Marcus in 2018. The detection result of NDFTC is more detailed, because its prediction box fits Marcus more closely. More importantly, compared with the TC detection model based only on YOLOv3, the detection result of NDFTC is more consistent with the physical characteristics of TCs, because the spiral rainbands at the bottom of Marcus are also included in the NDFTC detection box.
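The ACC, precision, recall, and AP metrics defined in this section can be sketched in plain Python. The paper does not state how the area under the P-R curve is computed, so `average_precision` below assumes the all-point interpolation used in PASCAL VOC-style evaluation; it also assumes detections have already been labeled as true or false positives (for example, with an IoU threshold of 0.5). The toy detections are invented.

```python
def accuracy(num_correct, num_images):
    """ACC = TP / ALL at the image level."""
    return num_correct / num_images

def precision_recall(tp, fp, fn):
    """P = TP / (TP + FP), R = TP / (TP + FN)."""
    return tp / (tp + fp), tp / (tp + fn)

def average_precision(detections, num_ground_truth):
    """Area under the P-R curve with all-point interpolation (an assumption).

    detections: (confidence, is_true_positive) pairs for one class.
    """
    tp = fp = 0
    recall, precision = [0.0], [0.0]
    for _, hit in sorted(detections, key=lambda d: d[0], reverse=True):
        if hit:
            tp += 1
        else:
            fp += 1
        recall.append(tp / num_ground_truth)
        precision.append(tp / (tp + fp))
    recall.append(1.0)
    precision.append(0.0)
    # Make precision monotonically non-increasing from right to left.
    for i in range(len(precision) - 2, -1, -1):
        precision[i] = max(precision[i], precision[i + 1])
    # Sum rectangle areas wherever recall increases.
    return sum((recall[i + 1] - recall[i]) * precision[i + 1]
               for i in range(len(recall) - 1) if recall[i + 1] != recall[i])

dets = [(0.9, True), (0.8, False), (0.7, True), (0.6, True)]  # 3 ground-truth TCs
print(round(average_precision(dets, 3), 4))  # 0.8333
```

Ranking detections by confidence and sweeping the threshold is what turns a single detector into a P-R curve; the interpolation step removes the zigzag so that the area is well defined.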

Discussion
First, we explain the complexity of NDFTC. Compared with the complex architecture and huge number of parameters of YOLOv3, the complexity of DCGAN, a relatively simple network, is negligible [60]. Therefore, given a limited dataset and the same scale of computing resources, the complexity of NDFTC is approximately equal to that of the YOLOv3 model. More importantly, compared with the YOLOv3 model, NDFTC further improved the detection accuracy of TCs with almost no increase in complexity, which indicates that NDFTC maintains its generalization performance.
Next, the way in which generated and real images are used in different phases should be emphasized. In 2020, Maryam Hammami et al. proposed a combined CycleGAN and YOLO model for data augmentation, in which generated data and real data are input into the YOLO detector simultaneously for training [61]. In our study, the detector was trained using only generated images in the pre-training phase and only real images in the transfer learning phase, which is a typical network-based deep transfer learning method. Additionally, the average IOU and loss function values during training are plotted in this paper to reflect the stability of NDFTC.
Furthermore, the allocation of the dataset needs explanation. In NDFTC, the initial dataset consists of meteorological satellite images of TCs, and when it is divided into training dataset 1, training dataset 2, and the test dataset according to Algorithm 1, both training datasets 1 and 2 must include real TC images. This means that training datasets 1 and 2 must both contain TC features, which is a prerequisite for adopting NDFTC.
Finally, we explain why 80% of the real TC images were used for training and the rest for testing. In general, this training-to-test ratio is common in deep learning for datasets that are not very large [62,63]. It is generally believed that when a dataset contains tens or even hundreds of thousands of images, the proportion of the training set can exceed 90% [63]. Since the TC dataset used in this paper has only thousands of images, 80% is an appropriate choice. More importantly, for object detection tasks with limited datasets, a smaller training set usually leads to lower accuracy, so we chose the common ratio of 80%.

Conclusions
In this paper, on the basis of deep transfer learning, we propose a new detection framework of tropical cyclones (NDFTC) from meteorological satellite images by combining DCGAN and YOLOv3. The algorithm process of NDFTC consists of three major steps, namely data augmentation, a pre-training phase, and transfer learning, which ensure the effective detection of different kinds of TCs in complex backgrounds with limited data volume. We used DCGAN instead of traditional data augmentation methods because DCGAN can generate simulated TC images by learning the salient characteristics of TCs, which improves the utilization of limited data. In the pre-training phase, we used YOLOv3 as the detection model and trained it with the images generated by DCGAN, which helped the model learn the salient characteristics of TCs. In the transfer learning phase, we trained the detection model with real images of TCs, with its initial weights transferred from the YOLOv3 trained with generated images; this is a typical network-based deep transfer learning method and can improve the stability and accuracy of the model. The experimental results show that NDFTC performed better, with an ACC of 97.78% and an AP of 81.39%, than YOLOv3, with an ACC of 93.96% and an AP of 80.64%. On the basis of these results, we believe that NDFTC, with its high accuracy, has promising potential for detecting different kinds of TCs and could benefit current TC detection tasks and similar detection tasks, especially those with limited data volume.

Data Availability Statement:
The data used in this study are openly available at the National Institute of Informatics (NII) at http://agora.ex.nii.ac.jp/digital-typhoon/search_date.html.en#id2 (accessed on 29 March 2021).

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: