Fast Automatic Airport Detection in Remote Sensing Images Using Convolutional Neural Networks

Fast and automatic detection of airports from remote sensing images is useful for many military and civilian applications. In this paper, a fast automatic detection method is proposed to detect airports from remote sensing images based on convolutional neural networks using the Faster R-CNN algorithm. This method first applies a convolutional neural network to generate candidate airport regions. Based on the features extracted from these proposals, it then uses another convolutional neural network to perform airport detection. By taking the typical elongated linear geometric shape of airports into consideration, some specific improvements to the method are proposed. These improvements raise the quality of the positive samples and achieve better accuracy in the final detection results. Experimental results on an airport dataset, Landsat 8 images, and a Gaofen-1 satellite scene demonstrate the effectiveness and efficiency of the proposed method.


Introduction
Fast and automatic detection of airports from remote sensing images is useful in many military and civilian applications [1,2]. In recent years, airport detection has gained increased attention and has been a topic of interest in computer vision and remote sensing research.
Many researchers have studied this problem and have reported valuable results in the literature. Some previous works focused on using the characteristics of runways, which generally have obvious elongated parallel edges [3][4][5][6][7]. They used edge or line segment extraction methods to delineate the shape of airport runways. Some scholars used textural information or shape features for airport detection via classifiers, such as the support vector machine (SVM) or the Adaptive Boosting (AdaBoost) algorithm [8][9][10]. In remote sensing images, however, there could be ground objects such as roads or coastlines, which demonstrate linear shape features or textural features similar to runways. This lowers the performance of these algorithms and can lead to a relatively high false alarm rate. Tao et al. [1] proposed to implement the airport detection task based on a set of scale-invariant feature transform (SIFT) local features. Within a dual-scale and hierarchical architecture, they first used clustered SIFT keypoints and segmented region information to locate candidate regions that may contain an airport in the coarse scale. Then, these candidate regions were mapped to the original scale and texture information was used for airport detection with a SVM classifier. Similarly, Zhu et al. [11] used a constraint to sieve low-quality positive samples.
The remainder of this paper is organized as follows. In Section 2, we first introduce the neural network architecture and the domain-specific fine-tuning and data augmentation methods of the proposed approach. We then present some specific improvement techniques for airport detection. In Section 3, we first test the proposed method on a set of airport images, and then evaluate its practical performance on Landsat 8 images and a Gaofen-1 satellite scene. Next, the results of these improvements are presented and discussed, and, finally, conclusions are drawn in Section 4.

Methodology
In the machine learning field, CNNs have demonstrated much better feature representation than traditional methods [21] and have shown great potential in many visual applications. A CNN is formed by a stack of distinct layers, where each successive layer uses the output from the previous layer as input. These multiple layers make the network expressive enough to learn complex relationships between inputs and outputs. By using an appropriate loss function on a set of labeled training data, which penalizes the deviation between the predicted and true labels, the weights of these layers can be obtained through supervised learning. In the following, we first introduce the network architecture of the proposed method for airport detection, and then present some specific improvement techniques of the proposed method. The flowchart of the proposed algorithm is shown in Figure 1.

Convolutional Neural Network for Airport Detection
The Visual Geometry Group network with 16 layers (VGG-16) [22] was used in our experiments. This network has demonstrated better performance than many other networks by using a deeper convolutional network architecture [17,18,22]. The network has 13 convolutional layers, where the filters have 3 × 3 receptive fields with stride 1. The convolutional process is defined as
$y_{i,j,d} = \sum_{i'} \sum_{j'} \sum_{d'} f_{i',j',d',d} \, x_{i+i'-1,\,j+j'-1,\,d'} + b_d$,
where $x_{i,j,d}$ and $y_{i,j,d}$ are the respective input and output neuron values at position $(i, j)$ in the $d$th channel, $f$ is the filter, and $b_d$ is the bias in the $d$th channel. The nonlinear rectified linear unit (ReLU)
$y = \max(0, x)$,
where $x$ is the input to a neuron, is used for each layer. Max-pooling
$y_{i,j,d} = \max_{i'=1,2;\, j'=1,2} x_{2(i-1)+i',\,2(j-1)+j',\,d}$,
where $x_{i,j,d}$ and $y_{i,j,d}$ are the input and output neuron values at position $(i, j)$ in the $d$th channel, respectively, is performed with stride 2, following two or three convolutional layers [21][22][23]. A RoI pooling layer and two fully connected layers were used to extract a fixed-length (4096-dimensional in the experiments) feature vector from the convolutional feature map for each proposal generated by the RPN, which is introduced in the following section. The RoI pooling layer divides a region proposal into a fixed-size grid and then, in each grid cell, applies max-pooling to the values in the corresponding sub-window of the feature map. The extracted feature vector was used in one fully connected layer to produce the softmax probability estimate for the airport class, and in another fully connected layer to produce the four values that encode the refined bounding box position of the airport via regression. For the two fully connected layers, dropout regularization [24] was performed with probability 0.5 for individual nodes. More details about the network architecture can be found in [17,22]. If the score of a proposal generated by the network is larger than a pre-specified threshold, an airport is claimed to be detected in the refined proposal.
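As an illustration, the three operations above (convolution, ReLU, and 2 × 2 max-pooling with stride 2) can be sketched in plain NumPy. This is a simplified single-output-channel version for clarity; the function names are ours and not part of the VGG-16 implementation:

```python
import numpy as np

def conv2d(x, f, b):
    """Convolution for one output channel: each output value is the sum of
    the filter f applied over the 3x3 receptive field and all input
    channels of x, plus the bias b (stride 1, no padding)."""
    H, W, D = x.shape
    kh, kw, _ = f.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(f * x[i:i + kh, j:j + kw, :]) + b
    return out

def relu(x):
    """Rectified linear unit: max(0, x), applied element-wise."""
    return np.maximum(0.0, x)

def max_pool(x):
    """2x2 max-pooling with stride 2 on a single channel."""
    H, W = x.shape
    return x[:H - H % 2, :W - W % 2].reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))
```

A 6 × 6 single-channel input convolved with a 3 × 3 filter yields a 4 × 4 map, which the pooling layer halves to 2 × 2.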

Region Proposal Network
RPN is a deep network that can produce proposals in an efficient forward computation from the feature map produced by the CNN described in the previous section. It first adds an additional convolutional layer on top of the feature map obtained by that CNN to extract a fixed-length (512-dimensional in the experiments) feature vector at each convolutional map position. Then, the extracted feature vector is fed into two additional sibling fully connected layers. One outputs a set of rectangular region positions corresponding to different scales and aspect ratios at that convolutional map position, and the other outputs the scores of these rectangular regions. ReLU is also used for the convolutional layer. Because the RPN shares the convolutional layers with the CNN for airport detection, the RPN adds only a small computational load to the original CNN for airport detection. More details of the RPN architecture can be found in [18].
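A minimal NumPy sketch of the two sibling heads, assuming the 512-dimensional intermediate features have already been computed by the shared convolutional layers. The weights here are random stand-ins, and k = 9 corresponds to the 3 scales × 3 aspect ratios of the original Faster R-CNN [18]:

```python
import numpy as np

def rpn_heads(feat, w_cls, w_reg):
    """Apply the two sibling fully connected (1x1 convolution) heads at
    every spatial position of the intermediate feature map: one produces
    objectness scores, the other box regression offsets."""
    H, W, C = feat.shape
    scores = feat.reshape(-1, C) @ w_cls   # 2 logits per anchor (object / background)
    deltas = feat.reshape(-1, C) @ w_reg   # 4 offsets per anchor (box refinement)
    return scores.reshape(H, W, -1), deltas.reshape(H, W, -1)

rng = np.random.default_rng(0)
H, W, C, k = 8, 8, 512, 9                 # toy 8x8 map; 9 anchors per position
feat = np.maximum(0.0, rng.normal(size=(H, W, C)))  # ReLU'd 512-d features
w_cls = rng.normal(0.0, 0.01, (C, 2 * k))           # random stand-in weights
w_reg = rng.normal(0.0, 0.01, (C, 4 * k))
scores, deltas = rpn_heads(feat, w_cls, w_reg)
```

Because both heads are 1 × 1 operations over a feature map that is shared with the detection network, the proposal stage adds only two small matrix multiplications per position.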
Note that the region proposals produced by the RPN generally overlap each other. Therefore, non-maximum suppression (NMS) is performed on the produced proposals based on their scores. A region proposal is rejected if it has an intersection-over-union (IoU) overlap higher than a specified threshold with another region proposal that has a higher score. IoU was computed as the intersection divided by the union of two sets:
$\mathrm{IoU}(A, B) = \operatorname{area}(A \cap B) / \operatorname{area}(A \cup B)$,
where $A \cup B$ and $A \cap B$ denote the union and intersection of the two concerned sets, respectively, and $\operatorname{area}(\cdot)$ denotes the area of a set. After NMS, the top-300 proposals are fed into the RoI pooling layer of the CNN described above to produce the final airport detection.
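The IoU formula and the greedy suppression rule above can be sketched as follows (function names are ours; boxes are (x1, y1, x2, y2) tuples):

```python
import numpy as np

def iou(a, b):
    """IoU = area(A ∩ B) / area(A ∪ B) for two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.7, top_n=300):
    """Greedy NMS: visit proposals in descending score order and keep a
    proposal only if it does not overlap a kept one above the threshold."""
    order = np.argsort(scores)[::-1]
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
        if len(keep) == top_n:
            break
    return keep
```

For example, two heavily overlapping boxes collapse to the higher-scoring one, while a distant box survives.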

Fine-Tuning and Data Augmentation
In our collected airport dataset, which is introduced in the following section, only a limited number of airports were available. As the CNN requires a large amount of data for training in order to avoid overfitting, we used the pre-trained network weights of the VGG model, which was successfully trained with the widely used ImageNet Large Scale Visual Recognition Challenge (ILSVRC) dataset [18,21]. To adapt the pre-trained CNN to the airport detection task, we fine-tuned these network weights using the collected airport dataset. Due to limited memory, the shared convolutional layers were initialized by the pre-trained VGG model weights and only the conv3_1 (the first layer of the third stack of convolutional layers) and upper layers were fine-tuned. Because the RPN shares the convolutional layers with the CNN for airport detection, these two networks can be trained alternately, switching between fine-tuning for the region proposal task and fine-tuning for the airport detection task. In our experiments, we used augmentation techniques such as horizontal flip, vertical flip, image rotation, and image translation to enlarge the training data. It was observed that this domain-specific fine-tuning allows learning good network weights for a high-capacity CNN for airport detection.
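The four augmentation operations mentioned above can be sketched with NumPy array transforms (a simplified illustration; the function name is ours, and bounding-box labels would need to be transformed identically in practice):

```python
import numpy as np

def augment(img):
    """Produce flipped, rotated, and translated copies of a training image."""
    out = [np.fliplr(img),        # horizontal flip
           np.flipud(img),        # vertical flip
           np.rot90(img)]         # 90-degree rotation
    shifted = np.zeros_like(img)  # translate 10 pixels right, zero-padded
    shifted[:, 10:] = img[:, :-10]
    out.append(shifted)
    return out
```

Each source image thus yields several derived samples, which is how a few hundred collected images can be enlarged into a training set of tens of thousands.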

Improvement Techniques
In the original Faster R-CNN algorithm [18], which was proposed for detecting ordinary objects, nine region proposals were predicted at each feature map position, at three scales of 128², 256², and 512², and three aspect ratios of 1:1, 1:2, and 2:1. However, we found that there are many small airfields in remote sensing images of medium spatial resolution (e.g., 16 m), especially in America, Europe, and on certain islands. In such cases, the smallest scale of 128² would be too large to accurately locate them. Therefore, in our implementation, another scale of 64² was added. For example, in Figure 2, there is a small airport (Matsu Peigan) located in the scene, and its bounding box is indicated by the blue rectangle. One of the positive samples, which were obtained by the original Faster R-CNN algorithm with the smallest scale of 128², is indicated by the red rectangle in Figure 2a. Although this positive sample includes the whole area of the airfield, it is too large and most of the pixels in it belong to other ground objects. This redundant information lowers the quality of the training samples. By using the smaller scale of 64², the positive sample shown in Figure 2b is more accurate than the one shown in Figure 2a.
The runway, which always demonstrates an elongated linear geometric shape, is the most important feature for airport detection. We found that the aspect ratios of 1:1, 1:2, and 2:1, which are generally sufficient for detecting ordinary objects, cannot accurately locate those vertical or horizontal runways that have elongated bounding boxes. Therefore, two aspect ratios of 1:3 and 3:1 were added in our implementation. For example, in Figure 3, there is a nearly horizontal airport (Matsu Nangan airport) and its bounding box, indicated by the blue rectangle, demonstrates an elongated geometric shape.
One of the positive samples, which were obtained by the original Faster R-CNN algorithm with the aspect ratio of 1:2, is shown as the red rectangle in Figure 3a. One can see that this positive sample only covers a part of the runway area. By using the larger aspect ratio of 1:3, one can obtain a better positive sample (shown as the red rectangle in Figure 3b). In our experiments, we observed that the added scale and aspect ratios helped to increase both the quality of positive samples and the final accuracy of airport detection.
In CNN training by back-propagation and stochastic gradient descent [25], the quality of the positive samples is important to obtain good network weights, which will affect the final detection accuracy. In the original Faster R-CNN algorithm [18], which was proposed for ordinary object detection, positive samples were assigned to the region that has the highest IoU overlap with the bounding box, or to those regions that have an IoU overlap higher than 0.7 with the bounding box. This works for ordinary objects that have a relatively small aspect ratio. However, in this paper we want to detect airports, which have elongated linear shapes. Simply using the IoU overlap with the bounding box for that purpose could generate positive samples that cover only a small fraction of the runway area in the ground truth. As the runway is often the only obvious feature that can be used for airport detection, this could lead to low-quality positive samples.
Nominal positive samples that are generated from airfield images in which the runway has a diagonal or anti-diagonal direction, for example, may cover only a small part of that runway. We observed that such samples contain so little valuable information that they cannot have a positive effect on network training. In Figure 4a, the Zhangjiakou Ningyuan airport (blue rectangle indicates the bounding box) has a runway with an elongated linear shape in the diagonal direction. The ground-truth runway area occupies only a small fraction of the pixels in its bounding box. One of the positive samples, obtained using the rules of the original Faster R-CNN algorithm (red rectangle in Figure 4a), only covers a very small fraction of the runway, located in the top right corner. Most of the information in this nominal positive sample is therefore redundant. In Figure 4b, there is a smaller airport (Yangzhou Taizhou airport). One of the positive samples that were obtained using the original Faster R-CNN algorithm (red rectangle) only covers the terminal building and a very small fraction of the runway.
As the runway could be the only detectable feature in this airport, we believe that this positive sample is not a typical airport image. In fact, it is even very hard for a human to determine whether it is an airport by looking at only this sample. We therefore propose to add a constraint on the positive samples to sieve out the low-quality ones, i.e., those that only cover a few airport pixels, such as the examples shown in Figure 4. In our implementation, if a positive sample obtained by the original Faster R-CNN algorithm covers less than 70% of the ground-truth runway pixels, it is discarded in the training process. Although this decreases the number of positive samples for CNN training, in our experiments we observed that the final accuracy of airport detection did not decrease but increased instead.
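The two improvements in this section, i.e., the extra 64² scale with 1:3/3:1 aspect ratios and the 70% runway-coverage sieve, can be sketched as follows. This is a simplified illustration under our own parameterization (area ≈ scale², ratio = height:width); the helper names are ours:

```python
import numpy as np

def make_anchors(scales=(64, 128, 256, 512), ratios=(1/3, 1/2, 1, 2, 3)):
    """Anchor (width, height) pairs for every scale and aspect ratio,
    including the added 64^2 scale and the 1:3 / 3:1 ratios."""
    anchors = []
    for s in scales:
        for r in ratios:
            w = s / np.sqrt(r)   # keep area ~ s^2 while height/width = r
            h = s * np.sqrt(r)
            anchors.append((w, h))
    return anchors

def keep_positive(sample_box, runway_mask, min_coverage=0.7):
    """Sieve rule: keep a positive sample only if its box covers at least
    min_coverage of the ground-truth runway pixels (a boolean mask)."""
    x1, y1, x2, y2 = sample_box
    covered = runway_mask[y1:y2, x1:x2].sum()
    return covered / runway_mask.sum() >= min_coverage
```

With 4 scales and 5 ratios this yields 20 anchors per feature map position, and a candidate that clips a diagonal runway near one corner of its ground-truth box is rejected rather than used for training.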

Dataset and Computational Platform
We collected images from 343 airports around the world from Google Earth. These airports have various sizes, from very large ones such as Chicago O'Hare to very small airfields such as Matsu Peigan. For each airport, images were collected at a spatial resolution of 8 m or 16 m and at different angles of view. The average size of these images is about 885 × 1613 pixels. 194 airports were randomly chosen as training data, which include 377 images collected at these airports. Some of the training airport images are shown in Figure 5. By using the augmentation methods mentioned in Section 2.3, an enlarged training dataset of about 28,000 images was obtained. The remaining 149 airports were used for detection performance testing, which include 281 images collected at these airports. Another 50 images, which have no airports but some similar ground objects, were also used for testing. Landsat 8 images acquired in four seasons, and a Gaofen-1 satellite scene that covers the Beijing-Tianjin-Hebei area, China, were also taken to test the practical performance of the proposed method.
All the experiments were implemented on a personal computer with double 3.6-GHz Intel i7 cores, 8 GB DDR4 memory, and an NVIDIA GeForce GTX1060 graphics card with 6 GB memory. The experiments were implemented using Caffe [26].


Results on Test Images
We qualitatively and quantitatively compared our proposed method with the original Faster R-CNN algorithm and the airport detection method proposed in [13] (this method will be denoted by LSD + AlexNet further in the text). We used the default parameter setting in the code (available online: https://github.com/rbgirshick/py-faster-rcnn) for both the original Faster R-CNN algorithm and our proposed method. The average IoU overlap ratio was used to assess location accuracy and evaluate the performance. In addition, the commonly used average precision (AP) [27] was adopted as the metric for evaluating detection performance. An obtained bounding box was considered correct if it had an IoU overlap with the ground truth that is higher than 50%. Otherwise it was considered incorrect. Other metrics of precision, recall, and accuracy were also used to evaluate the results [28,29]. The false alarm rate (FAR), which is defined as the ratio of the number of falsely detected airports to the number of testing images, was used as an additional metric for performance evaluation. Figure 6 shows the detection results of the Tianshui Maijishan airport by the Faster R-CNN algorithm (Figure 6a) and by the proposed method (Figure 6b). The Faster R-CNN algorithm wrongly identified an elongated water body as an airport, while our proposed method successfully detected the airport in the scene. Although the LSD + AlexNet method also detected this airport in the scene (Figure 6c), the location accuracy was slightly lower than the result shown in Figure 6b.
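The correctness rule and the derived metrics can be sketched as follows (a simplified one-detection-per-image illustration; the function names are ours):

```python
def iou(a, b):
    """IoU of two boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

def evaluate(detections, truths, iou_thresh=0.5):
    """detections[i] / truths[i]: the box for image i, or None.
    A detection counts as correct if its IoU with the ground truth
    exceeds iou_thresh; FAR is false detections per test image."""
    tp = fp = fn = 0
    for det, gt in zip(detections, truths):
        if det is not None and gt is not None and iou(det, gt) > iou_thresh:
            tp += 1
        elif det is not None:
            fp += 1   # false alarm: wrong location, or no airport present
        if gt is not None and (det is None or iou(det, gt) <= iou_thresh):
            fn += 1   # missed airport
    return tp / (tp + fp), tp / (tp + fn), fp / len(detections)
```

A mislocated detection on an image that does contain an airport thus counts both as a false alarm and as a miss, which is why precision and recall can move independently.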

The Frankfurt-Hahn airport (Figure 7) was detected by all three algorithms, but the proposed method (Figure 7b) is more accurate, as part of the airport area was not detected by the Faster R-CNN algorithm (Figure 7a). The LSD + AlexNet method detected the three runways separately in this scene. The Meixian Changgangji airport (Figure 8) was also detected more accurately by the proposed method than by the Faster R-CNN algorithm. The LSD + AlexNet method wrongly identified an elongated water body as an airport, although part of the airport area was included in the bounding box. All three approaches successfully detected the smaller Fuyang Xiguan airport (Figure 9). The LSD + AlexNet method only detected the runway, while both the proposed method and the Faster R-CNN algorithm successfully detected the whole airport area (including the terminal building).
Provided with a scene without an airport in it (Figure 10), both the Faster R-CNN algorithm and the LSD + AlexNet method wrongly detected an area with an elongated linear feature similar to a runway as an airport (shown in red rectangles). Our proposed method, by contrast, correctly determined that there was no airport present in this image.
In our experiments we found that most of the relatively large airports were successfully detected by the proposed method. Most of the errors occurred on small airfields that are difficult to distinguish from similar ground objects. This is because the runway is the only useful geometric feature for detecting small airfields, and in medium-resolution satellite images its elongated linear geometric shape is very similar to that of other ground objects such as highways and long ditches. There is no airport, for example, in the scene shown in Figure 11, but all three methods nevertheless falsely detected one. The detected area contains an elongated ground object that is visually similar to a runway.
We carried out three experiments with different random training sets and report the mean performance of the proposed method in Table 1. After adding the additional scale and aspect ratios (denoted as T1 in Table 1), the detection performance increased. In particular, the location accuracy and the precision of the proposed method are better than the results obtained by the original Faster R-CNN algorithm. After sieving low-quality positive samples (denoted by T2 in the table), the detection performance increased further.
This demonstrates that our specific improvement techniques, which were proposed to handle the elongated airport runway shapes, effectively improved detection performance. The final recall value (0.837) of the proposed method is slightly less than the recall value (0.838) of the original Faster R-CNN algorithm. This means that the former is only slightly more conservative than the latter, and that the proposed method can achieve better precision and accuracy with almost identical recall. The Receiver Operating Characteristics (ROC) graphs and the precisionrecall curves of the results (Figures 12 and 13) also demonstrate that the proposed method outperforms the original Faster R-CNN algorithm. The proposed method can achieve a lower false positive rate with the same true positive rate and also has a higher precision with the same recall value. The proposed method and the original Faster R-CNN algorithm, which both generate proposals based on the RPN, outperformed the LSD + AlexNet method. On our computational platform, it took about 0.16 s to process an image for both our proposed method and the Faster R-CNN algorithm, and about 3.23 s for the LSD + AlexNet method.   We carried out three experiments with different random training sets and reported the mean performance of the proposed method in Table 1. After adding the additional scale and aspect ratios (denoted as T1 in Table 1), the detection performance increased. In particular, the location accuracy and the precision of the proposed method are better than the result obtained by the original Faster R-CNN algorithm. After sieving low quality positive samples (denoted by T2 in the table), the detection performance increased further. This demonstrates that our specific improvement techniques, which were proposed to handle the elongated airport runway shapes, effectively improved detection performance. 
The final recall value (0.837) of the proposed method is slightly less than the recall value (0.838) of the original Faster R-CNN algorithm. This means that the former is only slightly more conservative than the latter, and that the proposed method achieves better precision and accuracy with almost identical recall. The Receiver Operating Characteristic (ROC) graphs and the precision-recall curves of the results (Figures 12 and 13) also demonstrate that the proposed method outperforms the original Faster R-CNN algorithm. The proposed method achieves a lower false positive rate at the same true positive rate, and a higher precision at the same recall value. Both the proposed method and the original Faster R-CNN algorithm, which generate proposals based on the RPN, outperformed the LSD + AlexNet method. On our computational platform, it took about 0.16 s to process an image with both our proposed method and the Faster R-CNN algorithm, and about 3.23 s with the LSD + AlexNet method. Table 1. Numerical performance. T1 denotes adding additional scale and aspect ratios, and T2 denotes sieving low-quality positive samples. "Faster R-CNN + T1 + T2" is the proposed method.
We also tested the performance of the proposed method with different numbers of training airports. Figure 14 indicates that the performance increased when more airport images were used. Using more airport images for training will therefore be beneficial for further increasing the detection performance. However, due to the difficulty of collecting many airport images from remote sensing scenes, we consider this to be part of our future work.
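Precision-recall curves like those in Figure 13 can be traced with a small routine that sweeps the detection-score threshold from high to low. This is a generic sketch; the detection scores and match flags below are made-up toy values for illustration, not values from our experiments:

```python
def precision_recall_curve(scores, is_true_positive, n_gt):
    """Sweep the detection-score threshold from high to low and record
    one (precision, recall) point per detection.

    scores           -- confidence score of each detection
    is_true_positive -- whether each detection matches a ground-truth airport
    n_gt             -- total number of ground-truth airports
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp = fp = 0
    curve = []
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        curve.append((tp / (tp + fp), tp / n_gt))
    return curve

# Toy example: 5 detections ranked by score, 4 ground-truth airports.
scores = [0.95, 0.9, 0.8, 0.7, 0.6]
flags  = [True, True, False, True, False]
for p, r in precision_recall_curve(scores, flags, n_gt=4):
    print(f"precision={p:.2f} recall={r:.2f}")
```

A method whose curve stays above another's at every recall value, as the proposed method's does relative to the original Faster R-CNN, is strictly preferable at any operating point.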

Results on Landsat 8 Images
We also evaluated how the proposed method performed on airport appearances in different seasons. The CNN model weights, which were obtained using the airport dataset described previously, were also used for this experiment. Figure 15 shows the detection results of the proposed method on Landsat 8 images (in a true-color composite with 8-bit quantization) of the Jiagedaqi airport in four seasons. These images, collected in different seasons, were pansharpened to a spatial resolution of 15 m and show differing background colors. In winter, most of the background objects are covered with snow, while the cleared runway still demonstrates the typical elongated linear geometric shape. Our proposed method successfully detected this airport in all seasons, although visually the location accuracy in winter is not as high as in the other three seasons. The results of the original Faster R-CNN algorithm are shown in Figure 16. Although the detection results in spring (Figure 16a), summer (Figure 16b), and autumn (Figure 16c) are similar to the results shown in Figure 15, visually the original Faster R-CNN algorithm has a lower location accuracy on the winter image (Figure 16d) than our proposed method. The detection results on Landsat 8 images in different seasons demonstrate the generalization capability of the CNN-based airport detection methods when applied to images that are recorded by other sensors and in other seasons.

Results on a Gaofen-1 Scene
A Gaofen-1 satellite scene (Figure 17), which covers the Beijing-Tianjin-Hebei area in China, was also tested in our experiments. The size of the scene is 16,164 × 16,427 pixels at 16 m spatial resolution, and it was recorded on 2 November 2014. It is a winter scene with vegetation in the leaf-off state; the grass fields next to the airport runways are therefore less abundant. As the multispectral Gaofen-1 data has a radiometric resolution of 10 bits while the CNN used for airport detection was trained with 8-bit images, we pre-processed the scene by converting it to 8-bit quantization, and we used a true-color composite in our experiment. Of the 10 airports in the scene, the Beijing Capital airport (Figure 18a) and the Tianjin Binhai airport (Figure 18b) are two large international civilian airports, while the others are mid-sized or small-sized civilian airports or military airports. Because the image is too large to be processed in its entirety due to memory limitations, we first divided it into several small sub-images of size 600 × 1000. We maintained an overlap distance of 300 pixels between two neighboring sub-images, which is about the length of the runway of the Beijing Capital airport, to avoid splitting an airport into two parts over two sub-images. Airport detection was then performed on each of these sub-images. An NMS was performed on the merged result that was combined from the detection results of the sub-images. If the score of a proposal is larger than 0.9, an airport is claimed to be detected.
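The tiling-and-merging procedure above can be sketched as follows. The tile size (600 × 1000), overlap (300 pixels), greedy NMS, and 0.9 score threshold follow the text; the IoU threshold of 0.5 inside the NMS is an illustrative assumption, as the exact value is not stated here:

```python
import numpy as np

def tile_origins(height, width, tile_h=600, tile_w=1000, overlap=300):
    """Top-left corners of overlapping tiles covering the scene.
    Assumes the image is at least one tile large in each dimension."""
    ys = list(range(0, max(height - tile_h, 0) + 1, tile_h - overlap))
    xs = list(range(0, max(width - tile_w, 0) + 1, tile_w - overlap))
    if ys[-1] + tile_h < height:       # make sure the bottom border is covered
        ys.append(height - tile_h)
    if xs[-1] + tile_w < width:        # ... and the right border
        xs.append(width - tile_w)
    return [(y, x) for y in ys for x in xs]

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression on [x1, y1, x2, y2] boxes,
    applied to the detections merged from all sub-images."""
    boxes = np.asarray(boxes, dtype=float)
    order = np.argsort(scores)[::-1]        # process highest score first
    keep = []
    while order.size:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        # intersection of box i with each remaining box
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.maximum(xx2 - xx1, 0) * np.maximum(yy2 - yy1, 0)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thresh]     # drop heavily overlapping boxes
    return keep

# After NMS, only proposals whose score exceeds 0.9 are reported as airports.
SCORE_THRESH = 0.9
```

Before merging, each sub-image's detected boxes must be offset by that tile's origin so all boxes share the scene's coordinate frame; the 300-pixel overlap guarantees that a runway split by one tile boundary appears whole in a neighboring tile, and the NMS then removes the resulting duplicates.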

Our proposed method successfully detected almost all airports in about 95 s by using the pre-trained CNN model, which was trained with the airport dataset introduced in Section 3.1. Nine of the ten airports in the scene were successfully detected by the proposed method, and no false detection was found (Figure 18). The Faster R-CNN algorithm, on the other hand, made two false positive detections (shown in Figure 19j,k). This demonstrates that, compared to the Faster R-CNN algorithm, our proposed method is better at learning feature representations from the same training dataset, which was obtained from Google Earth remote sensing images. It also has a better generalization capability when applied to other sensors.
There is an airport in the Gaofen-1 scene that was not successfully detected by either method (shown in Figure 20). This airport has two runways that are separated far from each other, and there is no obvious terminal building. Because Gaofen-1 has different spectral characteristics than the remote sensing images from Google Earth, we expect that using dedicated Gaofen-1 images for CNN training would lead to a better detection performance.

Conclusions
In this paper, we proposed a fast and automatic airport detection method using convolutional neural networks. Based on the Faster R-CNN algorithm, we introduced some specific improvement techniques to handle the typical elongated linear geometric shape of an airport. By using additional scales and aspect ratios to increase the quality of positive samples, and by using a constraint to sieve low-quality positive samples, a better airport detection performance was obtained. Note that we used the same parameter settings for both our method and the original Faster R-CNN algorithm to demonstrate that the increase in performance really benefited from the proposed specific improvement techniques. Compared to the reported results on airport detection in the literature, this is, to the best of our knowledge, the fastest automatic airport detection method that can achieve state-of-the-art performance. Compared to other airport detection algorithms, the proposed method needs relatively little pre-processing, and its independence from the difficult effort of handcrafted feature design is a major advantage over many traditional methods.
Our experimental results demonstrated that although the Faster R-CNN algorithm can obtain good results for ordinary objects, a better detection accuracy can be achieved by using dedicated improvement techniques for domain-specific objects (like the elongated linear shape of airports). The CNN model was trained with a limited number of images collected from Google Earth, and our results on several Landsat 8 images acquired in four seasons and a Gaofen-1 scene demonstrated its generalization capability, as it was successfully applied to images recorded by other sensors. We believe that by including more airport images, such as images from other sensors and images of high spatial resolution, further improvement of the proposed airport detection method can be expected. This is a topic of our future research work.
In our experiments, the Faster R-CNN algorithm was used due to its state-of-the-art detection accuracy and acceptable computational speed. Other CNN-based object detection algorithms, such as YOLO [19] and YOLO9000 [20], are also worth testing. We believe that the proposed specific improvement techniques for airport detection can also be applied to other approaches that were originally proposed for detecting ordinary objects in daily life scenes; this is part of our future research work.
