Exploit Direction Information for Remote Ship Detection

Abstract: Ship detection in remote sensing has gained increasing significance recently. In remote sensing images, ships are arbitrarily oriented, and the detector has to learn the object features of every orientation by rote, which demands a large amount of training data to prevent overfitting. In addition, plenty of ships have a distinct direction from the center point to the head point. However, little attention has been paid to the direction information of ships, and previous studies merely predict the bow directions of ships. In this paper, we propose to further exploit the ship direction information to solve the arbitrary orientation problem through direction augmentation, direction prediction, and direction normalization. A Variable-Direction Rotated (VDR) RoI Align module, which takes an additional feature extraction direction as input, is designed for direction augmentation and normalization. The direction augmentation method directly augments the features of ship RRoIs and brings great diversity to the training data set. The direction prediction introduces additional direction information for learning and helps to reduce noise. In the direction normalization method, the predicted ship directions are utilized to normalize the directions of ship features from stern to bow through the VDR RoI Align module, so that the ship features are presented in one orientation and are easier for the detector to identify. On the L1 task of the HRSC2016 data set, the direction augmentation method and direction normalization method boost the RoI Transformer baseline from 86.2% to 90.4% and 90.6%, respectively, achieving state-of-the-art performance.


Introduction
With the advances in object detection and remote sensing technologies, ship detection has come into wide use in military and civilian areas such as fishery management, the detection of illegal smuggling, and vessel surveillance [1][2][3]. Detecting ships accurately and efficiently in remote sensing images has also received considerable attention in recent years [4,5]. Wei et al. [6] proposed a novel ship detection method for high-resolution SAR imagery based on a high-resolution ship detection network. Wu et al. [7] proposed a new coarse-to-fine ship detection network (CF-SDN) that directly achieved an end-to-end mapping from image pixels to bounding boxes with confidences. Zhang et al. [8] presented a fast region-based convolutional neural network (R-CNN) to detect ships in high-resolution remote sensing imagery and to avoid the influence of the sea surface model, especially on inland rivers and in offshore areas. Ship detection in remote sensing images belongs to the area of object detection. General object detection adopts horizontal bounding boxes to locate objects, which is suited to describing natural objects seen from the front. However, remote sensing images are taken from a bird's-eye view, so ships appear arbitrarily oriented. Hence, horizontal bounding boxes cannot precisely describe ships, leading to misalignments between ships and bounding boxes and to the inclusion of large background areas. Moreover, when two rotated ships locate [...] direction task made the detector hard to converge.
We proposed to normalize the ship direction information to boost the ship detector performance. The VDR RoI Align was adopted to extract and normalize the features of ships using the predicted ship directions. The normalized features were easier for the detector to identify compared with the original features of two possible directions.
On the L1 task of the HRSC2016 data set, the proposed direction augmentation and the direction normalization methods both achieved the state-of-the-art performance.

Horizontal Detection
With the coming of the deep learning era, convolutional neural networks (CNNs) have been widely investigated and applied to image classification [19], object detection [20], and image segmentation [21]. Girshick et al. [22] first introduced CNNs for object detection in R-CNN and, since then, CNNs have played a key role in object detection. Fast R-CNN [23] sped up R-CNN by sharing the feature maps of images among different RoIs. Faster R-CNN [24] replaced the region proposal algorithm [25] with region proposal networks (RPN), making the whole detector "end-to-end" and approximately real-time. To further accelerate object detection, single-stage object detectors were proposed. YOLO [26] divided feature maps into grids and predicted directly at each grid cell, avoiding the region proposal stage and saving time. Over several years, YOLO developed into YOLOv4 [27] and reaches hundreds of FPS. Apart from breakthroughs in model design, other investigations have contributed tremendously to the improvement of detection accuracy and efficiency. Feature pyramid network [28] fused the features from the top down and detected at different layers according to the size of objects. Oksuz et al. [29] surveyed various imbalance problems and their corresponding solutions, including class imbalance, scale imbalance, objective imbalance, and spatial imbalance. Recently, the transformer [30] was introduced to the object detection task by DETR [31] and became a hot topic. Swin Transformer [32] reaches 58.7% mAP on the COCO data set and exceeds the highest accuracy of previous object detection works. The above works focused on natural image detection and adopted horizontal bounding boxes to locate objects.

Rotated Detection
Horizontal bounding boxes fail to locate objects precisely in some scenes such as text detection and remote sensing, so rotated detection adds an orientation parameter to describe arbitrarily oriented objects. Rotated detection first became popular in text detection. Ma et al. [33] first introduced rotated bounding boxes for arbitrarily oriented text detection based on a rotation RPN. Zhou et al. [34] proposed an anchor-free detector and locality-aware NMS to accelerate the detection pipeline. In remote sensing, Liu et al. [15] proposed Rotated RoI (RRoI) pooling to extract features of rotated ships in a rotated region-based CNN. Ding et al. [16] transformed horizontal bounding boxes to rotated bounding boxes by an RoI Learner instead of setting anchors of several orientations, which reduced computational complexity. Apart from describing oriented objects by the center point, width, height, and orientation, Xu et al. [35] defined oriented objects by gliding the vertices of horizontal bounding boxes along each side, using eight parameters in total. Zhang et al. [18] adopted the center point, head point, width, and height to locate oriented objects. Qian et al. [13] described oriented objects by the four corner points of rotated bounding boxes. In this paper, we adopted five parameters to locate oriented ships: the abscissa of the center point, the ordinate of the center point, width, height, and orientation.
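To make the five-parameter representation concrete, the following minimal sketch converts a box given by center point, width, height, and orientation into its four corner points, which is the representation used in corner-based methods such as [13]. The function name and the angle convention (radians, rotation applied to offsets from the center with the usual rotation matrix) are assumptions made for illustration and are not taken from the paper.

```python
import math

def rbox_to_corners(xc, yc, w, h, theta):
    """Return the four corners of a rotated box as (x, y) tuples.

    Assumption: theta is in radians and rotates the offsets from the
    center with the standard 2D rotation matrix.
    """
    c, s = math.cos(theta), math.sin(theta)
    # Corner offsets of the unrotated box, relative to its center.
    half = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(xc + c * dx - s * dy, yc + s * dx + c * dy) for dx, dy in half]
```

With theta = 0 the corners reduce to those of a horizontal bounding box, which shows that the horizontal representation is a special case of the rotated one.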

Data Augmentation
Data augmentation plays an important role in machine learning since it brings significant improvements to learning tasks. It includes geometric transformations, color space augmentations, and so on. During training, data augmentation increases the size and quality of training data sets, which helps to reduce the chance of overfitting. During testing, data augmentation provides more testing images for result fusion at the cost of more inference time. Hence, it is recommended to adopt various data augmentation techniques during training, whereas applying data augmentation during testing requires a trade-off between accuracy and efficiency. A simple but effective data augmentation method is the flip operation, which exchanges the pixels of an image from left to right or from top to bottom, so that objects are moved to a different place in the image. When the flip operation is applied during training, the image is flipped with a specific probability, and either the original image or the flipped image is sent to training. When the flip operation is applied during testing, both the original and the flipped images are fed for inference, which costs twice the testing time of inference without flip augmentation.
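The flip operation during training can be sketched as follows for images annotated with rotated boxes. This is an illustrative sketch, not the implementation used in the experiments: the function names, the box layout (xc, yc, w, h, theta), the use of continuous coordinates in which a horizontal flip maps x to W - x, and the mapping of the orientation to (pi - theta) mod pi are all assumptions.

```python
import math
import numpy as np

def hflip(image, boxes):
    """Flip an image left-right and update rotated boxes accordingly.

    Assumptions: boxes are (xc, yc, w, h, theta) with theta in [0, pi),
    and coordinates are continuous so that x maps to W - x.
    """
    width = image.shape[1]
    flipped = image[:, ::-1].copy()
    new_boxes = [
        (width - xc, yc, w, h, (math.pi - theta) % math.pi)
        for xc, yc, w, h, theta in boxes
    ]
    return flipped, new_boxes

def random_hflip(image, boxes, rng, prob=0.5):
    """Apply hflip with probability prob; otherwise return the inputs."""
    if rng.random() < prob:
        return hflip(image, boxes)
    return image, boxes
```

A call such as `random_hflip(image, boxes, np.random.default_rng(0))` then returns either the original or the flipped sample, which mirrors the training-time behavior described above.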

Proposed Methods
We proposed three methods to exploit direction information and deal with the arbitrary orientation problem in remote sensing object detection: direction augmentation, direction prediction, and direction normalization. We used the RoI Transformer detector [16] as our baseline to verify the effectiveness of the three proposed methods. RoI Transformer is a three-step detector designed for rotated object detection in remote sensing images, consisting of the proposal of HRoIs, the transformation from HRoIs to RRoIs, and the refinement of RRoIs. In the first step, the RPN takes the feature maps from the backbone as input and proposes HRoIs. In the second step, the RoI Transformer module transforms the proposed HRoIs to RRoIs. Finally, the refinement module classifies and regresses the features extracted from the RRoIs for better RRoIs. Our proposed methods worked in the refinement step, and the model structure of direction normalization is shown in Figure 2.
For the RPN stage, suppose we obtained the HRoIs $(x_h, y_h, w_h, h_h)$, horizontal ground-truths $(x_h^*, y_h^*, w_h^*, h_h^*)$, and horizontal anchors $(x_a, y_a, w_a, h_a)$, where $x, y$ denote the center point coordinates of the box and $w, h$ denote the width and height of the box. The loss function of RPN $L_{RPN}$ can be written as:
$$L_{RPN} = L_{cls} + L_{reg} \tag{1}$$
$$L_{reg} = \mathrm{SmoothL1}(t, t^*) \tag{2}$$
$$L_{cls} = \mathrm{CrossEntropy}(p, p^*) \tag{3}$$
$$t_x = (x_h - x_a)/w_a \tag{4}$$
$$t_y = (y_h - y_a)/h_a \tag{5}$$
$$t_w = \log(w_h/w_a) \tag{6}$$
$$t_h = \log(h_h/h_a) \tag{7}$$
$$t_x^* = (x_h^* - x_a)/w_a \tag{8}$$
$$t_y^* = (y_h^* - y_a)/h_a \tag{9}$$
$$t_w^* = \log(w_h^*/w_a) \tag{10}$$
$$t_h^* = \log(h_h^*/h_a) \tag{11}$$
where $p$ denotes the probability of the anchor being a ship, and the ground-truth $p^*$ is labeled 1 if the anchor is positive and 0 otherwise. In Equation (2), $t$ includes $t_x, t_y, t_w, t_h$ and $t^*$ includes $t_x^*, t_y^*, t_w^*, t_h^*$.
For the RoI Transformer stage, we first transformed the obtained HRoIs $(x_h, y_h, w_h, h_h)$ into the RRoI form $(x_h, y_h, w_h, h_h, \theta_h)$, where $\theta_h = \pi/2$. Suppose we obtained RRoIs $(x_r, y_r, w_r, h_r, \theta_r)$ and rotated ground-truths $(x_r^*, y_r^*, w_r^*, h_r^*, \theta_r^*)$, where $\theta$ denotes the orientation of the rotated box. The loss function of RoI Transformer $L_{Tra}$ can be written as:
$$L_{Tra} = L_{cls} + L_{reg}$$
$$L_{reg} = \mathrm{SmoothL1}(t, t^*)$$
$$L_{cls} = \mathrm{CrossEntropy}(p, p^*)$$
where $p$ denotes the probability of the anchor belonging to one specific class, and the ground-truth $p^*$ denotes the class of the matched ground-truth boxes. The loss function of the refinement stage was the same as that of the RoI Transformer stage.
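The regression offsets of the RPN stage can be illustrated with a short sketch that encodes a horizontal box relative to an anchor and decodes it again. The function names are hypothetical; the formulas follow the center offsets normalized by the anchor size and the logarithmic width and height ratios described for the RPN stage.

```python
import math

def encode_box(box, anchor):
    """Encode (x, y, w, h) relative to an anchor as (t_x, t_y, t_w, t_h)."""
    x, y, w, h = box
    xa, ya, wa, ha = anchor
    return ((x - xa) / wa, (y - ya) / ha, math.log(w / wa), math.log(h / ha))

def decode_box(t, anchor):
    """Invert encode_box: recover (x, y, w, h) from offsets and the anchor."""
    tx, ty, tw, th = t
    xa, ya, wa, ha = anchor
    return (xa + tx * wa, ya + ty * ha, wa * math.exp(tw), ha * math.exp(th))
```

Applying `encode_box` to both the predicted HRoI and the ground-truth box with the same anchor yields the two sets of offsets that the regression loss compares.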

Variable-Direction Rotated RoI Align
In the RoI Transformer, the features of the RRoIs were extracted by Rotated Position Sensitive RoI Align after transforming HRoIs to RRoIs. The extracted features were, firstly, expanded into a column and then sent to FC layers for classification and location regression. FC layers are sensitive to locations, and the classification and regression results would change if the order of the feature column is inverted. For example, when extracting the features of ships, the features from bow to stern differ from those from stern to bow and would result in different predictions.
Therefore, we proposed the VDR RoI Align to extract features from a specific direction of ships. Different from previous RoI Align methods, the proposed VDR RoI Align demanded an additional feature extraction direction τ (τ ∈ {0, 1}) as input. Consider an RRoI $(x_c, y_c, w, h, \theta)$ and a feature-extracting direction τ, where $(x_c, y_c)$ denotes the center of the RoI and $w, h, \theta$ denote the width, height, and orientation. We divided the RRoI into K × K bins. To transform a point $P(x, y)$ in the rotated bins into the point $P'(x', y')$ in the coordinate system of the feature maps, we adopted a transformation in which τ controls the direction correspondence of the two points. If τ = 0, the VDR RoI Align is equal to the RPS RoI Align method and extracts the features from left-top to right-down. However, when τ = 1, the feature extraction order is reversed and the features from right-down to left-top are obtained. We adopted the same channel mapping as the original RPS RoI Align in RoI Transformer and sampled two points in each bin. Bilinear interpolation was then employed to get the values of the transformed points.
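The effect of the direction τ on the sampling locations can be sketched as follows. This is only one transformation consistent with the description, not necessarily the exact equation of the paper: it assumes that τ = 1 negates the offsets from the RoI center before rotation (a 180° turn of the sampling grid), and it samples a single point at each bin center instead of two points per bin. The function name is hypothetical.

```python
import numpy as np

def vdr_sample_points(xc, yc, w, h, theta, tau, K=7):
    """Return a (K, K, 2) array of feature-map coordinates, one per bin.

    Assumption: tau = 1 negates the local offsets, which reverses the
    extraction order from left-top/right-down to right-down/left-top.
    """
    # Bin centers in the local RoI frame, with the origin at the top-left.
    xs = (np.arange(K) + 0.5) * w / K
    ys = (np.arange(K) + 0.5) * h / K
    gx, gy = np.meshgrid(xs, ys)  # gx[i, j] = xs[j], gy[i, j] = ys[i]
    sign = -1.0 if tau == 1 else 1.0
    dx = sign * (gx - w / 2.0)
    dy = sign * (gy - h / 2.0)
    # Rotate the offsets by theta and translate to the RoI center.
    c, s = np.cos(theta), np.sin(theta)
    px = c * dx - s * dy + xc
    py = s * dx + c * dy + yc
    return np.stack([px, py], axis=-1)
```

Under these assumptions, the grid for τ = 1 is the grid for τ = 0 with both bin axes reversed, so the two settings visit the same locations in opposite order.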

Direction Augmentation
The VDR RoI Align module enabled us to extract the features of the Rotated RoI from any direction. Through the VDR RoI Align, we proposed a new augmentation method named direction augmentation, and the model structure of the direction augmentation is shown in Figure 3.
After transforming HRoIs to RRoIs, the VDR RoI Align module extracted the features of RRoIs twice, with τ set to 0 and 1, respectively, and output two features of opposite directions. As shown in Figure 3, the augmented features had the opposite direction to the original features. The direction augmentation can be applied during both training and testing. During training, the two features were fed to FC layers for classification and regression, outputting two sets of predictions. Then, two losses were calculated between the ground-truths and the two sets of predictions. In the back-propagation stage, we could choose to optimize the detector by the loss of the original predictions, the minimum loss of the two sets of predictions, or the summary loss (the sum of the losses) of the two sets of predictions. Adopting the direction augmentation during training provided more ship features of the opposite extraction direction for the classification and regression module to learn, reducing the chance of overfitting. During the testing stage, in addition to the original ship features, the augmented ship features were fed for classification and regression, and the two sets of detections were fused by NMS with an IoU threshold of 0.1. The direction augmentation during testing provided more data for the detector to test.
Remote Sens. 2021, 13, 2155
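The three options for combining the losses of the original and augmented predictions can be sketched as below. The function name and the mode strings are hypothetical, and the sketch assumes that the choice is made per RoI before averaging; the text does not state whether the minimum is taken per RoI or over the whole batch.

```python
import numpy as np

def augmented_loss(loss_orig, loss_aug, mode="min"):
    """Combine per-RoI losses of the original and augmented predictions.

    mode: "original" uses only the original loss, "min" takes the smaller
    loss of the two for each RoI, and "sum" adds the two losses.
    """
    loss_orig = np.asarray(loss_orig, dtype=float)
    loss_aug = np.asarray(loss_aug, dtype=float)
    if mode == "original":
        per_roi = loss_orig
    elif mode == "min":
        per_roi = np.minimum(loss_orig, loss_aug)
    elif mode == "sum":
        per_roi = loss_orig + loss_aug
    else:
        raise ValueError("unknown mode: " + mode)
    return float(per_roi.mean())
```

With the "min" option, each RoI contributes the loss of whichever extraction direction the detector currently handles better.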

Direction Prediction
In the HRSC2016 data set [36], the head points of ships were labeled, which made it possible to improve the detector by using direction information. We further exploited direction information by predicting the direction of ships. When labeling the direction of ships, the ship direction was set to 0 if the head point of the object was located to the left of the center point and to 1 if the head point was located to the right. Figure 4 shows the model structure of the direction prediction. To predict the direction of ships, the features of RRoIs were extracted by the VDR RoI Align with τ = 0 and were then sent to an FC layer, followed by the direction prediction, classification, and regression processes. The direction prediction module consisted of an FC layer with two output nodes, a softmax operation, and an argmax operation, and generated the predicted ship directions $\alpha'$. During the training stage, the cross-entropy loss $L_{dir}$ was adopted to minimize the error between the predicted directions $\alpha'$ and the ground-truth directions $\alpha_{gt}$. In the total loss function, $L_{dir}$ was weighted by a hyper-parameter γ that controls the weight of the direction prediction task.
Direction prediction was not first proposed in this paper. In [17], the directions of rotated ships were represented by the orientation θ and direction β, where θ ∈ [0, π/2] and β is a one-hot code with four variables. They adopted an FC layer with 4 × n output nodes to regress the direction β, where n denotes the number of ship classes. We proposed to describe the directions of rotated objects by an orientation θ with the range of [0, π) and a direction τ belonging to {0, 1}. Compared with [17], our direction prediction method was more efficient since we needed only an FC layer with two output nodes.
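The direction labeling rule and the two-node prediction head can be sketched as follows. All function names are hypothetical. The sketch assumes that a head point with the same abscissa as the center is labeled 1 (the text does not cover this case) and that the total loss is the detection loss plus γ times the direction loss, which is one form consistent with the description of γ as the weight of the direction prediction task.

```python
import numpy as np

def direction_label(head_x, center_x):
    """0 if the head point lies left of the center, else 1 (tie: assumed 1)."""
    return 0 if head_x < center_x else 1

def softmax(logits):
    """Row-wise softmax with the usual max subtraction for stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def direction_head(features, weight, bias):
    """FC layer with two outputs, then softmax and argmax."""
    probs = softmax(features @ weight + bias)
    return probs, probs.argmax(axis=1)

def direction_loss(probs, alpha_gt):
    """Mean cross-entropy between predicted probabilities and labels."""
    alpha_gt = np.asarray(alpha_gt, dtype=int)
    picked = probs[np.arange(len(alpha_gt)), alpha_gt]
    return float(-np.log(picked + 1e-12).mean())

def total_loss(detection_loss, dir_loss, gamma):
    """Assumed combination: detection loss plus gamma-weighted direction loss."""
    return detection_loss + gamma * dir_loss
```

Only a weight matrix with two output columns is needed for the head, in line with the efficiency argument made in comparison with [17].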

Direction Normalization
After obtaining the directions of ships, the direction information was utilized to normalize the direction of ship feature extraction to run from bow to stern. As shown in Figure 5, we extracted the normalized features of each RoI by VDR RoI Align with the feature-extracting direction τ set to the predicted direction $\alpha'$. The directions of the original ship features can be from bow to stern or from stern to bow, but the directions of the normalized ship features were all from bow to stern. Instead of the original features of the RoI, the normalized features were used for classification and regression. Therefore, the model no longer needed to identify ship features in the opposite, stern-to-bow direction and was able to focus on extracting effective features.
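A simplified view of direction normalization is to reverse the K × K feature grid of every RoI whose predicted direction is 1, which corresponds to extracting with τ = 1. The sketch below assumes features of shape (N, C, K, K), ignores the position-sensitive channel mapping of the actual module, and uses a hypothetical function name.

```python
import numpy as np

def normalize_direction(roi_features, pred_dirs):
    """Reverse the spatial grid of RoIs with predicted direction 1.

    roi_features: array of shape (N, C, K, K); pred_dirs: length-N labels.
    Assumption: reversing both spatial axes stands in for tau = 1.
    """
    out = roi_features.copy()
    flip = np.asarray(pred_dirs) == 1
    out[flip] = out[flip][:, :, ::-1, ::-1]
    return out
```

After this step, all RoI features share one extraction direction as long as the predicted directions are correct, so errors of the direction prediction module propagate directly to the normalized features.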

Experimental Results
We implemented the proposed methods based on the code of RoI Transformer (https://github.com/dingjiansw101/AerialDetection, accessed on 11 May 2021) and ran the experiments on a server with a GeForce 1080 GPU and 8 GB of memory. The model was trained on the HRSC2016 data set for 70 epochs with a batch size of 4. The learning rate started from 0.01 and was divided by 10 at epochs 50 and 60. ResNet50 served as the backbone to extract feature maps from images with a size of 800 × 512, where 800 denotes the length of the long side and 512 denotes the length of the short side of images.
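The step learning-rate schedule described above can be written as a small function. The function name and the assumption that epochs are counted from 0, with each decay taking effect from the milestone epoch onward, are illustrative and not taken from the released code.

```python
def learning_rate(epoch, base_lr=0.01, milestones=(50, 60), factor=0.1):
    """Step schedule: multiply base_lr by factor at every milestone reached.

    Assumption: epochs are 0-indexed and a decay applies from the
    milestone epoch onward.
    """
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * factor ** drops
```

Under these assumptions, the 70 training epochs use a learning rate of 0.01 for epochs 0 to 49, 0.001 for epochs 50 to 59, and 0.0001 for epochs 60 to 69.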
The HRSC2016 data set contains images with sizes ranging from 300 × 300 to 1500 × 900. There are three tasks in the HRSC2016 data set, including L1 task, L2 task, and L3 task. The L1 task categorizes all of the objects into "ship" class and is a one-class detection task. The L2 task categorizes all of the ships into four classes, consisting of the "ship", "warcraft", "merchant ship", and "aircraft carrier" classes. The ship classes in the L3 task are more detailed, containing 18 classes. Previous studies [12,14,16,17,35] experimented on one-class detection on the L1 task of the HRSC2016 data set. In this paper, we categorized the ships into three classes according to the L2 task of the HRSC2016 data set. Apart from keeping the original "warcraft" and "aircraft carrier" classes, we categorized the "merchant ship" and the other "ship" classes into the "civil ship" class since they share similar shapes.

Evaluation Metrics
We adopted the mean Average Precision (mAP) and direction accuracy to evaluate the performance of the proposed methods quantitatively. The mAP is widely used to estimate the quality of object detectors, and the PASCAL VOC2007 metric was adopted to compute mAP. If the IoU between a prediction and a ground-truth exceeded 0.5 and they had the same class label, the prediction was counted as a True Positive (TP); otherwise, the prediction was marked as a False Positive (FP). The ground-truths without corresponding TP predictions were labeled as False Negatives (FN). The precision and recall were calculated by:
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
When only the predictions whose confidence exceeded a certain threshold were kept, each threshold value gave a different precision and recall. For recall levels from 0 to 1.0 in steps of 0.1 (11 points in total), AP was calculated as the average of the precision at each recall level, and mAP was the mean of the APs over all the classes. To evaluate the performance of the direction prediction module, we defined direction accuracy as the ratio of the TPs with the same direction as the corresponding ground-truths to all of the TPs.
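The metrics can be sketched in a few lines. The function names are hypothetical. The AP function follows the 11-point interpolation of the PASCAL VOC2007 metric, in which the precision at each recall level is taken as the maximum precision among operating points whose recall is at least that level; the direction accuracy is defined as 0 when there are no TPs, as an assumption for the degenerate case.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP), Recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def voc07_ap(recalls, precisions):
    """11-point interpolated AP over recall levels 0, 0.1, ..., 1.0."""
    ap = 0.0
    for i in range(11):
        level = i / 10
        candidates = [p for r, p in zip(recalls, precisions) if r >= level]
        ap += max(candidates) if candidates else 0.0
    return ap / 11

def direction_accuracy(pred_dirs, gt_dirs):
    """Fraction of TPs whose predicted direction matches the ground truth."""
    if not pred_dirs:
        return 0.0  # assumption: no TPs gives an accuracy of 0
    correct = sum(int(p == g) for p, g in zip(pred_dirs, gt_dirs))
    return correct / len(pred_dirs)
```

The mAP is then the mean of `voc07_ap` over the classes, and `direction_accuracy` is computed only over the predictions that were counted as TPs.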

Direction Augmentation
Flip augmentation is similar to the direction augmentation, so we compared the flip augmentation with the proposed direction augmentation to investigate the effectiveness of direction augmentation. Table 1 presents the results of flip augmentation and direction augmentation. The original loss, minimum loss, and summary loss denote training without direction augmentation, with the minimum loss for direction augmentation, and with the summary loss for direction augmentation, respectively. Direction augmentation testing indicates whether direction augmentation was used during testing. From lines 3 and 5 of Table 1, the minimum loss for direction augmentation training brought an improvement of 3.06% mAP to the original model, higher than the gain of 0.92% mAP of the summary loss, and the minimum loss was, therefore, selected for direction augmentation training.
From lines 1, 2, 3, and 7, the flip augmentation, direction augmentation training, and direction augmentation testing could all boost the detection mAP, increasing it by 1.47%, 3.06%, and 0.65%, respectively. From lines 3, 4, 7, and 8, direction augmentation training outperformed flip augmentation both when they were used separately and when they were combined with direction augmentation testing. From lines 3 and 9, the flip augmentation improved the performance of direction augmentation training only slightly, from 86.22% to 86.33%, and the improvement was smaller than that of adopting flip augmentation alone, from 83.16% to 84.63%. This can be explained by the similarity between the flip augmentation and the direction augmentation, which made their improvements overlap. From lines 1 and 10, when the flip augmentation, direction augmentation training, and direction augmentation testing were adopted simultaneously, the detection mAP reached its peak, rising from 83.16% to 86.92%. This set of ablation experiments demonstrated the effectiveness of the direction augmentation method.
During training, the flip augmentation cost 4.62% extra training time and the direction augmentation cost 5.28% extra training time. The model sizes of flip augmentation and direction augmentation were the same as that of the original model. During testing, we directly flipped the RRoI features for direction augmentation, which cost 11.11% extra testing time.
The ground truths, detection results without direction augmentation training (line 1), and detection results with direction augmentation training (line 3) are shown in Figure 6; detections with a confidence lower than 0.4 were treated as background and removed. In the first row, the model without direction augmentation training missed three civil ships and the model with direction augmentation training detected the three ships with high confidence. In the second row, the model without direction augmentation training wrongly treated a warcraft as a civil ship and the model with direction augmentation training correctly identified the class of the warcraft. In the third row, the model without direction augmentation training located only half of a civil ship and the model with direction augmentation training located the entire civil ship. In the fourth row, the model without direction augmentation training mistook the background for a civil ship and the model with direction augmentation training identified and removed the background. The detection results of the models without and with direction augmentation illustrated the advantage of the proposed direction augmentation.

Direction Prediction
The highest direction accuracy and highest detection mAP under different weights of the direction task γ are listed in Table 2. Only flip augmentation was adopted here. From Table 2, the direction prediction accuracy increased as γ became larger and reached its highest value of 96.37% when γ = 2. When γ = 5, the model could not generate TP predictions and the direction accuracy was 0. The direction accuracy results showed that a larger weight of the direction task benefited direction accuracy, but when the weight became too large, the model failed to converge. When γ was below 1, the ship detection mAP increased slightly as γ increased, indicating that the direction prediction task had a positive impact on the ship detection task. When γ exceeded 2, the detection mAP declined distinctly, and the model diverged when γ = 5. The results revealed that the direction prediction task was feasible and boosted the detection mAP by 0.97%.
The ship detection results of adding the direction prediction task are shown in Figure 7, and the triangle direction denotes ship direction. It can be seen that most of the ship directions, including civil ship, warcraft, and aircraft carrier, could be correctly predicted.

Direction Normalization
The direction information was utilized to normalize the ship features and improve the ship detection accuracy. The detection mAP and corresponding direction accuracy under various γ are presented in Table 3. Setting γ to 0 meant that no direction normalization was applied and the original features were used for ship detection. As listed in Table 3, direction normalization boosted the detection mAP, and setting γ to 1 brought a 3.38% improvement to the detection mAP. When γ was smaller than 1, a larger γ brought more improvement to the detection mAP. The detection mAP decreased when γ was larger than 1, and the model could not converge when γ = 5, indicating that a too-large L_dir damaged other tasks such as classification and regression. Direction normalization required 22.22% more testing time than the model without direction normalization. The mAP results demonstrated the effectiveness of the direction normalization method.
In the first row, the model without direction normalization mistook the background for a civil ship, whereas the direction normalization model successfully removed the background. In the second row, the model without direction normalization wrongly regarded a civil ship as an aircraft carrier, whereas the direction normalization model correctly identified the class of the civil ship. Furthermore, the direction normalization model located the ship more tightly. In the third row, the civil ship detected by the direction normalization model scored higher than that of the model without direction normalization, and the direction normalization model located the warcraft better than the model without direction normalization. The results demonstrated the advantages of direction normalization over the original model.
To validate whether the normalized ship features benefitted ship detection, we varied the direction of ship feature extraction, starting from the model that achieved a detection mAP of 88.22%. The results are listed in Table 4. When the directions of ship feature extraction were set opposite to the predicted ship directions during testing, the detection mAP declined remarkably, from 88.22% to 84.15%, as presented in the third row. The results revealed that the detector preferred the normalized ship features extracted using the ship directions. When direction normalization was removed and the feature extraction direction was set randomly, the detection mAP reached 87.47%, as listed in the fourth row, which exceeded the result for γ = 0 by 2.71%. Both models extracted the ship features in random directions during testing, but the model trained with direction normalization performed much better than the original model, which indicated that direction normalization training was capable of extracting better ship features.
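The three test-time settings compared above (predicted, opposite, and random feature extraction directions) can be sketched as follows. This is a minimal illustration, assuming that the extra direction input of the VDR RoI Align module can be represented as a binary flag; the encoding, the function name, and the mode names are hypothetical and do not come from the paper.

```python
import random


def extraction_direction(pred_dir, mode, rng=random):
    """Choose the feature extraction direction for one RRoI.

    pred_dir: predicted ship direction, 0 or 1 (hypothetical binary encoding).
    mode: "predicted" uses the predicted direction (direction normalization),
          "opposite" reverses it, and "random" ignores the prediction.
    rng: source of randomness, the random module or a random.Random instance.
    """
    if pred_dir not in (0, 1):
        raise ValueError("pred_dir must be 0 or 1")
    if mode == "predicted":
        return pred_dir
    if mode == "opposite":
        return 1 - pred_dir
    if mode == "random":
        return rng.randint(0, 1)
    raise ValueError(f"unknown mode: {mode}")


# Directions chosen for three RRoIs whose predicted directions are 1, 0, 1.
predicted = [extraction_direction(d, "predicted") for d in (1, 0, 1)]
opposite = [extraction_direction(d, "opposite") for d in (1, 0, 1)]
print(predicted, opposite)
```

Only the "predicted" mode presents every ship to the detection head in one consistent orientation, which is the property the experiment in Table 4 isolates.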
We further experimented with the combination of direction augmentation and direction normalization. During testing, combining the detections from features extracted along the predicted ship directions with those from the opposite directions decreased the detection mAP from 88.22% to 87.96%, as listed in the fifth row. The detection precision declined from 51.08% to 44.78% because the detections using the opposite feature directions were included and introduced plenty of false positives. Meanwhile, the recall dropped, meaning that the detections using the opposite directions caused some true positives to be discarded. The results indicated that direction augmentation and direction normalization were mutually exclusive.

Comparison with SOTA
We compared the proposed method with other SOTA methods on the L1 task of the HRSC2016 data set, and the results are listed in Table 5.
Our proposed direction augmentation and direction normalization were evaluated on the RoI Transformer baseline. With a ResNet101 backbone, direction augmentation boosted the detection mAP from 86.2% to 90.4%, and direction normalization boosted the detection mAP from 86.2% to 90.6%. Both proposed methods exceeded the other state-of-the-art methods, demonstrating the effectiveness of utilizing ship direction information when detecting ships in remote sensing.

Discussion
Through comprehensive experiments and comparisons, the proposed methods that further exploit direction information were proven to be beneficial for ship detection. Inspired by flip augmentation, direction augmentation was proposed to augment the features of ship RRoIs. When adopted during model training, direction augmentation cost 5.28% extra training time and did not influence the testing time, but increased the detection mAP from 83.16% to 86.22%, a gain larger than the 1.47% improvement brought by flip augmentation. We then compared the similarities and differences of the two augmentation methods. Flip augmentation during training moved objects from one location to another and flipped the ship directions at the same time. Direction augmentation doubled the ship features by adding the opposite direction, and the features with the better prediction result were used for back propagation. Both augmentations could exchange the ship feature directions, bringing great diversity to the training data set. Flip augmentation changed the location of ships, so the features for classification and regression could differ because the corresponding RoI generated by the RPN could change. In contrast, direction augmentation only flipped the feature direction, and the corresponding RoI remained the same. Furthermore, direction augmentation chose the minimum-loss features of the two directions for back propagation, while flip augmentation randomly chose a direction. These factors distinguished the detection results of direction augmentation from those of flip augmentation.
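The minimum-loss selection described above can be sketched in a few lines. This is a simplified illustration, assuming that a scalar loss is available per RRoI for each of the two feature extraction directions; the function and variable names are hypothetical, and the loss values are made up for the example rather than taken from the experiments.

```python
def select_min_loss_direction(losses_forward, losses_opposite):
    """For each RRoI, keep the loss of the direction that fits better.

    Returns the selected losses and the chosen direction per RRoI
    (0 = original extraction direction, 1 = opposite direction).
    Ties keep the original direction.
    """
    if len(losses_forward) != len(losses_opposite):
        raise ValueError("both directions must cover the same RRoIs")
    selected, chosen = [], []
    for loss_fwd, loss_opp in zip(losses_forward, losses_opposite):
        if loss_opp < loss_fwd:
            selected.append(loss_opp)
            chosen.append(1)
        else:
            selected.append(loss_fwd)
            chosen.append(0)
    return selected, chosen


# Illustrative per-RRoI losses for the two extraction directions.
selected, chosen = select_min_loss_direction([0.9, 0.2, 0.5], [0.3, 0.6, 0.5])
print(selected, chosen)
```

Only the selected losses would contribute to back propagation in this sketch, whereas flip augmentation applies one randomly chosen orientation to the whole image.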
To further examine the effects of direction augmentation, we adapted the COCO error analysis tools in MMDetection to rotated bounding boxes and compared the errors between the models without and with direction augmentation. The COCO error analysis tools adopted the PASCAL VOC2012 metric to calculate the mAP. Therefore, the detection mAP was higher than the previous results but kept the same tendency. Figure 9 shows the error analysis results of the models without and with direction augmentation.
In Figure 9, "C50" denotes that the mAP was calculated under the IoU threshold of 0.5 when deciding TPs and FPs. "Loc" denotes the errors caused by localization, including the boxes classified correctly but localized incorrectly. The localization errors in the models without and with direction augmentation were the same and took up merely 0.7% (0.860 minus 0.853). "Oth" denotes the boxes classified incorrectly. In the model without direction augmentation, 3.4% of the objects were classified incorrectly, but the share of wrongly classified objects decreased to 2.4% when applying direction augmentation. "BG" denotes the backgrounds that were wrongly regarded as ships by the detectors and would reduce precision. The backgrounds in the model without direction augmentation caused a 3.7% reduction in detection mAP, which fell to 3.1% when adopting direction augmentation. Furthermore, the model with direction augmentation recalled 96.7% of the ships, but the model without direction augmentation omitted 6.9% of the ships. Therefore, the direction augmentation method helped to classify and recall ships.
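The error shares quoted above are differences between successive cumulative curves of the error analysis. The sketch below reproduces this arithmetic for the model without direction augmentation, assuming the usual COCO-style ordering in which each curve removes one more error type. The C50 and Loc values are quoted in the text; the Oth and BG values are back-computed here from the reported 3.4% and 3.7% shares and are not read from Figure 9. The function name is hypothetical.

```python
def error_breakdown(c50, loc, oth, bg):
    """Convert cumulative AP values into per-category error shares.

    Each argument is the AP after the corresponding error type has been
    removed; the remaining gap to 1.0 is attributed to missed ships (FN).
    """
    return {
        "Loc": round(loc - c50, 3),
        "Oth": round(oth - loc, 3),
        "BG": round(bg - oth, 3),
        "FN": round(1.0 - bg, 3),
    }


# Oth = 0.860 + 0.034 and BG = 0.894 + 0.037 are derived values (assumption).
without_aug = error_breakdown(c50=0.853, loc=0.860, oth=0.894, bg=0.931)
print(without_aug)
```

The resulting FN share of 0.069 agrees with the 6.9% of ships reported as omitted by the model without direction augmentation, which suggests the quoted figures are mutually consistent under this reading.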
Introducing an additional direction prediction task slightly improved the detection mAP. A plausible interpretation is that the direction prediction task provided more prior knowledge and helped to eliminate noise, therefore preventing overfitting and improving the detection accuracy. The model diverged when the weight of the direction prediction task became too large. When γ was set to 1, the direction prediction loss accounted for over 80% of the total loss. Hence, setting γ to 5 was roughly equivalent to enlarging the learning rate of the model by 4 times. A large learning rate can easily lead to divergence, and the model therefore did not converge when γ was set to 5.
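The arithmetic behind this argument can be made explicit under a simplifying assumption: the total loss is written as $L(\gamma) = L_{det} + \gamma L_{dir}$, where $L_{det}$ collects the classification and regression losses, and the individual loss terms are treated as fixed while γ changes. This notation is an assumption introduced for illustration. If the direction loss accounts for a fraction $p$ of the total loss at γ = 1, then:

```latex
\begin{aligned}
L(1) &= L_{det} + L_{dir}, \qquad L_{dir} = p\,L(1), \qquad L_{det} = (1-p)\,L(1),\\
\frac{L(\gamma)}{L(1)} &= (1-p) + \gamma\,p = 1 + (\gamma - 1)\,p,\\
\left.\frac{L(\gamma)}{L(1)}\right|_{\gamma=5,\;p=0.8} &= 1 + 4 \times 0.8 = 4.2 .
\end{aligned}
```

With $p$ above 0.8, the ratio lies between 4.2 and 5, so the loss and its gradients grow by roughly the factor of 4 mentioned in the text, which acts like a proportionally larger learning rate.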
Normalizing the feature extraction direction for ship detection boosted the detection mAP significantly, from 84.76% to 88.22%. A plausible interpretation is that the model could more easily distinguish the normalized ship features. Figure 10 shows the error analysis results of the models without and with direction normalization. The localization error in the model without direction normalization caused a 2.30% decrease in mAP, whereas the localization error in the direction normalization model was 1.00%. The localization errors revealed that direction normalization could better locate ships and, therefore, caused fewer localization errors. In the models without and with direction normalization, the class errors were 2.40% and 1.60%, respectively, indicating that direction normalization could better classify ships. The model with direction normalization introduced fewer backgrounds than the model without direction normalization, 3.60% vs. 4.00%. In addition, the model with direction normalization recalled 97.70% of the ships, while the original model recalled 96.40% of the ships. In conclusion, the direction normalization method introduced fewer errors and recalled more ships, demonstrating that the normalized ship features benefitted ship detection.
The arbitrary-orientation problem required the ship detection model to learn the ship features of various orientations. Direction augmentation augmented the ship features and provided more data for the model to learn. Direction normalization normalized and simplified the ship features, making them easier for the model to learn. The two methods were mutually exclusive, but both benefitted detection.

Conclusions
In this paper, we proposed to exploit ship direction information to handle the arbitrary-orientation problem and improve ship detection accuracy, through direction augmentation, direction prediction, and direction normalization. This is the first paper that pays attention to the feature extraction direction of ships and utilizes the ship directions for ship detection. We designed a Variable-Direction Rotated RoI Align module to augment and normalize direction information, which required an additional input, τ, to decide the direction of extracting RRoI features. Our main conclusions are as follows.
(1) We proposed the direction augmentation method, which augmented the RRoI features from the opposite feature extraction direction through the proposed VDR RoI Align module. The direction augmentation method provided more data for the detector to fit the ship features of two possible directions. When applied during training, direction augmentation improved the detection mAP by 3.06% while costing 5.28% extra training time.
(2) We added a direction prediction task to predict the direction of ships, which helped to reduce noise and boosted ship detection accuracy by 0.97%. The ship direction accuracy reached its highest value, 96.37%, when the weight of the direction prediction task was set to 2.
(3) We normalized the ship directions from stern to bow by inputting the predicted ship directions to the VDR RoI Align module when extracting the features of ship RRoIs. The normalized ship features were easier for the detector to identify than the original features of two possible directions. The direction normalization method boosted the ship detection mAP from 84.76% to 88.22%.
(4) Direction augmentation and direction normalization achieved mAPs of 90.4% and 90.6%, respectively, on the L1 task of the HRSC2016 data set, which significantly boosted the baseline performance and reached state-of-the-art performance.