Article
Peer-Review Record

Cascade Object Detection and Remote Sensing Object Detection Method Based on Trainable Activation Function

Remote Sens. 2021, 13(2), 200; https://doi.org/10.3390/rs13020200
by S. N. Shivappriya 1, M. Jasmine Pemeena Priyadarsini 2, Andrzej Stateczny 3,*, C. Puttamadappa 4 and B. D. Parameshachari 5
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 9 November 2020 / Revised: 30 December 2020 / Accepted: 5 January 2021 / Published: 8 January 2021

Round 1

Reviewer 1 Report

The keywords do not reflect the main content of the abstract; for example, “Fourier series” and “linear combination” are not mentioned in the abstract. The MS COCO and Pascal datasets need not be included in the keywords. Regarding the last paragraph, what is the remote sensing object detection proposed by the authors?

In the Introduction, the authors present the objectives of the paper. In my opinion, it would be better to present the highlights of the paper.

In line 105: “the cascade R-NN” --> the cascade R-CNN. Please check all terms in the paper.

Improve Table 1; the values are visually difficult to read. I agree with the experimental results; they demonstrate the robustness of the proposed method against the others.

A discussion of results should be presented in a new section.

Author Response

Response for Reviewer 1.

  1. The keywords do not reflect the main content of the abstract; for example, “Fourier series” and “linear combination” are not mentioned in the abstract. The MS COCO and Pascal datasets need not be included in the keywords. Regarding the last paragraph, what is the remote sensing object detection proposed by the authors? In the Introduction, the authors present the objectives of the paper. In my opinion, it would be better to present the highlights of the paper. In line 105: “the cascade R-NN” --> the cascade R-CNN. Please check all terms in the paper. Improve Table 1; the values are visually difficult to read. I agree with the experimental results; they demonstrate the robustness of the proposed method against the others. A discussion of results should be presented in a new section.

Answer

Thank you for your constructive comments.

The keywords have been modified to reflect the main content of the abstract. The MS COCO and Pascal datasets have been removed from the keywords.

The objects detected in remote sensing by the proposed method are given in the Introduction as “In remote sensing object detection, objects such as airplanes, ships, storage tanks, baseball diamonds, tennis courts, basketball courts, ground track fields, harbors, bridges, and vehicles are detected by the proposed AAF-Faster RCNN method.”

The objectives of the paper have been rewritten as the highlights and contributions of the paper.

The term “the cascade R-NN” has been changed to “R-CNN”. The terms in the paper have been verified.

A discussion of the results is provided as follows.

In section 5.1

“The Fourier series linear combination of activation functions is used in the proposed AAF-Faster RCNN model to improve the efficiency of the detection. The proposed AAF-Faster RCNN model uses cascade detection to increase the learning rate and effectively detect small objects. Existing methods [16, 39, 40] have an overfitting problem in cascaded object detection. The proposed model overcomes the overfitting problem using good convergence and clear bounded variation in the analysis.”

In section 5.2

“The proposed AAF-Faster RCNN model uses the reward function to improve the learning based on the Fourier series and linear combination of activation functions. The cascade object detection method is used in the proposed AAF-Faster RCNN model to effectively detect small objects. Existing models [16, 40] are limited in that they do not fully account for sample diversity and have lower efficiency. The proposed AAF-Faster RCNN model considers sample diversity based on cascade object detection.”

In section 5.3

“The proposed AAF-Faster RCNN model uses the cascade learning method to improve the learning model. The Fourier series and linear combination of activation functions is used in the proposed AAF-Faster RCNN model to improve the efficiency of object detection. Existing detection models [16, 40] suffer from overfitting due to vanishing samples at large thresholds. The proposed AAF-Faster RCNN model uses the reward function to improve the learning and solve the overfitting problem.”

In section 5.4

“The proposed AAF-Faster RCNN model applies the Fourier series and linear combination of activation functions to effectively detect small objects in the dataset. Cascade object detection improves the learning performance of the proposed AAF-Faster RCNN model. The existing model [16] does not fully account for sample diversity and thus misses precisely located and small objects. The proposed AAF-Faster RCNN model uses the loss function to consider sample diversity, which improves the efficiency.”

In section 5.4.1

“In remote sensing images, objects are present at various scales, and existing models have lower efficiency due to poor localization. The proposed AAF-Faster RCNN model uses the Fourier series and linear activation function to improve the localization.”

Reviewer 2 Report

The manuscript must be improved. Some tables run from one page to the next, which makes the information difficult to read. The results are not well described. Section 5.4.1 states: "The proposed AAF – Faster RCNN model has mAP value of 95.21 % and existing MS-OPN with fuse method has value of 94.87 % mAP"

and the conclusions state:

"The AAF-Faster RCNN model has the higher mAP of 83.1% and existing PAT-SSD512 method has 81.7 % mAP."

This is confusing. This is only one example.

 

Author Response

Reviewer 2:

  1. The manuscript must be improved. Some tables run from one page to the next, which makes the information difficult to read. The results are not well described. Section 5.4.1 states: "The proposed AAF – Faster RCNN model has mAP value of 95.21 % and existing MS-OPN with fuse method has value of 94.87 % mAP" and the conclusions state: "The AAF-Faster RCNN model has the higher mAP of 83.1% and existing PAT-SSD512 method has 81.7 % mAP." This is confusing. This is only one example.

Answer

Thank you for your constructive comment

 

The results are now clearly described as follows.

In abstract

The AAF-Faster RCNN model has a mean Average Precision (mAP) of 83.1 % and the existing PAT-SSD512 method has 81.7 % mAP on the Pascal VOC 2007 dataset.

In section 5.1

The proposed AAF-Faster RCNN method achieves an mAP of 83.1 % and the existing PAT-SSD512 method achieves 81.7 % mAP on the Pascal VOC 2007 dataset.

“The Fourier series linear combination of activation functions is used in the proposed AAF-Faster RCNN model to improve the efficiency of the detection. The proposed AAF-Faster RCNN model uses cascade detection to increase the learning rate and effectively detect small objects. Existing methods [16, 39, 40] have an overfitting problem in cascaded object detection. The proposed model overcomes the overfitting problem using good convergence and clear bounded variation in the analysis.”

In section 5.2

The proposed AAF-Faster RCNN model has an mAP of 81.11 % and the state-of-the-art PAT-SSD512 method has 80.6 % mAP on the PASCAL VOC 2012 dataset.

“The proposed AAF-Faster RCNN model uses the reward function to improve the learning based on the Fourier series and linear combination of activation functions. The cascade object detection method is used in the proposed AAF-Faster RCNN model to effectively detect small objects. Existing models [16, 40] are limited in that they do not fully account for sample diversity and have lower efficiency. The proposed AAF-Faster RCNN model considers sample diversity based on cascade object detection.”

 

In section 5.3

The proposed AAF-Faster RCNN method has an mAP of 43.4 % in the Medium area and the existing PAT-SSD 800 method has 41.5 % mAP on the MS COCO dataset.

“The proposed AAF-Faster RCNN model uses the cascade learning method to improve the learning model. The Fourier series and linear combination of activation functions is used in the proposed AAF-Faster RCNN model to improve the efficiency of object detection. Existing detection models [16, 40] suffer from overfitting due to vanishing samples at large thresholds. The proposed AAF-Faster RCNN model uses the reward function to improve the learning and solve the overfitting problem.”

In section 5.4

The sensitivity of the proposed AAF-Faster RCNN method is 84 % for the Medium area and the existing PAT-SSD300 method has 83 % sensitivity on small object detection.

“The proposed AAF-Faster RCNN model applies the Fourier series and linear combination of activation functions to effectively detect small objects in the dataset. Cascade object detection improves the learning performance of the proposed AAF-Faster RCNN model. The existing model [16] does not fully account for sample diversity and thus misses precisely located and small objects. The proposed AAF-Faster RCNN model uses the loss function to consider sample diversity, which improves the efficiency.”

In section 5.4.1

“In remote sensing images, objects are present at various scales, and existing models have lower efficiency due to poor localization. The proposed AAF-Faster RCNN model uses the Fourier series and linear activation function to improve the localization.”

The proposed AAF-Faster RCNN model has an mAP of 95.21 % and the existing MS-OPN with fuse method has 94.87 % mAP on the NWPU VHR-10 dataset.

In Conclusion

The AAF-Faster RCNN model has the higher mAP of 83.1 % and the existing PAT-SSD512 method has 81.7 % mAP on the Pascal VOC 2007 dataset.

Reviewer 3 Report

  1. In the Introduction, moving object detection is mentioned, but in the following sections no related problems are discussed or solved. Object detection in video and object detection in images are two different problems, so it is better not to confuse them.
  2. Fig. 1 is not very clear. From this figure, I cannot understand the relationship between the parts. Where is the output, and where is the proposed part embedded? Please redraw the flowchart and give a detailed description.
  3. There are also some references on detecting small objects via improved Faster R-CNN, such as “Squeeze and excitation rank faster R-CNN for ship detection in SAR images”, IEEE GRSL, 2018, and “Dense attention pyramid networks for multi-scale ship detection in SAR images”, IEEE TGRS, 2020. The differences between the versions of Faster R-CNN should be introduced.
  4. The manuscript states that the best performance of the network with the Fourier series as activation function is higher than with any other activation function. I doubt this; why do you draw this conclusion? Besides, most of the time the target signals are not periodic; if a Fourier series is used and, as in this paper, the rank is chosen as 5, there must be some loss. How do you solve this problem?
  5. When linearly combining multiple functions, how do you make sure that the range of each function is similar?
  6. Please remove all subjective words, such as ‘good’. All descriptions should be objective.
  7. For the NWPU VHR-10 dataset, how many samples are used for training and testing?
  8. Table 1 is hard to read; please change the format.
  9. The actual detection results of the compared methods should also be given; tables alone are not enough. It would be better to show some erroneous detection results as well.

Author Response

Reviewer 3: 

  1. In the Introduction, moving object detection is mentioned, but in the following sections no related problems are discussed or solved. Object detection in video and object detection in images are two different problems, so it is better not to confuse them.

Answer

            Thank you for constructive comments

            Moving object detection is corrected in the Introduction.

--------------------------------------------------------------------------------------------------------------

  2. Fig. 1 is not very clear. From this figure, I cannot understand the relationship between the parts. Where is the output, and where is the proposed part embedded? Please redraw the flowchart and give a detailed description.

Answer

            Thank you for constructive comments

            Figure 1 has been redrawn; the output and the proposed part are clearly indicated in the diagram, and a detailed description of Figure 1 is provided.

--------------------------------------------------------------------------------------------------------------

  3. There are also some references on detecting small objects via improved Faster R-CNN, such as “Squeeze and excitation rank faster R-CNN for ship detection in SAR images”, IEEE GRSL, 2018, and “Dense attention pyramid networks for multi-scale ship detection in SAR images”, IEEE TGRS, 2020. The differences between the versions of Faster R-CNN should be introduced.

Answer

            Thank you for constructive comments

            The differences between the Faster R-CNN methods are described in Section 2.

            The difference from the existing Faster R-CNN models is provided in Section 2.1 as

“Difference from existing Faster R-CNN models: The squeeze-and-excitation Faster R-CNN model [28] is limited by its second-stage classification. The learning performance of DAPN with R-CNN [29] is affected by small objects and high loss in the activation function. The proposed AAF-Faster R-CNN method applies a linear combination of three activation functions to improve the learning. The proposed AAF-Faster R-CNN uses the positive and negative reward update in the feature map to solve the problem of overfitting.”

--------------------------------------------------------------------------------------------------------------

  4. The manuscript states that the best performance of the network with the Fourier series as activation function is higher than with any other activation function. I doubt this; why do you draw this conclusion? Besides, most of the time the target signals are not periodic; if a Fourier series is used and, as in this paper, the rank is chosen as 5, there must be some loss. How do you solve this problem?

Answer

            Thank you for constructive comments

            The sentence has been modified to “The best performance of the network with the Fourier series as activation function is higher than with the Sigmoid, ReLU and tanh activation functions.”

The reason is given as

“The Fourier series activation function satisfies the Dirichlet conditions for Fourier series, and a Fourier series can represent any function that is suitable as an activation function (e.g. ReLU, Sigmoid, and tanh). Hence the Fourier series activation function has higher performance than the ReLU, Sigmoid and tanh functions.”

How the loss is handled

“The positive and negative reward function is used in the cascade learning to handle the loss in the activation function.”
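The truncation loss raised in this comment can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: it assumes tanh as the target function, the interval [-π, π], and a midpoint-rule estimate of the sine coefficients, and all function names are hypothetical. Because tanh is odd, only sine terms are needed; its periodic extension jumps at ±π, so a finite rank cannot reproduce it exactly.

```python
import math

def fourier_sine_coeffs(f, rank, n_grid=2000):
    """Estimate b_1..b_rank of an odd function f on [-pi, pi] (midpoint rule)."""
    h = 2 * math.pi / n_grid
    xs = [-math.pi + (k + 0.5) * h for k in range(n_grid)]
    return [sum(f(x) * math.sin(n * x) for x in xs) * h / math.pi
            for n in range(1, rank + 1)]

def fourier_activation(x, coeffs):
    """Evaluate the truncated sine series at x; coeffs[i] multiplies sin((i+1)x)."""
    return sum(b * math.sin((i + 1) * x) for i, b in enumerate(coeffs))

def mean_squared_error(f, coeffs, n_grid=2000):
    """Mean squared difference between f and the truncated series on [-pi, pi]."""
    h = 2 * math.pi / n_grid
    xs = [-math.pi + (k + 0.5) * h for k in range(n_grid)]
    return sum((f(x) - fourier_activation(x, coeffs)) ** 2 for x in xs) / n_grid

# Compare a rank-1 and a rank-5 approximation of tanh (assumed target).
coeffs_rank1 = fourier_sine_coeffs(math.tanh, 1)
coeffs_rank5 = fourier_sine_coeffs(math.tanh, 5)
mse_rank1 = mean_squared_error(math.tanh, coeffs_rank1)
mse_rank5 = mean_squared_error(math.tanh, coeffs_rank5)
print(f"rank 1 MSE: {mse_rank1:.4f}, rank 5 MSE: {mse_rank5:.4f}")
```

In this sketch the rank-5 error is smaller than the rank-1 error but remains above zero, which is the residual loss the reviewer asks about.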

--------------------------------------------------------------------------------------------------------------

  5. When linearly combining multiple functions, how do you make sure that the range of each function is similar?

Answer

            Thank you for constructive comments

The following sentences are presented in Section 3.4.1 before Eq. (12):

            The gradient descent algorithm normalizes the range of the activation functions and helps to compute the activation function easily. The gradient descent method normalizes the values of the three activation functions into the range of 0 to 1 to update the loss function.
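One way to keep three activation functions in a comparable range before combining them can be sketched as below. This is an illustrative assumption, not the procedure of Section 3.4.1: the rescaling of tanh, the cap of 6 on ReLU, the weight normalization, and all function names are hypothetical choices made only to show how a combined output can be kept in [0, 1].

```python
import math

def sigmoid(x):
    """Numerically stable logistic function; output lies in [0, 1]."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def scaled_tanh(x):
    """tanh shifted and scaled from [-1, 1] to [0, 1]."""
    return 0.5 * (math.tanh(x) + 1.0)

def capped_relu(x, cap=6.0):
    """ReLU clipped at an assumed cap and divided by it, giving [0, 1]."""
    return min(max(x, 0.0), cap) / cap

def combined_activation(x, weights):
    """Convex combination of the three rescaled activations.

    weights: three non-negative numbers with a positive sum; they are
    normalized to sum to 1 so the result stays in [0, 1].
    """
    total = sum(weights)
    parts = (sigmoid(x), scaled_tanh(x), capped_relu(x))
    return sum((w / total) * p for w, p in zip(weights, parts))

for x in (-10.0, -1.0, 0.0, 1.0, 10.0):
    print(x, round(combined_activation(x, [1.0, 2.0, 3.0]), 4))
```

In a trainable setting the three weights would be learned parameters; here they are fixed numbers so that the range argument is easy to check.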

--------------------------------------------------------------------------------------------------------------

  6. Please remove all subjective words, such as ‘good’. All descriptions should be objective.

Answer

            Thank you for constructive comments

            All the subjective words are replaced with suitable objective words.

--------------------------------------------------------------------------------------------------------------

  7. For the NWPU VHR-10 dataset, how many samples are used for training and testing?

Answer

            Thank you for constructive comments

            The answer is provided in Section 4 as “In the NWPU VHR-10 dataset evaluation, 60% is used for training and 40% is used for testing. Generally, a training-testing split of 80-20 is applied to estimate the performance of an object detection model. A training-testing split of 60-40 is used to evaluate the efficiency of the proposed AAF-Faster RCNN model.”
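A 60/40 split of this kind can be sketched as follows. The response does not say how the split was drawn, so the random shuffle, the seed, and the image identifiers below are hypothetical placeholders rather than details of the NWPU VHR-10 experiment.

```python
import random

def split_train_test(items, train_fraction=0.6, seed=0):
    """Shuffle a copy of items and cut it into train and test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical identifiers standing in for dataset images.
image_ids = [f"img_{i:03d}" for i in range(100)]
train_ids, test_ids = split_train_test(image_ids)
print(len(train_ids), len(test_ids))
```

Fixing the seed keeps the two subsets identical across runs, which matters when results are compared against other methods.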

--------------------------------------------------------------------------------------------------------------

  8. Table 1 is hard to read; please change the format.

Answer

            Thank you for constructive comments

            The format of Table 2 has been changed and is presented in a readable manner.

--------------------------------------------------------------------------------------------------------------

  9. The actual detection results of the compared methods should also be given; tables alone are not enough. It would be better to show some erroneous detection results as well.

Answer

            Thank you for constructive comments

            Images showing the detection errors of the proposed model are provided in Figure 6 and Figure 7. A description of the detection errors is provided below Figures 6 and 7.

Round 2

Reviewer 3 Report

I am satisfied with the response.

One small comment: if an equation is followed by the word 'where' or 'with', the sentence is not finished, so the word 'where' or 'with' should be at the start of the line and its first letter should not be capitalized.
