Fine-Grained Recognition of Surface Targets with Limited Data
Round 1
Reviewer 1 Report
The work is an interesting review of classification techniques using neural networks. However, the experimental part, which is not fully objective, raises doubts. I would like to make the following comments:
- First of all, the sizes of the individual classes in the datasets are not stated precisely, although they can easily be obtained from the PyTorch code. Knowing the class sizes makes it possible to determine whether the datasets are balanced.
- I am convinced that the datasets used are unbalanced after reading the comment "each category contains only 100-200 original images." The problem of using total accuracy to assess the performance of the network is related to this point: using the overall value instead of a balanced one, i.e. the average accuracy over the individual classes (balanced accuracy), leads to inaccurate results.
- The authors should count the number of correctly classified objects in each class, divide these counts by the class sizes, and then average the resulting per-class accuracies over all classes.
- Another big mistake is to carry out a single test and treat its result as a final one that can be compared against. It is necessary to carry out multiple train-and-test runs, each with a random split into training and validation parts (Monte Carlo cross-validation). Plot the same curves as those shown in Figure 11 for all five sub-tests and present the averaged results as the final result. In addition, the standard deviation of the results over the individual learning iterations should be calculated and presented.
- The results presented in the paper do not in any way entitle the authors to claim that their method is better than other methods, because the results could have been obtained by chance.
- Only after multiple tests have been run and the standard deviation is reported can we talk about comparable results. And only after many tests and statistical tests can one say that one technique is better than another.
- After a single test, one cannot claim that the method is stable either; this has to be shown by the results from multiple tests. Presenting the confusion matrix as the final result only makes sense here if it is a matrix of averaged results.
- When presenting the results from the individual sub-tests, both the classification accuracy and the corresponding value of the loss function should be reported. It is best to show them on the training set and on the validation set so that the level of overfitting is visible.
- The code used in the experimental part should be included in the work at least in part, in particular the parts defining the networks, so that other authors can compare their methods with the proposed technique. Making the data used in the experiments available is a natural step towards the same goal.
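The evaluation protocol requested in the points above can be sketched as follows. This is a minimal illustration in plain Python, not the authors' code: the function names `class_sizes`, `balanced_accuracy`, and `monte_carlo_cv`, and the callback `train_and_score`, are hypothetical, and the defaults (five repeats, a 20% held-out fraction) are assumptions chosen only for the example.

```python
import random
import statistics
from collections import defaultdict


def class_sizes(labels):
    """Count how many samples belong to each class label."""
    counts = defaultdict(int)
    for label in labels:
        counts[label] += 1
    return dict(counts)


def balanced_accuracy(y_true, y_pred):
    """Average of per-class accuracies (correct in class / class size)."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    per_class = [correct[c] / total[c] for c in total]
    return sum(per_class) / len(per_class)


def monte_carlo_cv(n_samples, train_and_score, n_repeats=5,
                   test_fraction=0.2, seed=0):
    """Repeat a random train/validation split and summarise the scores.

    train_and_score is a hypothetical user-supplied callback that takes
    (train_indices, test_indices), trains a model on the first and
    returns a score (e.g. balanced accuracy) on the second.
    """
    rng = random.Random(seed)
    n_test = max(1, int(round(n_samples * test_fraction)))
    scores = []
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a fresh random split for every repeat
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        scores.append(train_and_score(train_idx, test_idx))
    mean = statistics.mean(scores)
    # The sample standard deviation needs at least two scores.
    std = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return mean, std, scores
```

For instance, with four images of one class and one image of another, a classifier that always predicts the larger class reaches a total accuracy of 0.8 but a balanced accuracy of only 0.5, which is the discrepancy the comments above refer to.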
There are also some minor errors in the work:
- please correct the word "ateetion",
- please correct the word "Adams",
- please define MAMC in detail.
The work is interesting and I think it has great potential, but the experimental part needs to be improved. As it stands, it is impossible to verify whether the method is good.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
Overall, I am quite positive about this submission. Most importantly, the method proposed by the authors is sound and sufficiently novel, and the empirical evaluation is reasonably convincing too.
Having said the above, before I can recommend the submission for acceptance, there are a number of presentation related issues which must be corrected. At present, there are a number of statements which are unclear and questionable, and these unnecessarily damage the credibility and overall quality of the work. Here are some examples and issues which really should be addressed:
- "...people have proposed various algorithms for surface target recognition" - The use of the word "people" here is really awkward. The sentence would read much better if phrased in the passive voice.
- "...are more susceptible to factors,..." - This makes no sense. Something becomes a factor by its effect, making this sentence between something nebulous and a truism (neither being a good thing).
- "...water scene are more susceptible to factors, such as illumination, occlusion, and weather." - This claim is entirely untrue. There is no reason why somehow water scenes would be more challenging (if anything, one could argue the opposite, as the background is much easier to deal with). Why would illumination affect water scenes more than land ones? It does not. Occlusion? Again, if anything, it is the other way round, with overhead bridges, tree canopies, etc. all being problematic in common land scenes. This sentence should be removed or its meaning changed.
- "...untrained experts." - Again, this makes no sense. How can one be an expert if they are untrained?
- "In fact, it can be seen in Figure 3(b) that manually captured images are more susceptible to weather and shooting angle." - More than what? This sentence too is not very well thought-through.
- "...the number of network layers is getting deeper and deeper,..." - Clumsy phrasing. The number of layers may be increasing; it is the network that is getting deeper (by virtue of the former).
- "Deep learning is a data-driven algorithm." - Deep learning is not an algorithm!
- "The following figure shows an example of data enhancement for the training set." - Firstly, the figure does not show data "enhancement" but data augmentation examples. Secondly, the authors should reference related work in this area which long ago proposed and successfully used the same idea, e.g. "Automatic vehicle tracking and recognition from aerial image sequences" (AVSS 2015).
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
Thank you for making the corrections; the current version of the work is acceptable.
Reviewer 2 Report
I am happy with the authors' revisions which adequately address my concerns. Good job!