Article
Peer-Review Record

Improving Semi-Supervised Image Classification by Assigning Different Weights to Correctly and Incorrectly Classified Samples

by Xu Zhang 1, Huan Zhang 2, Xinyue Zhang 3, Xinyue Zhang 4, Cheng Zhen 5, Tianguo Yuan 6 and Jiande Wu 1,*
Reviewer 1:
Reviewer 2:
Reviewer 3:
Appl. Sci. 2022, 12(23), 11915; https://doi.org/10.3390/app122311915
Submission received: 26 September 2022 / Revised: 9 November 2022 / Accepted: 17 November 2022 / Published: 22 November 2022
(This article belongs to the Special Issue Big Data Analysis and Management Based on Deep Learning)

Round 1

Reviewer 1 Report

Good paper, with an interesting coding approach and decent accuracies, but the paper needs some language revision; at times the ideas are not clear.

Author Response

Hello, dear reviewer! I appreciate your taking the time to review my manuscript. After reading your suggestions, I carefully went through the paper and revised the English language. Thank you very much for your advice.

Author Response File: Author Response.pdf

Reviewer 2 Report

The manuscript under review is "Improving Consistency-Based Semi-Supervised Learning by Assigning Different Weights to Correctly and Incorrectly Classified Samples." The following are comments for the authors to improve the manuscript:

1. The title of the manuscript needs some rewording. The phrase ‘Improving Consistency-Based’ suggests that other studies have worked on the approach described in the manuscript.

2. There is a lack of clarity in the sentence ‘correct our model parameters by mapping the sample labels as 'benchmark' compared with the corresponding sample features within the network’. This must be stated clearly so the reader can understand what was done in the study.

3. Technically, the discriminator is expected to have a classifier. However, the addition of a separate classifier next to the discriminator in Fig 2 is very confusing, since the classifier embedded in the discriminator should perform the expected operation. This must be corrected.

4. The use of sentences like ‘Its loss function is defined as follows’ to reference equations (in the case of equations 3, 4, 5, 6 and many others) is not valid. Authors must always reference equations by their numbers in in-text mentions.

5. The use of functions like BCE() and CE() in equations (5) and (6) is not clear, as those functions are never defined.

6. When describing the losses of D and G, there is no place for mentioning a loss of C (classifier), since no loss values are computed in C. This is why I maintain that there is no need for a C component in the model, since the classifier is already part of D.

7. The loss obtained in the student model is grouped into two terms: Lc and Lic. Please provide a detailed explanation of how these loss values are generated and combined.

8. The use of some of the notation in equations (8-10) makes it impossible to read the equations and understand what is being computed. For instance, what is the meaning of the notation ‘//’?

9. The condition used in Fig 6 which leads to a ‘Yes’ or ‘No’ is not useful at all, since the outcomes of the computation are not used anywhere. For example, when the condition is ‘Yes’, two boxes are connected, but after those two boxes an input LS is fed back into the second box again. The same error is repeated in equation 7. Moreover, how is it possible to compare the probability of the predicted labels pt(max) with a label pt(L)?

10. Where is the applicability of the total loss value Loss = Lc + Lic + Liu? Moreover, what are the loss values used for, and how are they used in the system to correct or adjust the network weights?

11. It appears there is no architectural difference between the student and teacher models. Is there any further motivation for such duplication?

12. Authors said that ‘w_c = (1 − p_t(L))^2 is the weight that should be given to samples judged correctly, and w_ic = 1 is the weight given to samples judged incorrectly’. Samples do not have weights; rather, it is network models which have weights.

13. The use of WRN-28-2 as the backbone network is not clear, since the student and teacher models have been specified as the networks used. In fact, this is all the more reason why the network structure of both the student and the teacher should be clearly outlined.

14. The sentence ‘However, it can be seen from Table 1 that as the number of labeled samples increases, the accuracy of the model improves less and less significant in the test set’ is contradictory. How can something be said to improve and at the same time be less and less significant?

15. Why was another graph superimposed on the main graph of Fig 10? The authors could separate these and give a clear explanation of each graph.

Author Response

Hello, dear reviewer! I appreciate your taking the time to review my manuscript. After reading your suggestions, I have carefully reviewed my paper and made the following changes.
1. I revised the title to "Improving Semi-Supervised Image Classification by Assigning Different Weights to Correctly and Incorrectly Classified Samples."
2. I changed the sentence "correct our model parameters by mapping the sample labels as 'benchmark' compared with the corresponding sample features within the network" to "the sample label mapping as the 'benchmark' is compared to the corresponding sample features in the network as an additional loss to complement the original supervisory loss, aiming to better correct our model parameters" to make it easier for readers to understand.
3. I have defined and described the loss function of the classifier in the image generation model. The presence of the classifier in the image generation model is necessary because classification and real/fake discrimination are independent tasks, and separating them gives better model performance.
4. I have modified the sentence "Its loss function is defined as follows" to ensure that equations are referenced by their numbers in the text.
5. The BCE() and CE() functions are the binary cross-entropy and cross-entropy functions, respectively; a brief illustrative sketch follows this list.
6. I have added details to both LIC and LC functions to make them easier to understand.
7. I have modified equations (8-10) to make it easier for readers to understand them in conjunction with the previous parts of the article.
8. The subsequent steps are performed only when the condition is satisfied.
9. I have also added an explanation of the total loss function.
10. The teacher model is obtained as an exponential moving average (EMA) of the student model's weights; a brief illustrative sketch follows this list.
11. Weights are assigned to each sample's loss rather than to the network parameters; a brief illustrative sketch follows this list.
12. The student-teacher model does not have a fixed network structure; here, we use "WRN-28-2" as the backbone of both networks.
13. The smaller the number of labeled samples, the larger our model's improvement over previous models. When the number of samples increases, our model still outperforms previous models, but the improvement is less significant.
14. I modified Figure 10.
If there are any remaining deficiencies in the paper, please point them out. Thanks again for your help.
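On point 5, a minimal sketch of what standard BCE() and CE() functions compute (illustrative PyTorch; tensor names and shapes are assumptions, not taken from the paper):

    import torch

    def bce(p: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # Binary cross-entropy between predicted probabilities p and binary targets y.
        return -(y * torch.log(p) + (1 - y) * torch.log(1 - p)).mean()

    def ce(probs: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # Multi-class cross-entropy: mean negative log-probability of the true class.
        # probs has shape (N, num_classes); labels holds integer class indices, shape (N,).
        return -torch.log(probs[torch.arange(len(labels)), labels]).mean()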
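On point 10, a minimal sketch of an exponential-moving-average teacher update in the mean-teacher style (the decay value is an assumption, not taken from the paper):

    import torch

    @torch.no_grad()
    def ema_update(teacher: torch.nn.Module, student: torch.nn.Module, decay: float = 0.99) -> None:
        # Move each teacher parameter toward the corresponding student parameter.
        for t, s in zip(teacher.parameters(), student.parameters()):
            t.mul_(decay).add_(s, alpha=1.0 - decay)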
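On points 11 and 12, a minimal sketch of weighting per-sample losses rather than network parameters, using our reading of the notation in comment 12 (w_c = (1 − p_t(L))^2 for correctly judged samples, w_ic = 1 for incorrectly judged ones; function and variable names are illustrative):

    import torch

    def weighted_sample_loss(per_sample_loss: torch.Tensor,
                             p_true: torch.Tensor,
                             correct: torch.Tensor) -> torch.Tensor:
        # per_sample_loss: unreduced loss per sample, shape (N,)
        # p_true: predicted probability of the true label, shape (N,)
        # correct: boolean mask of correctly classified samples, shape (N,)
        # Correct samples get weight (1 - p_true)^2; incorrect samples get weight 1.
        w = torch.where(correct, (1.0 - p_true) ** 2, torch.ones_like(p_true))
        return (w * per_sample_loss).mean()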

Author Response File: Author Response.pdf

Reviewer 3 Report

This work proposes a semi-supervised learning approach in which a GAN is first trained to generate new labeled images, and the original and generated images, with their corresponding labels, are then used to train a teacher-student model with a modified loss. Although some parts of the paper are well written, the methodology part is written poorly. More experiments are needed to validate the proposed loss. My main comments are:

1. Regarding related work: The related work section could be expanded to include more semi-supervised approaches. The teacher-student model could be explained somewhere, as it is used in the proposed method. Much of the proposed methodology is included in the related work section. The differences between the proposed method and existing work could be explained in the related work section. However, in this manuscript a significant part of the related work section is allocated to explaining the proposed approach.

2. Regarding the proposed method: This section needs significant improvements to explain the methodology clearly.

a. Section 3.1: “generator to generate a pair of fake images” – why a “pair”?

b. Section 3: “The supervised loss consists of two components, L_C and L_{IC}”, but the explanation of L_{IC} is only given in Section 3.2. When introducing something, it is better to say a few words about it at that point.

c. Section 3.2: C is used to represent the channels, while earlier the same C is used to represent the classes. It would be better to use a different character for the number of channels.

d. The meaning of “f” in Eqn. 9 is very unclear. It is given that “f represents the vector unit that maps the label information of that ..” – further explanation of “f” is needed, as it is crucial to understanding the proposed loss.

e. P_t(max), p_t(L) – these are not good notations. What is “max” here? Further explanation is needed; max could be a sub/superscript.

f. Section 3.3: I feel a more appropriate title for this section would be “consistency loss” rather than “unsupervised loss”, as the purpose is to enforce consistency between the student and the teacher (one common form is sketched after these comments).

g. Section 3.3: “L_U is the traditional unsupervised loss” – what is “traditional unsupervised loss”?

h. Section 3.4: L_C and L_{IC} are on different scales. L_C has not been explained in Section 3, but I assume it is a cross-entropy loss. In that case, L_C is on a log scale, while L_{IC} is a mean squared error term. Directly combining them is not a good idea; they should be combined with weights (see the sketch after these comments).

i. Eqn. 13 – L_{IU} should be L_{U}.

3. Regarding the experiments:

a. Although different terms are included in the proposed loss, it is unclear how they contribute to the performance. For example, what is the effect of the terms L_{IC} and L_U? What is the performance if you remove those terms?

b. “CIFAR-10 .. Hintons’s students Alex and Ilys …” – an unnecessary statement.

c. Figure 10 – This should be a bar graph, as there is no meaning in the lines connecting two methods. E.g., what is the interpretation of the values on the line connecting MixMatch and UDA?
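On comments 2f and 2g, one common form of the student-teacher consistency loss, sketched for illustration (names are assumptions; the paper's exact formulation may differ):

    import torch

    def consistency_loss(student_logits: torch.Tensor, teacher_logits: torch.Tensor) -> torch.Tensor:
        # Mean-squared error between student and teacher class probabilities.
        p_student = torch.softmax(student_logits, dim=1)
        p_teacher = torch.softmax(teacher_logits, dim=1).detach()  # no gradient through the teacher
        return torch.mean((p_student - p_teacher) ** 2)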
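On comment 2h, a minimal sketch of combining loss terms that live on different scales with tunable weights (the coefficients lambda_ic and lambda_u are hypothetical, not taken from the paper):

    import torch

    def total_loss(l_c: torch.Tensor, l_ic: torch.Tensor, l_u: torch.Tensor,
                   lambda_ic: float = 0.5, lambda_u: float = 1.0) -> torch.Tensor:
        # l_c: log-scale cross-entropy term; l_ic, l_u: MSE-scale terms.
        # The balancing coefficients keep one scale from dominating the gradient.
        return l_c + lambda_ic * l_ic + lambda_u * l_u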

Author Response

Hello, dear reviewer! Thank you for taking the time out of your busy schedule to review my manuscript. After reading your suggestions, I have carefully reviewed my paper and made the following changes.
1. We summarized previous semi-supervised learning methods in the introduction; the related work section only describes the methods used in our model and presents our ideas for addressing their shortcomings.
2. I have modified the sentence "Generate a pair of images" in Section 3.1.
3. C is used for the number of channels and c is used for the class, so I did not change the symbols.
4. I changed the title of that section. I have also described in detail the traditional supervised loss, the traditional unsupervised loss, the weighted supervised and unsupervised losses, and the additional loss LIC.
5. I have also explained f() in Equation 9.
6. I have also explained P_t(max) and p_t(L) in the text to make it easier for the reader to understand.
7. I have revised Figure 10.
Please point out any remaining shortcomings in the paper. Thanks again for your help.

Author Response File: Author Response.pdf
