Article
Peer-Review Record

Automatic Development of Deep Learning Architectures for Image Segmentation

Sustainability 2020, 12(22), 9707; https://doi.org/10.3390/su12229707
by Sergiu Cosmin Nistor *, Tudor Alexandru Ileni and Adrian Sergiu Dărăbant
Submission received: 29 October 2020 / Accepted: 16 November 2020 / Published: 20 November 2020

Round 1

Reviewer 1 Report

This manuscript is an enhancement of the manuscript with the identifier sustainability-835842.

I believe that the manuscript has been improved on the basis of my suggestions.

Reviewer 2 Report

The paper presents a method to design Deep Neural Networks (DNNs) while strongly limiting the computational cost. This is achieved by building the DNN from cells, which are designed separately by means of a recurrent neural network. In spite of the important reduction of computational burden, the resulting DNNs have performance similar to that of networks developed using conventional approaches.
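As a reader's aid, the cell-based search described above can be sketched as a loop in which a controller samples candidate cells, a cheap proxy scores them, and well-scoring operations are reinforced. The operation set, the scoring function, and the update rule below are invented stand-ins for illustration, not the authors' actual implementation.

```python
import random

# Toy sketch of cell-based architecture search: a "controller" samples a
# cell (a short list of operations), a cheap proxy evaluation scores it,
# and the score biases future sampling. All names and the scoring
# function are illustrative assumptions, not the paper's method.

OPS = ["conv3x3", "conv5x5", "maxpool", "identity"]

def sample_cell(prefs, n_nodes=4):
    """Sample one operation per node, weighted by learned preferences."""
    return [random.choices(OPS, weights=[prefs[op] for op in OPS])[0]
            for _ in range(n_nodes)]

def proxy_score(cell):
    """Stand-in for briefly training the cell on a small task."""
    return sum(1.0 if op != "identity" else 0.5 for op in cell) / len(cell)

def search(steps=200, lr=0.1, seed=0):
    random.seed(seed)
    prefs = {op: 1.0 for op in OPS}          # controller "policy"
    best_cell, best_score = None, float("-inf")
    for _ in range(steps):
        cell = sample_cell(prefs)
        score = proxy_score(cell)
        for op in cell:                       # reinforce the sampled ops
            prefs[op] += lr * score
        if score > best_score:
            best_cell, best_score = cell, score
    return best_cell, best_score

if __name__ == "__main__":
    cell, score = search()
    print(cell, round(score, 2))
```

The point of the sketch is only the structure the review summarizes: candidate cells are evaluated cheaply and in isolation, so the expensive full-network training is avoided during the search.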

 

Minor points

line 464: "pretain"

 

This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.


Round 1

Reviewer 1 Report

The paper deals with an automatic derivation of the architecture of a deep learning neural net for the segmentation of eyeglasses.

Deep learning neural nets need a lot of examples for training purposes, and the same holds for validation. Recurrent neural nets need a long training phase for the recurrence because of convergence.

You use only a few examples for training and, as I read, not many recurrent training cycles. That is a little astonishing.

In the "Related Work" section you mix related work with your own approach, and you also repeat your introduction. It should be shortened!

Lines 63 to 65 are irrelevant at this point.

Line 75 and following are part of the introduction; skip them here.

Line 96 ff.: the search space is large, and random search does not take account of what was found. What was found? Explain here.

You mention evolutionary algorithms. What are their benefits and disadvantages? Mentioning approaches without explaining them does not help the reader.

Line 117 ff. is a repetition.

Lines 179 and 180: This sentence is confusing. Does "the cell is right" mean that it is part of the first layer, the input layer?

Lines 201/202: the number of filters is configurable: how, and by what criteria?
Does each node in a given cell really have the same number of filters?

Line 207: your cells have a variable number of inputs? What does this mean? Adapting weights and skipping weights with low values is standard. Or what do you mean?

Line 210: You use an RNN for deriving the architecture at every layer. How do you improve the cell functions locally with respect to the overall performance? That is not described, and it means, mathematically, defining an error function at the output level to calculate the influence of local variations on the output behaviour. Please explain.

Line 226: How is that approach more than a random search? Please explain.

Line 241: You use reinforcement learning for the RNN deriving the architecture? What is the local criterion? What is the global criterion?

Line 249: You do not allow the training to converge? How can an RNN derive an architecture, and how can the performance be measured, if you do not get the network into a state near convergence? Please explain.

The performance evaluation is done much faster than ... Which performance is that, and how is it measured without convergence?

Line 271: You generate the data set based on a small set of real data. The deviations of the generated data from the real data should be described in more detail. Otherwise the reader will not be able to judge the performance of your approach, because the similarity may be large.

You put random shapes on the image, which are homogeneous. Is that realistic?

Line 293: In a training step the proposer generates one cell. Then you use random selection for exploring the space. In the introduction you described that approach as not usable?

Line 320: Are 10 epochs with 50 samples sufficient to train the network?

Line 338: How are the pooling layers derived? By the RNN, or otherwise?

Line 399: Input is 1024x1024x3 including the 3 color channels, output is 1024x1024x8 based on the 8 filters. How are the colors processed?

Line 416: 5 days to 2 years: based on which calculation?

Line 432: The PUE factor is 1.58? I calculate 1.63 from 165 W, 0.41 h and the result of 0.11. Please explain.
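For the arithmetic in this remark: treating the reported 0.11 as a value rounded from 165 W × 0.41 h × PUE 1.58 reproduces both figures, which suggests (as one possible reading, not the authors' stated explanation) that the 1.58 vs. 1.63 gap is a rounding artifact.

```python
# Reviewer's numbers: 165 W for 0.41 h, reported result 0.11 (kWh), PUE 1.58.
energy_it = 0.165 * 0.41            # kWh drawn by the hardware: 0.06765
energy_total = energy_it * 1.58     # with PUE 1.58: ~0.1069, rounds to 0.11
implied_pue = 0.11 / energy_it      # back-calculated from the rounded 0.11: ~1.63

print(round(energy_total, 2), round(implied_pue, 2))
```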

Line 450: Your approach generates cells, not a network, correct? Does your approach also generate the number of necessary layers? If so, this should be explained more clearly.

 

Using an RNN with reinforcement learning is a hard job. Using it to generate cells for a deep learning network is much harder. Your proposed efficiency seems too positive.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

The authors of this study present a solution in which they apply machine learning to an image segmentation problem, helping to reduce various costs and improve sustainability and the environment compared to other possible solutions.

I consider the manuscript to be of high quality: up-to-date references, well-formed structure, well-explained material and methods, and clear conclusions.

I congratulate the authors on this manuscript; there is a great deal of work behind it.

I only suggest a few minor changes:

  1. The term "artificial intelligence" should always be in lower case, unless its acronym is used. On line 1 it is in capital letters.
  2. On line 90 one can read "Neural architecture search algorithms"; it should be accompanied by its acronym (NAS).
  3. It would be interesting to have a more detailed description of the CNN models used, with regard to the definition of layers.

  4. I do not know if it is possible, but I suggest that the authors offer the reader the possibility of downloading the data.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

The paper proposes a reinforcement learning method to define the hyper-parameters of a deep neural network. A recurrent neural network is used to determine the best parameters of the deep neural network, in terms of connections and operations. In order to limit the computational cost of the procedure, the proposals are evaluated by scoring the alternative solutions on a simplified test. A growing procedure is adopted to define the final structure of the deep neural network, terminating when the performance saturates.
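The growing procedure summarized above can be sketched as follows; the saturating evaluation function and the stopping tolerance are illustrative assumptions standing in for the authors' actual training and scoring.

```python
# Toy sketch of the growing procedure the review describes: layers (cells)
# are added one at a time, and growth stops when the measured performance
# saturates (improves by less than a tolerance). The evaluation function
# is a synthetic stand-in, not the paper's training loop.

def evaluate(depth):
    """Stand-in for training/scoring a network with `depth` stacked cells."""
    return 1.0 - 0.5 ** depth          # saturating curve approaching 1.0

def grow_until_saturation(max_depth=20, tol=0.01):
    depth, score = 1, evaluate(1)
    while depth < max_depth:
        new_score = evaluate(depth + 1)
        if new_score - score < tol:    # performance has saturated: stop growing
            break
        depth, score = depth + 1, new_score
    return depth, score

if __name__ == "__main__":
    depth, score = grow_until_saturation()
    print(depth, round(score, 3))
```

With this synthetic curve, each added layer halves the remaining error, so the loop stops once an extra layer buys less than the tolerance.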

 

Remarks

  1. The title feels like a stretch. Obviously a faster algorithm implies a reduction in energy consumption, but this level of power cannot be considered an issue for sustainability.
  2. The training of the proposer should be described in more detail.
  3. The performance of the proposed method should be tested on a benchmark dataset and compared with other approaches proposed in the literature.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

I'm still not sure about the performance of the approach. You generate the cell similarly for all layers, right? The proposer defines the cell in a local step, and this cell is then used for all layers. Please correct me if that does not hold.

In this case, I doubt the overall performance, especially when the network is not fully trained. We have done scientific work on detecting parcels, labelled and unlabelled, and used GANs too. We have learned that we must check the performance at the feature level to get a well-performing architecture. So it seems that your approach misses details in validation. Please explain. Thanks.

Reviewer 3 Report

The structure of the article has been improved compared to the first version, but the concerns that I expressed in the first evaluation have not been resolved. In order:


1) The term "Sustainability" has been removed from the title, but in the results section reference is still made, for example, to the saving of CO2 emissions. These are insignificant quantities for an operation that, furthermore, is theoretically carried out only once, in the system design phase, so it is not clear how this aspect can be considered relevant.
2) The training process of the recurrent neural networks has not been clarified. The authors speak of reinforcement learning, but it is not clarified how the agent's actions are generated. On lines 319-320 the authors state that they do not have a reference value for training RNNs. Does this mean that the network parameters are randomly assigned? If so, what is the advantage of using a network as a proposer rather than directly defining, with reinforcement learning, the parameters that the proposer is called to provide?
3) As regards the fact that the authors do not compare the performance of the proposed method with that of the literature, the reasons given by the authors still leave doubts. I believe that the validity of the proposed method is not limited to the dataset used; therefore it makes sense and, indeed, it is necessary to compare the performance with the literature. As an example of a dataset used for semantic segmentation of images, the authors could refer to datasets such as PASCAL VOC 2012.
