Article
Peer-Review Record

Realistic Image Generation from Text by Using BERT-Based Embedding

Electronics 2022, 11(5), 764; https://doi.org/10.3390/electronics11050764
by Sanghyuck Na 1, Mirae Do 2, Kyeonah Yu 2 and Juntae Kim 1,*
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 23 November 2021 / Revised: 7 February 2022 / Accepted: 11 February 2022 / Published: 2 March 2022

Round 1

Reviewer 1 Report

In this paper, the authors propose a new text-to-image generation model using pre-trained BERT. The topic of the paper is quite interesting. I have the following comments:

  1. Do the authors plan to release the code and the datasets if the paper is published?
  2. Is it possible to deploy the proposed model on an edge device? If so, how would acceleration be implemented?

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 2 Report

1. Section 2 could be improved by citing more recent papers:
- RpBERT: A Text-image Relation Propagation-based BERT Model for Multimodal NER, 2021, The Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21)
- Correlation-Guided Representation for Multi-Label Text Classification, Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI-21)
- arXiv:2001.07966v2 [cs.CV], 23 Jan 2020
- arXiv:2003.12137v1 [cs.CV], 26 Mar 2020
- Review: https://doi.org/10.1016/j.inffus.2021.07.009
- Other applications: https://doi.org/10.3390/s21010133 and https://doi.org/10.1038/s43856-021-00008-0
2. Algorithms 1-3 look like low-resolution screenshots. The rendering quality of the text should be improved.
3. Rearrange Equations (3) and (4). The equation number should be placed at the end of the row.
4. Check for spelling and grammar errors, e.g., "set. . In " (line 189).

5. You should explain your methods and the details of their execution more fully. For example, which software was used for what purpose, and why? How did you implement the technical details? This could be addressed by citing suitable references; it is not necessary to rewrite the entire section.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 3 Report

I have a number of comments.

1. It is necessary to supplement the references with recent work on realistic image generation from text.

2. It is known that there are three types of contextualized word embeddings from BERT using transfer learning. Show how your approach differs.

3. In this work you need context-averaged pre-trained embeddings. This embedding approach gives the average value of the word.

4. What is the novelty of Stack GAN, given that this model has long been used for synthesizing photorealistic images from text? In addition, it is better to use Stack GAN v2 for such tasks to obtain high-resolution photorealistic images.

5. In addition to comparing the different GAN models listed in Table 2 by IS and FID scores, you need to use the MSE and SSIM metrics.

6. It is necessary to compare your model with models such as AttnGAN, BigGAN, and InfoGAN.

7. The article should show that algorithms such as RAKE, NPL, LexRank, and others are used for effective analysis and compression of the input text.
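
For readers unfamiliar with the two metrics requested in comment 5, the following is a minimal illustrative sketch and is not taken from the paper or from the authors' response. It assumes grayscale images flattened into equal-length sequences of pixel intensities in the range 0 to 255. The function names `mse` and `global_ssim` are hypothetical. The SSIM shown here is a simplified single-window variant computed over the whole image; the standard metric averages the same expression over local sliding windows, so values from this sketch will not match library implementations.

```python
from statistics import fmean


def mse(x, y):
    """Mean squared error between two equal-length pixel sequences."""
    if len(x) != len(y):
        raise ValueError("images must have the same number of pixels")
    return fmean((a - b) ** 2 for a, b in zip(x, y))


def global_ssim(x, y, data_range=255.0, k1=0.01, k2=0.03):
    """Simplified SSIM using one window that spans the whole image.

    Assumption: the usual stabilising constants C1 = (k1 * L)^2 and
    C2 = (k2 * L)^2, where L is the dynamic range of the pixel values.
    """
    if len(x) != len(y):
        raise ValueError("images must have the same number of pixels")
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    mu_x, mu_y = fmean(x), fmean(y)
    var_x = fmean((a - mu_x) ** 2 for a in x)
    var_y = fmean((b - mu_y) ** 2 for b in y)
    cov = fmean((a - mu_x) * (b - mu_y) for a, b in zip(x, y))
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical 2x2 images, flattened row by row.
    reference = [0, 64, 128, 255]
    generated = [10, 74, 138, 245]
    print("MSE:", mse(reference, generated))
    print("SSIM (global):", global_ssim(reference, generated))
```

Both metrics are full-reference: each generated image must be paired with a specific ground-truth image, whereas IS and FID compare distributions of images.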

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 3 Report

Overall, I'm happy with the responses to my comments, other than Q5. The changes and additions have significantly improved the scientific and practical weight of the article.
