Communication
Peer-Review Record

A Personalized Machine-Learning-Enabled Method for Efficient Research in Ethnopharmacology. The Case of the Southern Balkans and the Coastal Zone of Asia Minor

Appl. Sci. 2021, 11(13), 5826; https://doi.org/10.3390/app11135826
by Evangelos Axiotis 1,2,*, Andreas Kontogiannis 3, Eleftherios Kalpoutzakis 1 and George Giannakopoulos 4,5
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Submission received: 20 April 2021 / Revised: 18 June 2021 / Accepted: 21 June 2021 / Published: 23 June 2021
(This article belongs to the Section Computing and Artificial Intelligence)

Round 1

Reviewer 1 Report

A personalized machine-learning-enabled method seems to be a beneficial proof of concept that can be used in ethnopharmacology research. My recommendation is to describe in more detail how this method can be made accessible to ethnopharmacology researchers in practical use.

Furthermore the abbreviations SVM (page 3, line 84) and MeSH (page 7, line 241) should be explained in the text. Abbreviations ML and AL are explained in the abstract only, their explanation in the text is missing.

Author Response

Comment 1.1 

A personalized machine-learning-enabled method seems to be a beneficial proof of concept that can be used in ethnopharmacology research. My recommendation is to describe in more detail how this method can be made accessible to ethnopharmacology researchers in practical use.

Response to comment 

Thank you for your comment. We added how our system can practically support ethnopharmacological research in Section 5: “For future work...focus”.

Comment 1.2 

Furthermore, the abbreviations SVM (page 3, line 84) and MeSH (page 7, line 241) should be explained in the text. Abbreviations ML and AL are explained in the abstract only; their explanation in the text is missing.

Response to comment 

Thank you very much for the comment. We corrected the aforementioned abbreviations. Please check their corresponding pages.

Author Response File: Author Response.pdf

Reviewer 2 Report

an interesting and well written article

Author Response

Comment 2.1 

an interesting and well written article 

Response to comment 

We thank the reviewer for the encouraging comment. 

Reviewer 3 Report

As a reviewer and an expert in computer science,
I'm unable to understand the real goal and the real contribution of the paper.

- Is it focused on results obtained by the research on 
Ethnopharmacology?
- Is it focused on showing the advantages obtained
by applying machine learning, in particular AL and RL, to speed up the search for possibly relevant documents?

I suppose that the latter option is the real goal of the paper, but in this case I have to say that the authors
were totally unable to reach the goal. In fact:

- I would expect a clear and sound presentation of the methodology and techniques you adopted; however, in this regard, Section 2 is confused, imprecise and not comprehensible. If I wanted to replicate your approach,
I would be totally unable to do so. And remember, reproducibility is the key success factor for science.
- Why do you insist so much on Ethnopharmacology? In half a page you could present the case study and that's all.
There is no need to tell so much about the geographical area, the types of plants, and so on.
- If you refer to specific methodologies, techniques and acronyms, you should always put the related references in the paper. In contrast, you often fail to do this.
- Finally, I found the English style quite strange. I'm not a native speaker, but certainly you cannot use
"their" to refer to a single expert; you should use "her/his". Furthermore, I have the impression that you often omit some pronouns.

So, I really think that the quality of the paper is very poor, thus I propose rejecting it.

I attach the paper with my annotations for your convenience.

Comments for author File: Comments.pdf

Author Response

Comment 3.1 

I'm unable to understand the real goal and the real contribution of the paper. 

Is it focused on results obtained by the research on Ethnopharmacology?

Is it focused on showing the advantages obtained by applying machine learning, in particular AL and RL, to speed up the search for possibly relevant documents?

I suppose that the latter option is the real goal of the paper,…

Response to comment 

Thank you very much for the comment.  

Concerning the goal, we have updated the abstract, the introduction and the conclusion to clarify the focus of the paper. Thus:

“This study aims to understand if and how an ethnopharmacology expert can be supported effectively through intelligent tools in the task of ethnopharmacological literature research. To this end, we utilize a real case of ethnopharmacology research, aimed at the Southern Balkans and Coastal zone of Asia Minor”.

Concerning the contribution, the study has shown that the utilization of AI-based methods can provide a significant boost to the effectiveness and efficiency of ethnopharmacology research and, thus, the development of such end-to-end tools is meaningful. The quantification of this improvement is also a significant contribution of the study.

 

Comment 3.2 

- I would expect a clear and sound presentation of the methodology and techniques you adopted; however, in this regard, Section 2 is confused, imprecise and not comprehensible.

If I wanted to replicate your approach, I would be totally unable to do so. And remember, reproducibility is the key success factor for science.

Response to comment 

Thank you for your comment. 

To further clarify our methods and increase the conciseness and clarity of the description, we have performed the following updates:

  • We updated Fig. 2 to clarify the interaction between AL and RL in the overview of the flow. 
  • We have extensively revised Section 2.5, describing the focused crawling algorithm: (“Finally … by the focused crawler”) 
  • We now provide more information about the hyper-parameters used in the reinforcement learning process: Section 2.5 (“We note that we use Adam … uniform sampling.”)
  • We have provided the list of seed documents in a GitLab Repo URL. 
  • We have mentioned the software components and tools used to run the study in Section 2.5 (last paragraph) 

Comment 3.3 

Why do you insist so much on Ethnopharmacology? In half a page you could present the case study and that's all.

There is no need to tell so much about the geographical area, the types of plants, and so on.

Response to comment 

This work is a joint venture of Ethnopharmacology experts and Machine Learning researchers. Thus, the original setting is that of Ethnopharmacology. We understand the clear potential of transferability to other domains; however, given also the multi-disciplinary audience of the journal, we have opted to keep the original view as the point of focus for this work. In this view, ethnopharmacology is central.

Comment 3.4 

If you refer to specific methodologies, techniques and acronyms, you should always put the related references in the paper. In contrast, you often fail to do this.

Response to comment 

Thank you for the comment. We have updated the text to refer to all appropriate algorithms and methods we have utilized, as suggested.

 

Comment 3.5 

- Finally, I found the English style quite strange. I'm not a native speaker, but certainly you cannot use "their" to refer to a single expert; you should use "her/his". Furthermore, I have the impression that you often omit some pronouns.

Response to comment 

Thank you for your comment. The use of “they” is considered standard practice (cf. https://en.wikipedia.org/wiki/Singular_they) for gender-neutral sentences. In any case, to accommodate your suggestion, we have appropriately revised the suggested parts of the text and performed extensive double checking of the grammar.

 

Author Response File: Author Response.pdf

Reviewer 4 Report

In the manuscript titled “A Personalized Machine-Learning-enabled Method for Efficient Research in Ethnopharmacology. The case of Southern Balkans and Coastal zone of Asia Minor” the authors propose an approach for a more efficient research activity in ethnopharmacology.

The proposed approach is modern and potentially useful, while the manuscript attempts to address mostly the theoretical aspects involving the use of ML, AL and RL along with the LSTM technique.

Overall, the manuscript is well structured and the invoked references are adequate.

 

  1. I would kindly suggest that the authors present in more detail their contribution to the field and clearly define the nature of this contribution: is it methodological, technological, an apparatus?
  2. The overall presentation seems frugal and in “is matter of fact” fashion. The last part of chapter “Materials and Methods” would require more in-depth details of the proposed approach/solution. It is not clear from the manuscript if the processing algorithms were developed by the authors or if they use a commercial framework to which the training and test data was supplied.
  3. The results concern mostly the automatic URL extraction through crawling and to a lesser extent the ML processing contribution.

 

Line 162 - unclear formulation:

“We consider that the agent - the crawler - fetches a new URL at each timestep and exists in a crawling environment, which provides states, actions, and rewards.”

 

Line 174 – the last term of the formula does not seem to be consistent with the development of the series:

“The goal of the agent is to find a policy, to maximize the discounted cumulative received reward $G_t = R_t + \gamma R_{t+1} + \gamma^2 R_{t+2} + \dots + \gamma^{T-t} R_T$, where $T$ is the fixed number of total documents that the crawler should fetch and $\gamma$ is the discount factor.”
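
As a quick check of how the quoted series develops, the following minimal Python sketch evaluates the discounted return for a finite list of rewards. The function name `discounted_return` and the sample values are illustrative assumptions and are not taken from the manuscript.

```python
def discounted_return(rewards, gamma):
    """Compute G_t = R_t + gamma*R_{t+1} + ... + gamma^(T-t)*R_T.

    `rewards` is assumed to hold [R_t, R_{t+1}, ..., R_T] in order.
    """
    g = 0.0
    # Work backwards so each step applies G = R + gamma * G_next.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Hypothetical rewards: 1 when a fetched document is relevant, 0 otherwise.
example_rewards = [1.0, 0.0, 1.0]
example_return = discounted_return(example_rewards, gamma=0.9)
```

With three rewards the last term carries the factor gamma squared, which matches the exponent T-t of the final term in the quoted formula.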

 

Line 181 – consider a cleaner, more detailed presentation of the formulas:

“We employ a Deep Q-learning approach, utilizing the Deep Q-Network (DQN) agent [12], which is based on the TD Error, $R_{t+1} + \max_a Q_{\pi'}(S_{t+1}, a; \theta^-) - Q_\pi(S_t, A_t; \theta)$, where $Q_\pi$ and $Q_{\pi'}$ are the action-value functions under the policies $\pi$ and $\pi'$, respectively. That is, $Q_\pi(S_t, A_t) = \mathbb{E}_{U(D)}[R_{t+1} + \max_a Q_{\pi'}(S_{t+1}, a; \theta^-) \mid S_t, A_t]$. The DQN agent consists of two neural networks with the same architecture - a Q-Network ($\theta$) and a Target Q-Network ($\theta^-$) - in order to approximate $Q_\pi$ and $Q_{\pi'}$, respectively.”
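
To make the quoted TD error concrete, here is a minimal Python sketch that evaluates it for one transition. It is only an illustration: the two networks are stood in for by plain lookup tables, all names are hypothetical, and the `gamma` parameter is an assumption (the quoted expression shows no discount factor, so it defaults to 1.0).

```python
def td_error(q_online, q_target, state, action, reward, next_state,
             next_actions, gamma=1.0):
    """TD error: reward + gamma * max_a Q_target(next_state, a) - Q_online(state, action).

    `q_online` and `q_target` are callables (state, action) -> float that stand
    in for the Q-Network and the Target Q-Network.
    """
    # Bootstrapped target uses the best next action under the target network.
    best_next = max(q_target(next_state, a) for a in next_actions)
    return reward + gamma * best_next - q_online(state, action)


# Toy lookup tables in place of neural networks (illustrative values only).
online_table = {("s0", "a"): 0.5}
target_table = {("s1", "a"): 0.2, ("s1", "b"): 0.7}

error = td_error(
    q_online=lambda s, a: online_table[(s, a)],
    q_target=lambda s, a: target_table[(s, a)],
    state="s0", action="a", reward=1.0,
    next_state="s1", next_actions=["a", "b"],
)
```

Keeping a separate target table mirrors the two-network design in the quote: the value being updated and the value used to build the target come from different parameter sets.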

 

 

 

Author Response

Comment 4.1 

  

  1. I would kindly suggest that the authors present in more detail their contribution to the field and clearly define the nature of this contribution: is it methodological, technological, an apparatus?

Response to comment 

Thank you very much for the comment. Taking it into consideration, we added “Thus ... ethnopharmacology” (lines 60, 61) in the abstract. Furthermore, we edited the first sentence of Section 5 (adding “methodology...”). Last but not least, we referred to our major contribution in the field of focused crawling (“Our major contribution … (RL)).”, end of Section 1).

Comment 4.2 

 

  1. The overall presentation seems frugal and in “is matter of fact” fashion. The last part of chapter “Materials and Methods” would require more in-depth details of the proposed approach/solution. 

Response to comment 

Thank you very much for this comment.

To further clarify our methods and increase the conciseness and clarity of the description, we have performed the following updates:

  • We updated Fig. 2 to clarify the interaction between AL and RL in the overview of the flow. 
  • We have extensively revised Section 2.5, describing the focused crawling algorithm: (“Finally … by the focused crawler”) 
  • We now provide more information about the hyper-parameters used in the reinforcement learning process: Section 2.5 (“We note that we use Adam … uniform sampling.”)
  • We have provided the list of seed documents in a GitLab Repo URL. 
  • We have mentioned the software components and tools used to run the study in Section 2.5 (last paragraph) 

 

Comment 4.3 

 

  1.  It is not clear from the manuscript if the processing algorithms were developed by the authors or they use a commercial framework to which the training and test data was supplied. 

Response to comment 

Thank you for the comment. We developed the focused crawling framework that follows the Expert-Apprentice paradigm (“In our Artificial Intelligence … process of focused crawler”, Section 2.1), but utilized common ML algorithms like LSTM (“In our work … for the AL setting”, Section 2.1) and Deep Q-learning (“We employ a Deep Q-learning … with respect to θ”, Section 2.5).

Comment 4.4 

  1. The results concern mostly the automatic URL extraction through crawling and to a lesser extent the ML processing contribution.

Response to comment  

Thank you for your comment. We added a few appropriately placed sentences to illustrate that, indeed, our aim was to optimize the automatic URL extraction process through crawling with the use of ML (RL):

(“Thus, we utilize RL … URL extraction process”, Section 2.1),

(“Since DQN … of the RL agent.”, Section 2.5)

(“Recall that … process”, Section 3.2)

As we mentioned in these sentences, the evaluation metric for the automatic URL extraction process through crawling (harvest rate) is always equal to the mean cumulative reward the RL agent receives. Thus, optimizing the crawling process is the same as optimizing RL.
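
The equality stated above can be illustrated with a short Python sketch. It assumes that harvest rate means the fraction of fetched documents judged relevant and that the agent receives a reward of 1 for a relevant document and 0 otherwise; both the reward scheme and the function names are assumptions made for illustration, not details confirmed by this record.

```python
def harvest_rate(relevance_flags):
    """Fraction of fetched documents that were judged relevant."""
    if not relevance_flags:
        raise ValueError("no documents fetched")
    return sum(1 for flag in relevance_flags if flag) / len(relevance_flags)


def mean_reward(rewards):
    """Cumulative reward divided by the number of fetched documents."""
    if not rewards:
        raise ValueError("no rewards received")
    return sum(rewards) / len(rewards)


# Assumed reward scheme: 1 for a relevant fetched document, 0 otherwise.
flags = [True, False, True, True]
rewards = [1 if flag else 0 for flag in flags]
```

Under that binary reward the two quantities coincide by construction, which is why improving the agent's reward also improves the crawl metric.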

 

Comment 4.5 

Line 162 - unclear formulation: 

“We consider that the agent - the crawler - fetches a new URL at each timestep and exists in a crawling environment, which provides states, actions, and rewards.”

Response to comment 

Thank you for this comment. We have made this formulation clearer (“An agent … Formally”, Section 2.5).

 

Comment 4.6 

Line 174 – the last term of the formula does not seem to be consistent with the development of the series: 

“The goal of the agent is to find a policy, to maximize the discounted cumulative received reward $G_t = R_t + \gamma R_{t+1} + \gamma^2 R_{t+2} + \dots + \gamma^{T-t} R_T$, where $T$ is the fixed number of total documents that the crawler should fetch and $\gamma$ is the discount factor.”

Line 181 – consider a cleaner, more detailed presentation of the formulas:

“We employ a Deep Q-learning approach, utilizing the Deep Q-Network (DQN) agent [12], which is based on the TD Error, $R_{t+1} + \max_a Q_{\pi'}(S_{t+1}, a; \theta^-) - Q_\pi(S_t, A_t; \theta)$, where $Q_\pi$ and $Q_{\pi'}$ are the action-value functions under the policies $\pi$ and $\pi'$, respectively. That is, $Q_\pi(S_t, A_t) = \mathbb{E}_{U(D)}[R_{t+1} + \max_a Q_{\pi'}(S_{t+1}, a; \theta^-) \mid S_t, A_t]$. The DQN agent consists of two neural networks with the same architecture - a Q-Network ($\theta$) and a Target Q-Network ($\theta^-$) - in order to approximate $Q_\pi$ and $Q_{\pi'}$, respectively.”

Response to comment 

Thank you very much for this comment. Unfortunately, in the submitted version of the paper, the inconsistency was caused by software incompatibility errors. We have done our best to remove such risks in this version of the paper and have updated all formulas.
 

Author Response File: Author Response.pdf

Round 2

Reviewer 3 Report

I read again all the paper and I'm disappointed to find the paper more or less in the original shape, just with minor changes.

For example, in my first review I wrote:

"- I would expect a clear and sound presentation of the methodology and techniques you adopted; however, in this regard, Section 2 is confused, imprecise and not comprehensible.  

If I wanted to replicate your approach, I would be totally unable to do so. And remember, reproducibility is the key success factor for science."

well, the organization of Section 2 is the same, I think it could be understood only by experts in Neural Networks.

I also wrote
"- If you refer to specific methodologies, techniques and acronyms, you should always put the related references on the paper. In contrast, you often miss to do this. ",

well, a lot (and I say "a lot") of references are still missing.

In general, the style of the paper looks like homework done by a high-school student. There is no precise structure of exposition within sections; ideas and details are written together seamlessly.
Furthermore, you rely on background knowledge that you do not explain.
Some parts appear to be the descriptions of algorithms: usually, in computer science algorithms are presented in the form of pseudo-code, so as to provide a clear definition of them and allow for clearly explaining them.
Again, a "Related Work" section is missing, so it is difficult to understand on what technical background you rely.
Furthermore, there is no section that introduces the background techniques you adopted. Should I think that you used Python libraries in a total black-box way, without understanding what they actually do?

In the following, I report detailed comments.

---------------
Abstract
The abstract has been improved.

Introduction
The introduction has been slightly improved, even though a final paragraph that introduces the overall organization of the paper is missing.

Line 70.
"The Expert holds explicit and implicit interests."
Is this sentence really meaningful?

Lines 90-92
SVM, Logistic Regression, LSTM
references are missing.
Furthermore, at the end you used LSTM: you should explain, in a dedicated background section, how it works.

Line 128
RA, CK, KA
references are missing, as well as their formal definition.
You should introduce them in a specific background section.

Line 136
What is a "gold standard"? 
You should explain this concept.

Lines 148-154
This paragraph is comprehensible only by experts in neural networks.
What is a "layer"? what is a "Mean Pooling layer"? what is "SoftMax"?
What is a "timestep"?

The scientific soundness in computer science means that you have to provide adequate background of techniques you adopt.


Line 157
"4-fold validation"? what is it? You should explain and provide the reference.

Lines 174-184
It appears that you are describing an iterative algorithm.
Well, without pseudo-code? The algorithm should be 
formalized as pseudo-code and described accordingly.


Line 179
"URLs ... extracted from a state transition"
does a state transition contain URLs?


Line 185
"The goal of the agent is to find a policy"
Do you mean that the goal of the algorithm is to find a policy?
So, how is a policy represented and how (pseudo-code)
is it discovered?


Lines 189-195
What is the formal definition of "Harvest Rate"?


Line 197
"TLD Error"?
It is not defined and the reference is missing.


Line 199
What is the meaning of the formula reported within the text?

Lines 198-213
This paragraph can be understood only by an expert in the specific type of neural networks.


Lines 220-231
Is the crawling algorithm (missing) described in this paragraph the same described in Section 2.1?
I'm quite confused, because it also seems that this crawler encompasses both the AL and the RL phase.
If so, the overall presentation of your technique has failed to be clear and sound.

Another doubt about crawling is that it appears to be a "backward discovery technique": provided a pool of seeds, it looks for cited papers, and so on.
So, it is unable to perform a forward search, i.e.,
looking for papers that cite the seeds.
Am I right?


Line 229
What is the "Experience Reply"? this term appears here for the first time.

What is the "closure"? in computer science the term "closure" has so many meanings...
What is the "closure structure"?

Lines 235-242
Again a description of an algorithm. And, again, the pseudo-code is missing.

Line 243
What is an "Adam optimizer"?
It is again background information that is not properly presented and whose reference is missing.


Line 243
"batch size"? what "batch" are you talking about?

------------------

So, as you can see, I have a lot of concerns about the quality of your paper, from my point of view as a computer scientist.
In some sense, your paper looks like a workshop paper, written with strong page limitations, presented to a dedicated workshop in neural network applications.
A journal paper is much much more.

 

 

 

Author Response

We would like to thank you for the valuable comments. You can find detailed responses to the comments below. The changes are also evident in the submitted document (through the “track changes” functionality).

Yours sincerely, 

The authors 

Comment 1.1

Line 70.

"The Expert holds explicit and implicit interests."

Is this sentence really meaningful?

Response to comment

Line 70 “Our work follows an “Expert-Apprentice” paradigm. The Expert holds explicit and implicit interests”

We have rephrased the sentence to reflect the fact that the Expert has some interests that are made explicit to the system, while others remain implicit and need to be identified:

“The Expert has his/her personal interests and understanding of which publications actually relate to these interests...”

Comment 1.2

Lines 90-92
SVM, Logistic Regression, LSTM
references are missing.
Furthermore, at the end you used LSTM: you should explain, in a dedicated background section, how it works.

Response to comment

We thank you for the feedback. We have updated the references as requested. However, given that the paper is submitted as a “communication” submission, i.e., should be limited in size, we argue that adding definitions and background regarding well-established and known methods, which have been covered extensively by related book chapters, reviews and other papers, would be inappropriate for our paper type and format.

However, to help the reader and accommodate the comment to the best of our ability, we have added qualitative explanations of what each of these methods does:

“SVM (a well-established classifier based on identifying representative instances that separate the classes of interest in a feature space) and Logistic Regression (relying on a thresholded probability estimate mapping the input features of an instance to the probability of the instance to belong to each class).”

“LSTM (a neural network embedding sequences to a vector space, making sure that similar sequences are positioned close to each other in the embedding space)”

Comment 1.3

Line 128
 RA, CK, KA
 references are missing, as well as their formal definition.
 You should introduce them in a specific background section.

Response to comment

According to your suggestion, we have briefly introduced the rationale of each measure in the text, providing an appropriate reference to a survey paper that allows the eager reader to learn more details.

“Raw agreement..for each pair of labels”
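
For readers unfamiliar with such measures, the following Python sketch shows how two of them could be computed for a pair of annotators. It assumes that RA denotes raw agreement and that CK denotes Cohen's kappa; the expansion of CK is an assumption here, and the function names and labels are hypothetical rather than the authors' implementation.

```python
from collections import Counter


def raw_agreement(labels_a, labels_b):
    """Share of items on which two annotators assigned the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    matches = sum(1 for a, b in zip(labels_a, labels_b) if a == b)
    return matches / len(labels_a)


def cohens_kappa(labels_a, labels_b):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    n = len(labels_a)
    p_o = raw_agreement(labels_a, labels_b)
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability that both pick the same label independently.
    p_e = sum(counts_a[label] * counts_b[label]
              for label in set(labels_a) | set(labels_b)) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical relevance labels (1 = relevant, 0 = not relevant).
annotator_1 = [1, 1, 0, 0]
annotator_2 = [1, 0, 0, 0]
```

Raw agreement alone can look high when one label dominates, which is the usual motivation for reporting a chance-corrected measure alongside it.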

Comment 1.4

Line 136
 What is a "gold standard"?
 You should explain this concept

Response to comment

We have updated the text to indicate what a “gold standard” is, as follows.

“...a means to obtain reference, agreed upon, opinions –referred to as “gold standard” opinions-...”

Comment 1.5

Lines 148-154
 This paragraph is comprehensible only by experts in neural networks.
 What is a "layer"? what is a "Mean Pooling layer"? what is "SoftMax"?
 What is a "timestep"?

The scientific soundness in computer science means that you have to provide adequate background of techniques you adopt.

Response to comment

Since this section is, indeed, appropriate for an expert audience, who would need to understand more to reproduce the results, we have provided a footnote with a reference to an appropriate handbook for all non-expert readers.

We note that the above basic neural network concepts are being used consistently in a multitude of related papers and, thus, we consider that repeating their definitions is out of the scope of this work (also given its “communication” designation).

Comment 1.6

Line 157
 "4-fold validation"? what is it? You should explain and provide the reference.

Response to comment

We have corrected the text to write:

“4-fold cross-validation"
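
As a brief illustration of the term, the sketch below splits a dataset into four folds so that each sample is used for testing exactly once and for training in the other three rounds. It is a generic pure-Python sketch with a hypothetical function name; it does not shuffle or stratify, and it is not claimed to match the authors' setup.

```python
def k_fold_indices(n_samples, k=4):
    """Return a list of (train_indices, test_indices) pairs, one per fold."""
    if k < 2 or k > n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    indices = list(range(n_samples))
    folds = []
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds


folds = k_fold_indices(10, k=4)
```

A model would be trained and evaluated once per pair, and the four scores averaged to give the reported estimate.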

Comment 1.6

Lines 174-184
 It appears that you are describing an iterative algorithm.
 Well, without pseudo-code? The algorithm should be
 formalized as pseudo-code and described accordingly.

Response to comment

We thank the reviewer for the comment. We have opted to retain a narrative description, since it is sufficient (and essentially equivalent to the pseudo-code format), provides better readability and is also meaningful for multi-disciplinary audiences, as is the audience of the journal.

Comment 1.7

Line 179
 "URLs ... extracted from a state transition"
 does a state transition contain URLs?

Response to comment

As we describe in lines "When the crawling process...by the focused crawler”, yes, a state transition contains a URL.

Comment 1.8

Line 185
 "The goal of the agent is to find a policy"
 Do you mean that the goal of the algorithm is to find a policy?
 So, how is a policy represented and how (pseudo-code)
 is it discovered?

Response to comment

It is the goal of the agent to find a policy. In a human agent this would happen in an appropriate, natural manner. In the case of the virtual agent, the policy is found using a specific algorithm.

We have updated the text to define what a policy is, adding the snippet “In other words, the agent seeks to find mapping between states and actions, in order to get high long-term rewards".
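
One simple way to picture a policy as a mapping between states and actions is a greedy lookup over estimated action values. The sketch below is only illustrative: the state and action names are invented, and a tabular dictionary replaces the neural network that a DQN agent would use.

```python
def greedy_policy(q_values):
    """Map every state to the action with the highest estimated value.

    `q_values` is a dict: state -> {action: estimated long-term reward}.
    """
    return {state: max(action_values, key=action_values.get)
            for state, action_values in q_values.items()}


# Hypothetical crawler states, each with candidate URLs as actions.
q_values = {
    "page_a": {"url_1": 0.2, "url_2": 0.9},
    "page_b": {"url_3": 0.4},
}
policy = greedy_policy(q_values)
```

In this picture, learning a policy amounts to improving the value estimates until the greedy choice leads to high long-term reward.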

We have also added a reference to a reinforcement learning handbook, to allow further reading for non-expert readers.

Comment 1.9

Lines 189-195
 What is the formal definition of "Harvest Rate"?

Response to comment

We have updated the text to reflect the formula of “Harvest Rate”, which can also be found in the provided reference, as you have suggested.

Comment 1.10

Line 197
 "TLD Error"?
 It is not defined and the reference is missing

Response to comment

We have added the appropriate reference, as requested. Thank you for the comment.

Comment 1.11

Line 199
 What is the meaning of the formula reported within the text?

Response to comment

We have updated the text to clarify the rationale/meaning of the formula as follows:

“which reflects the expected cumulative ...reward r(t+1)”

Comment 1.12

Lines 198-213
 This paragraph can be understood only by an expert in the specific type of neural networks.

Response to comment

We agree that this paragraph is provided to facilitate related experts who may want to reproduce the results and understand the inner workings of what we do, and is there for completeness. The reference in the beginning of the section supports other readers by providing appropriate guidance, if needed.

Comment 1.13

Lines 220-231
 Is the crawling algorithm (missing) described in this paragraph the same described in Section 2.1?
 I'm quite confused, because it also seems that this crawler encompasses both the AL and the RL phase.
 If so, the overall presentation of your technique has failed to be clear and sound.

Response to comment

The crawling algorithm described here indeed faces the second task described in Section 2.1.

As we state early in the section you refer to, the AL results are a given for the crawling and, thus, the two processes (AL, RL) are effectively decoupled.

Comment 1.13

Another doubt about crawling is that it appears to be a "backward discovery technique": provided a pool of seeds, it looks for cited papers, and so on.
 So, it is unable to perform a forward search, i.e.,
 looking for papers that cite the seeds.
 Am I right?

Response to comment

For each seed/fetched document we utilize, as you describe, the papers it cites as possible next destinations. In our understanding, this is a “forward” discovery, but this may be a matter of point of view.
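
The two directions discussed in this exchange can be told apart on a toy citation graph. In the Python sketch below, the graph, the paper identifiers and the function names are all hypothetical; it only illustrates the difference between following the references of a document and looking up the documents that cite it.

```python
def cited_papers(citations, paper):
    """Papers that `paper` cites: links available inside the document itself."""
    return set(citations.get(paper, ()))


def citing_papers(citations, paper):
    """Papers that cite `paper`: requires an index over the other documents."""
    return {source for source, targets in citations.items() if paper in targets}


# Toy graph: each key cites the papers in its list.
citations = {
    "seed": ["a", "b"],
    "a": ["c"],
    "d": ["seed"],
}
```

A crawler that only follows links found in fetched documents can reach "a" and "b" from the seed, whereas finding "d" needs an external citation index.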

Comment 1.14

Line 229
 What is the "Experience Reply"? this term appears here for the first time.

What is the "closure"? in computer science the term "closure" has so many meanings...
 What is the "closure structure"?

Response to comment

The term is “experience replay”, which appears in the DQ-learning paragraph in Section 2.5.

We have updated the first occurrence of the term “closure” (in Section 2.5), to clarify that it  "represents a utility structure, essentially a map/dictionary (essentially a set of key-value pairs)”.
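
For readers new to the term, experience replay can be sketched as a bounded buffer of past transitions from which training batches are drawn at random. The Python sketch below is a generic illustration with hypothetical names and sizes; it assumes uniform sampling without replacement and is not the authors' implementation.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of transitions with uniform random sampling."""

    def __init__(self, capacity, seed=None):
        # A deque with maxlen drops the oldest transition once it is full.
        self._buffer = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, transition):
        self._buffer.append(transition)

    def sample(self, batch_size):
        # Every stored transition is equally likely to be drawn.
        return self._rng.sample(list(self._buffer), batch_size)

    def __len__(self):
        return len(self._buffer)


buffer = ReplayBuffer(capacity=3, seed=0)
for step in range(5):
    # Each transition is (state, action, reward, next_state); values are made up.
    buffer.add((f"s{step}", "fetch", step % 2, f"s{step + 1}"))
batch = buffer.sample(2)
```

Sampling from stored transitions rather than only the most recent one reduces the correlation between consecutive training examples.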

 Comment 1.15

Lines 235-242
 Again a description of an algorithm. And, again, the pseudo-code is missing.

Response to comment

As also stated in previous comments, we have opted to retain a narrative description, since it is sufficient (and essentially equivalent to the pseudo-code format), provides better readability and is also meaningful for multi-disciplinary audiences, as is the audience of the journal.

Comment 1.16

Line 243
 What is an "Adam optimizer"?
 It is again background information that is not properly presented and whose reference is missing.

Line 243
 "batch size"? what "batch" are you talking about?

Response to comment

This paragraph is, once again, addressed to neural network experts and has been added based on the reviewers’ previous comments, to support reproducibility. The interested reader should follow the reference provided in the footnote in Section 2.4 for more information.

Reviewer 4 Report

The authors addressed properly the suggestions that I made on the first version of the manuscript. Congratulations!

Author Response

We thank the reviewer for the encouraging comment.

Round 3

Reviewer 3 Report

At a rapid look, I saw that only a little maquillage has been done to the paper.
The essential concerns, i.e., (1) a related work section and (2) pseudo-code that properly and formally presents the algorithms have not been addressed.

So my judgment cannot change.

Sorry, but the things I ask for are minimum established requirements in the area of computer science.
Perhaps the paper falls into another area?
