
Open Access Article

Peer-Review Record

Data Preprocessing for Evaluation of Recommendation Models in E-Commerce

Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Received: 10 December 2018 / Revised: 18 January 2019 / Accepted: 28 January 2019 / Published: 31 January 2019
(This article belongs to the Special Issue Data Analysis for Financial Markets)

Round 1

Reviewer 1 Report

Comments on manuscript DATA-414761 “Data preprocessing and validation techniques for online evaluation of recommendation engines on e-commerce platforms”

The manuscript deals with an important topic of data quality improvement in e-commerce, namely the accurate assessment of the recommendations’ influence on customer clicks and buys. The authors therefore explore three sources of errors: customer behavior, data collection and user interface. The basic idea of the manuscript is very interesting and the presented considerations have potential for publication.

However, in its current stage, the manuscript still suffers from several shortcomings which prevent its publication:

There is no sufficient literature review at the beginning that (a) shows the current state of research in the considered field and (b) uncovers the research gap to be filled by the current contribution.

The authors mention that they identified 13 major contributors to erroneous data analysis but do not further substantiate this finding. Which contributors do they have in mind?

Why do they focus on the three fields of customer behavior, technical issues, and user interface? What is the motivation behind these three fields?

The manuscript contains several statements or estimates (e.g.: „Mostly such duplicate actions are rare occurrences“ on page 8) that need some further substantiation.

On the same page the authors state that „The proportion of such customers arriving every day is called the bounce rate of a website and the Hits for these customers must be removed.“ The problem concerned is important and worth addressing. Accordingly, how is this issue discussed in the relevant literature?

On page 6, the statement „... whose population sizes vary from 20k to 35k views per day per customer and 500 to 1000 buys per day per buyer.“ should be further specified. What do „per customer“ and „per buyer“ mean with respect to the given numbers?

What is the overall conclusion that results from the three cases considered on pages 6 and 7?

A general concern about this paper results from the fact that the presented methods and results cannot be verified, since no systematic comparisons with existing approaches are provided. Why not, for example, compare the algorithm suggested on pages 10 and 11 with existing approaches that serve a similar purpose?

On page 12, the authors state that „all data stored should be uniform and consistent“. This is correct and obvious, but the question is: How to achieve this? Any suggestions in this respect?

Statements like „In our observations from multiple e-commerce sites, we noticed cases where the reason for low user interaction was not due to the recommended products, but the result of an unintuitive placement of the recommendation widget on a home or product display page.“ are less helpful because of their limited generalizability. What exactly is an „unintuitive placement“?

Is there any empirical evidence for the classification (top 5 products versus products ranked 6 to 10) used in the paragraph about the design of the PLP on page 13?

The result presented in Figure 4 is of limited value since it is just an example, without any further evidence or motivation regarding its generalizability.

All in all, the manuscript could significantly increase its scientific contribution by improving its empirical substantiation and by a systematic reflection of its findings against current research / literature in the field of interest.

I therefore would like to encourage the authors to give their manuscript a comprehensive revision according to the above suggestions in order to better exploit the potential of their research.

Good luck with that!

Author Response

All reviewer responses have been uploaded as a Word file below.

Author Response File: Author Response.docx

Reviewer 2 Report

I would like to commend the authors for their effort in drafting this paper. I really enjoyed reading it and I believe the results are interesting, novel, and will have impact. One comment I have is that although you talk about recommendation engines in e-commerce settings, you do not describe well the value and impact that they have. There are several papers that look at user perceptions when purchasing online, especially in e-commerce settings. For instance, the papers of Mikalef et al., 2012 and 2013 highlight the impact that these engines have on enhancing purchase behavior. Perhaps you could discuss this in your conclusion section.

Mikalef, P., Giannakos, M., & Pateli, A. (2013). Shopping and word-of-mouth intentions on social media. Journal of Theoretical and Applied Electronic Commerce Research, 8(1), 17-34.

Mikalef, P., Giannakos, M. N., & Pateli, A. G. (2012, June). Exploring the Business Potential of Social Media: An Utilitarian and Hedonic Motivation Approach. In Bled eConference (p. 21).

Best of luck with your revisions.

Author Response

All reviewer responses have been uploaded in the Word file below.

Author Response File: Author Response.docx

Reviewer 3 Report

The authors are to be commended for an interesting paper guiding how to preprocess e-commerce transaction data and validate the effectiveness of a recommendation model. Considering the principle of GIGO (garbage in, garbage out), appropriate data preprocessing is a very important issue when building recommendation models for e-commerce platforms. Nonetheless, the data preprocessing step is often neglected in the field. Moreover, a comprehensive guide to proper data preprocessing is very rare. From this viewpoint, this paper makes a significant contribution for academics and practitioners who deal with recommendation engines for e-commerce sites. Though the treatment of each issue is quite shallow, this paper deals with twelve data preprocessing issues, so its coverage is extensive. Thus, I believe this paper is worth reading for the readers of the Data journal. However, I expect that the quality of this paper could be further enhanced after revision in accordance with the following comments:


The title of the first section: beginning with a ‘Summary’ section is unusual in an academic paper. It would be better to change it to ‘Introduction’.

The position of Subsection 1.1 (definition of terms): its current position seems inappropriate. It is recommended to move it to Section 2.

The position of Section 3 (Related work) also looks unnatural. It is recommended to switch its position with Section 2 (Data description).

In Section 3 (Related work), only two prior studies are referenced. Moreover, these two studies share the same main author. That is, the literature review is too simple and short. Please elaborate on the related work in the manuscript. It is recommended to review recent studies on various data preprocessing issues.

On pages 5-7, a more detailed description of the example e-commerce website data used for the experiments in this paper (Cases 1, 2, and 3) is required. To understand the dataset better, it is recommended to include additional information on the data source (i.e. the e-commerce site), such as its main products, the total number of members, the data collection period, and so on.

Regarding issues pertaining to customer behavior, ‘the customers who buy products using price comparison shopping engines’ are not considered in this study.

I cannot understand how the method proposed in Section 5 works. It is standard practice to use both an ‘experimental group’ and a ‘control group’ to check the effect of a stimulus (in this case, the recommendation model). The experimental group is exposed to the stimulus, whereas the control group is separated from it. This enables us to understand the effect of the stimulus by measuring the differences between the two groups. The method proposed in Section 5 also proposes to use two groups (i.e. split 1 and split 2), but both are exposed to the stimulus (i.e. recommendations). In this case, we cannot measure the effect of the stimulus. Please improve your explanation of how this works. (Alternatively, you may consider removing Section 5 entirely; I think the data preprocessing issues alone are enough.)

Section 6 (Conclusion) is also simple and short. In particular, the limitations and future research directions of this study are missing. Please elaborate on the conclusion.

[Typo error]
On page 2, line 11: “from from B to B”
→ “from B to B”

To sum up, though this research contains some problems, I believe that it deals with an interesting topic and is potentially capable of making a contribution. Finally, I recommend ‘acceptance with minor revision’ for the Data journal.


Author Response

All reviewer responses have been uploaded in the Word file below.

Author Response File: Author Response.docx

Reviewer 4 Report

It is an interesting article, yet it should be strengthened. First of all, the language should be checked, since there are syntax and grammar errors.

Even words are repeated, e.g., p. 2, 2nd paragraph, line 5: "from from"; p. 4, 2nd line: "can can".

The title is quite verbose. Please rephrase it appropriately.

Rename the Summary section to Introduction.

More references are needed, e.g., Section 1 needs references that support its claims and statements.

p. 2, 1st paragraph: "through our exploration...": please justify this by providing evidence.

The Related work section discusses only two works by R. Kohavi. More cases should be presented. Furthermore, these related works should be compared with the proposed approach in order to support its contribution.

Section 4 is fair, although it needs more justification at some points. Please check it carefully.

Section 5 needs more discussion.

Author Response

All reviewer responses have been uploaded in the word file below.

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

No further comments on the available revision of the original version.

Data EISSN 2306-5729 Published by MDPI AG, Basel, Switzerland