Data Preprocessing Method and API for Mining Processes from Cloud-Based Application Event Logs
Round 1
Reviewer 1 Report
Thank you for the opportunity to review. The paper reads well and should be published to show current trends. Here are some suggestions:
- Lines 65-68: Please correct the "Section 0" cross-references so that they point to the intended sections.
- Line 81, Line 111, and other similar places: Fix citations such as [14][17]; use the correct format for citing multiple references.
- Line 134: is it exiting or existing?
- Line 141: Previous chapter or section?
- Line 141-142: Why informally? Please explain.
- Line 158: Section 0 again. Please go through the entire paper and fix these kinds of errors.
- There are repeated concepts/content. Please remove any such repeats.
- It would be good to see a sample of the unconverted log for comparison with Figure 4.
- It would be good to discuss how the method translates to other settings and contexts.
- Line 603: How can others use the API? Please state which other existing PM tools are meant.
Author Response
Please see attached PDF.
Author Response File: Author Response.pdf
Reviewer 2 Report
Dear Editor and Authors
The article presents a method of data preprocessing for event logs generated by cloud-based IT systems. The development of such a function library is needed and I believe that it will significantly simplify the preparation of data for Process Mining analysis.
The authors conducted a very thorough literature review, from which they drew the correct conclusions.
The authors emphasize the possibility of using it in the study of event logs from click-stream applications. Are there other areas where this method could be applied?
In the introduction, the authors rightly point out that specific processes are often more complex than their idealized models. It is worth emphasizing at this point what purpose PM is meant to serve.
Minor remarks:
In the Related Work and Challenges chapter, you could add a sub-clause 10 pointing out that cloud-based applications can also change, since new versions of them may be released.
The authors implemented the method in the R language. Do they see a need for, or the possibility of, porting it to other environments (e.g. the Python language)?
In the Event Pattern Substitution Step section, the authors write about "substituting them to a single click-refresh pair". How will the timestamp be assigned in this case?
What will cleanHeaders do if cleaning could produce two identical headers?
Author Response
Please see attached PDF.
Author Response File: Author Response.pdf
Reviewer 3 Report
Review of “Data Preprocessing Method and API for Mining Processes from Cloud-Based Application Event Logs”
The paper aims to develop and assess a method to pre-process event logs collected from cloud-based applications in order to discover user actions and processes. It proposes a method, called “Cloud Pattern API – Process Mining (CPA-PM)”, accompanied by a scriptable application programming interface (API) in R, to pre-process little-structured cloud-based event logs and enable simpler and repeatable process discovery.
The topic of the paper is interesting, since the application of Process Mining to cloud databases is growing. The paper seems to be well written and scientifically remarkable. However, in my opinion, it needs some improvements before publication.
The main points for improvement are highlighted below:
- The introduction should be improved from a scientific point of view; that is, it should be grounded more firmly in past literature. I suggest that the authors increase the number of references in the paper in general, and in the introduction in particular (currently just one or two references). Otherwise, the reader may think that the paper is grounded in the authors' opinions rather than in past literature. In my opinion, this limitation is quite significant. I also suggest expanding the part dedicated to the difficulties of analysing (and fixing) event logs, with references on this topic.
- The paper does not sufficiently consider past scientific research. As the references section shows, few scientific papers were examined, even though Process Mining and event log processing have attracted increasing attention in recent years. In my opinion, you should include more references in Section 1 and in Section 2. I think you should definitely include more PM research, and at least the following:
  - Bose, R. J. C., Mans, R. S., & van der Aalst, W. M. (2013, April). Wanna improve process mining results? In 2013 IEEE Symposium on Computational Intelligence and Data Mining (CIDM) (pp. 127-134). IEEE.
  - Zerbino, P., Stefanini, A., & Aloini, D. (2021). Process science in action: A literature review on process mining in business management. Technological Forecasting and Social Change, 172, 121021.
  - Suriadi, S., Andrews, R., ter Hofstede, A. H., & Wynn, M. T. (2017). Event log imperfection patterns for process mining: Towards a systematic approach to cleaning event logs. Information Systems, 64, 132-150.
However, I think many more references are needed.
- The Discussion section is also partially affected by this problem. I think you should discuss your results more thoroughly in light of past research (so, in my opinion, some references are needed).
- I suggest treating the potential managerial implications of your research in more depth. Some more details on the case study could also be reported.
- As a minor issue, at the end of Section 1 you describe the following sections but give the number 0 for all of them.
- I do not understand why some symbols in Section 5 and Section 6 are highlighted in yellow. I do not know whether this is a mistake or whether it means something. In the latter case, please explain what it means.
Author Response
Please see attached PDF.
Author Response File: Author Response.pdf
Reviewer 4 Report
The work concerns the analysis of complex data generated by cloud-based applications, e.g. the log files of clicking events on a web page. A general method is suggested to sort, clean up, and aggregate the information, with the objective to reduce complexity. A library of functions implemented in R is provided to support the data analysis. I appreciate that the R routines are available to the reader for free download.
The paper is clearly written and informative. I support publication of the work. The only tiny correction to be made is that the references to different sections are all broken: the text always refers to Section 0.
Author Response
Please see attached PDF.
Author Response File: Author Response.pdf
Round 2
Reviewer 3 Report
The authors implemented the majority of changes requested.
In my opinion, the paper deserves publication.
Good luck to the authors.