Process Mining of Sensor Data for Predictive Process Monitoring: A HACCP-Guided Pasteurization Study Case
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The following suggestions are for the authors to consider when making revisions.
(1) The monitoring equipment involves several sensors, and some examples are given in the figures, but the specific meaning of these figures (e.g., Figure 2) is not clear enough. I suggest providing an explanation.
(2) Sensor monitoring does suffer from issues such as drift and noise. The authors claim in the abstract that a solution to these issues has been proposed. However, Section 5.4.1 of the paper argues that such data fluctuations can still affect the model's prediction results. Are these two statements consistent?
(3) The pasteurization of dairy products studied by the authors is a widely used process. Why was this process chosen?
(4) The authors mention the use of synthetic data. I suggest providing a detailed description of these data, including an assessment of their quality.
(5) Why did the authors choose Hidden Markov Models instead of other methods? Is a comparison with similar algorithms necessary?
Author Response
Comment 1) The monitoring equipment involves several sensors, and some examples are given in the figures, but the specific meaning of these figures (e.g., Figure 2) is not clear enough. I suggest providing an explanation.
Answer 1): Dear reviewer, thank you for raising this important point. In response, we have revised the caption and accompanying text related to Figure 2 to clarify the meaning and role of the sensors shown in the image. The revised caption and text now explain each sensor type (e.g., temperature, pH, conductivity, turbidity, viscosity, flow, and pressure) and its placement within the pasteurization process.
Additionally, we explicitly describe how these sensors contribute to monitoring different stages of the process and serve as the foundation for activity recognition in the proposed framework. These improvements can be found in Section 4 and in the updated caption of Figure 2.
We hope these changes make the monitoring configuration and its interpretation more transparent to the reader.
Section 4 Pasteurization as a Process
Pasteurization, particularly in HTST systems, is not a monolithic operation but a sequence of coordinated stages, such as filling, heating, holding, cooling, and discharging, each governed by specific physical and regulatory conditions. Modeling pasteurization as a structured process enables the application of process-aware analysis techniques, which are essential for ensuring compliance with food safety standards such as HACCP. From a data-driven perspective, this abstraction facilitates the transformation of continuous multivariate sensor streams into interpretable process states, allowing for both real-time monitoring and retrospective process mining. Recognizing pasteurization as a process, rather than merely a temperature-time function, thus provides a conceptual foundation for integrating control systems, sensor data, and predictive analytics in a unified monitoring framework.
Figure~\ref{fig:overview} presents an overview of the proposed framework, which models pasteurization as a multistage process driven by multivariate sensor data. The diagram illustrates how raw measurements from inline sensors (monitoring variables such as temperature, pH, conductivity, viscosity, turbidity, flow, and pressure) are processed by an HMM to detect process stages, align them with critical control points (CCPs), and construct interpretable event logs. These logs are then used for process discovery and online predictive monitoring, including estimation of batch progression, stage durations, transition forecasting, and SLA compliance.
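The step from decoded HMM stages to an event log can be sketched as below. This is a minimal illustration, not the authors' implementation: it assumes the HMM has already assigned one stage index to every sensor timestamp, and that each maximal run of identical consecutive stages forms one activity instance. The names `ActivityEvent` and `states_to_event_log`, and the stage names in the usage example, are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Sequence


@dataclass
class ActivityEvent:
    """One activity instance of a batch (case), as it would appear in an event log."""
    case_id: str
    activity: str
    start: datetime
    end: datetime  # timestamp of the last sample assigned to this activity


def states_to_event_log(
    case_id: str,
    timestamps: Sequence[datetime],
    states: Sequence[int],
    state_names: Sequence[str],
) -> List[ActivityEvent]:
    """Collapse runs of identical decoded stages into start/end activity events."""
    if len(timestamps) != len(states):
        raise ValueError("timestamps and states must have the same length")
    events: List[ActivityEvent] = []
    run_start = 0
    for k in range(1, len(states) + 1):
        # A run ends at the end of the sequence or when the decoded stage changes.
        if k == len(states) or states[k] != states[run_start]:
            events.append(
                ActivityEvent(
                    case_id=case_id,
                    activity=state_names[states[run_start]],
                    start=timestamps[run_start],
                    end=timestamps[k - 1],
                )
            )
            run_start = k
    return events


if __name__ == "__main__":
    # Hypothetical batch sampled once per second.
    t0 = datetime(2024, 1, 1, 8, 0, 0)
    ts = [t0 + timedelta(seconds=k) for k in range(9)]
    names = ["filling", "heating", "holding", "cooling", "discharging"]
    for event in states_to_event_log("batch-001", ts, [0, 0, 1, 1, 1, 2, 3, 3, 4], names):
        print(event.case_id, event.activity, event.start.time(), event.end.time())
```

Because `end` is the timestamp of the last sample in a run, a stage's duration under this convention is `end - start` plus one sampling interval.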
Comment 2) Sensor monitoring does suffer from issues such as drift and noise. The authors claim in the abstract that a solution to these issues has been proposed. However, Section 5.4.1 of the paper argues that such data fluctuations can still affect the model's prediction results. Are these two statements consistent?
Answer 2): We thank the reviewer for this thoughtful observation. In the revised version of the manuscript, we have removed the mention of sensor drift and noise from the abstract to avoid giving the impression that these issues are fully addressed by the proposed framework. As correctly noted, Section 5.4.1 discusses these challenges explicitly as limitations that remain to be tackled in real-world deployment.
The revised abstract now reflects the core contribution of our work—namely, the transformation of multivariate sensor signals into process-aware event logs for predictive monitoring—without overclaiming robustness to drift and noise. This ensures consistency between the claims made in the abstract and the discussion of open challenges in the main text.
Abstract:
However, extracting actionable process insights from raw sensor data remains a non-trivial task, largely due to the continuous, multivariate, and often high-frequency characteristics of the signals, which can obscure clear activity boundaries and introduce significant variability in temporal patterns.
Comment 3) The pasteurization of dairy products studied by the authors is a widely used process. Why was this process chosen?
Answer 3) We thank the reviewer for the question regarding the selection of pasteurization as the focus of this study. Although pasteurization is a well-established process, its regulatory relevance, multistage structure, and reliance on sensor-based monitoring make it an ideal use case for demonstrating the value of process mining and predictive monitoring techniques. To clarify this motivation, we have added a dedicated paragraph to the Introduction.
Introduction
Pasteurization is one of the most widespread and safety-critical thermal treatments in the food industry, particularly in the dairy sector. While it is a well-established process governed by standardized regulations (such as those defined by HACCP protocols, the Codex Alimentarius, and the U.S. Pasteurized Milk Ordinance), its execution involves multiple tightly controlled operational stages, including heating, holding, cooling, and product discharge. These stages must be continuously monitored to ensure compliance with temperature-time specifications and to maintain product quality and safety. Precisely because of its regulatory importance and reliance on multivariate sensor data, pasteurization offers an ideal testbed for exploring the integration of process mining and predictive monitoring techniques. It provides a realistic and structured scenario in which to investigate how continuous signals can be transformed into interpretable process representations that support conformance analysis, traceability, and real-time forecasting.
Comment 4) The authors mention the use of synthetic data. I suggest providing a detailed description of these data, including an assessment of their quality.
Answer 4) We thank the reviewer for this insightful comment. As suggested, we have revised the manuscript to provide a more detailed introduction to the synthetic dataset (see Section 5.1). Specifically, we clarified the rationale behind using synthetic data, the underlying assumptions, and the generation procedure. We have also made the synthetic dataset and simulation code available as supplementary material to promote transparency and reproducibility.
To ensure realism and validity, the simulation is based on domain knowledge and published literature on HTST pasteurization, incorporating expected ranges and dynamic behaviors for each sensor (e.g., temperature, pH, conductivity, viscosity, turbidity, flow, and pressure). Table 2 summarizes the reference values and variability used to define each process stage.
Although synthetic, the dataset was designed to reflect operational variability and stage transitions observed in real pasteurization processes, which enables controlled benchmarking of the proposed methods under reproducible conditions. Furthermore, we discuss the limitations of synthetic data and its implications for generalization in Section 5.4.1.
Section 5.1 Synthetic Sensor Data Generation
Developing and validating process mining frameworks for industrial applications often faces practical constraints due to the limited availability of high-quality, fully annotated real-world datasets. These limitations are particularly evident in safety-critical domains such as food processing, where data access is restricted for confidentiality and traceability reasons, and manually labeled ground truth is seldom available due to the high cost and operational disruption required for data annotation.
To overcome these challenges, we constructed a synthetic dataset grounded in domain knowledge and supported by validated literature on HTST pasteurization dynamics \cite{tokatli2004fault,tokatli2005haccp}. This simulation-based approach enables reproducible experimentation in a controlled, parameterized environment that mimics real operational complexity—including multivariate sensor behavior, activity transitions, and regulatory thresholds. By modeling realistic temperature, pH, conductivity, viscosity, turbidity, flow, and pressure signals over full batch cycles, we ensured that the synthetic data reflects not only nominal behavior but also variability and overlaps typical of industrial settings.
Such an approach is common in the study of process mining from sensor data (e.g., \cite{review2025,elkodssi2023}), where synthetic traces serve both to test methodological assumptions and to provide a benchmark for evaluating segmentation and prediction techniques. Moreover, it allows researchers to include additional variables and controlled anomalies that may not be simultaneously accessible in a single installation, thereby supporting generalization across multiple monitoring scenarios.
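A minimal sketch of this kind of simulation is shown below, assuming piecewise-linear nominal trajectories within each stage, additive Gaussian noise, and random stretching of stage durations to mimic batch-to-batch variability. The stage durations, setpoints, noise levels, and the restriction to two channels (temperature in °C and flow in arbitrary units) are hypothetical placeholders, not the reference values of the paper's Table 2; `simulate_batch` is likewise a hypothetical name.

```python
import random
from typing import Dict, List, Optional, Tuple

# channel -> (nominal value at stage start, nominal value at stage end, noise std)
ChannelSpec = Dict[str, Tuple[float, float, float]]

# Hypothetical, illustrative parameters only (NOT the values of Table 2).
STAGES: List[Tuple[str, int, ChannelSpec]] = [
    ("filling", 20, {"temperature": (4.0, 4.0, 0.3), "flow": (1.0, 1.0, 0.05)}),
    ("heating", 30, {"temperature": (4.0, 72.0, 0.5), "flow": (1.0, 1.0, 0.05)}),
    ("holding", 15, {"temperature": (72.0, 72.0, 0.2), "flow": (1.0, 1.0, 0.05)}),
    ("cooling", 30, {"temperature": (72.0, 4.0, 0.5), "flow": (1.0, 1.0, 0.05)}),
    ("discharging", 20, {"temperature": (4.0, 4.0, 0.3), "flow": (0.5, 0.0, 0.05)}),
]


def simulate_batch(
    stages: List[Tuple[str, int, ChannelSpec]],
    jitter: float = 0.2,
    seed: Optional[int] = None,
) -> List[Tuple[str, Dict[str, float]]]:
    """Return (ground-truth stage label, {channel: value}) samples for one batch."""
    rng = random.Random(seed)
    samples: List[Tuple[str, Dict[str, float]]] = []
    for name, nominal_len, channels in stages:
        # Randomly stretch or shrink the stage to mimic duration variability.
        n = max(2, round(nominal_len * rng.uniform(1.0 - jitter, 1.0 + jitter)))
        for k in range(n):
            frac = k / (n - 1)  # 0.0 at the start of the stage, 1.0 at its end
            reading = {
                ch: start + frac * (end - start) + rng.gauss(0.0, sigma)
                for ch, (start, end, sigma) in channels.items()
            }
            samples.append((name, reading))
    return samples


if __name__ == "__main__":
    batch = simulate_batch(STAGES, seed=42)
    for label, reading in batch[:3]:
        print(label, {ch: round(v, 2) for ch, v in reading.items()})
```

Each sample keeps its ground-truth stage label, which is what makes a synthetic trace usable as a segmentation benchmark: the labels can be hidden from the HMM and later compared with the decoded stages.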
Section 5.4.1 Limitation
Finally, while synthetic data allow for controlled experimentation and reproducibility, they may not fully capture the complexity, noise characteristics, or unexpected behavior present in real industrial environments, which could affect the performance of the proposed methods when deployed in practice.
Comment 5) Why did the authors choose Hidden Markov Models instead of other methods? Is a comparison with similar algorithms necessary?
Answer 5) We appreciate this valuable question. We have added a paragraph at the end of the Introduction clarifying the motivation for selecting HMMs. Specifically, HMMs capture sequential dependencies between latent operational stages, offering interpretable probabilistic transitions that align with HACCP-regulated process requirements. Alternative methods such as k-means, Gaussian Mixture Models (GMMs), or deep neural models were considered less suitable because of their independence assumptions, need for labelled data, or limited interpretability.
Introduction
The choice of HMMs in this study was motivated by their ability to explicitly model sequential dependencies between latent operational stages and observable sensor dynamics. Unlike clustering-based techniques such as k-means or Gaussian Mixture Models, which assume independent samples, HMMs capture the probabilistic transitions between process states (an essential property for continuous, stage-driven industrial processes such as pasteurization). In contrast, deep sequence models (e.g., recurrent or convolutional neural networks) can also represent temporal behaviour but require large labelled datasets, entail higher computational cost, and offer limited interpretability, which constrains their applicability in HACCP-regulated environments. HMMs therefore provide a balanced solution that combines unsupervised segmentation, temporal awareness, and transparent probabilistic interpretation of the process flow. For these reasons, they were adopted as the core modelling approach in this work.
Reviewer 2 Report
Comments and Suggestions for Authors
This study presents the results of transforming raw sensor streams into a structured representation of the pasteurization process and demonstrates how this enables the application of recent advances in Predictive Process Monitoring (PPM), thereby supporting proactive interventions and compliance with HACCP-based regulations.
Please add complete address information, including city, zip code, state/province, and country.
Equations 1-6: the meaning of all symbols (e.g., a, A, π) is not explained.
Line 192: the abbreviation “ALP” should be explained.
Figure 1: it should be “Acidity”, not “Acidità”; in the figure caption, “Proposal”, not “Porposal”.
Figure 1 and Table 2 are not mentioned in the text.
Line 390: the abbreviation “BPMN” should be explained.
Author Contributions, Funding, Data Availability Statement, and Conflicts of Interest sections are missing. This information should be added after the Conclusion.
References: abbreviated journal names should be given and all DOIs should be completed.
Author Response
Reviewer #2 Comments
Comment 1) Please add complete address information, including city, zip code, state/province, and country.
Answer 1): Dear reviewer, thank you for your careful observation. We have added the complete address information.
Addresses: Università degli Studi di Trieste, 34127 Trieste, Italy and Universidade Estadual de Campinas (UNICAMP), 13083-862 Campinas, São Paulo, Brazil
Comment 2) Equations 1-6: the meaning of all symbols (e.g., a, A, π) is not explained.
Answer 2) Thank you for this observation. We have revised Section 3.2 to explicitly define all symbols and parameters used in Equations (1)–(6).
Section 3.2 Hidden Markov Models
The transition probabilities $a_{ij}$ between two stages $i$ and $j$ are assumed to be time-independent, as shown in Equation 1, where $A = [a_{ij}]$ is the transition probability matrix of the system. The probability of being in stage $i$ at time $t$ is denoted $\pi_i^{(t)}$, and the initial stage distribution is $\pi_i^{(0)} = P(X_1 = s_i)$, with $\sum_{i=1}^{N} \pi_i^{(0)} = 1$. At each time $t$, we observe a multivariate measurement $Y_t = y_t \in \mathbb{R}^d$, the sensor vector recorded at time $t$.
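To show how these quantities interact, the sketch below decodes the most likely stage sequence with the Viterbi algorithm, one standard HMM decoding method; the excerpt does not state which decoding procedure the authors use. Here `pi0` plays the role of $\pi^{(0)}$, `A` of the transition matrix $A = [a_{ij}]$, and `Y` holds the observations $y_t$. The diagonal-Gaussian emission model and the toy three-stage parameters in the usage example are assumptions made for illustration.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def log_gauss_diag(y: Vector, mean: Vector, var: Vector) -> float:
    """Log-density of y under a Gaussian with diagonal covariance (assumed emission model)."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (yk - m) ** 2 / v)
        for yk, m, v in zip(y, mean, var)
    )


def safe_log(p: float) -> float:
    """log(p), mapping zero probabilities (forbidden transitions) to -inf."""
    return math.log(p) if p > 0.0 else -math.inf


def viterbi(
    Y: Sequence[Vector],
    pi0: Vector,
    A: Sequence[Vector],
    means: Sequence[Vector],
    variances: Sequence[Vector],
) -> List[int]:
    """Most likely stage index sequence for non-empty observations Y (log-space Viterbi)."""
    n = len(pi0)
    log_a = [[safe_log(A[i][j]) for j in range(n)] for i in range(n)]
    # delta[i]: best log-score of any stage path ending in stage i at the current time.
    delta = [safe_log(pi0[i]) + log_gauss_diag(Y[0], means[i], variances[i]) for i in range(n)]
    backpointers: List[List[int]] = []
    for y in Y[1:]:
        new_delta: List[float] = []
        ptr: List[int] = []
        for j in range(n):
            scores = [delta[i] + log_a[i][j] for i in range(n)]
            best_i = max(range(n), key=scores.__getitem__)
            ptr.append(best_i)
            new_delta.append(scores[best_i] + log_gauss_diag(y, means[j], variances[j]))
        delta = new_delta
        backpointers.append(ptr)
    # Backtrack from the best final stage to recover the full path.
    path = [max(range(n), key=delta.__getitem__)]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


if __name__ == "__main__":
    # Hypothetical left-to-right model on a single temperature channel (°C).
    stages = ["heating", "holding", "cooling"]
    pi0 = [1.0, 0.0, 0.0]
    A = [[0.9, 0.1, 0.0], [0.0, 0.9, 0.1], [0.0, 0.0, 1.0]]
    means = [[20.0], [72.0], [30.0]]
    variances = [[25.0], [1.0], [25.0]]
    Y = [[15.0], [25.0], [40.0], [71.5], [72.3], [72.0], [45.0], [30.0], [25.0]]
    print([stages[s] for s in viterbi(Y, pi0, A, means, variances)])
```

A forbidden transition ($a_{ij} = 0$) becomes a log-score of $-\infty$, so the left-to-right structure of the toy matrix prevents the decoded path from returning to an earlier stage.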
Comment 3) Line 192: the abbreviation “ALP” should be explained.
Answer 3) Thank you. The acronym ALP stands for alkaline phosphatase; it is already defined at line 169, in the preceding sentence of the same discussion.
Comment 4) Figure 1: it should be “Acidity”, not “Acidità”; in the figure caption, “Proposal”, not “Porposal”.
Answer 4) Thank you. The spelling corrections have been made.
Section 4, Pasteurization as a Process: Figure 1 has been modified to include these corrections.
Comment 5) Figure 1 and Table 2 are not mentioned in the text.
Answer 5): Dear reviewer, thank you for this observation. The table and the figure are now mentioned and discussed in the text.
Figure 1 is mentioned in Section 4, Pasteurization as a Process.
Table 2 is mentioned in Section 4.2, Sensor.
Comment 6) Line 390: the abbreviation “BPMN” should be explained.
Answer 6): Dear reviewer, thank you for the suggestion. We have added the expansion of the acronym BPMN.
Section 4.5 Process Discovery and Monitoring
Business Process Model and Notation (BPMN)
Comment 7) Author Contributions, Funding, Data Availability Statement, and Conflicts of Interest sections are missing. This information should be added after the Conclusion.
Answer 7) Dear reviewer, thank you for this observation. We have added the Author Contributions section.
Author Contributions
Conceptualization: A.M., A.P.A.C.B., S.B.J., I.M.G.; methodology: A.M., A.P.A.C.B., I.M.G., S.B.J.; software: A.M., A.P.A.C.B., I.M.G., S.B.J.; writing (original draft preparation): A.M., A.P.A.C.B., S.B.J., I.M.G.; supervision: D.F.B. All authors have read and agreed to the published version of the manuscript.
Comment 8) References: abbreviated journal names should be given and all DOIs should be completed.
Answer 8): We thank the reviewer for this observation. The entire reference list has been revised to follow MDPI’s formatting requirements. All journal titles have been replaced with their official ISO 4 abbreviations, and complete DOI links have been added for all references where available.
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
After revision, I believe that the quality of the article has significantly improved and the content is more complete; it can be considered for publication.

