Article
Peer-Review Record

Advanced Examination of User Behavior Recognition via Log Dataset Analysis of Web Applications Using Data Mining Techniques

Electronics 2023, 12(21), 4408; https://doi.org/10.3390/electronics12214408
by Marcin Borowiec and Tomasz Rak *
Reviewer 1:
Reviewer 2: Anonymous
Submission received: 7 October 2023 / Revised: 20 October 2023 / Accepted: 23 October 2023 / Published: 25 October 2023
(This article belongs to the Special Issue Advanced Web Applications)

Round 1

Reviewer 1 Report

Please see the attached file report.

 

Comments for author File: Comments.pdf

Can be improved. See the report.

 

 

Author Response

Dear Reviewer,

Thank you for your comments. Please find our responses below. All changes are marked in red.

 

Question/Comment 1:

  1. How do the classification results of different architectures (S1 and S2) impact the prediction accuracy of user actions in the stock exchange system? What factors contribute to the differences in classification effectiveness?

 

Answer 1:

The article describes two different architectures used in the stock exchange system: S1, configured with 8 CPUs and 20 GB of RAM, and S2, configured with 12 CPUs and 30 GB of RAM. This architectural difference influences the classification results. For instance:

In the S1 architecture, the effectiveness of classifying user actions varied as the training set grew; in some cases, increasing the training set diminished prediction accuracy. In contrast, the S2 architecture, equipped with more computational resources, displayed smaller deviations: its results were more consistent and tended to scale linearly as the size of the training sets increased. In short, the size of the training set affected the classification results differently for the two architectures: for S1, larger training sets sometimes decreased prediction accuracy, while for S2 the results remained consistent.

 

For the training dataset size of 3600s, the results were unsatisfactory, which led to the decision to increase the training dataset size.

For a training dataset size of 32400s, the $XGBClassifier$ turned out to be the most effective classifier, outperforming $RandomForestClassifier$ by a few percentage points in accuracy.

The size of the training dataset plays a vital role in prediction accuracy.

For the $replGroup$ log group, using a training dataset size of 32400s resulted in more accurate predictions compared to a size of 3600s.

For the $transGroup$ log group, the best results were achieved using a training dataset size of 32400s, while using a dataset size of 43200s resulted in overfitting.
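The selection logic described above, choosing the training-set size with the best held-out accuracy rather than simply the largest set, can be sketched as follows; the accuracy figures are illustrative placeholders, not values from the paper's tables:

```python
# Hypothetical held-out accuracies per training-set size (in seconds).
# These numbers are illustrative only; the paper's actual results are
# reported in its tables.
accuracy_by_train_size = {
    3600: 0.62,   # too little data: poor accuracy
    10800: 0.81,
    21600: 0.88,
    32400: 0.95,  # best held-out accuracy in this sketch
    43200: 0.90,  # larger set where accuracy drops (overfitting signal)
}

def best_train_size(results):
    """Return the training-set size with the highest held-out accuracy."""
    return max(results, key=results.get)

print(best_train_size(accuracy_by_train_size))  # 32400
```

The point of the sketch is that the argmax over held-out accuracy, not the largest dataset, determines the training-set size to use.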

 

We need to ensure the system has enough resources to run without hiccups; only then can the gathered, error-free logs be used to classify future incoming data.

S2 logs were more precise, so the prediction accuracy was better than for S1.

System resources, the logging mechanism, and system variables are all important for obtaining "clean data" for further analysis (classification).

 

Question/Comment 2:

  1. What are the limitations and potential bottlenecks that arise from weaker architectures (S1) in terms of logging mechanisms and generating accurate logs? How can these limitations be addressed or mitigated?

 

Answer 2:

One immediate solution would be to move towards a more robust hardware setup, akin to the S2 architecture, which consistently demonstrated better stability and classification results even with larger training sets.

Interestingly, for the S1 architecture, as the training set size increased, the classifiers often exhibited a decline in prediction effectiveness. This contrasts with the more robust S2 architecture, which consistently showed stability and similar results across varying training set sizes, reflecting its ability to handle larger data sets without degradation in classification accuracy.

While weaker architectures like S1 introduce challenges in log generation and analysis, these challenges can be effectively addressed and mitigated by employing a combination of optimized machine learning methods and continuous monitoring.

Future implementations of data analysis methods should focus on hyperparameter optimization, ensuring that the methods are fine-tuned for the specific challenges posed by weaker architectures.

Exploring deep learning for log analysis can provide better insights and predictions, especially when data is massive.

 

Weaker architectures may not be able to generate "clean", error-free data. Technical aspects may also not be as separable as on a stronger platform.

Reducing the log-gatherer overhead could address this issue without changing the architecture for a stronger one.

 

Question/Comment 3:

  1. In the context of the stock exchange system, what are the key parameters that have a high impact on system operation and lower classification correctness? How can these parameters be optimized to enhance prediction accuracy?

 

Answer 3:

The optimization of prediction accuracy in the context of the stock exchange system involves careful consideration and tweaking of technical parameters, as well as the selection of the appropriate classification method. Also, it's essential to ensure that the training dataset size is optimal to prevent issues like overfitting.

In the context of the stock exchange system, as described in the article, the following key parameters have a high impact on system operation and potentially lower classification correctness: $t_R$ Parameter (Interval between user requests) and architecture (hardware). Lastly, considering alternate log analysis methods and incorporating deep learning (despite the challenges with large data sets) may provide more comprehensive insights and further enhance the accuracy of predictions.

 

466-473

Testing architecture and system variables in the testing stage (benchmark).

 

Question/Comment 4:

  1. How does the size of the training set impact the effectiveness and accuracy of the classification models (Decision Tree, Random Forest, XGBoost) for different simulation groups (replGroup, transGroup, reqGroup, algGroup, allGroup)? Are larger data sizes always associated with higher accuracy?

 

Answer 4:

It depends on the group datasets, as shown in the tables.

Larger data sizes may cause overfitting and therefore a drop in accuracy.

 

While larger training datasets can often improve model performance, it's essential to monitor for overfitting and ensure that the dataset is representative of the problem at hand.

Further, the impact of the training set size is also influenced by the architecture (S1 vs. S2). With architecture S1, increasing the training set sometimes resulted in decreased classifier effectiveness. However, architecture S2, which has more computational resources, showed more stable results with minor deviations as the training set size increased.

From the analysis, it is evident that the XGBoost model generally performs better across different simulation groups. However, the size of the training set does not always linearly correlate with the accuracy of the model. For instance, while the allGroup performed best with a training set of 10800s, the reqGroup required a longer training set of 21600s to achieve optimal results.

 

Question/Comment 5:

  1. What are the implications of using the XGBoost method as the most effective classifier for predicting user actions in different simulation groups? How does it compare to other classification methods (Decision Tree, Random Forest) in terms of training time and accuracy?

 

Answer 5:

The use of the XGBoost method as a classifier for predicting user actions in different simulation groups presents distinct implications, e.g., regarding efficiency and accuracy.

While the XGBoost classifier may require slightly more training time than the Decision Tree, its superior accuracy across various simulation groups makes it the most effective choice for predicting user actions in the given context.

 

In conclusion, the use of the XGBoost method for predicting user actions in the analyzed stock exchange system based on its logs has shown marked advantages in accuracy and speed compared to other classification methods such as Decision Tree and Random Forest. XGBoost was determined to outperform RandomForest while requiring up to 50% less training time. It is also less susceptible to overfitting than RF.

 

However, when maximum accuracy is the priority, the difference between RF and XGBoost might be less pronounced, and thus the choice might be influenced by other factors, such as the nature of the data or the ease of implementation.

 

 

Question/Comment 6:

  1. Considering the potential for deadlock, system desynchronization, or incorrect logs due to weaker architectures, what strategies can be implemented to ensure the logging mechanism functions correctly and generates reliable logs in real-time scenarios?

 

Answer 6:

To ensure the logging mechanism functions correctly and generates reliable logs in real-time scenarios, the following strategies could be inferred:

- Ensuring sufficient system resources. It was observed that the server architecture used may affect the quality of predictions and the reliability of logs. It is crucial to understand the limitations of the architecture and address potential bottlenecks.

- Examining log mechanism and performance overhead (optimization). The article suggests that manipulating parameters like $t_R$ (interval between user requests) may influence server performance.

- Implementing synchronization methods between components so that data are captured with consistent timestamps.
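The synchronization idea from the last point can be illustrated with a minimal sketch: snap log entries from two components to a shared timestamp grid so that rows describing the same moment can be joined for classification. The component names and values below are hypothetical, not taken from the OSTS logs:

```python
# Align (timestamp, value) log entries from two components onto a shared
# grid so entries describing the same moment can be joined.
def align(entries, step=1.0):
    """Snap (timestamp, value) pairs to the nearest multiple of `step`."""
    return {round(ts / step) * step: value for ts, value in entries}

# Hypothetical per-component logs with slightly desynchronized clocks.
app_log = [(0.98, "buy"), (2.03, "sell")]   # application component
db_log = [(1.01, 12), (1.97, 9)]            # database component

app = align(app_log)
db = align(db_log)
joined = {ts: (app[ts], db[ts]) for ts in app.keys() & db.keys()}
print(joined)  # {1.0: ('buy', 12), 2.0: ('sell', 9)}
```

In a real deployment the grid step would match the log sampling interval, and unmatched timestamps would need an explicit policy (drop, interpolate, or flag).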

 

 

Question/Comment 7:

  1. How might the application of other log analysis methods, such as deep learning, enhance the effectiveness and efficiency of the stock exchange system's prediction models? What challenges might arise in implementing deep learning approaches with large datasets and multiple simulation sets?

 

Answer 7:

The application of other log analysis methods, such as deep learning, might significantly enhance the effectiveness and efficiency of the stock exchange system's prediction models. Deep learning models could automatically learn and improve from experience without being explicitly programmed. Their ability to learn and make decisions from large datasets offers better accuracy and prediction capabilities when applied to complex scenarios. For the stock exchange system in discussion, deep learning could potentially learn intricate patterns within the logs and may recognize subtle correlations that traditional methods like DT, RF, and XGBoost might overlook.

 

However, implementing deep learning approaches, especially with large datasets and multiple simulation sets, does come with challenges.

DL methods may capture intricate relationships and features that CART methods may struggle to represent.

 

While CART methods have advantages such as interpretability and faster training times, deep learning excels at handling complex data types, capturing intricate patterns, and achieving state-of-the-art performance in many tasks, especially those involving unstructured data, which could significantly benefit OSTS systems.

 

 

Question/Comment 8:

  1. Clarify the Research Objectives: The introduction lacks a clear statement of the research objectives. It would be helpful if the authors provide a concise and focused objective of the study, highlighting the research gaps they aim to address.

 

Answer 8:

The primary objective of this article is to scrutinize the operation of a web application, specifically a stock market system, and predict the actions executed by the players (denoted by the attribute $endpointUrl$) using the performance parameters documented in the logs of all system components. These logs are derived during the function of a benchmark which assesses the hardware architecture upon which the application operates.

 

Furthermore, the research aimed to examine the effectiveness of these classification methods when all the data from the aforementioned groups was combined into a single group, termed allGroup.  The outcome of the research would then discern the most optimal classification method for the dataset produced by the stock exchange application during its simulation using the prepared benchmark. Measurements such as the time taken for classifier training, the Root Mean Square Error (RMSE), and the calculated accuracy were taken into account. Also, the study investigated the impact of enlarging the training set size on the prediction of stock exchange action methods.

 

Throughout the study, the authors aim to bridge the research gaps concerning the performance of machine learning classifiers under various user request scenarios and server loads.

 

Added to the article.

 

 

Question/Comment 9:

  1. Provide Clearer Explanation of Classifiers: The section describing the classifiers used (Decision Tree, Random Forest, XGBoost) would benefit from a more detailed explanation of their underlying principles and advantages. This will help readers understand the rationale behind the selection of these classifiers for the study.

 

Answer 9:

Decision Tree Classifier

This is a supervised machine-learning algorithm that uses a tree-like graph to make decisions. At each node of the tree, the algorithm checks the value of a certain attribute and decides which way to proceed down the tree, eventually reaching a leaf node (decision). One of the main advantages of a Decision Tree is its simplicity and ease of visualization. In the given article, the DecisionTreeClassifier learns the fastest due to its straightforward nature. However, it was observed to be the least effective among the three classifiers in the context of the simulations.

Random Forest Classifier

Random Forest is an ensemble learning method that constructs a collection of decision trees during training. When making a prediction, the Random Forest considers the output of each tree in its "forest" and selects the class label based on a majority vote. This method helps in improving accuracy and controlling over-fitting. In the provided analysis, the RandomForestClassifier performed notably, especially in the $replGroup$ where it demonstrated superior effectiveness compared to the single Decision Tree.

XGBoost Classifier

XGBoost stands for eXtreme Gradient Boosting. It is an optimization of gradient boosting machine learning algorithm. The principle behind boosting is to create a sequence of models to correct the mistakes of the models before them in the sequence. XGBoost is highly regarded for its efficiency and speed. In the study's context, the XGBoost was observed to perform exceptionally well, especially with the simulation set of 32400s, outpacing RF in terms of accuracy and having a faster training time.
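As a minimal sketch (not the paper's experimental code), the three classifiers can be trained and scored on the same feature matrix as follows. The synthetic dataset stands in for the features extracted from the OSTS benchmark logs, and the XGBClassifier lines are shown as a comment because they require the separate `xgboost` package:

```python
# Train and score the classifiers discussed above on one shared dataset.
# The synthetic data below is illustrative only.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=10, n_classes=3,
                           n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "DecisionTree": DecisionTreeClassifier(random_state=0),
    "RandomForest": RandomForestClassifier(n_estimators=100, random_state=0),
}
# The XGBClassifier can be added the same way:
#   from xgboost import XGBClassifier
#   models["XGBoost"] = XGBClassifier(n_estimators=100, random_state=0)

for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, round(model.score(X_test, y_test), 3))
```

All three share the same fit/score interface, which is why the paper can compare them on identical training and testing sets.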

 

Added to the article.

 

 

Question/Comment 10:

  1. Enhance the Presentation of Results: The presentation of results in Table 12 is confusing and difficult to interpret. It would be helpful if the results are presented in a more organized and reader-friendly manner, such as providing separate tables for each classifier. Additionally, consider providing statistical measures (e.g., standard deviation) to assess the reliability of the results.

 

Answer 10:

We split the results into a separate table for each classifier to avoid clutter and allow readers to clearly understand the performance of each classifier.

 

We used the root mean square error (RMSE) in our work. It measures the difference between predicted values (e.g., from a statistical model) and actual observed values, and it is commonly used in forecast analysis and modeling to assess the quality of a predictive model.

The standard deviation value does not exceed 1/3 of the average value.
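For reference, RMSE is the square root of the mean squared difference between predictions and observations; a minimal sketch with illustrative numbers:

```python
# Root Mean Square Error between predicted and observed values.
import math

def rmse(predicted, actual):
    """RMSE of two equal-length sequences of numbers."""
    assert len(predicted) == len(actual) and predicted
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual))
                     / len(predicted))

# One prediction is off by 2; the others are exact.
print(rmse([2.0, 3.0, 4.0], [2.0, 5.0, 4.0]))  # sqrt(4/3) ~= 1.155
```

Because the errors are squared before averaging, RMSE penalizes large individual forecast errors more heavily than the mean absolute error would.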

 

 

Question/Comment 11:

  1. Discuss the Implications of Architecture Differences: The manuscript briefly mentions that the performance significantly impacts the quality of logs, but lacks a detailed discussion on the implications of architectural differences (S1 and S2) on the classification effectiveness. It would be beneficial to present a more thorough analysis of how these architectural differences affect the accuracy and reliability of the prediction models.

 

Answer 11:

The article primarily revolves around the classification of logs generated from stock exchange operation simulations. Various parameters, including the number of containers (replGroup), intervals between transactions (transGroup), intervals between queries (reqGroup), and traffic generator actions (algGroup), are explored in relation to their technical specifications.

 

The parameter $t_R$, representing the interval between user requests, significantly impacts the performance of the server architectures, specifically their resilience to the traffic generated by the benchmarking platform. The smaller the $t_R$ interval, the more intense the server activity.

The accuracy of the classifiers was determined to be high across different scenarios. For the reqGroup that uses the S1 architecture, increasing the dataset did not show significant improvements in results. Notably, the RF (RandomForest) and XGBoost methods had comparable efficiency results, but the XGBoost had unmatched speed in terms of training.

Looking at the detailed classifier results, in the reqGroup using the S1 architecture, the XGBoost method consistently showed better accuracy percentages compared to DecisionTree and RandomForest classifiers. This suggests that the XGBoost classifier is more effective and reliable in handling the given server architecture and dataset.

In the algGroup, which tested various user behaviors during the operation of a stock exchange application, there was a relatively small difference in classification accuracy between the simplest method, DecisionTree, and the more advanced RandomForest. This might suggest that architectural differences are less impactful when there are variations in user behaviors, and the intrinsic properties of the classifiers become more prominent.

 

The article underscores that architectural differences, especially concerning the number of application replicas, play a crucial role in determining the classification's accuracy and reliability.

 

Added to the article.

 

 

Question/Comment 12:

  1. Address Limitations and Future Directions: The manuscript briefly mentions open issues and future models, but it lacks a comprehensive discussion of limitations and potential future directions. It would strengthen the paper to include a section dedicated to discussing the limitations of the study and suggestions for further research, such as exploring other log analysis methods or refining feature selection.

 

 

Answer 12:

Exploring other log analysis methods could provide a more holistic understanding of stock exchange operation simulations (Deep Learning). Neural networks, support vector machines, or ensemble methods could be potential avenues.

 

To provide a more comprehensive analysis, future studies should consider incorporating a broader array of classifiers.

 

Added to the article.

 

 

Question/Comment 13:

  1. Language and Clarity: The manuscript would benefit from minor improvements in language clarity and organization. Some sentences are convoluted and could be rephrased for better readability.

Additionally, ensure that terminology and concepts are explained consistently throughout the manuscript.

 

Answer 13:

We hope these revisions will enhance the manuscript's clarity and make it more accessible to readers.

We have rephrased some complex sentences for better readability.

Changed.

 

 

Question/Comment 14:

  1. Update Related Works and References: The references section needs to be updated with the most recent and relevant literature. Additionally, consider expanding the related works section to include a more comprehensive review of existing studies related to stock exchange systems and log analysis.

 

Answer 14:

We recognize the importance of keeping our references updated and ensuring a comprehensive review of related works.

The related works section was expanded to provide a broader overview of existing studies, particularly those focusing on log analysis in stock exchange systems or other similar environments.

 

Added.

 

 

Question/Comment 15:

  1. Proofread and Format: There are several grammatical and typographical errors throughout the manuscript. It is recommended to carefully proofread and edit the document for clarity and adherence to proper formatting guidelines.

 

Answer 15:

Done.

 

Question/Comment 16:

  1. Looking at Table 12, what trends can be observed in the training times for each classifier across different simulation groups? Is there a consistent pattern or significant variations?

 

Answer 16:

There is no consistent pattern in the training times across the simulation groups.

 

 

Question/Comment 17:

  1. In Figure 10, it is mentioned that the effectiveness of classifying user actions is dependent on the considered sets and architectures. Could you provide a detailed analysis of how the classification effectiveness varies across the simulation groups for each classifier?

 

 

Answer 17:

The previous figure shows the tendency for the 43200s set.

 

DecisionTreeClassifier

8CPU_20RAM (a): The classification effectiveness starts higher for the replGroup and gradually decreases as we move to allGroup.

The transGroup and testLastGroup appear to have a somewhat consistent effectiveness throughout the different train set simulations, with minor fluctuations.

12CPU_30RAM (b): The effectiveness is almost consistent for all simulation groups, ranging close to 90%. Minor variations can be seen, but they aren't substantial.

 

RandomForestClassifier

8CPU_20RAM (c): replGroup starts with higher effectiveness, which then slightly drops and stabilizes.

The transGroup shows a dip initially and then stabilizes with a slight increase towards the end.

The testLastGroup maintains relatively consistent performance.

The allGroup is relatively stable, with slight fluctuations.

12CPU_30RAM (d): All simulation groups maintain a high and stable classification effectiveness, hovering around 90%. The differences between groups are minimal.

 

XGBClassifier:

8CPU_20RAM (e): The replGroup and reqGroup both show high effectiveness at the start, which decreases slightly and then remains stable.

The transGroup effectiveness drops initially and then maintains a stable performance.

testLastGroup and allGroup show minimal fluctuations but remain stable throughout.

12CPU_30RAM (f): The effectiveness across all simulation groups is consistently high, close to 90%. There are minimal variations across groups.

 

The DecisionTreeClassifier displays varied effectiveness depending on the simulation group, especially in the 8CPU_20RAM architecture. However, its performance becomes more consistent in the 12CPU_30RAM setup.

The RandomForestClassifier shows some fluctuations in the 8CPU_20RAM architecture but remains relatively stable in the 12CPU_30RAM setup.

The XGBClassifier has a consistent performance across both architectures, with minor variations based on the simulation group.

In conclusion, the effectiveness of classifying user actions does vary based on the considered sets and architectures. However, the variations are more pronounced in the 8CPU_20RAM architecture. The 12CPU_30RAM setup generally offers more consistent and high classification effectiveness across all classifiers and simulation groups.

 

Added to the article.

 

 

Question/Comment 18:

  1. Considering the classification accuracy results in Table 12, how do the different classifiers (Decision Tree, Random Forest, XGBoost) perform in terms of predicting user actions in the stock exchange system? Are there significant differences in accuracy between the classifiers?

 

 

Answer 18:

The results summarized in Table 12 are very useful for analyzing the classifiers used (Decision Tree, Random Forest, XGBoost).

The performance of each classifier in predicting user actions in the stock exchange system:

 

DecisionTreeClassifier

The highest accuracy achieved is 86.59% with a training set of 32400s and a testing set of 43200s.

The lowest accuracy is 41.71% with a training set of 43200s and a testing set of 3600s.

The accuracy fluctuates significantly, ranging from as low as 41.71% to as high as 86.59%.

 

RandomForestClassifier

The highest accuracy achieved is 94.57% with a training set of 32400s and a testing set of 43200s.

The lowest accuracy is 53.48% with a training set of 43200s and a testing set of 3600s.

The accuracy varies but reaches peak values above 94%, indicating high effectiveness in certain configurations.

 

XGBClassifier

The highest accuracy achieved is 95.10% with a training set of 32400s and a testing set of 43200s.

The lowest accuracy is 51.04% with a training set of 43200s and a testing set of 3600s.

Like the RandomForest, the XGBClassifier shows strong performance, achieving above 95% accuracy in its best configuration.

 

RandomForestClassifier and XGBClassifier both achieve peak accuracies above 94% with certain configurations, indicating their strong ability to classify user actions in the stock exchange system effectively.

DecisionTreeClassifier, on the other hand, achieves a lower peak accuracy compared to the other two classifiers.

All three classifiers display a decrease in accuracy when trained on the largest dataset (43200s) and tested on the smallest dataset (3600s), highlighting the challenge of generalizing from a large diverse dataset to a smaller one.

There are significant differences in accuracy between the classifiers, especially when comparing the peak performance of RandomForest and XGBClassifier with that of the DecisionTreeClassifier.

In conclusion, while all three classifiers show capability in predicting user actions, the RandomForestClassifier and XGBClassifier tend to outperform the DecisionTreeClassifier in the presented stock exchange system, achieving higher peak accuracies. The choice between RandomForest and XGBClassifier would depend on specific requirements and constraints, but both offer robust performance.

 

Added to the article.

 

 

Question/Comment 19:

  1. What are the main findings and implications of the classification results for the different simulation groups? Do certain groups consistently show higher accuracy or effectiveness in predicting user actions, and if so, what factors might contribute to these patterns?

 

Answer 19:

Certain groups like the $transGroup$ consistently showed higher accuracy in predicting user actions compared to the $replGroup$. The size of the training dataset plays a significant role in the effectiveness of the classifiers. The optimal size found was $32400s$ in both groups studied. The XGBoost algorithm showed promise in both groups, outperforming or matching the RandomForestClassifier in terms of accuracy while also requiring less training time. Factors contributing to these patterns include the nature of the data in each group, the parameters of the application being manipulated, and the size of the training datasets used.

 

Added to the article.

 

 

Question/Comment 20:

  1. In Figure 10, the XGBoost method is consistently shown to be the most effective classifier across different simulation groups. What are the key features or characteristics of XGBoost that make it more suitable for predicting user actions in the stock exchange system compared to other classifiers?

 

Answer 20:

Added in answer 9.

 

 

Question/Comment 21:

  1. Can you provide more insights into the significant features or parameters that contribute to the accuracy and effectiveness of the classifiers? Are there any specific features that have a higher predictive power in the classification of user actions?

 

Answer 21:

This was described in: "The experimental environment of the OSTS stock exchange system, discussed in [2], was used to obtain system logs."

 

[2] Borowiec, M.; Piszko, R.; Rak, T. Knowledge Extraction and Discovery about Web System Based on the Benchmark Application of Online Stock Trading System. Sensors 2023. https://doi.org/10.3390/s23042274

 

 

Question/Comment 22:

  1. How do the classification results align with the system's performance and functionality? Are there any trade-offs between accuracy and training time that need to be considered in practical applications?

 

Answer 22:

Considering the trade-off between accuracy and training time, it is evident that there are differences between the classifiers. For instance, the XGBClassifier often achieves high accuracy with relatively shorter training times than the RandomForestClassifier. For practical applications, if time is a significant constraint, the XGBClassifier may be preferable given its efficiency. On the other hand, if the aim is the absolute highest accuracy regardless of training time, one might experiment with various training set sizes, as the results indicate that bigger datasets do not always yield better accuracy.

Also, the study observed that accuracy was not always directly proportional to larger data sizes. For instance, a weaker architecture (S1) might produce incorrect logs or even malfunction due to system desynchronization, emphasizing the need for data verification.
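The accuracy-versus-training-time trade-off can be checked with a simple profiling sketch: time each classifier's fit() call and record its held-out accuracy, then choose by whichever criterion matters. The data here is synthetic, not the paper's logs:

```python
# Profile training time and held-out accuracy for two classifiers.
import time
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def profile(model):
    """Return (training time in seconds, held-out accuracy)."""
    start = time.perf_counter()
    model.fit(X_tr, y_tr)
    return time.perf_counter() - start, model.score(X_te, y_te)

dt_time, dt_acc = profile(DecisionTreeClassifier(random_state=0))
rf_time, rf_acc = profile(RandomForestClassifier(n_estimators=200,
                                                 random_state=0))
print(f"DecisionTree: {dt_time:.3f}s, acc={dt_acc:.3f}")
print(f"RandomForest: {rf_time:.3f}s, acc={rf_acc:.3f}")
```

A single tree trains much faster than a 200-tree forest; whether the forest's accuracy gain justifies the extra time is exactly the practical trade-off discussed above.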

 

 

Question/Comment 23:

  1. Considering the limitations mentioned in the paper, such as potential architectural limitations and system desynchronization, how might these challenges impact the accuracy and reliability of the classification models? What steps could be taken to mitigate these challenges in real-world scenarios?

 

 

Answer 23:

Added in answers 1-3.

 

Question/Comment 24:

  1. Considering the specific simulation groups analyzed (replGroup, transGroup, reqGroup, algGroup, allGroup), can you discuss the significance of these groups in the context of a stock exchange system?

How do these groups capture the key aspects and technical parameters of the system's operation?

 

Answer 24:

Added in answer 3.

 

 

Question/Comment 25:

  1. The paper reports that the XGBoost method consistently outperformed other classifiers in predicting user actions. Can you provide a detailed analysis of the factors that contribute to XGBoost's superior performance? What specific characteristics or mechanisms of XGBoost make it more effective in this context?

 

Answer 25:

Added in answer 9.

 

 

Question/Comment 26:

  1. In Figure 10, there are variations in the classification effectiveness depending on the considered datasets and architectures. Can you elaborate on the potential reasons for these variations? How do the characteristics of the datasets and architectures influence the performance of the classification models?

 

Answer 26:

The main reason is clean data versus incorrect data caused by insufficient resources.

 

 

Question/Comment 27:

  1. The paper mentions that the performance significantly impacts the quality of logs, as confirmed by the classification effectiveness comparison between the S1 and S2 architectures. Can you provide further insights into how the performance of the system affects the logging mechanism and leads to potential misinterpretation of the logs? What are the implications of these findings for real-world stock exchange systems?

 

Answer 27:

Added in answer 26.

 

 

 

Question/Comment 28:

  1. The results show that the increase in log data does not always lead to higher accuracy. Can you discuss the underlying reasons behind this observation? What factors might contribute to the absence of a direct correlation between data size and classification performance?

 

Answer 28:

The observation that an increase in log data does not always lead to higher accuracy can be attributed chiefly to overfitting (overlearning), a known characteristic of CART.

Classification and Regression Trees (CART) are susceptible to overfitting, especially when dealing with large datasets. Overfitting occurs when a model fits the training data too closely and consequently performs poorly on unseen data.

We observed that while a training set of 32,400 s was the most effective for building classifiers in both replGroup and transGroup, a longer set of 43,200 s resulted in overfitting. This is a classic machine learning problem: a model trained on a large amount of data starts to memorize the data rather than generalize from it. Hence, simply increasing the data size can decrease model performance on new, unseen data.

 

When tested on certain log groups such as replGroup, the classifiers did not achieve high effectiveness or repeatable classification precision. This suggests that simply having more data does not necessarily mean the classifiers will perform optimally.
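To make the overfitting mechanism concrete, the following is a self-contained toy sketch (pure Python, not the paper's actual pipeline or data): a 1-nearest-neighbour memorizer stands in for an unpruned CART tree. On noisily labelled data it scores perfectly on its own training set yet degrades on held-out data, which is the pattern described above.

```python
# Toy illustration of overfitting via memorization (hypothetical data,
# not the stock exchange logs from the study).
import random

random.seed(7)

def make_noisy_data(n, noise=0.2):
    """Points on a line, labelled by sign, with a `noise` fraction flipped."""
    data = []
    for _ in range(n):
        x = random.uniform(-1.0, 1.0)
        label = int(x > 0)
        if random.random() < noise:
            label = 1 - label  # label noise the memorizer will absorb
        data.append((x, label))
    return data

def predict_1nn(train, x):
    """Classify x by its single nearest training point (pure memorization)."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def accuracy(train, dataset):
    hits = sum(predict_1nn(train, x) == y for x, y in dataset)
    return hits / len(dataset)

train = make_noisy_data(400)
test = make_noisy_data(400)

train_acc = accuracy(train, train)  # each point's nearest neighbour is itself
test_acc = accuracy(train, test)    # the memorized noise does not transfer
print(f"train accuracy {train_acc:.2f}, test accuracy {test_acc:.2f}")
```

The large gap between training and test accuracy is the signature of memorization; in CART the analogous remedy is pruning or limiting tree depth rather than feeding the model ever-longer training sets.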

 

 

Question/Comment 29:

  1. Considering the limitations mentioned in the paper, such as potential deadlock, system desynchronization, and generation of incorrect logs, how might these limitations impact the overall reliability of the classification models? Can you provide suggestions or recommendations for mitigating these limitations in stock exchange systems?

 

Answer 29:

Added in answers 6 and 11.

 

 

Question/Comment 30:

  1. The paper mentions the application of other log analysis methods beyond the ones proposed, specifically deep learning. What are the potential advantages and challenges of applying deep learning techniques to this stock exchange system? How might deep learning approaches enhance the prediction quality and overcome limitations of the current classification methods?

 

Answer 30:

Added in answer 7.

 

 

Question/Comment 31:

  1. In Table 12, the training times for each classifier are presented. Can you discuss the practical implications of the training times for each classifier? How might these training times impact the feasibility and real-time applicability of the classification models in a stock exchange system?

 

Answer 31:

In a real system, the model needs to be refreshed from time to time; training time therefore matters mainly in the context of future updates of the model that analyzes the data.

Added to the article.
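A minimal sketch of how such a periodic retraining cost could be measured (assuming scikit-learn; the synthetic dataset, its size, and the classifier choice are illustrative only, not the study's configuration):

```python
# Illustrative measurement of offline retraining cost for a model refresh.
import time

from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for accumulated log features since the last refresh.
X, y = make_classification(n_samples=20000, n_features=20, random_state=1)

t0 = time.perf_counter()
model = DecisionTreeClassifier(random_state=1).fit(X, y)
elapsed = time.perf_counter() - t0
print(f"retraining took {elapsed:.2f} s")  # offline cost; serving is unaffected
```

Because retraining happens offline, even training times of minutes remain feasible as long as the refresh interval is much longer than the training time itself.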

 

 

Question/Comment 32:

  1. The introduction effectively highlights the significance of predicting user actions in a stock exchange system and the potential impact on system performance and decision-making. The introduction could benefit from providing a clear and concise statement of the research objectives to guide the reader.

What specific research gaps and challenges does this study aim to address in the context of predicting user actions in a stock exchange system?

 

Answer 32:

Added in answers 7 and 8.

 

 

Question/Comment 33:

  1. The related works section demonstrates a good understanding of existing literature on stock exchange systems and machine learning techniques. It would be valuable to expand the discussion on log analysis methods and their applications in the context of stock exchange systems. What are the limitations or gaps in previous studies that this research aims to overcome? How does this work contribute to the existing knowledge in the field?

 

Answer 33:

Added in answer 14.

 

 

Question/Comment 34:

  1. The methods section provides a comprehensive overview of the experimental setup and the classification methods employed. It would be helpful to provide more detail on the specific features or parameters used in the classification models and the process of feature selection. How were the simulation groups (replGroup, transGroup, reqGroup, algGroup, allGroup) selected, and what reasons dictated their inclusion in the study?

 

Answer 34:

All groups were described in:

[2. Borowiec, M.; Piszko, R.; Rak, T. Knowledge Extraction and Discovery about Web System Based on the Benchmark Application of Online Stock Trading System. Sensors 2023, https://doi.org/10.3390/s23042274.]

 

The parameters of classification models are described in Table 2.

 

 

Question/Comment 35:

 

Answer 35:

 

 

Question/Comment 36:

  1. The results section presents detailed numerical results in tables and figures, facilitating a clear understanding of the findings. The presentation of results could be improved by providing statistical measures such as confidence intervals or p-values to assess the significance of differences between classifiers and simulation groups.

 

Answer 36:

Added in answer 10.

 

 

Question/Comment 37:

  1. The discussion section provides a comprehensive analysis of the findings and relates them back to the research objectives. The discussion could be strengthened by further exploring the practical implications of the results and their potential impact on real-world stock exchange systems.

 

Answer 37:

This statement is true, but access to such data is difficult. No such data is available in the public domain.

 

 

Question/Comment 38:

  1. How do the findings of this study contribute to the understanding of user actions prediction in stock exchange systems, and how can they be applied to improve decision-making processes or system performance?

 

Answer 38:

Added in answer 12.

 

 

Question/Comment 39:

  1. Limitations and Future Directions: The limitations and future directions section acknowledges the potential shortcomings of the study and discusses potential avenues for further research. It would be beneficial to provide more insight into the challenges associated with implementing deep learning methods in the context of large datasets and multiple simulation groups.

 

Answer 39:

Added in answers 2, 3, 7, 12, 30.

 

 

Question/Comment 40:

  1. Considering the identified limitations, how might these impact the reliability and generalizability of the classification models, and what steps could be taken to address these challenges in future studies?

 

Answer 40:

Added in answers 2, 12, 23, 29, 33.

 

 

 

 

Best regards,

Authors

Reviewer 2 Report

Overall, this paper presents an incremental contribution in applying established machine learning techniques for log analysis and user behavior modeling. The main weakness is the lack of novelty. To improve, I would recommend:

  • Comparing to more baselines beyond just DT, RF, and XGBoost - how does this approach compare to basic classifiers like SVM, naive Bayes etc?
  • Performing deeper analysis on the results - digging into the misclassified instances, feature importance, effects of tuning hyperparameters, etc.
  • Softening claims of having an "exceptional" or "groundbreaking" new approach when this is more of an incremental contribution
  • Adding more recent and relevant references on related work, review following paper and cite appropriately:

P. Thantharate, "IntelligentMonitor: Empowering DevOps Environments with Advanced Monitoring and Observability," 2023 International Conference on Information Technology (ICIT), Amman, Jordan, 2023, pp. 800-805, doi: 10.1109/ICIT58056.2023.10226123.

Rak, T.; Żyła, R. Using Data Mining Techniques for Detecting Dependencies in the Outcoming Data of a Web-Based System. Appl. Sci. 2022, 12, 6115. https://doi.org/10.3390/app12126115

Author Response

Dear Reviewer,

Thank you very much for your review and comments. You may find our responses to your comments below. All changes were marked in red font.

 

Question/Comment 1:

Comparing to more baselines beyond just DT, RF, and XGBoost - how does this approach compare to basic classifiers like SVM, naive Bayes etc?

 

Answer 1:

Preliminary research results for linear models (linear regression, logistic regression, Support Vector Machines (SVM)) and nonlinear models (naive Bayes classifier, k-nearest neighbors (KNN) algorithm, decision trees, neural networks) showed that these methods were poorly matched to the problem under study.
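Such a baseline screening could be sketched as follows (assuming scikit-learn; the synthetic dataset merely stands in for the non-public log features, and the classifier list mirrors the methods named above):

```python
# Hypothetical baseline screening harness -- the numbers it prints are for
# synthetic data, not the study's results.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic multi-class stand-in for the log-derived features.
X, y = make_classification(n_samples=2000, n_features=15, n_informative=6,
                           n_classes=3, n_clusters_per_class=1, random_state=0)

baselines = {
    "SVM": SVC(),
    "naive Bayes": GaussianNB(),
    "KNN": KNeighborsClassifier(),
    "CART": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
}

# 5-fold cross-validated mean accuracy per baseline.
scores = {name: cross_val_score(clf, X, y, cv=5).mean()
          for name, clf in baselines.items()}
for name, acc in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:>13}: mean CV accuracy {acc:.3f}")
```

Running the same harness on the actual log features is what would reveal the mismatch of the simpler models to this problem.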

 

Question/Comment 2:

Performing deeper analysis on the results - digging into the misclassified instances, feature importance, effects of tuning hyperparameters, etc.

 

Answer 2:

A deeper analysis of the results was added. We have corrected the paper. We have re-edited “Conclusions” section.

 

 

Question/Comment 3:

Softening claims of having an "exceptional" or "groundbreaking" new approach when this is more of an incremental contribution

 

Answer 3:

We changed these sentences to use the more accurate term: incremental.

 

 

Question/Comment 4:

Adding more recent and relevant references on related work, review following paper and cite appropriately:

 

Answer 4:

Added new references (2023).

 

Thank you once again.

Best regards,

Authors

Round 2

Reviewer 1 Report

The authors have comprehensively addressed all comments and suggestions. Consequently, I would recommend the paper for acceptance into the publication.

Minor editing of English language required.
