Data-Driven Decision Support in SaaS Cloud-Based Service Models
Abstract
1. Introduction
2. Related Work
- Presents a systematic review process.
- Includes only data-driven methods.
- Presents a performance comparison to support model selection.
- Discusses the challenges SaaS vendors face in maintaining a core user base, such as churn, user engagement, and user retention.
- A comprehensive review of research initiatives aiming to support decision-making for SaaS providers, expanding beyond single-task approaches to address broader strategic goals.
- An overview of various data inputs utilized across studies, identifying frequently used data sources and highlighting cases where multiple kinds of data were integrated for more robust insights.
- A comparative evaluation of employed machine learning techniques, analyzing their relative strengths, weaknesses, and performance in different SaaS decision-making contexts.
- An overview of user-facing output types proposed by researchers to help SaaS providers take effective, data-driven actions.
- A synthesis of best practices and key takeaways from the literature, offering actionable recommendations for enhancing user retention and ensuring the long-term sustainability of SaaS business models.
3. Research Methodology
- Not related to SaaS;
- Although related to SaaS, they were not about supporting decisions;
- Not data-driven but purely theoretical.
4. Data-Driven Decision Support in SaaS
4.1. Main Focus
4.2. Data Sources
4.3. Machine Learning Methods
4.4. Form of Outputs Presented to SaaS Providers
- Visualizations
- Simulation/What-If Analysis (see the sketch after this list)
- Segmentation and Persona Modeling
- Business Impact Metrics
- Model Deployment and Integration
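To make the "Simulation/What-If Analysis" output type concrete, the sketch below trains a churn classifier on synthetic data and then re-scores one customer under a hypothetical change in a single usage feature. It is a minimal illustration only: the feature names and the use of scikit-learn's GradientBoostingClassifier are assumptions of this sketch, not the setup of any study reviewed here.

```python
# Minimal what-if sketch: how a churn model's prediction shifts when a
# single usage feature is changed. Synthetic data; hypothetical features.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(42)
n = 5000

# Hypothetical per-customer features (names are illustrative only).
X = pd.DataFrame({
    "logins_per_week": rng.poisson(3, n),
    "support_tickets": rng.poisson(1, n),
    "months_subscribed": rng.integers(1, 48, n),
})
# Synthetic churn label: less usage and more tickets -> higher churn odds.
logit = (0.8 * X["support_tickets"]
         - 0.5 * X["logins_per_week"]
         - 0.03 * X["months_subscribed"])
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

model = GradientBoostingClassifier(random_state=0).fit(X, y)

# What-if: take one customer and simulate doubling their weekly logins.
customer = X.iloc[[0]].copy()
baseline = model.predict_proba(customer)[0, 1]
scenario = customer.copy()
scenario["logins_per_week"] *= 2
simulated = model.predict_proba(scenario)[0, 1]

print(f"Baseline churn probability:   {baseline:.3f}")
print(f"After doubling weekly logins: {simulated:.3f}")
```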
5. Discussion
5.1. Insights and Strategic Recommendations
5.2. Selection of Optimal Machine Learning Model for Decision-Making
6. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Conflicts of Interest
References
- Bokhari, M.U.; Shallal, Q.M.; Tamandani, Y.K. Cloud Computing Service Models: A Comparative Study. In Proceedings of the 2016 3rd International Conference on Computing for Sustainable Global Development (INDIACom), New Delhi, India, 16–18 March 2016; pp. 890–895. [Google Scholar]
- Mohammed, C.M.; Zeebaree, S.R.M. Sufficient Comparison Among Cloud Computing Services: IaaS, PaaS, and SaaS: A Review. Int. J. Sci. Bus. 2021, 5, 17–30. [Google Scholar]
- Cusumano, M. Cloud Computing and SaaS as New Computing Platforms. Commun. ACM 2010, 53, 27–29. [Google Scholar] [CrossRef]
- Kumar, K.V.K.M. Software as a Service for Efficient Cloud Computing. Int. J. Res. Eng. Technol. 2014, 3, 178–181. [Google Scholar] [CrossRef]
- Tsai, W.; Bai, X.; Huang, Y. Software-as-a-Service (SaaS): Perspectives and Challenges. Sci. China Inf. Sci. 2014, 57, 1–15. [Google Scholar] [CrossRef]
- Berger, P.D.; Nasr, N.I. Customer Lifetime Value: Marketing Models and Applications. J. Interact. Mark. 1998, 12, 17–30. [Google Scholar] [CrossRef]
- Wang, R.; Ying, S.; Jia, X. Log Data Modeling and Acquisition in Supporting SaaS Software Performance Issue Diagnosis. Int. J. Softw. Eng. Knowl. Eng. 2019, 29, 1245–1277. [Google Scholar] [CrossRef]
- Morozov, V.; Mezentseva, O.; Kolomiiets, A.; Proskurin, M. Predicting Customer Churn Using Machine Learning in IT Startups. In Lecture Notes in Computational Intelligence and Decision Making, 2021 International Scientific Conference “Intellectual Systems of Decision-making and Problems of Computational Intelligence”; Springer: Berlin/Heidelberg, Germany, 2022; pp. 645–664. [Google Scholar]
- Manzoor, A.; Atif Qureshi, M.; Kidney, E.; Longo, L. A Review on Machine Learning Methods for Customer Churn Prediction and Recommendations for Business Practitioners. IEEE Access 2024, 12, 70434–70463. [Google Scholar] [CrossRef]
- Heilig, L.; Voß, S. Decision Analytics for Cloud Computing: A Classification and Literature Review. In Bridging Data and Decisions; INFORMS: Catonsville, MD, USA, 2014; pp. 1–26. [Google Scholar] [CrossRef]
- Arora, S.; Thota, S.R.; Gupta, S. Artificial Intelligence-Driven Big Data Analytics for Business Intelligence in SaaS Products. In Proceedings of the 2024 First International Conference on Pioneering Developments in Computer Science & Digital Technologies (IC2SDT), Delhi, India, 2–4 August 2024; IEEE: New York, NY, USA, 2024; pp. 164–169. [Google Scholar]
- Ge, Y.; He, S.; Xiong, J.; Brown, D.E. Customer Churn Analysis for a Software-as-a-Service Company. In Proceedings of the 2017 Systems and Information Engineering Design Symposium (SIEDS), Charlottesville, VA, USA, 28 April 2017; IEEE: New York, NY, USA, 2017; pp. 106–111. [Google Scholar]
- Phumchusri, N.; Amornvetchayakul, P. Machine Learning Models for Predicting Customer Churn: A Case Study in a Software-as-a-Service Inventory Management Company. Int. J. Bus. Intell. Data Min. 2024, 24, 74–106. [Google Scholar] [CrossRef]
- Mezentseva, O.V.; Kolesnikova, K.; Kolomiiets, A. Customer Churn Prediction in the Software by Subscription Models IT Business Using Machine Learning Methods. In Proceedings of the International Workshop on Information Technologies: Theoretical and Applied Problems, Ternopil, Ukraine, 16–18 November 2021. [Google Scholar]
- Dias, J.R.; Antonio, N. Predicting Customer Churn Using Machine Learning: A Case Study in the Software Industry. J. Mark. Anal. 2025, 13, 111–127. [Google Scholar] [CrossRef]
- Sanchez Ramirez, J.; Coussement, K.; De Caigny, A.; Benoit, D.F.; Guliyev, E. Incorporating Usage Data for B2B Churn Prediction Modeling. Ind. Mark. Manag. 2024, 120, 191–205. [Google Scholar] [CrossRef]
- Sergue, M. Customer Churn Analysis and Prediction Using Machine Learning for a B2B SaaS Company. Master’s Thesis, KTH Royal Institute of Technology, Stockholm, Sweden, 2020. [Google Scholar]
- Saias, J.; Rato, L.; Gonçalves, T. An Approach to Churn Prediction for Cloud Services Recommendation and User Retention. Information 2022, 13, 227. [Google Scholar] [CrossRef]
- Thota, S.R.; Arora, S.; Gupta, S. Hybrid Machine Learning Models for Predictive Maintenance in Cloud-Based Infrastructure for SaaS Applications. In Proceedings of the 2024 International Conference on Data Science and Network Security (ICDSNS), Tiptur, India, 26–27 July 2024; IEEE: New York, NY, USA, 2024; pp. 1–6. [Google Scholar]
- Gajananan, K.; Loyola, P.; Katsuno, Y.; Munawar, A.; Trent, S.; Satoh, F. Modeling Sentiment Polarity in Support Ticket Data for Predicting Cloud Service Subscription Renewal. In Proceedings of the 2018 IEEE International Conference on Services Computing (SCC), San Francisco, CA, USA, 2–7 July 2018; IEEE: New York, NY, USA, 2018; pp. 49–56. [Google Scholar]
- Chakraborty, A.; Raturi, V.; Harsola, S. BBE-LSWCM: A Bootstrapped Ensemble of Long and Short Window Clickstream Models. In Proceedings of the 7th Joint International Conference on Data Science & Management of Data (11th ACM IKDD CODS and 29th COMAD), Bangalore, India, 4–7 January 2024; ACM: New York, NY, USA, 2024; pp. 350–358. [Google Scholar]
- Hoang, H.D.; Cam, N.T. Early Churn Prediction in Freemium Game Mobile Using Transformer-Based Architecture for Tabular Data. In Proceedings of the 2024 IEEE 3rd World Conference on Applied Intelligence and Computing (AIC), Gwalior, India, 27–28 July 2024; IEEE: New York, NY, USA, 2024; pp. 568–573. [Google Scholar]
- Rothmeier, K.; Pflanzl, N.; Hullmann, J.A.; Preuss, M. Prediction of Player Churn and Disengagement Based on User Activity Data of a Freemium Online Strategy Game. IEEE Trans. Games 2021, 13, 78–88. [Google Scholar] [CrossRef]
- Kristensen, J.T.; Burelli, P. Combining Sequential and Aggregated Data for Churn Prediction in Casual Freemium Games. In Proceedings of the 2019 IEEE Conference on Games (CoG), London, UK, 20–23 August 2019; IEEE: New York, NY, USA, 2019; pp. 1–8. [Google Scholar]
- Karmakar, B.; Liu, P.; Mukherjee, G.; Che, H.; Dutta, S. Improved Retention Analysis in Freemium Role-Playing Games by Jointly Modelling Players’ Motivation, Progression and Churn. J. R. Stat. Soc. Ser. A Stat. Soc. 2022, 185, 102–133. [Google Scholar] [CrossRef]
- Pang, L.; Hu, Z.; Liu, Y. How to Retain Players through Dynamic Quality Adjustment in Video Games. In Proceedings of the 2021 IEEE 5th Advanced Information Technology, Electronic and Automation Control Conference (IAEAC), Chongqing, China, 12–14 March 2021; IEEE: New York, NY, USA, 2021; pp. 154–160. [Google Scholar]
- Boyle, R.E.; Pledger, R.; Brown, H.-F. Iterative Mixed Method Approach to B2B SaaS User Personas. Proc. ACM Hum. Comput. Interact. 2022, 6, 1–44. [Google Scholar] [CrossRef]
- Mali, M.; Mangaonkar, N. Behavioral Customer Segmentation For Subscription. In Proceedings of the 2023 3rd Asian Conference on Innovation in Technology (ASIANCON), Pune, India, 25–27 August 2023; IEEE: New York, NY, USA, 2023; pp. 1–6. [Google Scholar]
- Li, H. (Alice) Converting Free Users to Paid Subscribers in the SaaS Context: The Impact of Marketing Touchpoints, Message Content, and Usage. Prod. Oper. Manag. 2022, 31, 2185–2203. [Google Scholar] [CrossRef]
- Yoganarasimhan, H.; Barzegary, E.; Pani, A. Design and Evaluation of Personalized Free Trials. arXiv 2020, arXiv:2006.13420. [Google Scholar] [CrossRef]
- Harahap, E.P.; Hermawan, P.; Kusumawardhani, D.A.R.; Rahayu, N.; Komara, M.A.; Agustian, H. User Interface Design’s Impact on Customer Satisfaction and Loyalty in SaaS E-Commerce. In Proceedings of the 2024 3rd International Conference on Creative Communication and Innovative Technology (ICCIT), Tangerang, Indonesia, 7–8 August 2024; IEEE: New York, NY, USA, 2024; pp. 1–6. [Google Scholar]
- van Belle, E.P.J. Data-Driven Drivers of Customer Loyalty in a Business-to-Business Environment for the Software as a Service Industry. Master’s Thesis, Eindhoven University of Technology, Eindhoven, The Netherlands, 2022. [Google Scholar]
- Najjar, A.; Boissier, O.; Picard, G. Elastic & Load-Spike Proof One-to-Many Negotiation to Improve the Service Acceptability of an Open SaaS Provider. In Autonomous Agents and Multiagent Systems, Proceedings of the AAMAS 2017 Workshops, Best Papers, São Paulo, Brazil, 8–12 May 2017, Revised Selected Papers; Springer: Berlin/Heidelberg, Germany, 2017; pp. 1–20. [Google Scholar]
- Chiang, W.-H.; Ahmad, U.; Wang, S.; Bukhsh, F. Investigating Aha Moment Through Process Mining. In Proceedings of the 25th International Conference on Enterprise Information Systems, Prague, Czech Republic, 24–26 April 2023; SCITEPRESS—Science and Technology Publications: Setúbal, Portugal, 2023; pp. 164–172. [Google Scholar]
- Ahlgren, O.; Dalentoft, J. Collecting and Integrating Customer Feedback: A Case Study of SaaS Companies Working B2B. Master’s Thesis, Lund University, Lund, Sweden, 2020. [Google Scholar]
- Kumar, G.S.C.; Dhanalaxmi, B. Leveraging Usage-Based SaaS Models: Optimizing Revenue and User Experience. Knowl. Trans. Appl. Mach. Learn. 2025, 3, 12–17. [Google Scholar] [CrossRef]
- Baumann, E.; Kern, J.; Lessmann, S. Usage Continuance in Software-as-a-Service. Inf. Syst. Front. 2022, 24, 149–176. [Google Scholar] [CrossRef]
- Curiskis, S.; Dong, X.; Jiang, F.; Scarr, M. A Novel Approach to Predicting Customer Lifetime Value in B2B SaaS Companies. J. Mark. Anal. 2023, 11, 587–601. [Google Scholar] [CrossRef]
- Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
- Ishwaran, H.; Kogalur, U.B.; Blackstone, E.H.; Lauer, M.S. Random Survival Forests. Ann. Appl. Stat. 2008, 2, 841–860. [Google Scholar] [CrossRef]
- Liaw, A.; Wiener, M. Classification and Regression by RandomForest. R News 2002, 2, 18–22. [Google Scholar]
- Chen, T.; Guestrin, C. XGBoost. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; ACM: New York, NY, USA, 2016; pp. 785–794. [Google Scholar]
- Dreiseitl, S.; Ohno-Machado, L. Logistic Regression and Artificial Neural Network Classification Models: A Methodology Review. J. Biomed. Inf. 2002, 35, 352–359. [Google Scholar] [CrossRef]
- Hastie, T.; Rosset, S.; Zhu, J.; Zou, H. Multi-Class AdaBoost. Stat. Interface 2009, 2, 349–360. [Google Scholar] [CrossRef]
- AlShourbaji, I.; Helian, N.; Sun, Y.; Hussien, A.G.; Abualigah, L.; Elnaim, B. An Efficient Churn Prediction Model Using Gradient Boosting Machine and Metaheuristic Optimization. Sci. Rep. 2023, 13, 14441. [Google Scholar] [CrossRef] [PubMed]
- Rouder, J.N.; Morey, R.D. Teaching Bayes’ Theorem: Strength of Evidence as Predictive Accuracy. Am. Stat. 2019, 73, 186–190. [Google Scholar] [CrossRef]
- Huang, X.; Khetan, A.; Cvitkovic, M.; Karnin, Z. TabTransformer: Tabular Data Modeling Using Contextual Embeddings. arXiv 2020, arXiv:2012.06678. [Google Scholar]
- Ren, J.; Pang, L.; Cheng, Y. Dynamic Pricing Scheme for IaaS Cloud Platform Based on Load Balancing: A Q-Learning Approach. In Proceedings of the 2017 8th IEEE International Conference on Software Engineering and Service Science (ICSESS), Beijing, China, 24–26 November 2017; IEEE: New York, NY, USA, 2017; pp. 806–810. [Google Scholar]
- Tibshirani, R. Regression Shrinkage and Selection via the Lasso. J. R. Stat. Soc. Ser. B Stat. Methodol. 1996, 58, 267–288. [Google Scholar] [CrossRef]
- van der Aalst, W. Process Mining; Springer: Berlin/Heidelberg, Germany, 2016; ISBN 978-3-662-49850-7. [Google Scholar]
- Jiang, J.; Nguyen, T. Linear and Generalized Linear Mixed Models and Their Applications; Springer: New York, NY, USA, 2021; ISBN 978-1-0716-1281-1. [Google Scholar]
- Rizopoulos, D.; Verbeke, G.; Molenberghs, G. Shared Parameter Models under Random Effects Misspecification. Biometrika 2008, 95, 63–74. [Google Scholar] [CrossRef]
- Park, S.; Gupta, S. Handling Endogenous Regressors by Joint Estimation Using Copulas. Mark. Sci. 2012, 31, 567–586. [Google Scholar] [CrossRef]
- Pereira, I.; Madureira, A.; Bettencourt, N.; Coelho, D.; Rebelo, M.Â.; Araújo, C.; de Oliveira, D.A. A Machine Learning as a Service (MLaaS) Approach to Improve Marketing Success. Informatics 2024, 11, 19. [Google Scholar] [CrossRef]
- Lundberg, S.M.; Lee, S.-I. A Unified Approach to Interpreting Model Predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Curran Associates Inc.: Red Hook, NY, USA, 2017; pp. 4768–4777. [Google Scholar]
- Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. arXiv 2016, arXiv:1602.04938. [Google Scholar]
- Dwork, C. Differential Privacy. In International Colloquium on Automata, Languages, and Programming; Bugliesi, M., Preneel, B., Sassone, V., Wegener, I., Eds.; Springer: Berlin/Heidelberg, Germany, 2006; pp. 1–12. [Google Scholar]
- Konečný, J.; McMahan, H.B.; Yu, F.X.; Richtárik, P.; Suresh, A.T.; Bacon, D. Federated Learning: Strategies for Improving Communication Efficiency. arXiv 2016, arXiv:1610.05492. [Google Scholar]
- Shi, W.; Cao, J.; Zhang, Q.; Li, Y.; Xu, L. Edge Computing: Vision and Challenges. IEEE Internet Things J. 2016, 3, 637–646. [Google Scholar] [CrossRef]
- Grigoriadis, I.; Vrochidou, E.; Tsimperidis, I.; Papakostas, G.A. Machine Learning as a Service (MLaaS)—An Enterprise Perspective. In Proceedings of the International Conference on Data Science and Applications, Jaipur, India, 14–15 July 2023; Nanda, S.J., Yadav, R.P., Gandomi, A.H., Saraswat, M., Eds.; Springer: Singapore, 2023; pp. 261–273. [Google Scholar]
Ref. | Usage Behavior | Transactional/Business Metrics | Customer Profile | Financial Data | Customer Support | Satisfaction (e.g., NPS) | Survey/Interview | Marketing
---|---|---|---|---|---|---|---|---
[18] | ✓ | | ✓ | ✓ | | | |
[24] | ✓ | | | | | | |
[12] | ✓ | ✓ | ✓ | | | | |
[14] | ✓ | ✓ | | | | | |
[22] | ✓ | | | | | | |
[19] | ✓ | | ✓ | | | | |
[25] | ✓ | | ✓ | | | | |
[13] | ✓ | ✓ | ✓ | | | | |
[20] | | | | | ✓ | | |
[15] | ✓ | | ✓ | | ✓ | | |
[8] | ✓ | ✓ | ✓ | | | | |
[23] | ✓ | | | | | | |
[21] | ✓ | | ✓ | | | | |
[38] | ✓ | ✓ | ✓ | ✓ | | | |
[29] | | | | | | | | ✓
[30] | ✓ | | | | | | ✓ | ✓
[31] | ✓ | | ✓ | | | | ✓ |
[26] | | | ✓ | | | ✓ | |
[33] | ✓ | | | | | ✓ | |
[32] | ✓ | ✓ | | | | | | ✓
[27] | ✓ | | | | | | ✓ |
[37] | ✓ | | ✓ | | | | ✓ |
[16] | ✓ | ✓ | ✓ | | ✓ | | |
[28] | ✓ | ✓ | | | | | |
[35] | ✓ | | | | ✓ | ✓ | ✓ |
[17] | ✓ | ✓ | ✓ | | | ✓ | |
[36] | | ✓ | | ✓ | | | |
[34] | ✓ | | | | | | |
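Read as a feature-engineering checklist, the table shows that most studies combine event-level usage behavior with transactional or profile attributes before modeling. The sketch below illustrates such a join with pandas; the schemas and column names are hypothetical.

```python
# Sketch of combining two of the data sources in the table above,
# usage-behavior events and transactional/subscription records, into a
# single per-customer feature table. Column names are hypothetical.
import pandas as pd

usage_events = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 3],
    "event": ["login", "export", "login", "login", "login", "api_call"],
    "duration_min": [12, 3, 7, 20, 15, 1],
})
subscriptions = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "plan": ["pro", "basic", "pro"],
    "monthly_fee": [99.0, 19.0, 99.0],
    "churned": [0, 1, 0],
})

# Aggregate raw usage events into per-customer behavioral features.
usage_features = (
    usage_events.groupby("customer_id")
    .agg(n_events=("event", "count"), total_minutes=("duration_min", "sum"))
    .reset_index()
)

# Join behavioral features with transactional/profile attributes.
features = subscriptions.merge(usage_features, on="customer_id", how="left").fillna(0)
print(features)
```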
Ref. | Proposed Method | Compared Methods | Evaluation Results (Common Metrics) | Evaluation Results (Miscellaneous Metrics) | Validation Method | Dataset
---|---|---|---|---|---|---
[18] | Random Forest | Neural Networks, AdaBoost | Random Forest: 0.997 AUC, 0.988 Accuracy, 0.989 F-measure for Non-Churn Class and 0.981 for Churn Neural Networks: 0.968 AUC, 0.965 Accuracy, 0.975 F-measure for Non-Churn Class and 0.946 for Churn AdaBoost: 0.995 AUC, 0.984 Accuracy, 0.989 F-measure for Non-Churn Class and 0.974 for Churn | - | Training (64%), validation (16%), and test (20%) sets | Dataset from a partner company associated with the University of Évora, containing 196,977 instances corresponding to 26,418 service subscriptions |
[24] | Hybrid Model (LSTM Hidden State) | Random Forest, LSTM | Hybrid Models: 0.8741 AUC, 0.6953 F1-score, 0.8023 Accuracy LSTM: 0.8592 AUC, 0.6795 F1-score, 0.7900 Accuracy Random Forest: 0.8405 AUC, 0.6414 F1-score, 0.7749 Accuracy | - | 10-fold cross-validation | Dataset from player logs of a freemium mobile game developed by Tactile Games including 2,284,238 records of 814,822 unique players |
[12] | XGBoost | Logistic Regression, Random Forests | XGBoost: 0.7526 AUC Random Forest: ~0.5 AUC Logistic Regression: 0.7257 AUC | - | 10-fold cross-validation | Dataset from a client SaaS company including 76,668 observations of 20 predictor variables |
[14] | Random Forest | Decision Trees, Support Vector Machine, Neural Networks, Naïve Bayes, Logistic Regression | Random forest: 0.88 Training Accuracy, 0.87 Test Accuracy Decision Trees: 1.00 (overfitting) Training Accuracy, 0.76 Test Accuracy Support Vector Machine: 1.00 (overfitting) Training Accuracy, 0.63 Test Accuracy Neural Networks: 0.85 Training Accuracy, 0.82 Test Accuracy Naïve Bayes: 0.71 Training Accuracy, 0.69 Test Accuracy Logistic Regression: 0.73 Training Accuracy, 0.71 Test Accuracy | - | train-test split (percentage not mentioned) | A B2B SaaS subscriptions dataset (source not mentioned) including 7044 examples of B2B SaaS subscriptions and 21 variables |
[22] | XGBoost | Transformer-based models (FT-transformer), GBDT | XGBoost: 0.948 Accuracy, 0.9545 Precision, 0.9418 Recall, 0.9481 F1-Score, 0.988 AUC Transformer-based models: 0.8679 Accuracy, 0.8477 Precision, 0.8875 Recall, 0.8671 F1-Score, 0.949 AUC GBDT: 0.8561 Accuracy, 0.8755 Precision, 0.8328 Recall, 0.8536 F1-Score, 0.934 AUC | - | train-test split (percentage not mentioned) | Dataset from a mobile freemium game includes data from over 268,370 users |
[19] | Hybrid Model (SVM + Naïve Bayes) | KNN, Random Forest, ANN, Decision tree | Hybrid Model: 0.9567 Accuracy, 0.943 Precision, 0.9565 Recall, 0.943 F1-Score ANN: 0.789 Accuracy, 0.8403 Precision, 0.8824 Recall, 0.8608 F1-Score Random Forest: 0.8 Accuracy, 0.79 Precision, 0.80 Recall, 0.79 F1-Score KNN: 0.839 Accuracy, 0.826 Precision, 0.829 Recall, 0.781 F1-Score Decision Tree: 0.9097 Accuracy, 0.9242 Precision, 0.9242 Recall, 0.9242 F1-Score | - | train-test split (80% train—20% test) | Dataset from Kaggle containing subscription details on 7044 customers of a fictional SaaS company |
[13] | Random Forest | Decision Tree, Logistic Regression, Support Vector Machine | Random Forest: 0.916 Recall, 0.926 F1-Score, 0.92 Accuracy, 0.939 Precision Decision Tree: 0.945 Recall, 0.871 F1-Score, 0.845 Accuracy, 0.809 Precision Logistic Regression: 0.868 Recall, 0.902 F1-Score, 0.896 Accuracy, 0.939 Precision Support Vector Machine: 0.881 Recall, 0.839 F1-Score, 0.814 Accuracy, 0.803 Precision | - | train-test split (80% train—20% test), 10-fold cross validation | Dataset extracted from the case-study company’s database system containing 1788 observations of churn and non-churn samples |
[11] | Neural Networks | - | Neural Networks: 0.9694 Accuracy, 0.9651 Precision, 0.9540 Recall | - | Not provided | Dataset collected from a cloud service provider including approximately 700 unique cloud service offering-customer pairs and around 90,000 associated support tickets |
[15] | XGBoost | Random Forest, Logistic Regression, Neural Networks, AdaBoost, Gradient Boosting Machine | XGBoost: 0.7956 Accuracy, 0.7916 Precision, 0.8507 Recall, 0.8201 F1-Score, 0.86 ROC AUC Random Forest: 0.7877 Accuracy, 0.8042 Precision, 0.8096 Recall, 0.8069 F1-Score, 0.85 ROC AUC Logistic Regression: 0.7757 Accuracy, 0.7986 Precision, 0.7895 Recall, 0.7940 F1-Score, 0.84 ROC AUC Neural Networks: 0.7835 Accuracy, 0.7910 Precision, 0.8220 Recall, 0.8062 F1-Score, 0.84 ROC AUC AdaBoost: 0.7867 Accuracy, 0.7970 Precision, 0.8191 Recall, 0.8079 F1-Score, 0.86 ROC AUC Gradient Boosting Machine: 0.7935 Accuracy, 0.7946 Precision, 0.8402 Recall, 0.8167 F1-Score, 0.86 ROC AUC | - | train-test split (80% train—20% test) | Two datasets provided by a Portuguese software house with the final dataset included 9539 observations from the two datasets combined |
[8] | Random Forest | Neural Networks, Decision Trees, Logistic Regression, Support Vector Machine, Naïve Bayes | Random Forest: 0.88 Training Accuracy, 0.87 Test Accuracy Neural Networks: 0.85 Training Accuracy, 0.82 Test Accuracy Decision Tree: 1.00 (overfitting) Training Accuracy, 0.76 Test Accuracy Logistic Regression: 0.73 Training Accuracy, 0.71 Test Accuracy Support Vector Machine: 1.00 (overfitting) Training Accuracy, 0.63 Test Accuracy Naive Bayes: 0.71 Training Accuracy, 0.69 Test Accuracy | - | train-test split (percentage not mentioned) | User activity datasets (source and number of data not mentioned) |
[23] | Random Forest | Neural Networks, Decision Trees, Logistic Regression, Support Vector Machine, Naïve Bayes, Gradient Boosting, KNN | Random Forest: 0.997 AUC Decision Tree: 0.987 AUC Neural Networks: 0.994 AUC Gradient Boosting: 0.984 AUC Logistic Regression: 0.967 AUC Support Vector Machine: 0.990 AUC KNN: 0.840 AUC Naïve Bayes: 0.887 AUC | - | train-test-validation split (percentage not mentioned), cross validation (folds not mentioned) | Dataset from “The Settlers Online (TSO)”, a freemium online strategy game, including 7439 users and 113,643 observed events.
[21] | LightGBM | Hybrid Models (Neural Network with BiLSTM layers) | LightGBM: 0.690 AUROC, ~30 min Training Time Hybrid Models: 0.591 AUROC, ~2 h Training Time | - | train-test split (500,000 records-200,000 records) | Dataset from QuickBooks Online (QBO) users, including 700,000 combinations of users and reference timestamps. |
[38] | LightGBM | XGBoost, Gradient Boosting, LASSO Regression, K-nearest-neighbors, AUTO-Arima | - | Performance indexed to LightGBM = 1.0× LightGBM: 1 SMAPE, 1 RMSE, 1 MAE XGBoost: ~1.10 SMAPE, ~1 RMSE, ~1.1 MAE Gradient Boosting: ~1.2 SMAPE, ~1.1 RMSE, ~1.2 MAE KNN: ~1.25 SMAPE, ~1.2 RMSE, ~1.25 MAE LASSO Regression: ~1.25 SMAPE, ~1.1 RMSE, ~1.2 MAE AUTO-Arima: ~1.75 SMAPE, ~5 RMSE, ~2.5 MAE | Not provided | Dataset collected from a well-known B2B SaaS company (number of data not mentioned)
[29] | LASSO Regression, Dynamic probit model with copula corrections | - | - | LASSO Regression reduced MSE to 0.122 Dynamic probit model with copula corrections: −3841 Log-Marginal Density, 7808.5 Deviance Information Criterion | Not provided | Dataset from a U.S.-based multinational computer software company operating on a Software-as-a-Service business model including a sample of 14,989 unique consumers |
[30] | LASSO Regression | Random Forest, causal forest, XGBoost | - | LASSO Regression: +6.8% subscriptions XGBoost: +6.17% subscriptions Random Forests: Poor (overfitted training data) Causal Forests: Poor (minimal personalization) | train-test split (70% train—30% test) | Dataset from a fully randomized experiment involving 337,724 unconnected users globally |
[16] | Logistic Regression | Random Forest, XGBoost, Decision Trees, Support Vector Machine | Logistic Regression: 0.604 AUC Support Vector Machine: 0.603 AUC Random Forest: 0.594 AUC XGBoost: 0.599 AUC Decision Tree: 0.523 AUC | Logistic Regression: 1.682 TDL, 21,209 EMPB (EUR) Support Vector Machine: 1.590 TDL, 22,566 EMPB (EUR) Random Forest: 11.482 TDL, 15,106 EMPB (EUR) XGBoost: 1.360 TDL, 14,351 EMPB (EUR) Decision Tree: 0.856 TDL, 5809 EMPB (EUR) | cross-validation (folds not mentioned) | Dataset from a European software service provider including 3959 subscriptions |
[28] | XGBoost | Random Forest, Decision Trees, Logistic Regression, Support Vector Machine, K-nearest-neighbors, Naïve Bayes | XGBoost: 0.79 Accuracy, 0.8 Precision, 0.76 Recall, 0.78 F1-Score All the other methods were outperformed, but their specific results were not provided | - | train-test split (80% train—20% test) | Dataset from Kaggle (fineTech_appData) including 50,000 rows of user information
[17] | Random Forest | Logistic Regression | Random Forest: 0.09 Precision, 0.11 Recall, 0.10 F1-Score Logistic Regression: 0.05 Precision, 0.57 Recall, 0.19 F1-Score The proposed model was better at explaining churn drivers (feature importance) than at precise prediction. | - | cross validation (folds not mentioned) | Dataset from Aircall, a Software-as-a-Service company, including data from about 5000 customers
[36] | Random Forest, Gradient Boosting Machine | Logistic Regression | Logistic Regression: 1.00 Accuracy, 1.00 Precision, 1.00 Recall (due to dataset simplicity) Random Forest: 0.75 Accuracy, 0.714 Precision, 0.789 Recall Gradient Boosting Machine: 0.77 Accuracy, 0.753 Precision, 0.768 Recall | - | train-test split (80% train—20% test) | A simulation dataset replicating real-world Software as a Service (SaaS) usage patterns (number of data not mentioned) |
[32] | Logistic Regression | - | Logistic Regression: 0.9372 Accuracy, 0.9549 Precision, 0.9330 Recall, 0.9438 F1-Score, 0.999 AUC | - | Not provided | Datasets collected from four databases at Digidata, a SaaS company where the project was carried out (number of data not mentioned) |
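For orientation, the sketch below mirrors the general shape of these comparisons: several widely used classifiers are evaluated on the same churn-style dataset with stratified, cross-validated ROC AUC. The synthetic data, the scikit-learn estimators, and the hyperparameters are placeholders rather than the configurations used in the cited studies.

```python
# Rough analogue of the model comparisons above: evaluate several common
# classifiers on one synthetic churn dataset with cross-validated ROC AUC.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Imbalanced synthetic stand-in for a churn dataset (churners are the minority).
X, y = make_classification(n_samples=4000, n_features=20, n_informative=8,
                           weights=[0.8, 0.2], random_state=0)

models = {
    "Logistic Regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1000)),
    "Random Forest": RandomForestClassifier(n_estimators=300, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
for name, model in models.items():
    auc = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
    print(f"{name:<20} mean AUC = {auc.mean():.3f} (+/- {auc.std():.3f})")
```

Stratified folds keep the churner ratio stable across splits, which matters because churn datasets are typically imbalanced and plain accuracy can be misleading in that setting.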
Ref. | Visualizations |
---|---|
[14] | Accuracy curves to show the impact of tree counts in random forests |
[22] | AUC-ROC curves and confusion matrices to validate robustness |
[25] | Hazard rate curves and predictive intervals for retention statistics |
[13] | Feature importance (e.g., prevPeriodTrans) via bar charts |
[20] | Temporal sentiment plots track satisfaction trajectories |
[23] | ROC curves and feature plots that highlight “missed days” as top predictors |
[21] | Decile lift charts for BBE-LSWCM showing a 30% churn reduction in A/B tests
[31] | Path analysis diagrams to map UI design to loyalty |
[33] | Time-series graphs to compare adaptive vs. non-adaptive negotiation modes |
[34] | Process maps that reveal key activation moments
[37] | Trajectory plots that show activation impacts |
[16] | Coefficient plots to illustrate usage data’s impact on churn |
[35] | Dashboards to automate feedback summaries and issue prioritization |
[36] | Comparative plots that guide model selection via accuracy/recall metrics |
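Two of the most frequently reported visualizations, ROC curves and feature-importance bar charts, can be produced with a few lines of scikit-learn and matplotlib. The sketch below uses synthetic data and generic feature names; it is not tied to any specific study above.

```python
# Sketch of two visualization types that recur in the table above:
# a ROC curve and a feature-importance bar chart. Synthetic placeholders.
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=3000, n_features=8, n_informative=4, random_state=1)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=1)

rf = RandomForestClassifier(n_estimators=200, random_state=1).fit(X_tr, y_tr)
scores = rf.predict_proba(X_te)[:, 1]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# ROC curve on the held-out test split.
fpr, tpr, _ = roc_curve(y_te, scores)
ax1.plot(fpr, tpr, label=f"AUC = {roc_auc_score(y_te, scores):.3f}")
ax1.plot([0, 1], [0, 1], linestyle="--", color="grey")
ax1.set_xlabel("False positive rate")
ax1.set_ylabel("True positive rate")
ax1.set_title("ROC curve")
ax1.legend()

# Impurity-based feature importances as a horizontal bar chart.
ax2.barh(feature_names, rf.feature_importances_)
ax2.set_title("Feature importance")

plt.tight_layout()
plt.show()
```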
Ref. | Segmentation Groups | Description |
---|---|---|
[25] | Demographics (gender, geography) and engagement levels | Groups people according to demographics and engagement levels |
[23] | Activity patterns (e.g., “economy overview usage”) | Classifies players by activity patterns, using loyalty markers to target interventions |
[38] | Customer Lifetime Value (CLV) | Segments customers by CLV, focusing on prioritizing enterprise clients for retention |
[27] | B2B user personas (e.g., “Data Sellers”) | Creates user personas with pain points and workflow metrics to guide product development |
[28] | Behavioral clusters (e.g., education-focused users) | Identifies behavioral clusters, such as education-focused users, for targeted marketing |
[32] | Relationship length and cross-selling dependency | Segments customers by relationship length and cross-selling dependency
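A lightweight stand-in for these segmentation pipelines is to cluster standardized usage features and then profile each cluster to draft personas. The sketch below uses k-means for simplicity, whereas, for example, [27] relies on UMAP and HDBSCAN; the features and the cluster count are assumptions.

```python
# Simplified behavioral-segmentation sketch in the spirit of the table above:
# k-means on standardized (synthetic) usage features, then per-segment profiles.
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
n = 1000
usage = pd.DataFrame({
    "sessions_per_week": rng.gamma(2.0, 2.0, n),
    "features_used": rng.integers(1, 15, n),
    "team_seats": rng.integers(1, 50, n),
})

# Standardize so that no single feature dominates the distance metric.
X = StandardScaler().fit_transform(usage)
labels = KMeans(n_clusters=4, n_init=10, random_state=7).fit_predict(X)

# Inspect each segment's average behavior to draft persona descriptions.
usage["segment"] = labels
print(usage.groupby("segment").mean().round(2))
```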
Ref. | Business Metrics |
---|---|
[18] | Cost savings per retained customer, revenue protection from churn reduction |
[14] | Dynamic pricing ROI |
[19] | Churn rates from early detection, acquisition cost reduction |
[25] | Revenue boost from collaboration-based monetization |
[8] | Negative MRR churn and CLV |
[21] | Churn reduction and intervention acceptance in A/B tests. |
[38] | Marginal ROI formulas |
[30] | Improved retention and revenue |
[26] | Quality investment costs against network-driven retention gains. |
[16] | Predictive accuracy against carbon emissions |
[36] | Customer satisfaction |
[32] | Country-specific retention differences |
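Several of these business impact metrics reduce to simple arithmetic once the underlying quantities are known. The sketch below uses common simplified definitions (gross MRR churn, margin-adjusted CLV, net savings of a retention campaign) that approximate, but do not reproduce, the exact formulas of the cited studies.

```python
# Back-of-the-envelope versions of business-impact metrics like those in the
# table above. The definitions here are common simplifications, not the
# exact formulas used by the cited studies.

def gross_mrr_churn_rate(mrr_lost: float, mrr_start: float) -> float:
    """Share of monthly recurring revenue lost to churn in a period."""
    return mrr_lost / mrr_start

def simple_clv(arpu: float, gross_margin: float, monthly_churn: float) -> float:
    """CLV approximated as margin-adjusted ARPU over the expected customer lifetime."""
    return arpu * gross_margin / monthly_churn

def savings_from_retention(retained_customers: int, arpu: float,
                           months_retained: float, campaign_cost: float) -> float:
    """Net revenue protected by a retention campaign."""
    return retained_customers * arpu * months_retained - campaign_cost

if __name__ == "__main__":
    print(f"Gross MRR churn: {gross_mrr_churn_rate(4_000, 100_000):.1%}")
    print(f"Simple CLV:      ${simple_clv(arpu=80, gross_margin=0.75, monthly_churn=0.03):,.0f}")
    print(f"Net savings:     ${savings_from_retention(120, 80, 6, 15_000):,.0f}")
```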
Ref. | Focus | Data | Proposed Method | Output |
---|---|---|---|---|
[18] | Churn prediction | Usage behavior, customer profile, financial | Random Forest | Simulation/What-If Analysis, Business Impact Metrics, Model Deployment and Integration |
[24] | Churn prediction | Usage behavior | Hybrid Model (LSTM Hidden State) | Model Deployment and Integration |
[12] | Churn prediction | Usage behavior, customer profile, transactional/business metrics | XGBoost | Model Deployment and Integration |
[14] | Churn prediction | Usage behavior, transactional/business metrics | Random Forest | Visualizations, Business Impact Metrics |
[22] | Churn prediction | Usage behavior | XGBoost | Visualizations |
[19] | Churn prediction | Usage behavior, customer profile | Hybrid Model (SVM + Naïve Bayes) | Business Impact Metrics, Model Deployment and Integration |
[25] | Churn prediction | Usage behavior, customer profile | Cox Regression | Visualizations, Simulation/What-If Analysis, Segmentation and Persona Modeling, Business Impact Metrics |
[13] | Churn prediction | Usage behavior, customer profile, transactional/business metrics | Decision Trees | Visualizations |
[20] | Churn prediction | Customer support | Neural Networks | Visualizations |
[15] | Churn prediction | Usage behavior, customer profile, customer support | XGBoost | Model Deployment and Integration |
[8] | Churn prediction | Usage behavior, transactional/business metrics, customer profile | Neural Networks | Business Impact Metrics |
[23] | Churn prediction | Usage behavior | Random Forest | Visualizations, Segmentation and Persona Modeling |
[21] | Churn prediction | Usage behavior, customer profile | LightGBM | Visualizations, Business Impact Metrics, Model Deployment and Integration
[16] | Churn prediction | Usage behavior, transactional/business metrics, customer profile, customer support | Logistic Regression | Visualizations, Business Impact Metrics, Model Deployment and Integration |
[17] | Churn prediction | Usage behavior, transactional/business metrics, customer profile, satisfaction | Random Forest | Model Deployment and Integration |
[38] | Customer lifetime value | Usage behavior, transactional/business metrics, customer profile, financial | LightGBM | Segmentation and Persona Modeling, Business Impact Metrics |
[36] | Customer lifetime value | Transactional/business metrics, financial | Random Forest | Visualizations, Simulation/What-If Analysis, Business Impact Metrics, Model Deployment and Integration |
[29] | User engagement | Marketing/trials | LASSO Regression | Simulation/What-If Analysis, Model Deployment and Integration
[34] | User engagement | Usage behavior | Heuristic and Fuzzy Mining | Visualizations, Model Deployment and Integration |
[30] | User retention | Usage behavior, survey/interview, marketing | LASSO Regression | Simulation/What-If Analysis, Business Impact Metrics |
[26] | User retention | Customer profile, satisfaction | Reinforcement learning | Simulation/What-If Analysis, Business Impact Metrics |
[37] | User retention | Usage behavior, customer profile, survey/interview | Linear Mixed Models | Visualizations |
[31] | User satisfaction/user loyalty | Usage behavior, customer profile, survey/interview | PLS-SEM (Partial Least Squares Structural Equation Modeling) | Visualizations |
[33] | User satisfaction/user loyalty | Usage behavior, satisfaction | AQUAman negotiation mechanism | Visualizations |
[35] | User satisfaction/user loyalty | Usage behavior, customer support, satisfaction, survey/interview | Comparative analysis | Visualizations, Model Deployment and Integration |
[32] | User satisfaction/user loyalty | Usage behavior, transactional/business metrics, marketing | Logistic Regression | Segmentation and Persona Modeling, Business Impact Metrics |
[27] | User segmentation | Usage behavior, survey/interview | UMAP and HDBSCAN | Segmentation and Persona Modeling |
[28] | User segmentation | Usage behavior, transactional/business metrics | XGBoost | Segmentation and Persona Modeling |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).