Article

Maximizing Social Media User Engagement Through Predictive Analytics in Retail Tourism: Identifying Key Performance Indicators That Trigger User Interactions

by Prokopis K. Theodoridis and Dimitris C. Gkikas *
School of Social Sciences, Hellenic Open University, 26335 Patras, Greece
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(21), 11720; https://doi.org/10.3390/app152111720
Submission received: 24 June 2025 / Revised: 20 July 2025 / Accepted: 1 October 2025 / Published: 3 November 2025

Featured Application

Retail digital marketing strategies have become essential in boosting social media engagement, brand awareness, and revenue in the tourism industry. This study combines marketing and information sciences to analyze online user behavior and engagement on social media pages operated by retail stores in tourist areas, based on key performance indicators. It utilizes descriptive and predictive statistics to identify user behavior patterns, enabling marketers to forecast engagement and mitigate risks associated with decision-making. Behavioral data is collected from social media retail pages targeting tourists. Various analytics models, including linear regression, Random Forests, Extreme Gradient Boosting, K-nearest neighbors, and Naïve Bayes, provide insights into predicting social media user engagement. This approach can be adapted for use in tourism marketing, digital strategy, and e-commerce, enabling marketers to proactively design, develop, and launch social media campaigns tailored to tourism consumers.

Abstract

This study examines and evaluates key performance indicators (KPIs) that affect user engagement on social media platforms, with a primary focus on fashion retail in seasonal tourism contexts. The primary objective is to determine which engagement metrics most accurately predict user interaction levels and thereby support strategic decision-making in digital marketing. Using a dataset of 2500 Facebook photos and videos from a women’s retail store, collected between 2016 and 2024, the study employs descriptive analysis and predictive modeling. Three KPIs (3 s video views, reach from organic posts, and other clicks) are examined for their impact on user engagement. The posts are categorized into engagement levels, and four classification models are evaluated: Random Forests (RF), Extreme Gradient Boosting (XGBoost), K-Nearest Neighbors (KNN), and Naïve Bayes (NB). Results show that short video views and post reach are key predictors of user engagement. The models perform effectively, with XGBoost achieving a classification accuracy of 94.73%, and Cronbach’s alpha analysis confirms the internal consistency of the selected variables. The findings underscore the significance of KPI analysis in social media strategy and illustrate the value of data mining techniques in uncovering user behavior patterns that offer practical insights for optimizing digital marketing efforts.

1. Introduction

Social media platforms have revolutionized the digital marketing landscape and user engagement within the evolving e-commerce environment. The global e-commerce market is characterized by intense competition, advanced technological infrastructure, and rapidly changing social media platforms. These complexities create challenges for marketers, but they also present vital opportunities for data-driven growth and customer-focused strategies [1]. Consumers demand personalized, real-time solutions and services, and businesses increasingly adopt artificial intelligence (AI) technologies to respond efficiently [2]. AI-based systems significantly help brands track and analyze user behavior in real-time, target audiences, personalize content, and refine recommendations. These tools shape consumer decisions by offering products and services tailored to users’ needs and preferences [2,3]. As a result, AI is used to predict consumer behavior and shape purchasing experiences in data-driven sectors like online retail and e-commerce [3,4].
User engagement has become both a key performance metric and a measure of value shared between users and platforms [4]. The application of AI effectively alters how consumers engage and interact with brands on social media, transforming the structure of digital marketing strategies and enabling decision-makers to understand behavioral factors [5].
Figure 1 compares the compound annual growth rate (CAGR) of global e-commerce with that of total retail from 2018 to 2028. E-commerce growth consistently exceeds total retail growth, with a pronounced spike between 2020 and 2024 that coincides with the pandemic and post-pandemic period [6].
Machine learning, data mining, and predictive analytics tools are increasingly used to model user behaviors, enabling digital marketers to predict consumer actions and improve prediction accuracy [7].
However, due to the General Data Protection Regulation (GDPR), companies face restrictions and limitations in extracting user data, leading to the development of advanced methods to extract meaningful insights from anonymous or private datasets [8].
User engagement refers to the time users actively interact with a website or app interface and services, indicating both attention and interest levels [9,10,11].
These behavioral records, such as the time spent scrolling, viewing, or clicking, provide valuable insights for deploying personalized marketing strategies and targeting segmented audiences more effectively [12,13]. Digital platforms, including Facebook, Instagram, TikTok, YouTube, and Pinterest, have transformed the way consumers access products and services, creating a comprehensive behavioral data ecosystem [14].
These platforms enable brands to gather extensive datasets on users’ profiles, choices, interests, behaviors, and preferences, generate insights into user journeys, and support the delivery of personalized content [15]. Data segmentation strategies enable marketers to align user behavior with their brand identity, thereby enhancing personalization and relevance in consumer interactions [16].
At the same time, the growth of social media marketing (SMM) has boosted user engagement and increased competition among brands. SMM helps organizations connect with their audiences, enhance online identity and visibility, drive website traffic and interaction, and influence purchasing decisions and loyalty [17].
Therefore, evaluating the effectiveness of digital content, such as photos and video posts, becomes a crucial part of the marketing strategy. This study explores how users interact with organic images and video posts on a fashion retail brand’s Facebook business page. Three Facebook key performance indicators (KPIs), including “3 s video views from organic posts,” “reach from organic posts,” and “other clicks,” are used as independent variables to predict user engagement, with “engaged users” as the dependent target. The dataset includes 2500 posts published between 2016 and 2024. Extensive data pre-processing and analysis, including descriptive statistics, regression modeling, and data mining classification techniques, have been conducted [18].
The choice of these specific KPIs (3 s video views, organic reach, and other clicks) is based on their widespread use as performance benchmarks in both academic research and digital marketing industry practice. While broader metrics, such as Time Spent on Site or Page Views, exist, this study emphasizes interaction-related indicators that most directly reflect active user engagement within the Facebook platform ecosystem. These KPIs capture user actions and have been identified in previous engagement and SMM studies as having substantial predictive value for user interaction behavior. The period from 2016 to 2024 was selected to encompass both stable and changing social media usage patterns, including pre-pandemic and post-pandemic shifts in user behavior, as well as changes to Facebook’s platform algorithm during this time. The data are proprietary, and specific licenses were obtained to access them.
“Engaged Users” refers to the dependent variable, which shows the number of users who interacted with a post through specific actions, such as clicks, reactions, comments, or shares. This metric reflects active user behavior beyond reach or impressions. In this study, engagement was categorized into three levels (low, medium, high) according to post-specific interaction thresholds generated from the dataset’s distribution.
Engagement Theory, Social Exchange Theory, and digital consumer behavior models provide essential theoretical foundations. Engagement Theory emphasizes how interaction features, such as views and clicks, serve as triggers for user involvement.
Social Exchange Theory views engagement as a mutually beneficial relationship in which users invest time and attention, expecting to receive informational or emotional value in return [19,20,21].
Digital Consumer Behavior Models treat KPIs such as reach and view duration as indicators of behavioral intent. Together, these frameworks establish the importance of KPIs as predictive factors. The reliability of these KPIs was confirmed through Cronbach’s alpha analysis [19,20,21].

2. Related Works

2.1. Social Media Marketing and Consumer Engagement

Social media marketing (SMM) has transformed the landscape of consumer behavior, user interaction, and company communication. Businesses are adopting new technologies and utilizing digital platforms to develop their promotional strategies, with a focus on branding, customer segmentation, sales growth, user experience, user satisfaction, public relations, and consumer loyalty [18,19,20,21]. Different interaction models, including B2B, B2C, C2B, and C2C, have adopted new methods of transparent and dynamic communication. Online word-of-mouth (WOM), driven by user-generated content, effectively affects consumer behavior and purchasing decisions [20,21,22,23].

2.2. Digital Platforms and User Behavior in Retail

Marketers are shifting toward mobile-first strategies by integrating social platforms and apps to enhance user targeting and predict churn. Facebook business pages, in particular, serve as valuable sources of behavioral data, enabling brands to analyze content performance based on factors such as post type, category, timing of publication, and user engagement [23,24,25]. In the fashion retail industry, where younger and more tech-savvy consumers are prevalent, platforms like Instagram and Pinterest enhance engagement through visual appeal and interactive content [26].

2.3. Data Mining and Machine Learning in Marketing Analytics

To interpret large user-generated datasets, marketers utilize supervised learning models, including K-Nearest Neighbors (KNN), Naïve Bayes (NB), Support Vector Machines (SVMs), Artificial Neural Networks (ANNs), and Decision Trees (DTs) [12,27,28,29,30,31,32,33]. Studies comparing these models reveal the trade-offs in accuracy, complexity, and interpretability. However, their use in niche markets, such as seasonal tourism and fashion, is largely unexplored.

2.4. Model Performance and Evaluation Challenges

Model performance metrics, such as precision, recall, and F-measure, can yield misleading results if their limitations are overlooked, particularly in complex and non-linear domains like social media behavior [30,31,32,33,34]. Recent studies have shown the importance of statistically validating results and ensuring interpretability when exploring user engagement patterns [35,36,37,38,39].

2.5. AI in SMM for Tourism and Fashion Retail

In the tourism and fashion retail sectors, AI applications are being increasingly utilized to personalize user experiences and promote sustainable practices. Studies reveal how AI agents enhance interaction and targeting [40]. Magableh et al. (2024) and Jankovic and Curovic (2023) confirm that AI integration can boost sustainable financial performance and human-centered engagement strategies [41,42].

2.6. Ethical Dimensions of AI-Based Engagement

Ethical concerns in AI-driven marketing, such as transparency, trust, and personalization, are increasingly being addressed. Researchers highlight the potential of chatbots and recommendation systems to influence digital impulse buying and satisfaction, but they warn against opaque systems and emphasize the importance of transparency, user consent, and fairness [43,44,45,46,47]. These concerns underscore the need for trustworthy, explainable machine learning models that facilitate human–AI interaction.

2.7. Research Gap and Study Positioning

While previous studies have explored machine learning applications in engagement prediction, few studies focus on real-world use cases in fashion retail within seasonal tourism contexts. This research fills that gap by using predictive analytics on actual Facebook business data. By analyzing engagement with KPIs such as post reach, 3 s views, and other clicks, this study aims to provide actionable insights for optimizing SMM strategies and improving user satisfaction [48,49,50,51].

2.8. Research Objectives

The refined research objectives are as follows: RO1 aims to identify which of the candidate key performance indicators (KPIs), namely 3 s video views, organic reach, and other clicks, most strongly affect user engagement on social media. RO2 focuses on segmenting user engagement into categories based on observable behavioral patterns and post-interaction data from the Facebook business page. RO3 evaluates and compares four supervised classification algorithms, namely Random Forests (RF), Extreme Gradient Boosting (XGBoost), K-Nearest Neighbors (KNN), and Naïve Bayes (NB), for predicting engagement levels. Finally, RO4 translates the classification results into practical marketing recommendations for content optimization and audience targeting. These objectives support the overall aim of the study and are summarized in Table 1.

2.9. Hypothesis Testing

To support the research hypotheses, this study uses a series of theoretical frameworks related to Engagement Theory, Social Exchange Theory, and Digital Consumer Behavior Models. According to Engagement Theory, meaningful and active user participation, along with interaction, relevance, and added value, improves users’ cognitive skills and emotional engagement in digital environments. Social Exchange Theory explains user engagement as a reciprocal and mutually beneficial interaction, where users are more likely to participate when the advantages outweigh the costs of their attention or effort.
Insights from Digital Consumer Behavior Models indicate that measurable actions, such as video views, post clicks, and organic reach, serve as indicators of deeper engagement intentions. These frameworks provide the foundation linking the proposed KPIs—such as clicks, video views, and reach—with measurable user engagement outcomes in SMM practices. The methodology below explores how social media KPIs influence user engagement [15,16,17].
Three hypotheses link social media KPIs to user engagement; each tests whether specific KPIs (e.g., organic reach, video views) are related to user engagement. “Other clicks” refers to page name clicks, profile page clicks, and like counts (Table 2) [38].

3. Research Methodology

3.1. Research Scope

The methodological design of this research closely follows the theoretical frameworks introduced earlier. The study uses descriptive and predictive statistical models to identify potential patterns in user behavior data and engagement metrics, including KPIs such as short video views and other clicks. It emphasizes the importance of integrating artificial intelligence (AI) into SMM by analyzing the factors that influence user behavior and applying predictive analytics in marketing.
Figure 2 illustrates the key steps in this research methodology for developing decision-making rules, encompassing problem definition, data collection, data cleaning and labeling, data analysis, regression analysis, model training and selection, evaluation, and deployment.
The study provides marketers with specific guidelines on how to enhance user engagement through organic social media posts. Its primary focus is on understanding the relationships among KPIs and providing recommendations to help decision-makers optimize SMM strategies. The presentation of results provides a clear overview of social media insights, including both descriptive and predictive statistics [39,52,53].
The authors aim to narrow the gap between marketers’ knowledge and actual user behavior on social media. The data collected includes both organic and paid posts, focusing on engagement metrics such as the target variable “engaged users” (dependent variable), as well as “3 s video views from organic posts,” “reach from organic posts,” and “other clicks” (independent variables) (Table 3).
The analysis combines linear regression with an assessment of four classifiers: RF, XGBoost, KNN, and NB. Several factors, including the nature of the data, the data features, and the type of task (classification, clustering, regression, etc.), influence how well each algorithm performs on the current dataset. Applying these models to a real-world case study allows marketers to explore which tools are more likely to yield highly accurate results [39,52,53]. Mean Squared Error (MSE), root mean squared error (RMSE), and the coefficient of determination (R²) quantify prediction error, while classification accuracy, precision, recall, F1-score, and area under the curve (AUC) assess classification performance by quantifying correctness, completeness, balance, and discrimination ability [34,35,36,38].

3.2. Research Design and Objectives

The aim is to establish rules for marketers to boost user engagement and generate a cycle of re-engaging clients (Figure 3) [54,55,56,57,58].
Following the five-stage workflow shown in the flowchart, this study employs descriptive and predictive statistical analysis to identify the KPIs that influence user engagement on social media business pages. The methodology includes descriptive statistical tests, regression analysis, and performance assessments of predictive models. The data pre-processing stage involves cleansing, normalization, handling of missing data, and splitting the data into a training set (70%) and a testing set (30%). The training and test data are then passed to the data mining classifiers (Figure 3).
The raw-dataset stage consists of collecting user engagement data, including a set of KPIs, from the Facebook Business Analytics platform over a selected period. The pre-processing phase examines potential correlations among variables, and linear regression shows which factors influence user engagement.
RO1 involves identifying the KPIs (3 s video views, organic reach, and other clicks) that influence user engagement. RO2 aims to categorize posts into engagement classes based on their interactions. RO3 evaluates the performance of predictive models, namely RF, XGBoost, KNN, and NB; the model selection and data splitting stage chooses these data mining models and evaluates them using classification accuracy (ACC), precision, recall, F1-score, and the area under the curve (AUC). RO4 offers strategic insights to enhance digital campaign performance. A summarized data analysis reveals hidden information and behavioral patterns. Predictions are presented in the final section of the study, including a set of extracted rules and a visual representation of the results.

3.3. Data Collection and Pre-Processing

3.3.1. User Profile

The selected fashion retail Facebook page is run by a business targeting Greek tourist destinations on the mainland and on two islands. The page’s audience consists primarily of female users aged 25 to 44 who are strongly interested in seasonal collections, promotions, and social shopping features. These users typically engage with content such as videos, promotional posts, and click-through offers. Their purchasing decisions are quick and influenced by visual appeal, influencer posts, and community feedback, making them a good target for predictive engagement analysis. This group was chosen because the page sees high organic traffic during tourist seasons and because the brand is closely tied to fast, tourist-driven consumption. It therefore offers an excellent opportunity to study user behavior in a tourism retail setting.

3.3.2. Data Acquisition

The dataset is collected from the Facebook business page of a fashion retailer that operates both an online and a physical clothing store. The data span nine years, from 1 January 2016 to 31 December 2024, and include 2500 instances of Facebook post engagement metrics.

3.3.3. Data Pre-Processing and Preparation

Data are extracted and analyzed using Microsoft Excel (Microsoft, Redmond, WA, USA), and the statistical analysis is conducted with SPSS V28 (IBM, Armonk, NY, USA). Descriptive statistics, performance metrics, normality testing, model configuration, and class segmentation are used to address the formulated hypotheses. Weka 3 (version 3.9) and Python libraries (Matplotlib 3.10.0, NumPy 2.3.0, and Seaborn 0.13.2) are used to evaluate the data mining models and present the results visually [59,60,61,62].
Data are collected, cleaned, analyzed, and segmented. All variables are tested for potential relationships with the dependent target variable (i.e., “engaged users”). According to Facebook Insights, “engaged users” is the total number of unique users who engaged with a post (clicked, liked, shared, or commented). This metric captures active interactions with content and is widely used as a benchmark for evaluating post effectiveness in academic and business settings.
Only moderate and strong correlations are retained and interpreted as the best predictors. In the descriptive statistics, the number of sessions is segmented into different levels of user engagement. Count denotes the number of posts, Mean the average value of each KPI, Standard Deviation its variability, and Min/Max its minimum and maximum values. Missing Values indicates the number of posts with zeros, missing data, or very low variance. Columns without significant data have been removed. The descriptive statistics show low average values and high variance for all KPIs, and “3 s video views” and “reach” in particular have similar distributions (Table A1).
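The per-KPI summary described above (count, mean, standard deviation, minimum, maximum, and missing values) can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the Excel/SPSS workflow used in the study: the helper name `describe_kpi` is hypothetical, the sample standard deviation (n − 1 denominator) is assumed, and zeros are counted together with missing entries to mirror the “Missing Values” definition.

```python
import numpy as np

def describe_kpi(values):
    """Summarize one KPI column (hypothetical helper, illustrative only)."""
    x = np.asarray(values, dtype=float)
    observed = x[~np.isnan(x)]                  # drop missing entries
    return {
        "count": int(observed.size),
        "mean": float(observed.mean()),
        "std": float(observed.std(ddof=1)),     # sample standard deviation
        "min": float(observed.min()),
        "max": float(observed.max()),
        # posts with missing data or a zero value, per the text's definition
        "missing_or_zero": int(np.isnan(x).sum() + (observed == 0).sum()),
    }

# Synthetic values: one missing entry and one zero
summary = describe_kpi([0, 1, 2, np.nan])
print(summary)
```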

3.4. Descriptive and Correlation Analysis (RO1)

3.4.1. Normality Test

A normality test was performed to select the appropriate correlation analysis method. Shapiro–Wilk tests were used to determine whether parametric or non-parametric correlation analysis is suitable for evaluating the relationships between user engagement and the other variables. The test is applied to the study variables, including “engaged users,” “3 s video views from organic posts,” “reach from organic posts,” and “other clicks.” For each variable, the test statistic (W) measures how closely the data follow a normal distribution, while the p-value indicates the statistical significance of any deviation. A p-value below 0.05 provides evidence against the null hypothesis of normality, suggesting a deviation from a normal distribution and favoring non-parametric methods. All KPIs have p-values below the 0.05 threshold, indicating that they do not follow a normal distribution. Non-parametric statistical methods are therefore used in the analysis, including Spearman’s correlation to examine the relationships among the KPIs (Table 4) [54].
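Because normality is rejected, Spearman’s ρ replaces Pearson’s correlation. A minimal NumPy sketch follows, assuming ρ is computed as the Pearson correlation of average ranks, which handles the many tied zero values typical of engagement KPIs. The helper names are hypothetical; in practice, `scipy.stats.spearmanr` returns the same statistic together with a p-value.

```python
import numpy as np

def average_ranks(x):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x), dtype=float)
    i, n = 0, len(x)
    while i < n:
        j = i
        while j + 1 < n and sorted_x[j + 1] == sorted_x[i]:
            j += 1                                  # extend the run of ties
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0  # average of ranks i+1..j+1
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho as Pearson's correlation of the two rank vectors."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])

# A monotonic but non-linear relation still gives rho = 1
print(spearman_rho([1, 2, 3, 4], [1, 4, 9, 16]))
```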

3.4.2. Cronbach’s Alpha

To examine the internal consistency of the user engagement metrics, a Cronbach’s alpha analysis was conducted using the KPIs “3 s video views from organic posts,” “reach from organic posts,” and “other clicks.” The resulting alpha of 0.99 indicates a high degree of reliability for further statistical modeling. A session refers to continuous user activity on the platform for approximately 30 min (Table 5).
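Cronbach’s alpha for k items is α = k/(k − 1) · (1 − Σσᵢ²/σ_T²), where σᵢ² is the variance of item i and σ_T² is the variance of the summed score. The sketch below computes it with NumPy under two assumptions not stated in the text: sample variances (n − 1 denominator) and raw, unstandardized KPI values. The function name is hypothetical.

```python
import numpy as np

def cronbach_alpha(items):
    """items: 2-D array, rows = observations (posts), columns = items (KPIs)."""
    X = np.asarray(items, dtype=float)
    k = X.shape[1]
    item_vars = X.var(axis=0, ddof=1)        # variance of each item
    total_var = X.sum(axis=1).var(ddof=1)    # variance of the summed score
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

# Two synthetic items that move together perfectly give alpha = 1
print(cronbach_alpha([[1, 1], [2, 2], [3, 3]]))
```

Since alpha depends on item scales, a KPI with much larger values (such as reach) would dominate the summed score unless the items were standardized first.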

3.4.3. Linear Regression

Linear regression is included in this research to address cases where a linear relationship between the predictors and user engagement is expected. It models the linear relationship between one dependent variable and multiple independent variables, offers interpretable coefficients that give clear estimates of each feature’s impact on the target, and works well when variables are linearly related. Because the Shapiro–Wilk test showed that the KPI values are not normally distributed, Spearman’s rank correlation was selected to evaluate the relationships between variables. The approach emphasizes interpretability and practical explanations for marketers by focusing on direct, multivariate relationships between the KPIs and “engaged users.” Regression thus complements the classification models in social media analytics, where different behavioral patterns coexist [55,56,57,58,63].
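The regression step can be sketched as ordinary least squares with an intercept, solved with `numpy.linalg.lstsq`. This is an illustration only, not the SPSS procedure used in the study; the function names and the synthetic data are hypothetical, and in the study’s setting the columns of `X` would hold the three KPIs.

```python
import numpy as np

def fit_linear_regression(X, y):
    """Ordinary least squares with an intercept (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1:]                        # intercept, slopes

def predict(intercept, slopes, X):
    return intercept + np.asarray(X, dtype=float) @ slopes

# Synthetic data generated exactly by y = 2 + 3*x1 - x2
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 3]]
y = [2, 5, 1, 4, 5]
b0, b = fit_linear_regression(X, y)
print(b0, b)
```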

3.5. Engagement Classification (RO2)

User engagement is categorized into three classes, labeled “high engagement,” “medium engagement,” and “low engagement,” based on the level of interaction within completed user sessions across all recorded instances. The engagement classes were determined from the quantile distribution of the dataset’s “engaged users” metric. Low engagement (0–1 engaged users) indicates posts with minimal or no user interactions. Medium engagement (2–10) corresponds to posts that elicited limited engagement. High engagement (≥11) signifies posts that generated the strongest engagement. This classification provides a straightforward and practical way to categorize posts, enabling more targeted SMM strategies.
The summary shows the number of sessions in each engagement level: “low engagement” (1506 sessions), “medium engagement” (678 sessions), and “high engagement” (316 sessions), out of a total of 2500 (Table 6).
Figure 4 shows how each KPI is distributed across the low, medium, and high engagement levels. For “3 s video views from organic posts,” the median rises as the engagement level increases, indicating that posts with more video views tend to have higher engagement; the distribution is wider for low-engagement posts and narrower for high-engagement posts. Together, these patterns indicate a positive relationship between “3 s video views from organic posts” and engagement. The plot of “reach from organic posts by engagement class” shows that posts with higher organic reach are more likely to fall into the medium- and high-engagement categories; its distribution is likewise broad for low-engagement posts and narrow for highly engaging ones, further indicating a positive association between organic reach and engagement. “Other clicks” cluster near zero for low engagement but shift upward as engagement increases. Outliers in the high-engagement group suggest that some posts generate unusually strong exploratory interest.
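The class thresholds above translate directly into a labeling function. This minimal sketch assumes integer “engaged users” counts per post; the function name and the sample counts are hypothetical.

```python
from collections import Counter

def engagement_class(engaged_users):
    """Map a post's 'engaged users' count to the study's three classes."""
    if engaged_users < 0:
        raise ValueError("engaged users cannot be negative")
    if engaged_users <= 1:
        return "low"       # 0-1: minimal or no interaction
    if engaged_users <= 10:
        return "medium"    # 2-10: limited engagement
    return "high"          # 11 or more

# Label a few synthetic posts and count posts per class
counts = Counter(engagement_class(v) for v in [0, 3, 1, 15, 7, 0])
print(counts)
```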

3.6. Predictive Modeling (RO3)

The model selection process was based on the dataset’s structure. Therefore, Random Forest and XGBoost were chosen for their ability to handle multicollinearity and imbalanced classes. KNN was selected for its strong performance on small, structured datasets. Naïve Bayes was also chosen for its simplicity and efficiency with nominal data.
Random Forest (RF) is an ensemble learning technique that builds multiple decision trees and combines their outputs into a single prediction. It efficiently captures non-linear relationships and feature interactions, performs well on datasets that include numeric and categorical variables with non-linear predictor relationships, and is less prone to overfitting than a single decision tree. This study selected RF for its robustness in handling structured social media metrics, its minimal assumption requirements, its efficiency with imbalanced datasets, its ability to generate reliable classification results, and its interpretability through variable importance metrics [55,56,57,58,64,65].
XGBoost (Extreme Gradient Boosting) is a scalable, efficient implementation of gradient-boosted decision trees, known for its high predictive performance and its ability to capture complex feature interactions. It excels at predictive tasks on tabular datasets, where accuracy, computational efficiency, and control over overfitting are essential, and it supports parallel tree boosting, which makes it suitable for complex datasets. These properties, together with its ability to handle imbalanced classes, make it well suited to user engagement classification [55,56,57,58,64,65].
K-Nearest Neighbors (KNN) is used for its effectiveness with local data clusters, which aids pattern recognition, particularly in small datasets such as the one employed in this study. KNN is a non-parametric method that is simple to understand and implement: it classifies each record according to the most common class among its k nearest labeled neighbors. It performs well on small datasets, where the computational cost of finding the k nearest neighbors remains low, and it works effectively when neither a straightforward linear model nor a parametric non-linear model can adequately define the decision boundary. Because it classifies records directly from stored labeled examples, without an explicit model-fitting phase, it is easy to deploy. It is therefore suitable for datasets where the relationship between features and the target variable is complex and difficult to capture with parametric models [55,56,57,58,64,65].
Naïve Bayes (NB) is a supervised learning algorithm, meaning that it is trained on labeled data, and is recognized for its computational efficiency and strong performance in high-dimensional spaces. Although it assumes that attributes are conditionally independent given the class, NB performs well in real-world applications where attributes are partially dependent, and its rapid classification makes it suitable for large datasets and real-time prediction scenarios. It also handles high-dimensional data well, including cases where the number of features is much larger than the number of instances. NB is also effective for categorical data and is frequently used in text mining, which could be relevant for future research on social media sentiment analysis [21,55,56,57,58,63,66,67,68].
Data are processed using Python libraries and Weka 3 for data handling, model training, and testing. To evaluate the models’ generalizability, the dataset was split into a 70% training set and a 30% testing set using stratified sampling to maintain class proportions. Because low-engagement posts make up over 60% of the dataset, the classes are imbalanced; performance metrics therefore include precision, recall, and F1-score in addition to accuracy [55,56,57,58]. Future studies could implement k-fold cross-validation or SMOTE-based balancing to further improve classifier performance.
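To make the ensemble idea concrete, the sketch below builds a deliberately simplified random forest: each tree is a depth-1 stump chosen by Gini impurity on a bootstrap sample, using a random subset of features, and the trees vote by majority. It illustrates bagging and feature subsampling under these simplifying assumptions; it is not the Weka or scikit-learn implementation, and all function names are hypothetical. The default of √d features per tree is a common convention, assumed here.

```python
import numpy as np
from collections import Counter

def gini(labels):
    """Gini impurity of a label array."""
    if len(labels) == 0:
        return 0.0
    p = np.array(list(Counter(labels).values()), dtype=float) / len(labels)
    return 1.0 - float(np.sum(p ** 2))

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def fit_stump(X, y, features):
    """Best single split (feature, threshold, left label, right label)."""
    best = None
    for f in features:
        values = np.unique(X[:, f])
        for t in (values[:-1] + values[1:]) / 2.0:   # midpoints between values
            left, right = y[X[:, f] <= t], y[X[:, f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, majority(left), majority(right))
    if best is None:                 # no usable split: predict the majority class
        label = majority(y)
        return (0, np.inf, label, label)
    return best[1:]

def fit_forest(X, y, n_trees=25, max_features=None, seed=0):
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    n, d = X.shape
    k = max_features or max(1, int(np.sqrt(d)))
    forest = []
    for _ in range(n_trees):
        rows = rng.integers(0, n, size=n)             # bootstrap sample
        feats = rng.choice(d, size=k, replace=False)  # random feature subset
        forest.append(fit_stump(X[rows], y[rows], feats))
    return forest

def predict_forest(forest, X):
    X = np.atleast_2d(np.asarray(X, dtype=float))
    return [majority([l if row[f] <= t else r for f, t, l, r in forest])
            for row in X]

# Synthetic data: feature 0 separates the classes, feature 1 is constant
X = [[0, 5], [1, 5], [2, 5], [10, 5], [11, 5], [12, 5]]
y = ["low"] * 3 + ["high"] * 3
forest = fit_forest(X, y, n_trees=25, max_features=2)
print(predict_forest(forest, [[0.5, 5], [11.5, 5]]))
```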
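XGBoost adds second-order gradient information, regularized leaf weights, and many engineering optimizations, so it cannot be reproduced in a few lines. The sketch below shows only the gradient boosting idea it builds on: starting from a constant prediction, each round fits a regression stump to the current residuals (the negative gradient of squared error) and adds a shrunken copy of it. The squared-error loss, learning rate, and function names are illustrative assumptions; a multiclass task such as engagement classification would instead use a softmax-type loss, and real work would use the `xgboost` package.

```python
import numpy as np

def fit_regression_stump(X, r):
    """Split minimizing squared error of residuals r: (feature, threshold, left, right)."""
    best = None
    for f in range(X.shape[1]):
        values = np.unique(X[:, f])
        for t in (values[:-1] + values[1:]) / 2.0:
            left, right = r[X[:, f] <= t], r[X[:, f] > t]
            sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, f, t, left.mean(), right.mean())
    if best is None:                      # constant features: a flat update
        return (0, np.inf, r.mean(), r.mean())
    return best[1:]

def fit_boosting(X, y, n_rounds=50, learning_rate=0.1):
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    base = float(y.mean())                # initial constant prediction
    pred = np.full(len(y), base)
    stumps = []
    for _ in range(n_rounds):
        residual = y - pred               # negative gradient of 0.5 * (y - pred)**2
        f, t, lv, rv = fit_regression_stump(X, residual)
        pred += learning_rate * np.where(X[:, f] <= t, lv, rv)
        stumps.append((f, t, lv, rv))
    return base, stumps, learning_rate

def predict_boosting(model, X):
    base, stumps, lr = model
    X = np.atleast_2d(np.asarray(X, dtype=float))
    pred = np.full(len(X), base)
    for f, t, lv, rv in stumps:
        pred += lr * np.where(X[:, f] <= t, lv, rv)
    return pred

# Synthetic step-shaped target
X = [[0], [1], [2], [3], [4], [5]]
y = [0, 0, 0, 1, 1, 1]
model = fit_boosting(X, y)
print(predict_boosting(model, X).round(3))
```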
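A minimal KNN classifier can be written with NumPy and a majority vote. The min-max scaling step is a design assumption (the text does not describe the distance function or any normalization); without it, a large-valued KPI such as reach would dominate Euclidean distances. The value k = 3, the function name, and the toy data are illustrative.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, X_query, k=3):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.atleast_2d(np.asarray(X_query, dtype=float))
    # Min-max scale with training statistics; constant columns are divided
    # by 1 to avoid division by zero.
    lo, hi = X_train.min(axis=0), X_train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    train_s, query_s = (X_train - lo) / span, (X_query - lo) / span
    labels = []
    for q in query_s:
        dist = np.linalg.norm(train_s - q, axis=1)
        nearest = np.argsort(dist, kind="stable")[:k]
        labels.append(Counter(y_train[i] for i in nearest).most_common(1)[0][0])
    return labels

# Synthetic two-cluster data
X_train = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
y_train = ["low"] * 3 + ["high"] * 3
print(knn_predict(X_train, y_train, [[0.5, 0.5], [10.5, 10.5]]))
```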
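For numeric KPIs, Naïve Bayes is commonly implemented with Gaussian class-conditional densities; the sketch below assumes that variant, since the text does not specify how numeric attributes were modeled. It scores each class by its log prior plus the sum of per-feature Gaussian log-likelihoods, which is where the conditional-independence assumption enters. The class name `SimpleGaussianNB` and the variance floor are hypothetical choices.

```python
import numpy as np

class SimpleGaussianNB:
    """Gaussian Naive Bayes: features independent and normal within each class."""

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)
        self.log_priors_ = np.log([np.mean(y == c) for c in self.classes_])
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        # Small variance floor so a constant feature within a class cannot divide by zero
        eps = 1e-9 * max(float(X.var(axis=0).max()), 1.0)
        self.vars_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + eps
        return self

    def predict(self, X):
        X = np.atleast_2d(np.asarray(X, dtype=float))
        diff = X[:, None, :] - self.means_[None, :, :]   # (n, classes, features)
        log_lik = -0.5 * (np.log(2 * np.pi * self.vars_)[None]
                          + diff ** 2 / self.vars_[None]).sum(axis=2)
        return self.classes_[np.argmax(log_lik + self.log_priors_, axis=1)]

# Synthetic one-feature example
model = SimpleGaussianNB().fit([[0], [1], [0.5], [10], [11], [10.5]],
                               ["low"] * 3 + ["high"] * 3)
print(model.predict([[0.2], [10.2]]))
```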
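A stratified 70/30 split can be sketched with the standard library by shuffling and cutting each class separately. The function name and seed are hypothetical; scikit-learn’s `train_test_split(..., test_size=0.3, stratify=y)` offers the same behavior. The example uses the class sizes reported in this study (1506 low, 678 medium, 316 high), where low-engagement posts make up 1506/2500 ≈ 60.2% of the data.

```python
import random
from collections import defaultdict

def stratified_split(labels, train_fraction=0.7, seed=42):
    """Return (train_idx, test_idx) with class proportions preserved in both parts."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        cut = int(round(train_fraction * len(idx)))  # per-class training share
        train_idx += idx[:cut]
        test_idx += idx[cut:]
    return sorted(train_idx), sorted(test_idx)

# Class sizes reported in the text
labels = ["low"] * 1506 + ["medium"] * 678 + ["high"] * 316
train, test = stratified_split(labels)
print(len(train), len(test))
```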

Data Pre-Processing and Preparation

Mean Squared Error (MSE) measures the average of the squared differences between predicted and actual values; lower values indicate better model performance. Root Mean Squared Error (RMSE) is the square root of MSE and expresses the error in the same units as the original variable; lower values indicate higher accuracy. The coefficient of determination (R2) represents the proportion of variance in the target variable explained by the model; values closer to one indicate a better fit.
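The three regression metrics follow directly from their definitions: MSE = (1/n) Σ (yᵢ − ŷᵢ)², RMSE = √MSE, and R2 = 1 − Σ (yᵢ − ŷᵢ)² / Σ (yᵢ − ȳ)². The sketch below computes them in plain Python; the function name `regression_metrics` and the four toy values are hypothetical.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (MSE, RMSE, R2) for paired actual and predicted values."""
    n = len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    mse = ss_res / n
    rmse = math.sqrt(mse)  # back in the units of the target variable
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)  # total variation around the mean
    r2 = 1 - ss_res / ss_tot
    return mse, rmse, r2

# Hypothetical engaged-user counts and model predictions.
mse, rmse, r2 = regression_metrics([3, 5, 7, 9], [2, 5, 8, 9])
print(mse, round(rmse, 4), r2)  # 0.5 0.7071 0.9
```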
Precision, recall, and F1-score evaluate the correctness, completeness, and balance of the predictions. Classification accuracy is the proportion of all predictions that are correct. Precision is the ratio of correctly predicted instances of a class to all instances predicted as that class. Recall is the ratio of correctly predicted user engagement instances to all actual instances of that engagement class. F1-score is the harmonic mean of precision and recall. AUC, or Area Under the Curve, assesses a model’s ability to discriminate between classes. For all of these metrics, higher values indicate better model performance. Key model characteristics are summarized in Appendix B (Table A2) [6].
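For the multi-class setting (low, medium, and high engagement), precision, recall, and F1-score are computed per class from true positives (TP), false positives (FP), and false negatives (FN): precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 = 2 · precision · recall / (precision + recall). The sketch below, with a hypothetical `per_class_metrics` helper and toy labels, also reports accuracy and a macro-averaged F1 (the unweighted mean across classes); macro averaging is an assumption, as the section does not state which averaging was used.

```python
def per_class_metrics(y_true, y_pred, labels):
    """Per-class precision, recall, and F1, plus overall accuracy and macro-averaged F1."""
    report = {}
    pairs = list(zip(y_true, y_pred))
    for c in labels:
        tp = sum(1 for t, p in pairs if t == c and p == c)  # correctly predicted as c
        fp = sum(1 for t, p in pairs if t != c and p == c)  # wrongly predicted as c
        fn = sum(1 for t, p in pairs if t == c and p != c)  # instances of c that were missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[c] = (precision, recall, f1)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    macro_f1 = sum(f1 for _, _, f1 in report.values()) / len(labels)
    return report, accuracy, macro_f1

labels = ["low", "medium", "high"]
y_true = ["low", "low", "low", "medium", "high", "high"]
y_pred = ["low", "low", "medium", "medium", "high", "low"]
report, accuracy, macro_f1 = per_class_metrics(y_true, y_pred, labels)
for c in labels:
    print(c, [round(v, 3) for v in report[c]])
print("accuracy", round(accuracy, 3), "macro F1", round(macro_f1, 3))
```

With an imbalanced class distribution, macro-averaged scores give the minority classes the same weight as the dominant class, which is why they can diverge noticeably from plain accuracy.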

3.7. Insights for Digital Campaign Optimization and Visualization (RO4)

The best-performing classifier was used to derive practical classification rules. The results, presented via box plots, scatterplots, and normality tests, help inform marketing strategies. These insights are designed to support data-driven, actionable decisions in fashion retail social media marketing. The following section presents the analytical results of correlation testing, regression modeling, and classification models.

4. Results

4.1. Descriptive and Correlation Analysis (RO1)

The following results relate directly to the research objectives and hypotheses outlined earlier. The Spearman correlation test uses the “engaged users” variable as the primary focus, indicating the strength and direction of the relationship between each selected variable and “engaged users.” Each row reports the variable name, the correlation coefficient (ρ), and the p-value indicating statistical significance.
The results for “3 s video views from organic posts,” “reach from organic posts,” and “other clicks” showed strong, statistically significant positive correlations with “engaged users,” suggesting that [54]: (1) exposure to short video content can trigger greater user engagement; (2) organic reach plays a role in optimizing engagement, as organic results attract users’ attention and increase their interaction; and (3) user interactions beyond the direct content are predictive of user engagement (Table 7).
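Spearman’s ρ is the Pearson correlation computed on ranks, with tied values given the average of their positions. The plain-Python sketch below shows that calculation; the helper names `average_ranks` and `spearman_rho` and the toy KPI values are hypothetical. It returns only ρ, whereas a library routine such as `scipy.stats.spearmanr` also reports the p-value used to judge significance.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values that starts at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # 0-based positions i..j map to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical per-post values: 3 s video views and engaged users.
views = [5, 12, 3, 20, 8, 12]
engaged = [4, 9, 3, 15, 10, 11]
print(round(spearman_rho(views, engaged), 3))
```

Because only ranks enter the formula, ρ captures monotonic relationships and is less sensitive to the extreme values typical of skewed engagement counts.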
Scatterplots illustrate the relationships between “engaged users” and the independent features. Figure 5 shows that short video views, organic reach, and clicks on other links are each positively associated with user engagement: as these user actions increase, engagement also tends to increase [64,65,69,70,71,72].
Figure 5 presents three scatterplots that visualize the bivariate relationships between each Key Performance Indicator (KPI) and the dependent variable, “Engaged Users.” These visualizations serve to illustrate the correlation strength and direction between independent engagement metrics and actual user interaction levels. Each subplot demonstrates the following relationships:
1. Three-second Video Views vs. Engaged Users
This plot shows a strong positive linear correlation: as the number of 3 s organic video views increases, the number of engaged users also increases. This supports Hypothesis 1 (H1), which states that short video views lead to increased user engagement, and suggests that visual content drives more user interactions.
2. Organic Reach vs. Engaged Users
Although organic reach is positively correlated with user engagement, the points are more widely scattered than for video views. This implies that reach alone does not necessarily lead to engagement, as some posts may be widely viewed yet receive few interactions. These visual results align with the regression results, which show that organic reach had a relatively weaker predictive impact.
3. Other Clicks vs. Engaged Users
This plot also shows a strong positive correlation, supporting Hypothesis 3 (H3): as the number of other clicks increases, the number of engaged users also increases. This reflects user interest beyond the post itself, implying a deeper level of interaction with the brand.
The Spearman correlation results in Table 7 are consistent with all three scatterplots. Together, the visual and statistical evidence suggests that active, user-triggered metrics, such as views and clicks, are stronger indicators of user engagement than passive exposure metrics, such as post reach.

Linear Regression Results

Linear Regression predicts the value of a dependent variable based on one or more independent variables. The Mean Squared Error (MSE) calculates the average of the squares of the errors, representing the average squared difference between the predicted and actual values. Root Mean Squared Error (RMSE), or the square root of MSE, adjusts the error metric to the scale of the original values, making it easier to interpret. The coefficient of determination (R2) indicates the extent to which the independent variables explain the variability of the dependent variable (Table 8) [59,60,61,62,64,65,69,70,71,72].
Table 9 presents the Linear Regression results for predicting “engaged users” from each independent feature, including the regression coefficient for each factor. Each coefficient represents the expected change in “engaged users” associated with a one-standard-deviation increase in that variable, holding all other variables constant. Positive coefficients indicate that higher values of an attribute are associated with a higher user interaction rate.
However, the coefficient for “reach from organic posts” is negative, implying that greater reach without actual user interaction does not necessarily lead to increased engagement. This highlights the difference between passive exposure and active participation. The linear regression model provides decision-making guidelines that can help refine content strategies by identifying which behaviors and content types influence engagement.
In Figure 6, the linear regression bar chart displays the standardized regression coefficients predicting user engagement. Each bar represents an independent variable: “3 s video views from organic posts,” “reach from organic posts,” or “other clicks.” The length and direction of each bar indicate the strength and direction of its effect on “engaged users.”
Positive coefficient values indicate that when an attribute’s value increases, the number of “engaged users” is expected to rise, assuming other variables remain constant. Conversely, negative coefficient values show an inverse relationship, where an increase in the feature’s value is associated with a decrease in user engagement.
Among the “3 s video views from organic posts,” “reach from organic posts,” and “other clicks” features, those with high positive coefficients are the strongest positive predictors of user engagement. This suggests that exposure to short-form video content is associated with more user interactions. Features with smaller positive coefficients are also positively correlated with engagement but are weaker predictors.
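Standardized coefficients of the kind shown in Figure 6 can be obtained by z-scoring every feature and the target and then solving the ordinary least-squares normal equations, which makes each coefficient the change in “engaged users” (in standard deviations) per one-standard-deviation increase in a feature, holding the others fixed. The sketch below is a plain-Python illustration under that assumption; the helpers `zscore`, `solve`, and `standardized_coefficients` and the constructed data (an exact linear relation, so the expected betas are known in advance) are hypothetical and do not reproduce the study’s estimates.

```python
def zscore(values):
    """Center a variable and scale it by its sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / (n - 1)) ** 0.5
    return [(v - mean) / sd for v in values]

def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def standardized_coefficients(features, target):
    """OLS coefficients after z-scoring every feature (one list per feature) and the target.

    All variables have mean zero after z-scoring, so no intercept is needed.
    """
    Z = [zscore(col) for col in features]
    zy = zscore(target)
    XtX = [[sum(a * b for a, b in zip(zi, zj)) for zj in Z] for zi in Z]
    Xty = [sum(a * b for a, b in zip(zi, zy)) for zi in Z]
    return solve(XtX, Xty)  # normal equations: (X^T X) beta = X^T y

# Hypothetical KPI columns and engaged-user counts built from an exact linear relation.
views = [1, 2, 3, 4, 5]
clicks = [2, 1, 4, 3, 5]
engaged = [2 * v - c for v, c in zip(views, clicks)]
print([round(b, 4) for b in standardized_coefficients([views, clicks], engaged)])  # [1.4907, -0.7454]
```

Standardizing puts all KPIs on a common scale, so the bar lengths in a chart like Figure 6 can be compared directly even though raw view and click counts differ in magnitude.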

4.2. Classifier Performance Assessment (RO2, RO3, RO4)

4.2.1. Classification Accuracy

Table 10 and Figure 7 display the performance results of the selected classifiers. They provide a comparative summary of the classification accuracy, precision, recall, and F1-score of RF, XGBoost, KNN, and NB, all shown as percentages. Classification accuracy (ACC) indicates the percentage of correctly classified instances out of all cases. Precision measures the proportion of correct positive predictions among all predicted positives, showing how well each classifier avoids false positives. Recall, or sensitivity, evaluates the proportion of correctly predicted positive cases out of all actual positives. F1-score is the harmonic mean of precision and recall, offering a balanced performance measure when both false positives and false negatives matter.
Among the classifiers, XGBoost performed strongly, with an accuracy nearly equal to that of RF and higher than those of KNN and NB. It achieved the highest scores across all metrics, making it one of the most reliable models for predicting user engagement classes. Although RF, KNN, and NB are computationally efficient, their classification accuracies were lower. Table 10 summarizes the strengths and scores of each model, supporting decision-making in social media strategies (Figure 7) [59,60,61,62,64,65,69,70,71,72].

4.2.2. Confusion Matrices

Figure 8 shows the confusion matrices comparing the classification results of the RF, XGBoost, KNN, and NB models. It displays the number of correct and incorrect predictions for each engagement level: low, medium, and high. The cells along the diagonal represent accurate predictions, while the other cells indicate incorrect ones.
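A confusion matrix of this kind can be tallied in a single pass over paired labels. In the minimal sketch below, rows are actual classes and columns are predicted classes, so diagonal cells count correct predictions; the helper `confusion_matrix` and the six toy labels are hypothetical.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows = actual class, columns = predicted class, in the order given by `labels`."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

labels = ["low", "medium", "high"]
y_true = ["low", "low", "low", "medium", "high", "high"]
y_pred = ["low", "low", "medium", "medium", "high", "low"]
cm = confusion_matrix(y_true, y_pred, labels)
for label, row in zip(labels, cm):
    print(f"{label:>6}: {row}")
# Diagonal cells are correct predictions; everything else is a misclassification.
correct = sum(cm[i][i] for i in range(len(labels)))
print("correct:", correct, "of", len(y_true))
```

Reading down a column shows which actual classes a model tends to confuse with a given prediction, which is how a bias toward the dominant class becomes visible.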
The XGBoost confusion matrix indicates strong overall accuracy, with most low-engagement posts classified correctly. This suggests that XGBoost tends to favor the dominant class, which raises overall accuracy and precision for the low-engagement class but reflects, rather than resolves, the class imbalance.
This tendency creates a liability in predicting the less common engagement classes. NB complements XGBoost by recognizing patterns in the minority classes, although it often produces false-positive predictions across most classes.
Although the RF, KNN, and NB confusion matrices show lower classification accuracy, they also show higher sensitivity to the medium- and high-engagement classes. This yields a better balance in detecting all classes, albeit with a higher error rate. There is thus a trade-off between accuracy and sensitivity, and combining the models can serve different social media strategy goals (e.g., general prediction versus personalized marketing).
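One common way to counter the dominance of the low-engagement class, in line with the class-weighting option listed in Table A2, is to weight each class inversely to its frequency: weight = n_samples / (n_classes × class_count), the same heuristic scikit-learn documents for `class_weight="balanced"`. The sketch below is illustrative only; the study does not report using class weights, and the helper `balanced_class_weights` and the 62/24/14 label counts are hypothetical.

```python
from collections import Counter

def balanced_class_weights(y):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}

# Hypothetical label distribution with a dominant low-engagement class.
y = ["low"] * 62 + ["medium"] * 24 + ["high"] * 14
weights = balanced_class_weights(y)
print({label: round(w, 3) for label, w in weights.items()})
# {'low': 0.538, 'medium': 1.389, 'high': 2.381}
```

With these weights every class contributes the same total weight (n_samples / n_classes) to a weighted loss, so misclassifying a rare high-engagement post costs more than misclassifying a common low-engagement one.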

4.3. Rule-Based Suggestions

The following recommendations can help decision-makers and marketers prioritize KPIs that drive user engagement and enhance organic posts and content strategies for improved performance in SMM campaigns.
Table 11 summarizes a set of recommendations for marketers and decision-makers derived from the linear regression and classification models. Each suggestion emphasizes the importance of KPIs in tracking the number of engaged users on a Facebook business page.
The Linear Regression model has helped generate the following suggestions:
Short video posts, reach from organic posts, and clicks on content (such as clicks on the page name, profile page, people’s names in comments, the like count, or timestamps) are highly effective in increasing user engagement and can readily be interpreted as signs of it.
The XGBoost classifier contributed to the following suggestions:
  • Posts that are similar to successful previous posts tend to generate the same level of user engagement.
  • Social media strategy should emphasize short videos and interactive call-to-action content (links, clickable text) to boost the chances of being classified as highly engaging.
  • XGBoost-based models can predict how a scheduled post will perform once it is published.
  • Reach from organic social media posts does not necessarily lead to increased user engagement, click-through rates, or improved conversion optimization.

5. Discussion

The findings build on the previous engagement research discussed in the Related Works section. Consistent with earlier studies [23,24,26], the current results indicate that content characteristics are key factors in user engagement. Specifically, in fashion social media [26], short videos (such as 3 s clips) boost a post’s reach or impressions, and click-through interactions are among the most influential factors for user engagement on Facebook [23,24]. This consistency with previous research suggests that engagement is driven more strongly by user-triggered interactive content than by reach or impression metrics, especially when targeting young users [26].
Consistent with previous research on social media analytics [12,27,28,29,30,31,32], this study shows that the XGBoost classifier achieved the highest classification accuracy for user engagement levels, demonstrating its efficiency with small, structured datasets. XGBoost outperformed the other classifiers (≈94.73%), suggesting that ensemble tree-based learning performs well in this domain [12,27,28,29,30,31,32]. These findings contrast with results from other contexts (e.g., high-dimensional fraud detection), where NB can outperform KNN, highlighting that the optimal model choice depends heavily on the domain [12,27,28,29,30,31,32]. The NB model, although less accurate, demonstrated higher sensitivity in detecting the minority (medium- and high-engagement) classes, which justifies the caution expressed in previous research about relying solely on broad performance metrics in complex social media cases [30,31,32,33,34].
The current results indicate that specific KPIs, including video views, user reactions, and the type of social media post, have a significant influence on user engagement. XGBoost achieved the highest classification accuracy for user engagement levels, which may reflect its ability to learn from small, labeled datasets [48]. RF also performed well, especially in handling imbalanced classes, consistent with previous research on its sensitivity to feature distributions in probabilistic settings [44,45]. These findings also agree with earlier studies on social media user engagement and classification accuracy using machine learning models [46,47].
The analysis of user engagement revealed that organic reach, 3 s video views, and other clicks have a significant impact on engagement class predictions. The Linear Regression results showed that reach and other clicks possess predictive value, in line with previous studies that emphasize their role in user interaction metrics [21,24].
In comparison to related research in fashion retail and tourism marketing, our results confirm and extend the findings of earlier studies. For example, Jankovic and Curovic (2023) noted that digital consumer engagement can be effectively modeled using simple performance indicators, such as views and clicks, especially when personalized content is involved [42]. Our results complement this by showing how these variables not only relate to engagement but also act as reliable predictors in classification models.
Ensemble models, such as Random Forests and XGBoost, outperformed simpler classifiers (e.g., Naïve Bayes and KNN), in line with earlier work by Kaur and Kumari (2020), who noted that ensemble methods are more robust in social media environments with non-linear user behavior patterns [35]. Likewise, Xia et al. (2024) highlighted the adaptive nature of AI in behavior prediction tasks, supporting our use of ensemble methods to manage seasonal fluctuations in engagement [40]. While earlier works found Naïve Bayes performance acceptable for high-dimensional but independent features [12,33], our results suggest that this independence assumption limits NB’s effectiveness in scenarios involving interdependent KPIs, such as post reach and video views.
Furthermore, our rule extraction for classification, especially from tree-based models, provides a practical link between statistical insight and strategic marketing use, aligning with the work of Magableh et al. (2024), who highlighted data-driven marketing personalization as a key element of sustainable financial performance [41].
This comparison highlights the study’s contribution to social media analytics by combining real-world Facebook retail data, traditional statistical testing, and advanced machine learning. It demonstrates that model interpretability, accuracy, and alignment with user behavior trends are essential for optimizing marketing campaigns in a digital retail environment. The empirical results support Engagement Theory by showing that user engagement increases when content features interactive elements, such as short videos and clickable elements, designed to capture attention and encourage involvement. Similarly, the results are consistent with Social Exchange Theory: users are more likely to interact with content when they receive informative or emotional value in return, highlighting the reciprocal nature of digital engagement.

5.1. Research Limitations

Because Facebook remains the most recognizable platform for e-commerce purchases and B2C engagement, it serves as a good case study for detailed engagement analysis. Although the study offers valuable insights, certain limitations may restrict the generalizability of its results. First, the dataset was limited to a single Facebook business page in the fashion industry, which limits the applicability of the results to other industry sectors or social media platforms. Second, the class distribution was uneven, with many posts showing low engagement; this imbalance reduced the classifiers’ ability to forecast medium- and high-engagement levels accurately. Finally, beyond the KPI analysis, no qualitative factors that could explain increased user interactions were included.

5.2. Practical Implications

These insights give marketers an opportunity to optimize user engagement by publishing engaging videos, call-to-action content, and clickable content. Marketers can also apply data mining models to identify high-performing posts and potential user engagement by integrating organic and paid data insights, then categorize these insights and transform them into actionable strategies (e.g., e-commerce firms can predict user engagement before publishing). Based on these predictions, audience targeting and user personalization can be implemented through real-time analytics. Regression rules can also aid campaign planning. Businesses can leverage the current findings for data-based decision-making in content creation, post scheduling, audience segmentation, and post performance optimization. Focusing on KPIs such as video views and clicks would enable marketers to create optimized content that increases user engagement. Predictive analytics models provide a framework for evaluating the performance of posts before publication, leading to more effective budget management, optimized marketing strategies, and personalized user experiences. Ultimately, these data insights support a shift from a reactive approach to revenue toward more proactive SMM tactics.

5.3. Future Research

Future work will expand the current methodology by including more businesses, enlarging the dataset, and applying the same approach to different social media platforms. Additionally, more data mining classifiers will be included in the performance assessments, drawing on both quantitative and qualitative data, including textual and emotional information. Combining quantitative and qualitative research, utilizing engagement metrics alongside content analysis, can lead to further optimization of SMM [73]. While the current study provides strong quantitative insights into user engagement using KPIs and predictive analytics, it lacks qualitative dimensions such as users’ motivations, emotional responses, and interpretive behaviors. These aspects are essential for a more holistic understanding of user engagement but fall outside the scope of the data used. Future research could combine both methodological approaches with sentiment analysis techniques to examine how emotional or psychological factors influence user engagement, enabling a more nuanced and contextual interpretation of user behavior.

5.4. Ethical Considerations and Trends

The circular economy policy recommends that user engagement KPIs be viewed as metrics aligned with sustainable digital behavior. Circular e-commerce actions encompass user recommerce, low-waste logistics, and extended digital product lifecycles, which in turn influence user engagement strategies and ethical concerns related to AI and marketing [73].

6. Conclusions

This study examined how specific key performance indicators (KPIs) influence user engagement on a Facebook business page, using real-world data from a fashion retail brand operating in tourist locations. By applying both regression and supervised machine learning models, including linear regression, Random Forest (RF), Extreme Gradient Boosting (XGBoost), K-nearest neighbors (KNN), and Naïve Bayes (NB), the authors assessed the predictive value of three performance metrics: 3 s video views, organic reach, and other clicks.
The results showed that short video views and other clicks are the most significant predictors of user engagement, in line with previous findings on user interaction metrics in social media marketing [23,26,40]. The XGBoost model achieved the highest classification accuracy (~94.73%), confirming its suitability for small, labeled datasets with user interaction characteristics [12,27,28,29,30,31,32]. These findings support the research hypotheses (H1–H3) and are consistent with Engagement Theory and Social Exchange Theory, which hold that relevance and user engagement actions are key drivers of digital engagement [19,20,21].
The study also supports previous insights that exposure metrics such as reach alone may not predict engagement, as they often lack evidence of behavioral intention [24]. Linear regression analysis revealed strong predictive power (R2 ≈ 0.98) for the model using all three KPIs, highlighting their significance in marketing optimization.
From a practical perspective, the insights provide data-driven recommendations for marketers and decision-makers, including increasing the use of short-form video content, incorporating interactive content elements, and refining audience targeting strategies. These suggestions are particularly relevant in tourism retail, where seasonality, impulsive purchasing, and visual appeal significantly influence user behavior [42,48]. Rule-based suggestions derived from the machine learning models can also help predict post performance before publication, enabling businesses to anticipate campaign outcomes and allocate resources dynamically.
However, the study’s scope is limited to a single fashion retailer’s Facebook data, which may affect the generalizability of the results. Consumer behavior in other industries, on other platforms (e.g., Instagram, TikTok), or in other regions may exhibit different patterns. Thus, future research should incorporate diverse datasets from multiple sectors and geographical contexts to test and extend the generalizability of the findings. Looking ahead, future studies could explore cross-platform user behavior, temporal dynamics of engagement, or the ethical implications of AI-driven personalization in tourism and retail. Expanding the modeling framework to include explainable AI (XAI) techniques may also enhance transparency in decision-making processes for both marketers and consumers [40,74].
Integrating behavioral and emotional aspects will help create a more comprehensive understanding of user engagement in AI-based marketing [41,47]. In the broader context of tourism marketing, these findings underscore how AI-based engagement models can enhance customer retention, personalization, and sustainability. As the tourism and retail sectors become increasingly digital, predictive analytics will play a crucial role in shaping data-driven strategies aligned with consumer behavior and experience optimization.
In conclusion, this study contributes further insights to the literature on predictive analytics in social media marketing, focusing on KPIs and demonstrating how machine learning models can predict user engagement. As e-commerce continues to grow, a data-driven approach grounded in precise data and ethical insights will enhance user experiences and satisfaction while strengthening brand–consumer interactions.

Author Contributions

Conceptualization, D.C.G. and P.K.T.; methodology, D.C.G.; software, D.C.G.; validation, D.C.G.; formal analysis, D.C.G.; investigation, D.C.G.; resources, D.C.G.; data curation, D.C.G.; visualization, D.C.G.; writing—original draft, D.C.G.; writing—review and editing, D.C.G. and P.K.T.; supervision, D.C.G. and P.K.T.; project administration, P.K.T.; funding acquisition, P.K.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research is funded by the Special Account of Research Funds (ELKE), Hellenic Open University, Greece. Grant number MIS 80443.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to copyright restrictions imposed by the private company that owns them.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ACC: Classification accuracy
AI: Artificial intelligence
AUC: Area under the curve
E-commerce: Electronic commerce
KNN: K-nearest neighbors
KPI: Key performance indicator
MSE: Mean squared error
NB: Naïve Bayes
p: p-value (statistical significance)
R2: R-squared
RF: Random Forest
RMSE: Root mean squared error
SMM: Social media marketing
ρ: Spearman rank correlation coefficient

Appendix A

Table A1. Descriptive statistics.
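| KPI | Count | Mean | Std Dev. | Min | Max | Missing Values |
|---|---|---|---|---|---|---|
| 3 s video views from Organic posts | 2500 | 4.00 | 6.37 | 0 | 165 | 0 |
| Reach from Organic posts | 2500 | 3.97 | 6.34 | 0 | 164 | 0 |
| Other Clicks | 2500 | 4.01 | 6.33 | 0 | 164 | 0 |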

Appendix B

Table A2. Model characteristics.
| | RF | XGBoost | KNN | NB |
|---|---|---|---|---|
| Type | Ensemble (Bagging) | Ensemble (Boosting) | Instance-based/Non-parametric | Probabilistic |
| Accuracy | High | Very High | Medium to High | Fast, less accurate on complex data |
| Assumptions | None (non-parametric) | None (tree-based) | None about data distribution | Strong feature independence |
| Training Time | Moderate | High | No training (lazy learner) | Very fast |
| Prediction Time | Fast | Moderate | Slower on large datasets | Very fast |
| Noise Sensitivity | Low (due to averaging) | Medium (can overfit without tuning) | High | Moderate |
| Interpretability | Moderate (feature importance) | Moderate (via SHAP or feature importance) | Low | Moderate |
| Handling Unbalanced Data | Good (with class weights or sampling) | Excellent (custom loss functions, weights) | Poor without tuning | Poor |
| Best Use Cases | Structured/tabular data, non-linear relationships | Competitions, imbalanced or complex data, ranking | Small/Medium-sized data, good class segmentation | Text classification, spam detection |

References

  1. Pancer, E.; Chandler, V.; Poole, M.; Noseworthy, T.J. How readability shapes social media engagement. J. Consum. Psychol. 2019, 29, 262–270. [Google Scholar] [CrossRef]
  2. Bhagat, R.; Chauhan, P.; Bhagat, N. Influence of AI on Consumer Purchase Intentions through Personalized Recommendations. J. Retail. Consum. Serv. 2022, 64, 102746. [Google Scholar] [CrossRef]
  3. Sharma, P. Analyzing the Role of Artificial Intelligence in Predicting Customer Behavior and Personalizing the Shopping Experience in E-commerce. Int. J. Sci. Res. Eng. Manag. 2023, 144, 256–270. [Google Scholar] [CrossRef]
  4. Xiong, L. The Impact of Artificial Intelligence and the Digital Economy on Consumer Online Shopping Behavior and Market Changes. Discret. Dyn. Nat. Soc. 2022, 130, 107197. [Google Scholar] [CrossRef]
  5. Mussa, M.H. The Impact of Artificial Intelligence on Consumer Behaviors: An Applied Study on the Online Retailing Sector in Egypt. J. Shopp. Econ. 2020, 10, 128–145. [Google Scholar] [CrossRef]
  6. Ecommercedb. Global eCommerce Market 2024: Market Growth, Top Players & Online Share. 2024. Available online: https://ecommercedb.com/insights/global-ecommerce-market-2024-size-market-growth-online-share/4784 (accessed on 6 August 2025).
  7. Gironda, J.T.; Korgaonkar, P.K. Understanding Consumers’ Social Networking Site Usage. J. Mark. Manag. 2014, 30, 571–605. [Google Scholar] [CrossRef]
  8. Suraña-Sánchez, C.; Aramendia-Muneta, M.E. Impact of Artificial Intelligence on Customer Engagement and Advertising Engagement: A Review and Future Research Agenda. Int. J. Consum. Stud. 2024, 48, e13027. [Google Scholar] [CrossRef]
  9. Rosenberg, D. Interactions, Technology, and Organizational Change. Emergence 2000, 2, 68–77. [Google Scholar] [CrossRef]
  10. Bonilla-Quijada, M.; Olmo-Arriaga, J.L.D.; Adreu Domingo, D.; Ripoll-i-Alcon, J. Fast Fashion Consumer Engagement on Instagram: A Case Study. Cogent Bus. Manag. 2024, 11, 2322111. [Google Scholar] [CrossRef]
  11. Godey, B.; Manthiou, A.; Pederzoli, D.; Rokka, J.; Aiello, G.; Donvito, R.; Singh, R. Social Media Marketing Efforts of Luxury Brands: Influence on Brand Equity and Consumer Behavior. J. Bus. Res. 2016, 69, 5833–5841. [Google Scholar] [CrossRef]
  12. Gartner Inc. Market Guide for Web, Product, and Digital Experience Analytics; Gartner Inc.: Stamford, CT, USA, 2024; Available online: https://www.gartner.com (accessed on 3 March 2025).
  13. Tower Marketing. User Engagement Metrics. Available online: https://www.towermarketing.net/blog/user-engagement-metrics/ (accessed on 28 January 2025).
  14. Statcounter Global Stats. Social Media Stats Worldwide. Available online: https://gs.statcounter.com/social-media-stats#monthly-200903-202503-bar (accessed on 13 February 2025).
  15. Gensler, S.; Völckner, F.; Liu-Thompkins, Y.; Wiertz, C. Managing brands in the social media environment. J. Interact. Mark. 2013, 27, 242–256.
  16. Ho, C.-W.; Wang, Y.-B. Does Social Media Marketing and Brand Community Play the Role in Building a Sustainable Digital Business Strategy? Sustainability 2020, 12, 6417.
  17. Panigrahi, R.; Borah, S. Classification and analysis of Facebook metrics dataset using supervised classifiers. Soc. Netw. Anal. Comput. Res. Methods Tech. 2019, 1, 1–19.
  18. Chokrasamesiri, P.; Senivongse, T. User Engagement Analytics Based on Web Contents. Comput. Inf. Sci. 2016, 656, 73–87.
  19. Martín-Consuegra, D.; Díaz, E.; Gómez-Carmona, D. Consumer Trust in Smart Speaker Advertising: The Role of Perceived Informativeness, Perceived Intrusiveness, and Privacy Concerns. Span. J. Mark.-ESIC 2020, 24, 1–20.
  20. Hossain, M.T.; Akter, S.; Kattiyapornpong, U.; Dwivedi, Y.K. Reconciling the Tension between Trust and Distrust in Sharing Economy Platforms: An Ambidextrous Design Perspective. Pac. Asia J. Assoc. Inf. Syst. 2020, 12, 1–36.
  21. Balyemah, A.J.; Weamie, S.J.Y.; Bin, J.; Jarnda, K.V.; Joshua, F.J. Predicting Purchasing Behavior on E-Commerce Platforms: A Regression Model Approach for Understanding User Features that Lead to Purchasing. Int. J. Commun. Netw. Syst. Sci. 2024, 17, 81–103.
  22. Kim, J.; Kim, M. Rise of Social Media Influencers as a New Marketing Channel: Focusing on the Roles of Psychological Well-Being and Perceived Social Responsibility among Consumers. Int. J. Environ. Res. Public Health 2022, 19, 2362.
  23. Leeflang, P.S.H.; Verhoef, P.C.; Dahlström, P.; Freundt, T. Challenges and solutions for marketing in a digital era. Eur. Manag. J. 2014, 32, 1–12.
  24. De Bruyn, A.; Lilien, G.L. A Multi-Stage Model of Word-of-Mouth Influence through Viral Marketing. Int. J. Res. Mark. 2008, 25, 151–163.
  25. Schultz, D.E.; Peltier, J. Social Media’s Slippery Slope: Challenges, Opportunities and Future Research Directions. J. Res. Interact. Mark. 2013, 7, 86–99.
  26. Fournier, S.; Avery, J. The Uninvited Brand. Bus. Horiz. 2011, 54, 193–207.
  27. Cvijikj, P.I.; Spiegler, E.D.; Michahelles, F. The effect of post type, category and posting day on user interaction level on Facebook. In Proceedings of the IEEE 3rd International Conference on Privacy, Security, Risk and Trust and IEEE 3rd International Conference on Social Computing, Boston, MA, USA, 9–11 October 2011; pp. 810–813.
  28. Cvijikj, P.I.; Michahelles, F. Online engagement factors on Facebook brand pages. Soc. Netw. Anal. Min. 2013, 3, 843–861.
  29. Chowdhury, S.; Faruque, M.; Sharmin, S.; Talukder, T.; Mahmud, M.; Dastagir, G.; Akter, S. The Impact of Social Media Marketing on Consumer Behavior: A Study of the Fashion Retail Industry. Open J. Bus. Manag. 2024, 12, 1666–1699.
  30. Ozturk Kiyak, E.; Ghasemkhani, B.; Birant, D. High-Level K-Nearest Neighbors (HLKNN): A Supervised Machine Learning Model for Classification Analysis. Electronics 2023, 12, 3828.
  31. Ghouchan Nezhad Noor Nia, R.; Jalali, M.; Houshmand, M. A Graph-Based k-Nearest Neighbor (KNN) Approach for Predicting Phases in High-Entropy Alloys. Appl. Sci. 2022, 12, 8021.
  32. Cosenza, D.N.; Korhonen, L.; Maltamo, M.; Packalen, P.; Strunk, J.L.; Næsset, E.; Gobakken, T.; Soares, P.; Tomé, M. Comparison of Linear Regression, k-Nearest Neighbour and Random Forest Methods in Airborne Laser-Scanning-Based Prediction of Growing Stock. For. Int. J. For. Res. 2021, 94, 311–323.
  33. Itoo, F.; Meenakshi; Singh, S. Comparison and Analysis of Logistic Regression, Naïve Bayes and KNN Machine Learning Algorithms for Credit Card Fraud Detection. Int. J. Inf. Technol. 2021, 13, 1503–1511.
  34. Powers, D.M.W. Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness and Correlation. Int. J. Mach. Learn. Technol. 2011, 2, 37–63.
  35. Chibudike, C.E.; Abdu, H.; Ngige, O.C.; Adeyoju, O.A.; Chibudike, H.O.; Obi, N.I. Machine Learning—A New Trend in Web User Behavior Analysis. Int. J. Comput. Appl. 2021, 183, 5.
  36. Barbaro, E.; Grua, E.M.; Malavolta, I.; Stercevic, M.; Weusthof, E.; van den Hoven, J. Modelling and Predicting User Engagement in Mobile Applications. Data Sci. 2020, 3, 61–77.
  37. Son, H.; Park, Y.E. Predicting user engagement with textual, visual, and social media features for online travel agencies’ Instagram posts: Evidence from machine learning. Curr. Issues Tour. 2024, 27, 3608–3622.
  38. Hootsuite. Facebook Metrics. Available online: https://help.hootsuite.com/hc/en-us/articles/5679252670107-Facebook-metrics (accessed on 25 March 2025).
  39. Naprawski, T. The Impact of Web Analytics Tools on Knowledge Management. Procedia Comput. Sci. 2023, 225, 3404–3414.
  40. Xia, Y.; Shin, S.-Y.; Lee, H.-A. Adaptive Learning in AI Agents for the Metaverse: The ALMAA Framework. Appl. Sci. 2024, 14, 11410.
  41. Magableh, I.K.; Mahrouq, M.H.; Ta’Amnha, M.A.; Riyadh, H.A. The Role of Marketing Artificial Intelligence in Enhancing Sustainable Financial Performance of Medium-Sized Enterprises Through Customer Engagement and Data-Driven Decision-Making. Sustainability 2024, 16, 11279.
  42. Jankovic, S.D.; Curovic, D.M. Strategic Integration of Artificial Intelligence for Sustainable Businesses: Implications for Data Management and Human User Engagement in the Digital Era. Sustainability 2023, 15, 15208.
  43. Lalmas, M.; O’Brien, H.; Yom-Tov, E. Enhancing the Rigor of User Engagement Methods and Measures. In Measuring User Engagement, Synthesis Lectures on Information Concepts, Retrieval, and Services; Springer: Cham, Switzerland, 2015.
  44. Ahmed, S.M.M.; Owais, M.; Raza, M.; Nadeem, Q.; Ahmed, B. The Impact of AI-Driven Personalization on Consumer Engagement and Brand Loyalty. Qlantic J. Soc. Sci. 2025, 6, 311–323.
  45. Dai, X.; Liu, Q. Impact of Artificial Intelligence on Consumer Buying Behaviors: Study about the Online Retail Purchase. J. Infrastruct. Policy Dev. 2024, 8, 7700.
  46. Jain, R.; Khurana, A. Enhancing Consumer Experience with AI-Powered Chatbots in Online Retail. J. Bus. Res. 2022, 138, 378–389.
  47. Jain, S.; Gandhi, A.V. Impact of Artificial Intelligence on Impulse Buying Behaviour of Indian Shoppers in Fashion Retail Outlets. Int. J. Innov. Sci. 2021, 13, 193–204.
  48. Raji, M.A.; Olodo, H.B.; Oke, T.T.; Addy, W.A.; Ofodile, O.C.; Oyewole, A.T. E-commerce and Consumer Behavior: A Review of AI-Powered Personalization and Market Trends. GSC Adv. Res. Rev. 2024, 18, 66–77.
  49. Ruby, R.; Gokulakrishnan, K.; Sasikumar, D. AI Technologies and Online Impulse Buying Behavior: A Bibliometric Analysis. J. Retail. Consum. Serv. 2023, 67, 103006.
  50. Tiutiu, M.; Dabija, D.C.; Pantea, M.C.; Felea, M. Artificial Intelligence Implications in Retail in the New Normal: A Qualitative Approach. In Proceedings of the 9th BASIQ International Conference on New Trends in Sustainable Business and Consumption, Constanța, Romania, 8–10 June 2023; Pamfilie, R., Dinu, V., Vasiliu, C., Pleșea, D., Tăchiciu, L., Eds.; ASE: Bucharest, Romania, 2023; pp. 547–554.
  51. Wen, H.; Zhang, L.; Sheng, A.; Li, M.; Guo, B. From “Human-to-Human” to “Human-to-Non-human”—Influence Factors of Artificial Intelligence-Enabled Consumer Value Co-creation Behavior. Front. Psychol. 2022, 13, 863313.
  52. Muhamedyev, R.; Yakunin, K.; Iskakov, S.; Sainova, S.; Abdilmanova, A.; Kuchin, Y. Comparative analysis of classification algorithms. In Proceedings of the 2015 9th International Conference on Application of Information and Communication Technologies (AICT), Rostov on Don, Russia, 14–16 October 2015; pp. 96–101.
  53. Asif, I.H. Machine Learning Decision Tree Visualization. Medium. Available online: https://miro.medium.com/v2/resize:fit:640/format:webp/1*vZtP98UkBRjxzrvqywJZuw.png (accessed on 28 January 2025).
  54. Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning; Springer: Berlin/Heidelberg, Germany, 2009.
  55. Mitchell, T.M. Machine Learning; McGraw-Hill: New York, NY, USA, 1997.
  56. Quinlan, J.R. Induction of Decision Trees. Mach. Learn. 1986, 1, 81–106.
  57. Russell, S.; Norvig, P. Artificial Intelligence: A Modern Approach, 3rd ed.; Prentice Hall: Hoboken, NJ, USA, 2003.
  58. Witten, I.H.; Frank, E.; Hall, M.A. Data Mining: Practical Machine Learning Tools and Techniques, 3rd ed.; Morgan Kaufmann Publishers: Burlington, MA, USA, 2011.
  59. Frank, E.; Hall, M.A.; Witten, I.H. The WEKA Workbench. Online Appendix for “Data Mining: Practical Machine Learning Tools and Techniques”, 4th ed.; Morgan Kaufmann: Burlington, MA, USA, 2016.
  60. Matplotlib. A Plotting Library for Python and Its Numerical Mathematics Extension, NumPy. It Provides an Object-Oriented API for Embedding Plots into Applications. 2023. Available online: https://matplotlib.org/stable/users/index.html (accessed on 28 January 2025).
  61. NumPy. A Library for the Python Programming Language, Adding Support for Large, Multi-Dimensional Arrays and Matrices, along with Mathematical Functions to Operate on These Arrays. 2023. Available online: https://numpy.org/doc/stable/ (accessed on 28 January 2025).
  62. Seaborn. A Data Visualization Library Based on Matplotlib, Providing a Higher-Level Interface for Drawing Attractive and Informative Statistical Graphics. 2023. Available online: https://seaborn.pydata.org/ (accessed on 28 January 2025).
  63. Gkikas, M.C.; Gkikas, D.C.; Vonitsanos, G.; Theodorou, J.A.; Sioutas, S. Application of Machine Learning for Predictive Analysis and Management of Mediterranean-Farmed Fish Mortalities: A Risk Management Case Study Using Apache Spark. Appl. Sci. 2024, 14, 10112.
  64. BYJU’S. Euclidean Distance Formula—Derivation and Examples. Available online: https://byjus.com/maths/euclidean-distance/ (accessed on 28 January 2025).
  65. Im, S.-K.; Chan, K.-H. Vector Quantization Using k-Means Clustering Neural Network. Electron. Lett. 2023, 59, e12758.
  66. DataCamp. Naive Bayes Classifier in Python with Scikit-Learn. DataCamp Tutorials. Available online: https://www.datacamp.com/tutorial/naive-bayes-scikit-learn (accessed on 28 January 2025).
  67. Chan, K.-H.; Im, S.-K. Sentiment Analysis by Using Naïve-Bayes Classifier with Stacked CARU. Electron. Lett. 2022, 58, 411–413.
  68. Javatpoint. K-Nearest Neighbor Algorithm for Machine Learning. Available online: https://www.slideshare.net/MonikaSingh60 (accessed on 28 January 2025).
  69. Mussabayev, R. Optimizing Euclidean Distance Computation. Mathematics 2024, 12, 3787.
  70. Dorta-González, P. A Multiple Linear Regression Analysis to Measure the Journal Contribution to the Social Attention of Research. Axioms 2023, 12, 337.
  71. Popescu, P.S.; Mihaescu, M.C.; Popescu, E.; Mocanu, M. Using Ranking and Multiple Linear Regression to Explore the Impact of Social Media Engagement on Student Performance. In Proceedings of the 2016 IEEE 16th International Conference on Advanced Learning Technologies (ICALT), Austin, TX, USA, 25–28 July 2016; pp. 250–254.
  72. Guyon, I.; Elisseeff, A. An Introduction to Variable and Feature Selection. J. Mach. Learn. Res. 2003, 3, 1157–1182.
  73. Ecommerce Europe. Closing the Loop on the Circular Economy: Strategy Paper. 2025. Available online: https://ecommerce-europe.eu (accessed on 13 May 2025).
  74. Coleman, B. The Ultimate Guide to Customer Engagement in 2024. HubSpot Blog. 21 October 2021. Available online: https://blog.hubspot.com/service/customer-engagement-guide (accessed on 18 January 2025).
Figure 1. Global e-commerce growth rate compared to the total retail market (2018–2028). Online retail outpaces general retail, reflecting the growing dominance of digital commerce in global shopping behavior. Source: Data adapted from https://ecommercedb.com/insights/global-ecommerce-market-2024-size-market-growth-online-share/4784 (accessed on 6 August 2025).
Figure 2. Data analysis operations flowchart. Numbers denote each process in the flowchart.
Figure 3. Classification procedure flowchart.
Figure 4. Organic posts classification box plots.
Figure 5. Scatterplots of pairwise associations between KPIs.
Figure 6. Linear regression coefficient diagram.
Figure 7. Classifiers’ overall performance assessment.
Figure 8. Classifiers’ performance confusion matrices.
Table 1. Research objectives (ROs).
| RO number | Research objective (RO) |
|---|---|
| 1 | To identify the KPIs that significantly affect user engagement levels on social media platforms. |
| 2 | To classify user engagement into distinct classes based on observed behavioral patterns and interaction metrics extracted from the Facebook business page. |
| 3 | To assess and compare the performance of Random Forests (RF), Extreme Gradient Boosting (XGBoost), K-Nearest Neighbors (KNN), and Naïve Bayes (NB) for predicting user engagement classes according to KPIs. |
| 4 | To provide insights for marketers and decision-makers on how to optimize the efficiency of digital campaigns through user segmentation, targeting, and content strategy. |
Table 2. Hypotheses.
| Hypothesis | Statement | Justification |
|---|---|---|
| H1 | There is a statistically significant link between 3 s video views from organic posts and user engagement. | Supported by Engagement Theory, which connects user interactions to content visibility and relevance. |
| H2 | There is a statistically significant connection between reach from organic posts and user engagement. | Based on Digital Consumer Behavior Models, which analyze how post-exposure factors influence engagement likelihood. |
| H3 | There is a statistically significant link between other clicks and user engagement. | Based on Social Exchange Theory, which views clicks as interactions between users and brands. |
| H4 | Classification models (KNN, NB, RF, XGBoost) can accurately predict the user engagement class based on KPIs. | Evaluated empirically through predictive modeling with labeled social media engagement data. |
Table 3. User engagement parameters.
| Variable | Type | Description |
|---|---|---|
| 3 s video views from organic posts | Video | Number of organically generated video views lasting at least 3 s |
| Reach from organic posts | Reach | Number of unique users who saw the post without paid promotion |
| Other clicks | Click | Number of clicks on the Page name, profile page, people’s names in comments, the like count, or the timestamp |
Table 4. Normality test.
| Metric | Test statistic | p-value |
|---|---|---|
| 3 s video views from organic posts | 0.153 | p < 0.001 |
| Reach from organic posts | 0.152 | p < 0.001 |
| Other clicks | 0.156 | p < 0.001 |
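Table 4 does not name the normality test used. Purely as an illustration, the sketch below assumes a Kolmogorov–Smirnov-type statistic D: the largest vertical gap between a metric's empirical CDF and a normal CDF fitted with the sample mean and standard deviation. The function names are hypothetical, and the sketch computes only D; turning D into a p-value when the normal parameters are estimated from the same data requires Lilliefors-corrected critical values, which are not reproduced here.

```python
import math


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """CDF of a normal distribution with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_normal_statistic(values: list[float]) -> float:
    """Kolmogorov-Smirnov D against a normal fitted to the sample.

    Assumption: the test in Table 4 is KS-type; this is an illustration only.
    Returns D in [0, 1]; no p-value is computed.
    """
    xs = sorted(values)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    mu = sum(xs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / (n - 1))  # sample SD
    if sigma == 0:
        raise ValueError("constant sample: normal fit is undefined")
    d = 0.0
    for i, x in enumerate(xs):
        f = normal_cdf(x, mu, sigma)
        # The ECDF jumps at x: compare the fitted CDF with the ECDF value
        # just after the jump, (i + 1) / n, and just before it, i / n.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


# A heavily right-skewed sample (many zeros, a few large counts) yields a large D.
skewed = [0, 0, 0, 0, 0, 0, 0, 0, 100, 200]
print(round(ks_normal_statistic(skewed), 3))
```

Count-type KPIs such as video views are typically zero-heavy and right-skewed, which is the kind of shape that pushes D up and is consistent with the non-parametric Spearman analysis used later.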
Table 5. Cronbach’s Alpha coefficient.
| Metric | Cronbach’s alpha |
|---|---|
| 4 user engagement KPIs | 0.99 |
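Cronbach’s alpha in Table 5 follows the standard formula α = k/(k − 1) · (1 − Σσᵢ²/σ_T²), where k is the number of items (here, the KPIs), σᵢ² is the variance of item i, and σ_T² is the variance of the per-observation totals. A minimal sketch, assuming each KPI is supplied as an equal-length list of values over the same posts (the input layout and function name are our assumptions, not the authors' code):

```python
from statistics import variance


def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha for k items, each a list of scores over the same n observations."""
    k = len(items)
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    n = len(items[0])
    if any(len(item) != n for item in items):
        raise ValueError("all items must have the same number of observations")
    # Total score per observation: the sum across all items.
    totals = [sum(item[j] for item in items) for j in range(n)]
    # Sample variances (n - 1 denominator) for both items and totals, so the ratio is consistent.
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Two partially agreeing items.
print(cronbach_alpha([[1, 2, 3, 4], [2, 1, 4, 3]]))
```

Identical items give α = 1, so a value of 0.99 indicates that the KPIs move almost in lockstep across posts.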
Table 6. Engagement classification summary.
| Class | Engagement level | Engaged users (range) | Sessions (count) |
|---|---|---|---|
| 1 | Low engagement | 0–1 | 1506 |
| 2 | Medium engagement | 2–10 | 678 |
| 3 | High engagement | ≥11 | 316 |
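The class boundaries in Table 6 amount to a simple binning rule. The sketch below assumes the thresholds apply to an integer count of engaged users per record; `engagement_class`, `class_counts`, and `CLASS_LABELS` are hypothetical helpers for illustration, not the authors' code.

```python
from collections import Counter

# Class boundaries taken from Table 6: 0-1 low, 2-10 medium, >=11 high (engaged users).
CLASS_LABELS = {1: "Low engagement", 2: "Medium engagement", 3: "High engagement"}


def engagement_class(engaged_users: int) -> int:
    """Map an engaged-user count to its Table 6 engagement class (1, 2, or 3)."""
    if engaged_users < 0:
        raise ValueError("engaged-user counts cannot be negative")
    if engaged_users <= 1:
        return 1
    if engaged_users <= 10:
        return 2
    return 3


def class_counts(counts: list[int]) -> dict[int, int]:
    """Number of records falling into each engagement class."""
    return dict(Counter(engagement_class(c) for c in counts))


print(CLASS_LABELS[engagement_class(7)])
print(class_counts([0, 1, 2, 10, 11, 50]))
```

Applied to the full dataset, a tally of this kind should reproduce the session counts in the table (1506 + 678 + 316 = 2500 records in total).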
Table 7. Correlation analysis.
| Variable pair | Spearman correlation (ρ) | p-value (sig. 2-tailed) | Significant (p < 0.05) |
|---|---|---|---|
| 3 s video views from organic posts / Engaged users | 0.864254541 | p < 0.001 | Yes |
| Reach from organic posts / Engaged users | 0.867279652 | p < 0.001 | Yes |
| Other clicks / Engaged users | 0.861850581 | p < 0.001 | Yes |
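Spearman’s ρ in Table 7 is the Pearson correlation computed on ranks, with tied values assigned their average rank. That makes it suitable for the skewed, non-normal KPIs reported in Table 4. The dependency-free sketch below illustrates the computation; the helper names are hypothetical, and libraries such as SciPy provide equivalent, tested routines.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank of the tie block
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# A monotonic but non-linear relation still yields rho = 1.
print(spearman_rho([1, 2, 3, 4], [1, 4, 9, 16]))
```

Because only ranks enter the calculation, ρ captures monotonic association without assuming linearity or normality.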
Table 8. Linear regression performance assessment.
| Model | MSE ¹ | RMSE ² | R² ³ |
|---|---|---|---|
| 3 s video views from organic posts | 0.884 | 0.940 | 0.979 |
| Reach from organic posts | 0.857 | 0.926 | 0.980 |
| Other clicks | 0.862 | 0.929 | 0.980 |
¹ MSE: the average squared difference between predicted and actual values; lower values indicate better model performance. ² RMSE: the square root of MSE, expressing the error in the same units as the original variable; lower values indicate better accuracy. ³ R²: the proportion of variance in the target variable explained by the model; values closer to 1 indicate a better fit.
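The three metrics in Table 8 follow directly from their definitions in the footnotes. A minimal sketch, assuming actual and predicted engaged-user values are available as paired lists (the function name is hypothetical):

```python
import math


def regression_metrics(y_true: list[float], y_pred: list[float]) -> tuple[float, float, float]:
    """Return (MSE, RMSE, R^2) for paired actual and predicted values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)  # total sum of squares
    mse = ss_res / n
    rmse = math.sqrt(mse)  # same units as the target variable
    r2 = 1 - ss_res / ss_tot  # share of variance explained
    return mse, rmse, r2


print(regression_metrics([1, 2, 3, 4], [1, 2, 3, 5]))
```

As a consistency check on the table, the square root of 0.884 is approximately 0.940, matching the RMSE reported for the first model.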
Table 9. Linear regression coefficients and their interpretation.
| Feature | Coefficient | Interpretation |
|---|---|---|
| 3 s video views from organic posts | ~0.99 | Each additional 3 s video view is associated with ~0.99 more engaged users, a nearly 1:1 ratio. |
| Reach from organic posts | ~1.00 | Each additional unit of organic reach corresponds to approximately one more engaged user. |
| Other clicks | ~1.00 | Each additional click corresponds to nearly one more engaged user, indicating a strong positive association. |
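Table 8 lists one regression model per KPI, which suggests that engaged users were regressed on each KPI separately. Assuming that reading, and ordinary least squares with an intercept, each coefficient in Table 9 is the slope of a one-feature fit. The sketch below uses the closed-form OLS estimates; the function name is hypothetical, and any feature scaling the authors may have applied is unknown and not reproduced.

```python
def fit_simple_ols(x: list[float], y: list[float]) -> tuple[float, float]:
    """Closed-form OLS for y = intercept + slope * x; returns (slope, intercept)."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x has no variance; the slope is undefined")
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx  # change in y per one-unit change in x
    intercept = mean_y - slope * mean_x
    return slope, intercept


print(fit_simple_ols([0, 1, 2, 3], [1, 3, 5, 7]))
```

A slope near 1, as reported in Table 9, means that along the fitted line a one-unit increase in the KPI is associated with roughly one additional engaged user.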
Table 10. Classifiers’ performance assessment.
| Classifier | Accuracy ¹ (%) | Precision ² (%) | Recall ³ (%) | F1-score ⁴ (%) |
|---|---|---|---|---|
| Random Forest | 94.33 | 94.25 | 94.03 | 94.05 |
| XGBoost | 94.73 | 94.71 | 94.62 | 94.66 |
| KNN | 90.93 | 90.76 | 90.47 | 90.52 |
| Naïve Bayes | 77.87 | 77.62 | 77.87 | 77.03 |
¹ Accuracy: the proportion of all predictions that are correct. ² Precision: the ratio of correctly predicted instances of a class to all instances predicted as that class. ³ Recall: the ratio of correctly predicted instances of a class to all actual instances of that class. ⁴ F1-score: the harmonic mean of precision and recall.
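All four metrics in Table 10 can be derived from a confusion matrix such as those in Figure 8. The text does not state how per-class precision, recall, and F1 were averaged across the three engagement classes; the sketch below assumes macro averaging (an unweighted mean over classes), and `classification_metrics` is a hypothetical helper rather than the authors' code.

```python
def classification_metrics(cm: list[list[int]]) -> dict[str, float]:
    """Accuracy plus macro-averaged precision, recall and F1 from a square confusion matrix.

    cm[i][j] counts records whose true class is i and predicted class is j.
    Macro averaging is an assumption; the source does not state the scheme.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(k))
    precisions, recalls, f1s = [], [], []
    for c in range(k):
        tp = cm[c][c]
        predicted_c = sum(cm[r][c] for r in range(k))  # column sum: predicted as c
        actual_c = sum(cm[c])  # row sum: truly c
        p = tp / predicted_c if predicted_c else 0.0
        r = tp / actual_c if actual_c else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0  # harmonic mean of p and r
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)
    return {
        "accuracy": correct / total,
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
        "f1": sum(f1s) / k,
    }


print(classification_metrics([[8, 2], [1, 9]]))
```

Under weighted averaging (per-class values weighted by class support), weighted recall equals accuracy by construction, so comparing the Accuracy and Recall columns is one way to check which scheme a given report used.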
Table 11. User engagement decision-making rules.
| KPI (organic metric) | Suggestions for marketers |
|---|---|
| 3 s video views | More short video views lead to increased user engagement. Focus on viral or attention-grabbing videos. |
| Reach from organic posts | More reach does not necessarily lead to increased user engagement; impressions without user interaction are ineffective. |
| Other clicks | Calls to action, such as clickable buttons, lead to increased engagement. Create compelling content. |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Theodoridis, P.K.; Gkikas, D.C. Maximizing Social Media User Engagement Through Predictive Analytics in Retail Tourism: Identifying Key Performance Indicators That Trigger User Interactions. Appl. Sci. 2025, 15, 11720. https://doi.org/10.3390/app152111720

AMA Style

Theodoridis PK, Gkikas DC. Maximizing Social Media User Engagement Through Predictive Analytics in Retail Tourism: Identifying Key Performance Indicators That Trigger User Interactions. Applied Sciences. 2025; 15(21):11720. https://doi.org/10.3390/app152111720

Chicago/Turabian Style

Theodoridis, Prokopis K., and Dimitris C. Gkikas. 2025. "Maximizing Social Media User Engagement Through Predictive Analytics in Retail Tourism: Identifying Key Performance Indicators That Trigger User Interactions" Applied Sciences 15, no. 21: 11720. https://doi.org/10.3390/app152111720

APA Style

Theodoridis, P. K., & Gkikas, D. C. (2025). Maximizing Social Media User Engagement Through Predictive Analytics in Retail Tourism: Identifying Key Performance Indicators That Trigger User Interactions. Applied Sciences, 15(21), 11720. https://doi.org/10.3390/app152111720

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
