This section provides a comprehensive analysis of the experimental results, starting with a detailed description of the dataset used for training and evaluation. It then outlines the evaluation methodology used to assess the model’s effectiveness, which includes training the DQN model, validating its performance, and testing it on a separate dataset. Performance metrics such as precision, recall, and F1 score are used to evaluate the accuracy and relevance of the model’s recommendations. Finally, we analyze the experimental results, demonstrating significant improvements in recommendation accuracy and pricing strategy compared to traditional methods, and highlight the model’s ability to adapt to customer behavior through continuous learning.
4.2. Experimental Results
The evaluation of this model uses multi-class classification metrics and the proposed weighted multi-class accuracy, based on Table 5.
Figure 9 shows the confusion matrix for this modeling approach.
The confusion matrix presented in our study showcases the performance of an RL model designed for product recommendation and pricing tailored to customer purchase enthusiasm. The true labels on the Y-axis represent the actual customer behaviors, while the predicted labels on the X-axis indicate the model’s predictions. Each cell’s value represents the count of instances for each combination of true and predicted labels.
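To make the layout concrete, the following minimal Python sketch builds a confusion matrix with the same convention (rows are true labels, columns are predicted labels). The class order in `CLASSES` and the short label lists are hypothetical illustrations, not the study’s data or encoding.

```python
from typing import List, Sequence

# Hypothetical index order for the five output classes; the study's encoding may differ.
CLASSES = ["Don't Purchase", "Full Price", "10% Discount", "20% Discount", "High Discount"]


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int], n_classes: int) -> List[List[int]]:
    """Count (true, predicted) pairs; rows are true labels (Y-axis), columns are predictions (X-axis)."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


# Illustrative labels only; these are not data from the study.
y_true = [0, 0, 0, 1, 2, 3, 4, 4]
y_pred = [0, 0, 1, 1, 2, 3, 4, 2]
cm = confusion_matrix(y_true, y_pred, len(CLASSES))
for name, row in zip(CLASSES, cm):
    print(f"{name:>15}: {row}")  # diagonal entries are correct predictions
```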
Our model demonstrates a high degree of accuracy in predicting non-purchasing behavior, correctly classifying 1,063,240 instances (true positives) in the “Don’t Purchase” category. This is critical because it allows the model to effectively identify customers who are unlikely to make a purchase, enabling the business to minimize marketing expenditure on these non-responsive segments.
The ability to predict high-discount categories accurately is a crucial strength of the model. It ensures that the model can identify customers who are most likely to respond only when offered significant incentives. In marketing and sales, customers exhibit varying levels of price sensitivity: some make a purchase without any discount, while others require substantial discounts to be persuaded to buy.
When the model correctly places a customer in a high-discount category (such as 20% or 30%), this indicates that the customer is highly price-sensitive and unlikely to convert without significant incentives. By correctly identifying these customers, the business can tailor its marketing efforts more effectively, and offering them substantial discounts can lead to conversions that might not have occurred otherwise.
Moreover, precise identification of high-discount customers helps optimize the allocation of marketing resources. Instead of offering blanket discounts to all customers, the business can target high-discount incentives specifically to those who need them. This targeted approach not only improves conversion rates but also enhances overall profitability, because the business does not erode its margins by offering unnecessary discounts to customers who would have purchased at a lower discount or even at full price.
Additionally, accurate prediction of high-discount categories can improve customer satisfaction. Customers who receive personalized offers that match their price sensitivity are more likely to feel valued and understood, enhancing their overall experience with the brand. This positive experience can lead to increased customer loyalty and long-term customer relationships.
In summary, the model’s ability to accurately predict high-discount categories is vital for identifying customers who need significant incentives to convert. It allows for targeted marketing strategies, optimized resource allocation, improved conversion rates, enhanced profitability, and better customer satisfaction. This precision in prediction supports the overall effectiveness of the personalized pricing strategy, making it a valuable asset for the business.
The evaluation results for the proposed RL-based pricing and product recommendation model are presented in Table 8.
To comprehensively assess the performance of the proposed RL model for product recommendation and pricing, several key metrics were analyzed on the test data, including Macro Average Precision, Macro Average Recall, Macro and Micro F-scores, Macro Averaged AUC, NDCG@5, and Weighted Multi-class Accuracy. Together, these indicators provide a holistic view of the model’s capability to handle multi-class classification and ranking tasks.
A Macro Average Precision of 0.8059 indicates that, averaged over the five classes, 80.59% of the instances predicted as a given class actually belong to that class. This high precision demonstrates that the model keeps false positives low across the discount classes on average, so that both recommendations and price adjustments are largely tailored to the appropriate customer segments.
The Macro Average Recall of 0.8243 shows that, on average, 82.43% of the actual instances for each class were correctly identified. This strong recall value suggests that the model successfully captures most relevant cases across different classes, minimizing the number of customers whose purchasing intentions are overlooked by the RS.
The Macro F-score of 0.8108, computed as the unweighted mean of the per-class F-scores, reflects the model’s balanced and consistent performance across all discount categories. This equilibrium between precision and recall is crucial in RS, where both the accuracy of suggested products and the inclusiveness of relevant options are important.
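The Micro F-score of 0.8560 pools true positives, false positives, and false negatives over all classes before computing the score, so every instance counts equally and the more frequent classes carry more weight. For single-label multi-class prediction it equals overall accuracy, reflecting the model’s performance at the dataset level. This result confirms that the model performs robustly across diverse customer behaviors and varying discount preferences.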
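The difference between macro and micro averaging can be sketched in a few lines of Python. The function `classification_metrics` and the 3-class matrix below are hypothetical illustrations, not the code or data used in this study. The example is deliberately imbalanced to show that the micro F-score coincides with accuracy and is pulled toward the dominant class, whereas the macro F-score weights every class equally.

```python
from typing import Dict, List


def classification_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Macro- and micro-averaged metrics from a square confusion matrix (rows = true, cols = predicted)."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    precisions, recalls, f1s = [], [], []
    tp_sum = fp_sum = fn_sum = 0
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted as k but belonging to another class
        fn = sum(cm[k]) - tp                       # belonging to k but predicted as another class
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(2 * p * r / (p + r) if p + r else 0.0)
        tp_sum, fp_sum, fn_sum = tp_sum + tp, fp_sum + fp, fn_sum + fn
    micro_p = tp_sum / (tp_sum + fp_sum)
    micro_r = tp_sum / (tp_sum + fn_sum)
    return {
        "macro_precision": sum(precisions) / n,  # each class weighted equally
        "macro_recall": sum(recalls) / n,
        "macro_f1": sum(f1s) / n,                # unweighted mean of per-class F1
        "micro_f1": 2 * micro_p * micro_r / (micro_p + micro_r) if tp_sum else 0.0,
        "accuracy": tp_sum / total,
    }


# Hypothetical, deliberately imbalanced 3-class matrix (class 0 dominates).
cm = [[90, 5, 5],
      [4, 6, 0],
      [3, 1, 6]]
m = classification_metrics(cm)
print({k: round(v, 4) for k, v in m.items()})
```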
The Macro Averaged AUC of 0.8743 further validates the strong discriminative ability of the proposed model across the five discount classes. Because AUC is a threshold-independent measure, this result indicates that the model consistently distinguishes between high and low purchase probabilities, regardless of classification boundaries. The macro-averaged computation ensures that all classes contribute equally, confirming a reliable ability to rank customers’ purchase likelihoods across multiple discount levels.
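A minimal sketch of a macro-averaged AUC is shown below, assuming a one-vs-rest scheme (this section does not state whether one-vs-rest or one-vs-one averaging was used). `binary_auc` uses the pairwise definition, i.e., the probability that a randomly chosen positive instance is scored above a randomly chosen negative one, with ties counted as one half. Its quadratic loop only suits small illustrations; for a test set of this size, a rank-based implementation such as scikit-learn’s `roc_auc_score` with `multi_class="ovr"` and `average="macro"` would be the practical choice. The scores are hypothetical.

```python
from typing import List, Sequence


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """P(random positive outscores random negative); ties count as 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true: Sequence[int], prob: Sequence[Sequence[float]]) -> float:
    """One-vs-rest AUC for each class, averaged with equal weight per class."""
    n_classes = len(prob[0])
    aucs: List[float] = []
    for k in range(n_classes):
        labels = [1 if y == k else 0 for y in y_true]  # class k versus all others
        scores = [row[k] for row in prob]              # model score for class k
        aucs.append(binary_auc(labels, scores))
    return sum(aucs) / n_classes


# Hypothetical class-probability outputs for six instances and three classes.
y_true = [0, 0, 1, 1, 2, 2]
prob = [[0.8, 0.1, 0.1], [0.6, 0.3, 0.1], [0.2, 0.7, 0.1],
        [0.3, 0.4, 0.3], [0.1, 0.2, 0.7], [0.2, 0.2, 0.6]]
print(macro_ovr_auc(y_true, prob))
```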
The NDCG@5 score of 0.8947 highlights the model’s exceptional ranking performance across the ordered discount categories. This metric measures how well the model prioritizes desirable outcomes—such as purchases at lower discount rates—while penalizing suboptimal ones, such as unnecessary deep discounts or missed opportunities. The high NDCG@5 score demonstrates that the model ranks pricing actions in a manner consistent with business profitability goals.
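The NDCG@5 computation can be sketched as follows, assuming linear gains and the common log2(rank + 1) discount; some implementations use the exponential gain 2^rel - 1 instead. The relevance grades (higher for more profitable outcomes, 0 for no purchase) are hypothetical, since this section does not specify how the study assigns relevance to the ordered discount categories.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain: the item at 1-based rank i is divided by log2(i + 1)."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG of the model's ranking divided by the DCG of the ideal (descending) ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical relevance grades of five pricing actions, listed in the order the model ranked them.
ranked = [3, 4, 2, 1, 0]
print(round(ndcg_at_k(ranked, 5), 4))
```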
Finally, the Weighted Multi-class Accuracy of 0.8082 reflects the model’s predictive effectiveness when accounting for predefined class importance. This confirms that the system achieves reliable performance even when different classes (e.g., discount levels) carry different business priorities.
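Because the exact weighting scheme is defined in Table 5 and not reproduced in this section, the sketch below assumes one common form of weighted accuracy, in which each test instance contributes the weight of its true class. The weights are hypothetical placeholders, and the study’s own formula may differ.

```python
from typing import Mapping, Sequence


def weighted_accuracy(y_true: Sequence[int], y_pred: Sequence[int],
                      weights: Mapping[int, float]) -> float:
    """Accuracy in which each instance counts with the (assumed) weight of its true class."""
    total = sum(weights[t] for t in y_true)
    correct = sum(weights[t] for t, p in zip(y_true, y_pred) if t == p)
    return correct / total


# Hypothetical class weights; Table 5 defines the study's actual priorities.
weights = {0: 1.0, 1: 2.0, 2: 3.0}
y_true = [0, 0, 1, 2]
y_pred = [0, 1, 1, 0]
print(weighted_accuracy(y_true, y_pred, weights))
```

With equal weights the function reduces to ordinary accuracy; unequal weights make errors on high-priority classes cost more.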
Collectively, these results demonstrate that the proposed RL model performs at a high level of accuracy, consistency, and business relevance. The strong Macro and Micro F-scores confirm balanced predictive capability across classes, while the high AUC and NDCG@5 values underline the model’s superior ranking and discriminative power. These findings validate the model’s effectiveness in jointly optimizing product recommendations and pricing decisions in alignment with customer purchasing behavior. Consequently, the proposed RL-based framework represents a valuable and practical tool for enhancing marketing decision-making, increasing profitability, and improving customer satisfaction.
The per-class precision, recall, and F-score values for the five output classes are presented in Table 9, Table 10, and Table 11, respectively.
The model exhibits high precision for predicting non-purchase behavior (94.09%) and purchases with a 20% discount (94.71%), indicating that it accurately identifies customers in these categories. This precision is critical for minimizing false positives and ensuring that marketing resources are effectively utilized. However, the precision for predicting full-price purchases (72.7%), purchases with a 10% discount (71.3%), and high-discount purchases (70.02%) is moderate, suggesting areas for improvement to enhance the accuracy of these predictions.
In terms of recall, the model demonstrates strong performance in identifying non-purchasers (88.2%), full-price purchases (81.6%), and customers who will purchase with a 10% discount (83.52%) or a high discount (81.53%). This indicates that the model effectively captures the majority of relevant instances in these categories. The recall for 20% discount purchases (77.21%) also shows good coverage, though further refinement could help capture more instances within this group.
The per-class F-score, which balances precision and recall, is high for the non-purchase (91.07%) and 20% discount (85.07%) categories, reflecting robust overall performance. The F-scores for full-price (76.94%), 10% discount (76.98%), and high-discount purchases (75.34%) indicate balanced but moderate performance, highlighting potential areas for model enhancement.
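As an arithmetic check that uses only the per-class values reported above, the following Python snippet compares two ways of obtaining a macro F-score. The unweighted mean of the five per-class F-scores gives 81.08%, matching the reported Macro F-score of 0.8108, while the harmonic mean of the reported Macro Precision (0.8059) and Macro Recall (0.8243) gives about 0.815. The dictionary keys are our own labels for the five classes.

```python
from statistics import harmonic_mean, mean

# Per-class values (%) as reported in the text for Tables 9, 10, and 11.
precision = {"Don't Purchase": 94.09, "Full Price": 72.7, "10% Discount": 71.3,
             "20% Discount": 94.71, "High Discount": 70.02}
recall = {"Don't Purchase": 88.2, "Full Price": 81.6, "10% Discount": 83.52,
          "20% Discount": 77.21, "High Discount": 81.53}
f_score = {"Don't Purchase": 91.07, "Full Price": 76.94, "10% Discount": 76.98,
           "20% Discount": 85.07, "High Discount": 75.34}

# Definition 1: unweighted mean of the per-class F-scores.
macro_f_mean = mean(list(f_score.values()))
# Definition 2: harmonic mean of the reported Macro Precision and Macro Recall.
macro_f_harmonic = harmonic_mean([0.8059, 0.8243])

print(f"mean of per-class F-scores: {macro_f_mean:.2f}%")
print(f"harmonic mean of macro P and R: {macro_f_harmonic:.4f}")

# Each per-class F-score agrees (up to rounding) with the harmonic mean of its own precision and recall.
for c in f_score:
    print(c, round(harmonic_mean([precision[c], recall[c]]), 2), f_score[c])
```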
In summary, the model performs exceptionally well in predicting non-purchases and 20% discount purchases, with high precision and balanced performance. There are opportunities to improve the precision and recall for full-price, 10% discount, and high-discount categories to optimize resource allocation and marketing efforts. These metrics validate the model’s capability to effectively tailor product recommendations and pricing strategies while identifying specific areas for further refinement to enhance customer satisfaction and conversion rates.
Because no authoritative studies have implemented the proposed method of recommending products together with a price using a multi-class output, one way to evaluate it is to simplify the results into two classes, purchase and non-purchase, and compare them with studies that have used RL-based methods for product recommendation in RS.
Studies [74, 75, 76] examined various DRL methods in the application of RS, and the experiments reported in these studies provide the best results for a 2-class RS that predicts purchase or non-purchase. According to the results presented in Table 12, our RL-based approach matches or exceeds the performance of Wang et al., DiffRec, and XSimGCL on all key metrics, demonstrating its robustness and effectiveness for two-class purchase prediction.
The confusion matrix of the 5-class modeling results is shown in Figure 9. To convert the 5-class results into 2-class results, all purchases, whether at full price or with any discount, are treated as purchase, and non-purchase remains non-purchase; that is, classes 2 to 5 are merged into a single purchase class. The resulting confusion matrix for the 2-class purchase and non-purchase problem is shown in Figure 10.
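The class merging that produces Figure 10 can be sketched as follows. The 5×5 matrix is hypothetical, class 0 is assumed to be “Don’t Purchase,” and purchase is treated as the positive class; this section does not state which class the 2-class metrics treat as positive. Because confusions among the purchase classes become correct predictions after merging, 2-class accuracy can never be lower than 5-class accuracy on the same predictions.

```python
from typing import List


def collapse_to_binary(cm: List[List[int]], negative: int = 0) -> List[List[int]]:
    """Merge every class except `negative` into one Purchase class.

    Returns [[TN, FP], [FN, TP]] with Purchase as the positive class.
    """
    pos = [k for k in range(len(cm)) if k != negative]
    tn = cm[negative][negative]
    fp = sum(cm[negative][j] for j in pos)        # non-purchasers predicted to buy at any price
    fn = sum(cm[i][negative] for i in pos)        # buyers predicted as non-purchasers
    tp = sum(cm[i][j] for i in pos for j in pos)  # buyers predicted to buy, whatever the discount class
    return [[tn, fp], [fn, tp]]


# Hypothetical 5-class counts; row/column 0 is "Don't Purchase", 1-4 are the purchase classes.
cm5 = [[50, 2, 1, 1, 1],
       [3, 20, 4, 1, 2],
       [2, 5, 18, 3, 2],
       [1, 1, 2, 25, 1],
       [2, 2, 3, 1, 22]]
b = collapse_to_binary(cm5)
total = sum(map(sum, cm5))
multi_acc = sum(cm5[k][k] for k in range(len(cm5))) / total
binary_acc = (b[0][0] + b[1][1]) / total
print(b, round(multi_acc, 4), round(binary_acc, 4))
```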
The evaluation metrics obtained after converting to the 2-class purchase/non-purchase problem are shown in Figure 11, together with the results of studies [74, 75, 76]. Figure 11 thus compares the converted 2-class results of our RL-based method for recommending products along with price against these studies.
Our RL-based pricing recommendation model outperforms three representative state-of-the-art baselines, Wang et al. [74], DiffRec [76], and XSimGCL [75], on precision, recall, and F1-score, and matches the strongest baseline on accuracy. In terms of precision, our model achieves 0.883, higher than Wang et al.’s 0.821, DiffRec’s 0.865, and XSimGCL’s 0.872, indicating that our recommendations capture true purchases more accurately with fewer false positives. For recall, our model attains 0.896, exceeding the 0.868 of Wang et al., 0.882 of DiffRec, and 0.884 of XSimGCL, highlighting its strength in minimizing missed conversion opportunities. The balanced effectiveness of our approach is reflected in the F1-score of 0.888, which surpasses the best competing result of 0.878 (XSimGCL). Finally, our model achieves an accuracy of 0.893, on par with Wang et al. (0.893) and slightly ahead of DiffRec (0.888) and XSimGCL (0.890). Overall, these results underscore the advantages of explicitly modeling pricing as an action and optimizing a profit-aligned reward under willingness-to-pay constraints, capabilities that diffusion-based and contrastive-learning recommenders do not natively support.
4.3. Analysis of Experimental Results
The proposed RL-based model for product recommendation and pricing demonstrates consistently strong performance in both multi-class and binary evaluation settings. In the multi-class configuration, the model accurately predicts non-purchases as well as purchases across different discount levels, confirming that the integration of pricing into the recommendation output is both feasible and effective. When simplifying the five-class output into a binary system distinguishing between “Purchase” and “Don’t Purchase,” the model maintains robust performance across all key metrics, reinforcing its reliability in capturing customer purchasing behavior.
When compared with three representative state-of-the-art baselines, Wang et al. [74], DiffRec [76], and XSimGCL [75], our model outperforms them on precision, recall, and F1-score and matches the strongest of them on accuracy. While these prior methods primarily focus on ranking accuracy, preference modeling, or contrastive representation learning, the proposed framework explicitly integrates pricing decisions into the recommendation process and optimizes a profit-aware reward function aligned with customers’ WTP. This dual focus enables the model to capture both purchase likelihood and price sensitivity, resulting in more accurate, profit-oriented, and business-relevant recommendations than the existing baselines.
From a marketing perspective, accurately predicting customer purchasing behavior enables more precise and effective promotional campaigns. By identifying which customers are most responsive to specific discount levels, the model supports finer segmentation of the customer base. This targeted approach reduces marketing inefficiency while improving customer satisfaction by delivering personalized discounts that are more likely to convert. The model’s high recall ensures that a greater proportion of the potential customer base is reached, maximizing the overall impact of marketing initiatives. Furthermore, customers who receive tailored offers that align with their price expectations are more likely to perceive the brand as attentive and customer-centric, thereby increasing purchase intent and loyalty.
In terms of revenue optimization, the model’s predictive strength under various discount scenarios is critical. By identifying the optimal discount level required to convert each customer segment, the model helps businesses establish pricing strategies that maximize revenue and profitability. Delivering the right discount to the right customer at the right time can substantially increase sales volume while preserving profit margins. For instance, by minimizing unnecessary high discounts and reserving deeper incentives for highly price-sensitive customers, the firm can optimize its overall promotional expenditure. The model’s balanced performance, reflected in its F1-score of 0.8884 and strong precision and recall in the binary purchase/non-purchase evaluation, confirms its ability to drive revenue growth through improved customer targeting and personalized pricing strategies. This capability not only enhances short-term sales but also fosters long-term customer retention and lifetime value.
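The trade-off described above, avoiding unnecessary discounts while reserving deep ones for price-sensitive customers, can be illustrated with a one-step expected-margin rule. This is a hypothetical sketch, not the study’s DQN policy: the purchase probabilities, list price, and unit cost are invented for illustration, and an RL agent would learn this trade-off from its reward signal rather than from given probabilities.

```python
from typing import Dict


def best_discount(purchase_prob: Dict[float, float], list_price: float, unit_cost: float) -> float:
    """Pick the discount d maximizing P(buy | d) * (list_price * (1 - d) - unit_cost)."""
    def expected_margin(d: float) -> float:
        return purchase_prob[d] * (list_price * (1 - d) - unit_cost)
    return max(purchase_prob, key=expected_margin)


# Hypothetical purchase probabilities per discount level for two customer types.
insensitive = {0.0: 0.60, 0.10: 0.65, 0.20: 0.70, 0.30: 0.72}
sensitive = {0.0: 0.05, 0.10: 0.10, 0.20: 0.40, 0.30: 0.55}
print(best_discount(insensitive, list_price=100.0, unit_cost=60.0))
print(best_discount(sensitive, list_price=100.0, unit_cost=60.0))
```

Under these invented numbers the price-insensitive customer is best served at full price, while the price-sensitive customer warrants a 20% discount; the deepest discount is not chosen because its extra conversions do not offset the lost margin.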
Although the model effectively predicts both non-purchasing behavior and high-discount purchase scenarios, some challenges remain in accurately classifying intermediate discount categories. Future improvements could focus on refining the state and action space representations—such as by clustering similar products and customers—to enhance contextual differentiation. Additionally, tuning the reward function to better reflect long-term user engagement and purchase satisfaction could further improve predictive accuracy. Incorporating larger and more diverse training datasets, performing hyperparameter optimization, and extending episode lengths may also enhance model convergence and generalization. These refinements would improve the system’s capacity to distinguish between closely related discount levels, resulting in even more precise and effective personalized pricing strategies.