1. Introduction
Recommender systems in real-world e-commerce scenarios face two fundamental challenges: extreme data sparsity and temporal dynamics [1]. While massive catalogs create interaction matrices with densities often below 0.01%, user preferences and item popularity are highly volatile, driven by short-term trends and long-term seasonality [2]. Traditional collaborative filtering and deep learning models struggle in this environment; they either overfit due to data scarcity or fail to capture rapid popularity shifts due to static modeling assumptions [3].
Before detailing our approach, we clarify the scope and rationale of this work. Our framework targets the non-personalized temporal recommendation setting, where the goal is to generate a single, globally optimal top-K list that serves the entire user population at a given time. This setting is both practically important and theoretically well motivated for the following reasons. First, in environments with extreme sparsity (density < 0.0005%), over 95% of users have fewer than two recorded interactions, rendering user-specific preference modeling statistically unreliable [1]. Under such conditions, personalized methods (e.g., matrix factorization, sequential models) suffer from severe cold-start degradation, as demonstrated empirically in [2]. Second, non-personalized popularity-aware lists serve as the primary recommendation surface in many real-world scenarios—including e-commerce homepage “trending” sections, app store featured lists, and news portal highlights—where serving millions of users with individualized lists is either computationally prohibitive or operationally unnecessary [4]. Third, global temporal popularity rankings naturally function as high-quality candidate generators for downstream personalized re-ranking stages, making our framework complementary to, rather than a replacement for, user-level models.
Formally, we address the following research questions:
RQ1: Can a dual-scale sliding-window model effectively capture both short-term trends and long-term periodicities under extreme data sparsity?
RQ2: Does deep integration of temporal modeling with evolutionary optimization (via the purchase heat indicator) yield better solutions than applying either technique independently?
RQ3: What are the individual contributions of each framework component (SWWP-guided initialization, SWWP-guided position updates, temporal fitness)?
To address data sparsity, meta-heuristic algorithms like Elite Evolutionary Discrete Particle Swarm Optimization (EEDPSO) [5] have shown promise. By optimizing set-based metrics (e.g., Jaccard distance) without relying on gradient descent, EEDPSO avoids the cold-start failures common in neural networks. However, standard EEDPSO has a critical limitation: it is time-agnostic. It treats all historical interactions equally, recommending “all-time bestsellers” even when they are out of season or no longer trending. This static nature compromises recommendation timeliness and fails to align with the dynamic purchasing intent of users.
To bridge this gap, we propose the Sliding-Window Weighted Popularity (SWWP) model, a lightweight temporal modeling mechanism designed explicitly for sparse environments. Unlike rigid time-decay functions, SWWP employs a dual-scale window strategy: it combines a Short-Term Trend Window to capture immediate popularity drifts (e.g., viral products) with a Long-Term Periodic Window to identify recurring seasonal patterns (e.g., holiday or weekend effects). This allows the system to distinguish between fading fads and enduring habits, generating a highly relevant candidate pool even when user-specific interaction history is minimal.
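As a concrete illustration of the dual-scale idea, the sketch below blends a short-term trend count with a long-term count restricted to matching temporal features (here, the same weekday). The window length, feature matching, and blend weights are illustrative placeholders, not the paper's tuned SWWP model, which is specified in Section 4.

```python
from collections import Counter
from datetime import datetime, timedelta

def swwp_score(events, now, short_days=14, w_short=0.6, w_long=0.4):
    """Toy dual-scale popularity score: a short-term trend count blended
    with a long-term count filtered to the same weekday as `now`.
    Weights and the weekday-matching rule are illustrative, not SWWP's
    actual formulation."""
    short_cut = now - timedelta(days=short_days)
    # Short-Term Trend Window: raw counts over the recent past.
    trend = Counter(item for item, ts in events if ts >= short_cut)
    # Long-Term Periodic Window: older events with matching temporal feature.
    periodic = Counter(item for item, ts in events
                       if ts < short_cut and ts.weekday() == now.weekday())

    def norm(c):
        m = max(c.values(), default=1)
        return {k: v / m for k, v in c.items()}

    t, p = norm(trend), norm(periodic)
    items = set(t) | set(p)
    return {i: w_short * t.get(i, 0.0) + w_long * p.get(i, 0.0) for i in items}
```

An item that is only a fading fad (old, off-cycle interactions) scores zero under both windows, while viral items and seasonally recurring items each retain a signal.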
Furthermore, we present a hybrid framework that deeply integrates SWWP with EEDPSO. Rather than a simple weighted combination, we introduce a novel purchase heat indicator, which quantifies the current temporal activity level based on time segments, weekdays, and seasonal factors. It acts as a dynamic bridge, guiding the evolutionary search by adjusting particle initialization and fitness evaluation. This ensures that the global optimization capability of EEDPSO is directed towards temporally relevant regions of the search space.
The main contributions of this paper are summarized as follows:
1. Hybrid Optimization Framework: We propose the first framework integrating SWWP with EEDPSO. We formally prove the NP-hardness of the temporal-constrained recommendation problem, establishing the theoretical necessity of our metaheuristic approach.
2. Deep Integration Mechanism: We design a purchase heat indicator that enables algorithm-level fusion. This mechanism dynamically balances temporal relevance with optimization diversity through time-aware initialization, differentiated position updates, and temporal fitness bonuses.
3. Robust Temporal Modeling for Extreme Sparsity: We develop a dual-scale SWWP model that leverages hierarchical features (segments, weekdays, months). This ensures robust trend capture even in datasets with densities < 0.0005%, where traditional sequential models often fail.
4. Extensive Empirical Evaluation: Experiments on Amazon Reviews Data demonstrate that SWWP achieves an NDCG@20 of 0.245, outperforming nine temporal baselines by at least 13%. The hybrid framework significantly surpasses Differential Evolution (DE) and Genetic Algorithms (GAs) in temporal prediction quality. A systematic ablation study isolates the contribution of each integration mechanism, revealing that SWWP-guided position updates and temporal fitness are jointly critical, improving temporal prediction (Mass@K) by over 7× compared to unguided searches.
The remainder of this paper is organized as follows:
Section 2 provides a comprehensive literature review of temporal recommendation systems and meta-heuristic optimization.
Section 3 outlines the preliminaries, detailing the standard EEDPSO algorithm, which serves as our base optimizer.
Section 4 presents our proposed SWWP-EEDPSO hybrid framework, including the formal problem definition, the SWWP temporal modeling, and the deep integration mechanism.
Section 5 describes our experimental evaluation, covering the experimental setup, comparative analysis, and ablation studies. Finally,
Section 6 concludes the paper with key findings and future research directions.
2. Literature Review
This section reviews three foundational areas: temporal modeling in recommender systems, sliding-window techniques, and meta-heuristic optimization for recommendation.
Temporal signals capture how user preferences and item popularity evolve over time, and they form a critical dimension in recommender systems [6]. In the Netflix Prize, Koren [7] was the first to show the sizable impact of temporal dynamics on accuracy; his TimeSVD++ model introduced time-dependent bias terms and reduced RMSE by 3.7%.

A large body of work since then confirms the value of temporal modeling at multiple granularities. Micro-level (hourly) patterns reflect users’ immediate intent shifts. In Airbnb’s production setting, Grbovic and Cheng [8] observed that searches at 8–10 a.m. skew toward business stays, whereas those at 8–10 p.m. lean toward leisure trips. Liu et al. [9] further showed that incorporating hourly features can raise CTR by 15%. Meso-level (daily/weekly) patterns capture weekday–weekend differences. Using Reddit data, Pálovics et al. [10] found that views of technology content on Mondays are 40% higher than on weekends. Amazon’s study [11] reported an 8–12% lift in conversion when day-of-week signals are added to the model. Macro-level (monthly/seasonal) seasonality and holiday effects drive long-term popularity shifts. In eBay’s practice, Zimdars et al. [12] recorded a 300% surge in searches for gift items in the two weeks before Christmas.

Temporal context is also valuable under data sparsity. Zhang et al. [13] proposed a time-aware collaborative filtering framework that leverages group behavior in similar time periods to produce effective recommendations in cold-start settings. A survey by Campos et al. [2] noted that when users have fewer than five interactions, adding temporal features can improve accuracy by 25%.
The sliding window is a classic technique in time-series analysis, with roots in signal processing [14]. Ding and Li [15] first brought sliding windows to collaborative filtering and reduced MAE by 6.5%. Vinagre et al. [16] introduced adaptive windows that resize dynamically to track concept drift. Matuszyk et al. [17] compared five decay functions and found exponential decay to be the most robust in most scenarios, consistent with the Ebbinghaus forgetting curve [18].
EEDPSO [5] addresses the challenges of applying traditional PSO to recommendation by redefining velocity and position updates in discrete spaces. In experiments by Lin et al., EEDPSO outperformed Genetic Algorithms by 3% and Differential Evolution by 27% on sparse datasets. Its advantage is pronounced in cold-start scenarios because the optimization does not rely on gradients from historical data. However, a key limitation of EEDPSO is its static optimization assumption, which leaves it ill-suited to temporal dynamics.

Burke’s classic taxonomy classifies hybrid strategies into five types, with deep integration offering the strongest synergy [19]. A current trend is to tightly couple optimization with feature extraction. Zhou et al. [20] demonstrated the effectiveness of such coupling within a deep reinforcement learning framework. Recent advances in personalized recommendation have explored multi-interest learning to better capture diverse user preferences. Xie et al. [21] proposed rethinking multi-interest candidate matching by improving interest representation diversity, demonstrating the importance of capturing heterogeneous user intents. Chen et al. [22] introduced joint factual and counterfactual explanations for GNN-based recommendations, highlighting the role of explainability in modern recommender systems. While these works focus on personalized settings with sufficient interaction data, they underscore the broader goal of understanding temporal and contextual factors in recommendation quality—a goal our framework pursues through population-level temporal modeling under extreme sparsity. The proposed SWWP-EEDPSO framework follows this trajectory by achieving an algorithm-level deep integration that unifies temporal feature modeling with global optimization.
5. Experiments and Analysis
This section presents experiments on the Amazon Reviews Data (2018) under extreme sparsity (density < 0.0005%). We first compare SWWP against nine temporal baselines (Section 5.2), then evaluate the full SWWP-EEDPSO framework against evolutionary baselines (Section 5.3), and finally conduct a systematic ablation study to isolate each component’s contribution (Section 5.4).
5.1. Experimental Setup
5.1.1. Dataset
Our framework targets top-K recommendation under sparse user–item interactions. We evaluate on several categories from real-world Amazon Reviews Data (2018): AMAZON FASHION, Appliances, Prime Pantry, Software, All Beauty, and Magazine Subscriptions. After filtering, the resulting subset retains only valid items, valid interactions, and valid users.
Amazon Reviews Data (2018) contains both interaction records and rating scores. Following the standard notation in recommender systems, let $\mathcal{U}$ be the set of users and $\mathcal{I}$ be the set of items. Each interaction is represented as a triplet $(u, i, r_{ui})$, where user $u \in \mathcal{U}$ assigns a rating $r_{ui}$ to item $i \in \mathcal{I}$. For sparsity analysis, we construct a binary interaction matrix $B \in \{0,1\}^{|\mathcal{U}| \times |\mathcal{I}|}$, where $B_{ui} = 1$ if the triplet $(u, i, r_{ui})$ exists. Consequently, the sparsity metrics are computed based on $B$, while the rating values $r_{ui}$ serve as the input for the optimization objective; specifically, they correspond to the raw rating term used in the Bayesian-adjusted popularity component of the fitness function in Equation (2) (detailed in [5]).
Let nnz denote the number of nonzero entries of $B$ (i.e., the number of unique $(u, i)$ pairs). By construction, $\mathrm{nnz} \le M$, where $M$ is the total number of recorded interactions.
Density and sparsity. The matrix has $|\mathcal{U}| \times |\mathcal{I}|$ total entries. Since $\mathrm{nnz} \le M$, an upper bound on the density is
$$\mathrm{density} = \frac{\mathrm{nnz}}{|\mathcal{U}| \times |\mathcal{I}|} \le \frac{M}{|\mathcal{U}| \times |\mathcal{I}|},$$
where nnz is the number of unique $(u, i)$ pairs. The corresponding sparsity is
$$\mathrm{sparsity} = 1 - \mathrm{density}.$$
Bipartite (degree) view. Viewing $B$ as a bipartite graph with users and items as partitions and $M$ edges, the average degrees are
$$\bar{d}_{\mathrm{user}} = \frac{M}{|\mathcal{U}|}, \qquad \bar{d}_{\mathrm{item}} = \frac{M}{|\mathcal{I}|}.$$
Both are orders of magnitude smaller than the sizes of their respective partitions, reflecting long-tail usage and cold-start behavior.
Storage implications. If the interaction matrix were stored densely in float32, the memory requirement would be
$$4 \times |\mathcal{U}| \times |\mathcal{I}| \ \text{bytes}.$$
Using a sparse CSR/COO-like format with 32-bit row_ptr, col_idx, and val arrays, the memory is approximated by
$$4\,(|\mathcal{U}| + 1) + 4\,\mathrm{nnz} + 4\,\mathrm{nnz} \ \text{bytes}.$$
Even if repeated interactions are merged (i.e., $\mathrm{nnz} \le M$), the density remains far below the thresholds at which common “sparse matrix” heuristics apply. The induced recommendation space is therefore highly sparse—the central challenge our method targets and the regime under which we evaluate.
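The density, degree, and storage quantities above can be computed with a small helper. This is a sketch: the CSR byte estimate mirrors the 32-bit row_ptr/col_idx/val layout described in the text, and the argument values in the usage below are illustrative, not the paper's dataset statistics.

```python
def sparsity_report(n_users, n_items, nnz, m_interactions=None):
    """Density, sparsity, and memory estimates for a user-item matrix.
    nnz = unique (user, item) pairs; m_interactions = total interactions
    (M), used for the bipartite average degrees. CSR estimate assumes
    32-bit (4-byte) row_ptr, col_idx, and val arrays."""
    total = n_users * n_items
    density = nnz / total
    report = {
        "density": density,
        "sparsity": 1.0 - density,
        "dense_bytes_float32": 4 * total,                       # 4 * |U| * |I|
        "csr_bytes": 4 * (n_users + 1) + 4 * nnz + 4 * nnz,     # row_ptr + col_idx + val
    }
    if m_interactions is not None:
        report["avg_user_degree"] = m_interactions / n_users    # M / |U|
        report["avg_item_degree"] = m_interactions / n_items    # M / |I|
    return report
```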
5.1.2. Experimental Protocol
Temporal evaluation strategy: We employ a sliding-window evaluation with strict temporal separation to prevent data leakage:
Evaluation advancement: The evaluation point advances by 1 day at each step.
Short-term data: The most recent 14 days of interactions before the evaluation time, used for trend score computation (Equation (10)).
Long-term data: All available historical interactions prior to the evaluation time, up to one year, used for periodic score computation (Equation (11)) with temporal feature matching.
Test data: Interactions on the evaluation day (next 24 h).
Cache update: Precomputed caches (Algorithm 1) are rebuilt before each evaluation using the dual-scale historical data described above.
Leakage prevention: The system timestamp is set to the evaluation time during prediction; only interactions occurring strictly before it are accessible, ensuring no future information leakage.
Each algorithm generates predictions for the next 24 h based solely on historical data preceding the evaluation time. Specifically, the Short-Term Trend Window aggregates interactions from the most recent 14 days, while the Long-Term Periodic Window draws from all available history (up to one year) with temporal feature matching (Section 4.2.1). No future information is accessible during prediction, guaranteeing temporal integrity.
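The evaluation protocol above can be sketched as a simple loop; `predict_fn` and `score_fn` are hypothetical placeholders for the recommender and the metric, and the filtering condition is the leakage guard described in the text.

```python
from datetime import datetime, timedelta

def sliding_window_eval(events, start, n_steps, predict_fn, score_fn):
    """Advance the evaluation point by one day per step. Only events
    strictly before the evaluation time reach the predictor; events in
    the next 24 h form the held-out test set."""
    results = []
    t_eval = start
    for _ in range(n_steps):
        # Leakage guard: history is strictly before the evaluation time.
        history = [(i, ts) for i, ts in events if ts < t_eval]
        # Test set: interactions on the evaluation day (next 24 h).
        test = [(i, ts) for i, ts in events
                if t_eval <= ts < t_eval + timedelta(days=1)]
        recs = predict_fn(history, t_eval)
        results.append(score_fn(recs, test))
        t_eval += timedelta(days=1)  # evaluation advancement: 1 day per step
    return results
```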
Addressing potential future information leakage. We emphasize that the temporal prediction component (Equation (19)) is used exclusively for fitness evaluation during the optimization benchmark (Section 4.3), where the goal is to assess whether different search strategies can discover temporally relevant items under identical evaluation criteria. It is not used during the SWWP temporal modeling phase (Section 4.2), which relies solely on historical data within the training window. Specifically, (1) the SWWP candidate pool generation (Algorithm 1) uses only historical interactions: the short-term window covers the most recent 14 days and the long-term window covers all available history prior to the evaluation time with temporal feature matching; (2) the purchase heat indicator is derived from historical transaction volume patterns; and (3) particle initialization and position updates reference only the precomputed SWWP cache. The future interaction data serves the same role as ground-truth labels in supervised learning evaluation: it measures predictive quality without influencing model parameters. All competing algorithms (EEDPSO, DE, GA) share this identical fitness function, ensuring that any performance differences arise from search strategy rather than information advantage.
Implementation details: All experiments were conducted on a single Intel Xeon Gold 6230 CPU core with 32 GB RAM allocated. No GPU acceleration was used, to ensure fair comparison across methods.
Latency Measurement Protocol: All latency measurements follow a standardized protocol to ensure fairness and reproducibility:
Measurement scope: Time from query initiation to final ranked list output, including candidate generation and ranking but excluding data loading.
Warm-up period: 100 calls for JIT compilation and cache warming before measurement.
Sample size: 1000 recommendation calls with randomized query times.
Timer precision: Python 3.10’s time.perf_counter() with nanosecond resolution.
Statistical reporting: Mean latency with 95% confidence intervals (not shown in table for brevity, but all intervals were within ±5% of mean).
The sub-millisecond latencies for SWWP (0.52 ms) and popularity-based methods (0.08 ms) are achieved through aggressive caching of precomputed scores (Algorithm 1). These measurements represent query time performance after preprocessing, consistent with production deployment scenarios where offline computation is standard practice.
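The latency protocol above can be sketched as a small harness; the warm-up count, sample size, and the normal-approximation confidence interval follow the protocol described in the text, while the function and query arguments are hypothetical placeholders.

```python
import time
import statistics

def measure_latency(fn, queries, warmup=100, n_calls=1000):
    """Timing harness: warm-up calls first (cache/JIT warming), then
    per-call timing with time.perf_counter. Returns mean latency in ms
    and a normal-approximation 95% confidence interval."""
    for q in queries[:warmup]:
        fn(q)  # warm-up: excluded from measurement
    samples_ms = []
    for k in range(n_calls):
        q = queries[k % len(queries)]
        t0 = time.perf_counter()
        fn(q)
        samples_ms.append((time.perf_counter() - t0) * 1e3)
    mean = statistics.fmean(samples_ms)
    half = 1.96 * statistics.stdev(samples_ms) / len(samples_ms) ** 0.5
    return mean, (mean - half, mean + half)
```

In a production-style setup, `fn` would be the cached recommendation call and `queries` a set of randomized query times, matching the measurement scope defined above.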
ExpSmoothing Implementation Note: The anomalous latency for ExpSmoothing (144,365 ms) results from the statsmodels implementation attempting to fit separate ARIMA models for each of 283,932 items without vectorization. This approach is fundamentally unsuitable for high-dimensional item spaces. We include it for completeness, but note that production-ready implementations would require algorithmic redesign (e.g., clustering items or using shared parameters) rather than per-item models.
Statistical significance: Results are averaged over 30 evaluation windows. Due to the deterministic nature of the temporal popularity methods and the fixed dataset, variance primarily stems from temporal distribution shifts rather than algorithmic randomness.
5.1.3. Setup of Sequential Model Comparison
This study benchmarks nine time-series recommendation algorithms to assess their performance under extreme data sparsity:
RW (Random-Weighted): A weighted random baseline where items are sampled proportionally to their historical popularity. This serves as a lower bound following the evaluation framework of Cremonesi et al. [4] for top-N recommendation tasks.
TAP (Time-Agnostic Popularity): Global popularity recommendation that ignores temporal dynamics. This implements the most basic collaborative filtering baseline as established in the top-N evaluation framework [4].
TDP (Temporal-Decay Popularity): An extension of time-aware collaborative filtering with exponential decay, inspired by Koren’s temporal dynamics framework [7]. Applies an exponential decay factor to down-weight older interactions.
CP–Hour (Conditional Popularity—Hourly): Extends the temporal binning approach from Koren [7] by partitioning time into segments defined by month × day-of-week × 4-h blocks, maintaining separate popularity distributions per segment.
CP–Week (Conditional Popularity—Weekly): A finer-grained variant modeling periodic patterns at 15 min resolution over the week cycle, extending context-aware splitting methods [23] to temporal dimensions.
FS (Fourier-Seasonal): Seasonal regression using Fourier basis functions with five harmonics for daily and weekly patterns, following the harmonic regression framework in Bayesian forecasting [24].
HW (Holt–Winters): The classical exponential smoothing method [25] that jointly models level, trend, and multiplicative seasonality components for time-series forecasting.
STL-AR: Combines STL (seasonal and trend decomposition using Loess) [26] with AR(5) autoregression on residuals, leveraging robust local regression for seasonal extraction.
STL-GBM: Enhances STL decomposition [26] by replacing linear AR with gradient boosting machines to capture nonlinear patterns in the residual component.
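As an illustration of the simplest temporal baseline family above, a TDP-style scorer can be sketched as follows. The half-life value is an illustrative assumption; the paper's actual decay factor is not reproduced here.

```python
import math
from datetime import datetime

def decayed_popularity(events, now, half_life_days=30.0):
    """TDP-style sketch: each interaction contributes exp(-lambda * age),
    with lambda derived from an assumed half-life (illustrative value).
    Returns items ranked by decayed popularity, most popular first."""
    lam = math.log(2) / half_life_days
    scores = {}
    for item, ts in events:
        age_days = (now - ts).total_seconds() / 86400.0
        scores[item] = scores.get(item, 0.0) + math.exp(-lam * age_days)
    return sorted(scores, key=scores.get, reverse=True)
```

With a 30-day half-life, a recently trending item overtakes an item with more, but year-old, interactions, which is exactly the behavior that distinguishes TDP from the time-agnostic TAP baseline.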
We evaluate all algorithms using a comprehensive set of metrics following established recommendation evaluation protocols [4,5]:
Ranking quality: NDCG@K, MRR@K, MAP@K—emphasizing top-position accuracy, which is critical for user-facing recommendation.
Set-based accuracy: Precision@K, Recall@K—measuring overlap between recommended and actually purchased items within each temporal window.
Rank correlation: Spearman’s ρ and Kendall’s τ—assessing agreement with ground-truth item ordering.
Pointwise error: MAE and RMSE—evaluating intensity prediction accuracy.
Beyond-accuracy: Coverage (catalog spread), Intra-List Diversity (ILD), and novelty (popularity-adjusted surprise).
Composite score: A weighted sum of per-metric normalized values for overall ranking.
Standard definitions of these metrics are provided in [5]; we adopt identical formulations to ensure cross-study comparability. Since SWWP generates global temporal popularity rankings rather than personalized lists, user-centric metrics (MRR, MAP) measure how well a single global ranking serves diverse user needs, following the established protocol for non-personalized systems [4].
Algorithm Naming Consistency. For clarity, we use the following mapping between conceptual names and implementation labels: TAP → GlobalPop, TDP → GlobalPop–Decay, CP–Hour → POP–Segment, CP–Week → POP–minuteOfWeek, FS → Fourier, HW → ExpSmoothing.
Leveraging the above range of model families and multi-dimensional analyses, we conduct a comprehensive evaluation of SWWP’s performance and characteristics in time-series recommender systems.
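For reference, the primary ranking metric can be computed as in the common binary-relevance formulation below; the paper adopts the formulations of [5], so this sketch is the textbook variant, not necessarily identical in every detail.

```python
import math

def ndcg_at_k(ranked, relevant, k=20):
    """Binary-relevance NDCG@K: DCG of the top-K recommended list
    divided by the ideal DCG (all relevant items ranked first)."""
    dcg = sum(1.0 / math.log2(pos + 2)
              for pos, item in enumerate(ranked[:k]) if item in relevant)
    ideal = sum(1.0 / math.log2(pos + 2)
                for pos in range(min(k, len(relevant))))
    return dcg / ideal if ideal > 0 else 0.0
```

A relevant item at rank 1 yields NDCG 1.0 for a single-relevant-item window, while pushing it to rank 2 drops the score to 1/log2(3) ≈ 0.63, which is why the metric emphasizes top-position accuracy.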
5.1.4. Setup of Full-Framework Benchmark and Ablation Studies
Beyond the analysis of time-series recommenders, we evaluate the hybrid framework that integrates SWWP with EEDPSO and conduct ablation studies. Critically, all algorithms share the identical unified fitness function (Equation (18)), whose temporal prediction component is computed from ground-truth future interactions within a 6 h temporal window. The temporal prediction term is scaled by a factor of 100 to balance its contribution against the base fitness term. This design ensures that performance differences arise solely from each algorithm’s search strategy rather than from different optimization objectives. We benchmark against the strongest previously reported baselines—EEDPSO, DE, and GA—to evaluate whether SWWP-EEDPSO’s temporally informed search strategy provides meaningful improvements in discovering solutions with high temporal prediction accuracy.
The selection of comparison algorithms is motivated by their established positions in the metaheuristic optimization landscape. Differential Evolution (DE) and Genetic Algorithms (GAs) represent two fundamental paradigms in evolutionary computation that have demonstrated consistent performance across diverse optimization problems. DE, introduced by Storn and Price, excels at continuous optimization through its unique differential mutation operator, while GA, pioneered by Holland, provides robust exploration through crossover and mutation operations. In the original EEDPSO research, both DE and GA showed stable performance with competitive fitness values, making them ideal benchmarks for evaluating the impact of temporal integration. The inclusion of vanilla EEDPSO serves as an ablation baseline, enabling us to quantify the precise fitness trade-off introduced by SWWP integration.
Table 2 summarizes the configurations used across all algorithms. Our setup is anchored to the detailed hyperparameter analysis reported for EEDPSO [5]: the original study conducts Optuna-based optimization over 100 trials using a Bayesian sampler with pruning, selects the best-performing combination, and applies the resulting settings according to dataset size. We adopt these Optuna-identified parameters for the PSO family and keep identical swarm coefficients (inertia and both acceleration coefficients) for EEDPSO and SWWP-EEDPSO to isolate the contribution of SWWP. Table 2 reflects these unified coefficients for both PSO-based algorithms. For DE and GA, we follow standard discrete optimization practice and align population sizes and iteration budgets with the PSO configurations so that all methods operate under comparable compute budgets. This design ties our configurations directly to a published, search-based protocol and strengthens the credibility and reproducibility of our comparisons.
Beyond the tables, we use convergence plots and temporal performance charts to comprehensively assess the performance and characteristics of the SWWP-EEDPSO hybrid framework. In this module, our primary focus is the change in fitness and the stability of performance across different temporal contexts.
5.2. Result of Sequential Model Comparison
5.2.1. Recommendation Accuracy Analysis
Table 3 presents comprehensive performance metrics across ten temporal recommendation algorithms under extreme sparsity conditions. SWWP achieves the highest NDCG@20 score of 0.245, representing a 12.9% improvement over the second-best performer GlobalPop–Decay (0.217) and more than doubling the performance of traditional methods like GlobalPop (0.120). This superiority extends across multiple accuracy metrics, with SWWP achieving Precision@20 of 0.340 and Recall@20 of 0.155, the highest among all evaluated methods.
Figure 2 visualizes the NDCG@20 distribution through a horizontal bar chart, revealing a clear performance hierarchy. Three distinct tiers emerge: (1) a high-performance tier led by SWWP (0.245) and GlobalPop–Decay (0.217); (2) a middle tier including POP–Segment (0.191) and ExpSmoothing (0.194); and (3) a lower tier comprising complex forecasting methods like STL-AR (0.112) and STL-GBM (0.087). The random baseline achieves only 0.014, confirming that all temporal methods provide substantial value over chance performance.
The precision–recall scatter plot in
Figure 3 further illustrates algorithm clustering in the accuracy space. SWWP occupies the optimal position in the upper-right quadrant, with the highest precision (0.340) and recall (0.155) values. A notable finding is the formation of three performance clusters: the high-performance cluster (SWWP, POP–Segment) with precision > 0.30; the moderate cluster (GlobalPop–Decay, ExpSmoothing) with precision ≈ 0.21; and the low-performance cluster (Random, POP–minuteOfWeek) with precision < 0.12. This clustering suggests fundamental differences in how algorithms handle temporal patterns under extreme sparsity.
5.2.2. Diversity and Coverage Trade-Offs
Figure 4 examines the relationship between coverage and diversity across algorithms. Despite the extreme sparsity (density < 0.0005%), meaningful differences emerge in how algorithms balance these objectives. SWWP achieves coverage of 0.0011 while maintaining moderate diversity (0.077), representing the best trade-off among temporal methods. In contrast, the Random baseline achieves the highest diversity (0.260) but with coverage of 0.0022, highlighting the exploration-exploitation dilemma.
The coverage analysis reveals an important pattern: popularity-based methods (GlobalPop, GlobalPop–Decay) achieve minimal coverage (0.0001) due to their focus on repeatedly recommending a small set of popular items. Time-segmented approaches (POP–Segment, POP–minuteOfWeek) improve coverage to 0.0006–0.0007 by varying recommendations across temporal contexts. SWWP’s sliding-window approach achieves the best coverage among non-random methods, suggesting that local temporal patterns provide better item discovery than global popularity metrics.
5.2.3. Computational Efficiency
Figure 5 presents the computational latency analysis, revealing a sharp bimodal distribution in algorithm efficiency. The fast group, including SWWP (0.52 ms), GlobalPop (0.08 ms), and GlobalPop–Decay (0.08 ms), maintains sub-millisecond response times suitable for real-time deployment. The slow group exhibits dramatically higher latencies—Fourier (105.39 ms), STL-AR (2196.83 ms), and notably ExpSmoothing (144,365 ms*)—making them impractical for production environments.
SWWP’s 0.52 ms latency represents an optimal balance between sophistication and efficiency. While slightly slower than simple popularity methods, it remains well within acceptable bounds for real-time systems while providing significantly better recommendation quality. The extreme latency of ExpSmoothing (marked with an asterisk in
Table 3) suggests implementation or scalability issues under high-dimensional sparse data.
5.2.4. Prediction Error Analysis
Figure 6 displays the RMSE distribution across algorithms. GlobalPop–Decay achieves the lowest RMSE (0.922), followed by Fourier (0.982) and GlobalPop (1.066). SWWP shows moderate error (1.142), while STL-based methods exhibit the highest errors (STL-AR: 1.352, STL-GBM: 1.346). This pattern suggests that simpler time-decay models better capture temporal dynamics under extreme sparsity, where complex models may overfit to noise.
The MAE results in
Table 3 corroborate this finding, with GlobalPop–Decay (0.802) and Fourier (0.867) achieving the lowest absolute errors. Interestingly, SWWP’s higher prediction error (MAE: 1.012) does not translate to poor recommendation quality, as evidenced by its superior ranking metrics. This discrepancy indicates that SWWP optimizes for ranking quality rather than pointwise prediction accuracy.
5.2.5. Ranking Quality Assessment
The ranking correlation metrics in
Table 3 reveal interesting patterns. GlobalPop–Decay achieves the highest Spearman correlation (0.408) and Kendall’s tau (0.362), indicating strong agreement with ground-truth rankings. POP–minuteOfWeek also shows reasonable correlation (Spearman: 0.314, Kendall: 0.298). Surprisingly, SWWP exhibits near-zero correlation (Spearman: 0.017, Kendall: 0.010), suggesting that its strength lies in identifying relevant items rather than predicting their exact ordering.
As shown in
Figure 7, this apparent weakness in ranking correlation is compensated by SWWP’s exceptional MRR@20 (0.547) and MAP@20 (0.083) scores, both substantially higher than all competitors. The MRR result indicates that SWWP excels at placing at least one highly relevant item at the top of recommendations, critical for user satisfaction in practical systems.
5.2.6. Error Distribution Characteristics
Figure 8 presents box plots analyzing error distribution characteristics. GlobalPop and GlobalPop–Decay exhibit the most compact distributions with IQR ≈ 0.5, indicating high prediction stability. SWWP shows moderate variability (IQR ≈ 1.0), balancing between consistency and adaptability to temporal changes. The random baseline displays the largest spread (IQR > 2.0) with numerous outliers, confirming its unsuitability for sparse recommendation scenarios.
The presence of outliers across most methods suggests that extreme sparsity creates challenging edge cases where temporal patterns break down. SWWP’s moderate outlier count indicates robustness to these edge cases while maintaining sensitivity to temporal variations.
5.2.7. Comprehensive Performance Assessment
Figure 9 synthesizes all metrics into composite scores for final ranking. SWWP achieves the highest composite score (0.861), followed by GlobalPop–Decay (0.762) and POP–Segment (0.754). This comprehensive assessment confirms SWWP’s superiority across multiple dimensions despite its weaker ranking correlation. These results address
RQ1 by demonstrating that the dual-scale sliding-window model effectively captures both short-term trends and long-term periodicities, achieving the highest accuracy across all ranking metrics under extreme data sparsity.
The composite analysis reveals three key insights. First, temporal awareness is crucial under extreme sparsity—all time-aware methods significantly outperform the time-agnostic GlobalPop baseline. Second, complexity does not guarantee performance—sophisticated methods like STL-GBM (composite: 0.473) underperform simpler temporal approaches. Third, the optimal algorithm (SWWP) successfully balances multiple objectives: high accuracy, reasonable diversity, sub-millisecond latency, and robust performance across temporal contexts.
These results validate our hypothesis that sliding-window temporal modeling with hierarchical fallback strategies effectively addresses the challenges of extreme sparsity while maintaining practical deployment feasibility. The consistent superiority of SWWP across diverse evaluation metrics, combined with its computational efficiency, establishes it as the preferred method for temporal recommendation in severely sparse environments.
5.3. Result of Full-Framework Benchmark and Ablation Studies
In Table 4 and Figure 10, we analyze the convergence behavior of the hybrid framework. The convergence curves reveal distinct optimization patterns across algorithms. SWWP-EEDPSO achieves the highest mean fitness of 3384.26, surpassing vanilla EEDPSO (3194.14) by 5.95%. This improvement demonstrates that temporally informed initialization and guided position updates steer the search toward higher-quality regions of the solution space, where temporal relevance and optimization quality reinforce each other.
The convergence analysis shows three distinct phases: EEDPSO exhibits rapid early convergence (iterations 1–100) with fitness jumping from 1600 to 3400, followed by gradual refinement; SWWP-EEDPSO demonstrates more moderate initial progress due to temporal constraints but maintains steady improvement throughout; and DE and GA show similar convergence patterns with slower initial progress and a plateau around iteration 300. Notably, EEDPSO converges fastest at iteration 380, while SWWP-EEDPSO requires 425 iterations, suggesting that temporal integration increases search complexity.
Figure 11 and Figure 12 illustrate performance across different temporal contexts. The temporal performance comparison reveals that SWWP-EEDPSO exhibits distinct temporal patterns with notable performance peaks. The framework demonstrates significant performance improvements during two critical periods: the evening hours (18:00–23:00) and the lunch period (12:00–14:00). These peaks align closely with typical e-commerce activity patterns, where user engagement and purchasing behaviors intensify. The hourly heatmap shows that SWWP-EEDPSO successfully captures and leverages these temporal hotspots, achieving up to 8% higher fitness during peak hours compared to off-peak periods. This temporal sensitivity validates the effectiveness of the SWWP integration in adapting recommendations to match real-world user behavior patterns throughout the day.
The experimental results reveal several key findings: SWWP-EEDPSO achieves a 5.95% fitness improvement over vanilla EEDPSO, demonstrating that temporal guidance enhances rather than constrains the optimization process; the hybrid framework maintains excellent stability (0.984 stability score) despite the added complexity of temporal integration; convergence analysis indicates that temporal integration requires moderately more iterations (425 vs. 380), reflecting the additional search effort needed to balance temporal relevance with optimization quality; and temporal performance analysis confirms that SWWP-EEDPSO successfully identifies and exploits temporal patterns, particularly during peak shopping hours.
Most importantly, despite the added complexity of temporal integration, SWWP-EEDPSO consistently and significantly outperforms both GA and DE algorithms across all metrics. With a mean fitness of 3384.26 compared to DE’s 3020.40 and GA’s 3024.00, SWWP-EEDPSO demonstrates superior optimization capability while maintaining lower variance (std = 55.68) than EEDPSO (std = 85.83) and competitive variance relative to both DE (std = 27.77) and GA (std = 43.13). This superior performance confirms that the SWWP-EEDPSO hybrid framework successfully balances temporal awareness with optimization quality, making it the optimal choice for deployment in time-sensitive recommendation environments. These results address RQ2 by confirming that deep integration of temporal modeling with evolutionary optimization yields significantly better solutions than applying EEDPSO without temporal guidance.
5.4. Ablation Study
To isolate the contribution of each integration mechanism in SWWP-EEDPSO, we conduct a systematic ablation study by disabling one mechanism at a time while keeping all other settings identical.
Table 5 reports the results averaged over 21 evaluation windows, with the total fitness decomposed into a base component (static recommendation quality) and a temporal component (the temporal prediction contribution, computed as the total fitness minus the base component).
Figure 13 visualizes the ablation results. Panel (a) compares Mass@K across variants, showing the temporal prediction quality achieved by each configuration. Panel (b) decomposes the total fitness into its base quality component and its temporal prediction component, revealing the trade-off between static recommendation quality and temporal relevance.
The ablation reveals a clear hierarchy of component importance. SWWP-guided position updates are the most critical mechanism: removing them while retaining SWWP initialization causes Mass@K to drop by 70.3% (from 0.284 to 0.084), indicating that temporal guidance during the evolutionary search process—not merely at initialization—is essential for discovering temporally relevant solutions. The temporal fitness component is the second most important factor, with its removal leading to a 67.9% Mass@K decline (to 0.091). Without temporal signals in the fitness evaluation, the optimizer has no incentive to favor items that align with future user behavior, even when SWWP guides the search space. SWWP initialization shows a modest independent contribution to Mass@K, as its effect is largely subsumed by the SWWP-guided updates over 500 iterations. Removing all SWWP guidance results in an 87.5% Mass@K decline (to 0.036), confirming that the integration mechanisms collectively drive temporal prediction quality.
An important insight emerges from the fitness decomposition. Variants without SWWP guidance achieve higher base fitness (e.g., 3792.6 for “w/o All SWWP” vs. 3208.2 for the full model). This occurs because an unconstrained search over the full item catalog can more freely optimize diversity and coverage. However, the SWWP-constrained search achieves an 8× improvement in temporal prediction quality (Mass@K: 0.284 vs. 0.036). This trade-off is fundamental: by directing the evolutionary search toward temporally relevant regions of the solution space, SWWP-EEDPSO sacrifices some static recommendation quality in exchange for substantially better alignment with future user behavior—precisely the design goal of a temporal recommender system.
These results address RQ3 by demonstrating that (1) SWWP-guided updates and temporal fitness are jointly critical for temporal prediction, (2) the purchase heat initialization provides a complementary but secondary benefit, and (3) the deep integration achieves a meaningful trade-off between base recommendation quality and temporal relevance that would be impossible with either component alone.
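The one-mechanism-at-a-time protocol described in Section 5.4 can be sketched as a small configuration driver. The flag names below (`swwp_init`, `swwp_updates`, `temporal_fitness`) are hypothetical stand-ins for the three integration mechanisms, and the pipeline itself is elided; only the variant-generation logic is shown.

```python
# Schematic ablation driver: each variant disables exactly one integration
# mechanism while keeping all other settings identical; a final variant
# removes all SWWP guidance. Flag names are hypothetical.

FULL_CONFIG = {
    "swwp_init": True,        # purchase-heat-guided initialization
    "swwp_updates": True,     # SWWP-guided position updates
    "temporal_fitness": True  # temporal term in the fitness function
}

def ablation_variants(full_config):
    """Yield (variant_name, config) pairs: one flag off per variant,
    plus an all-off variant."""
    for key in full_config:
        yield f"w/o {key}", dict(full_config, **{key: False})
    yield "w/o all SWWP", {k: False for k in full_config}

variants = dict(ablation_variants(FULL_CONFIG))
```

Running the full framework once per variant under identical seeds and windows is what allows Table 5 to attribute each Mass@K drop to a single mechanism.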
6. Conclusions
This paper presented a hybrid non-personalized temporal recommendation framework that successfully addresses the dual challenges of extreme data sparsity and temporal dynamics in popularity-based recommendation scenarios. Through the deep integration of Sliding-Window Weighted Popularity (SWWP) with Elite Evolutionary Discrete Particle Swarm Optimization (EEDPSO), we demonstrated that temporal awareness and optimization quality need not be mutually exclusive goals. From a theoretical perspective, we established that the temporal-constrained recommendation problem is NP-hard through reduction from the Maximum Coverage Problem. This complexity result not only justifies the use of metaheuristic approaches but also highlights the fundamental computational challenges in balancing temporal relevance with optimization quality. The proof demonstrates that even with simplified objectives, finding optimal temporal recommendations is intractable in the worst case unless P = NP, motivating the metaheuristic approach adopted in this work.
Our SWWP model introduces several key innovations for temporal recommendation. By incorporating multi-dimensional temporal features—time segments, weekdays, and months—alongside exponential decay mechanisms, SWWP captures complex temporal patterns that traditional popularity-based methods overlook. The hierarchical fallback strategy ensures robustness even when specific temporal combinations lack sufficient data, progressively relaxing constraints from full feature matching to global popularity. Most notably, the purchase heat indicator quantifies temporal activity levels, providing a principled mechanism for balancing temporal and exploratory elements in the recommendation process.
The deep integration between SWWP and EEDPSO represents a significant advancement in hybrid recommendation architectures. Rather than treating temporal modeling and optimization as separate stages, our framework weaves temporal insights throughout the optimization process. The purchase heat indicator guides particle initialization, determining the proportion of temporally popular items (up to 80% during peak periods). Differentiated position updates maintain temporal relevance for SWWP-originated positions while allowing exploration elsewhere. This multi-level integration achieves what neither component could accomplish alone: temporally aware recommendations with strong optimization quality.
Experimental results on Amazon Reviews Data (2018) validate our approach under extreme sparsity conditions (density < 0.0005%). SWWP achieved NDCG@20 = 0.245, outperforming nine temporal baselines, including sophisticated methods like STL-AR (0.112) and Global Popularity–Decay (0.217). The 13% improvement over the strongest baseline demonstrates that our sliding-window approach with conditional filtering effectively captures temporal dynamics. Equally important, SWWP maintains sub-millisecond query latency (0.52 ms) after offline precomputation, making it viable for deployment in production systems that employ periodic cache refresh—a standard architectural pattern in industrial recommender systems.
The SWWP-EEDPSO hybrid framework reveals important insights about the role of search strategy in temporal recommendation optimization. Under a unified fitness formulation that incorporates temporal prediction accuracy, all algorithms optimize identical objectives, yet SWWP-EEDPSO achieves 5.95% higher mean fitness (3384.26 vs. 3194.14) compared to vanilla EEDPSO. More importantly, SWWP-EEDPSO demonstrates substantially superior temporal prediction performance, successfully identifying items that align with actual future user interactions. In contrast, baseline algorithms (EEDPSO, DE, GA) achieve near-random temporal prediction accuracy, confirming that without temporal guidance in the search process, optimization algorithms cannot discover temporally relevant recommendations even when explicitly evaluated on temporal metrics. SWWP-EEDPSO also exhibits lower variance (std = 55.68 vs. 85.83 for EEDPSO), indicating more stable optimization behavior. The temporal performance analysis revealed pronounced peaks during lunch hours (12:00–14:00) and evening periods (18:00–23:00), achieving up to 8% higher fitness during these high-activity windows.
Our findings have several practical implications for deploying recommendation systems in resource-constrained environments. First, the success of SWWP demonstrates that lightweight temporal methods can outperform complex forecasting models under extreme sparsity, where sophisticated approaches like exponential smoothing and STL decomposition struggle due to insufficient training data. Second, the purchase heat indicator provides an interpretable mechanism for system operators to understand and control the balance between temporal relevance and exploration. Third, the hierarchical caching strategy enables millisecond-level response times while maintaining temporal sophistication, which is crucial for user experience in production systems.
This work also contributes to the broader understanding of hybrid recommendation architectures. The deep integration mechanism we developed—spanning initialization, evolution, and evaluation—provides a template for combining different optimization paradigms. The key insight is that effective hybridization requires more than sequential combination or weighted averaging; it demands algorithm-level integration where each component’s strengths guide the other’s operation. The purchase heat indicator exemplifies this principle, serving as a bridge that allows temporal patterns to influence swarm dynamics without overwhelming the optimization process.
Our framework has several limitations that should be acknowledged.
First, the non-personalized design inherently cannot capture individual user preferences; it is most effective as a trending/candidate generation component rather than a standalone personalized recommender. Extending the purchase heat indicator to incorporate user-specific temporal patterns is a natural but non-trivial extension.
Second, the discrete heat factors (Equations (14)–(16)) were derived from the Amazon Reviews dataset and may require recalibration for domains with different temporal activity profiles (e.g., news, entertainment).
Third, while our experiments demonstrate effectiveness under extreme sparsity (density < 0.0005%), the relative advantage of SWWP over personalized methods may diminish in denser datasets where user-level models have sufficient training signal.
Fourth, the current framework processes temporal features at fixed granularities; adaptive window sizing that responds to local data density could further improve robustness.
Fifth, scalability to datasets with significantly more items (>10 M) has not been evaluated, and the precomputation cost would scale linearly with catalog size. These limitations define clear boundaries for the applicability of our approach and motivate the future research directions discussed below.
Looking forward, several avenues warrant further investigation. First, extending the temporal modeling to capture user-specific temporal patterns could improve personalization while maintaining computational efficiency. Second, investigating adaptive window sizes that respond to data density could enhance performance across different sparsity levels. Third, incorporating real-time feedback to dynamically adjust the purchase heat calculation could improve responsiveness to sudden shifts in user behavior. Finally, exploring the framework’s performance on other domains with strong temporal characteristics, such as news recommendation or seasonal product promotion, would validate its generalizability.
In conclusion, the SWWP-EEDPSO framework demonstrates that careful integration of temporal modeling with evolutionary optimization can yield superior recommendation quality even under extreme sparsity. By introducing the purchase heat indicator and implementing deep integration mechanisms, we achieved a system that balances temporal relevance, optimization quality, and computational efficiency. As recommender systems continue to face challenges from growing catalogs and sparse interactions, approaches that elegantly combine multiple optimization paradigms will become increasingly valuable for delivering timely, relevant recommendations at scale.