Article

PAD-MPFN: Dynamic Fusion with Popularity Decay for News Recommendation

1 School of Computer Science, Minnan Normal University, Zhangzhou 363000, China
2 Key Laboratory of Data Science and Intelligence Application, Fujian Province University, Zhangzhou 363000, China
3 Research Institute of Embodied Interaction Science and Technology, Minnan Normal University, Zhangzhou 363000, China
4 Department of Automation, Xiamen University, Xiamen 361000, China
5 Xiamen Airlines Co., Ltd., Xiamen 361006, China
* Authors to whom correspondence should be addressed.
These authors contributed equally to this work.
Electronics 2025, 14(15), 3057; https://doi.org/10.3390/electronics14153057
Submission received: 20 June 2025 / Revised: 26 July 2025 / Accepted: 29 July 2025 / Published: 30 July 2025
(This article belongs to the Special Issue Data-Driven Intelligence in Autonomous Systems)

Abstract

News recommendation systems must simultaneously address multiple challenges, including dynamic user interest modeling, nonlinear popularity patterns, and recommendation diversity in cold-start scenarios. We present a Popularity-Aware Dynamic Multi-Perspective Fusion Network (PAD-MPFN) that integrates three key components: adaptive subspace projection for multi-source interest fusion, logarithmic time-decay factors for popularity bias mitigation, and dynamic gating mechanisms for personalized recommendation weighting. The framework combines sequential behavior analysis, social graph propagation, and temporal popularity modeling within a unified architecture. Experimental results on the MIND dataset, an open-source benchmark built from MSN News, demonstrate that PAD-MPFN outperforms existing methods in both overall recommendation performance and cold-start scenarios while effectively alleviating information overload. This study offers a new solution for dynamic interest modeling and diverse recommendation.

1. Introduction

The primary goal of news recommendation systems is to enhance the accuracy of predicting the relevance between news articles and users and to subsequently recommend news articles that closely match user interests, thereby fulfilling their information needs. Existing news recommendation approaches predominantly rely on calculating matching scores between user interest representations and candidate news representations to recommend news items [1,2]. User interest representations are typically derived by user encoders based on users’ historical behaviors [3,4,5], while candidate news representations are extracted by news encoders from the textual content of the news articles [6,7,8].
Existing news recommendation methods face several critical challenges. First, user interests are inherently dynamic and multifaceted, incorporating both sequential patterns derived from individual historical behaviors and collaborative signals from the broader user community. Traditional single-interest modeling techniques often struggle to adequately capture this global information, resulting in suboptimal matching of candidate news items [9,10]. Second, the dissemination of news follows a nonlinear saturation pattern [11]. Previous approaches have tended to associate click-through rates linearly with popularity, leading to an overemphasis on popular news and the continuous promotion of highly exposed content, while also neglecting the temporal decay of clicks [12,13]. Third, personalized news recommendation methods are severely hindered by the cold-start problem, particularly for users with sparse behavioral data, which makes it challenging to represent their interests [14]. Additionally, these methods often recommend news that is similar to what users have already consumed, potentially degrading user experience and limiting the discovery of new information [15,16].
The motivation behind this work stems from the observation that news popularity often conveys valuable information, as popular news articles tend to attract widespread readership and discussion. However, it is crucial to avoid over-promoting popular news at the expense of diversity. Therefore, we propose a news recommendation method based on a Popularity-Aware Dynamic Multi-Perspective Fusion Network, termed PAD-MPFN, to address temporal popularity bias in news recommendation systems. Our approach first models user interests from multiple perspectives by aggregating browsing history into distinct subspaces using subspace projection, and then fuses these representations with candidate news representations to compute the final matching scores. Additionally, we logarithmically map click counts to calculate news popularity and incorporate this metric into the fused matching scores, thereby capturing users’ multi-view interest representations. Our contributions are as follows:
  • We introduce a novel fusion framework that jointly models user interests, social influences, and temporal popularity patterns through adaptive subspace projection. This integrated approach achieves notable improvements in both recommendation accuracy and content diversity, effectively capturing the complex interplay between personal preferences and collective behaviors compared to existing methods.
  • We develop a logarithmic time-decay mechanism that dynamically adjusts the recommendation weights between trending and long-tail content. This innovation significantly reduces popularity bias while maintaining recommendation relevance, addressing a critical limitation in current news recommendation systems.
  • Comprehensive experiments conducted on the MIND benchmark dataset [17] demonstrate the superior performance of our model, particularly in cold-start scenarios. The results consistently show advantages over state-of-the-art baselines in both standard and cold-start evaluation settings. (The implementation code is publicly available at https://github.com/1dwy1/PAD-MPFN, accessed on 30 July 2025.)
The remainder of this article is organized as follows: Section 2 reviews the related work on news recommendation. Section 3 provides the problem formulation for news recommendation and defines the notations used in this paper. Section 4 offers a detailed description of our proposed recommendation method. Section 5 discusses the training procedure of our method, particularly focusing on the negative sampling strategy, and provides a computational complexity analysis. Section 6 outlines the experimental settings. Section 7 presents the relevant experimental results. Finally, Section 8 concludes the article, explores the limitations of our method, and highlights possible future research directions.

2. Related Work

This section reviews existing approaches to news recommendation, which can be categorized into three main paradigms: sequence-based, graph-based, and popularity-based methods. These paradigms reflect the diverse strategies employed to address challenges such as dynamic user interests, complex relational dependencies, and cold-start scenarios. By examining these approaches, we identify gaps that our work aims to fill, particularly in jointly modeling multi-perspective user interests and temporal popularity patterns while mitigating bias.

2.1. Sequence-Based News Recommendation

Sequence-based methods model the dynamic evolution of user interests by analyzing reading history sequences, leveraging deep learning to capture semantic representations of news and users. For instance, An et al. [1] employed gated recurrent units (GRUs) [18] to integrate long-term and short-term user interests, enhancing the model’s adaptability to changing preferences. Wu et al. [19] utilized convolutional neural networks (CNNs) [20] to learn unified news representations from headlines, body texts, and categories, capturing multidimensional news perspectives. Qi et al. [6] constructed a hierarchical interest tree structure to model interests at varying granularities, improving interest localization. Wang et al. [4] introduced frequency-aware comparative modeling to identify essential user characteristics, enhancing robustness against noisy data. Yuan et al. [21] combined Bi-LSTM [22] and attention mechanisms to capture temporal dynamics in user behavior, avoiding content homogenization. Jiang et al. [23] introduced a novel news recommendation strategy by modeling candidate-aware long- and short-term preferences. While these methods effectively model sequential patterns, they often overlook global collaborative signals and long-term popularity trends.

2.2. Graph-Based News Recommendation

Graph-based methods exploit relational structures to capture interactions between users, news, and entities, leveraging graph neural networks (GNNs) to encode rich contextual information. Qi et al. [3] employed a knowledge graph collaborative coding approach to model interactions between clicked and candidate news, enhancing recommendation personalization. Mao et al. [2] introduced an interactive attention mechanism to facilitate feature interaction between news and user graphs, capturing high-dimensional dependencies. Yang et al. [24] combined a global click graph with a gated GNN to improve global news representation. Sun et al. [8] constructed a news knowledge graph, transforming information into contextual cues to enrich the knowledge base and design personalized cue sets for accurate interest matching. Li et al. [25] proposed the novel framework DisCo, which focuses on learning more detailed disentangled user intent representations enhanced by interaction graphs to filter out irrelevant information. While graph-based methods excel at capturing relational dependencies, they may struggle to incorporate temporal popularity dynamics and address cold-start issues effectively.

2.3. Popularity-Based News Recommendation

Popularity-based methods leverage news popularity as a supplementary signal to mitigate cold-start problems, particularly for users with sparse behavioral data. Liu et al. [26] incorporated popularity information into a multitask learning framework using knowledge graph-enriched entities. Wang et al. [27] predicted news popularity scores based on user clicking behavior and integrated them into click-through rate predictions. Wu et al. [28] proposed a posterior attention recurrent point process to model user interactions for question popularity prediction. Ding et al. [29] predicted news popularity by combining semantic retrieval and click record analysis. While these methods address cold-start challenges, they often treat popularity as a static or linearly additive signal, neglecting its nonlinear saturation and temporal decay.

2.4. Summary and Research Gap

Existing works have made significant progress in modeling user interests and leveraging popularity signals. However, they typically address these challenges in isolation, failing to jointly capture multi-perspective user interests, temporal popularity dynamics, and collaborative signals. This gap motivates our work, which proposes a Popularity-Aware Dynamic Multi-Perspective Fusion Network (PAD-MPFN) to integrate these dimensions. By modeling user interests through adaptive subspace projection, incorporating logarithmic time-decay popularity mechanisms, and fusing multi-view representations, PAD-MPFN aims to improve recommendation accuracy, diversity, and robustness to cold-start scenarios.

3. Problem Formulation for News Recommendation

This section establishes the formal framework for our news recommendation system, introducing key mathematical notations and conceptual foundations. As summarized in Table 1, we define a comprehensive symbol system to maintain notational consistency throughout our methodology. The recommendation task is formally characterized as follows: given a user $u$'s historical click sequence $\mathcal{N}_u = \{n_i^u\}_{i=1}^{|\mathcal{N}_u|}$, where each news article $n_i \in \mathcal{N}$ contains a word sequence $W_{n_i} = \{w_j\}_{j=1}^{|W_{n_i}|}$, we aim to predict the click probability $\hat{y}$ for a candidate news article $n_c \in \mathcal{N}_c$. The solution involves multifaceted modeling of (1) news content through encoded representations, (2) dynamic user interests from sequential and graph patterns, and (3) temporal popularity effects, all systematically organized in Table 1 for reference.

4. Popularity-Aware Dynamic Multi-Perspective Fusion Network

In this section, we propose PAD-MPFN, a Popularity-Aware Dynamic Multi-Perspective Fusion Network, to address temporal popularity bias in news recommendation systems. PAD-MPFN aims to tackle the major challenges in news recommendation, namely the dynamic and multi-source nature of user interests, the nonlinear propagation patterns of news popularity, and the lack of recommendation diversity caused by the user cold-start problem, as shown in Figure 1. PAD-MPFN first encodes users' sequential and graph interests from their historical records and employs a gated neural network to fuse the sequential interest with the graph interest, thereby extracting users' comprehensive interests. It then leverages similarity calculations to obtain the interests of similar neighbors and proposes a popularity prediction method based on news dissemination distribution and time saturation factors. Finally, it integrates users' interests, neighbors' interests, and news popularity into a collaborative representation to match candidate news. To ensure clarity and consistency in the use of terms and notations in this paper, Table 1 summarizes the main notations used in our study and their meanings.

4.1. News Encoder (NE)

The news encoder $\mathrm{NE}(\cdot)$ generates discriminative news representations by processing textual content through a multi-level architecture. Initial word embeddings $\mathbf{x}_{w_j} = \mathrm{GloVe}(w_j)$, $w_j \in W_{n_i}$, of news article $n_i \in \mathcal{N}$ [30] are transformed via multi-head self-attention (MSA) [31] and are calculated as follows:
$\ddot{\mathbf{x}}_{w_j} = \mathrm{MSA}(\mathbf{W}_Q \mathbf{X}, \mathbf{W}_K \mathbf{X}, \mathbf{W}_V \mathbf{X}),$   (1)
where $\mathbf{X} = [\mathbf{x}_{w_1}, \ldots, \mathbf{x}_{w_L}]$ is the word embedding matrix and $\{\mathbf{W}_Q, \mathbf{W}_K, \mathbf{W}_V\}$ are learnable projection matrices. The final news representation $\mathbf{r}_{n_i}$ aggregates the contextualized embeddings through content-aware attention:
$\mathbf{r}_{n_i} = \sum_{w_j \in W_{n_i}} \kappa_{w_j} \ddot{\mathbf{x}}_{w_j}, \quad \kappa_{w_j} = \frac{\exp\big(\mathbf{q}^{\top} \tanh(\mathbf{W}_1 \ddot{\mathbf{x}}_{w_j})\big)}{\sum_{w_k \in W_{n_i}} \exp\big(\mathbf{q}^{\top} \tanh(\mathbf{W}_1 \ddot{\mathbf{x}}_{w_k})\big)},$   (2)
with trainable parameters $\mathbf{W}_1$ and query vector $\mathbf{q}$. The representation $\mathbf{r}_{n_c}$ of a candidate news article $n_c$ follows the identical encoding procedure. This architecture captures both local semantic patterns and global contextual relationships within news texts.
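For concreteness, the following is a minimal PyTorch sketch of NE(·): multi-head self-attention over GloVe word embeddings followed by the additive attention pooling of Equation (2). The class name, the input projection layer, and the head count are illustrative assumptions rather than the exact released implementation; the 300- and 400-dimensional defaults follow the experiment settings in Section 6.2.

import torch
import torch.nn as nn

class NewsEncoder(nn.Module):
    # Sketch of NE(.): self-attention over word vectors + additive attention pooling.
    def __init__(self, emb_dim=300, out_dim=400, num_heads=4):
        super().__init__()
        self.proj = nn.Linear(emb_dim, out_dim)                  # maps GloVe vectors to model space (assumption)
        self.msa = nn.MultiheadAttention(out_dim, num_heads, batch_first=True)
        self.W1 = nn.Linear(out_dim, out_dim)
        self.q = nn.Linear(out_dim, 1, bias=False)               # query vector q of Equation (2)

    def forward(self, word_emb):                                  # word_emb: (batch, L, emb_dim)
        x = self.proj(word_emb)
        x, _ = self.msa(x, x, x)                                  # contextualized word vectors (Equation (1))
        scores = self.q(torch.tanh(self.W1(x)))                   # (batch, L, 1)
        kappa = torch.softmax(scores, dim=1)                      # attention weights kappa_{w_j}
        return (kappa * x).sum(dim=1)                             # news representation r_{n_i}: (batch, out_dim)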

4.2. Multi-Perspective User Interest Dynamic Modeling

This section presents our comprehensive approach for modeling user interests from multiple perspectives: (1) sequential patterns through LSTM networks with attention mechanisms [10], (2) structural relationships via graph neural networks on user–news interaction graphs, and (3) social influence through neighbor-based interest propagation. The integrated framework captures both individual preference dynamics and collective behavior patterns. Specifically, we process behavioral sequences to model temporal evolution, construct co-occurrence graphs to discover latent interest connections, and aggregate similar users’ preferences to enhance cold-start recommendations. This multi-view representation effectively addresses the diversity and sparsity challenges in user preference modeling.

4.2.1. User Long-Term Interest Encoder (ULIE)

The user long-term interest encoder (ULIE) models persistent preference patterns through a long short-term memory (LSTM) network and attention mechanisms, capturing the dynamic changes in user behavior at different time steps and thereby extracting richer sequential features. Given a user's news record history $\mathcal{N}_u$, we obtain the news embedding $\mathbf{r}_{n_i^u}$ for each news item $n_i^u \in \mathcal{N}_u$ in the records:
$\mathbf{r}_{n_i^u} = \mathrm{NE}(n_i^u), \quad n_i^u \in \mathcal{N}_u.$   (3)
Then we have
$\mathbf{R}_u = \{\mathbf{r}_{n_i^u}\}_{n_i^u \in \mathcal{N}_u}.$   (4)
We then apply an extraction module that utilizes an LSTM [33] network to model the sequence of user clicks and generates a feature representation of the user's sequence by adding an attention mechanism at each step, as shown in Equation (5):
$\mathbf{h}_i = \mathrm{LSTM}(\mathbf{h}_{i-1}, \mathbf{r}_{n_i^u}),$   (5)
where $\mathbf{h}_{i-1}$ is the hidden state of the previous time step and $\mathbf{r}_{n_i^u}$ is the news embedding of the current time step. The news sequence representation and the corresponding hidden states from the LSTM network are fed into the attention module to generate the user's sequential interest representation:
$\dot{\mathbf{u}} = \sum_{n_i^u \in \mathcal{N}_u} \alpha_{n_i^u} \big(\mathbf{W}_2 \mathbf{r}_{n_i^u} + \mathbf{W}_3 \mathbf{h}_i\big),$   (6)
where $\alpha_{n_i^u}$ is the sequential attention weight that measures the impact of the clicked news on the user's current interest at time step $i$; it is calculated as follows:
$\alpha_{n_i^u} = \frac{\exp\big(\mathbf{q}_2^{\top} \big(\tanh(\mathbf{W}_4 \mathbf{r}_{n_i^u}) + \tanh(\mathbf{W}_5 \mathbf{h}_i)\big)\big)}{\sum_{n_j^u \in \mathcal{N}_u} \exp\big(\mathbf{q}_2^{\top} \big(\tanh(\mathbf{W}_4 \mathbf{r}_{n_j^u}) + \tanh(\mathbf{W}_5 \mathbf{h}_j)\big)\big)},$   (7)
where $\mathbf{W}_2$, $\mathbf{W}_3$, $\mathbf{W}_4$, and $\mathbf{W}_5$ are trainable parameters and $\mathbf{q}_2$ is an attention query vector.
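The following is a minimal PyTorch sketch of this encoder, combining Equations (5)–(7); the class name and dimensionality are illustrative assumptions.

import torch
import torch.nn as nn

class ULIE(nn.Module):
    # Sketch of the long-term interest encoder: an LSTM over clicked-news embeddings
    # with additive attention over (r_i, h_i) pairs (Equations (5)-(7)).
    def __init__(self, dim=400):
        super().__init__()
        self.lstm = nn.LSTM(dim, dim, batch_first=True)
        self.W2 = nn.Linear(dim, dim)
        self.W3 = nn.Linear(dim, dim)
        self.W4 = nn.Linear(dim, dim)
        self.W5 = nn.Linear(dim, dim)
        self.q2 = nn.Linear(dim, 1, bias=False)

    def forward(self, r_hist):                        # r_hist: (batch, T, dim) clicked-news embeddings
        h, _ = self.lstm(r_hist)                      # hidden states h_i for every time step
        scores = self.q2(torch.tanh(self.W4(r_hist)) + torch.tanh(self.W5(h)))
        alpha = torch.softmax(scores, dim=1)          # sequential attention weights alpha_i
        return (alpha * (self.W2(r_hist) + self.W3(h))).sum(dim=1)   # sequential interest u_dot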

4.2.2. Graph-Based User Potential Interest Encoder (GUPIE)

We propose a graph-based user potential interest encoder (GUPIE) to model diverse user interests through news interaction graphs. The global news graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ represents news articles as nodes $\mathcal{V}$, with directed edges $\mathcal{E}$ weighted by the co-occurrence frequency $\xi$ in click histories. Following [24], we construct a personalized subgraph $\mathcal{G}_u = (\mathcal{V}_u, \mathcal{E}_u) \subseteq \mathcal{G}$ for each user $u$.
The encoder employs gated graph sequence neural networks (GGS-NNs) [32] with GRU mechanisms [18] to update the node embeddings:
$\dot{\mathbf{r}}_{n_i^u} = \mathrm{GGS\text{-}NNs}(\mathbf{r}_{n_i^u}, \mathcal{G}_u, \mathbf{R}_u).$   (8)
An attention mechanism then generates the graph interest representation using the following equation:
$\ddot{\mathbf{u}} = \sum_{n_i^u \in \mathcal{N}_u} \beta_{n_i^u} \dot{\mathbf{r}}_{n_i^u},$   (9)
where $\beta_{n_i^u}$ is the node attention weight, calculated as
$\beta_{n_i^u} = \frac{\exp\big(\mathbf{q}_3^{\top} \tanh(\mathbf{W}_6 \dot{\mathbf{r}}_{n_i^u})\big)}{\sum_{n_j^u \in \mathcal{N}_u} \exp\big(\mathbf{q}_3^{\top} \tanh(\mathbf{W}_6 \dot{\mathbf{r}}_{n_j^u})\big)},$   (10)
where $\mathbf{W}_6$ is a trainable parameter and $\mathbf{q}_3$ is an attention query vector. In this way, we capture both the context of the user's recent interactions and the changes in their fine-grained preferences. This architecture captures both local interaction patterns and global interest relationships through the combined graph structure and attention mechanism.
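A compact sketch of this encoder is given below. For brevity, the multi-step GGS-NN propagation of Equation (8) is stood in for by a single gated message-passing step over the user subgraph adjacency; this simplification, together with the class name and dimensions, is an assumption for illustration only.

import torch
import torch.nn as nn

class GUPIE(nn.Module):
    # Simplified sketch: one gated propagation step over the user's news subgraph
    # (standing in for GGS-NNs, Equation (8)) followed by attention pooling (Equations (9)-(10)).
    def __init__(self, dim=400):
        super().__init__()
        self.msg = nn.Linear(dim, dim)               # message transformation
        self.gru = nn.GRUCell(dim, dim)              # gated node-state update
        self.W6 = nn.Linear(dim, dim)
        self.q3 = nn.Linear(dim, 1, bias=False)

    def forward(self, node_emb, adj):
        # node_emb: (N, dim) embeddings of the user's clicked news (nodes of G_u)
        # adj:      (N, N) co-occurrence weights xi, assumed row-normalized
        messages = adj @ self.msg(node_emb)          # aggregate weighted neighbor messages
        node_upd = self.gru(messages, node_emb)      # gated update of node states
        scores = self.q3(torch.tanh(self.W6(node_upd)))
        beta = torch.softmax(scores, dim=0)          # node attention weights beta
        return (beta * node_upd).sum(dim=0)          # graph interest u_ddot: (dim,)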

4.2.3. Synergistic Attention Interest Integration (SAII)

To effectively combine the user's current interest (sequential interest) and graph-based interest (relational interest), we employ a synergistic attention mechanism that dynamically balances these two representations. This is achieved by learning adaptive weights, $\gamma_1$ and $\gamma_2$, which are derived from the interaction between the sequential interest $\dot{\mathbf{u}}$ and the graph-based interest $\ddot{\mathbf{u}}$. The final fused interest representation $\mathbf{u}$ is computed as a weighted sum of $\dot{\mathbf{u}}$ and $\ddot{\mathbf{u}}$:
$\mathbf{u} = \gamma_1 \dot{\mathbf{u}} + \gamma_2 \ddot{\mathbf{u}},$   (11)
where $\gamma_1$ and $\gamma_2$ are learned weights that determine the contributions of $\dot{\mathbf{u}}$ and $\ddot{\mathbf{u}}$, respectively. The weights $\gamma_1$ and $\gamma_2$ are computed using a softmax function to ensure that they sum to 1, allowing a probabilistic interpretation of their contributions. The weights are derived from the concatenation of $\dot{\mathbf{u}}$ and $\ddot{\mathbf{u}}$:
$(\gamma_1, \gamma_2) = \mathrm{softmax}\big(\mathbf{W}_{\gamma} [\dot{\mathbf{u}} \,\|\, \ddot{\mathbf{u}}] + \mathbf{b}_{\gamma}\big),$   (12)
where $\mathbf{W}_{\gamma}$ and $\mathbf{b}_{\gamma}$ are trainable parameters of a fully connected layer, $[\dot{\mathbf{u}} \,\|\, \ddot{\mathbf{u}}]$ denotes the concatenation of $\dot{\mathbf{u}}$ and $\ddot{\mathbf{u}}$, and the softmax function ensures that $\gamma_1 + \gamma_2 = 1$, enabling adaptive and interpretable fusion of the two interest representations.
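The fusion gate of Equations (11)–(12) amounts to a single fully connected layer followed by a softmax, as in the sketch below (class name and dimensionality are illustrative assumptions).

import torch
import torch.nn as nn

class SAII(nn.Module):
    # Sketch of synergistic attention interest integration (Equations (11)-(12)).
    def __init__(self, dim=400):
        super().__init__()
        self.gate = nn.Linear(2 * dim, 2)             # W_gamma, b_gamma

    def forward(self, u_seq, u_graph):                # both: (batch, dim)
        gamma = torch.softmax(self.gate(torch.cat([u_seq, u_graph], dim=-1)), dim=-1)
        g1, g2 = gamma[..., :1], gamma[..., 1:]       # gamma_1 + gamma_2 = 1
        return g1 * u_seq + g2 * u_graph              # fused interest u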

4.2.4. Neighbor-Based User Potential Interest Encoder (NUPIE)

The behavior of a user is not only influenced by their own interests but also by the interests of similar users. We measure the similarity of user interests by calculating the cosine similarity between the interest representation $\mathbf{u}$ of the target user and the interest representation $\mathbf{u}' \in \mathcal{U}$ of each other user. These similarity scores are then used to weight and aggregate the interest vectors of similar users, generating the neighbor interests for the target user. The cosine similarity is calculated as follows:
$\cos(\mathbf{u}, \mathbf{u}') = \frac{\mathbf{u}^{\top} \mathbf{u}'}{\|\mathbf{u}\| \, \|\mathbf{u}'\|},$   (13)
where $\mathbf{u}$ and $\mathbf{u}'$ denote the interest vectors of users $u$ and $u' \in \mathcal{U}$, respectively, and $\|\cdot\|$ denotes the L2 norm of a vector.
Based on the similarity calculation described above, we select the $k$ most similar neighbor interests for user $u$ from the user interest set $\mathcal{U}$, denoted as $\mathcal{U}_k^u$. In the Results Section, we conduct a top-$k$ experiment to evaluate the impact of varying $k$ on the model's performance, demonstrating the effectiveness of this neighbor selection strategy. We then aggregate the neighbors' interest vectors into a neighboring user representation $\bar{\mathbf{u}}$ through weighted summation:
$\bar{\mathbf{u}} = \frac{\sum_{\mathbf{u}' \in \mathcal{U}_k^u} \cos(\mathbf{u}, \mathbf{u}') \, \mathbf{u}'}{\sum_{\mathbf{u}' \in \mathcal{U}_k^u} \cos(\mathbf{u}, \mathbf{u}')},$   (14)
where $\sum_{\mathbf{u}' \in \mathcal{U}_k^u} \cos(\mathbf{u}, \mathbf{u}')$ is a normalization factor.
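The neighbor selection and aggregation of Equations (13)–(14) can be sketched in a few lines of PyTorch; the function name and the small numerical floor are assumptions for illustration.

import torch

def nupie(u, user_interests, k=5):
    # Sketch of neighbor-based interest aggregation (Equations (13)-(14)).
    # u: (dim,) target user's fused interest; user_interests: (num_users, dim),
    # assumed not to contain the target user itself.
    sims = torch.nn.functional.cosine_similarity(u.unsqueeze(0), user_interests, dim=-1)
    top_sims, top_idx = sims.topk(k)                            # k most similar neighbors U_k^u
    weights = top_sims / top_sims.sum().clamp_min(1e-8)         # normalization factor
    return (weights.unsqueeze(-1) * user_interests[top_idx]).sum(dim=0)   # neighbor interest u_bar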

4.3. Popularity-Normalized and Temporal-Aware User Interest Encoder (PNTUIE)

News click distributions exhibit characteristic long-tail patterns [34], where a small number of articles attract a large portion of clicks while the majority receive few. Temporal analysis further reveals that the rate at which clicks accumulate varies significantly across articles. To model these patterns, we employ first-order system response theory [35], which is commonly used to describe processes that exhibit rapid initial growth followed by saturation. A news article's cumulative clicks $C_{n_i}(t)$ evolve as $C_{n_i}(t) = \eta_{n_i}\big(1 - e^{-t/T_{n_i}}\big)$. Here, the saturation threshold $\eta_{n_i}$ quantifies ultimate popularity, while the time constant $T_{n_i}$ denotes the time required to reach 63% of the maximum clicks. Smaller $T_{n_i}$ values suggest faster viral spread, whereas larger values correspond to more gradual popularity growth. This parametric formulation jointly quantifies both the magnitude and the temporal evolution of news popularity through two interpretable dimensions. The model parameters are optimized by minimizing the loss function
$L_{\eta_{n_i}, T_{n_i}}\big(D_{n_i}^c(t)\big) = \frac{1}{t} \sum_{j=1}^{t} \big(C_{n_i}(j) - \tilde{C}_{n_i}(j)\big)^2,$   (15)
where $D_{n_i}^c(t) = \{(j, \tilde{C}_{n_i}(j))\}_{j=1}^{t}$ denotes the set of cumulative click records for news $n_i$ from time step 1 to time step $t$. The gradient descent updates are derived through partial differentiation:
$\frac{\partial L_{\eta_{n_i}, T_{n_i}}}{\partial \eta_{n_i}} = \frac{2}{t} \sum_{j=1}^{t} \big[\eta_{n_i}\big(1 - e^{-j/T_{n_i}}\big) - \tilde{C}_{n_i}(j)\big] \big(1 - e^{-j/T_{n_i}}\big),$   (16)
$\frac{\partial L_{\eta_{n_i}, T_{n_i}}}{\partial T_{n_i}} = -\frac{2 \eta_{n_i}}{t} \sum_{j=1}^{t} \big[\eta_{n_i}\big(1 - e^{-j/T_{n_i}}\big) - \tilde{C}_{n_i}(j)\big] \frac{j}{T_{n_i}^2} e^{-j/T_{n_i}}.$   (17)
Parameter optimization proceeds iteratively using gradient descent with a learning rate $\rho$. In each iteration, we first compute the gradients $\partial L / \partial \eta_{n_i}$ and $\partial L / \partial T_{n_i}$ from the observed data points $D_{n_i}^c$ and then update the parameters as
$\eta_{n_i} \leftarrow \eta_{n_i} - \rho \frac{\partial L_{\eta_{n_i}, T_{n_i}}}{\partial \eta_{n_i}}, \qquad T_{n_i} \leftarrow T_{n_i} - \rho \frac{\partial L_{\eta_{n_i}, T_{n_i}}}{\partial T_{n_i}}.$
The learning rate $\rho$ (typically ranging from 0.01 to 0.1) can be adaptively adjusted using techniques such as Adam optimization to accelerate convergence. The iteration continues until either the loss $L_{\eta_{n_i}, T_{n_i}}$ falls below a threshold $\epsilon$ ($10^{-3}$) or the maximum iteration count is reached. The final optimized parameters $\eta_{n_i}^*$ and $T_{n_i}^*$ minimize the discrepancy between the modeled saturation curve and the empirical click data. Here, $\eta_{n_i}^*$ estimates the article's ultimate popularity ceiling, and $T_{n_i}^*$ characterizes its viral velocity, i.e., the time scale required to reach approximately 63.2% of the saturated click volume. This first-order approximation proves particularly effective for modeling news popularity dynamics because it naturally captures the rapid initial growth followed by gradual saturation observed in real-world click patterns, while remaining computationally tractable for large-scale recommendation systems.
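The fitting loop can be summarized in a few lines of NumPy, as sketched below; the initialization heuristics, clamping, and iteration limits are assumptions for illustration rather than the exact released implementation.

import numpy as np

def fit_saturation(clicks, lr=0.05, max_iter=5000, eps=1e-3):
    # Sketch: fit C(t) = eta * (1 - exp(-t/T)) to cumulative clicks via Equations (15)-(17).
    # clicks: observed cumulative counts C~(1..t); lr is the learning rate rho.
    clicks = np.asarray(clicks, dtype=float)
    t = np.arange(1, len(clicks) + 1, dtype=float)
    eta, T = max(clicks[-1], 1.0), max(len(clicks) / 3.0, 1.0)   # rough initialization (assumption)
    for _ in range(max_iter):
        pred = eta * (1.0 - np.exp(-t / T))
        resid = pred - clicks
        if np.mean(resid ** 2) < eps:                            # stop once the loss is below epsilon
            break
        grad_eta = 2.0 * np.mean(resid * (1.0 - np.exp(-t / T)))             # Equation (16)
        grad_T = 2.0 * eta * np.mean(resid * (-(t / T**2) * np.exp(-t / T)))  # Equation (17)
        eta -= lr * grad_eta
        T -= lr * grad_T
        T = max(T, 1e-6)                                         # keep the time constant positive
    return eta, T   # eta*: popularity ceiling, T*: time constant (63.2% of saturation)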
To quantify popularity, we compute a score $p_{n_i}$ for article $n_i \in \mathcal{N}$ as
$p_{n_i} = \ln\left(1 + \frac{\eta_{n_i}}{\tau_{n_i}}\right),$   (18)
where $\eta_{n_i} = \eta_{n_i}^*$ is the fitted click count and $\tau_{n_i} = 3 T_{n_i}^*$ (we set $\tau_{n_i} = 1$ if $\eta_{n_i} \le 1$, avoiding division by zero).
For user preference modeling, we integrate the browsing history $\mathcal{N}_u$ with article popularity. Each article $n_i^u \in \mathcal{N}_u$ is represented by an embedding $\mathbf{r}_{n_i^u}$ and weighted by its score $p_{n_i^u}$. The user's popularity-aware representation $\tilde{\mathbf{u}}$ is derived as
$\tilde{\mathbf{u}} = \frac{\sum_{n_i^u \in \mathcal{N}_u} p_{n_i^u} \mathbf{r}_{n_i^u}}{\sum_{n_i^u \in \mathcal{N}_u} p_{n_i^u}}.$   (19)
This method addresses the long-tail effect by normalizing popularity dynamics, ensuring balanced recommendations while refining user representations through temporal engagement patterns.
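Equations (18) and (19) translate directly into the short NumPy sketch below; the function names and the small numerical floor are assumptions for illustration.

import numpy as np

def popularity_score(eta_star, T_star):
    # Popularity score of Equation (18): log-mapped ratio of the fitted click ceiling
    # to the saturation time tau = 3 * T*, with a floor to avoid division by zero.
    tau = 3.0 * T_star if eta_star > 1 else 1.0
    return np.log1p(eta_star / tau)

def popularity_aware_user(history_emb, history_scores):
    # Popularity-aware user representation of Equation (19): clicked-news embeddings
    # weighted by their popularity scores and normalized.
    # history_emb: (T, dim) array, history_scores: (T,) popularity scores p_{n_i^u}
    w = np.asarray(history_scores, dtype=float)
    return (w[:, None] * np.asarray(history_emb, dtype=float)).sum(axis=0) / max(w.sum(), 1e-8)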

4.4. Dynamic Fusion with Multi-Perspective News Recommendation (DFMPNR)

We propose a dynamic fusion model that combines three complementary scoring perspectives for news recommendation: personal interest matching ($y$), social influence ($\bar{y}$), and popularity trends ($\tilde{y}$). The final recommendation score $\hat{y}$ for a candidate news article $n_c \in \mathcal{N}_c$ is computed as
$\hat{y} = \lambda_1 y + \lambda_2 \bar{y} + \lambda_3 \tilde{y},$   (20)
where each component score is derived through an inner product between the candidate news representation $\mathbf{r}_{n_c}$ and a different user representation:
$y = \mathbf{u}^{\top} \mathbf{r}_{n_c}$ (personal interest),   (21)
$\bar{y} = \bar{\mathbf{u}}^{\top} \mathbf{r}_{n_c}$ (social influence),   (22)
$\tilde{y} = \tilde{\mathbf{u}}^{\top} \mathbf{r}_{n_c}$ (popularity trend).   (23)
Rather than using fixed weights, we employ an adaptive gating mechanism that learns user-specific preferences for each perspective. The gate takes the concatenated representation $\mathbf{g} = [\mathbf{u} \,\|\, \bar{\mathbf{u}} \,\|\, \tilde{\mathbf{u}}]$ and produces normalized weights through
$(\lambda_1, \lambda_2, \lambda_3) = \mathrm{softmax}(\mathbf{W}_g \mathbf{g} + \mathbf{b}_g),$   (24)
where $\mathbf{W}_g$ and $\mathbf{b}_g$ are learnable parameters. This design automatically captures (1) cross-feature interactions between different preference signals and (2) global weighting tendencies across all users. The softmax normalization ensures interpretable, positive weights that sum to 1.
During training, the gating mechanism learns to dynamically adjust the influence of each scoring perspective based on individual user behavior patterns. For instance, users who frequently follow social recommendations will automatically receive higher λ 2 weights, while trend-sensitive users get increased λ 3 values. This adaptive approach consistently outperforms static weighting schemes in capturing diverse user preference patterns.
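The scorer of Equations (20)–(24) can be sketched as a single gated module; the class name and dimensionality below are illustrative assumptions.

import torch
import torch.nn as nn

class DFMPNR(nn.Module):
    # Sketch of the dynamic fusion scorer: three dot-product scores combined by a
    # softmax gate over the concatenated user representations (Equations (20)-(24)).
    def __init__(self, dim=400):
        super().__init__()
        self.gate = nn.Linear(3 * dim, 3)             # W_g, b_g

    def forward(self, u, u_bar, u_tilde, r_c):        # all inputs: (batch, dim)
        lam = torch.softmax(self.gate(torch.cat([u, u_bar, u_tilde], dim=-1)), dim=-1)
        y_personal = (u * r_c).sum(-1)                # personal interest score (Equation (21))
        y_social = (u_bar * r_c).sum(-1)              # social influence score (Equation (22))
        y_pop = (u_tilde * r_c).sum(-1)               # popularity trend score (Equation (23))
        return lam[..., 0] * y_personal + lam[..., 1] * y_social + lam[..., 2] * y_pop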

5. Model Training of PAD-MPFN

Following standard practice in recommendation systems [36], we employ negative sampling during training. We first sample a batch $\mathcal{B} \subset \mathcal{D}$ of user sessions from the training dataset $\mathcal{D}$, with each batch $\mathcal{B}$ containing $|\mathcal{B}|$ user sessions $\{S_1, \ldots, S_{|\mathcal{B}|}\}$. For each user $u$'s session $S_u \in \mathcal{B}$, we treat the clicked news article $n_i^u$ as a positive sample $n_i^{+}$ while randomly selecting $K$ of the user's non-clicked articles from the same session, $S_u \setminus \{n_i^{+}\}$, as negative samples $\bar{S}_K^u = \{n_{i,j}^{-}\}_{j=1}^{K}$. The recommendation scores of this sampled set are computed via Equations (20)–(24):
$\hat{\mathbf{y}}_i = [\hat{y}_i^{+}, \hat{y}_{i,1}^{-}, \hat{y}_{i,2}^{-}, \ldots, \hat{y}_{i,K}^{-}],$   (25)
where $\hat{y}_i^{+}$ denotes the positive sample score and $\{\hat{y}_{i,1}^{-}, \hat{y}_{i,2}^{-}, \ldots, \hat{y}_{i,K}^{-}\}$ denote the negative sample scores. We optimize the model parameters by minimizing the batch-wise negative log-likelihood loss
$\mathcal{L} = -\frac{1}{|\mathcal{B}|} \sum_{S \in \mathcal{B}} \log \frac{\exp(\hat{y}_i^{+})}{\exp(\hat{y}_i^{+}) + \sum_{j=1}^{K} \exp(\hat{y}_{i,j}^{-})}.$   (26)
This objective function encourages higher scores for positive samples while suppressing scores for negative samples.
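In implementation terms, the loss of Equation (26) is equivalent to a (K+1)-way cross-entropy in which the positive sample always sits at index 0, as in the sketch below (function name is an assumption).

import torch
import torch.nn.functional as F

def listwise_nll_loss(pos_scores, neg_scores):
    # Sketch of the training objective (Equations (25)-(26)): the positive score competes
    # against K sampled negatives in a softmax; cross-entropy with the positive at index 0.
    # pos_scores: (batch,), neg_scores: (batch, K)
    logits = torch.cat([pos_scores.unsqueeze(-1), neg_scores], dim=-1)   # (batch, K+1)
    targets = torch.zeros(logits.size(0), dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, targets)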
The PAD-MPFN training procedure is formally described in Algorithm 1. The process begins by initializing the global news graph G , user interest set U , and associated embedding structures (line: 1). During each epoch (line: 2), the training data is shuffled and partitioned into batches (line: 3) for efficient optimization. For each session S u within a batch (line: 5), the algorithm first performs negative sampling to construct balanced training instances (line: 6). News content encoding generates both positive (line: 7) and negative (line: 8) article representations through the shared embedding pathway. The modeling core then extracts temporal patterns via ULIE (line: 9) and structural relationships through GUPIE (line: 10), with SAII-mediated fusion (line: 11) creating comprehensive user profiles. The framework enhances representations through neighbor aggregation (line: 12) and popularity encoding (line: 13). Scoring operations evaluate both positive (line: 14) and negative (line: 16) samples through the full multi-perspective architecture. Following batch processing, the system updates model parameters via backpropagation (line: 20) while dynamically maintaining user history (line: 22) and embedding states (line: 21). This rigorous process terminates with parameter return (line: 25) after processing all batches (line: 23) and epochs (line: 24). The complete implementation captures the nuanced interplay of content, social, and temporal factors through its carefully designed computational stages.
Algorithm 1 PAD-MPFN training procedure
Input: Training set $\mathcal{D}$; hyperparameters $E$, $k$, and $K$; and batch size $m$
Output: Learned model parameters $\Theta$
1:  Initialize the global graph $\mathcal{G}$, user interest set $\mathcal{U}$, user histories $\mathcal{N}_u$ ($u \in \mathcal{U}$), and user interest embedding sets $\mathbf{R}_u$ ($u \in \mathcal{U}$)
2:  while epoch $e = 1$ to $E$ do
3:      Shuffle $\mathcal{D}$ and partition it into batches $\{\mathcal{B}_1, \ldots, \mathcal{B}_{|\mathcal{D}|/m}\}$
4:      for each batch $\mathcal{B}$ do
5:          for each session $S_u \in \mathcal{B}$ do
6:              $n_i^{+}, \bar{S}_K^u \leftarrow \mathrm{NegativeSample}(S_u, K)$  ▷ Negative sampling
7:              $\mathbf{r}_i^{+} \leftarrow \mathrm{NE}(n_i^{+})$ via Equation (3)  ▷ News encoding
8:              $\{\mathbf{r}_{i,j}^{-}\}_{j=1}^{K} \leftarrow \{\mathrm{NE}(n_{i,j}^{-})\}_{n_{i,j}^{-} \in \bar{S}_K^u}$ via Equation (3)  ▷ News encoding
9:              $\mathbf{s}_t \leftarrow \mathrm{ULIE}(\mathcal{N}_u)$ via Equations (5) and (6)  ▷ Sequential encoding
10:             $\mathbf{s}_g \leftarrow \mathrm{GUPIE}(\mathcal{G}_u, \mathbf{R}_u)$ via Equations (8) and (9)  ▷ Graph encoding
11:             $\mathbf{u} \leftarrow \mathrm{SAII}(\mathbf{s}_t, \mathbf{s}_g)$ via Equation (11)  ▷ Interest fusion
12:             $\bar{\mathbf{u}} \leftarrow \mathrm{NUPIE}(\mathbf{u}, \mathcal{U}, k)$ via Equations (13) and (14)  ▷ Neighbor aggregation
13:             $\tilde{\mathbf{u}} \leftarrow \mathrm{PNTUIE}(\mathcal{N}_u)$ via Equation (19)  ▷ Popularity encoding
14:             $\hat{y}_i^{+} \leftarrow \mathrm{DFMPNR}(\mathbf{u}, \bar{\mathbf{u}}, \tilde{\mathbf{u}}, \mathbf{r}_i^{+})$ via Equations (20)–(24)  ▷ Score prediction
15:             for $\mathbf{r}_{i,j}^{-} \in \{\mathbf{r}_{i,j}^{-}\}_{j=1}^{K}$ do
16:                 $\hat{y}_{i,j}^{-} \leftarrow \mathrm{DFMPNR}(\mathbf{u}, \bar{\mathbf{u}}, \tilde{\mathbf{u}}, \mathbf{r}_{i,j}^{-})$ via Equations (20)–(23)  ▷ Score prediction
17:             end for
18:         end for
19:         Compute the batch loss $\mathcal{L}$ via Equation (26)  ▷ Loss calculation
20:         Update $\Theta$ via backpropagation
21:         $\mathbf{R}_u \leftarrow \mathbf{R}_u \cup \{\mathbf{r}_i^{+}\}$  ▷ Update news embeddings
22:         $\mathcal{N}_u \leftarrow \mathcal{N}_u \cup \{n_i^{+}\}$  ▷ Update records
23:     end for
24: end while
25: return $\Theta$

Computational Complexity Analysis

The computational requirements of PAD-MPFN can be analyzed through both theoretical complexity bounds and empirical measurements. PAD-MPFN belongs to the family of two-tower methods, which model candidate news and users independently, so the learned news and user representations can be reused directly as intermediate results during online serving. For training, the dominant costs originate from four modules. (1) The news encoder employs multi-head self-attention with $O(L^2 d)$ complexity per article [31], where $L$ is the word sequence length and $d$ is the embedding dimension. (2) User interest modeling combines LSTM processing at $O(T(N h^2))$ over $T$ time steps [37] and single-layer graph neural network operations, which incur a cost of $O(|\mathcal{E}| d + |\mathcal{V}| d^2)$ per user subgraph [32]. (3) Neighbor interest modeling performs $O(U d)$ computations per batch for $U$ users; by precomputing user similarity indices offline, the complexity of querying the top-$k$ neighbors can be reduced from $O(U d)$ to $O(d \log U)$. (4) Popularity prediction requires $O(M t)$ operations for $M$ articles through iterative gradient updates [38]. Empirical measurements on an NVIDIA GeForce RTX 4070 Ti SUPER GPU (NVIDIA, Santa Clara, CA, USA) demonstrate the practical feasibility of the model: at a processing time of 1.2 s per batch (batch size = 32), 250,000 samples across five epochs converge within 10 h. During inference, the core costs are as follows. If news embeddings are precomputed and cached, the cost of encoding a single candidate news article is negligible; if real-time encoding is required, the complexity for a single article is $O(L^2 d)$, and the total for $N_c$ candidate articles is $O(N_c \cdot L^2 d)$. User interest and neighbor interest calculations follow the same forward computation as in the training phase, but without backpropagation. Scoring $N_c$ candidate news articles requires one dot product of complexity $O(d)$ each, for a total of $O(N_c \cdot d)$.
We acknowledge that, compared to traditional methods, PAD-MPFN is more complex, with its computational cost stemming from the attention mechanism's inherent $O(L^2 d)$ time complexity. However, in news recommendation, the actual sequence length is often smaller than the dimensionality of the hidden representations, and the matrix operations in the multi-head self-attention modules can be further accelerated through parallelization, so we believe that PAD-MPFN is scalable to production systems.

6. Experimental Setup

6.1. Dataset Description

We evaluate our approach on the MIND benchmark [17], a publicly available large-scale news recommendation dataset collected from Microsoft News (Microsoft Corporation, Redmond, WA, USA). The dataset comprises two versions:
  • MIND-Large: It contains complete user interaction logs from Microsoft’s news platform, recording both impression events (shown news with click/non-click status) and historical click behaviors. The data spans six weeks (12 October to 22 November 2019) and includes two primary components: (1) news content (headlines and abstracts) and (2) user interaction records.
  • MIND-Small: A stratified sample of MIND-Large containing behaviors from 50,000 randomly selected users, maintaining the original data distribution.
Detailed statistics of both dataset variants are provided in Table 2, including user counts, news articles, and interaction records.

6.2. Experiment Settings

Our experimental configuration follows established practices in news recommendation research [2,6,24]. We process user behavior data by retaining the 50 most recent clicked news articles when historical records exceed this threshold. News headlines are truncated to 30 words for consistent processing. The model architecture employs several key parameters: 5 similar user interests for neighborhood aggregation, 8 neighboring nodes in news subgraphs, and 300-dimensional GloVe embeddings for word representation initialization. All learned representations (including news, sequence, graph, and interest embeddings) utilize a 400-dimensional space. We implement the Adam optimizer [39] with an initial learning rate of $2 \times 10^{-4}$ and 10% warm-up steps. The negative sampling ratio is fixed at $K = 4$ (four negative samples per positive instance) to balance training efficiency and recommendation quality. This configuration maintains comparability with existing work while accommodating our model's specific requirements for multi-perspective fusion.
Evaluation Metrics. According to the setting of [17], we evaluate the recommendation performance using four metrics: area under the ROC curve (AUC), mean reciprocal rank (MRR), and normalized discounted cumulative gain (nDCG@5 and nDCG@10). These metrics are widely used for news recommendation [2,6,24].
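For reference, the ranking metrics for a single impression can be computed as in the generic sketch below; AUC is available from standard libraries such as scikit-learn. This is a sketch of the standard metric definitions used for MIND, not code from the released implementation.

import numpy as np

def mrr(y_true, y_score):
    # Mean reciprocal rank for one impression: average reciprocal rank over clicked items.
    order = np.argsort(np.asarray(y_score))[::-1]
    ranks = np.arange(1, len(order) + 1)
    rel = np.asarray(y_true)[order]
    return (rel / ranks).sum() / max(rel.sum(), 1)

def ndcg_at_k(y_true, y_score, k=10):
    # Normalized discounted cumulative gain truncated at rank k.
    order = np.argsort(np.asarray(y_score))[::-1][:k]
    gains = np.asarray(y_true)[order]
    dcg = (gains / np.log2(np.arange(2, len(gains) + 2))).sum()
    ideal = np.sort(np.asarray(y_true))[::-1][:k]
    idcg = (ideal / np.log2(np.arange(2, len(ideal) + 2))).sum()
    return dcg / idcg if idcg > 0 else 0.0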

6.3. Baselines

We evaluate our model against three categories of state-of-the-art methods:
  • Sequence-based methods:
    1. LSTUR [1] constructs a dynamic user interest representation by capturing changes in short-term user interest through a GRU network and utilizing an embedding vector of user IDs to represent the user's long-term interest.
    2. NAML [19] captures multifaceted features of news and user sequential interests through multi-view learning and attention mechanisms.
    3. HieRec [6] captures diverse and multi-granular user sequential interests using hierarchical interest trees.
    4. MCCM [4] improves news recommendation with channel-wise dynamic representations and contrastive user modeling, capturing essential user characteristics while remaining robust to noisy behavior data.
  • Graph structure-based methods:
    5. KIM [3] models user–news interactions using knowledge graphs with common encoders and user interest representations.
    6. DIGAT [2] captures effective feature interactions between the news graph and the user graph through interactive attention mechanisms.
    7. GLORY [24] combines a global click graph with a gated graph neural network to enhance global news representation.
    8. FNRKPL [8] combines a news knowledge graph with prompt learning, transforming knowledge into contextual cues for financial news recommendation.
  • Popularity-based methods:
    9. KRED [26] enhances the representation of news documents by utilizing knowledge graphs and employing a multi-task learning framework to encode tasks such as popularity as additional information into the recommendation model.
    10. PENR [27] treats the number of clicks on each news article as its popularity and incorporates popularity prediction as an auxiliary task.

7. Results

7.1. Overall Performance

Table 3 presents the comparative performance evaluation across three method categories on both MIND datasets. All values represent percentages (without the % symbol), with the best results bolded and the second-best underlined. Our key findings reveal the following: First, graph-based methods consistently outperform sequence-based approaches. This advantage stems from their ability to explicitly model complex news–user relationships through graph structures, whereas sequential methods primarily capture temporal patterns while potentially missing latent semantic connections. For instance, graph representations can identify topical relationships between non-consecutive news articles in user histories. Second, PAD-MPFN demonstrates superior performance compared to popularity-based baselines. Traditional popularity methods relying solely on click frequencies fail to account for temporal dynamics, while our time saturation factor (Section 4.3) provides more accurate popularity estimation by modeling decay patterns. The performance gap is particularly significant in cold-start scenarios (average improvement of 4.2% AUC). Most importantly, PAD-MPFN achieves state-of-the-art results by synergistically combining (1) sequential interest modeling through LSTM networks, (2) global relationship capture via graph neural networks, and (3) time-aware popularity adjustment. This multi-perspective fusion yields more comprehensive user representations, as evidenced by the 3.3% average performance gain over the strongest baseline across all metrics.

7.2. Ablation Study

We conduct systematic ablation experiments to evaluate the contribution of PAD-MPFN's core components: (1) the neighbor-based user potential interest encoder (NUPIE), (2) the popularity-normalized and temporal-aware user interest encoder (PNTUIE), (3) the graph-based user potential interest encoder (GUPIE), and (4) the dynamic fusion with multi-perspective news recommendation (DFMPNR). Following the architecture outlined in Algorithm 1, we evaluate the following five configurations:
  • Full Model: Complete PAD-MPFN implementation;
  • w/o NUPIE: Remove neighbor interest aggregation (disable Algorithm 1 @ line 12);
  • w/o PNTUIE: Exclude popularity encoding (disable Algorithm 1 @ line 13);
  • w/o GUPIE: Use only sequential interests (disable Algorithm 1 @ line 10);
  • w/o DFMPNR: Remove dynamic fusion and use only the news encoder for modeling user interests (use only Algorithm 1 @ line 8, disable Algorithm 1 @ line 14, and retain Equation (21)).
Figure 2 presents the AUC and nDCG@10 results on MIND-small, revealing four key insights. First, removing NUPIE causes a 0.73% AUC drop, demonstrating that neighbor-derived interests help discover relevant content beyond the immediate user history; the moderate impact, however, suggests that social influence is secondary to personal interest patterns. Second, disabling PNTUIE reduces performance by 1.12% nDCG@10, confirming that our time-aware popularity mechanism successfully prevents over-recommendation of trending content while maintaining recommendation quality. Third, the GUPIE-disabled variant suffers a 1.61% AUC and 1.53% nDCG@10 degradation, showing that our multi-interest fusion is essential for capturing diverse aspects of user preferences; this aligns with the design in Algorithm 1, where SAII dynamically balances sequential and graph interests (lines 10–11). Finally, the DFMPNR-disabled configuration degrades most significantly, with AUC and nDCG@10 dropping by 2.17% and 2.19%, respectively, indicating that dynamic fusion is crucial for integrating multi-source user interests; retaining only the news encoder and personal interest modeling (line 8) may fail to effectively capture the complexity of user preferences in real-world scenarios.

7.3. Hyperparameter Analysis

In this section, we analyze the effect of the key hyperparameter k, the number of similar neighbor interests.
Figure 3 examines the impact of similar neighbor count k on PAD-MPFN performance, with supporting statistics in Table 4. Key observations emerge: First, performance peaks at k = 5 (Figure 3a), demonstrating that moderate neighbor aggregation enhances recommendations through collaborative filtering, while excessive neighbors ( k > 5 ) introduce noise. This optimal value balances information gain from similar users against signal dilution. Second, Figure 3b reveals distinct patterns across user history lengths:
  • For sparse histories (0–10 clicks), neighbor count shows minimal impact (0.12% variation), indicating fundamental cold-start challenges.
  • Medium-history users (11–30 clicks) benefit most from neighbor integration (0.64% improvement at k = 5 ).
  • Dense histories (>30 clicks) maintain stable performance with k ≤ 5 but degrade sharply with excessive neighbors.
The 0.61% performance gap between the optimal and suboptimal configurations confirms that our click-driven popularity module partially mitigates cold-start issues. Based on comprehensive accuracy–efficiency trade-off analysis, we establish k = 5 as the optimal neighbor count for MIND-small.

7.4. Cold-Start Problem

In news recommendation systems, cold-start users are typically defined as those with limited historical interactions, making it challenging to infer their preferences accurately. Following Ding et al. [29], we categorize cold-start users as those with fewer than five clicks in user-level cold-start scenarios. To evaluate the performance of different recommendation methods under such conditions, we conducted experiments on the MIND-small dataset. Since users with zero historical clicks introduce excessive noise into the training process of all models, resulting in poor performance, we focus on cold-start users with 1 to 5 clicks. As illustrated in Figure 4, our proposed PAD-MPFN model outperforms all baseline methods across the range of 1 to 5 clicks.
The results demonstrate that PAD-MPFN effectively alleviates the user-level cold-start problem and enhances the recommendation system's performance. However, when no behavioral data is available, PAD-MPFN does not show any significant advantage over the other baselines. It is worth noting that PAD-MPFN exhibits a more pronounced advantage over other models for users with extremely sparse behavioral data (i.e., click counts of one or two). For users with only one click, PAD-MPFN scores 2.66% and 1.56% higher than the second-best model in terms of AUC and nDCG@10, respectively. This suggests that the popularity-based fusion strategy in PAD-MPFN plays a crucial role in improving recommendation effectiveness when user behavior data is scarce. By integrating popularity signals with multi-perspective user interest modeling, PAD-MPFN enhances the robustness of recommendations for cold-start users, addressing a critical limitation in existing approaches.

7.5. Saturation Effect Problem

To evaluate the effectiveness of our proposed model (PAD-MPFN) in mitigating the saturation effect—where recommendations become overly homogeneous due to over-reliance on popular items—we employ two diversity metrics: intra-list minimum distance (ILMD@N) and tail coverage (Tail-Coverage@N). These metrics, which are commonly used to assess recommendation diversity [40,41], are adapted here to measure the saturation problem, with N set to 5 and 10.
The experimental results on the MIND-small dataset (Table 5) demonstrate that PAD-MPFN outperforms the baseline model (PENR) in alleviating the saturation effect. Specifically, PAD-MPFN achieves higher ILMD values (21.36% at N = 5 and 17.68% at N = 10 ) compared to PENR (20.73% and 15.70%, respectively). This indicates that PAD-MPFN generates recommendation lists with greater differentiation between news items, effectively reducing homogenization. Furthermore, PAD-MPFN incorporates a time saturation factor and logarithmic popularity mapping, resulting in significantly higher tail-coverage values (18.10% for N = 5 and 32.25% for N = 10 ) than PENR (12.13% and 26.81%, respectively). This suggests that PAD-MPFN provides better coverage of long-tail content, enhancing recommendation diversity and user experience.
These findings highlight PAD-MPFN’s effectiveness in addressing the saturation effect by leveraging normalized click-driven popularity, thereby improving the richness and balance of recommended content.
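The exact formulations of ILMD@N and Tail-Coverage@N follow [40,41] and are not restated here; the sketch below illustrates one plausible reading of each metric and is an assumption for illustration, not the evaluation code used in our experiments.

import numpy as np

def ilmd_at_n(item_embs, n=10):
    # One plausible reading of ILMD@N (assumption): the minimum pairwise cosine
    # distance among the top-N recommended items' embeddings.
    top = np.asarray(item_embs, dtype=float)[:n]
    if len(top) < 2:
        return 0.0
    top = top / np.linalg.norm(top, axis=1, keepdims=True)
    sims = top @ top.T
    np.fill_diagonal(sims, -np.inf)
    return 1.0 - sims.max()          # smallest distance = 1 - largest off-diagonal similarity

def tail_coverage_at_n(rec_lists, tail_items, n=10):
    # One plausible reading of Tail-Coverage@N (assumption): fraction of long-tail items
    # that appear in at least one user's top-N recommendation list.
    covered = set()
    for recs in rec_lists:
        covered.update(i for i in recs[:n] if i in tail_items)
    return len(covered) / max(len(tail_items), 1)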

7.6. Case Study

To further demonstrate the effectiveness of PAD-MPFN, we conducted a case study on a representative user, u. Table 6 presents the user's historical news records, including news titles, click volumes, and saturation times. Table 7 lists the candidate news items recommended to the user, along with their rankings. An analysis of 10 historical clicks from this user reveals that 70% of the news pertains to high-prevalence crime-related topics (e.g., drug offenses, sexual assault allegations, and racial incidents), with keywords such as “Heroin Possession”, “Sexual”, “Dies”, and “Holocaust survivor”. The rapid accumulation of clicks and the high short-term popularity of such news indicate that the user has a strong interest in breaking news and crime-related content. Among the candidate news, for example, N36779 and N42767 are ranked 1 and 2, respectively, and N42767 is the article the user actually clicked, indicating that our model successfully captures the user's interest. Notably, the top five recommendations generated by PAD-MPFN include news items that the user actually clicked on. This suggests that PAD-MPFN effectively captures multi-perspective user interests by encoding news headlines, fusing diverse interest signals, and incorporating click-driven popularity. The model's ability to prioritize both user-specific interests and breaking news highlights its robustness in providing personalized and relevant recommendations, even for users with niche or evolving preferences.

8. Conclusions and Future Work

This paper presents PAD-MPFN, an advanced news recommendation framework that addresses three critical challenges in the field: dynamic user interest modeling, popularity bias mitigation, and cold-start recommendation. Most related works either model user interests through independent sequential models without integrating social influences or treat social interests as static features, and in terms of popularity calculation, they primarily adopt static computation that fails to effectively simulate the actual popularity decay of news. Our key innovations include (1) a unified fusion architecture that integrates sequential behaviors, social influences, and temporal popularity patterns through adaptive subspace projection; (2) a novel logarithmic time-decay mechanism that regulates popularity calculation and leverages its growth characteristic of slowing over time to accurately match changes in news popularity for balanced exposure of trending and long-tail content; and (3) a neighbor-enhanced attention model that effectively handles cold-start scenarios. Extensive comparative experiments on two real datasets show that the recommendation performance of PAD-MPFN outperforms most of the existing news recommendation methods, especially in the case of cold-start users. While achieving these advancements, we identify several promising directions for future research:
  • Information diffusion modeling: The current framework could be enhanced by incorporating news propagation patterns in social networks and event evolution trajectories, particularly for breaking news scenarios.
  • Dynamic model adaptation: Our experiments on static datasets reveal opportunities for developing online learning versions that can adapt to real-time news cycles and shifting user interests.
  • Multimodal extension: Future work will investigate incorporating visual and audio features from news videos and images to create richer content representations, potentially improving recommendation quality for multimedia news platforms.

Author Contributions

Conceptualization, B.M. and Y.D.; methodology, B.M.; software, Y.D.; validation, Y.D. and H.G.; formal analysis, B.M. and H.G.; investigation, Y.D. and H.G.; resources, B.M. and H.G.; data curation, Y.D.; writing—original draft preparation, Y.D.; writing—review and editing, B.M.; visualization, Y.D.; supervision, B.M.; project administration, B.M.; funding acquisition, B.M. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported in part by the National Natural Science Foundation of China (Grants Nos. 62176225, 62276168); the Natural Science Foundation of Fujian Province, China (Grant No. 2022J05176); the President’s Fund of Minnan Normal University (No.KJ2022002); the Minnan Normal University Advanced Cultivation Project (No.MSGJB2023019); and Guangdong Province, China (Grant No. 2023A1515010869).

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding authors.

Conflicts of Interest

Author Huifan Gao was employed by Xiamen Airlines Co., Ltd. (Xiamen, China). The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. All research work was conducted at Minnan Normal University under the direction of Dr. Biyang Ma.

References

  1. An, M.; Wu, F.; Wu, C.; Zhang, K.; Liu, Z.; Xie, X. Neural news recommendation with long-and short-term user representations. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; pp. 336–345.
  2. Mao, Z.; Li, J.; Wang, H.; Zeng, X.; Wong, K.F. DIGAT: Modeling News Recommendation with Dual-Graph Interaction. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, Abu Dhabi, United Arab Emirates, 7–11 December 2022; pp. 6595–6607.
  3. Qi, T.; Wu, F.; Wu, C.; Huang, Y. Personalized news recommendation with knowledge-aware interactive matching. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, 11–15 July 2021; pp. 61–70.
  4. Wang, J.; Jiang, Y.; Li, H.; Zhao, W. Improving news recommendation with channel-wise dynamic representations and contrastive user modeling. In Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining, Singapore, 27 February–3 March 2023; pp. 562–570.
  5. Wang, S.; Guo, S.; Wang, L.; Liu, T.; Xu, H. HDNR: A hyperbolic-based debiased approach for personalized news recommendation. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, Taipei, China, 23–27 July 2023; pp. 259–268.
  6. Qi, T.; Wu, F.; Wu, C.; Yang, P.; Yu, Y.; Xie, X.; Huang, Y. HieRec: Hierarchical User Interest Modeling for Personalized News Recommendation. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Virtual Event, 1–6 August 2021; pp. 5446–5456.
  7. Yang, Z.; Wang, W.; Qi, T.; Zhang, P.; Zhang, T.; Zhang, R.; Liu, J.; Huang, Y. GLoCIM: Global-view Long Chain Interest Modeling for news recommendation. In Proceedings of the 31st International Conference on Computational Linguistics (COLING 2025), Abu Dhabi, United Arab Emirates, 19–24 January 2025; Rambow, O., Wanner, L., Apidianaki, M., Al-Khalifa, H., Eugenio, B.D., Schockaert, S., Eds.; Association for Computational Linguistics: Abu Dhabi, United Arab Emirates, 2025; pp. 6855–6865.
  8. Sun, S.; Pan, X.; Qi, S.; Gao, J. Knowledge Enhanced Prompt Learning Framework for Financial News Recommendation. Pattern Recognit. 2025, 163, 111461.
  9. Wu, C.; Wu, F.; Ge, S.; Qi, T.; Huang, Y.; Xie, X. Neural news recommendation with multi-head self-attention. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 6389–6394.
  10. Zhu, Q.; Zhou, X.; Song, Z.; Tan, J.; Guo, L. DAN: Deep attention neural network for news recommendation. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence and Thirty-First Innovative Applications of Artificial Intelligence Conference and Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; pp. 5973–5980.
  11. Zhao, Q.; Chen, X.; Zhang, H.; Li, X. Dynamic Hierarchical Attention Network for news recommendation. Expert Syst. Appl. 2024, 255, 124667.
  12. Qi, T.; Wu, F.; Wu, C.; Huang, Y. PP-Rec: News Recommendation with Personalized User Interest and Time-aware News Popularity. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Online, 1–6 August 2021; pp. 5457–5467.
  13. Hosseini, D.; Sood, K.; Bacha, V. News popularity prediction with machine learning. In Proceedings of the 2022 IEEE 4th International Conference on Cybernetics, Cognition and Machine Learning Applications (ICCCMLA), Goa, India, 8–9 October 2022; pp. 474–480.
  14. Panda, D.K.; Ray, S. Approaches and algorithms to mitigate cold start problems in recommender systems: A systematic literature review. J. Intell. Inf. Syst. 2022, 59, 341–366.
  15. Haim, M.; Graefe, A.; Brosius, H.B. Burst of the filter bubble? Effects of personalization on the diversity of Google News. Digit. J. 2018, 6, 330–343.
  16. Song, G.; Wang, Y.; Li, J.; Hu, H. Predicting the Popularity of Online News Based on the Dynamic Fusion of Multiple Features. Comput. Mater. Contin. 2023, 76, 1621–1641.
  17. Wu, F.; Qiao, Y.; Chen, J.H.; Wu, C.; Qi, T.; Lian, J.; Liu, D.; Xie, X.; Gao, J.; Wu, W.; et al. MIND: A large-scale dataset for news recommendation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 3597–3606.
  18. Cho, K.; van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; pp. 1724–1734.
  19. Wu, C.; Wu, F.; An, M.; Huang, J.; Huang, Y.; Xie, X. Neural news recommendation with attentive multi-view learning. In Proceedings of the 28th International Joint Conference on Artificial Intelligence, Macao, China, 10–16 August 2019; pp. 3863–3869.
  20. Gu, J.; Wang, Z.; Kuen, J.; Ma, L.; Shahroudy, A.; Shuai, B.; Liu, T.; Wang, X.; Wang, G.; Cai, J.; et al. Recent advances in convolutional neural networks. Pattern Recognit. 2018, 77, 354–377.
  21. Yuan, Y.; Zhou, Y.; Chen, X.; Xiong, Q.; Okere, H.C. Enhancing Recommendation Diversity and Novelty with Bi-LSTM and Mean Shift Clustering. Electronics 2024, 13, 3841.
  22. Zhang, S.; Zheng, D.; Hu, X.; Yang, M. Bidirectional long short-term memory networks for relation classification. In Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation, Shanghai, China, 30 October–1 November 2015; pp. 73–78.
  23. Jiang, S.; Song, H.; Lu, Y.; Zhang, Z. News Recommendation Method Based on Candidate-Aware Long- and Short-Term Preference Modeling. Appl. Sci. 2025, 15, 300.
  24. Yang, B.; Liu, D.; Suzumura, T.; Dong, R.; Li, I. Going Beyond Local: Global Graph-Enhanced Personalized News Recommendations. In Proceedings of the RecSys '23: 17th ACM Conference on Recommender Systems, Singapore, 18–22 September 2023; pp. 24–34.
  25. Li, H.; Wang, Y.; Xiao, Z.; Yang, J.; Zhou, C.; Zhang, M.; Ju, W. DisCo: Graph-based disentangled contrastive learning for cold-start cross-domain recommendation. In Proceedings of the AAAI Conference on Artificial Intelligence, Philadelphia, PA, USA, 25 February–4 March 2025; Number 11, pp. 12049–12057.
  26. Liu, D.; Lian, J.; Wang, S.; Qiao, Y.; Chen, J.H.; Sun, G.; Xie, X. KRED: Knowledge-aware document representation for news recommendations. In Proceedings of the 14th ACM Conference on Recommender Systems, Virtual Event, 22–26 September 2020; pp. 200–209.
  27. Wang, J.; Chen, Y.; Wang, Z.; Zhao, W. Popularity-enhanced news recommendation with multi-view interest representation. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, Virtual Event, 1–5 November 2021; pp. 1949–1958.
  28. Wu, Y.; Wen, Z.; Liang, S. Predicting Question Popularity for Community Question Answering. Electronics 2024, 13, 3260.
  29. Ding, Y.; Wang, B.; Cui, X.; Xu, M. Popularity prediction with semantic retrieval for news recommendation. Expert Syst. Appl. 2024, 247, 123308.
  30. Pennington, J.; Socher, R.; Manning, C.D. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; pp. 1532–1543.
  31. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 6000–6010.
  32. Li, Y.; Zemel, R.; Brockschmidt, M.; Tarlow, D. Gated Graph Sequence Neural Networks. In Proceedings of the 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, 2–4 May 2016; pp. 1–20.
  33. Graves, A.; Schmidhuber, J. Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Adv. Neural Inf. Process. Syst. 2005, 18, 601–610.
  34. Clauset, A.; Shalizi, C.R.; Newman, M.E. Power-law distributions in empirical data. SIAM Rev. 2009, 51, 661–703.
  35. Cohen Tenoudji, F. First and Second Order Systems. In Analog and Digital Signal Analysis: From Basics to Applications; Springer International Publishing: Berlin/Heidelberg, Germany, 2016; pp. 11–34.
  36. Wu, C.; Wu, F.; Huang, Y.; Xie, X. Neural news recommendation with negative feedback. CCF Trans. Pervasive Comput. Interact. 2020, 2, 178–188.
  37. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
  38. Bottou, L.; Curtis, F.E.; Nocedal, J. Optimization Methods for Large-Scale Machine Learning. SIAM Rev. 2018, 60, 223–311. [Google Scholar] [CrossRef]
  39. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. In Proceedings of the 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, 7–9 May 2015; pp. 1–15. [Google Scholar]
  40. Kaminskas, M.; Bridge, D. Diversity, serendipity, novelty, and coverage: A survey and empirical analysis of beyond-accuracy objectives in recommender systems. ACM Trans. Interact. Intell. Syst. (TiiS) 2016, 7, 1–42. [Google Scholar] [CrossRef]
  41. Liu, S.; Zheng, Y. Long-tail session-based recommendation. In Proceedings of the 14th ACM Conference on Recommender Systems, Virtual Event, 22–26 September 2020; pp. 509–514. [Google Scholar]
Figure 1. The overall structure of PAD-MPFN. PAD-MPFN is divided into five main parts: a news encoder, a user interest extraction encoder, neighbor interest selection, click-driven popularity, and multi-perspective fusion.
Figure 2. Impact of different components on PAD-MPFN.
Figure 3. The impact of the hyperparameter k. (a) Impact of k in PAD-MPFN. (b) AUC variation by neighbor count and user activity.
Figure 4. User-level cold-start scenarios. (a) The AUC metric in user cold-start scenarios. (b) The nDCG@10 metric in user cold-start scenarios.
Table 1. Symbols and descriptions.
n: news articles
c: candidate news articles
w: a word
k: the number of most similar neighbor interests
K: the number of negative samples
η: the number of clicks per news article
τ: the time saturation factor
p_n: news popularity
y: personal interest
ȳ: social influence
ỹ: popularity trends
ŷ: the final recommendation score
κ_w: attention weights between words
α_{n_i}^u: the sequential attention weight of news n_i in the user history
β_{n_i}^u: the node attention weight of news n_i in the user history
γ: a learned weight that determines the contribution of y, ȳ, and ỹ
λ: a learned weight that determines the contribution of s_t and s_g
x_w: the embedding of a word
ẍ_w: a word embedding with contextual relationships
q: a query vector
r_n: the embedding of news
ṙ_n: the updated news node embedding
h: the hidden state
u̇: the embedding of user sequence interest
ü: the embedding of user graph interest
u: user representation embedding
ū: neighboring representation embedding
ũ: popularity representation embedding
X: word embedding matrix
R_u: news embedding matrix of the user
W: trainable parameter matrix
N: the set of news articles
N_c: the set of candidate news articles
W_n: the associated word sequence in the article
G_u: a user subgraph
U: the set of user embeddings
tanh(·): the tanh activation function
exp(·): the exponential function
softmax(·): the softmax activation function
cos(x, y): the cosine similarity between x and y
ln(·): the natural logarithm
[x y]: the concatenation of x and y
‖·‖: vector norm
GloVe(·): initial word embeddings [30]
MSA(·): word embeddings encoded with multi-head self-attention [31]
LSTM(·): generates sequence interest representations [10]
GGS-NNs(·): generates graph interest representations [32]
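The symbols γ, y, ȳ, ỹ, and ŷ in Table 1 describe a gated combination of the three perspective scores into the final recommendation score. The snippet below is a minimal sketch of such a dynamic gating step, not the authors' code: the choice of the user representation u as the gate input, the dimension, and the linear gate network are illustrative assumptions.

```python
# Minimal sketch (assumed structure): a learned gate gamma, conditioned on the
# user representation u, weights the personal-interest score y, the social-
# influence score y_bar, and the popularity score y_tilde into y_hat.
import torch
import torch.nn as nn

class MultiPerspectiveFusion(nn.Module):
    def __init__(self, user_dim: int = 64):
        super().__init__()
        # Maps the user representation to three gate logits, one per perspective.
        self.gate = nn.Linear(user_dim, 3)

    def forward(self, u, y, y_bar, y_tilde):
        # u: (batch, user_dim); y, y_bar, y_tilde: (batch,) matching scores.
        gamma = torch.softmax(self.gate(u), dim=-1)        # (batch, 3), sums to 1
        scores = torch.stack([y, y_bar, y_tilde], dim=-1)  # (batch, 3)
        y_hat = (gamma * scores).sum(dim=-1)               # (batch,) final score
        return y_hat
```

Because the gate is computed per user, the relative weight of popularity versus personal and social signals can differ between active users and cold-start users.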
Table 2. Dataset statistics.
Dataset | Users | News | Clicks | Impressions
MIND-large | 1,000,000 | 161,013 | 24,155,470 | 15,777,377
MIND-small | 50,000 | 65,238 | 347,727 | 230,117
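As a rough check on Table 2, the counts can be derived from the public MIND behaviors.tsv log. The sketch below assumes the standard tab-separated layout (impression ID, user ID, time, click history, and impression list of "NewsID-label" pairs); the file path is a placeholder, and the exact counting rules used for Table 2 (for example, whether history clicks are included) may differ slightly.

```python
# Minimal sketch: derive user/news/click/impression counts from MIND behaviors.tsv.
import csv

def mind_stats(behaviors_path: str):
    users, news, clicks, impressions = set(), set(), 0, 0
    with open(behaviors_path, encoding="utf-8") as f:
        for imp_id, user_id, time, history, imps in csv.reader(f, delimiter="\t"):
            impressions += 1
            users.add(user_id)
            news.update(history.split())          # news seen in the click history
            for pair in imps.split():             # e.g. "N12345-1"
                news_id, label = pair.rsplit("-", 1)
                news.add(news_id)
                clicks += int(label)              # label is 1 for a click, 0 otherwise
    return len(users), len(news), clicks, impressions
```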
Table 3. The overall comparison of performance results on the MIND dataset. We use results from existing work. Bolded numbers represent the best results, and underlined numbers indicate the second-best results. DIGAT and FNRKPL have no results on MIND-large.
Method | MIND-Small (AUC / MRR / nDCG@5 / nDCG@10) | MIND-Large (AUC / MRR / nDCG@5 / nDCG@10)
LSTUR | 65.87 / 30.78 / 35.15 / 40.15 | 67.08 / 32.36 / 35.15 / 40.93
NAML | 66.12 / 31.53 / 34.88 / 41.09 | 66.46 / 32.75 / 35.66 / 41.40
HieRec | 67.95 / 32.87 / 36.36 / 42.53 | 69.03 / 33.89 / 37.08 / 43.01
MCCM | 67.95 / 32.76 / 36.62 / 42.66 | 69.45 / 34.41 / 37.62 / 43.31
KIM | 67.07 / 31.83 / 35.23 / 41.58 | 68.45 / 33.74 / 36.76 / 42.47
DIGAT | 67.82 / 32.65 / 36.25 / 42.49 | - / - / - / -
GLORY ⋆ | 67.68 / 32.45 / 35.78 / 42.10 | 69.04 / 33.83 / 37.53 / 43.69
FNRKPL ⋆ | 67.81 / 32.46 / 35.95 / 42.86 | - / - / - / -
KRED | 65.89 / 30.80 / 33.78 / 40.23 | 68.52 / 33.78 / 36.76 / 42.45
PENR ⋆ | 67.16 / 31.75 / 34.36 / 40.82 | 69.25 / 34.16 / 37.31 / 43.04
PAD-MPFN | 68.03 / 32.90 / 36.67 / 42.77 | 69.40 / 34.37 / 37.66 / 43.79
Methods marked with ⋆ are verified by a t-test over three repeated runs; the improvement of PAD-MPFN over the best baseline on the key indicators is statistically significant at p < 0.05.
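The metrics in Table 3 follow the standard MIND protocol: AUC, MRR, nDCG@5, and nDCG@10 are computed per impression (over the candidate list shown to one user at one time) and averaged across all test impressions. The sketch below is an illustrative implementation of these metrics, not the authors' evaluation script.

```python
# Minimal sketch of impression-level AUC, MRR, and nDCG@k.
# y_true: 0/1 click labels; y_score: predicted scores for one impression.
import numpy as np
from sklearn.metrics import roc_auc_score

def mrr(y_true, y_score):
    order = np.argsort(y_score)[::-1]
    ranks = np.nonzero(np.asarray(y_true)[order])[0] + 1   # 1-based ranks of clicked items
    return float(np.mean(1.0 / ranks))

def ndcg(y_true, y_score, k):
    order = np.argsort(y_score)[::-1][:k]
    gains = np.asarray(y_true)[order]
    discounts = 1.0 / np.log2(np.arange(2, gains.size + 2))
    dcg = float((gains * discounts).sum())
    ideal = np.sort(np.asarray(y_true))[::-1][:k]
    idcg = float((ideal * discounts[: ideal.size]).sum())
    return dcg / idcg if idcg > 0 else 0.0

# Per-impression usage; the reported numbers are means over all test impressions.
labels, scores = [0, 1, 0, 0], [0.2, 0.9, 0.4, 0.1]
print(roc_auc_score(labels, scores), mrr(labels, scores), ndcg(labels, scores, 5))
```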
Table 4. Number of users in each click-history length group.
Group | Count
0–10 | 24,802
11–20 | 11,498
21–30 | 5372
31–40 | 2895
41–50 | 5433
Table 5. Effects of different methods on the saturation effect problem.
Method | ILMD@5 | Tail-Coverage@5 | ILMD@10 | Tail-Coverage@10
PENR | 20.73% | 12.13% | 15.70% | 26.81%
PAD-MPFN | 21.36% | 18.10% | 17.68% | 32.25%
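The sketch below illustrates one way the beyond-accuracy metrics in Table 5 can be computed, in the spirit of [40,41]. The definitions are stated assumptions for illustration, not the authors' exact formulas: ILMD@k is taken as the mean pairwise cosine distance among the top-k items' embeddings, and Tail-Coverage@k as the share of top-k items that belong to the long-tail (for example, news outside the most-clicked 20%).

```python
# Minimal sketch (assumed metric definitions) for ILMD@k and Tail-Coverage@k.
import numpy as np

def ilmd_at_k(item_embs: np.ndarray) -> float:
    # item_embs: (k, d) embeddings of the top-k recommended news.
    normed = item_embs / np.linalg.norm(item_embs, axis=1, keepdims=True)
    sims = normed @ normed.T
    iu = np.triu_indices(len(item_embs), 1)     # unique unordered pairs
    return float(np.mean(1.0 - sims[iu]))       # mean pairwise cosine distance

def tail_coverage_at_k(rec_ids, tail_ids) -> float:
    # rec_ids: top-k recommended news IDs; tail_ids: set of long-tail news IDs.
    return sum(i in tail_ids for i in rec_ids) / len(rec_ids)
```

Higher values of both metrics indicate more diverse recommendation lists and better exposure of rarely clicked news, which is the behavior Table 5 attributes to PAD-MPFN.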
Table 6. User historical clicked news.
News ID | Title | η | p
N45729 | Former Deadliest Catch Star Jerod Sechrist Arrested, Charged with Heroin Possession | 2214 | 8
N44007 | Where have Cape Town's great whites gone? | 779 | 6
N306 | Kevin Spacey Won't Be Charged in Sexual Assault Case After Accuser Dies | 9733 | 9
N47953 | Mom who popularized gender reveals it now | 410 | 6
N13138 | Amelia Bambridge: Body of missing backpacker found in sea | 3726 | 8
N48697 | The ice used to protect them. Now their island is crumbling into the sea. | 721 | 7
N38961 | Penn State launches new investigation into Sandusky sexual abuse allegation | 338 | 6
N36530 | Property Brothers' J.D. Scott Marries Annalee Belle in Vintage Theatre-Themed Wedding | 3329 | 8
N15288 | Ex-manager sues Starbucks for firing after arrest of 2 black men | 475 | 6
N8148 | Holocaust survivor under guard amid death threats | 3756 | 8
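The clicks-to-popularity mapping in Table 6 saturates roughly logarithmically: articles with thousands of clicks receive only slightly higher popularity values than articles with hundreds. The sketch below shows a log-saturating, time-decayed popularity score that is consistent with this pattern; the functional form, the saturation factor tau, and the hour-based decay are illustrative assumptions, not the paper's exact formula.

```python
# Minimal sketch of a log-saturating, time-decayed popularity score (illustrative only).
import math

def popularity(clicks: int, hours_since_publish: float, tau: float = 24.0) -> float:
    saturation = math.log1p(clicks)                               # diminishing returns in clicks
    decay = 1.0 / (1.0 + math.log1p(hours_since_publish / tau))   # recency decay
    return saturation * decay

print(popularity(clicks=9733, hours_since_publish=2.0))    # high and recent -> large score
print(popularity(clicks=338, hours_since_publish=48.0))    # fewer clicks, older -> small score
```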
Table 7. Candidate news (◯: actually clicked; △: not clicked).
News ID | Title | Rank
N20036 | 30 Best Black Friday Deals from Costco | 9
N60939 | Russia lands forces at former U.S. air base in northern Syria | 7
N30290 | The Real Reason McDonald's Keeps the Filet-O-Fish on Their Menu | 4
N32536 | High tides surge through Venice, locals rush to protect art | 8
N31958 | Opinion: Colin Kaepernick is about to get what he deserves: a chance | 3
N5940 | Meghan Markle and Hillary Clinton Secretly Spent the Afternoon Together at Frogmore Cottage | 6
N17807 | The Coolest Car Lamborghini Never Made Is Up For Sale | 10
N46917 | Judge agrees Alabama Islamic State recruit is not US citizen | 5
N42767 | FDA issues warning to Dollar Tree about selling 'potentially unsafe drugs' | 2
N36779 | South Carolina teen gets life in prison for deadly elementary school shooting | 1