GDNN: A Practical Hybrid Book Recommendation System for the Field of Ideological and Political Education

Liang, Yanli; Liu, Hui; Liu, Songsong

doi:10.3390/electronics15051086

by Yanli Liang ¹, Hui Liu ²,* and Songsong Liu ²

¹ School of Marxism, Beijing Jiaotong University, Beijing 100044, China
² School of Cyber Science and Technology, Beihang University, Beijing 100191, China
* Author to whom correspondence should be addressed.

Electronics 2026, 15(5), 1086; https://doi.org/10.3390/electronics15051086

Submission received: 4 December 2025 / Revised: 27 February 2026 / Accepted: 2 March 2026 / Published: 5 March 2026

Abstract

Ideological and political education (IPE) is a cornerstone of higher education in China. As IPE-related book collections expand rapidly, university libraries face a growing challenge of information overload, which hinders the accurate characterization of student reading preferences and the efficient matching of resources to demand. To address these issues, this study proposes GDNN, a practical hybrid recommendation system designed for both warm-start and cold-start scenarios. For warm-start users with historical borrowing records, we develop the PPSM-GCN framework. This framework enhances the classical graph convolutional collaborative filtering model LightGCN by integrating a novel potential positive sample mining (PPSM) strategy, which effectively mitigates data sparsity and improves the modeling of latent interests. For cold-start users without interaction history, we introduce an embedding and MLP architecture. This deep neural network learns implicit reader–book associations from reader attributes and book metadata, enabling personalized recommendations even in the absence of historical data. Experimental results demonstrate that PPSM-GCN and the embedding and MLP method achieve significant performance gains in their respective scenarios. This research provides both technical support and practical insights for the precise delivery of IPE resources and the overall enhancement of educational effectiveness in higher education.

Keywords: ideological and political education book recommendation; graph neural network; cold start; data sparsity; hybrid recommendation model

1. Introduction

Ideological and political education (IPE) is a comprehensive discipline concerned with the formation, development, and transformation of individuals’ ideological and moral character, with the core objective of guiding individuals to establish a sound worldview, outlook on life, and value system [1,2]. Within higher education, ideological and political literacy is widely regarded as a key factor shaping students’ growth and success, playing a leading role in their holistic development [3,4]. Among the diverse resource systems supporting IPE in universities, book-based resources occupy a foundational position due to their systematic theoretical interpretation and coherent ideological narratives, thereby providing essential support for students’ theoretical learning, value identification, and moral cultivation.

IPE-related book resources span multiple disciplinary domains, including pedagogy, psychology, aesthetics, and sociology. Meanwhile, owing to the heterogeneity of university student populations, the demand for educational resources exhibits substantial personalization [5]. In real-world library services, students may struggle to locate suitable IPE books from a large collection, and recommendation lists may fail to reflect their actual learning needs. From the perspective of librarians and IPE practitioners, it also remains challenging to accurately identify user preferences, to ensure adequate educational relevance [6], and to avoid coarse-grained and labor-intensive resource management. Therefore, developing a practical personalized recommendation mechanism for IPE book resources—grounded in real borrowing data and capable of predicting students’ reading preferences—is of significant value for improving service quality and educational effectiveness.

Although recommendation systems specifically tailored to IPE books are still limited, book recommendation has attracted considerable attention in the broader literature [7,8,9,10,11]. Existing methods typically follow two mainstream technical routes. One line of research adopts heuristic or reinforcement-learning-based strategies to cope with noisy observations and sparse feedback. For example, Wang X. et al. [12] introduced a clustering-based reinforcement learning approach to filter noise and alleviate sparsity, improving recommendation performance in digital library environments. Another increasingly influential line is representation learning for collaborative filtering, evolving from matrix factorization (MF) to graph neural networks (GNNs), which often provides more expressive representations and stronger generalization. For instance, Ng Y. K. et al. [13] employed MF to build a children’s book recommendation system, improving age appropriateness and personalization. However, MF is essentially linear and may be insufficient to capture complex nonlinear dependencies in user–item interactions. By contrast, graph convolutional networks (GCNs) are naturally suited to graph-structured data and can exploit topological associations and latent dependencies among nodes, showing notable advantages in book recommendation tasks [14,15,16,17].

Despite these advantages, GCN-based recommenders still face two fundamental challenges in practical library scenarios. The first is interaction sparsity. Even lightweight architectures such as LightGCN [18,19,20], which simplify propagation to mitigate oversmoothing and redundant computation, remain constrained by extremely sparse user–item interactions. This issue is particularly pronounced for IPE book recommendation: due to the academically rigorous and thematically serious nature of IPE books, users typically consider multiple factors (e.g., thematic relevance, reading difficulty, and authoritativeness) when selecting resources. Consequently, borrowing behaviors are sparse and explicit feedback is limited, making it difficult to learn reliable preference signals from observed interactions alone.

The second challenge is the cold-start problem, which is ubiquitous in real library services [21]. In addition to warm-start users with historical borrowing records, the system must also serve cold-start users who have limited or no interaction history. Moreover, a time-based split, which better matches real deployment, naturally introduces a substantial number of cold-start users in the validation and test sets (see Section 2.1). For cold-start users, graph-based collaborative filtering methods that rely primarily on user–item edges become ineffective, as the absence of interactions leaves no neighborhood information available for message passing and representation learning [22].

To address both sparsity and cold-start in a unified manner, this study proposes GDNN, a practical hybrid recommendation system applicable to both warm-start and cold-start scenarios. For warm-start users with historical borrowing records, GDNN adopts a GCN-based collaborative filtering module and further incorporates a PPSM strategy to strengthen supervision signals under sparse interactions. The key intuition behind PPSM is that interacted items directly reflect a user’s interest preferences; therefore, potential positive samples can be mined from items that are similar to those already interacted with. Concretely, PPSM computes item–item co-occurrence similarity and selects high-confidence similar items from unobserved interactions as potential positives, which alleviates sparsity and improves the model’s capability of capturing users’ latent reading interests. For cold-start users, GDNN introduces a DNN-based recommendation module built upon an embedding and MLP architecture [23,24], which leverages reader attribute information and book metadata to learn implicit user–item associations and enable personalized recommendations even without historical borrowing records.

The contributions of this study are summarized as follows:

We construct a real-world library borrowing dataset for IPE books, which supports the study of personalized recommendation in higher education settings.
We propose GDNN, a practical hybrid recommendation system that can conduct both warm-start and cold-start recommendations under a realistic time-based evaluation protocol.
For warm-start users, we design a GCN-based collaborative filtering module enhanced with PPSM, which strengthens supervision signals by mining high-confidence potential positives from unobserved interactions based on their similarity to users’ interacted items, thereby alleviating sparsity and improving latent interest modeling.
For cold-start users, we develop a DNN-based recommendation module built upon an embedding and MLP architecture, which leverages reader attributes and book metadata to learn implicit user–item associations, enabling personalized recommendation in the absence of historical borrowing records.
Experimental results demonstrate that PPSM-GCN and the embedding and MLP method achieve significant performance gains in their respective scenarios, thereby providing technical support and practical insights for the precise provision of IPE book resources and the enhancement of educational effectiveness in higher education institutions.

The remainder of this paper is organized as follows: Section 2 introduces the experimental dataset, including its source, preprocessing procedures, and the characteristics it exhibits. Section 3 presents the proposed hybrid recommendation system GDNN, with an emphasis on the warm-start branch based on GCN collaborative filtering with the PPSM strategy, as well as the cold-start branch built upon an embedding-and-MLP-based DNN that leverages reader attributes and book metadata. Section 4 reports and analyzes the experimental results, evaluating the effectiveness of PPSM-GCN in warm-start scenarios and the embedding and MLP method in cold-start scenarios for IPE book recommendation. Section 5 elaborates on the ideological and political education value and application significance of the GDNN model, exploring its important role and practical value in improving the accuracy and pertinence of ideological and political education resource recommendation. Section 6 discusses the positive achievements attained in this study as well as several existing limitations.

2. Experimental Data

2.1. Data Collection and Processing

Raw Data Collection and Cleaning. University library reader data are rich, encompassing basic information such as gender, grade, major, and college, as well as detailed book borrowing records. In this study, we use the dataset of book borrowing records published by Shandong Agricultural University, covering the period from 2014 to 2024. The raw dataset contains a total of 1,476,944 borrowing records, encompassing 73,418 readers and 184,193 books. Next, we perform data cleaning on the raw dataset. Specifically, we remove records with missing key fields (such as the reader’s college) and eliminate duplicate borrowing entries. Following this process, we obtain a cleaned dataset consisting of 1,351,336 borrowing records, involving 73,387 readers and 184,158 books.

IPE Book Borrowing Record Extraction. After obtaining the cleaned data, we further extract the borrowing records specifically for IPE books. Given the political, scientific, practical, and comprehensive nature of IPE, its audience is broad and its content extremely diverse. To accurately capture IPE-related resources, we refer to the Chinese Library Classification system. From a “macro IPE” perspective, we extract book categories closely related to IPE. Specifically, we select books whose classification codes begin with “A”–“G” (excluding “E”), encompassing 6 major categories and 58 subcategories, covering topics such as Marxist theory, ideological and political education, Party history and governance, philosophy, military studies, economics, and law. Relevant classification codes are summarized in Table 1.

After the extraction process described above, we obtain 235,532 IPE book borrowing records that are highly relevant to the research objectives, involving 38,717 readers and 46,279 IPE books. In the next section, we conduct an in-depth analysis of these IPE borrowing records to extract key reading behaviors and borrowing patterns. These insights can provide both a data foundation and a theoretical basis for the advancement of IPE in universities.

Recommendation Dataset Construction. Based on the IPE borrowing records extracted above, we construct the recommendation dataset required for the experiments in Section 4. Specifically, to mitigate the impact of sparse data on the experimental results, we follow the approach in references [18,25] by retaining only those readers and books with at least five interaction records, along with their borrowing records. After filtering, we obtain an IPE book recommendation dataset containing 10,206 readers, 9048 books, and 107,403 borrowing records.
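The filtering step above can be sketched as follows. This is a minimal illustration rather than the authors' preprocessing code: the function name `filter_min_interactions` and the (reader, book) tuple layout are hypothetical, and repeating the filter until nothing more is dropped is an assumption, since the text does not say whether one pass or several were applied.

```python
from collections import Counter

def filter_min_interactions(records, min_count=5):
    """Keep (reader, book) records whose reader and book each occur at least min_count times.

    Assumption: the filter is repeated until no record is dropped (k-core style);
    the source does not state whether a single pass was used instead.
    """
    records = list(records)
    while True:
        reader_counts = Counter(reader for reader, _ in records)
        book_counts = Counter(book for _, book in records)
        kept = [(reader, book) for reader, book in records
                if reader_counts[reader] >= min_count and book_counts[book] >= min_count]
        if len(kept) == len(records):
            return kept
        records = kept
```

Dropping a sparse reader can push a book below the threshold (and vice versa), which is why the sketch loops until the record list is stable.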

Data Splitting of the Recommendation Dataset. The statistics at each step of the above data processing flow are summarized in Table 2. Finally, the obtained recommendation dataset is divided using a time-based split into a training set (2014–2021), a validation set (2022), and a test set (2023–2024), containing 98,649, 3989, and 4765 records, respectively. We further observe that the dataset suffers from a severe user cold-start problem, and the detailed data distribution is shown in Table 3. To address this issue, this paper proposes a practical hybrid recommendation system, GDNN, which is applicable to both warm-start and cold-start scenarios (see Section 3 for details).
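A time-based split of this kind, and the cold-start readers it exposes, can be sketched as below. The record layout (reader, book, year) and both function names are hypothetical; the year boundaries follow the text, and "cold-start" is taken here to mean a reader with no record at all in the training split, which is an assumption about how Table 3 was derived.

```python
def time_based_split(records, train_end=2021, valid_year=2022):
    """Split (reader, book, year) records into train (<= train_end), validation, and test."""
    train = [r for r in records if r[2] <= train_end]
    valid = [r for r in records if r[2] == valid_year]
    test = [r for r in records if r[2] > valid_year]
    return train, valid, test

def cold_start_readers(train, later):
    """Readers that appear in a later split but have no record in the training split."""
    warm = {reader for reader, _, _ in train}
    return {reader for reader, _, _ in later} - warm
```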

2.2. Data Statistics and Visualization

Large-scale and real-world library datasets of IPE book borrowings contain abundant information that can be leveraged to optimize IPE. By analyzing these borrowing records, we can gain a better understanding of readers’ interests and preferences, providing valuable insights for the design of subsequent recommendation systems.

For the IPE book borrowing dataset, we analyze the distribution of different types of users. The reader category distribution is shown in Figure 1, where undergraduates account for 80.6%, while master’s students, doctoral students, and faculty members represent smaller proportions. This indicates that undergraduates and graduate students constitute the primary borrowing groups for IPE books, whereas faculty members borrow such resources at a relatively low rate.

Regarding user preferences, we analyze the most popular categories of IPE books. Figure 2 compares the popularity of the 58 extracted subcategories. As shown, the six most frequently borrowed subcategories are A8 (Marxism, Leninism, Mao Zedong Thought, Deng Xiaoping Theory), B84 (Psychology), F2 (Economic Planning and Management), D9 (Law), C91 (Sociology), and F8 (Various Fields of Economics). These books span multiple disciplines, including Marxist theory, psychology, law, sociology, and economics, reflecting the diverse interests of readers within IPE book collections.

To further explore differences in borrowing behaviors among reader groups, this study conducts a comparative analysis based on users’ college and major backgrounds. The results, shown in Figure 3, indicate that disciplinary background significantly influences reading preferences: students from the College of Humanities and Law show greater interest in law (D9), Chinese politics (D6), and political theory (D0); students from the College of Information Science and Engineering exhibit higher engagement with psychology (B84); while students from the College of International Exchange and the School of Foreign Languages prefer economic planning and management (F2)-related books. These findings clearly illustrate the relationship between academic discipline and reading interest. Notably, despite disciplinary differences, all groups demonstrate shared attention to core areas such as Chinese politics, economics, and psychology, highlighting the broad foundation of IPE in shaping students’ common values.

Meanwhile, we also observe that even within the same academic discipline, users of different educational levels exhibit notable differences in preferences. For example, among students from the College of Agriculture, as shown in Figure 4, graduate students demonstrate a greater interest in topics related to Chinese economics and politics compared with undergraduates.

3. Methodology

To address the problem of predicting university students’ reading preferences for IPE books, we propose GDNN, a practical hybrid recommendation system applicable to both warm-start and cold-start scenarios. For warm-start readers, GDNN employs the embedding propagation mechanism of GNNs to model complex reader–book interactions. Furthermore, a potential positive sample mining (PPSM) strategy is introduced to alleviate data sparsity and improve the model’s ability to capture latent reading interests. For cold-start readers, reader attributes and book metadata are taken as input to a DNN model to learn implicit reader–book associations, thereby enabling personalized recommendations in the absence of historical interaction data.
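The two-branch design can be pictured as a simple dispatch. The sketch below is illustrative only: `recommend`, `train_history`, `warm_model`, and `cold_model` are hypothetical names, the two models are placeholder callables, and routing on "has at least one training interaction" is an assumption about how warm-start and cold-start readers are separated.

```python
def recommend(reader_id, train_history, warm_model, cold_model, k=10):
    """Route a reader to the warm-start (graph) or cold-start (DNN) branch.

    train_history: dict mapping reader id -> set of borrowed book ids (hypothetical layout).
    warm_model, cold_model: callables (reader_id, k) -> list of book ids (placeholders).
    """
    if train_history.get(reader_id):
        return warm_model(reader_id, k)   # reader has borrowing records
    return cold_model(reader_id, k)       # no history: fall back to attributes and metadata
```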

3.1. GNN-Based Warm-Start Recommendation

In this section, we propose a graph convolutional collaborative filtering framework, PPSM-GCN, which integrates the developed potential positive sample mining strategy. The framework consists of three main components: a GCN-based collaborative filtering model, a sampling strategy, and a loss function. The schematic diagram of the PPSM-GCN framework is shown in Figure 5, and detailed descriptions of each component are provided as follows.

3.1.1. GCN-Based CF Model

Collaborative filtering (CF) is one of the most classical approaches in recommendation systems [18,26,27,28]. Its core idea is to analyze historical interaction data between users and items to mine collaborative signals embedded within the interactions, thereby recommending items that users are likely to be interested in. Traditional CF methods [26,27,28] typically rely only on first-order interaction information to learn user and item embeddings. Although this approach is simple and intuitive, it suffers from significant limitations. Specifically, first-order interactions capture only the direct relationships between users and items, while failing to reflect the higher-order associations among users or among items.

For instance, in the context of IPE book reading among university students, if two students have both read a particular IPE book, they are likely to share similar interests in other books on related topics. However, traditional collaborative filtering methods are unable to effectively exploit such indirect relationships, which limits their ability to improve recommendation accuracy.

To overcome the aforementioned limitations, the PPSM-GCN framework integrates a GCN to fully exploit the high-order interaction information between users and items. Specifically, the GCN models the user–item interaction data as a user–item bipartite graph [18,25] and introduces an embedding propagation mechanism to capture the high-order connectivity among nodes in this bipartite structure.

For example, taking node u in Figure 6, during the embedding propagation process along the graph structure, the embedding information of nodes $a_{1}$, $a_{2}$, $a_{3}$, and $a_{4}$ is aggregated into node u. After aggregation, node u obtains a new embedding vector that encodes its first-order neighborhood information. By iteratively performing multiple rounds of embedding propagation, each node in the interaction bipartite graph gradually incorporates richer high-order neighborhood information from its surroundings. These enhanced feature representations enable the model to learn more expressive embeddings, thereby improving the predictive performance of the recommendation model.

In this study, the classical graph convolutional collaborative filtering model LightGCN is adopted as the base collaborative filtering component in the PPSM-GCN framework. Formally, given the embedding vectors at the l-th layer, LightGCN updates the embeddings at the $(l + 1)$-th layer as follows:

e_{u}^{(l + 1)} = \sum_{i \in N_{u}} \frac{1}{\sqrt{| N_{u} | | N_{i} |}} e_{i}^{(l)},

(1)

e_{i}^{(l + 1)} = \sum_{u \in N_{i}} \frac{1}{\sqrt{| N_{i} | | N_{u} |}} e_{u}^{(l)},

(2)

where $e_{u}^{(l)}$ and $e_{i}^{(l)}$ denote the embedding vectors of user node u and item node i after the l-th layer of embedding propagation, respectively. Equations (1) and (2) correspond to the embedding propagation processes of user u and item i, respectively. $N_{u}$ and $N_{i}$ represent the sets of neighboring nodes of user u and item i. The term $\frac{1}{\sqrt{| N_{u} | | N_{i} |}}$ is a symmetric normalization factor that balances the contributions of different neighbors during aggregation.
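Equations (1) and (2) amount to repeated multiplication by a degree-normalized interaction matrix. The NumPy sketch below shows this on a small dense matrix; the function name `lightgcn_propagate` is hypothetical, a dense array is used only for readability (a real dataset of this size would need a sparse matrix), and giving isolated nodes a zero weight is an assumption.

```python
import numpy as np

def lightgcn_propagate(R, user_emb, item_emb, num_layers=2):
    """Apply Equations (1) and (2) for num_layers rounds on a dense 0/1 matrix R (N x M)."""
    R = np.asarray(R, dtype=float)
    deg_u = R.sum(axis=1)  # |N_u| for every user
    deg_i = R.sum(axis=0)  # |N_i| for every item
    # 1 / sqrt(degree); nodes without neighbors get weight 0 (assumption).
    inv_u = np.divide(1.0, np.sqrt(deg_u), out=np.zeros_like(deg_u), where=deg_u > 0)
    inv_i = np.divide(1.0, np.sqrt(deg_i), out=np.zeros_like(deg_i), where=deg_i > 0)
    R_norm = inv_u[:, None] * R * inv_i[None, :]  # entries 1 / sqrt(|N_u| |N_i|)
    e_u = np.asarray(user_emb, dtype=float)
    e_i = np.asarray(item_emb, dtype=float)
    for _ in range(num_layers):
        # Both updates read the layer-l embeddings, so they are computed together.
        e_u, e_i = R_norm @ e_i, R_norm.T @ e_u
    return e_u, e_i
```

The function returns only the last-layer embeddings. The original LightGCN model additionally combines the embeddings of all layers into the final representation; that step is not described in this section and is left out of the sketch.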

After the embedding propagation process, each node in the user–item bipartite graph obtains an embedding vector containing rich neighborhood feature information. As illustrated in Figure 5, these embedding vectors are used to compute the predicted scores of users for both positive and negative samples (the detailed sampling strategy will be introduced in Section 3.1.2). Specifically, we define the similarity between a user and an item as the inner product of their embedding vectors, which represents the user’s predicted preference score for the item:

{\hat{y}}_{u i} = e_{u}^{⊤} e_{i},

(3)

where $e_{u}$ and $e_{i}$ denote the embedding vectors of user u and item i obtained after the embedding propagation, and $\hat{y}_{ui}$ represents the predicted preference score of user u for item i.
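At recommendation time, Equation (3) is typically evaluated for every candidate book and the highest-scoring ones are returned. A minimal sketch follows, in which the function name `top_k_items` is hypothetical and masking out already-borrowed books is an assumption, not something stated in the text.

```python
import numpy as np

def top_k_items(user_vec, item_emb, seen_items, k=10):
    """Score all items by inner product (Equation (3)) and return the k best unseen indices."""
    scores = np.asarray(item_emb, dtype=float) @ np.asarray(user_vec, dtype=float)
    if seen_items:
        # Assumption: books the reader already borrowed are not recommended again.
        scores[list(seen_items)] = -np.inf
    order = np.argsort(-scores, kind="stable")
    return [int(i) for i in order[:k] if np.isfinite(scores[i])]
```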

3.1.2. Sampling Strategy

This section provides a detailed description of the sampling strategies employed in the PPSM-GCN framework, including random negative sampling, traditional positive sampling, and the proposed potential positive sample mining strategy.

Random Negative Sampling. In recommendation systems, we usually have access only to users’ implicit feedback, such as clicks, views, or purchases. These data indicate what users like as positive samples, but do not explicitly reveal what they dislike as negative samples. To enable the recommendation model to learn how to distinguish between “liked” and “disliked” items, it is common practice to sample negative items from the unobserved items, assuming that these items are those users are not interested in. The simplest approach is to randomly sample negative items from the set of non-interacted items [29,30], which is simple but effective. Consequently, we utilize this random negative sampling approach in this paper. By randomly selecting items from the user’s non-interacted data (i.e., the unobserved items shown in Figure 5) as negative samples, we construct pairwise training data alongside the positive samples.

Traditional Positive Sampling. Users typically interact with items based on their own interests; therefore, the core idea of collaborative filtering lies in uncovering user preferences to provide accurate and personalized recommendations. Following this principle, previous studies have generally drawn positive training samples directly from user interaction data (i.e., the observed items shown in Figure 5), which represents a reasonable positive sampling strategy. The traditional positive sampling process illustrated in Figure 5 naturally follows this setting.

However, directly sampling positive examples only from the set of interacted items may fail to fully exploit the potential positive training samples, thus limiting the model’s performance. This limitation arises because there may exist items that users have not yet interacted with but would actually find interesting. To address this issue, we propose a PPSM strategy, which identifies latent positive samples from users’ non-interacted items. This strategy enhances the model’s ability to capture users’ potential interests, thereby improving the recommendation accuracy.

Potential Positive Sample Mining. Interacted items directly reflect a user’s interest preferences; therefore, potential positive samples can be mined from items that are similar to the interacted ones. This constitutes the core idea of the proposed PPSM strategy. In this section, the potential positive sample set for a user u, denoted as $N_{u-pp}$, is defined as follows:

N_{u - pp} = ⋃_{i \in N_{u}} \{f_{K_{pp}} (i, I)\},

(4)

where $N_{u}$ denotes the set of neighbor nodes of user u, i.e., all items with which user u has interacted; i represents an item in $N_{u}$; I denotes the set of all items; and $f_{K_{pp}} (i, I)$ represents the set of the top $K_{pp}$ items most similar to item i. Specifically, for a given interacted item i, we compute its similarity with all other items and select the $K_{pp}$ items with the highest similarity as potential positive samples.

The key to determining $N_{u-pp}$ lies in how to compute the similarity between an interacted item i and other items. A study on simplifying GNN embedding propagation [31] derived a method for calculating the similarity $ω_{ij}$ between two items, such as item i and item j:

G = R^{⊤} R,

(5)

ω_{i j} = \frac{G_{i j}}{g_{i} - G_{i j}} \sqrt{\frac{g_{i}}{g_{j}}},

(6)

where $R \in \mathbb{R}^{N \times M}$ represents the user–item interaction matrix, and $G \in \mathbb{R}^{M \times M}$ denotes the item–item co-occurrence matrix; $G_{ij}$ indicates the co-occurrence count of items i and j, and $g_{i}$ and $g_{j}$ represent the total co-occurrence counts of items i and j, respectively.
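A small worked example may help make Equations (5) and (6) concrete. The interaction matrix below (three users, three items) is invented for illustration, and $g_{i}$ is taken to be the sum of row i of G, as in Step 3 of Algorithm 1.

```latex
R = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 1 & 1 \\ 0 & 1 & 1 \end{pmatrix}, \qquad
G = R^{\top} R = \begin{pmatrix} 2 & 2 & 1 \\ 2 & 3 & 2 \\ 1 & 2 & 2 \end{pmatrix}, \qquad
(g_{1}, g_{2}, g_{3}) = (5, 7, 5)

\omega_{12} = \frac{2}{5 - 2} \sqrt{\frac{5}{7}} \approx 0.563, \qquad
\omega_{13} = \frac{1}{5 - 1} \sqrt{\frac{5}{5}} = 0.25
```

Item 2 is borrowed together with item 1 by two users and item 3 by only one, so item 2 receives the higher similarity and would be selected first as a potential positive for readers of item 1.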

Based on Equations (5) and (6), we can compute the top $K_{pp}$ most similar items for each interacted item, thereby constructing the potential positive sample set $N_{u-pp}$ for user u. By iterating over all interacted items of user u, the complete potential positive sample set $N_{u-pp}$ can be obtained.

After obtaining the potential positive sample sets for all users, positive sampling can be performed either from the interacted positive sample set or from the potential positive sample set. Although potential positive samples may represent items of interest to the user, interacted positive samples directly reflect user preferences. Therefore, positive sampling should be biased toward selecting samples from the interacted item set. To this end, a sampling probability $p_{pp}$ is introduced to control the likelihood of sampling from either the potential positive samples or the interacted positive samples. For instance, $p_{pp} = 0.2$ indicates that items are sampled from the potential positive sample set with probability 0.2, and from the interacted positive sample set with probability 0.8.
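The sampling rule above can be sketched as a function that draws one training triplet. Everything named here (`sample_triplet`, the dict-of-sets layout) is hypothetical. Falling back to the interacted set when a user has no potential positives, and forbidding the negative from coinciding with the chosen positive, are assumptions. The sketch also assumes the user has not interacted with every item, otherwise the negative-sampling loop would not terminate.

```python
import random

def sample_triplet(user, interacted, potential, num_items, p_pp=0.2, rng=random):
    """Draw one (user, positive, negative) triplet.

    interacted, potential: dicts mapping user -> set of item indices (hypothetical layout).
    The positive comes from the potential set with probability p_pp, otherwise
    from the interacted set; the negative is a random non-interacted item.
    """
    use_potential = p_pp > rng.random() and len(potential.get(user, ())) > 0
    pool = potential[user] if use_potential else interacted[user]
    positive = rng.choice(sorted(pool))
    while True:
        negative = rng.randrange(num_items)
        if negative not in interacted[user] and negative != positive:
            return user, positive, negative
```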

Probabilistic and Graph-Theoretic Interpretation of PPSM. The rationale behind mining potential positive samples from item co-occurrence can be formally justified from both probabilistic and graph-theoretic perspectives. (1) From a probabilistic perspective, item co-occurrence reflects the conditional probability that a user interested in one item also likes another. Let $P (y_{uj} = 1 \mid y_{ui} = 1)$ denote the probability that a user u is interested in item j given their interest in item i. This conditional probability can be estimated from collective user behaviors: if items i and j frequently co-occur across many users’ borrowing histories, the empirical estimate of this probability tends to be high. PPSM computes the co-occurrence similarity $ω_{ij}$ as a proxy for this conditional probability, identifying from unobserved items those most likely to match users’ latent interests, thereby enriching supervision signals in sparse interaction scenarios. (2) From a graph-theoretic perspective, user–item interactions form a bipartite graph, where user and item nodes are connected via borrowing edges. In this graph, two item nodes are connected through length-2 paths via shared user nodes: the more users that have interacted with both items, the stronger their structural coupling. This coupling implies implicit similarity propagation: if item i is relevant to a user, other items structurally close to i in the bipartite graph (i.e., those with high co-occurrence) are also likely to be relevant. PPSM essentially precomputes this structural proximity, identifying items topologically similar to a user’s historical interactions via the co-occurrence matrix, and treats them as potential positives during training, enabling the model to capture implicit collaborative signals embedded in the graph structure.

Complexity Analysis of PPSM. The procedure of the PPSM strategy is shown in Algorithm 1. The algorithm comprises an offline pre-computation that constructs potential positive sample sets for all users based on item–item similarities derived from co-occurrence patterns. Let $| I_{u} |$ denote the number of interacted items of user u, and let $\mathrm{nnz} (\cdot)$ denote the number of non-zero entries. The overall computational complexity of PPSM includes three main components:

O (\sum_{u} | I_{u} |^{2} + \mathrm{nnz} (G) + \sum_{u} | I_{u} | K_{pp}),

which correspond to (i) sparse co-occurrence matrix construction (Step 1 in Algorithm 1), (ii) sparse item–item similarity computation (Steps 6–9) together with top-$K_{pp}$ neighbor selection performed on non-zero entries (Steps 10–12), and (iii) potential positive sample aggregation for all users (Steps 13–19), respectively. Since real-world interaction data are highly sparse and $K_{pp}$ is a small constant, the complexity scales with interaction sparsity rather than the square of the item catalog size, making PPSM practical for large-scale recommendation scenarios. Moreover, PPSM is executed as an offline preprocessing step and therefore does not introduce additional overhead during model training or inference.

Algorithm 1 Potential Positive Sample Mining (PPSM) Strategy

Input: $R \in \mathbb{R}^{N \times M}$: user–item interaction matrix (N users × M items); $K_{pp}$: number of potential positive samples per interacted item

Output: potential positive sample set $N_{u-pp}$ for each user u in the training set

1: Compute item–item co-occurrence matrix $G = R^{⊤} R$ ($G \in \mathbb{R}^{M \times M}$)
2: for each item i in $1 \dots M$ do
3:     $g_{i} = \sum_{j} G [i, j]$ (total co-occurrence of i)
4: end for
5: Initialize similarity matrix $S$ as zero matrix ($M \times M$)
6: for each item pair $(i, j)$ with $i \neq j$ and $G [i] [j] > 0$ do
7:     Compute similarity $ω_{ij} = \frac{G_{ij}}{g_{i} - G_{ij}} \sqrt{\frac{g_{i}}{g_{j}}}$
8:     $S [i] [j] = ω_{ij}$
9: end for
10: for each item i in $1 \dots M$ do
11:     $\mathrm{top}_{K_{pp}} [i] = \mathrm{argsort}_{\downarrow} (S [i, :]) [1 : K_{pp}]$ (indices of the $K_{pp}$ most similar items, sorted by descending similarity)
12: end for
13: for each user u in the training set do
14:     $N_{u} = \{ i \mid R [u] [i] = 1 \}$ (interacted positive sample set of user u)
15:     Construct potential positive sample set $N_{u-pp} = \emptyset$
16:     for each item i in $N_{u}$ do
17:         $N_{u-pp} = N_{u-pp} \cup \mathrm{top}_{K_{pp}} [i]$ (potential positive sample set of user u)
18:     end for
19: end for
20: return $N_{u-pp}$ for each user u in the training set
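Algorithm 1 can be turned into a short NumPy sketch. This is an illustration under stated assumptions, not the authors' implementation: the function name `mine_potential_positives` is hypothetical, dense arrays are used for readability where the complexity analysis presumes sparse storage, and ties in similarity are broken by item index.

```python
import numpy as np

def mine_potential_positives(R, k_pp):
    """Return {user index: set of potential positive item indices}, following Algorithm 1."""
    R = np.asarray(R, dtype=float)
    G = R.T @ R                       # Step 1: item-item co-occurrence
    g = G.sum(axis=1)                 # Steps 2-4: total co-occurrence per item
    M = G.shape[0]
    S = np.zeros((M, M))              # Step 5
    for i in range(M):                # Steps 6-9: similarity on non-zero pairs only
        for j in np.flatnonzero(G[i]):
            if i != j:
                S[i, j] = G[i, j] / (g[i] - G[i, j]) * np.sqrt(g[i] / g[j])
    top = []
    for i in range(M):                # Steps 10-12: K_pp most similar, descending
        order = np.argsort(-S[i], kind="stable")
        top.append([int(j) for j in order[:k_pp] if S[i, j] > 0])
    potential = {}
    for u in range(R.shape[0]):       # Steps 13-19: union over interacted items
        potential[u] = set()
        for i in np.flatnonzero(R[u]):
            potential[u].update(top[i])
    return potential
```

As in Algorithm 1, the returned set is not filtered against the user's own interacted items, so it can contain books the reader has already borrowed. Removing them with a set difference would be a one-line change if only unobserved items are wanted.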

3.1.3. Loss Function

The Bayesian Personalized Ranking (BPR) loss function is specifically designed for learning personalized user preferences in collaborative filtering [30]. It optimizes model parameters by maximizing the difference between the predicted scores of positive samples and those of negative samples for each user. The objective of the BPR loss is to ensure that users prefer items they have actually interacted with as positive samples over items they have not interacted with as negative samples. Specifically, the BPR loss function can be expressed as:

$$L = \sum_{(u, i, j) \in D} - \ln \sigma (\hat{y}_{ui} - \hat{y}_{uj}) + \lambda_{1} \| \Theta_{1} \|^{2}, \tag{7}$$

where D denotes the training dataset, consisting of triplets of user u, positive sample i, and negative sample j; $\hat{y}_{ui}$ and $\hat{y}_{uj}$ represent the predicted scores of user u for the positive sample i and negative sample j, respectively; $\sigma$ is the sigmoid function, which maps the score difference into the probability space; $\lambda_{1}$ is the regularization parameter; and $\| \Theta_{1} \|^{2}$ denotes the $L_{2}$ regularization term on model parameters to prevent overfitting.
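As a minimal numerical sketch of Equation (7), the following plain-Python function evaluates the BPR objective for a batch of score pairs. The names `neg_log_sigmoid` and `bpr_loss`, and the flat list used for the parameters, are assumptions for illustration only; the paper's models are trained in PyTorch.

```python
import math


def neg_log_sigmoid(x):
    """-ln(sigmoid(x)) = ln(1 + exp(-x)), written in a numerically stable form."""
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


def bpr_loss(score_pairs, params, lambda_1):
    """BPR loss for (y_ui, y_uj) pairs plus an L2 penalty on a flat parameter list."""
    ranking_term = sum(neg_log_sigmoid(y_ui - y_uj) for y_ui, y_uj in score_pairs)
    l2_term = lambda_1 * sum(p * p for p in params)
    return ranking_term + l2_term
```

When the positive and negative scores are equal, each triplet contributes ln 2; the contribution shrinks toward zero as the positive item is scored further above the negative one.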

In the original BPR loss function, the positive sample i is drawn from the traditional positive sampling described in Section 3.1.2, i.e., randomly sampled from the user’s interacted items. The sampling strategy designed in this work aims to provide the model with higher-quality positive training samples by integrating both traditional positive sampling and the proposed PPSM strategy. Under this setting, the BPR loss function is adjusted as follows:

$$L_{pp} = \sum_{\substack{(u, i, j) \in D \\ (u, i_{pp}, j) \in D_{pp}}} - \ln \sigma \left( (1 - \beta_{pp}) \hat{R}_{ui} + \beta_{pp} \hat{R}_{u i_{pp}} - \hat{R}_{uj} \right) + \lambda_{1} \| \Theta_{1} \|^{2}, \tag{8}$$

$$\beta_{pp} = \begin{cases} 1, & p_{pp} > \mathrm{rand}(0, 1), \\ 0, & p_{pp} \leq \mathrm{rand}(0, 1), \end{cases} \tag{9}$$

where $D_{pp}$ denotes the training dataset obtained through the PPSM strategy, where $i_{pp}$ represents the positive sample derived from the PPSM strategy. $\beta_{pp}$ is a binary variable that controls whether the potential positive sample is used for training, and $p_{pp}$ is a hyperparameter representing the probability of using potential positive samples to train the model. $\mathrm{rand}(0, 1)$ denotes a random number uniformly distributed in the interval $[0, 1)$. When $p_{pp} > \mathrm{rand}(0, 1)$, the model uses the potential positive samples for training, which helps the model uncover the user’s latent interests. Conversely, when $p_{pp} \leq \mathrm{rand}(0, 1)$, the model relies on traditional positive sampling for training.
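The gating in Equations (8) and (9) can be illustrated with a short sketch: for each training triplet, a uniform draw decides whether the observed positive or the mined potential positive supplies the positive score. The function names and the explicit `rng` argument are hypothetical choices for the example, and the regularization term is omitted for brevity.

```python
import math
import random


def neg_log_sigmoid(x):
    """-ln(sigmoid(x)), written in a numerically stable form."""
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


def sample_beta(p_pp, rng):
    """Eq. (9): beta_pp = 1 if p_pp > rand(0, 1), else 0."""
    return 1 if p_pp > rng.random() else 0


def ppsm_bpr_term(score_pos, score_pp, score_neg, p_pp, rng):
    """One summand of Eq. (8), without the L2 regularization term."""
    beta = sample_beta(p_pp, rng)
    mixed_positive = (1 - beta) * score_pos + beta * score_pp
    return neg_log_sigmoid(mixed_positive - score_neg)


# Empirical check that roughly a fraction p_pp of steps use the mined sample.
demo_rng = random.Random(0)
demo_fraction = sum(sample_beta(0.6, demo_rng) for _ in range(10000)) / 10000
```

Because `random.Random.random()` returns values in [0, 1), setting `p_pp = 0` always falls back to the observed positive, while `p_pp = 1` always uses the mined one.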

3.2. DNN-Based Cold-Start Recommendation

In addition to warm-start users with historical borrowing records, recommendation systems in real-world library settings must also accommodate cold-start users who lack sufficient interaction history. As illustrated in Table 3, the time-based split of our dataset introduces a substantial number of cold-start users in the validation and test sets, making effective cold-start recommendation a critical component of the overall system. To address this challenge, we propose a DNN-based cold-start recommendation model built upon an embedding and MLP architecture [32,33], with its network structure illustrated in Figure 7. This model leverages reader attribute information and book metadata to learn implicit associations between users and items, thereby enabling personalized recommendations even in the absence of historical borrowing records.

3.2.1. Feature Representation

The cold-start recommendation model utilizes two categories of input features: reader-side features and book-side features. For readers, we employ categorical attributes including reader college ID and reader type ID (which distinguishes between undergraduates, master’s students, doctoral students, and faculty members). For books, we use book ID and book name as input features.

The specific encoding methods for each feature are as follows:

(1) Reader College ID and Reader Type ID are discrete categorical features, which are mapped into $d_{1}$-dimensional dense embedding vectors through embedding layers, denoted as $e_{c} \in \mathbb{R}^{d_{1}}$ and $e_{t} \in \mathbb{R}^{d_{1}}$, respectively, enabling the model to capture latent semantic similarities among different reader groups.

(2) Book ID is also treated as a categorical feature and mapped into a $d_{1}$-dimensional embedding vector $e_{b} \in \mathbb{R}^{d_{1}}$ through an embedding layer.

(3) Book Name contains rich semantic information that can be leveraged to infer book topics and themes. To fully utilize this information, we employ the pre-trained sentence Transformer model paraphrase-multilingual-MiniLM-L12-v2 [34] to encode book names into $d_{2}$-dimensional semantic embedding vectors $e_{n} \in \mathbb{R}^{d_{2}}$ (in this paper, $d_{1} = 32$, $d_{2} = 384$). These embeddings remain fixed during model training and do not participate in gradient updates.

All embedding vectors are then concatenated to form a unified feature vector:

$$h_{0} = \mathrm{Concat}(e_{c}, e_{t}, e_{b}, e_{n}), \tag{10}$$

where $h_{0} \in \mathbb{R}^{3 d_{1} + d_{2}}$ represents the initial input representation. Substituting the specific values used in this paper, the concatenated dimension is $3 \times 32 + 384 = 480$.

3.2.2. MLP Architecture

The concatenated feature vector is then fed into a multi-layer perceptron to capture complex nonlinear interactions between reader and book features. The MLP architecture consists of three fully connected layers, structured as follows:

(1) First Layer: A linear layer maps the $(3 d_{1} + d_{2})$-dimensional input to a $d_{1}$-dimensional hidden representation, followed by a ReLU activation function to introduce nonlinearity:

$$h_{1} = \mathrm{ReLU}(W_{1} h_{0} + b_{1}), \quad W_{1} \in \mathbb{R}^{d_{1} \times (3 d_{1} + d_{2})}, \quad h_{1} \in \mathbb{R}^{d_{1}}. \tag{11}$$

(2) Second Layer: A linear layer maintains the $d_{1}$-dimensional hidden representation, followed by ReLU activation and dropout regularization:

$$h_{2} = \mathrm{Dropout}(\mathrm{ReLU}(W_{2} h_{1} + b_{2})), \quad W_{2} \in \mathbb{R}^{d_{1} \times d_{1}}, \quad h_{2} \in \mathbb{R}^{d_{1}}. \tag{12}$$

(3) Third Layer (Output Layer): A linear layer maps the $d_{1}$-dimensional hidden representation to a scalar prediction score:

$$\hat{y}_{ui} = W_{3} h_{2} + b_{3}, \quad W_{3} \in \mathbb{R}^{1 \times d_{1}}. \tag{13}$$
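To make the shapes in Equations (10) to (13) concrete, the following dependency-free sketch runs one inference-time forward pass through a 480 → 32 → 32 → 1 network. It is not the authors' PyTorch code: the helper names, the uniform weight initialization, and the random stand-in embeddings are all assumptions for illustration. Dropout is treated as the identity, as is standard at inference time.

```python
import random

D1, D2 = 32, 384  # embedding sizes reported in the paper


def init_linear(out_dim, in_dim, rng):
    """Random weights and zero biases (initialization scheme is an assumption)."""
    scale = in_dim ** -0.5
    weights = [[rng.uniform(-scale, scale) for _ in range(in_dim)] for _ in range(out_dim)]
    return weights, [0.0] * out_dim


def linear(weights, bias, x):
    """Compute W x + b for a list-of-rows weight matrix."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def predict_score(e_c, e_t, e_b, e_n, layers):
    """Return (y_hat, input dimension) for one reader-book pair."""
    h0 = e_c + e_t + e_b + e_n            # Eq. (10): concatenation
    (w1, b1), (w2, b2), (w3, b3) = layers
    h1 = relu(linear(w1, b1, h0))         # Eq. (11)
    h2 = relu(linear(w2, b2, h1))         # Eq. (12); dropout is the identity at inference
    return linear(w3, b3, h2)[0], len(h0)  # Eq. (13)


rng = random.Random(0)
layers = [init_linear(D1, 3 * D1 + D2, rng), init_linear(D1, D1, rng), init_linear(1, D1, rng)]
# Random vectors stand in for the learned ID embeddings and the frozen book-name embedding.
e_c, e_t, e_b = ([rng.gauss(0, 1) for _ in range(D1)] for _ in range(3))
e_n = [rng.gauss(0, 1) for _ in range(D2)]
y_hat, input_dim = predict_score(e_c, e_t, e_b, e_n, layers)
```

The returned `input_dim` confirms the concatenated width of 3 × 32 + 384 = 480 stated above.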

3.2.3. Prediction and Optimization

After passing through the aforementioned MLP layers, the output value $\hat{y}_{ui}$ represents the predicted preference score of reader u for book i, reflecting the likelihood that the reader is interested in the target book.

For model optimization, we adopt the InfoNCE loss function [35], which maximizes the similarity of positive sample pairs while minimizing the similarity with negative sample pairs through contrastive learning. Compared with the traditional BPR loss, InfoNCE can utilize multiple negative samples in a single training step, enabling more effective learning of discriminative representations. Specifically, for each positive sample pair $(u, i)$, we randomly sample K negative sample books $j_{1}, j_{2}, \dots, j_{K}$. The InfoNCE loss is defined as:

$$L_{cold} = - \sum_{(u, i) \in D_{cold}} \log \frac{\exp (\hat{y}_{ui} / \tau)}{\exp (\hat{y}_{ui} / \tau) + \sum_{k = 1}^{K} \exp (\hat{y}_{u j_{k}} / \tau)} + \lambda_{2} \| \Theta_{2} \|^{2}, \tag{14}$$

where $D_{cold}$ denotes the training data for cold-start scenarios, containing all positive interaction pairs; $\hat{y}_{ui}$ and $\hat{y}_{u j_{k}}$ are the predicted scores of reader u for the positive sample book i and the k-th negative sample book $j_{k}$, respectively; $\tau$ is the temperature coefficient; K is the number of negative samples; $\lambda_{2}$ is the regularization coefficient; and $\Theta_{2}$ represents all trainable parameters of the model, including the ID feature embedding matrices and MLP weights. The pre-trained book name embeddings $e_{n}$ remain fixed during this process and do not participate in gradient updates.
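A single summand of Equation (14) can be evaluated as in the sketch below, which uses the log-sum-exp trick for numerical stability. The function name `info_nce_term` is an assumption for illustration, and the regularization term is left out.

```python
import math


def info_nce_term(pos_score, neg_scores, tau):
    """-log( exp(y_pos/tau) / (exp(y_pos/tau) + sum_k exp(y_neg_k/tau)) )."""
    logits = [pos_score / tau] + [score / tau for score in neg_scores]
    peak = max(logits)
    # log of the denominator, shifted by the largest logit to avoid overflow
    log_denominator = peak + math.log(sum(math.exp(logit - peak) for logit in logits))
    return log_denominator - logits[0]
```

If the positive and all K negatives receive the same score, the term equals log(K + 1) regardless of the temperature, which gives a convenient sanity check. A larger temperature flattens the distribution, so a positive that is already scored above its negatives is penalized more than it would be at a smaller temperature.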

4. Experimental Analysis

4.1. Experimental Setup

4.1.1. Experimental Dataset

As described in Section 2.1, a total of 107,403 IPE book borrowing records, involving 10,206 readers and 9048 books, are used as the data source for the experiments in this section. These records are then divided using a time-based split into a training set (2014–2021), a validation set (2022), and a test set (2023–2024). However, this splitting method inevitably introduces a severe user cold-start problem (see Section 2.1). To address this, we propose a practical hybrid recommendation system, which is designed to leverage a GNN and a DNN for warm-start and cold-start scenarios, respectively. The datasets corresponding to these two scenarios are summarized in Table 3.

4.1.2. Parameter Settings

We adopt a full-ranking evaluation protocol for all methods: for each test user, all un-interacted books are used as candidates. This setting is applied consistently across all compared methods to ensure fairness. All experiments are conducted with 10 different random seeds, and the reported results are averaged over these runs to ensure statistical reliability. To prevent overfitting, we adopt an early stopping strategy: training is terminated when the Recall@20 metric on the validation set does not improve for 10 consecutive epochs. The model checkpoint that achieves the best validation performance is selected for the final testing. All models are implemented using PyTorch 2.6.0 and trained on a single NVIDIA Quadro P4000 GPU with the Adam optimizer. The specific parameter settings for the warm-start and cold-start experiments are detailed as follows.

Parameter Settings for GNN-Based Warm-Start Recommendation. We implement the proposed PPSM-GCN using the official LightGCN codebase [20]. Specifically, the designed PPSM strategy is integrated into LightGCN in a plug-and-play way to assess its effectiveness. The hyperparameter settings for the PPSM-GCN are as follows: the embedding dimension, batch size, learning rate, and $L_{2}$ regularization coefficient $\lambda_{1}$ are set to 64, 2048, 0.005, and 0.01, respectively. Furthermore, for the key hyperparameters in PPSM-GCN, we adopt a grid search strategy on the validation set. Specifically, we search for the optimal number of GCN layers L in $\{0, 2, 4, \dots, 14\}$, the optimal number of potential positive samples $K_{pp}$ in $\{10, 20, \dots, 50\}$, and the optimal sampling probability $p_{pp}$ in $\{0.1, 0.2, \dots, 0.9\}$.

Parameter Settings for DNN-Based Cold-Start Recommendation. As presented in Section 3.2, in the proposed DNN-based cold-start recommendation model, the embedding dimension of categorical features is set to $d_{1} = 32$, and the dimension of the pre-trained book name embeddings is $d_{2} = 384$. The MLP architecture consists of three layers with dimensions $480 \to 32 \to 32 \to 1$. In addition, the batch size, learning rate, $L_{2}$ regularization coefficient $\lambda_{2}$, and dropout rate are set to 256, 0.005, 0.0001, and 0.3, respectively. Finally, the optimal temperature coefficient $\tau$ is tuned within $\{1, 2, \dots, 10\}$, while the optimal number of negative samples K is selected from $\{5, 10, \dots, 40\}$ via a grid search method.
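The early stopping rule stated at the start of this subsection (stop once validation Recall@20 has not improved for 10 consecutive epochs, then test the best checkpoint) can be sketched as follows. The function name and the idea of passing a precomputed list of validation scores are assumptions made to keep the example self-contained; in practice the scores would be produced epoch by epoch during training.

```python
def run_early_stopping(validation_recalls, patience=10):
    """Return (best epoch, best Recall@20, epoch at which training stops)."""
    best_epoch, best_value = None, float("-inf")
    epochs_without_gain = 0
    stopped_epoch = None
    for epoch, value in enumerate(validation_recalls):
        stopped_epoch = epoch
        if value > best_value:
            # New best checkpoint: remember it and reset the patience counter.
            best_epoch, best_value = epoch, value
            epochs_without_gain = 0
        else:
            epochs_without_gain += 1
            if epochs_without_gain >= patience:
                break
    return best_epoch, best_value, stopped_epoch


# Hypothetical validation curve: improves for three epochs, then plateaus lower.
history = [0.01, 0.02, 0.03] + [0.025] * 15
best_epoch, best_recall, stopped_epoch = run_early_stopping(history)
```

With this hypothetical curve, the best checkpoint is the one from epoch 2, and training halts ten epochs later.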

4.1.3. Performance Indicators

The core task of collaborative filtering is to predict a given user’s preference ranking over items, so ranking-based metrics are the most commonly used performance indicators. In this paper, Precision@k, Recall@k, and normalized discounted cumulative gain (NDCG@k) are selected as the performance metrics for method comparison. Specifically, Precision@k measures the proportion of items in the recommended list that are of interest to the user, reflecting recommendation accuracy but being insensitive to long-tail items; Recall@k measures the proportion of the items a user is truly interested in that appear in the recommended list, indicating the system’s ability to capture user interests and being suitable for sparse data scenarios; NDCG@k evaluates the ranking quality of the recommendation list, rewarding items of interest that appear near the top. Following common practice in recommender systems, we evaluate models using these ranking-based metrics under a full-ranking protocol instead of classification metrics such as F1-score. Together, the three metrics provide a comprehensive evaluation of a recommendation system’s accuracy, coverage, and ranking quality.

The definition of Precision@k is as follows:

$$\mathrm{Precision@}k = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}, \tag{15}$$

where k is the length of the recommendation list (default $k = 20$). True positives represent the number of correctly recommended positive items (i.e., items in the recommendation list that the user is indeed interested in), while false positives denote the number of incorrectly recommended items.

The definition of Recall@k is as follows:

$$\mathrm{Recall@}k = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}, \tag{16}$$

where true positives represent the number of correctly recommended positive items, while false negatives denote the number of positive items that are not recommended.

The definition of normalized discounted cumulative gain (NDCG@k) is as follows:

$$\mathrm{NDCG@}k = \frac{\mathrm{DCG@}k}{\mathrm{IDCG@}k}, \tag{17}$$

$$\mathrm{DCG@}k = \sum_{i = 1}^{k} \frac{rel_{i}}{\log_{2} (i + 1)}, \tag{18}$$

where $rel_{i}$ represents the relevance score of the i-th item in the recommended list, DCG@k refers to the discounted cumulative gain of the top k items in the recommendation list, and IDCG@k denotes the ideal DCG@k, which is computed by sorting all positive items in descending order of relevance scores and taking the DCG of the top k items.
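Equations (15) to (18) can be computed for a single user as in the sketch below. It assumes binary relevance (an item is relevant if the user borrowed it in the evaluation period), which is a common choice for implicit-feedback data but is an assumption here; the function name is likewise hypothetical.

```python
import math


def precision_recall_ndcg_at_k(ranked_items, relevant_items, k=20):
    """Return (Precision@k, Recall@k, NDCG@k) for one user with binary relevance."""
    top_k = ranked_items[:k]
    hits = [1 if item in relevant_items else 0 for item in top_k]
    true_positives = sum(hits)

    # Eq. (15): TP + FP is the number of recommended items.
    precision = true_positives / len(top_k) if top_k else 0.0
    # Eq. (16): TP + FN is the number of relevant items.
    recall = true_positives / len(relevant_items) if relevant_items else 0.0

    # Eq. (18): list position i (1-based) is discounted by log2(i + 1).
    dcg = sum(hit / math.log2(position + 2) for position, hit in enumerate(hits))
    # Ideal ranking places all relevant items first.
    ideal_hits = min(k, len(relevant_items))
    idcg = sum(1.0 / math.log2(position + 2) for position in range(ideal_hits))
    ndcg = dcg / idcg if idcg > 0 else 0.0  # Eq. (17)
    return precision, recall, ndcg


# Hypothetical example: four recommendations, three relevant books.
p_at_4, r_at_4, n_at_4 = precision_recall_ndcg_at_k([5, 3, 9, 1], {3, 1, 7}, k=4)
```

In the hypothetical example, two of the four recommended books are relevant, so Precision@4 is 0.5 and Recall@4 is 2/3; NDCG@4 is lower than 1 because the hits sit at positions 2 and 4 rather than at the top.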

4.1.4. Baseline Methods

In this work, we aim to build a practical IPE book recommendation system addressing real-world sparsity and cold-start challenges, rather than proposing a SOTA model. Therefore, to efficiently evaluate the effectiveness of the proposed method in the IPE book recommendation scenario, the comparison baselines are constructed based on the following two core settings.

Warm-Start Scenario. The proposed PPSM is a model-agnostic training sample augmentation strategy, whose core contribution lies in providing richer supervision signals for graph convolutional collaborative filtering models by mining potential positive samples. To most directly validate the effectiveness of this strategy, we adopt a “plug-and-play” integration with LightGCN and conduct an ablation study against the original LightGCN.

Cold-Start Scenario. The proposed DNN-based model aims to leverage reader attributes (college, reader type) and book metadata (book ID, book name semantics) for cold-start recommendation. Since collaborative filtering signals are unavailable in cold-start scenarios, we select a group-popularity-based recommendation method, HotRec, as the baseline. HotRec constructs groups based on readers’ college and type, and recommends the most popular books within each group to its users. This baseline setting effectively validates the advantage of the DNN-based model in leveraging fine-grained features for personalized recommendations.
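A group-popularity baseline of the kind described for HotRec can be sketched in a few lines. This is an illustrative reconstruction from the description above, not the authors' code: the function names, the record layout `(college_id, reader_type_id, book_id)`, the tie-breaking by book ID, and the empty list returned for unseen groups are all assumptions.

```python
from collections import Counter, defaultdict


def fit_group_popularity(records):
    """Count borrowings per (college, reader type) group from training records."""
    counts = defaultdict(Counter)
    for college_id, reader_type_id, book_id in records:
        counts[(college_id, reader_type_id)][book_id] += 1
    return counts


def recommend_for_group(counts, college_id, reader_type_id, k=20):
    """Return the k most-borrowed books of the reader's group."""
    group_counts = counts.get((college_id, reader_type_id), {})
    # Most popular first; ties broken by book ID (assumption).
    ranked = sorted(group_counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [book_id for book_id, _ in ranked[:k]]


# Hypothetical training records.
train_records = [("c1", "ug", "b1"), ("c1", "ug", "b1"), ("c1", "ug", "b2"), ("c2", "ug", "b3")]
group_counts = fit_group_popularity(train_records)
```

Every reader in the same group receives the same list, which is the limitation the embedding and MLP model is meant to address.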

4.2. Warm-Start Experimental Results

4.2.1. Performance Comparison

To evaluate the effectiveness of the proposed potential positive sample mining strategy, we compare PPSM-GCN with the baseline model LightGCN on the IPE book borrowing dataset. As shown in Table 4, PPSM-GCN consistently outperforms LightGCN across all metrics, achieving Precision@20 of 0.0090, Recall@20 of 0.0637, and NDCG@20 of 0.0396, corresponding to relative improvements of 9.8%, 10.0%, and 9.1%, respectively.

The performance gains can be attributed to the introduction of the PPSM strategy. While LightGCN relies solely on explicit interactions as positive samples, it tends to overlook potentially preferred books that are semantically similar but have not been interacted with. PPSM-GCN addresses this limitation by computing item similarities based on co-occurrence patterns and mining the top-K similar items for each interacted item as additional positive samples, effectively alleviating the data sparsity issue inherent in IPE book borrowing scenarios.

The significant improvement in NDCG@20 further demonstrates that PPSM-GCN not only recommends more relevant books but also ranks them accurately at the top of the recommendation list, better aligning with practical application requirements. Experimental results validate that similarity-based potential positive sample mining can effectively enhance the performance of GCN-based models in sparse recommendation scenarios.

4.2.2. Parameter Analysis

In this section, we first investigate the impact of different numbers of GCN layers L on the performance of PPSM-GCN. Then, we study the impact of the proposed PPSM strategy on the model performance, involving the number of potential positive samples $K_{pp}$ and the sampling probability $p_{pp}$.

Impact of L. As shown in Table 5, the model performance exhibits a clear trend of first increasing and then gradually decreasing as the number of GCN layers grows. When $L = 0$ (i.e., using only ID embeddings without graph convolution), the model achieves the poorest performance with Recall@20 of only 0.0216, highlighting the necessity of graph convolution for capturing collaborative signals. As the number of layers increases, performance improves steadily, peaking at $L = 8$ with Recall@20 of 0.0637, NDCG@20 of 0.0396, and Precision@20 of 0.0090. Beyond 10 layers, the model performance begins to decline.

Notably, PPSM-GCN maintains relatively high performance even with deeper architectures (e.g., $L = 8, 10, 12$), exhibiting resistance to the over-smoothing phenomenon commonly observed in GNNs. This can be attributed to the proposed PPSM strategy, which enriches training signals by introducing semantically similar potential positive samples. These additional supervision signals help preserve embedding discriminability during deeper propagation, allowing the model to benefit from higher-order connectivity information without suffering severe performance degradation. Based on these observations, we set $L = 8$ as the default configuration for subsequent experiments.

Impact of $K_{pp}$. We further investigate the effect of the number of potential positive samples $K_{pp}$ on model performance. Experimental results in Figure 8 show that as $K_{pp}$ increases, model performance first rises and then declines, peaking at $K_{pp} = 20$. This indicates that an appropriate number of potential positive samples can effectively enrich training signals and enhance the model’s ability to capture users’ latent interests. However, introducing too many potential positive samples may introduce false positives, as items with lower similarity are more likely to be irrelevant, thereby diluting effective signals. Therefore, we set $K_{pp} = 20$ as the optimal configuration.

Impact of $p_{pp}$. We also examine the influence of the sampling probability $p_{pp}$, which controls the likelihood of sampling from potential positive samples during training. Experimental results reveal a noteworthy phenomenon: as $p_{pp}$ gradually increases from 0.1, model performance continuously improves, reaching its peak at $p_{pp} = 0.6$; although performance declines slightly after exceeding 0.6, the results at $p_{pp} = 0.7$ and 0.8 still outperform those at lower probability settings. This suggests that when the model favors sampling from potential positive samples during training, it achieves better recommendation performance.

This phenomenon stems from the “topic cluster” characteristic of IPE book borrowing: readers interested in a topic often have latent interest in related books, which historical records fail to fully capture. PPSM leverages item co-occurrence to mine these implicit associations, helping the model discover unborrowed but relevant books. Greater exposure to such samples enables more comprehensive learning of user interests. Conversely, overreliance on actual interactions (low $p_{pp}$) limits recommendations to historically similar items, struggling to expand to other resources within the same topic cluster. Thus, collective co-occurrence patterns effectively supplement individual interest expressions, explaining why higher $p_{pp}$ settings yield superior performance.

4.2.3. Case Study

To intuitively demonstrate the effectiveness of the PPSM strategy, we compare the recommendation lists generated by LightGCN and PPSM-GCN for three sample readers. Table 6 presents the ground-truth books borrowed by these readers and the top-20 recommendation lists produced by both models.

Analysis of Ranking Quality. PPSM-GCN consistently achieves better ranking performance across all cases. For Reader 9146, PPSM-GCN successfully places the ground-truth book 6388 at the 20th position, while LightGCN fails to recommend any ground-truth items. For Reader 9369, LightGCN recommends only one ground-truth book (1603 at position 13), whereas PPSM-GCN recommends two (1603 at position 9 and 1435 at position 20). For Reader 9600, LightGCN recommends one ground-truth book (6553 at position 8), while PPSM-GCN recommends two (6553 at position 6 and 8196 at position 15). These cases demonstrate that PPSM-GCN not only retrieves more relevant books but also ranks them higher in the recommendation list.

Why PPSM Improves Ranking Quality. The improved ranking performance can be attributed to the PPSM strategy’s utilization of co-occurrence relationships. PPSM computes item similarities based on the item co-occurrence matrix and identifies items with high co-occurrence with a user’s historically borrowed books as potential positive samples. Although these items have not been borrowed by the target user, they exhibit strong associations with other users’ borrowing behaviors and may align with the user’s latent interests. During training, the model is encouraged to assign higher scores to these co-occurrence-based potential positives, thereby learning to recognize such implicit preferences. In contrast, LightGCN, which relies solely on explicit interactions, lacks the ability to mine co-occurrence relationships from collective behaviors and thus struggles to recommend such items effectively. This case study validates that the PPSM strategy enhances the model’s capability to capture users’ latent interests by leveraging item co-occurrence, ultimately improving recommendation ranking quality.

4.3. Cold-Start Experimental Results

4.3.1. Performance Comparison

To evaluate the effectiveness of the proposed DNN-based cold-start recommendation model, we compare its performance against a group-popularity-based recommendation method (HotRec). As shown in Table 7, the proposed embedding and MLP method consistently outperforms HotRec across all evaluation metrics. Specifically, our method achieves Recall@20 of 0.0596, NDCG@20 of 0.0443, and Precision@20 of 0.0189, corresponding to relative improvements of 7.0%, 12.4%, and 6.2% over HotRec, respectively, demonstrating the effectiveness of leveraging reader attributes and book metadata for cold-start recommendation.

HotRec constructs groups based on college and reader type, achieving group-level personalization, but provides identical recommendations within each group, failing to capture individual interest differences. In contrast, our embedding and MLP method learns personalized representations for each reader–book pair by encoding reader-side (college ID, reader type) and book-side features (book ID, book name embeddings). Reader embeddings capture the group similarities; book ID embeddings memorize book-specific attributes; pre-trained book name embeddings provide semantic information for generalization to unseen books. These features are processed by an MLP to learn complex interactions, generating fine-grained recommendations for cold-start users. The 12.4% improvement in NDCG@20 demonstrates that our method not only recommends more relevant books but also ranks them higher, which is crucial for users with no historical interactions.

In summary, the experimental results validate that the proposed embedding and MLP framework effectively addresses the cold-start challenge by learning personalized preferences from reader attributes and book metadata, significantly outperforming the group-popularity-based baseline.

4.3.2. Ablation Study

To investigate the contribution of each component in the embedding and MLP method to cold-start recommendation performance, we conduct ablation experiments by removing the InfoNCE loss (replacing it with BPR loss), the book name embedding, and the book ID embedding, respectively. The results are shown in Table 8.

As shown in Table 8, removing any component leads to performance degradation, with the removal of book ID embedding causing the most significant drop (Recall@20 from 0.0596 to 0.0523). This indicates that book ID, as a unique identifier, is crucial for memorizing each book’s distinctive attributes and their association patterns with reader groups. Removing InfoNCE loss also results in substantial degradation (Recall@20 to 0.0528), demonstrating that the contrastive learning mechanism is essential for learning discriminative representations. Removing book name embedding yields relatively smaller but still noticeable degradation (Recall@20 to 0.0561), validating the supplementary role of semantic information in enhancing model generalization to unseen books.

In summary, the ablation study validates the necessity of each core component in the embedding and MLP method: book ID embedding provides individual memorization capability, InfoNCE loss offers an effective optimization objective, and book name embedding supplies semantic generalization ability. The synergy of these three components achieves optimal recommendation performance in cold-start scenarios.

4.3.3. Parameter Analysis

In this part, we investigate the impact of two key hyperparameters in the InfoNCE loss function (temperature coefficient $\tau$ and the number of negative samples K) on the performance of the cold-start recommendation model. The results are reported in Figure 9.

Impact of $\tau$. Experimental results reveal that the choice of temperature coefficient critically influences model performance. When $\tau = 1$, model performance is relatively low. As the temperature coefficient gradually increases to 6, performance steadily improves and reaches its optimum. This phenomenon can be explained by the role of temperature in balancing the difficulty of contrastive learning: an appropriate value prevents the model from being dominated by hard negatives while maintaining sufficient discrimination between positive and negative samples. Once $\tau$ exceeds 6, performance begins to decline, indicating that an overly smooth similarity distribution weakens the contrastive signals. Accordingly, we set $\tau = 6$ as the optimal configuration.

Impact of K. As can be seen in Figure 9, when $K = 5$, model performance is limited. As the number of negatives increases to 25, model performance consistently improves and peaks. This suggests that a sufficient number of negative samples provides richer contrastive information, enabling the model to learn more discriminative representations. However, when K continues to increase beyond 25 to 30 or higher, performance begins to fluctuate and slightly decline, likely due to the introduction of redundant information or noise that complicates optimization and yields diminishing returns. Balancing performance gains with computational efficiency, we adopt $K = 25$ as the default setting.

5. Ideological and Political Education Value and Application Significance of the GDNN Model

Based on the real-world application scenario of ideological and political theory book lending in university libraries, this study proposes a hybrid recommendation system, GDNN, which directly addresses several key pain points of ideological and political education in resource supply and learning support. Amid the coexistence of rapid collection expansion and diversified student demands, the system can more stably identify students’ potential reading interests, thereby reducing the cost of information overload and maintaining effectiveness across two typical user groups: those with historical reading records and those without. Accordingly, this paper discusses the application significance of the framework along the chain of “precision supply—deep connection—equity and inclusiveness”.

Promoting the precise supply of ideological and political educational resources, and alleviating information overload and mismatching between resources and users. The superior performance of GDNN across general recommendation scenarios indicates that it can more reliably estimate the user–item correlation under large-scale book resource conditions. The direct value of this improvement is reflected in the following aspects:

It reduces the cost of retrieval and screening for students. With the increasing abundance of ideological and political books, the problem of “easy access but difficult to match” has become a major concern for students. Improved ranking quality can direct students’ limited attention to reading resources that better fit their learning stages and interest structures, thus enhancing the actual reach rate and reading completion rate.

It enhances the structured allocation of resource supply. GDNN classifies users into warm-start and cold-start groups based on the actual characteristics of university students and employs a differentiated modeling strategy. Experimental results demonstrate that GDNN maintains stable performance improvement in cold-start scenarios, suggesting that the DNN model with embedding layers and MLP can learn effective user–book matching patterns merely relying on reader attributes and book metadata. This advantage enables the framework to deliver well-adapted ideological and political book resources for students in cold-start states, such as freshmen, students with major changes, or those transferring between campuses, thereby realizing personalized and precisely targeted supply of ideological and political education resources. For the warm-start scenario, GDNN leverages the embedding propagation mechanism of graph neural networks to extract implicit reading patterns and topic-related correlations from the borrowing relation graph. This enables the recommendation to go beyond simple keyword matching or coarse-grained categorization, aligning more accurately with students’ actual reading habits and cognitive progression in knowledge acquisition. As such, it facilitates the shift of ideological and political education resource supply from universal coverage toward high-precision matching.

Exploring potential reading demands, and enhancing the continuity and depth of ideological and political reading. Ideological and political reading is generally characterized by low frequency but high value. Although some high-quality books have low circulation, they play a vital role in shaping students’ values and improving their theoretical literacy. The GDNN identifies potential positive samples from sparse interaction data through the PPSM strategy, which alleviates data sparsity and improves the performance of interest modeling, making the system more sensitive to students’ unexpressed but potential reading demands. This mechanism avoids excessive bias toward popular titles in recommendation and ensures more reasonable exposure for important long-tail resources such as classic theoretical works, methodological readings, and Party-history-themed books. Meanwhile, by accurately capturing potential reading interests, the model can generate stable topic-progressive recommendation sequences—from introductory books to specialized studies and further to selected readings of original works, which is consistent with the gradual learning pattern of ideological and political education, thus achieving coherent and systematic reading guidance.

Improving the fairness and inclusiveness of ideological and political resource services. Compared with traditional recommendation methods that suffer significant performance degradation on cold-start and weakly interactive users, leading to the Matthew effect of "the more active, the more attended; the more silent, the more neglected", GDNN maintains more stable recommendation performance for freshmen, low-frequency borrowers, and interdisciplinary students, effectively improving the fairness and inclusiveness of ideological and political education resource services.

6. Conclusions

In this paper, we construct a real-world library borrowing dataset of ideological and political education (IPE) books to support personalized recommendation research in higher education scenarios, and propose GDNN, a practical hybrid recommendation system that handles both warm-start and cold-start recommendation tasks under a time-based evaluation protocol. The effectiveness of GDNN originates from its differentiated design for the two scenarios. For warm-start users, it adopts a GCN-based collaborative filtering module combined with the PPSM strategy, which strengthens supervision signals by mining high-confidence potential positive samples similar to users’ interacted items, thereby alleviating data sparsity and improving latent interest modeling. For cold-start users, it employs an embedding-and-MLP-based DNN module to learn implicit user–item associations from reader attributes and book metadata, achieving personalized recommendation without historical borrowing records. Experimental results demonstrate that PPSM-GCN and the embedding and MLP method achieve consistent performance improvements over LightGCN and HotRec in warm-start and cold-start scenarios, respectively, providing technical support and practical insights for the precise provision of IPE book resources and the enhancement of educational effectiveness in higher education.

Despite these promising results, this study still has several limitations: it fails to capture the dynamic evolution of users’ reading interests, the single-university dataset restricts the generalizability of the proposed model, and the guidance of ideological and political education is not sufficiently integrated into the recommendation mechanism.

In the future, we will further optimize the GDNN framework by introducing temporal neural networks to model users’ dynamic interests, extending the dataset to multiple universities to verify the model’s generalizability, and integrating the orientation of ideological and political education to build a recommendation model with both personalization and educational value. We will also explore the adoption of more advanced graph-augmented models to refine the graph convolutional architecture and enhance overall system performance.

Author Contributions

Conceptualization, Y.L. and H.L.; methodology, Y.L.; software, Y.L. and H.L.; validation, Y.L., H.L. and S.L.; formal analysis, Y.L.; investigation, Y.L.; resources, Y.L.; data curation, Y.L.; writing—original draft preparation, Y.L.; writing—review and editing, H.L. and S.L.; visualization, Y.L.; supervision, H.L.; project administration, H.L.; funding acquisition, H.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China under grant number 72401010.

Data Availability Statement

The datasets and code used in this research are available upon request from the corresponding author, subject to reasonable conditions.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. Distribution of reader types for all books.

Figure 2. Distribution of top borrowed IPE book categories.

Figure 3. Comparison of book borrowing categories across different colleges. (a) Book borrowing distribution of students from the College of Humanities and Law; (b) Book borrowing distribution of students from the College of Information Science and Engineering; (c) Book borrowing distribution of students from the College of International Exchange; (d) Book borrowing distribution of students from the School of Foreign Languages.

Figure 4. Comparison of book borrowing categories between graduate and undergraduate students from the College of Agriculture. (a) Graduate students’ borrowing in the College of Agriculture; (b) Undergraduate students’ borrowing in the College of Agriculture.

Figure 5. The framework of the proposed PPSM-GCN.

Figure 6. Schematic diagram of the embedding propagation mechanism.

Figure 7. The network structure of the proposed DNN-based cold-start recommendation model.

Figure 8. Performance comparison with different $K_{pp}$ and $p_{pp}$.

Figure 9. Performance comparison with different $\tau$ and $K$.

Table 1. Classification codes for IPE-related books.

Category	Classification Codes
Category A	A1, A2, A3, A4, A49, A5, A7, A8
Category B	B-4, B0, B1, B2, B3, B4, B5, B6, B7, B80, B81, B82, B83, B84, B9
Category C	C0, C1, C2, C3, C4, C5, C91, C92, C93
Category D	D0, D1, D2, D33/37, D4, D5, D6, D73/77, D8, D9, DF
Category F	F0, F1, F2, F3, F4, F49, F5, F59, F6, F7, F8
Category G	G0, G1, G2, G4

Table 2. Statistics at each step of the data processing flow.

Data Source	Borrowing Records	Readers	Books
Raw Data	1,476,944	73,418	184,193
Cleaned Raw Data	1,351,336	73,387	184,158
IPE Borrowing Records	235,532	38,717	46,279
IPE Recommendation Dataset	107,403	10,206	9048

Table 3. Data split of the IPE recommendation dataset.

Data Split	Warm-Start Borrowing Records	Warm-Start Readers	Cold-Start Borrowing Records	Cold-Start Readers
Train	98,649	9710	98,649	9710
Valid	2945	825	1044	265
Test	1625	486	3140	465

Table 4. Performance comparison with LightGCN. The results are reported as the mean and standard deviation over 10 runs with different random seeds. RI stands for relative improvement of the proposed PPSM-GCN to LightGCN. The p-values are obtained from paired t-tests, confirming that all improvements are statistically significant ($p < 0.001$).

Model	Recall@20	NDCG@20	Precision@20
LightGCN	$0.0579 \pm 0.0013$	$0.0363 \pm 0.0010$	$0.0082 \pm 0.0003$
PPSM-GCN	$0.0637 \pm 0.0022$	$0.0396 \pm 0.0006$	$0.0090 \pm 0.0002$
RI over LightGCN	10.0%	9.1%	9.8%
p-value	<0.001	<0.001	<0.001
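
The RI row can be reproduced from the reported means. The short check below assumes RI is computed as (proposed − baseline) / baseline on the mean values, which matches the figures in the table. The helper name `relative_improvement` is introduced here for illustration.

```python
def relative_improvement(baseline: float, proposed: float) -> float:
    """Relative improvement of `proposed` over `baseline`, in percent."""
    return (proposed - baseline) / baseline * 100.0


# Mean values copied from Table 4: (LightGCN, PPSM-GCN).
table4_means = {
    "Recall@20": (0.0579, 0.0637),
    "NDCG@20": (0.0363, 0.0396),
    "Precision@20": (0.0082, 0.0090),
}

for metric, (lightgcn, ppsm_gcn) in table4_means.items():
    ri = relative_improvement(lightgcn, ppsm_gcn)
    print(f"{metric}: {ri:.1f}%")
```

Rounded to one decimal place, the three values come out as 10.0%, 9.1%, and 9.8%, in agreement with the table.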

Table 5. Performance comparison across different numbers of GCN layers L. The best performance for each metric is highlighted in bold.

L	0	2	4	6	8	10	12	14
Recall@20	0.0216	0.0540	0.0620	0.0630	0.0637	0.0639	0.0614	0.0604
NDCG@20	0.0136	0.0338	0.0381	0.0389	0.0396	0.0387	0.0361	0.0358
Precision@20	0.0033	0.0076	0.0087	0.0089	0.0090	0.0089	0.0083	0.0082
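
As a rough illustration of what the layer count L controls, the sketch below implements LightGCN-style propagation in plain Python: symmetric degree normalization, no feature transformation or nonlinearity, and a uniform average over layers 0 to L. Whether PPSM-GCN combines its layers with exactly these uniform weights is an assumption. The function name, node labels, and toy embeddings are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def lightgcn_embeddings(
    edges: List[Tuple[str, str]],
    emb: Dict[str, List[float]],
    num_layers: int,
) -> Dict[str, List[float]]:
    """Propagate embeddings over an undirected reader-book graph and
    return the mean of the layer-0..layer-L embeddings for every node."""
    neighbors: Dict[str, List[str]] = {n: [] for n in emb}
    for reader, book in edges:
        neighbors[reader].append(book)
        neighbors[book].append(reader)
    deg = {n: len(nb) for n, nb in neighbors.items()}
    dim = len(next(iter(emb.values())))

    layer = {n: list(v) for n, v in emb.items()}   # current layer
    total = {n: list(v) for n, v in emb.items()}   # running sum over layers
    for _ in range(num_layers):
        nxt = {}
        for n in emb:
            acc = [0.0] * dim
            for m in neighbors[n]:
                # Symmetric normalization 1 / sqrt(deg(n) * deg(m)).
                w = 1.0 / math.sqrt(deg[n] * deg[m])
                for d in range(dim):
                    acc[d] += w * layer[m][d]
            nxt[n] = acc
        layer = nxt
        for n in emb:
            for d in range(dim):
                total[n][d] += layer[n][d]
    return {n: [x / (num_layers + 1) for x in v] for n, v in total.items()}


# One reader, one book, one borrowing edge (toy values).
emb0 = {"reader_0": [1.0, 0.0], "book_0": [0.0, 1.0]}
edges = [("reader_0", "book_0")]
print(lightgcn_embeddings(edges, emb0, num_layers=0))
print(lightgcn_embeddings(edges, emb0, num_layers=1))
```

With `num_layers=0` the function returns the raw embeddings unchanged, so no neighborhood information is used. With one layer, the reader embedding becomes the average of its own vector and that of the borrowed book.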

Table 6. Comparison of recommendation ranks for case users. GT (ground truth) denotes the books that the users actually borrowed in the test set, while LightGCN and PPSM-GCN represent the recommendation lists generated by the two methods, respectively. The bolded book IDs indicate the books that are successfully hit by the recommendation lists of LightGCN or PPSM-GCN.

Reader ID	Book Type	Book IDs
9146	GT	5211, 5704, 6388, 7911, 8056, 8093, 8649, 9019
	LightGCN	2039, 1355, 334, 502, 1603, 914, 613, 6408, 6553, 3323, 123, 6419, 1096, 313, 72, 276, 17, 681, 514, 1197
	PPSM-GCN	1355, 2039, 1603, 334, 6553, 6419, 3323, 6408, 8196, 502, 6351, 123, 8526, 17, 8004, 1197, 276, 8328, 914, 6388
9369	GT	1435, 1603, 3102, 5835, 8993
	LightGCN	217, 59, 130, 1355, 1406, 203, 69, 1792, 206, 585, 368, 253, 1603, 292, 1051, 81, 496, 240, 432, 395
	PPSM-GCN	217, 59, 130, 1011, 97, 203, 2295, 69, 1603, 1355, 1333, 432, 292, 395, 1051, 206, 2373, 337, 2658, 1435
9600	GT	6553, 8196, 8262
	LightGCN	1355, 2039, 334, 1603, 502, 6408, 6419, 6553, 914, 3323, 123, 313, 1096, 276, 613, 1222, 17, 681, 6351, 5143
	PPSM-GCN	2039, 1355, 334, 1603, 6419, 6553, 3323, 502, 276, 6408, 6351, 123, 8526, 17, 8196, 7621, 313, 1197, 891, 8004
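
The case in Table 6 can be checked numerically. The sketch below computes Recall@20, Precision@20, and NDCG@20 for reader 9369 from the lists above. It assumes the common binary-relevance definitions, with DCG discounted by log2(rank + 1) and the ideal DCG taken over min(|GT|, K) positions, which may differ in detail from the definitions used in the experiments. The function name `metrics_at_k` is introduced here for illustration.

```python
import math
from typing import List, Tuple


def metrics_at_k(
    recommended: List[int], ground_truth: List[int], k: int = 20
) -> Tuple[float, float, float]:
    """Return (recall, precision, ndcg) at cutoff k with binary relevance."""
    gt = set(ground_truth)
    if not gt:
        raise ValueError("ground_truth must not be empty")
    hits = [1 if book in gt else 0 for book in recommended[:k]]
    recall = sum(hits) / len(gt)
    precision = sum(hits) / k
    # enumerate() is 0-based, so the rank-r discount is log2(r + 1) = log2(index + 2).
    dcg = sum(h / math.log2(idx + 2) for idx, h in enumerate(hits))
    idcg = sum(1.0 / math.log2(idx + 2) for idx in range(min(len(gt), k)))
    return recall, precision, dcg / idcg


# Reader 9369, copied from Table 6.
gt_9369 = [1435, 1603, 3102, 5835, 8993]
lightgcn_9369 = [217, 59, 130, 1355, 1406, 203, 69, 1792, 206, 585,
                 368, 253, 1603, 292, 1051, 81, 496, 240, 432, 395]
ppsm_gcn_9369 = [217, 59, 130, 1011, 97, 203, 2295, 69, 1603, 1355,
                 1333, 432, 292, 395, 1051, 206, 2373, 337, 2658, 1435]

for name, rec_list in [("LightGCN", lightgcn_9369), ("PPSM-GCN", ppsm_gcn_9369)]:
    recall, precision, ndcg = metrics_at_k(rec_list, gt_9369, k=20)
    print(f"{name}: recall={recall:.3f} precision={precision:.3f} ndcg={ndcg:.3f}")
```

For this reader, Recall@20 rises from 0.2 with LightGCN to 0.4 with PPSM-GCN, because PPSM-GCN additionally retrieves book 1435 at rank 20 and moves book 1603 from rank 13 up to rank 9.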

Table 7. Performance comparison with hot recommendation method (HotRec). In particular, the results of our embedding and MLP method are reported as the mean and standard deviation over 10 runs with different random seeds. RI stands for the relative improvement of our method to HotRec.

Method	Recall@20	NDCG@20	Precision@20
HotRec	0.0557	0.0394	0.0178
Embedding and MLP	$0.0596 \pm 0.0032$	$0.0443 \pm 0.0018$	$0.0189 \pm 0.0006$
RI over HotRec	7.0%	12.4%	6.2%
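
For context on the baseline, a popularity-based recommender of the HotRec kind can be sketched in a few lines. The version below assumes that HotRec ranks books by their borrowing count in the training split and returns the same list to every reader. The exact counting window and tie-breaking rule used in the experiments are not stated here, so the ascending-ID tie-break and the function name `hot_rec` are assumptions.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def hot_rec(
    train_records: Iterable[Tuple[str, int]],
    k: int = 20,
    exclude: Iterable[int] = (),
) -> List[int]:
    """Return the k most frequently borrowed book IDs.
    Ties are broken by ascending book ID so the output is deterministic."""
    counts = Counter(book for _, book in train_records)
    excluded = set(exclude)
    ranked = sorted(counts, key=lambda b: (-counts[b], b))
    return [b for b in ranked if b not in excluded][:k]


# Toy (reader_id, book_id) borrowing records.
records = [("r1", 10), ("r2", 10), ("r1", 11),
           ("r3", 12), ("r2", 12), ("r4", 12)]
print(hot_rec(records, k=2))
print(hot_rec(records, k=2, exclude=[12]))
```

Because such a list ignores reader attributes entirely, it serves as a natural reference point for a cold-start model that does use them.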

Table 8. Ablation study of different components in our embedding and MLP method. w/o InfoNCE denotes replacing the InfoNCE loss with BPR loss, w/o BN emb denotes removing the book name embedding, and w/o ID emb denotes removing the book ID embedding.

Method	Recall@20	NDCG@20	Precision@20
Embedding & MLP	0.0596	0.0443	0.0189
w/o InfoNCE	0.0528	0.0380	0.0171
w/o BN emb	0.0561	0.0408	0.0179
w/o ID emb	0.0523	0.0396	0.0170
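
The "w/o InfoNCE" row swaps one ranking objective for another. The sketch below contrasts the standard forms of the two losses for a single positive book: InfoNCE normalizes the positive score against a set of negatives with temperature `tau`, whereas BPR compares the positive with one negative at a time. The per-sample formulation, the default temperature, and the function names are assumptions for illustration and do not reproduce the training code.

```python
import math
from typing import Sequence


def info_nce_loss(pos_score: float, neg_scores: Sequence[float], tau: float = 0.1) -> float:
    """-log( exp(s+/tau) / (exp(s+/tau) + sum_j exp(s_j/tau)) ),
    computed with the log-sum-exp trick for numerical stability."""
    logits = [pos_score / tau] + [s / tau for s in neg_scores]
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[0]


def bpr_loss(pos_score: float, neg_score: float) -> float:
    """-log(sigmoid(s+ - s-)), written in a numerically stable form."""
    x = pos_score - neg_score
    if x > 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


# Toy scores: one positive book and three sampled negatives.
pos, negs = 0.8, [0.1, 0.3, -0.2]
print(info_nce_loss(pos, negs, tau=0.1))
print(sum(bpr_loss(pos, n) for n in negs) / len(negs))
```

When the positive and a single negative receive the same score, both losses equal log 2, which is a convenient sanity check. They diverge once several negatives compete in the InfoNCE denominator.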

Share and Cite

MDPI and ACS Style

Liang, Y.; Liu, H.; Liu, S. GDNN: A Practical Hybrid Book Recommendation System for the Field of Ideological and Political Education. Electronics 2026, 15, 1086. https://doi.org/10.3390/electronics15051086
