Article

Comparative Study of Filtering Methods for Scientific Research Article Recommendations

Driss El Alaoui, Jamal Riffi, Abdelouahed Sabri, Badraddine Aghoutane, Ali Yahyaouy and Hamid Tairi
1 LISAC Laboratory, Department of Informatics, Faculty of Sciences Dhar El Mahraz, Sidi Mohamed Ben Abdellah University, Fez 30000, Morocco
2 Informatics and Applications Laboratory, Science Faculty of Meknes, Moulay Ismaïl University, Meknes 50050, Morocco
* Authors to whom correspondence should be addressed.
These authors contributed equally to this work.
Big Data Cogn. Comput. 2024, 8(12), 190; https://doi.org/10.3390/bdcc8120190
Submission received: 1 October 2024 / Revised: 22 November 2024 / Accepted: 29 November 2024 / Published: 16 December 2024

Abstract

Given the daily influx of scientific publications, researchers often struggle to identify relevant content amid the vast volume of available information, typically resorting to conventional methods such as keyword searches or manual browsing. Using a dataset of 1895 users and 3122 articles from the CI&T Deskdrop collection, as well as 7947 users and 25,975 articles from CiteULike-t, we examine the effectiveness of collaborative filtering, content-based, and hybrid recommendation approaches for scientific literature. These methods automatically generate article suggestions by analyzing user preferences and historical behavior. Our findings, evaluated in terms of accuracy (Precision@K), ranking quality (NDCG@K), and novelty, reveal that the hybrid approach significantly outperforms the other methods and mitigates challenges such as the cold start and sparsity problems. This research offers theoretical insights into recommendation model effectiveness and practical implications for developing tools that enhance content discovery and researcher productivity.

1. Introduction

In today’s digital age, the volume of scientific literature published online is growing at an unprecedented rate. Between 2000 and 2020, global scientific output saw remarkable growth across major research-producing nations. China’s annual scientific publication output increased from roughly 53,880 articles in 2000 to 528,540 in 2018, demonstrating the country’s rapid emergence as a research powerhouse. The United States maintained its strong position with approximately 450,320 publications in 2018, while the European Union collectively produced around 489,670 publications in the same year. Other significant contributors in 2018 include the United Kingdom (108,210), Germany (103,540), and Japan (95,870) [1]. More recent data from Scopus [2] show that in 2022, China surpassed the United States in the total number of publications produced, accounting for 23.4% of global research output, followed by the United States at 17.8% and the European Union at 14.6%. This trend reflects the shifting landscape of global scientific research and knowledge production.

This explosion of available information poses significant challenges for researchers, who must navigate a vast expanse of literature to find relevant studies that inform their research, validate their hypotheses, contrast their findings with existing work, and identify trends in their field. Traditional methods of discovering pertinent literature, such as keyword searches [3,4], browsing conference proceedings, and reading journal publications, are becoming increasingly inadequate: the sheer amount of information makes it difficult to efficiently locate high-quality resources, hindering the research process. This is where recommendation systems [5,6,7] come into play, offering a potential solution to streamline the discovery of relevant publications [8] and enhance the research process. These tools, initially designed to predict user preferences and make suggestions in domains like e-commerce, music, and movies, have proven invaluable in academia as well. In e-commerce, for example, recommendation systems have been instrumental in increasing sales; on Amazon, an estimated 35% of sales are generated by its recommendation engine. Similarly, in the academic sphere, these systems can recommend relevant articles, journals, and conferences, thereby facilitating more efficient literature reviews and helping researchers stay updated with the latest developments in their fields. By leveraging algorithms tailored for academic content, recommendation systems can significantly enhance the efficiency and effectiveness of the research process.
These systems rely on two main types of information: features and user–item interactions. Features consist of unique attributes of the items, such as keywords and categories, as well as user-specific information like preferences and profiles. User–item interactions include various types of engagement, such as views, likes, comments, and the number of purchases. Based on these factors, two classifications of recommendation systems are highlighted: the classical classification and that of Rao and Talwar. In this study, we concentrate on the classical classification. Both are described below:
  • Classical Classification [9]: This widely known classification divides recommendation systems into three main methods (which will be discussed in detail throughout this article):
    Collaborative filtering (CF) [10,11]: This method predicts user preferences based on the preferences of other users with similar tastes.
    Content-based filtering (CB) [12,13,14]: This method recommends items based on the attributes of the items themselves and a user’s past interactions with similar items.
    Hybrid filtering: This method combines both CF and CB techniques to capitalize on the strengths of each, resulting in more precise recommendations.
  • Rao and Talwar’s Classification [15]: This classification expands on the classical one by introducing additional categories and grouping systems according to the source of information used. In addition to collaborative filtering, content-based filtering, and hybrid filtering, it includes the following:
    Demographic Filtering [16]: This method uses demographic information about users to make recommendations.
    Knowledge-Based Filtering [17]: This method uses specific domain knowledge and user requirements to recommend items.
    Community Filtering: Also called social RSs [18], this method enhances personalized recommendations by incorporating social relationships. However, the impact of social relationships on recommendation accuracy in niche domains is not well understood, and methods often ignore implicit social influences and their evolution over time. Key research areas include the influence of community interactions, best practices for integrating community data, balancing explicit and implicit social influences, and maintaining scalability while preserving social relationship information.
This study is structured as follows: Section 2 provides a general overview of recommendation systems (RSs), including their historical developments, areas of application, research efforts related to recommending scientific articles, and an explanation of the main recommendation models. Afterward, Section 3 describes in detail the techniques used in three recommendation algorithms that were compared in terms of performance. The experiments conducted and the results obtained, along with their interpretations, are presented in Section 4. Finally, Section 5 concludes the study and outlines directions for future work.

2. Related Works

A recommendation system [19] is a type of information filtering system designed to predict the “rating” or “preference” that a user might give to a particular item. The capability of computers to make recommendations was recognized early in computing history. In 1979, Grundy [20] introduced a librarian system as one of the first steps toward automatic RSs. This system, although primitive, classified users into “stereotypes” based on a brief interview and used these stereotypes to recommend books. Despite its limited use, this was an intriguing initial attempt in the field of recommendation systems. In the early 1990s, CF emerged as a solution to information overload. In 1992, Belkin and Croft [21] analyzed information filtering and retrieval, identifying the former as crucial for RSs and the latter for search engines. Goldberg et al. [22] introduced Tapestry, the first collaborative filtering system, inspiring Massachusetts Institute of Technology (MIT) and University of Minnesota (UMN) researchers to create GroupLens [23], a news recommendation service using user–user collaborative filtering. Prof. John Riedl’s GroupLens lab at UMN pioneered RS research. Similar technologies were applied to music and video by Ringo and Video RSs [24,25]. The commercial potential of recommendation systems led to the founding of Net Perceptions [26] in 1996, which served clients like Amazon. Schafer et al. [27] highlighted how these systems boost e-commerce sales. In 1997, the GroupLens lab launched MovieLens [28], releasing influential datasets for recommendation studies. Before 2005, collaborative filtering techniques dominated the field, including user–user [29,30], item–item [31,32], and SVD-based methods [33]. The Netflix Prize (2006–2009) spurred interest in matrix factorization models [34,35] and user-centric evaluation metrics [36]. The first ACM RSs Conference [37] was held in 2007, marking the importance of RS research. Richardson et al. [38] introduced a logistic regression model improving click-through rate estimation. Subsequent advancements included factorization machines (FMs) [39] and field-aware factorization machines (FFMs) [40]. Researchers emphasized user experience, leading to user-centric evaluation frameworks [41,42]. Since 2016, deep neural network-based recommendation models have emerged, enhancing app and video recommendations. Notable models include Wide&Deep [43], DeepFM [44], and YouTubeDNN [45]. Open benchmarks like FuxiCTR [46] and reproducible evaluation metrics [47,48] have been developed to address reproducibility issues. Recent research focuses on addressing biases in recommendation systems (RSs) through causal inference [49,50]. Courses like Thorsten’s counterfactual machine learning [51] leverage these concepts by using models that answer “what-if” questions to understand how changes in input data impact recommendations, thereby identifying and reducing biases.
Regarding research related to scientific article recommendations, Table 1 presents a chronological overview of significant contributions in the field from 2019 to 2023. The table encompasses both comprehensive survey papers and individual research works that have advanced the state of scientific paper recommendation systems. From foundational approaches using document embeddings and topic modeling to more sophisticated methods incorporating heterogeneous networks and knowledge graphs, the table captures the evolution of recommendation techniques. Recent works have increasingly focused on hybrid approaches, combining multiple data sources and advanced machine learning methods. The table is structured to highlight each study’s year, authors, primary purpose, methodological approach, and key findings, providing a systematic view of the field’s development over the past five years.
Recommendation models encompass various approaches to predict user preferences:

2.1. Collaborative Filtering (CF)

CF methods [64] make automatic predictions about a user’s interests by collecting preference or taste information from many users. The approach assumes that users who have shown similar preferences or agreed on certain items in the past are likely to have similar preferences in the future; it therefore leverages user behavior data to make recommendations. CF was the pioneering technology in RSs and remains simple and effective. The CF process involves three stages: collecting user information, creating a matrix to calculate user associations, and providing reliable recommendations. Major companies like YouTube, Netflix, and Spotify rely heavily on such systems. CF systems are generally divided into two subcategories: memory-based and model-based.

2.1.1. Memory-Based

Memory-based algorithms [65] utilize the entire user rating database to generate predictions by leveraging user-to-user and item-to-item similarities or correlations. These algorithms are widely employed in many commercial systems due to their straightforward implementation and ability to provide accurate recommendations efficiently. Various statistical techniques such as Pearson correlation, cosine similarity, Euclidean distance, and the Jaccard measure can be used to estimate similarities between users or items.
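To make these measures concrete, here is a minimal sketch in Python using NumPy; the rating matrix is a toy example, and the helper names (cosine_sim, pearson_sim) are ours rather than library functions.

```python
import numpy as np

# Toy user-item rating matrix (rows: users, columns: items); 0 = unrated.
R = np.array([
    [5.0, 3.0, 0.0, 1.0],
    [4.0, 0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0, 5.0],
])

def cosine_sim(u, v):
    """Cosine similarity between two rating vectors."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def pearson_sim(u, v):
    """Pearson correlation: cosine similarity of mean-centered vectors."""
    uc, vc = u - u.mean(), v - v.mean()
    return np.dot(uc, vc) / (np.linalg.norm(uc) * np.linalg.norm(vc))

# User-user similarities like these drive neighborhood-based prediction.
print(round(cosine_sim(R[0], R[1]), 3))   # ~0.861
print(round(pearson_sim(R[0], R[1]), 3))
```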

2.1.2. Model-Based

Model-based algorithms [65] rely on developing a predictive model from the user rating data to make recommendations. Instead of using the entire database directly, these algorithms build a model that generalizes the data and identifies patterns. For instance, matrix factorization techniques, including singular value decomposition (SVD), discern latent features such as user preferences for genres or item popularity, effectively generalizing user–item interaction data. Similarly, clustering methods group users with similar behaviors, allowing the identification of common interests in niche categories. Unlike memory-based approaches, model-based CF methods often require more computational resources and training time but can be more accurate and scalable, particularly with large datasets. Common techniques used in model-based CF include the following:
  • Matrix Factorization [66,67]: Involves techniques like singular value decomposition (SVD) and alternating least squares (ALS) to decompose the user–item interaction matrix into smaller matrices. This helps uncover hidden factors that explain observed interactions (a minimal code sketch follows this list).
  • Bayesian Networks [68]: These probabilistic models represent dependencies among variables (users and items) and use these relationships to make predictions.
  • Clustering [69]: Users or items are grouped into clusters based on their similarities, and recommendations are made based on the preferences of the cluster members.
  • Markov Decision Processes [70]: These models consider the sequences of user interactions and make recommendations by predicting the next item in a user’s sequence.
  • Machine Learning Techniques [71]: Algorithms such as neural networks, decision trees, and support vector machines can be trained on user–item interaction data to predict user preferences.
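As a rough illustration of the matrix factorization idea, the sketch below applies truncated SVD to a toy interaction matrix with SciPy; the matrix values, the choice of p = 2 latent factors, and the per-user mean-centering are illustrative assumptions, not a prescription from any particular system.

```python
import numpy as np
from scipy.sparse.linalg import svds

# Toy user-item interaction matrix (rows: users, columns: items);
# the values are illustrative, with 0 marking an unobserved interaction.
R = np.array([
    [5.0, 3.0, 0.0, 1.0],
    [4.0, 0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0, 5.0],
    [1.0, 0.0, 5.0, 4.0],
])

# Mean-center each user so predictions fall back toward that user's mean.
user_means = R.mean(axis=1, keepdims=True)

# Truncated SVD with p latent factors: R - means ~= U @ diag(s) @ Vt.
p = 2
U, s, Vt = svds(R - user_means, k=p)

# Reconstructing the matrix yields predicted scores, including for
# items the user has not yet interacted with.
R_pred = U @ np.diag(s) @ Vt + user_means
print(np.round(R_pred, 2))
```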
In summary, CF methods predict user preferences by analyzing user data, making them foundational in recommendation systems. Their simplicity and effectiveness ensure they remain widely used on popular platforms.

2.2. Content-Based Filtering (CB)

This approach relies on comparing items with the past preferences of a specific user. Items that have similar features to those the user’s profile has previously liked or viewed are likely to be recommended. Content-based recommendation systems use various methods to analyze and compare the features of items and user profiles (see Figure 1).

2.3. Hybrid Recommendation Systems

CF and CB recommendation methods, discussed in detail above, have strengths and weaknesses that affect their performance across different scenarios (see Table 2). Understanding these characteristics is crucial for selecting the appropriate approach for a given application. To leverage their advantages while addressing their limitations, hybrid recommendation systems were developed. These systems integrate CF and CB methods, leading to more accurate, diverse, and robust recommendations. For instance, the cold start problem in CF can be mitigated by incorporating content-based methods: in content-based filtering, recommendations rely solely on item descriptions, eliminating the need for an item to be rated by multiple users before it can be recommended, as required in CF. According to Robin D. Burke [72], there are several ways to build hybrid recommendation systems (the weighted and switching variants are sketched in code after this list):
  • Weighted: Assigns weights to each of the methods, combining their scores into a single recommendation score based on the weighted sum.
  • Switching: Alternates between different recommendation methods depending on specific criteria like user type, item characteristics, or the recommendation context.
  • Mixed: Independently generates recommendations using various methods and then merges the results into a unified list presented to the user.
  • Cascade: Uses one recommendation method to produce an initial list and refines it with another method. For example, CF can create a broad list, which is then fine-tuned by content-based filtering.
  • Feature augmentation: Enhances one recommendation method by incorporating additional features derived from another. For example, user similarity scores from CF can improve content-based filtering.
  • Meta-level: Utilizes the model output from one recommendation technique as input features for another. For example, the results of a content-based model can serve as features in a CF model.
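To make the first two strategies concrete, here is a minimal sketch; the score vectors, the weights, and the cold-start threshold are illustrative assumptions rather than values from this study.

```python
import numpy as np

def weighted_hybrid(cf_scores, cb_scores, w_cf=0.6, w_cb=0.4):
    """Weighted hybrid: a weighted sum of the two methods' scores."""
    return w_cf * cf_scores + w_cb * cb_scores

def switching_hybrid(cf_scores, cb_scores, n_interactions, threshold=5):
    """Switching hybrid: fall back to CB scores for cold-start users."""
    return cf_scores if n_interactions >= threshold else cb_scores

# Illustrative, already-normalized scores for five candidate items.
cf = np.array([0.9, 0.2, 0.5, 0.7, 0.1])
cb = np.array([0.3, 0.8, 0.6, 0.4, 0.9])

print(weighted_hybrid(cf, cb))      # blended ranking signal
print(switching_hybrid(cf, cb, 2))  # user with 2 interactions -> CB scores
```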

3. Proposed Methods

This study uses a quantitative research approach to evaluate and compare the performance of three recommendation algorithms: collaborative filtering (CF), content-based (CB) methods, and hybrid methods. The effectiveness of these algorithms is systematically measured and analyzed using numerical metrics like recommendation accuracy, ranking quality, and novelty. Below, we describe the specific techniques used for each algorithm:
  • Collaborative Filtering: For this method, we employed a model-based approach, specifically the singular value decomposition (SVD) algorithm [73], as it is a robust and reliable choice for building high-quality CF recommendation systems. The SVD algorithm (see Figure 2) decomposes the user–item interaction matrix $Y$ into three matrices $U \in \mathbb{R}^{m \times p}$, $\Sigma \in \mathbb{R}^{p \times p}$, and $V^T \in \mathbb{R}^{p \times n}$ that capture the latent factors representing users and items:

    $$Y = U \Sigma V^T$$

    where $U$ and $V$ are orthogonal matrices and $\Sigma$ is a diagonal matrix whose non-negative entries are the singular values of $Y$. This decomposition allows the model to make predictions by approximating the original interaction matrix using these latent factors, thereby improving the accuracy and scalability of the recommendations.
  • Content-Based Filtering: Since the task involves recommending scientific articles, we used common techniques from information retrieval and text mining. Specifically, we used Term Frequency–Inverse Document Frequency (TF-IDF) [74] to convert text data into numerical feature vectors, where each word is represented by a position in the vector and the value indicates the relevance of that word to a particular article. Because all articles are represented in the same vector space model, we can compute similarities between them. The weight $W_{t,d}$ of term $t$ in document $d$ is given by:

    $$W_{t,d} = TF_{t,d} \times \log\frac{N}{DF_t}$$

    where $TF_{t,d}$ is the number of occurrences of $t$ in document $d$, $DF_t$ is the number of articles containing the term $t$, and $N$ is the total number of articles in the corpus. We used cosine similarity to measure the similarity between the vectors generated by TF-IDF.
  • Hybrid Filtering: For this approach, we used the dynamic weighted hybridization method (see the algorithm flowchart in Figure 3). This technique combines the recommendations from CF and CB by assigning them weights $w_{CF}$ and $w_{CB}$, which are adjusted based on performance evaluation until convergence, yielding the final recommendations. This flexibility is particularly useful when dealing with diverse datasets, as it enables us to tailor the hybrid model to the specific characteristics of the data. The weights are updated using the gradient of the loss function (a code sketch follows this list):

    $$Loss = \sum_{(u,i) \in TrainSet} \left( r_{ui} - Score_{u,i} \right)^2$$

    where $r_{ui}$ is the actual rating of user $u$ for item $i$, and $Score_{u,i}$ is the combined score for user $u$ and item $i$ generated by the hybrid RS.
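As a rough sketch of this pipeline, the code below builds TF-IDF similarities for a toy corpus and then learns the two weights by gradient descent on the squared loss above; the corpus, the predicted scores, the learning rate, and the iteration count are illustrative assumptions, not the study’s actual data or settings.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Content side: TF-IDF vectors and cosine similarities (toy corpus).
corpus = [
    "deep learning for computer vision",
    "matrix factorization for collaborative filtering",
    "a survey of collaborative filtering methods",
]
tfidf = TfidfVectorizer().fit_transform(corpus)
item_sim = cosine_similarity(tfidf)  # article-article similarity matrix;
# in a full system, CB scores would be derived from these similarities.

# Hybrid side: learn the combination weights from observed interactions.
# cf_pred / cb_pred hold each method's scores for (user, item) pairs in
# the training set; r holds the actual interaction strengths (toy data).
cf_pred = np.array([0.8, 0.2, 0.6])
cb_pred = np.array([0.6, 0.1, 0.7])
r = np.array([1.0, 0.0, 1.0])

w_cf, w_cb, lr = 0.5, 0.5, 0.1
for _ in range(1000):
    err = r - (w_cf * cf_pred + w_cb * cb_pred)
    # Gradient step on Loss = sum((r - score)^2); the constant factor
    # of 2 in the gradient is absorbed into the learning rate.
    w_cf += lr * np.dot(err, cf_pred)
    w_cb += lr * np.dot(err, cb_pred)

print(round(w_cf, 3), round(w_cb, 3))  # learned combination weights
```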

4. Experimental Section

4.1. Experimental Settings and Datasets

The datasets are split into 80% for training and 20% for testing, ensuring the model learns effectively by having sufficient data to identify patterns and behaviors. This split also allows for realistic performance evaluation on unseen data. Cross-validation techniques were used to enhance robustness. For the hybrid model, a learning rate of 0.001 was selected, and the Adam optimizer was employed with a batch size of 64. Training was conducted for 100 epochs with early stopping applied if validation performance did not improve after 10 iterations.
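For concreteness, the split and training settings above can be written out as follows; this is a minimal sketch, and the interaction table shown is a placeholder rather than the actual datasets.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Placeholder interaction log; the real data are CI&T Deskdrop / Citeulike-t.
interactions = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "article_id": [10, 11, 10, 12, 11],
    "strength": [1.0, 2.0, 1.0, 3.0, 1.0],
})

# 80/20 train/test split, as in the study.
train_df, test_df = train_test_split(interactions, test_size=0.2,
                                     random_state=42)

# Hybrid-model training settings reported in the text.
config = {
    "learning_rate": 1e-3,
    "optimizer": "adam",
    "batch_size": 64,
    "max_epochs": 100,
    "early_stopping_patience": 10,  # stop if validation stalls for 10 checks
}
```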
In this study, we used two datasets: “CI&T Deskdrop” and “Citeulike-t”. The first dataset includes a real sample of logs spanning a year (from March 2016 to February 2017) derived from CI&T’s in-house communication system, known as DeskDrop. The second dataset is collected from CiteULike and Google Scholar. It includes various data files representing user–article interactions and article content. Table 3 provides a statistical summary of these datasets.

4.2. Metrics of Evaluation

In this study, the recommenders use implicit feedback to estimate preferences from user interactions such as views, clicks, likes, and comments. Implicit feedback refers to data indirectly reflecting user preferences, gathered unobtrusively rather than through explicit ratings or reviews [75]. This approach focuses on ranking metrics commonly used in information retrieval settings. In the context of recommendation systems (RSs), the goal is to recommend the most relevant items based on the user’s preferences and past behavior. Therefore, it is more appropriate to compute precision and recall for the highest-ranked N items, giving rise to the concepts of precision at k and recall at k, where k is a user-defined integer aligned with the top-N recommendations objective.
It is important to note that Precision at k (Precision@k) represents the proportion of relevant items within the top-k recommendations, while Recall at k (Recall@k) measures the proportion of relevant items identified among these top-k recommendations. Mathematically, Precision@k and Recall@k are defined as follows:
$$\mathrm{Precision@}k = \frac{|\{\text{recommended items @}k \text{ that are relevant}\}|}{|\{\text{recommended items @}k\}|}$$

$$\mathrm{Recall@}k = \frac{|\{\text{recommended items @}k \text{ that are relevant}\}|}{|\{\text{total relevant items}\}|}$$
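A minimal sketch of these two metrics in Python; the ranked list and the relevance judgments are illustrative.

```python
def precision_at_k(recommended, relevant, k):
    """Proportion of the top-k recommendations that are relevant."""
    top_k = recommended[:k]
    return len(set(top_k) & set(relevant)) / k

def recall_at_k(recommended, relevant, k):
    """Proportion of all relevant items that appear in the top-k."""
    top_k = recommended[:k]
    return len(set(top_k) & set(relevant)) / len(relevant)

recommended = ["a1", "a2", "a3", "a4", "a5"]  # ranked system output (toy)
relevant = {"a1", "a3", "a9"}                 # ground-truth relevant items

print(precision_at_k(recommended, relevant, 5))  # 2/5 = 0.4
print(recall_at_k(recommended, relevant, 5))     # 2/3 ~= 0.667
```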
The recommendation system outputs a ranked list of articles, making it essential to consider the order in which these articles are presented. Therefore, we used NDCG at k, defined as follows:
$$\mathrm{NDCG@}k = \frac{1}{\mathrm{IDCG}} \times \sum_{i=1}^{k} \frac{2^{r_i} - 1}{\log_2(i+1)}$$

where $r_i$ is the relevance rating of the document at position $i$, and IDCG is the ideal DCG, set so that a perfect ranking has an NDCG value of 1. In our problem, $r_i$ is 1 if the document is recommended correctly.
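The same metric as a minimal code sketch with binary relevance labels, matching the convention that $r_i = 1$ for a correctly recommended document; the example ranking is illustrative.

```python
import math

def ndcg_at_k(relevances, k):
    """NDCG@k for a ranked list of relevance labels (1 = relevant)."""
    dcg = sum((2 ** r - 1) / math.log2(i + 2)   # i is 0-based, hence i + 2
              for i, r in enumerate(relevances[:k]))
    ideal = sorted(relevances, reverse=True)    # best possible ordering
    idcg = sum((2 ** r - 1) / math.log2(i + 2)
               for i, r in enumerate(ideal[:k]))
    return dcg / idcg if idcg > 0 else 0.0

# Relevance of each position in the ranked list (toy example).
print(round(ndcg_at_k([1, 0, 1, 0, 0], k=5), 3))  # ~0.920
```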
Accuracy is important, but focusing solely on it often results in less satisfactory recommendations. To address this, we also use non-accuracy metrics such as the novelty of the recommended items. Novelty assesses how unfamiliar the recommended items are to a user. For a given user u, Novelty is defined as the ratio of unknown items in the list of top-N recommended items:
$$\mathrm{Novelty}(u) = \frac{\sum_{i \in R} \left(1 - \mathrm{knowns}(u,i)\right)}{|R|}$$

where $R$ is the set of top-N recommended items and $\mathrm{knowns}(u,i)$ is a binary function that returns 1 if user $u$ already knows item $i$, and 0 otherwise.
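A minimal sketch of this ratio; the recommendation list and the user’s known items are illustrative.

```python
def novelty(recommended, known_items):
    """Share of top-N recommendations the user has not interacted with."""
    unknown = sum(1 for item in recommended if item not in known_items)
    return unknown / len(recommended)

top_n = ["p1", "p2", "p3", "p4", "p5"]  # top-N recommended articles (toy)
known = {"p2", "p5"}                    # items this user already knows

print(novelty(top_n, known))  # 3/5 = 0.6
```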

4.3. Results

The experiments conducted in this study yielded various results, which are summarized in three figures (Figure 4, Figure 5 and Figure 6) and two tables (Table 4 and Table 5).

4.4. Discussion

To identify the most suitable hybridization technique for recommender systems, we conducted an ablation study, summarized in Table 4. Key findings include the following:
  • Switching and Cascade techniques: NDCG@5 values range from 0.180 to 0.293, showing varying performance across systems.
  • Traditional Weighted technique: NDCG@5 values range from 0.183 to 0.305, indicating performance variability.
  • Dynamic Weighted technique: Highest NDCG@5 values ranging from 0.200 to 0.312, suggesting superior effectiveness.
For Novelty@5 and Novelty@10, the Dynamic Weighted technique generally performs best, indicating its potential for recommending novel items. Overall, the results highlight that the choice of hybrid technique significantly impacts performance metrics, with the Dynamic Weighted technique emerging as a promising hybridizing technique for enhancing both relevance and novelty in recommender systems.
Figure 4 and Figure 5 present the Precision@k and Recall@k evaluations for three recommendation systems (collaborative filtering (CF) [64], content-based (CB) [76], and hybrid [72]) on two datasets: CI&T Deskdrop and Citeulike-t. All three methods generally exhibit higher precision on the CI&T Deskdrop dataset than on Citeulike-t. The hybrid method achieves the best scores on both datasets. Moreover, CB surpasses CF on both datasets, indicating that content-based recommendations are more effective than collaborative filtering in these scenarios. There is a noticeable decrease in precision from @5 to @10 across all methods and datasets, implying that while the top 5 recommendations are relatively precise, expanding to the top 10 introduces less relevant items. Similar observations hold for recall, with the key difference that recall increases from @5 to @10.
Regarding the quality of the ranking of the recommended items (NDCG), the hybrid approach consistently delivers the best performance across both datasets (see Table 5), followed by CB. CF shows lower performance, especially in the highly sparse Citeulike-t dataset (99.93%) compared to the CI&T Deskdrop dataset (7.99%) (refer to Table 3), highlighting the limitations of CF in such scenarios, which might require more data or additional techniques to handle sparsity [77] effectively.
Figure 6 shows that novelty decreases as the number of recommended items (Top-N) increases, since including more items often adds more popular ones, reducing overall novelty. Examining the impact of dataset characteristics, we found that the Citeulike-t dataset, with its higher sparsity, has lower novelty scores than the CI&T Deskdrop dataset, indicating that sparsity hinders the ability to provide novel recommendations because fewer interactions are available. In addition, the hybrid approach offers the best balance between relevance and novelty, making it the most effective for diverse recommendations, including many that may be unfamiliar to the user. Collaborative filtering, which outperforms content-based filtering in novelty, is useful when novel recommendations are important. In contrast, content-based filtering is beneficial when recommendations need to align closely with the user’s past interactions.
Despite the differing densities of the two datasets, the empirical findings reveal that the hybrid recommendation method consistently surpasses both collaborative filtering (CF) and content-based (CB) methods across a variety of metrics, underscoring its proficiency in addressing sparsity and cold start [78] issues. The superior precision and recall performance of hybrid methods suggests that the integration of collaborative and content-based strategies harnesses the strengths of both, resulting in more precise and comprehensive recommendations. Additionally, the balanced performance of the hybrid approach in terms of ranking quality (NDCG) and novelty indicates its capability to provide a versatile solution that emphasizes relevance while also introducing new and diverse items to users.
On the other hand, content-based methods exhibit strong potential, particularly in sparse datasets, by utilizing specific user preferences and content attributes to generate more relevant recommendations than CF. This is evident in CB’s consistent outperformance over CF across both datasets. However, CF shows limitations in handling highly sparse data, as seen with the Citeulike-t dataset, which may require additional techniques or more data to be effective. These observations collectively emphasize the need for a nuanced approach in recommendation systems, where hybrid methods offer a robust solution for diverse and accurate recommendations, while CB methods can be particularly useful in scenarios with limited interaction data. Nonetheless, further experiments and modifications to CF can still be conducted to assess its performance, especially in such a scenario.
In summary, our study advances recommendation systems by showing how hybrid approaches effectively combine collaborative filtering and content-based methods, improving not only the accuracy, relevance, and novelty of article recommendations but also addressing common challenges such as cold start and sparsity problems. These insights guide the development of more sophisticated tools that enhance content discovery and researcher productivity, making it easier for researchers to navigate the vast array of scientific literature [61,62].
Future research can explore the computational efficiency of hybrid algorithms in real-world applications. Additionally, our study provides a foundation for developing improved hybrid solutions that can be adapted for various sectors, including scientific research and business [79], where personalized content recommendations are increasingly valuable.

5. Conclusions

In this paper, we conducted a comparative study of the three most commonly used approaches in recommendation systems, using multiple metrics. Our experiments revealed that for article recommendations, the hybrid method outperformed content-based filtering and collaborative filtering in terms of relevance, ranking quality, and novelty, recommending more unknown articles to users. The hybrid method’s effectiveness stems from its combination of the two approaches, which helps address common issues such as the cold start and sparsity problems in RSs. It is important to acknowledge that this area is in a state of constant evolution, and conducting an exhaustive examination of all recommendation algorithms is impractical. The content presented here offers an overview of several existing recommendation approaches; while broad, it is naturally constrained by the selection of methods and datasets. Moreover, the study does not consider the computational complexity and scalability of the algorithms in real-world applications. There is ample room for further enhancements and experimentation with alternative algorithms. Future research could focus on identifying the most effective algorithms, ultimately aiming to create a more efficient hybrid solution. This study highlights the potential of hybrid methods to overcome the individual limitations of traditional approaches. Practically, it offers guidance for building more effective and user-centric recommendation systems tailored to academic content.

Author Contributions

Conceptualization, D.E.A.; methodology, D.E.A. and J.R.; supervision, J.R.; software, D.E.A.; writing—original draft, D.E.A.; writing—review and editing, D.E.A.; project administration, H.T. and A.Y.; validation, A.S., B.A. and A.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

We used the CI&T Deskdrop (https://www.kaggle.com/datasets/gspmoreira/articles-sharing-reading-from-cit-deskdrop, accessed on 28 November 2024), and Citeulike-t (https://github.com/js05212/citeulike-t, accessed on 28 November 2024) datasets.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. National Science Foundation. Science and Engineering Indicators 2020; Technical Report NSB-2020-6; National Science Board: Alexandria, VA, USA, 2020. [Google Scholar]
  2. Elsevier Scopus. Scopus Database Statistics; Scopus Database; Elsevier: Amsterdam, The Netherlands, 2022. [Google Scholar]
  3. Lee, J.; Lee, K.; Kim, J.G. Personalized academic research paper recommendation system. arXiv 2013, arXiv:1304.5457. [Google Scholar]
  4. Bai, X.; Wang, M.; Lee, I.; Yang, Z.; Kong, X.; Xia, F. Scientific paper recommendation: A survey. IEEE Access 2019, 7, 9324–9339. [Google Scholar] [CrossRef]
  5. Beel, J.; Langer, S. A comparison of offline evaluations, online evaluations, and user studies in the context of research-paper recommender systems. In Proceedings of the Research and Advanced Technology for Digital Libraries: 19th International Conference on Theory and Practice of Digital Libraries, TPDL 2015, Poznań, Poland, 14–18 September 2015; Springer: Cham, Switzerland, 2015; pp. 153–168. [Google Scholar]
  6. Sakib, N.; Ahmad, R.B.; Ahsan, M.; Based, M.A.; Haruna, K.; Haider, J.; Gurusamy, S. A hybrid personalized scientific paper recommendation approach integrating public contextual metadata. IEEE Access 2021, 9, 83080–83091. [Google Scholar] [CrossRef]
  7. Guo, G.; Chen, B.; Zhang, X.; Liu, Z.; Dong, Z.; He, X. Leveraging title-abstract attentive semantics for paper recommendation. In Proceedings of the AAAI conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 67–74. [Google Scholar]
  8. Alzoghbi, A.; Arrascue Ayala, V.A.; Fischer, P.M.; Lausen, G. Pubrec: Recommending publications based on publicly available meta-data. In Proceedings of the LWLA 2015 Workshops: KDML, FGWM, IR, and FGDB, Trier, Germany, 7–9 October 2015. [Google Scholar]
  9. Adomavicius, G.; Tuzhilin, A. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Trans. Knowl. Data Eng. 2005, 17, 734–749. [Google Scholar] [CrossRef]
  10. Najmani, K.; Benlahmar, E.H.; Sael, N.; Zellou, A. Collaborative filtering approach: A review of recent research. In Proceedings of the International Conference on Advanced Intelligent Systems for Sustainable Development, Tangier, Morocco, 21–26 December 2020; Springer: Cham, Switzerland, 2020; pp. 151–163. [Google Scholar]
  11. Schafer, J.B.; Frankowski, D.; Herlocker, J.; Sen, S. Collaborative filtering recommender systems. In The Adaptive Web: Methods and Strategies of Web Personalization; Springer: Berlin/Heidelberg, Germany, 2007; pp. 291–324. [Google Scholar]
  12. Lops, P.; Jannach, D.; Musto, C.; Bogers, T.; Koolen, M. Trends in content-based recommendation: Preface to the special issue on Recommender systems based on rich item descriptions. User Model. User-Adapt. Interact. 2019, 29, 239–249. [Google Scholar] [CrossRef]
  13. Lops, P.; De Gemmis, M.; Semeraro, G. Content-based recommender systems: State of the art and trends. Recommender Systems Handbook; Springer: Boston, MA, USA, 2011; pp. 73–105. [Google Scholar]
  14. Pazzani, M.J.; Billsus, D. Content-based recommendation systems. In The Adaptive Web: Methods and Strategies of Web Personalization; Springer: Berlin/Heidelberg, Germany, 2007; pp. 325–341. [Google Scholar]
  15. Rao, K.N.; Talwar, V.G. Application domain and functional classification of recommender systems—A survey. DESIDOC J. Libr. Inf. Technol. 2008, 28, 17. [Google Scholar]
  16. Lahoud, C.; Moussa, S.; Obeid, C.; Khoury, H.E.; Champin, P.A. A comparative analysis of different recommender systems for university major and career domain guidance. Educ. Inf. Technol. 2023, 28, 8733–8759. [Google Scholar] [CrossRef]
  17. Uta, M.; Felfernig, A.; Le, V.M.; Tran, T.N.T.; Garber, D.; Lubos, S.; Burgstaller, T. Knowledge-based recommender systems: Overview and research directions. Front. Big Data 2024, 7, 1304439. [Google Scholar] [CrossRef]
  18. Shokeen, J.; Rana, C. A study on features of social recommender systems. Artif. Intell. Rev. 2020, 53, 965–988. [Google Scholar] [CrossRef]
  19. Sparck Jones, K. A statistical interpretation of term specificity and its application in retrieval. J. Doc. 1972, 28, 11–21. [Google Scholar] [CrossRef]
  20. Salton, G.; Fox, E.A.; Wu, H. Extended boolean information retrieval. Commun. ACM 1983, 26, 1022–1036. [Google Scholar] [CrossRef]
  21. Belkin, N.J.; Croft, W.B. Information filtering and information retrieval: Two sides of the same coin? Commun. ACM 1992, 35, 29–38. [Google Scholar] [CrossRef]
  22. Goldberg, D.; Nichols, D.; Oki, B.M.; Terry, D. Using collaborative filtering to weave an information tapestry. Commun. ACM 1992, 35, 61–70. [Google Scholar] [CrossRef]
  23. Resnick, P.; Iacovou, N.; Suchak, M.; Bergstrom, P.; Riedl, J. Grouplens: An open architecture for collaborative filtering of netnews. In Proceedings of the 1994 ACM Conference on Computer Supported Cooperative Work, New York, NY, USA, 22–26 October 1994; pp. 175–186. [Google Scholar]
  24. Shardanand, U.; Maes, P. Social information filtering: Algorithms for automating “word of mouth”. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Denver, CO, USA, 7–11 May 1995; pp. 210–217. [Google Scholar]
  25. Hill, W.; Stead, L.; Rosenstein, M.; Furnas, G. Recommending and evaluating choices in a virtual community of use. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Denver, CO, USA, 7–11 May 1995; pp. 194–201. [Google Scholar]
  26. Konstan, J.A.; Riedl, J. Recommender systems: From algorithms to user experience. User Model. User-Adapt. Interact. 2012, 22, 101–123. [Google Scholar] [CrossRef]
  27. Schafer, J.B.; Konstan, J.; Riedl, J. Recommender systems in e-commerce. In Proceedings of the 1st ACM Conference on Electronic Commerce, Denver, CO, USA, 3–5 November 1999; pp. 158–166. [Google Scholar]
  28. Harper, F.M.; Konstan, J.A. The movielens datasets: History and context. Acm Trans. Interact. Intell. Syst. (Tiis) 2015, 5, 1–19. [Google Scholar] [CrossRef]
  29. Breese, J.S.; Heckerman, D.; Kadie, C. Empirical analysis of predictive algorithms for collaborative filtering. arXiv 2013, arXiv:1301.7363. [Google Scholar]
  30. Herlocker, J.L.; Konstan, J.A.; Borchers, A.; Riedl, J. An algorithmic framework for performing collaborative filtering. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Berkeley, CA, USA, 15–19 August 1999; pp. 230–237. [Google Scholar]
  31. Linden, G.; Smith, B.; York, J. Amazon. com recommendations: Item-to-item collaborative filtering. IEEE Internet Comput. 2003, 7, 76–80. [Google Scholar] [CrossRef]
  32. Sarwar, B.; Karypis, G.; Konstan, J.; Riedl, J. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, Hong Kong, China, 1–5 May 2001; pp. 285–295. [Google Scholar]
  33. Sarwar, B.; Karypis, G.; Konstan, J.; Riedl, J.T. Application of Dimensionality Reduction in Recommender System—A Case Study; Technical Report No. 00-043; University of Minnesota: Minneapolis, MN, USA, 2000. [Google Scholar]
  34. Koren, Y. Factorization meets the neighborhood: A multifaceted collaborative filtering model. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, NV, USA, 24–27 August 2008; pp. 426–434. [Google Scholar]
  35. Koren, Y. Collaborative filtering with temporal dynamics. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Paris, France, 28 June–1 July 2009; pp. 447–456. [Google Scholar]
  36. McNee, S.M.; Riedl, J.; Konstan, J.A. Being accurate is not enough: How accuracy metrics have hurt recommender systems. In Proceedings of the CHI’06 Extended Abstracts on Human Factors in Computing Systems, Montreal, QC, Canada, 22–27 April 2006; pp. 1097–1101. [Google Scholar]
  37. Massa, P.; Avesani, P. Trust-aware recommender systems. In Proceedings of the 2007 ACM Conference on Recommender Systems, Minneapolis, MN, USA, 19–20 October 2007; pp. 17–24. [Google Scholar]
  38. Richardson, M.; Dominowska, E.; Ragno, R. Predicting clicks: Estimating the click-through rate for new ads. In Proceedings of the 16th International Conference on World Wide Web, Banff, Canada, 8–12 May 2007; pp. 521–530. [Google Scholar]
  39. Rendle, S. Factorization machines. In Proceedings of the 2010 IEEE International Conference on Data Mining, Sydney, Australia, 13–17 December 2010; IEEE: New York, NY, USA, 2010; pp. 995–1000. [Google Scholar]
  40. Juan, Y.; Zhuang, Y.; Chin, W.S.; Lin, C.J. Field-aware factorization machines for CTR prediction. In Proceedings of the 10th ACM Conference on Recommender Systems, Boston, MA, USA, 15–19 September 2016; pp. 43–50. [Google Scholar]
  41. Pu, P.; Chen, L.; Hu, R. A user-centric evaluation framework for recommender systems. In Proceedings of the Fifth ACM Conference on Recommender Systems, Chicago, IL, USA, 23–27 October 2011; pp. 157–164. [Google Scholar]
  42. Pu, P.; Chen, L.; Hu, R. Evaluating recommender systems from the user’s perspective: Survey of the state of the art. User Model. User-Adapt. Interact. 2012, 22, 317–355. [Google Scholar] [CrossRef]
  43. Cheng, H.T.; Koc, L.; Harmsen, J.; Shaked, T.; Chandra, T.; Aradhye, H.; Anderson, G.; Corrado, G.; Chai, W.; Ispir, M.; et al. Wide & deep learning for recommender systems. In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, Boston, MA, USA, 15 September 2016; pp. 7–10. [Google Scholar]
  44. Guo, H.; Tang, R.; Ye, Y.; Li, Z.; He, X.; Dong, Z. Deepfm: An end-to-end wide & deep learning framework for CTR prediction. arXiv 2018, arXiv:1804.04950. [Google Scholar]
  45. Covington, P.; Adams, J.; Sargin, E. Deep neural networks for youtube recommendations. In Proceedings of the 10th ACM Conference on Recommender Systems, Boston, MA, USA, 15–19 September 2016; pp. 191–198. [Google Scholar]
  46. Zhu, J.; Liu, J.; Yang, S.; Zhang, Q.; He, X. Fuxictr: An open benchmark for click-through rate prediction. arXiv 2020, arXiv:2009.05794. [Google Scholar]
  47. Ferrari Dacrema, M.; Cremonesi, P.; Jannach, D. Are we really making much progress? A worrying analysis of recent neural recommendation approaches. In Proceedings of the 13th ACM Conference on Recommender Systems, Copenhagen, Denmark, 16–20 September 2019; pp. 101–109. [Google Scholar]
  48. Lin, J. The neural hype and comparisons against weak baselines. In ACM SIGIR Forum; ACM: New York, NY, USA, 2019; Volume 52, pp. 40–51. [Google Scholar]
  49. Dong, Z.; Zhu, H.; Cheng, P.; Feng, X.; Cai, G.; He, X.; Xu, J.; Wen, J. Counterfactual learning for recommender system. In Proceedings of the 14th ACM Conference on Recommender Systems, Virtual, 22–26 September 2020; pp. 568–569. [Google Scholar]
  50. Yuan, B.; Hsia, J.Y.; Yang, M.Y.; Zhu, H.; Chang, C.Y.; Dong, Z.; Lin, C.J. Improving ad click prediction by considering non-displayed events. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, Beijing, China, 3–7 November 2019; pp. 329–338. [Google Scholar]
  51. Verma, S.; Dickerson, J.; Hines, K. Counterfactual explanations for machine learning: A review. arXiv 2020, arXiv:2010.10596. [Google Scholar]
  52. Collins, A.; Beel, J. Document embeddings vs. keyphrases vs. terms for recommender systems: A large-scale online evaluation. In Proceedings of the 2019 ACM/IEEE Joint Conference on Digital Libraries (JCDL), Urbana-Champaign, IL, USA, 2–6 June 2019; IEEE: New York, NY, USA, 2019; pp. 130–133. [Google Scholar]
  53. Chen, J.; Ban, Z. Academic paper recommendation based on clustering and pattern matching. In Proceedings of the Artificial Intelligence: Second CCF International Conference, ICAI 2019, Xuzhou, China, 22–23 August 2019; Springer: Singapore, 2019; pp. 171–182. [Google Scholar]
  54. Ali, Z.; Qi, G.; Muhammad, K.; Ali, B.; Abro, W.A. Paper recommendation based on heterogeneous network embedding. Knowl.-Based Syst. 2020, 210, 106438. [Google Scholar] [CrossRef]
  55. Du, N.; Guo, J.; Wu, C.Q.; Hou, A.; Zhao, Z.; Gan, D. Recommendation of academic papers based on heterogeneous information networks. In Proceedings of the 2020 IEEE/ACS 17th International Conference on Computer Systems and Applications (AICCSA), Antalya, Turkey, 2–5 November 2020; IEEE: New York, NY, USA, 2020; pp. 1–6. [Google Scholar]
  56. Nishioka, C.; Hauke, J.; Scherp, A. Influence of tweets and diversification on serendipitous research paper recommender systems. Peerj Comput. Sci. 2020, 6, e273. [Google Scholar] [CrossRef] [PubMed]
  57. Rahdari, B.; Brusilovsky, P.; Thaker, K.; Barria-Pineda, J. Knowledge-driven wikipedia article recommendation for electronic textbooks. In Proceedings of the European Conference on Technology Enhanced Learning, Heidelberg, Germany, 14–18 September 2020; Springer: Cham, Switzerland, 2020; pp. 363–368. [Google Scholar]
  58. Wang, X.; Xu, H.; Tan, W.; Wang, Z.; Xu, X. Scholarly paper recommendation via related path analysis in knowledge graph. In Proceedings of the 2020 International Conference on Service Science (ICSS), Xining, China, 24–26 August 2020; IEEE: New York, NY, USA, 2020; pp. 36–43. [Google Scholar]
  59. Márk, B. Graph Neural Networks for Article Recommendation Based on Implicit User Feedback and Content. Master’s Thesis, KTH Royal Institute of Technology, Stockholm, Sweden, 2021. [Google Scholar]
  60. Chaudhuri, A.; Sinhababu, N.; Sarma, M.; Samanta, D. Hidden features identification for designing an efficient research article recommendation system. Int. J. Digit. Libr. 2021, 22, 233–249. [Google Scholar] [CrossRef]
  61. Kreutz, C.K.; Schenkel, R. Scientific paper recommendation systems: A literature review of recent publications. Int. J. Digit. Libr. 2022, 23, 335–369. [Google Scholar] [CrossRef]
  62. Aymen, A.T.M.; Imène, S. Scientific Paper Recommender Systems: A Review. In Artificial Intelligence and Heuristics for Smart Energy Efficiency in Smart Cities: Case Study: Tipasa, Algeria; Springer: Cham, Switzerland, 2022; pp. 896–906. [Google Scholar]
  63. Zhang, Z.; Patra, B.G.; Yaseen, A.; Zhu, J.; Sabharwal, R.; Roberts, K.; Cao, T.; Wu, H. Scholarly recommendation systems: A literature survey. Knowl. Inf. Syst. 2023, 65, 4433–4478. [Google Scholar] [CrossRef]
  64. Papadakis, H.; Papagrigoriou, A.; Panagiotakis, C.; Kosmas, E.; Fragopoulou, P. Collaborative filtering recommender systems taxonomy. Knowl. Inf. Syst. 2022, 64, 35–74. [Google Scholar] [CrossRef]
  65. Seridi, K.; El Rharras, A. A Comparative Analysis of Memory-Based and Model-Based Collaborative Filtering on Recommender System Implementation. In Proceedings of the International Conference on Smart City Applications, Paris, France, 4–6 October 2023; Springer: Cham, Switzerland, 2023; pp. 75–86. [Google Scholar]
  66. Zhang, Y. An Introduction to Matrix factorization and Factorization Machines in Recommendation System, and Beyond. arXiv 2022, arXiv:2203.11026. [Google Scholar]
  67. El Alaoui, D.; Riffi, J.; Aghoutane, B.; Sabri, A.; Yahyaouy, A.; Tairi, H. Collaborative Filtering: Comparative Study Between Matrix Factorization and Neural Network Method. In Proceedings of the Networked Systems: 8th International Conference, NETYS 2020, Marrakech, Morocco, 3–5 June 2020; Springer: Cham, Switzerland, 2021; pp. 361–367. [Google Scholar]
  68. Shi, K.; Zhang, J.; Fang, L.; Wang, W.; Jing, B. Enhanced Bayesian Personalized Ranking for Robust Hard Negative Sampling in Recommender Systems. arXiv 2024, arXiv:2403.19276. [Google Scholar]
  69. Beregovskaya, I.; Koroteev, M. Review of Clustering-Based Recommender Systems. arXiv 2021, arXiv:2109.12839. [Google Scholar]
  70. Gupta, G.; Katarya, R. A study of recommender systems using Markov decision process. In Proceedings of the Second International Conference on Intelligent Computing and Control Systems (ICICCS), Madurai, India, 14–15 June 2018; IEEE: New York, NY, USA, 2018; pp. 1279–1283. [Google Scholar]
  71. Portugal, I.; Alencar, P.; Cowan, D. The use of machine learning algorithms in recommender systems: A systematic review. Expert Syst. Appl. 2018, 97, 205–227. [Google Scholar] [CrossRef]
  72. Burke, R. Hybrid web recommender systems. In The Adaptive Web: Methods and Strategies of Web Personalization; Springer: Berlin/Heidelberg, Germany, 2007; pp. 377–408. [Google Scholar]
  73. Lange, K. Singular value decomposition. In Numerical Analysis for Statisticians; Springer: New York, NY, USA, 2010; pp. 129–142. [Google Scholar]
  74. Bafna, P.; Pramod, D.; Vaidya, A. Document clustering: TF-IDF approach. In Proceedings of the International Conference on Electrical, Electronics, and Optimization Techniques (ICEEOT), Chennai, India, 3–5 March 2016; IEEE: New York, NY, USA, 2016; pp. 61–66. [Google Scholar]
  75. Jannach, D.; Lerche, L.; Zanker, M. Recommending based on implicit feedback. In Social Information Access: Systems and Technologies; Springer: Cham, Switzerland, 2018; pp. 510–569. [Google Scholar]
  76. Van Meteren, R.; Van Someren, M. Using content-based filtering for recommendation. In Proceedings of the Machine Learning in the New Information Age: MLnet/ECML2000 Workshop, Barcelona, Spain, 31 May–2 June 2000; Volume 30, pp. 47–56. [Google Scholar]
  77. Singh, M. Scalability and sparsity issues in recommender datasets: A survey. Knowl. Inf. Syst. 2020, 62, 1–43. [Google Scholar] [CrossRef]
  78. Yuan, H.; Hernandez, A.A. User Cold Start Problem in Recommendation Systems: A Systematic Review. IEEE Access 2023, 11, 136958–136977. [Google Scholar] [CrossRef]
  79. Fayyaz, Z.; Ebrahimian, M.; Nawara, D.; Ibrahim, A.; Kashef, R. Recommendation systems: Algorithms, challenges, metrics, and business opportunities. Appl. Sci. 2020, 10, 7748. [Google Scholar] [CrossRef]
Figure 1. Diagram that summarizes the widely employed techniques in the CB approach.
Figure 2. Collaborative filtering by the SVD method.
Figure 3. Flowchart of the weighted hybrid RS, where $w_{CF}$ and $w_{CB}$ are the weights assigned to CF and CB, respectively.
Figure 4. Comparison of Precision@k for different recommendation systems on (a) CI&T Deskdrop and (b) Citeulike-t datasets.
Figure 5. Recall@k evaluation of CF, CB, and hybrid recommendation systems on (a) CI&T Deskdrop and (b) Citeulike-t datasets.
Figure 6. Novelty scores at different Top-N values for CF, CB, and hybrid recommendation systems on (a) CI&T Deskdrop and (b) Citeulike-t datasets.
Table 1. Overview of significant research on scientific paper recommendation systems.

Year | Author(s) | Purpose | Sample and Methods | Key Findings
2019 | Collins and Beel [52] | To evaluate different document embedding methods for paper recommendations | Used Doc2Vec and TF-IDF in Mr. DLib recommender-as-a-service | Demonstrated the effectiveness of different document embedding approaches for paper recommendations
2019 | Chen and Ban [53] | To develop a user interest clustering model | Applied LDA and pattern equivalence class mining in CPM model | Successfully clustered user interests using topic modeling and pattern mining techniques
2020 | Ali et al. [54] | To create a personalized probabilistic recommendation model | Developed PR-HNE using citations, co-authorships, and topical relevance with SBERT and LDA | Effectively integrated multiple graph information sources with semantic embeddings
2020 | Du et al. [55] | To develop a heterogeneous network-based recommendation system | Created HNPR using random walks on citation and co-author networks | Demonstrated the effectiveness of using heterogeneous network structures for recommendations
2020 | Nishioka et al. [56] | To incorporate users’ recent interests for serendipitous recommendations | Integrated user tweets to capture current interests | Successfully enhanced recommendation serendipity through social media integration
2020 | Rahdari and Brusilovsky [57] | To develop a customizable recommendation system for conference participants | Created a system allowing users to control feature impacts | Showed the benefits of user-controlled feature weighting in recommendations
2020 | Wang et al. [58] | To develop a knowledge-aware recommendation system | Implemented LSTM-based path recurrent network with TF-IDF representations | Successfully mined knowledge graph paths for enhanced recommendations
2021 | Bereczki [59] | To model user–paper interactions in a bipartite graph | Used Word2Vec/BERT embeddings with graph convolution | Demonstrated effective integration of text embeddings with graph-based approaches
2021 | Chaudhuri et al. [60] | To incorporate indirect features for recommendations | Developed Hybrid Topic Model combining LDA and Word2Vec | Successfully utilized keyword diversification and citation analysis for improved recommendations
2022 | Kreutz et al. [61] | To review contemporary paper recommendation systems | Surveyed studies from January 2019 to October 2021 | Provided comprehensive overview of methods, datasets, and challenges
2022 | Aymen et al. [62] | To review academic works on paper recommendations | Analyzed content-based, CF, and hybrid methods | Compared methodologies and identified open issues in the field
2023 | Zhang et al. [63] | To survey scholarly recommendation systems | Reviewed challenges and approaches in scholarly recommendations | Provided insights into broader scholarly recommendation systems
Table 2. Strengths and weaknesses of recommendation systems.

Approach | Strengths | Weaknesses
Collaborative Filtering (CF) | No need for item content; leverages community data; captures complex preferences. | Requires large user data (scalability); struggles with new items or users (cold start); sparse data problem (sparsity).
Content-Based (CB) | Does not require a large user base; can recommend new items; effective for unique tastes. | Requires item features; tends to recommend similar items.
Table 3. Statistical summary of the CI&T Deskdrop and Citeulike-t datasets.

Statistic | CI&T Deskdrop | Citeulike-t
# of Users | 1895 | 7947
# of Articles | 3122 | 25,975
# of Interactions | 72,312 | 134,860
Sparsity | 7.99% | 99.93%
Table 4. Comparison of hybrid techniques in recommender systems. The first four metric columns refer to CI&T Deskdrop, the last four to Citeulike-t.

Hybridizing Technique | NDCG@5 | NDCG@10 | Novelty@5 | Novelty@10 | NDCG@5 | NDCG@10 | Novelty@5 | Novelty@10
Switching | 0.293 | 0.287 | 0.180 | 0.090 | 0.225 | 0.215 | 0.125 | 0.065
Cascade | 0.295 | 0.290 | 0.175 | 0.088 | 0.230 | 0.210 | 0.120 | 0.063
Traditional Weighted | 0.305 | 0.300 | 0.183 | 0.094 | 0.240 | 0.230 | 0.138 | 0.074
Dynamic Weighted | 0.312 | 0.308 | 0.200 | 0.100 | 0.250 | 0.240 | 0.150 | 0.080
Table 5. NDCG scores for different recommendation systems on the CI&T Deskdrop and Citeulike-t datasets. The first two metric columns refer to CI&T Deskdrop, the last two to Citeulike-t.

Approach | NDCG@5 | NDCG@10 | NDCG@5 | NDCG@10
Collaborative filtering (CF) | 0.259 | 0.257 | 0.200 | 0.190
Content-based (CB) | 0.289 | 0.290 | 0.230 | 0.220
Hybrid | 0.312 | 0.308 | 0.250 | 0.240
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
