1. Introduction
In the rapidly evolving internet landscape, information flows at great speed and generates vast amounts of unorganized digital data, making it difficult for users to filter out what they need. Recommendation systems have emerged as a vital information filtering tool to address this issue. Their primary function is to ease the burden of finding desired information by analyzing user preferences and behavioral patterns and automatically suggesting information, services, or products that align with users’ needs [1].
Amid this volume of digital information, users often feel overwhelmed and struggle to locate content that interests them. At the same time, information providers face significant challenges in capturing user attention for their offerings. Recommendation systems offer an effective solution to both problems by analyzing user behavior and interests to establish connections between users and information products.
This process uncovers latent user needs, enabling the recommendation of relevant information to interested users. Recommendation systems typically rely on three core components: collection of user behavior records, preference analysis models, and recommendation algorithms [2]. In the present digital consumption landscape, the recommendation systems of e-commerce websites are the most recognizable examples. Nearly every major e-commerce platform leverages personalized recommendations to enhance marketing effectiveness. Research indicates that 35% of Amazon’s early sales growth was attributable to its recommendation system [3]. Beyond merchandise, recommendation systems frequently extend to media content such as music and films. Among the various personalized recommendation algorithms, collaborative filtering is widely adopted and regarded as one of the most successful recommendation technologies.
In recommendation system research, analyzing individual user data is the most common approach. This method identifies users’ interests and preferences by recording their past consumption behaviors. Based on these behavioral patterns, recommendation systems can provide targeted, personalized suggestions for future actions, helping users make better-informed consumption choices [4]. Because activities such as watching movies, dining out, or traveling are often shared experiences within social groups, the recommendation system in this study employs collaborative filtering as its core algorithm. However, this approach demands substantial computational resources, resulting in long generation times for recommendation outputs. To address this issue, we developed an improved data processing model aimed at significantly reducing computation time. By processing the data in segments, the model reduces the computational burden of the product and co-occurrence matrix operations and avoids system delays caused by excessive computational load. This enables faster generation of recommendation results.
2. Literature Review
The purpose of recommendation systems is to help users reduce the additional costs incurred when utilizing collected information or exploring unknown content. This information filtering mechanism recommends potential information, services, or products that align with users’ preferences, interests, behaviors, or needs [5]. These systems leverage machine learning technologies to analyze content or utilize others’ experiences, thereby providing users with valuable suggestions.
Companies can collect users’ purchase or browsing histories to deeply analyze their preferences and behaviors. This information serves as the basis for recommendation predictions, thereby stimulating consumption and increasing sales opportunities. When seeking favorite movies, users often encounter challenges such as overwhelming choices or a lack of inspiration, which are important considerations in recommendation system design. However, despite numerous recommendation systems available, not all effectively deliver personalized movie recommendations. This study addresses this issue.
2.1. Collaborative Filtering Recommendation
The origins of collaborative filtering systems can be traced back to 1992, when Goldberg et al. first proposed the concept to address the information overload faced by the Xerox Palo Alto Research Center [6]. At this research facility, employees were inundated with a massive volume of emails daily, making it extremely difficult to categorize and filter critical information. To tackle this challenge, the research team developed an experimental email management system designed to enhance information accessibility and filtering efficiency. This advancement helped employees better manage their emails and laid the technical groundwork for later e-commerce recommendation systems, such as that of Amazon.com. It further propelled the development of personalized recommendation technology, enabling consumers to discover products of interest more easily.
Collaborative filtering recommendation technology stands as one of the earliest and most successful techniques within recommendation systems [2]. This technology relies on the nearest neighbor algorithm to calculate similarity between users by analyzing their historical preferences. Specifically, the system identifies other users similar to the target user, predicts the target user’s preference for specific items based on these users’ ratings, and recommends the corresponding items to the target user. The greatest advantage of collaborative filtering lies in its lack of special requirements for recommended items, enabling it to handle complex, unstructured objects such as music and movies and to deliver personalized recommendation experiences. One example is a recommendation system for learning English through online videos, which specifically targets users of the video platform Voice Tube.
That study employed collaborative filtering methods, utilizing user information scraped by a Python 2.7 web crawler to analyze groups of users with similar preferences and calculate rating values for recommended videos. The system integrated the Crab recommendation engine to deliver more precise video suggestions, which enhanced user learning motivation. Additionally, user learning strategies, including intensive reading and extensive reading modes, were proposed. Through co-collection distribution simulation and cluster analysis of shared video collections, the study observed user video collection patterns and learning behavior characteristics to improve system accuracy and the user learning experience.
The most effective approach involves first identifying other users with similar interests to the target user, then recommending content favored by these users to the target user. This collaborative filtering-based recommendation system automatically facilitates mutual recommendations from the user’s perspective. Simply put, the system can generate recommendations directly based on a user’s purchase patterns or browsing history, without requiring them to fill out additional questionnaires or provide other information. This not only enhances the accuracy of recommendations but also significantly reduces the effort users expend searching for content of interest, thereby improving the overall user experience.
In collaborative filtering recommendation research, Huang et al. proposed an exchange-hybrid filtering architecture integrating Learning Vector Quantization with collaborative filtering [7]. This architecture employs a simplified two-layer network design to generate recommendations through an exchange-hybrid filtering strategy. The researchers validated their approach using the MovieLens dataset. Experimental results demonstrated that Learning Vector Quantization rapidly captures shifts in user preferences. Combined with the exchange-hybrid filtering strategy, the system effectively delivers personalized recommendations that cater to diverse user needs, thereby enhancing both the accuracy and practicality of recommendations.
Although collaborative filtering, as a classic recommendation technique, has been widely applied across multiple domains, it still faces numerous challenges. A comparative analysis of recommendation methods shows that the advantage of collaborative filtering lies in delivering personalized recommendations based on user behavior, with its automated nature enhancing recommendation efficiency (Table 1). However, this technology also exhibits notable shortcomings, such as data sparsity, cold-start problems, and computational efficiency issues in big data environments. Consequently, research addressing these challenges is crucial for the future improvement of collaborative filtering technology.
2.2. User-Based Collaborative Filtering
User-based collaborative filtering identifies neighboring users with similar interests using statistical methods, following the steps below.
1. User information collection process [11]
In recommendation systems, the first step is gathering user interest data. This typically occurs through users actively rating or reviewing products, a process known as active rating. Another method is passive rating, where the system automatically generates ratings based on user behavior without requiring any action from the user. This approach reduces user burden while collecting valuable data. For e-commerce platforms, passive rating is particularly effective because purchase history provides rich information, enabling the system to understand user interests and needs more accurately.
2. Nearest neighbor search (NNS) method [12]
NNS is a collaborative filtering technique designed to identify other users with interests similar to those of a given user. The core process involves calculating similarity between users to find suitable recommendation candidates. For example, after finding the n users whose interests are most similar to those of User A, we use their ratings for item M to predict User A’s rating for M. This approach enhances the personalization of recommendations. Typically, the similarity algorithm is selected based on the characteristics of the dataset. Popular algorithms include the Pearson correlation coefficient [13], cosine-based similarity, and adjusted cosine similarity.
3. Generating recommendations
Once the set of closest neighboring users has been identified, the target user’s interests are predicted and recommendation results are generated [6]. This step is regarded as the core of the entire recommendation system, since the user experience is directly affected by it. Different recommendation formats are adopted based on varying recommendation objectives. Common recommendation outputs include Top-N recommendations, which are tailored for individual users, so that each user receives unique recommendation results. For example, for User A, we analyze the rating records of the nearest neighbors and select items that occur with high frequency but are absent from User A’s rating history as the recommendation results. Association-based recommendations involve mining association rules from the records of the nearest neighbors [8].
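The Top-N step described above can be illustrated with a minimal Python sketch. It is not the implementation used in this study (which was written in R); the function name `top_n_from_neighbors` and the toy data are hypothetical, and the sketch assumes that the nearest neighbors have already been found and that each neighbor is represented simply by the set of items he or she has rated.

```python
from collections import Counter

def top_n_from_neighbors(target_items, neighbor_item_sets, n=20):
    """Rank items by how many nearest neighbors rated them,
    skipping items the target user has already rated."""
    counts = Counter()
    for items in neighbor_item_sets:
        for item in set(items) - set(target_items):
            counts[item] += 1
    # Sort by descending frequency, then by item ID for a deterministic order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:n]]

# Hypothetical toy data: User A and three nearest neighbors.
user_a = {"M1", "M2"}
neighbors = [{"M1", "M3", "M4"}, {"M3", "M5"}, {"M2", "M3", "M4"}]
top = top_n_from_neighbors(user_a, neighbors, n=2)
print(top)  # ['M3', 'M4']
```

Here M3 is rated by three neighbors and M4 by two, so they are ranked first; M1 and M2 are excluded because User A has already rated them.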
2.3. Item-Based Collaborative Filtering
As the number of users increases, the computational time for user-based collaborative recommendation algorithms continues to grow. Consequently, Sarwar et al. proposed item-based collaborative filtering recommendation algorithms in 2001 [13]. The fundamental assumption of this approach is that items capable of capturing a user’s interest must be similar to the items the user has previously rated highly. In other words, similarity between items, rather than similarity between users, is calculated first, using the following steps.
1. Collect user information: This step is identical to that of user-based collaborative filtering methods, where we need to obtain users’ rating data to understand their interests;
2. NNS for item search: We calculate the similarity between the rated items and the item to be predicted. Using these similarity scores as weights, we compute a weighted average of the ratings of the rated items to predict the rating of the unrated item. For example, to calculate the similarity between items A and B, we identify users who have rated both items and then compute the similarity score from their ratings. This process is similar to that of user-based collaborative filtering methods;
3. Recommendations: Item-based collaborative filtering does not account for differences between users, which may result in slightly lower accuracy. However, its advantage lies in not requiring user history data or identification. Since item similarity remains relatively stable, we can perform extensive offline computations. This effectively reduces the online computational burden and significantly improves recommendation efficiency, especially when the number of users far exceeds the number of items.
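As an illustration of steps 1 to 3, the following Python sketch computes an item-to-item similarity from users who rated both items and then predicts an unrated item as a similarity-weighted average of the user's own ratings. It is a sketch under stated assumptions rather than the method of this study: cosine similarity is chosen here only as one of the measures named in Section 2.2, and the names `item_cosine`, `predict_item_based`, and the toy ratings are hypothetical.

```python
import math

def item_cosine(ratings, item_a, item_b):
    """Cosine similarity between two items over the users who rated both.
    ratings: {user: {item: rating}}"""
    common = [u for u in ratings if item_a in ratings[u] and item_b in ratings[u]]
    if not common:
        return 0.0
    dot = sum(ratings[u][item_a] * ratings[u][item_b] for u in common)
    norm_a = math.sqrt(sum(ratings[u][item_a] ** 2 for u in common))
    norm_b = math.sqrt(sum(ratings[u][item_b] ** 2 for u in common))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def predict_item_based(ratings, user, target_item):
    """Predict the user's rating of target_item as the similarity-weighted
    average of the ratings the user gave to other items."""
    num = den = 0.0
    for item, rating in ratings[user].items():
        if item == target_item:
            continue
        sim = item_cosine(ratings, item, target_item)
        num += sim * rating
        den += abs(sim)
    return num / den if den else None

# Hypothetical toy data: u3 has not rated item C.
ratings = {
    "u1": {"A": 5, "B": 4, "C": 1},
    "u2": {"A": 4, "B": 5, "C": 2},
    "u3": {"A": 5, "B": 4},
}
pred = predict_item_based(ratings, "u3", "C")
print(round(pred, 2))
```

Because the item similarities depend only on the rating table, `item_cosine` values could be precomputed offline and cached, which is the efficiency argument made in step 3.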
3. Methodology
The first step in constructing a collaborative filtering recommendation system is to establish a user–item matrix model. This matrix records user ratings and provides the foundation for subsequent calculations. Once the matrix has been built, similarity scores between users are computed in order to identify the nearest neighbors who most closely resemble the target user. Based on the ratings of these nearest neighbors, movie recommendations are then generated, allowing the system to better align with user preferences. Research Method 1 follows a similar process. The method begins with the construction of the user–item matrix model. Next, the nearest neighbors are identified by calculating user similarity through the Pearson correlation coefficient [13], given in Equation (1):
$$ r_{X,Y} = \frac{\sum_{i}\left(R_{X,i}-\bar{R}_{X}\right)\left(R_{Y,i}-\bar{R}_{Y}\right)}{\sqrt{\sum_{i}\left(R_{X,i}-\bar{R}_{X}\right)^{2}}\,\sqrt{\sum_{i}\left(R_{Y,i}-\bar{R}_{Y}\right)^{2}}} \qquad (1) $$
where $R_{X,i}$ represents user X’s rating for movie i, $\bar{R}_{X}$ represents the average of all movie ratings by that user, $R_{Y,i}$ represents the other user Y’s rating for movie i in the database, and $\bar{R}_{Y}$ represents the average of all movie ratings by other user Y. Here $r_{X,Y} \in [-1, 1]$. If user X and other user Y are more similar (r value closer to 1), their movie preferences are quite similar; if the similarity is lower (r value closer to −1), the movie preferences of these two users differ greatly.
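The Pearson similarity between two users can be sketched in Python as follows. This is an illustrative sketch, not the R code used in the study. It assumes the common mean-centered form in which each user's mean is taken over all of that user's ratings and the sums run over the movies rated by both users; the function name `pearson` and the toy ratings are hypothetical.

```python
import math

def pearson(ratings_x, ratings_y):
    """Pearson correlation between two users.
    ratings_x, ratings_y: {movie_id: rating}"""
    common = set(ratings_x) & set(ratings_y)
    if not common:
        return 0.0  # no co-rated movies, so no evidence of similarity
    mean_x = sum(ratings_x.values()) / len(ratings_x)
    mean_y = sum(ratings_y.values()) / len(ratings_y)
    num = sum((ratings_x[i] - mean_x) * (ratings_y[i] - mean_y) for i in common)
    den_x = math.sqrt(sum((ratings_x[i] - mean_x) ** 2 for i in common))
    den_y = math.sqrt(sum((ratings_y[i] - mean_y) ** 2 for i in common))
    if den_x == 0 or den_y == 0:
        return 0.0  # a user with constant ratings has undefined correlation
    return num / (den_x * den_y)

# Hypothetical toy data.
user_x = {"m1": 5, "m2": 3, "m3": 1}
similar = {"m1": 4, "m2": 3, "m3": 2}   # same ordering of preferences
opposite = {"m1": 1, "m2": 3, "m3": 5}  # reversed ordering

print(round(pearson(user_x, similar), 3))   # 1.0
print(round(pearson(user_x, opposite), 3))  # -1.0
```

Returning 0.0 when there are no co-rated movies or when a user's ratings are constant is a design choice of this sketch; other treatments (for example, excluding such pairs) are equally possible.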
3.1. Movie Recommendation
After the system calculates the similarity between the target user X and other users, the collaborative filtering recommendation mechanism employs a prediction algorithm [13]. It calculates the weighted average over the similar neighbor users J to predict user X’s potential preference for a recommended item p using Equation (2):
$$ P_{X,p} = \bar{R}_{X} + \frac{\sum_{J} r_{X,J}\left(R_{J,p}-\bar{R}_{J}\right)}{\sum_{J}\left|r_{X,J}\right|} \qquad (2) $$
where $P_{X,p}$ represents the predicted value for movie p to be recommended to user X, $\bar{R}_{X}$ represents the average of all movie ratings by user X, $r_{X,J}$ represents the similarity between user X and neighbor J, $R_{J,p}$ represents neighbor J’s rating for movie p, and $\bar{R}_{J}$ is the average of all movie ratings by neighbor J. The steps are shown in Figure 1.
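The prediction step can be sketched in Python as follows, assuming the usual mean-centered weighted form: the target user's mean rating plus the similarity-weighted average of each neighbor's deviation from his or her own mean, normalized by the sum of absolute similarities. The function name `predict_rating`, the neighbor representation, and the toy data are hypothetical, and the sketch is not the R implementation used in the experiment.

```python
def predict_rating(target_ratings, neighbors, movie):
    """Predict the target user's rating for `movie`.
    target_ratings: {movie_id: rating} for user X
    neighbors: list of (similarity_to_X, {movie_id: rating}) pairs"""
    mean_x = sum(target_ratings.values()) / len(target_ratings)
    num = den = 0.0
    for sim, ratings in neighbors:
        if movie not in ratings:
            continue  # this neighbor cannot contribute to the prediction
        mean_j = sum(ratings.values()) / len(ratings)
        num += sim * (ratings[movie] - mean_j)
        den += abs(sim)
    # Fall back to the user's own mean when no neighbor rated the movie.
    return mean_x if den == 0 else mean_x + num / den

# Hypothetical toy data: user X has not rated movie "p".
user_x = {"m1": 4, "m2": 2}
neighbors = [
    (1.0, {"m1": 5, "m2": 3, "p": 5}),
    (0.5, {"m1": 2, "m2": 2, "p": 5}),
]
pred = predict_rating(user_x, neighbors, "p")
print(round(pred, 3))  # 4.111
```

In this example, user X's mean is 3; the first neighbor rates p about 0.67 above his or her own mean and the second rates it 2 above, giving 3 + (0.667 + 1.0) / 1.5 ≈ 4.111.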
3.1.1. Data Source
The time-based item filtering criteria are based on the Internet Movie Database (IMDb) (https://www.imdb.com/search/title/?release_date=1922-01-01,1998-12-12%20&title_type=feature, accessed on 8 June 2026), an online database that documents film actors, films, television programs, television personalities, video games, and film production teams. From this source, the recorded film duration is retrieved. In addition, MovieLens content provides the year of a film’s release and displays this information.
3.1.2. Data Extraction
Movie results displayed through data extraction are obtained from the IMDb website using the visual data acquisition tool import.io. This process yields comprehensive information on film eras, including crucial details such as actual theatrical release dates, organized by year.
3.1.3. Data Organization
The extracted data are organized by archiving records according to year and incorporating timestamps from IMDb entries for the film titles in the one-million-record MovieLens dataset. While MovieLens records include information such as director, actor, and genre, they lack timestamps, which must be sourced separately to enhance the completeness of the dataset.
3.1.4. Recommendation Generation
Improved MovieLens content records, combined with analysis of the User Rating Dataset, integrate users’ favorite movie characteristics with their rating histories to enable collaborative filtering applications. This process generates recommendation results for movies based on director, actor, genre, and release date. The collaborative filtering application utilizes the User Rating Dataset to determine the K most similar neighbors to a given user, subsequently producing a ranked list of movies most worth watching for that user.
4. Experiment
The experiment was conducted based on the Internet Movie Database (IMDb) (https://www.imdb.com/search/title/?release_date=1922-01-01,1998-12-12%20&title_type=feature, accessed on 8 June 2026), an online platform that comprehensively documents films, actors, television programs, and film production teams. The release years of films are retrieved from IMDb and integrated into MovieLens to enrich the dataset. Movie release year data were extracted from IMDb using the visual data acquisition tool import.io. This process captures actual release dates, which are organized by year into separate records, thereby facilitating subsequent analysis and utilization.
For data organization, records were categorized by year, and over one million entries were logged. MovieLens film titles were cross-referenced with IMDb records to supplement missing time values. While MovieLens provides information such as directors, actors, and genres, timestamps were retrieved separately to ensure dataset completeness. After partitioning MovieLens user content, new user preference values and user-item usage tables were generated. Collaborative filtering technology was then applied to produce recommendation results that consider a movie’s director, actors, genre, and release date. The collaborative filtering algorithm calculates the K most similar neighbors to the user and produces a ranked list of titles most worth watching (Figure 2).
A key objective of the recommendation system is to reduce the effort required for users to sift through large volumes of data while providing selections that best match their preferences. To achieve this, a limit was imposed so that collaborative filtering recommendations never exceed 20 items per session. This constraint prevents information overload and enables users to quickly locate desired content. The system generated 20 recommended results per session.
The device used for the experiment was equipped with a 12th Gen Intel® Core™ i7-12700H (2.30 GHz) processor (Santa Clara, CA, USA). It contained 18 gigabytes (GB) of DDR4 memory (configured as 1 × 4 GB). The operating system was Windows 11, 64-bit edition. The recommendation software was implemented using the R programming language. The dataset employed was MovieLens, which includes one million records. Supplementary data were crawled from IMDb.
5. Results and Discussions
In a typical collaborative filtering computation, generating recommendation scores requires approximately 17.20 min (Figure 3). The process begins with extracting key information from the user purchase table, including user identifications (IDs), movie IDs, and preference scores. These data provide insight into each user’s preferences.
A single user preference table was compiled to construct a co-occurrence matrix. This matrix records how many users have watched both Movie A and Movie B, and these counts form the core of the co-occurrence matrix. By multiplying the user preference table by the co-occurrence matrix, recommendation scores are generated. These scores represent the strength of a movie’s recommendation. The top 20 movie IDs are then mapped to the movie catalog to produce a list of recommended titles.
The improvements introduced in this study enable collaborative filtering calculations to generate recommendation values within 16.14 s. The process begins by computing the relevant interval of user IDs and filtering out irrelevant ones. This generates new user preference values and a revised user purchase item table.
From the revised user purchase table, individual user preference tables are extracted and used to generate a co-occurrence matrix. This matrix records the number of users who have watched both Movie A and Movie B simultaneously. Multiplying the user preference table by the co-occurrence matrix yields recommendation scores, which represent the strength of each movie’s recommendation. Finally, these scores are mapped to the movie product table to generate a list of recommended movies.
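The co-occurrence computation described above can be sketched in Python as follows. This is a minimal illustration and not the R implementation used in the experiment: the names `build_cooccurrence` and `recommend` and the toy preference table are hypothetical, the matrix is stored as nested dictionaries rather than a dense array, and excluding movies the user has already watched is an assumption of the sketch.

```python
from collections import defaultdict
from itertools import permutations

def build_cooccurrence(prefs):
    """Count, for every ordered pair of movies, how many users watched both.
    prefs: {user_id: {movie_id: preference}}"""
    co = defaultdict(lambda: defaultdict(int))
    for movies in prefs.values():
        for a, b in permutations(movies, 2):
            co[a][b] += 1
    return co

def recommend(prefs, user, co, n=20):
    """Multiply the user's preference vector by the co-occurrence matrix
    and return the top-n (movie_id, score) pairs for unseen movies."""
    scores = defaultdict(float)
    for seen, pref in prefs[user].items():
        for other, count in co[seen].items():
            if other not in prefs[user]:  # assumption: skip watched movies
                scores[other] += count * pref
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]

# Hypothetical toy preference table.
prefs = {
    1: {"A": 5, "B": 3},
    2: {"A": 4, "B": 2, "C": 5},
    3: {"B": 4, "C": 3},
}
co = build_cooccurrence(prefs)
result = recommend(prefs, 1, co)
print(result)  # [('C', 11.0)]
```

For user 1, movie C co-occurs once with A and twice with B, so its score is 1 × 5 + 2 × 3 = 11. The cost of building the matrix grows with the square of the number of movies each user has watched, which is why reducing the input table before this step shortens the computation.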
During the calculation process, a new user ID range was created to encompass all user preference values. Regardless of the magnitude of these preference values, all data were processed together. Subsequently, the co-occurrence matrix was created to record how many users watched both Movie A and Movie B. These data were compiled to form a complete matrix. Finally, the two matrices were multiplied. The values accumulated through multiplication and addition during this process grow from smaller numbers into larger ones, which indirectly affects the accuracy of the recommendation data (Table 2 and Figure 4).
6. Conclusions
The temporal attributes of items were obtained by filtering the time values of users’ preferred products to reduce the extensive array operations involved in both product ratings and user preference scores. These operations have a significant impact on recommendation speed. When both product and rating preference values are large, collaborative filtering recommendations are hindered by prolonged memory consumption, processing delays, and extended disk input/output pauses caused by array operations. To address these challenges, a large-volume, small-variable approach is adopted, in which data are partitioned for computation. This ensures that the system can meet reasonable recommendation processing timelines.
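The partitioning idea can be illustrated with the following Python sketch, which restricts the preference table to movies released within a chosen interval before any matrix is built. The exact segmentation rule used in this study is not reproduced here; the function name `segment_by_year`, the `release_year` lookup, and the interval bounds are hypothetical and serve only to show how a smaller input table is obtained.

```python
def segment_by_year(prefs, release_year, start, end):
    """Keep only preferences for movies released in [start, end] and
    drop users who are left with no preferences.
    prefs: {user_id: {movie_id: preference}}
    release_year: {movie_id: year}"""
    segmented = {}
    for user, movies in prefs.items():
        kept = {
            m: p for m, p in movies.items()
            if m in release_year and start <= release_year[m] <= end
        }
        if kept:
            segmented[user] = kept
    return segmented

# Hypothetical toy data.
prefs = {1: {"A": 5, "B": 3}, 2: {"B": 4, "C": 2}, 3: {"C": 5}}
release_year = {"A": 1994, "B": 1996, "C": 1980}
segment = segment_by_year(prefs, release_year, 1990, 1998)
print(segment)  # {1: {'A': 5, 'B': 3}, 2: {'B': 4}}
```

In this example, movie C falls outside the interval, so user 3 is removed entirely and user 2 keeps a single preference; any subsequent co-occurrence matrix is built over two movies instead of three.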
To mitigate the issue of insufficient computational speed, temporal segmentation is employed to reduce computational load while accelerating the generation of recommended products. The recommended items demonstrate an average similarity of 90% to those produced by the original collaborative filtering algorithm, which indicates that further refinement is possible to enhance the method. Improvements in file segmentation resolve the problem of oversized files and emphasize the critical importance of segmentation itself. This approach is valuable when executing big-data computations on smaller systems. The concept is highly significant, as it enables oversized files to be utilized more effectively and integrated with the system to achieve optimal performance.
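One simple way to quantify how closely two recommendation lists agree is the fraction of items they share. The Python sketch below shows such an overlap measure; it is an assumption made for illustration, since the similarity measure behind the 90% figure is not specified here, and the names `top_n_overlap` and the toy lists are hypothetical.

```python
def top_n_overlap(baseline, candidate):
    """Fraction of the baseline recommendation list that also appears
    in the candidate list (order is ignored)."""
    base = set(baseline)
    if not base:
        return 0.0
    return len(base & set(candidate)) / len(base)

# Hypothetical toy lists of ten movie IDs each; nine are shared.
original = ["m1", "m2", "m3", "m4", "m5", "m6", "m7", "m8", "m9", "m10"]
segmented = ["m1", "m2", "m3", "m4", "m5", "m6", "m7", "m8", "m9", "m42"]
overlap = top_n_overlap(original, segmented)
print(overlap)  # 0.9
```

Averaging this value over all users would give a single agreement figure between the original and the segmented computation.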
Author Contributions
Conceptualization, G.-W.H. and H.-J.C.; methodology, G.-W.H. and H.-J.C.; software, G.-W.H.; validation, G.-W.H. and H.-J.C.; formal analysis, G.-W.H. and H.-J.C.; investigation, G.-W.H.; resources, G.-W.H.; writing—original draft preparation, G.-W.H.; writing—review and editing, G.-W.H. and H.-J.C.; visualization, G.-W.H. and H.-J.C.; supervision, H.-J.C.; project administration, G.-W.H. and H.-J.C.; funding acquisition, G.-W.H. and H.-J.C. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
The dataset used in this study was collected and annotated by the authors. Data may be made available from the corresponding authors upon reasonable request.
Conflicts of Interest
The authors declare no conflict of interest.
References
1. Rashid, A.M.; Albert, I.; Cosley, D.; Lam, S.K.; McNee, S.M.; Konstan, J.A.; Riedl, J. Getting to know you: Learning new user preferences in recommender systems. In Proceedings of the 7th International Conference on Intelligent User Interfaces; Association for Computing Machinery: New York, NY, USA, 2002; pp. 127–134.
2. Schafer, J.B.; Frankowski, D.; Herlocker, J.; Sen, S. Collaborative filtering recommender systems. In The Adaptive Web: Methods and Strategies of Web Personalization; Brusilovsky, P., Kobsa, A., Nejdl, W., Eds.; Springer: Berlin/Heidelberg, Germany, 2007; pp. 291–324.
3. Linden, G.; Smith, B.; York, J. Amazon.com Recommendations: Item-to-Item Collaborative Filtering. IEEE Internet Comput. 2003, 7, 76–80.
4. Resnick, P.; Iacovou, N.; Suchak, M.; Bergstrom, P.; Riedl, J. GroupLens: An open architecture for collaborative filtering of Netnews. In Proceedings of the ACM 1994 Conference on Computer Supported Cooperative Work (CSCW); Association for Computing Machinery: New York, NY, USA, 1994; pp. 175–186.
5. Isinkaye, F.O.; Folajimi, Y.O.; Ojokoh, B.A. Recommendation systems: Principles, methods and evaluation. Egypt. Inform. J. 2015, 16, 261–273.
6. Goldberg, D.; Nichols, D.; Oki, B.M.; Terry, D. Using collaborative filtering to weave an information tapestry. Commun. ACM 1992, 35, 61–70.
7. Huang, C.M.; Lin, C.Y.; Huang, J.R. Combining LVQ and Collaboration Filtering on Switching Hybrid Movie Recommendation. J. Inf. Manag. 2013, 20, 423–447. (In Chinese)
8. Agrawal, R.; Srikant, R. Fast algorithms for mining association rules in large databases. In Proceedings of the 20th International Conference on Very Large Data Bases; Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 1994; pp. 487–499.
9. Stryker, S.B.; Leaver, B.L. Content-Based Instruction in Foreign Language Education: Models and Methods; Georgetown University Press: Baltimore, MD, USA, 1997.
10. Benesty, J.; Chen, J.; Huang, Y.; Cohen, I. Pearson correlation coefficient. In Noise Reduction in Speech Processing; Springer: Berlin/Heidelberg, Germany, 2009; pp. 1–4.
11. Gauch, S.; Speretta, M.; Chandramouli, A.; Micarelli, A. User Profiles for Personalized Information Access. In The Adaptive Web: Methods and Strategies of Web Personalization; Brusilovsky, P., Kobsa, A., Nejdl, W., Eds.; Springer: Berlin/Heidelberg, Germany, 2007; Volume 4321, pp. 54–89.
12. Papadopoulos, A.N.; Manolopoulos, Y. Nearest Neighbor Search: A Database Perspective; Springer: New York, NY, USA, 2005.
13. Sarwar, B.; Karypis, G.; Konstan, J.; Riedl, J. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web (WWW ’01), Hong Kong, China, 1–5 May 2001; pp. 285–295.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.