Article

An Intelligent Hybrid AI Course Recommendation Framework Integrating BERT Embeddings and Random Forest Classification

by Armaneesa Naaman Hasoon 1, Salwa Khalid Abdulateef 1,*, R. S. Abdulameer 1 and Moceheb Lazam Shuwandy 2,*
1 Computer Science Department, College of Computer Science and Mathematics, Tikrit University (TU), Tikrit 34001, Iraq
2 Cybersecurity Department, College of Computer Science and Mathematics, Tikrit University (TU), Tikrit 34001, Iraq
* Authors to whom correspondence should be addressed.
Computers 2025, 14(9), 353; https://doi.org/10.3390/computers14090353
Submission received: 23 July 2025 / Revised: 17 August 2025 / Accepted: 25 August 2025 / Published: 27 August 2025
(This article belongs to the Section AI-Driven Innovations)

Abstract

With the proliferation of online learning platforms, selecting appropriate artificial intelligence (AI) courses has become increasingly complex for learners. This study proposes a novel hybrid AI course recommendation framework that integrates Term Frequency–Inverse Document Frequency (TF-IDF) and Bidirectional Encoder Representations from Transformers (BERT) for robust textual feature extraction, enhanced by a Random Forest classifier to improve recommendation precision. A curated dataset of 2238 AI-related courses from Udemy was constructed through multi-session web scraping, followed by comprehensive data preprocessing. The system computes semantic and lexical similarity using cosine similarity and applies fuzzy matching to handle variations in user input. Experimental results demonstrate a recommendation accuracy of 91.25%, a precision of 96.63%, and an F1-Score of 90.77%. Compared with baseline models, the proposed framework significantly improves performance in cold-start scenarios and does not rely on historical user interactions. A Flask-based web application was developed for real-time deployment, offering instant, user-friendly recommendations. This work contributes a scalable, metadata-driven AI course recommender architecture with practical deployment and promising generalization capabilities.

1. Introduction

A course recommendation mechanism recommends similar courses based on learners’ preferences [1]. Udemy, Coursera, and EdX are online training resources that utilize such methodologies and offer a wide variety of courses across several disciplines [2]. Online course recommendation systems are becoming increasingly popular because they guide students to discover suitable courses and enrich their learning experiences by offering personalized recommendations tailored to their needs [3].
A recommendation system is a type of information filtering system designed to predict the rating or preference that a user would give a particular item; in effect, it is a program that suggests suitable items, such as courses, to users [4]. For example, it can recommend a movie to watch on Netflix or suggest a product to buy from an online shop [5]. Such recommender systems filter information by suggesting suitable content based on numerous data sources and methods. Content-based filtering and collaborative filtering are among the most widely applied and studied methods in domains such as education, e-commerce, and entertainment [6].
  • Content-based filtering: Under this method, item attributes are analyzed and matched with the user’s preferences. It recommends items closely related to those the user interacted with in the past, utilizing techniques such as TF-IDF and machine learning. It is highly personalized, but it suffers from the cold-start problem and can lead to a filter bubble, as the diversity of the displayed content is reduced. For example, a movie streaming service may propose movies to a client according to his or her preferred actors and genres [7].
  • Collaborative filtering: This method suggests items based not on their features but on the behavior patterns of users. It identifies similar users or items using techniques such as user-based and item-based collaborative filtering. Although effective, it suffers from data sparsity and the cold-start problem. An example is Netflix suggesting programs to watch based on what users similar to you have watched [8].
Online AI courses often lack individualized recommendations, leaving many users with less-than-ideal learning experiences. Conventional methods of finding courses, such as keyword-based searches or manual browsing, do not offer customized suggestions [9,10], especially when there are numerous courses to choose from. The result is information overload, which degrades the learning process and lowers learner satisfaction. In addition, a significant portion of current recommendation systems is based on collaborative filtering, which cannot be implemented efficiently when user data are sparse, as is the case for new courses or new users, resulting in the cold-start problem [8].
In this regard, this paper proposes the design of a content-based recommendation system that focuses on online courses in artificial intelligence. Content-based filtering can achieve good and significant results in the course recommendation problem by utilizing only course attributes such as titles, descriptions, and categories, without requiring a considerable amount of historical user interaction data. This strategy aims to enhance course discovery, reduce information overload, and improve the overall educational experience of AI learners. In addition to strengthening course discovery and improving the overall learning experience, our method ensures learners receive accurate and personalized recommendations.
This study introduces an AI course recommender system that combines TF-IDF and BERT for effective content representation and enhances the system’s accuracy with the aid of a Random Forest classifier. Unlike most systems, it relies solely on course metadata rather than on interactions with students. A set of 2238 Udemy AI courses was gathered through web scraping and preprocessing, and the resulting system achieved an accuracy of 91.25%. The contributions of this research are the following:
  • We constructed a real-world dataset of 2238 AI-related courses collected from Udemy through multiple scraping sessions, followed by rigorous cleaning and preprocessing. Additionally, we provided a quantitative analysis of dataset characteristics (course levels, ratings, and price points) and discussed their potential impact on generalizability.
  • This work presents a hybrid recommendation architecture that combines TF-IDF for lexical feature extraction, BERT embeddings for contextual semantic representation, and a Random Forest classifier to enhance predictive accuracy. The architecture is fully metadata-driven, enabling deployment in a real-time Flask-based application for immediate, user-friendly recommendations.
  • By relying solely on course metadata rather than historical interaction data, the proposed system effectively addresses the cold-start problem, ensuring that accurate recommendations can be made for new users or unseen courses.
  • Extensive empirical evaluation, including statistical significance testing (bootstrap, McNemar’s test) with 95% confidence intervals, demonstrates that the proposed approach achieves 91.25% accuracy and a 90.77% F1-Score, outperforming recent baselines.
  • The system is deployed as a real-time interactive web application using Flask, offering immediate recommendations via a user-friendly interface. A pilot usability study with five undergraduate participants was also conducted, showing high internal agreement in Likert-scale responses (avg. 4.6/5), indicating preliminary usability and paving the way for larger-scale evaluations in future work.
The remainder of the paper is organized as follows: Section 2 reviews the literature on recommendation systems. Section 3 presents the theoretical background of the techniques employed in this study. Section 4 describes our methodology. The experimental results are presented in Section 5. Finally, Section 6 presents the conclusions and outlines future work.

2. Literature Review

Numerous studies in the educational field have already outlined the applications of recommendation systems and their associated challenges. The following review focuses on content-based filtering and its application to online learning, pointing out the shortcomings that our proposed solution aims to address.
Monsalve-Pulido et al. [11] presented an autonomous architecture for a recommendation system that can be employed to enhance the education process by recommending online courses to students. The contextual data, student information, and course requirements used in the research were made available through the virtual learning environment (VLE). The study shows clear progress in utilizing metadata, but it does not explore hybrid or content-based filtering methods. It also highlights the potential of intelligent systems for personalized learning, which is one of the objectives of our system, namely to provide customized course recommendations.
In [12], the authors applied genetic algorithms and K-means clustering to enhance educational recommendations. Although this approach improved accuracy, the research focused on collaborative filtering and did not utilize metadata to its full extent. Tian et al. [13] proposed a hybrid recommendation system that combines collaborative filtering and content-based filtering to enhance the utilization of library resources; clustering algorithms were employed to address the problem of data sparsity. This method, however, required a substantial amount of user–item interaction data to be effective. To reduce this dependence, our proposal relies on content-based filtering alone, utilizing metadata such as course categories, levels, and descriptions to make recommendations based on user preferences.
In [14], Dai et al. explored the role of explanations in improving user satisfaction and learning outcomes in a math recommender system. Although the study highlighted the significance of individualized explanations, it did not place a strong focus on metadata usage. In our system, metadata is utilized comprehensively, serving not only to produce accurate recommendations but also to facilitate a better learning experience for the user. In a separate article, the authors in [15] developed a personalized book recommender system that employed collaborative filtering and Euclidean distance measures. That work enhanced the recommender system’s accuracy with a primary focus on the Amazon review dataset, which comprises user ratings and reviews; personalization could be improved further by including metadata such as book descriptions and their corresponding genres. Zhong and Ding [8] introduced a system that recommends learning resources based on collaborative filtering, generating recommendations from behavioral data, including user interactions and learning patterns. Issues that remained open in that work include the cold-start problem, data sparsity, and recommendation diversity, which content-based filtering can help address.
The studies reviewed in this section demonstrate the importance of recommendation systems for addressing information overload, personalization, and data sparsity in the education field. Collaborative filtering has been applied broadly, but it performs poorly when data are missing or users are new, as it relies on user interaction data. Content-based filtering offers an alternative by exploiting rich item information to generate recommendations. However, the following gaps remain:
  • Limited utilization of metadata: only a minimal number of studies fully exploit metadata, including levels, classifications, and descriptions, to obtain proper suggestions.
  • Cold-start problem: collaborative filtering methods cannot handle new users and items, an issue that content-based filtering can address.
  • Scalability and simplicity: most solutions are associated with complicated algorithms and might be challenging to scale and use.
  • Over-reliance on collaborative filtering: many systems rely on information about user interactions, which makes them ineffective for new courses and new users.
Recent studies on course recommendation systems have utilized deep learning and hybrid techniques. For example, Li and Kim (2021) introduced the DECOR model—a deep learning-based framework capturing both user behavior and course attributes—demonstrating improved accuracy over collaborative filtering methods [16].
Guruge et al. (2021), in a systematic literature review, highlighted the increasing trend towards hybrid recommender systems for online courses and noted their advantages in addressing cold-start issues [17,18].
A recent approach by Lee et al. [18] proposed a two-stage collaborative-filtering model enhanced with item-dependency awareness, achieving a high AUC of 0.97 on real-world course data. In another hybrid strategy, an AutoLFA model combined an autoencoder with latent factor analysis to boost recommendation performance across multiple datasets [19].
Unlike these approaches, our system uniquely merges TF-IDF for lexical features and BERT embeddings for semantic understanding, with a Random Forest classifier layered on top. It addresses cold-start solely via metadata and includes full deployment as a real-time Flask web application—a combination not commonly explored in the existing literature.
This study addresses these gaps by developing a content-based recommendation system for online AI courses using a dataset collected from Udemy. Without relying on information about user interactions, the system generates precise recommendations by utilizing metadata, including course titles, descriptions, classifications, and levels. Content-based filtering enables the proposal of recently uploaded courses based on their qualities, although it does not entirely resolve the cold-start issue. The system’s efficient and scalable design ensures that it can be used practically in online learning environments.

3. Theoretical Backgrounds

3.1. Techniques Used in Feature Extraction

Feature extraction is a core process in the AI Course Recommender System, as it converts unstructured text data into structured numerical data [20]. Through this, the system can quantify course content and establish similarities among different courses. In this study, two of the most prominent techniques employed in feature extraction were TF-IDF vectorization and BERT embeddings. To enhance the efficiency of feature extraction, course names and course descriptions were merged into a new feature called combined content. The merging enables the title keywords and context information in the description to be utilized for similarity analysis. By combining keyword-based and semantic approaches, the system can provide more accurate course recommendations [15].

3.1.1. TF-IDF Vectorization

Term Frequency–Inverse Document Frequency (TF-IDF) is a standard feature extraction technique in Natural Language Processing (NLP) that identifies the importance of words within a document and reduces the impact of commonplace but less significant words in distinguishing meaning [21]. It is very effective for keyword detection that describes course content [15]. TF-IDF has two main elements. Firstly, Term Frequency (TF) measures the frequency of a word appearing in a course description relative to the total words in the description. Secondly, Inverse Document Frequency (IDF) assigns more importance to words that appear in fewer course descriptions so that commonplace words will not dominate the similarity analysis.
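For reference, the standard TF-IDF weighting (not written out explicitly in the text) can be expressed as follows; the scikit-learn implementation used later in Section 4 applies a smoothed variant of the IDF term:

tf-idf(t, d) = tf(t, d) × idf(t),   with   idf(t) = log(N / df(t))

where tf(t, d) is the frequency of term t in course text d, N is the total number of course texts, and df(t) is the number of course texts containing t.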

3.1.2. Bidirectional Encoder Representations from Transformers (BERT)

BERT is a deep learning NLP model that outperforms traditional feature extraction methods, such as TF-IDF, in capturing word meaning and context [22]. Unlike TF-IDF, which considers word frequency, BERT learns word relationships within a sentence, including nuances, synonyms, and sentence syntax. BERT creates high-dimensional numeric text embeddings in such a way that semantically related courses are closer to one another in the vector space, even though they might not share the exact keywords. One of the most significant features of BERT is that it reads text bidirectionally, i.e., left-to-right as well as right-to-left, to understand the overall context of a sentence. This represents a significant leap from traditional keyword-based approaches, such as TF-IDF, which analyze words in isolation without considering their interrelationships [23].
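As a minimal illustration (not the authors’ exact pipeline), the following Python sketch obtains contextual sentence embeddings with the sentence-transformers package and the all-MiniLM-L6-v2 model named in Section 4.4, then compares two hypothetical course texts by cosine similarity:

from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

# Load a pre-trained sentence-embedding model (384-dimensional vectors).
model = SentenceTransformer("all-MiniLM-L6-v2")

# Two illustrative course texts that share meaning but few exact keywords.
courses = [
    "Neural networks and deep learning from scratch in Python",
    "Build and train deep learning models with PyTorch",
]
embeddings = model.encode(courses)  # shape: (2, 384)

# Semantically related texts end up close together in the embedding space.
print(cosine_similarity([embeddings[0]], [embeddings[1]])[0, 0])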

3.2. Random Forest

Random Forest, as illustrated in Figure 1, is an ensemble learning method that constructs multiple decision trees and combines their predictions to enhance classification accuracy and mitigate overfitting. It can handle complex datasets and scales as needed, making it widely used in machine learning tasks such as classification and regression. Random Forest relies on the concept of bagging (Bootstrap Aggregating), which involves training numerous decision trees on random subsets of the data and then aggregating their predictions into a final result. This enables the model to generalize well across various datasets, thereby preventing the overfitting commonly seen with single decision trees [24]. The final prediction is obtained through a majority vote for classification or through averaging for regression. Random Forest is tolerant of noisy data and can express complex dependencies among features.
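The following sketch shows the bagging-and-voting idea with scikit-learn on synthetic data; the hyperparameters here are illustrative, not the ones tuned in Section 4.4:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data for a binary classification task.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Each of the 100 trees is fit on a bootstrap sample of the training data.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# Hard majority vote over the individual trees; scikit-learn itself averages
# per-tree class probabilities, which usually agrees with this vote.
votes = np.stack([tree.predict(X_test) for tree in forest.estimators_])
majority = (votes.mean(axis=0) >= 0.5).astype(int)
print("agreement with forest.predict:", (majority == forest.predict(X_test)).mean())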

4. Methodology

This study is planned and organized to create an AI Course Recommender System by using a machine learning model. Figure 2 illustrates the main steps of the methodology, which include collecting data, preprocessing it, extracting features using the TF-IDF and BERT embeddings language model, training a Random Forest classifier to enhance accuracy, training a content-based model, evaluating the model, and developing the frontend using the Flask web application.

4.1. Data Collection

The dataset for this study was collected from Udemy, a well-known online learning platform, using Instant Data Scraper, a web scraping tool. The dataset consists of AI-related courses carefully selected to enable the development of a content-based AI Course Recommender System. With the scraping tool, only between 100 and 200 courses could be extracted at a time; due to this limitation, data collection was conducted over ten separate scraping sessions, and the collected files were then combined manually into a single, complete dataset. This dataset contains important attributes that provide helpful information about AI courses, including title, description, instructor name, and level. Table 1 reports the number and percentage of courses in each classification category, revealing a moderate imbalance (e.g., a higher proportion of “Beginner” courses compared with “Advanced”). Figure 3 presents the distribution of course levels, ratings, and price ranges, highlighting potential platform-specific biases inherent to Udemy data. A clustering visualization of the entire dataset using t-SNE and K-Means on BERT embeddings is also provided to illustrate structural relationships across courses.
The dataset also features key metrics such as course ratings, reviews, price, and duration, which are crucial for determining a course’s relevance. The average course rating, on a scale from 1 to 5, indicates how satisfied users are with the course. Table 2 below presents the raw dataset of AI courses, comprising 2238 rows and 12 columns extracted from Udemy. This information outlines the requirements for building the recommender system; in the following parts of the study, feature extraction, similarity computation, and model training are all based on this collected dataset.

4.2. Data Preprocessing

Data preparation is key to ensuring that the dataset is clean, structured, and ready for feature extraction and model training. The raw dataset contained missing values, duplicate entries, inconsistent text, and numerical errors. These issues were resolved using Python. The preprocessing steps employed in this research are illustrated in Figure 4 and described below.
1. Loading the dataset: the dataset was loaded using Python 3.11.3 (Python Software Foundation, Wilmington, DE, USA) and the Pandas library. This allowed the handling of course attributes such as name, description, instructor name, price, and rating, and prepared the dataset for further processing.
2. Data cleaning: several steps were implemented to ensure the dataset was consistent and accurate, including text normalization, removal of extra spaces and special characters, and standardization of numerical fields. These procedures focused on standardizing textual and numerical values to enable faster processing.
3. Handling missing values: while processing the dataset, we identified missing information, including instructor names and course descriptions, which required correction. If an instructor’s name was missing, the course was marked as “Unknown”; if a course description was unavailable, it was labeled as “No description available”. These corrections ensured dataset completeness and prevented issues caused by empty fields.
4. Removing duplicates: a few courses were duplicated because the scraping sessions were combined. Initially, Python’s drop_duplicates() function was used to remove duplicate values, but some courses still appeared twice because of minor differences in other columns. Since those courses shared the same URL, duplicates were then removed based on the course-URL column, ensuring that each course appeared only once and that the dataset was accurate and well-structured (a short code sketch follows this list).
5. Text preprocessing: natural language preprocessing was applied so that similarities between course titles and descriptions could be calculated. The techniques were the following:
  • Lemmatization and tokenization: course names were tokenized into individual words and lemmatized to reduce each word to its most fundamental form (e.g., “learning” → “learn”).
  • Stop word removal: universally present, non-descriptive words were eliminated to make the dataset more informative and efficient for machine learning models.
These preprocessing steps prepared the dataset for the content-based models, such as BERT embeddings and TF-IDF with cosine similarity, used to recommend courses effectively.
6. Final dataset and export: the cleaned dataset was exported and used for feature extraction and the development of the recommendation model.
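A minimal pandas sketch of steps 3, 4, and 6 is shown below; the file names are hypothetical and the column names are taken from Table 2:

import pandas as pd

# Load the merged scraping sessions (hypothetical file name).
df = pd.read_csv("udemy_ai_courses_raw.csv")

# Step 3: fill missing instructor names and course descriptions.
df["Course-Instructor"] = df["Course-Instructor"].fillna("Unknown")
df["Course-Caption"] = df["Course-Caption"].fillna("No description available")

# Step 4: drop exact duplicates first, then near-duplicates that share a URL.
df = df.drop_duplicates()
df = df.drop_duplicates(subset="Course-URL", keep="first")

# Step 6: export the cleaned dataset for feature extraction.
df.to_csv("udemy_ai_courses_clean.csv", index=False)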

4.3. Feature Extraction

Feature extraction is a crucial process in the AI Course Recommender System, transforming course names and course descriptions from unstructured data into structured numerical features. Three techniques were employed: TF-IDF vectorization for keyword-based relevance, BERT embeddings for deep semantic understanding, and cosine similarity to compare the resulting representations. A concatenation-based approach was used to construct training data for the Random Forest model.

4.3.1. Term Frequency–Inverse Document Frequency (TF-IDF)

Textual data about each course was transformed using the Term Frequency–Inverse Document Frequency (TF-IDF) technique to support content-based recommendations in the AI Course Recommender System. Course titles and descriptions were concatenated into a single textual representation so that both the main keywords and the contextual description were preserved. To improve the data, a preprocessing step normalized the text by converting it to lowercase and removing punctuation marks, non-alphanumeric characters, and stop words.
Subsequently, the processed text corpus was passed to Scikit-learn’s TfidfVectorizer with the following configuration parameters: ngram_range = (1, 2) to combine unigrams and bigrams for richer context; min_df = 2 and max_df = 0.9 to remove terms that were either too rare or too common; and stop_words = ‘english’ to remove non-informative tokens. By computing pairwise cosine similarity scores on the resulting sparse TF-IDF matrix, the system could compare the textual features of the courses and estimate how similar their content is, as sketched below.
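The sketch below reproduces this configuration; the combined_content column and the cleaned file name are assumptions carried over from the preprocessing sketch in Section 4.2:

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

df = pd.read_csv("udemy_ai_courses_clean.csv")  # hypothetical cleaned file
df["combined_content"] = df["Course-Name"] + " " + df["Course-Caption"]

vectorizer = TfidfVectorizer(
    ngram_range=(1, 2),    # unigrams and bigrams
    min_df=2,              # drop terms that are too rare
    max_df=0.9,            # drop terms that are too common
    stop_words="english",  # remove non-informative tokens
)
tfidf_matrix = vectorizer.fit_transform(df["combined_content"])

# Pairwise lexical similarity between all courses.
tfidf_similarity = cosine_similarity(tfidf_matrix)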

4.3.2. BERT Embeddings

Bidirectional Encoder Representations from Transformers (BERT) embeddings were used to encode the semantic relations implied by course descriptions. These contextualized vector representations go beyond simple keyword matching and provide a significantly better understanding of textual meaning; similar courses are retrieved by calculating the cosine similarity between their embeddings [25]. Cosine similarity on the TF-IDF vectors produced values in [0, 1], where 0 meant total dissimilarity and 1 meant maximal similarity. The cosine similarity measure was leveraged in a two-stage process: initially, to filter and rank candidate course recommendations based on textual relevance, and subsequently, to construct labeled training data for a Random Forest classifier, thereby enhancing the accuracy of the downstream recommendation model. Ground-truth labels were defined using a dual criterion: (1) high textual similarity between courses via TF-IDF and BERT embeddings, and (2) matching or closely related course classification and level. This ensures that recommendations are pedagogically aligned and not based solely on topic similarity.
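The following sketch illustrates how such labeled training pairs could be built from the dual criterion; the similarity threshold and the two-feature representation are illustrative assumptions rather than values reported by the authors:

import numpy as np

def build_training_pairs(tfidf_sim, bert_sim, levels, categories, threshold=0.6):
    """Create (features, label) pairs for the Random Forest classifier."""
    X, y = [], []
    n = tfidf_sim.shape[0]
    for i in range(n):
        for j in range(i + 1, n):
            # Features: lexical and semantic similarity of the course pair.
            X.append([tfidf_sim[i, j], bert_sim[i, j]])
            # Dual criterion: high textual similarity AND matching level/category.
            textually_close = min(tfidf_sim[i, j], bert_sim[i, j]) >= threshold
            aligned = levels[i] == levels[j] and categories[i] == categories[j]
            y.append(int(textually_close and aligned))
    return np.array(X), np.array(y)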

4.3.3. Fuzzy Matching for User Queries

To handle misspellings and inconsistencies in user queries, FuzzyWuzzy was used to match user inputs with existing course names, as sketched below.
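A minimal sketch with the FuzzyWuzzy package is shown below; the course list and the score threshold of 70 are illustrative choices:

from fuzzywuzzy import process

course_names = [
    "Python for Machine Learning",
    "Deep Learning A-Z",
    "AI For Everyone",
]

def match_course(user_query, names, min_score=70):
    # Return the closest course title, or None if nothing matches well enough.
    best_match, score = process.extractOne(user_query, names)
    return best_match if score >= min_score else None

print(match_course("pyton machin lerning", course_names))  # "Python for Machine Learning"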

4.4. Random Forest Model

To improve course recommendation accuracy beyond simple similarity estimations, a Random Forest classifier was integrated into the recommendation pipeline. It was tasked only with predicting whether two courses were similar, based on already-derived numerical features. The Random Forest classifier was applied with 100 decision trees, a maximum depth of 12, a minimum of 2 samples per split, and a minimum of 1 sample per leaf, with parameters optimized through a grid search on an internal validation set. Stratified 10-fold cross-validation was employed to ensure balanced class representation and minimize overfitting risks, and a random state of 42 was used for reproducibility. All hyperparameter settings for TF-IDF, BERT, and Random Forest are reported in Table 3 for transparency. The model was trained by fitting the classifier to the training data, and an evaluation was then conducted on the withheld test set to assess predictive performance and generalization to new data.
The TF-IDF parameters (ngram_range, min_df, max_df) were tuned via a grid search on a 20% validation split, balancing accuracy against computational cost. The Random Forest parameters (n_estimators, max_depth) were tuned using 10-fold cross-validation to limit overfitting without sacrificing efficiency. The BERT variant was chosen according to the trade-off between semantic-similarity accuracy and runtime; all-MiniLM-L6-v2 provided the best results in this use-case scenario.
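The sketch below mirrors this configuration and tuning procedure; the pairwise features and labels are replaced by synthetic stand-in data, and the grid values are illustrative:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic stand-in for the pairwise similarity features built in Section 4.3.
X, y = make_classification(n_samples=2000, n_features=4, random_state=42)

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
grid = GridSearchCV(
    RandomForestClassifier(min_samples_split=2, min_samples_leaf=1, random_state=42),
    param_grid={"n_estimators": [100, 200], "max_depth": [8, 12, None]},
    cv=cv,
    scoring="f1",
)
grid.fit(X, y)
model = grid.best_estimator_  # reported best setting: 100 trees, max_depth = 12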

4.5. Content-Based Filtering

Content-based filtering is among the most common recommendation approaches; it suggests items based on their intrinsic properties rather than user behavior [26]. In contrast to collaborative filtering, which is user-behavior-oriented, content-based filtering examines course attributes to identify similarities and is therefore particularly suitable for applications where user interaction data are limited, as shown in Figure 5.
Content-based filtering was employed to recommend AI courses based on their textual descriptions. The course-name and course-caption fields were merged into a single feature, referred to as combined content, so that both title keywords and description context are utilized during the recommendation process. Content-based filtering was chosen in this research for its scalability, flexibility, and precision in recommending courses.

4.6. Evaluation Metrics

The performance of the AI Course Recommender System is evaluated on various metrics to decide the effectiveness and accuracy of the course recommendations. These evaluation metrics are crucial for ensuring that the recommender system provides accurate and relevant course recommendations [19,27,28,29]. The following standard performance measures were used to quantify the performance of the recommender system:
  • Accuracy: The proportion of courses whose relevance (or non-relevance) was predicted correctly out of all courses considered. It provides an overall estimate of the system’s performance. The formula for accuracy is
Accuracy = (TP + TN) / (TP + TN + FP + FN)
  • Precision: The number of recommended courses that were relevant. Precision is the proportion of true positives to all the predicted positive cases. The formula is
Precision = TP / (TP + FP)
  • Recall: The proportion of relevant courses that were correctly recommended. It is the number of true positives divided by the total number of actually relevant courses. The formula is
Recall = TP / (TP + FN)
  • F1-Score: The harmonic mean of precision and recall, providing a trade-off between the two. It is most useful when both false positives and false negatives carry a cost. The formula is
F1-Score = 2 · (Precision · Recall) / (Precision + Recall)
  • Mean Squared Error (MSE): Computes the mean of the squared difference between the predicted and true relevance values of the courses. It is used to estimate the extent to which the recommendations deviate from the actual expected relevance. The formula is
MSE = (1/n) Σᵢ (y_pred,i − y_true,i)²
  • Mean Absolute Error (MAE): approximates the mean of the absolute error between the estimated and actual values of relevance.
MAE = (1/n) Σᵢ |y_pred,i − y_true,i|
  • Mean Relative Error (MRE): Expresses the difference between predicted and actual values relative to the actual value. It is useful when the size of the error should be judged relative to the true values rather than in absolute terms.
MRE = (1/n) Σᵢ |yᵢ − ŷᵢ| / yᵢ
Classification metrics (Precision, Recall, F1-Score) were used to assess correctness of recommendations, while regression metrics (MSE, MAE, MRE) were applied to evaluate similarity score prediction accuracy. Additionally, ranking metrics—Normalized Discounted Cumulative Gain (NDCG) and Mean Average Precision (MAP)—were incorporated to assess the quality of recommendation ordering, yielding NDCG = 0.947 and MAP = 0.912.
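For reference, the sketch below computes these metrics with scikit-learn and NumPy on placeholder arrays standing in for held-out relevance labels, predictions, and similarity scores:

import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, mean_squared_error, mean_absolute_error,
                             ndcg_score)

# Placeholder relevance labels/predictions and similarity scores.
y_true = np.array([1, 0, 1, 1, 0, 1])
y_pred = np.array([1, 0, 1, 0, 0, 1])
sim_true = np.array([0.90, 0.20, 0.80, 0.70, 0.10, 0.95])
sim_pred = np.array([0.85, 0.25, 0.70, 0.40, 0.15, 0.90])

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1-Score :", f1_score(y_true, y_pred))
print("MSE      :", mean_squared_error(sim_true, sim_pred))
print("MAE      :", mean_absolute_error(sim_true, sim_pred))
print("MRE      :", np.mean(np.abs(sim_true - sim_pred) / sim_true))
print("NDCG     :", ndcg_score([sim_true], [sim_pred]))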

4.7. Flask Web Application

To provide users with an interactive experience, the AI Course Recommender System was deployed as a web-based application using Flask, a lightweight Python web framework well-suited for seamless integration with machine learning models and frontend interfaces. The system architecture follows a three-tiered design: (1) the frontend, developed with HTML, CSS, and Bootstrap, which features a user-friendly input form that allows users to enter the name of a course of interest; (2) the Flask-based backend handles HTTP requests, manages the recommendation workflow, and serves the results; and (3) the model integration layer combines BERT embeddings, a pre-trained Random Forest classifier, and a TF-IDF-based similarity model to generate relevant course recommendations as shown in Figure 6. Upon receiving user input, the system employs fuzzy string matching (via Fuzzy Wuzzy) to correct potential misspellings and map the input to the closest matching course title. The recommendation engine then computes similarity scores and returns the top five most relevant courses, each accompanied by metadata such as instructor name, cost, rating, number of reviews, and a similarity percentage score, thereby enhancing the user’s decision-making process.
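A stripped-down sketch of such a Flask backend is shown below; the recommend_courses helper is a hypothetical placeholder for the fuzzy-matching and similarity pipeline described above, and the sketch returns JSON whereas the deployed system renders an HTML template:

from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical placeholder for the trained recommendation pipeline.
def recommend_courses(course_name, top_k=5):
    # In the deployed system this would query the TF-IDF/BERT similarity
    # matrices and the Random Forest classifier.
    return [{"course": course_name, "similarity": 1.0}] * top_k

@app.route("/recommend", methods=["POST"])
def recommend():
    query = request.form.get("course_name", "")
    # Fuzzy matching (Section 4.3.3) would map the raw query to a known title.
    return jsonify(recommend_courses(query, top_k=5))

if __name__ == "__main__":
    app.run(debug=True)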

5. Results and Discussion

This section presents the results of the AI Course Recommender System, including feature extraction, model training, and recommendation performance, and evaluates the system’s accuracy in terms of TF-IDF, BERT embeddings, and the Random Forest classifier. It also compares the different methods used for recommendations.

5.1. Results of the Feature Extraction and Representation

Features were extracted from course names and course descriptions so that the system can make accurate recommendations based on course content.

5.1.1. TF-IDF (Term Frequency-Inverse Document Frequency)

TF-IDF vectorization produced a matrix that represents the most important terms for each course. Commonly used terms, such as AI, machine learning, and Python, received high TF-IDF scores, indicating their relevance across the dataset. The most significant terms were retrieved and used to determine the relevance of individual courses to a given user query. Figure 7 illustrates the top 25 most significant terms extracted by the TF-IDF method; these terms indicate which keywords are most important for distinguishing course contents in the system and, thus, for its recommendations.

5.1.2. BERT (Bidirectional Encoder Representations from Transformers) Embeddings

BERT embeddings were generated using a pre-trained BERT model, resulting in dense vector representations for each course. These embeddings capture much deeper meanings and relationships between courses, even when different phrasings are used in their content. By using cosine similarity, we visualized the similarity between courses based on their BERT embeddings, which further enhanced the performance of our recommendation system. Figure 8 shows the visualization of cosine similarity between five courses using BERT embeddings. The heatmap reveals how courses with similar content are grouped, improving the recommendation system’s ability to suggest courses based on a deeper understanding of their content.

5.2. Model Training and Evaluation

The model was trained using a Random Forest classifier, utilizing TF-IDF and BERT embeddings to predict course relevance based on cosine similarity. Key performance criteria were the primary focus of the study, which evaluated the efficiency and accuracy of the recommendation system. As shown in Figure 9, the model achieves a 91.25% accuracy rate, demonstrating its ability to suggest relevant courses effectively.

Performance Analysis

Performance evaluation metrics of the proposed model are depicted in Figure 10, including precision, recall, F1-Score, Mean Squared Error (MSE), and Mean Relative Error (MRE). Together, these metrics evaluate regression reliability and classification accuracy, offering insight into the model’s performance. With a high precision of 96.63%, the system showed that most of the recommendations were indeed appropriate to the user’s input; in recommendation systems, where displaying irrelevant items is costly, this is crucial. The system recovered a significant percentage of all relevant courses, as indicated by a recall of 85.57%, although there is still room for improvement. A balanced performance between accuracy and completeness in course retrieval was demonstrated by the F1-Score, the harmonic mean of precision and recall, which was 90.77%. Since the recommender assesses the similarity between courses, MSE and MAE describe the overall deviation, while MRE relates the error to the magnitude of each similarity value, helping preserve the quality of the ranking. The MSE for the regression-based evaluation was 0.1012, indicating a minimal variance between the actual and displayed course similarity scores. In addition, the MRE of 0.6257 indicates possible areas for improvement in the estimation of similarity values.
This suggests that even though the model classifies relevant courses correctly, the accuracy of similarity score predictions could be further improved through tuning, particularly for use cases that require precise ranking. The hybrid model, combining BERT embeddings, TF-IDF similarity, and a Random Forest classifier, shows strong practical classification capability. These results highlight the system’s potential for use in actual educational platforms and validate its reliability in providing relevant course suggestions.

5.3. Deployment of the Flask Web Application

The Course Recommender System is implemented as a web-based Flask application that generates real-time recommendations and supports dynamic interaction with users. This section describes the visualization of recommendation results and model evaluation metrics. Upon receiving a course name or related keywords, the input is processed by the integrated TF-IDF and BERT models to produce course recommendations. The interface is then updated dynamically to display the results and the relevant model performance metrics, including course details such as the instructor, price, ratings, and similarity score. Figure 11 illustrates the application’s interface after a query is processed, presenting the results and associated performance metrics. This highlights the system’s responsiveness and its capability to provide users with instant recommendations.

5.4. Comparing the Proposed System with Other Recommenders

To better demonstrate the effectiveness of the AI Course Recommender System, we conducted comparisons with other existing systems designed for educational or course recommendation settings. All the models were trained on the same Udemy AI courses dataset and given identical preprocessing and test splits. Table 4 presents the results of Accuracy, F1-Score, Mean Squared Error, and the number of trainable parameters for each compared model. The baseline set has been expanded to include DeepFM, AutoLFA, and DECOR, ensuring a representative coverage of recent state-of-the-art hybrid approaches. All baselines were implemented and evaluated on the same dataset and preprocessing pipeline for fair comparison. Statistical significance testing (bootstrap and McNemar’s test) with 95% confidence intervals (CIs) was conducted to validate observed differences.
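A hedged sketch of this significance-testing procedure is given below; the prediction arrays are synthetic stand-ins for the proposed model and one baseline, not the actual evaluation outputs:

import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

rng = np.random.default_rng(42)
y_true = rng.integers(0, 2, size=400)
pred_a = np.where(rng.random(400) < 0.91, y_true, 1 - y_true)  # ~91% accurate
pred_b = np.where(rng.random(400) < 0.82, y_true, 1 - y_true)  # ~82% accurate

# Bootstrap 95% confidence interval for the accuracy of model A.
accs = []
for _ in range(2000):
    idx = rng.integers(0, len(y_true), size=len(y_true))
    accs.append(np.mean(pred_a[idx] == y_true[idx]))
print("bootstrap 95% CI:", np.percentile(accs, [2.5, 97.5]))

# McNemar's test on the 2x2 table of correct/incorrect predictions.
a_ok, b_ok = pred_a == y_true, pred_b == y_true
table = [[np.sum(a_ok & b_ok), np.sum(a_ok & ~b_ok)],
         [np.sum(~a_ok & b_ok), np.sum(~a_ok & ~b_ok)]]
print(mcnemar(table, exact=False))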
Table 4 shows that the proposed system achieved the highest values on the performance metrics, with an accuracy of 91.25%, an F1-Score of 90.77%, and the lowest MSE of 0.10. Compared with the systems in [6,8,30], the findings indicate that the suggested method not only enhances predictive correctness but also improves the balance between precision and recall and minimizes prediction error.
The hybrid proposal yields better results than traditional collaborative filtering, which relies heavily on users’ past actions, and it scales well to settings with a constant influx of new items and users, such as AI course platforms in daily use. Although DeepFM utilizes deep learning, it requires a substantial amount of interaction data to realize its full potential. Instead, our approach combines context-focused keyword relevance (TF-IDF) with BERT-generated semantic embeddings and Random Forest classification. As a result, it achieves both strong results and good generalization without requiring extensive user history.

5.5. Use Case Scenario

To demonstrate the practical use of the proposed system, a preliminary user study was carried out with five undergraduate students without prior knowledge of AI. Each participant entered a few keywords describing topics of interest (e.g., “deep learning”, “AI for beginners”, “Python for ML”) into the web-based system. The system returned relevant AI courses in seconds, and every user found at least one suitable course among the top five recommendations. Feedback was gathered via a brief Likert-scale form (1–5), asking students to rate ease of use, relevance of the recommendations, and the interface. A summary of the participants’ responses is shown in Table 5. The average satisfaction score was 4.6/5, with the majority of users finding the recommendations “very relevant” to their goals. These preliminary results indicate good usability and support deployment in academic advising systems or university course portals.

6. Conclusions and Future Works

We introduced a hybrid AI course recommendation model that uses machine learning algorithms to improve personalized course recommendation. The system integrates TF-IDF for lexical feature representation and BERT embeddings for deep contextual semantics, and classification is addressed using a Random Forest model. The proposed model attained 91.25% accuracy and 96.63% precision with an F1-Score of 90.77%, demonstrating its effectiveness in matching user input with relevant courses. A feature-rich web application was built with Flask, offering dynamic, interactive course recommendations in real time via an intuitive user interface. One of the main limitations of this work is the use of a single-source dataset extracted from Udemy, which might reduce the diversity of the data and the generalizability of the model. Training Random Forest classifiers on such data may lead to overfitting, which in turn may influence recommendations for unseen and broader topics. A further limitation concerns the user study, which was conducted with a small pilot group of five undergraduate participants. While this served as a preliminary usability test, future work should expand the study to a larger and more diverse set of users, enabling statistical validation and stronger generalizability of the findings. Future work could also enrich the dataset with courses from other online learning platforms, introduce regularization and cross-validation to improve the robustness of the models, or incorporate more sophisticated transformer-based models (e.g., fine-tuned BERT or GPT) to increase the adaptability of recommendations.

Author Contributions

Conceptualization, A.N.H.; methodology, A.N.H., S.K.A. and R.S.A.; software, A.N.H. and R.S.A.; validation, M.L.S., R.S.A. and S.K.A.; formal analysis, A.N.H. and S.K.A.; investigation, A.N.H. and S.K.A.; resources, S.K.A. and M.L.S.; data curation, S.K.A. and R.S.A.; writing— original draft preparation, A.N.H.; writing—review and editing, S.K.A., R.S.A. and M.L.S.; visualization, M.L.S. and R.S.A.; supervision, A.N.H.; project administration, A.N.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data supporting the findings of this study are available from the corresponding author upon reasonable request.

Acknowledgments

The authors thank the undergraduate students who participated in the pilot survey.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Gm, D.; Goudar, R.H.; Kulkarni, A.A.; Rathod, V.N.; Hukkeri, G.S. A digital recommendation system for personalized learning to enhance online education: A review. IEEE Access 2024, 12, 34019–34041. [Google Scholar] [CrossRef]
  2. Yurchenko, A.; Drushlyak, M.; Sapozhnykov, S.; Teplytska, A.; Koroliova, L.; Semenikhina, O. Using Online IT-Industry Courses in Computer Sciences Specialists’ Training. Int. J. Comput. Sci. Netw. Secur. 2021, 21, 97–104. [Google Scholar] [CrossRef]
  3. Madhavi, A.; Nagesh, A.; Govardhan, A. A study on E-Learning and recommendation system. Recent Adv. Comput. Sci. Commun. Former. Recent Pat. Comput. Sci. 2022, 15, 748–764. [Google Scholar] [CrossRef]
  4. Urdaneta-Ponte, M.C.; Mendez-Zorrilla, A.; Oleagordia-Ruiz, I. Recommendation systems for education: Systematic review. Electronics 2021, 10, 1611. [Google Scholar] [CrossRef]
  5. Algarni, S.; Sheldon, F. Systematic Review of Recommendation Systems for Course Selection. Mach. Learn. Knowl. Extr. 2023, 5, 560–596. [Google Scholar] [CrossRef]
  6. Hassan, R.H.; Hassan, M.T.; Sameem, M.S.I.; Rafique, M.A. Personality-Aware Course Recommender System Using Deep Learning for Technical and Vocational Education and Training. Information 2024, 15, 803. [Google Scholar] [CrossRef]
  7. Zhang, S.; Yao, L.; Sun, A.; Tay, Y. Deep learning based recommender system: A survey and new perspectives. ACM Comput. Surv. CSUR 2019, 52, 1–38. [Google Scholar] [CrossRef]
  8. Zhong, M.; Ding, R. Design of a personalized recommendation system for learning resources based on collaborative filtering. Int. J. Circuits Syst. Signal Process. 2022, 16, 122–131. [Google Scholar] [CrossRef]
  9. Guo, Q.; Zhuang, F.; Qin, C.; Zhu, H.; Xie, X.; Xiong, H.; He, Q. A survey on knowledge graph-based recommender systems. IEEE Trans. Knowl. Data Eng. 2020, 34, 3549–3568. [Google Scholar] [CrossRef]
  10. Burke, R. Hybrid Recommender Systems: Survey and Experiments. User Model User-Adap. Inter. 2002, 12, 331–370. [Google Scholar] [CrossRef]
  11. Monsalve-Pulido, J.; Aguilar, J.; Montoya, E.; Salazar, C. Autonomous recommender system architecture for virtual learning environments. Appl. Comput. Inform. 2024, 20, 69–88. [Google Scholar] [CrossRef]
  12. Chen, W.; Shen, Z.; Pan, Y.; Tan, K.; Wang, C. Applying machine learning algorithm to optimize personalized education recommendation system. J. Theory Pract. Eng. Sci. 2024, 4, 101–108. [Google Scholar] [CrossRef]
  13. Tian, Y.; Zheng, B.; Wang, Y.; Zhang, Y.; Wu, Q. College library personalized recommendation system based on hybrid recommendation algorithm. Procedia Cirp 2019, 83, 490–494. [Google Scholar] [CrossRef]
  14. Dai, Y.; Takami, K.; Flanagan, B.; Ogata, H. Beyond recommendation acceptance: Explanation’s learning effects in a math recommender system. Res. Pract. Technol. Enhanc. Learn. 2024, 19, 1–21. [Google Scholar] [CrossRef]
  15. Usman, A.; Roko, A.; Muhammad, A.B.; Almu, A. Enhancing personalized book recommender system. Int. J. Adv. Netw. Appl. 2022, 14, 5486–5492. [Google Scholar] [CrossRef]
  16. Li, Q.; Kim, J. A Deep Learning-Based Course Recommender System for Sustainable Development in Education. Appl. Sci. 2021, 11, 8993. [Google Scholar] [CrossRef]
  17. Guruge, D.B.; Kadel, R.; Halder, S.J. The State of the Art in Methodologies of Course Recommender Systems—A Review of Recent Research. Data 2021, 6, 18. [Google Scholar] [CrossRef]
  18. Lee, E.L.; Kuo, T.T.; Lin, S.D. A Collaborative Filtering-Based Two Stage Model with Item Dependency for Course Recommendation. In Proceedings of the 2017 IEEE International Conference on Data Science and Advanced Analytics (DSAA), Tokyo, Japan, 19–21 October 2017; pp. 496–503. [Google Scholar] [CrossRef]
  19. Ghatora, P.S.; Hosseini, S.E.; Pervez, S.; Iqbal, M.J.; Shaukat, N. Sentiment Analysis of Product Reviews Using Machine Learning and Pre-Trained LLM. Big Data Cogn. Comput. 2024, 8, 199. [Google Scholar] [CrossRef]
  20. Ramzan, B.; Bajwa, I.S.; Jamil, N.; Amin, R.U.; Ramzan, S.; Mirza, F.; Sarwar, N. An intelligent data analysis for recommendation systems using machine learning. Sci. Program. 2019, 2019, 5941096. [Google Scholar] [CrossRef]
  21. Thakkar, A.; Chaudhari, K. Predicting stock trend using an integrated term frequency–inverse document frequency-based feature weight matrix with neural networks. Appl. Soft Comput. 2020, 96, 106684. [Google Scholar] [CrossRef]
  22. Zalte, J.; Shah, H. Contextual classification of clinical records with bidirectional long short-term memory (Bi-LSTM) and bidirectional encoder representations from transformers (BERT) model. Comput. Intell. 2024, 40, e12692. [Google Scholar] [CrossRef]
  23. Selva Birunda, S.; Kanniga Devi, R. A Review on Word Embedding Techniques for Text Classification. In Innovative Data Communication Technologies and Application: Proceedings of ICIDCA 2020; Springer: Berlin/Heidelberg, Germany, 2021; pp. 267–281. [Google Scholar] [CrossRef]
  24. Parmar, A.; Katariya, R.; Patel, V. A Review on Random Forest: An Ensemble Classifier. In International Conference on Intelligent Data Communication Technologies and Internet of Things (ICICI) 2018; Springer: Berlin/Heidelberg, Germany, 2018; pp. 758–763. [Google Scholar] [CrossRef]
  25. Pawar, A.; Patil, P.; Hiwanj, R.; Kshatriya, A.; Chikmurge, D.; Barve, S. Language Model Embeddings to Improve Performance in Downstream Tasks. In Proceedings of the 2024 IEEE 16th International Conference on Computational Intelligence and Communication Networks (CICN), Indore, India, 22–23 December 2024; pp. 1097–1101. [Google Scholar]
  26. Javed, U.; Shaukat, K.; Hameed, I.A.; Iqbal, F.; Alam, T.M.; Luo, S. A review of content-based and context-based recommendation systems. Int. J. Emerg. Technol. Learn. 2021, 16, 274–306. [Google Scholar] [CrossRef]
  27. Sultan, L.R.; Abdulateef, S.K.; Shtayt, B.A. Prediction of student satisfaction on mobile learning by using fast learning network. Indones. J. Electr. Eng. Comput. Sci. 2022, 27, 488–495. [Google Scholar] [CrossRef]
  28. Kiran, R.; Kumar, P.; Bhasker, B. DNNRec: A novel deep learning based hybrid recommender system. Expert Syst. Appl. 2020, 144, 113054. [Google Scholar] [CrossRef]
  29. Shuwandy, M.L.; Alasad, Q.; Hammood, M.M.; Yass, A.A.; Abdulateef, S.K.; Alsharida, R.A.; Qaddoori, S.L.; Thalij, S.H.; Frman, M.; Kutaibani, A.H.; et al. A Robust Behavioral Biometrics Framework for Smartphone Authentication via Hybrid Machine Learning and TOPSIS. J. Cybersecur. Priv. 2025, 5, 20. [Google Scholar] [CrossRef]
  30. San, K.K.; Win, H.H.; Chaw, K.E.E. Enhancing Hybrid Course Recommendation with Weighted Voting Ensemble Learning. J. Future Artif. Intell. Technol. 2025, 1, 337–347. [Google Scholar] [CrossRef]
Figure 1. Random Forest classifier. The blue node represents the new input sample. Green nodes indicate the decision path and selected leaf nodes contributing to the final prediction. Grey nodes represent the remaining internal nodes not involved in the decision.
Figure 2. Steps of the methodology.
Figure 3. Distribution of courses across levels in the Udemy dataset.
Figure 4. Data Preprocessing Steps.
Figure 5. Content-based Recommender System.
Figure 6. Architecture of the Relevant Course Recommendation System.
Figure 7. Top 25 terms extracted via TF-IDF.
Figure 8. Course content similarity using BERT embeddings.
Figure 9. Model accuracy visualization.
Figure 10. Model performance metrics.
Figure 11. The application’s interface.
Table 1. Distribution of Courses by Level.
Classification Category | Number of Courses | Percentage (%)
Beginner | 1120 | 50.09%
Intermediate | 680 | 30.38%
All Levels | 320 | 14.30%
Advanced | 118 | 5.23%
Total | 2238 | 100%
Table 2. Raw Dataset of AI Courses.
No. | Course-Card-Image | Course-URL | Course-Name | Course-Caption | Course-Instructor | Course-Price | Course-Reviews | Course-Hours | Course-Lectures | Course-Level | Course-Rating | Course-Classification
1 | https://img-c.udemycdn.com/course/240x135/2284943_3b99_2.jpg (accessed on 2 March 2025) | https://www.udemy.com/course/ibm-watson-for-artificial-intelligence-cognitive-computing/?couponCode=MT260825G1 (accessed on 2 March 2025) | IBM Watson for Artificial Intelligence | Build smart, AI, and ML applications and so on | Packt Publishing | $69.99 | 82 | 15 total hours | 77 | Beginner | 3.4 | Cognitive Computing
2 | https://img-c.udemycdn.com/course/240x135/3978988_25d9_3.jpg (accessed on 2 March 2025) | https://www.udemy.com/course/cognitive-behavioural-therapy/?couponCode=MT260825G1 (accessed on 2 March 2025) | Cognitive Behavioral | Become a Certified Behavioral | Kain Ramsay | $79.99 | 35548 | 31.5 total hours | 121 | All Levels | 4.6 | Cognitive Computing
2237 | https://img-c.udemycdn.com/course/240x135/5282600_e201_2.jpg (accessed on 5 March 2025) | https://www.udemy.com/course/developing-implementing-employee-recognition-programs/?couponCode=MT260825G1 (accessed on 5 March 2025) | Mastering Employee | Design & Implement Effective Employee | GenMan Solutions | $19.99 | 106 | 2 total hours | 27 | All Levels | 4.2 | Speech Recognition
2238 | https://img-c.udemycdn.com/course/240x135/2881844_69a7_4.jpg (accessed on 13 March 2025) | https://www.udemy.com/course/sentiment-analysis-beginner-to-expert/?couponCode=MT260825G1 (accessed on 13 March 2025) | Sentiment Analysis | Sentiment Analysis | Taimoor khan | $49.99 | 92 | 8.5 total hours | 79 | All Levels | 4.2 | Speech Recognition
Table 3. Parameter Settings for TF-IDF, BERT, and Random Forest.
Component | Parameter | Value | Tuning
BERT | Model name | all-MiniLM-L6-v2 | Determined by the model architecture
BERT | Embedding size | 384 dimensions | Determined by the model architecture
TF-IDF | ngram_range | (1, 2) | Bigrams enhanced context
TF-IDF | min_df | 2 | Tested min_df ∈ {1, 2, 5}
TF-IDF | max_df | 0.9 | Tested max_df ∈ {0.85, 0.9, 0.95}
Random Forest | n_estimators | 100 | 100 balanced performance/speed
Random Forest | random_state | 42 | Ensures reproducibility
Random Forest | criterion | Gini impurity | Default
Random Forest | max_depth | None | The tree is allowed to grow fully
Table 4. A comparative analysis between the proposed model and the state-of-the-art based on different measures.
Study | Accuracy (95% CI) | F1-Score | MSE
Proposed system | 91.25 [88.79%, 93.95%] | 90.77 | 0.10
[6] | 91 [88.37%, 93.51%] | 79.4 | 0.18
[8] | 81.9 [78.48%, 85.65%] | 86.7 | 0.13
[30] | 51.1 [46.41%, 55.83%] | 45.7 | -
Table 5. Summary of questionnaire responses from pilot participants (1 = lowest, 5 = highest).
Participant | Ease of Use (Q1) | Relevance (Q2) | Satisfaction (Q3)
Student 1 | 5 | 5 | 5
Student 2 | 4 | 5 | 4
Student 3 | 5 | 4 | 4
Student 4 | 4 | 4 | 5
Student 5 | 5 | 5 | 5
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
