Toward Social Media Content Recommendation Integrated with Data Science and Machine Learning Approach for E-Learners

Electronic learning (e-learning) has been very successful and was recently estimated to be a billion-dollar industry. E-learning users acquire knowledge from the diversified content available in an application through innovative means. Many e-learning software packages are available, for example, Learning Management Systems (LMSs) such as Moodle. We reviewed the functionality of this software and found that learners have particular difficulty getting relevant recommendations. For example, there may be essential discussions about a particular topic on social networks such as Twitter, but those discussions are not linked and recommended to learners so that they can follow the latest technology news related to their learning context. This is the focus of the current project, which is based on the symmetry between user and project specification. The developed project recommends relevant symmetric articles to e-learners from the Twitter social network and the DBLP academic platform. For recommendations, a reinforcement learning model with optimization is employed, which uses the learners' local context, the learners' profiles available in the e-learning system, and the learners' historical views. The system recommends relevant tweets, popular relevant Twitter users, and research papers from DBLP. To match the local context, profile, and history against the tweet text, we recognized that the terms in the e-learning system need to be expanded to cover a wide range of concepts. However, this diversification should not include irrelevant terms. To expand the terms of the local context, profile, and history, the software uses the Grow-bag dataset, which builds concept graphs of large-scale Computer Science topics based on the co-occurrence scores of Computer Science terms.
This application demonstrates the need for, and the success of, e-learning software that is linked with social media and sends recommendations for the content being studied by e-learners in the e-learning environment. However, the current application focuses only on the Computer Science domain. Such applications need to be generalized to other domains in the future.


Introduction
A recommendation system (RS) generates possible options for users based on their interests [1]. The proposed recommendation system is based on information that the user gave to the system in the past. This information may include many ratings, which indicate that the user aims to get information from a particular domain, e.g., a research area, documents, or tweets. Depending on the recommendation system architecture, this information can be used as training data for either supervised or unsupervised learning, e.g., clustering or document classification problems [2,3].
In recent years, recommendation systems have become popular engines on many websites for finding users' preferences. The two main techniques behind these systems are content-based recommendation (CB) and collaborative filtering recommendation (CF) [4][5][6]. Of the two, CF is the more frequently used. A CB system recommends options similar to the user's earlier choices and extracts the target from the user's input information, which is accessible in the user profile and the output profile [7].
DBLP, the Digital Bibliography & Library Project, indexes more than 2 million research publications. A dump of the website is freely available for download. This dataset helps to identify top-rated articles based on the learner's profile, context, and history. Twitter is a microblogging website where the social community generates a large number of short text messages of up to 140 characters. It is noted that more than 300 million tweets are generated daily by the social community, expressing opinions and sentiments about different things. The proposed system identifies the top-rated and most popular Twitter users relevant to the learner's profile. To do so, it performs network analysis and identifies the most popular Twitter users using social network analysis techniques, such as measuring in-degree centrality. The centrality of a node describes its popularity. These users are matched with the current learner activity. The problem addressed in this work is that e-learners do not receive relevant recommendations from social networks based on their context, historical data, and profiles.
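The in-degree centrality mentioned above can be illustrated with a minimal sketch. The edge format `(follower, followed)`, the user names, and the function name `in_degree_centrality` are assumptions made for illustration; the paper does not specify how the follower network is stored, and duplicate edges are assumed to have been removed.

```python
from collections import Counter

def in_degree_centrality(follow_edges):
    """Return the normalized in-degree centrality of every node.

    follow_edges: iterable of (follower, followed) pairs (hypothetical format).
    A node's score is its number of incoming edges divided by (n - 1).
    """
    nodes = set()
    in_degree = Counter()
    for follower, followed in follow_edges:
        nodes.update((follower, followed))
        in_degree[followed] += 1
    denom = max(len(nodes) - 1, 1)  # avoid division by zero for tiny graphs
    return {node: in_degree[node] / denom for node in nodes}

# Toy follower network: three users follow "bob", and "bob" follows "ann".
edges = [("ann", "bob"), ("cat", "bob"), ("dan", "bob"), ("bob", "ann")]
centrality = in_degree_centrality(edges)

# The two most popular users by in-degree centrality.
top_users = sorted(centrality, key=centrality.get, reverse=True)[:2]
```

In this toy network "bob" receives the highest score because every other user follows him, which matches the intuition that centrality describes popularity.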
The developed system provides recommendations from the social network for e-learners. The recommendations are based on the users' local context, profiles, and historical data. The system is a web application that searches for the required data on the Web using different sources, such as the Twitter user network and DBLP. It recommends top-ranked information from these sources depending on the context, profile, and history of e-learners. The application also provides an interface where e-learners can share their content and read relevant content shared by other users. A reinforcement learning algorithm serves as the recommendation platform in the proposed system. Reinforcement learning is a learning approach that works on the basis of user feedback. The quality of the system improves based on the recommendation ratings. Social media content is unstructured and lacks trust. Under the main architecture of reinforcement learning, when the user gets a good recommendation, the feedback is good and the reward sent to the system is positive. In this case, the system learns the user's request, and the recommendations are trained accordingly. Similarly, if the user feedback is not good, the system learns to avoid recommending bad articles. Historically, recommendation systems arose as a solution to the problem of information overload in Internet sources. The number of uploaded files and Internet sources is very high, and users do not know how to manage them, so they spend more time and energy looking for and extracting the topics they need. According to Eugene's search results, the information overload issue was discovered in information retrieval systems and was reported by Moors in 1950 [8].
The main contributions of the proposed recommendation system for e-learners are summarized below:
• A real-time system that provides top-ranked Twitter user networks to e-learners according to their context, history and profiles.
• The proposed system recommends top-ranked articles from DBLP according to the e-learner's context, history, and profile.
• The system also makes recommendations to e-learners from a local database.
• The main objective of this study is the use of data mining and machine learning approaches for social media content recommendation.
In this work, we propose a social media content recommendation system that provides learning material to e-learners. The developed system is a real-time application that identifies the required data on the Web using different sources, such as Twitter and DBLP. The system recommends top-ranked information from Twitter and DBLP depending on the context, profile, and history of e-learners. The application also provides research articles related to the topic searched by the user. Moreover, we use data mining and machine learning approaches to improve the accuracy of social media content recommendation. Reinforcement learning is used as the machine learning algorithm and is combined with data mining techniques to extract hidden knowledge from users' tweets. Finally, we illustrate the usefulness of reinforcement learning applied to the prediction and recommendation of social media content. The remainder of this paper is organized as follows: Section 2 reviews the literature on recommendation systems. Section 3 explains the data analysis for social media content recommendation. Section 4 presents the predictive analysis of the Twitter and DBLP datasets using a reinforcement learning algorithm. Section 5 presents the prediction results for the Twitter and DBLP platforms and, finally, Section 6 concludes the paper.

Literature Review
In this section, we discuss the pros and cons of existing recommendation systems [9]. We also investigate the state-of-the-art approaches for Twitter recommendation, DBLP recommendation and recommendation based on reinforcement learning.

Recommendation System
A recommendation system is a service that helps users easily access what they request in different areas [10,11]. Tan and He [12] presented a physical resonance procedure known as resonance similarity (RES). Based on user evaluations, this approach shows superior prediction compared with traditional similarity measures. Similarly, there are many IoT-based platforms, such as healthcare [13][14][15], indoor localization [16,17], and many other IoT systems [18][19][20][21][22], which could be improved by integrating the functionality of a recommendation system [23,24]. Hwang et al. [25] analyzed hotel reviews for a hotel management system based on TripAdvisor review information and a Latent Dirichlet Allocation (LDA) semantic-based process to recognize and capture the performance of the Term Frequency-Inverse Document Frequency (TF-IDF) process. In the presented approach, all features related to hotels are extracted. The final results show that the LDA has less precision than word-based LDA. To make the system more accurate, Jannach et al. [26] presented regression-based and item-based recommendation. Recommendation systems developed by different researchers have used collaborative filtering techniques and algorithms [27][28][29][30][31][32][33]. Collaborative filtering obtains information from the knowledge that users provide and evaluates the relationships between different users to make specific deductions about feature spaces.

Twitter Recommendation
Twitter is a social media platform for sharing and uploading user opinions and for providing information about new studies, interests, etc. [34][35][36]. Many research articles address Twitter classification for various goals. Some tweet recommendation systems treat Twitter as a reliable information spreader. The recommendation of tweets and of Twitter users is also a main research direction in this area [37]. According to the methodology proposed in [38], there are three main measures of user influence on the Twitter platform: followers, re-tweets and PageRank. A follower recommendation system looks at the differences between tweets and the user profile, while PageRank or TweetRank estimates user influence to find the similarity between a shared link and the user profile structure. All the methods proposed in previous studies relied on content-based recommendation to propose tweets without reflecting a joint view. Tweet recommendation targets the user by using a latent factor, collaborative ranking and specific features. The re-tweets that interest a user are collected and evaluated to establish the user's preferences and make a recommendation. The latent factor is an improved version of collaborative ranking for the ranking criterion and is used as a parameter to increase the accuracy of the system [39].

DBLP Recommendation
DBLP is an online, open reference for published articles in the computer science area. By visiting the DBLP website, users can easily access recently published articles or any specific article they need. DBLP grew from small experiments on web servers into a well-known open-data service for the computer science research area [40]. A critical part of DBLP recommendation is that, based on what users search for, articles related to their own interests are also recommended. The user searching process is explained step by step in Figure 1. Articles contain complete information about authors, publication time, access pages, etc. [41].

Reinforcement Learning Recommendation
Recommendation based on reinforcement learning (RL) is formulated as a Markov Decision Process (MDP). This model targets long-run performance. Most RL-based systems are challenged by large-scale discrete action spaces. Some systems have been proposed to solve this issue, for example by exploiting prior information about the actions to generate a proto-action and then applying a k-nearest neighbor search. This method discards the dimensions with negative influence, which the user does not care about, and replaces them with suitable actions [42][43][44][45][46][47][48][49]. Moreover, an MDP is used to model the recommendation process in RL. Compared with the Multi-Armed Bandit (MAB) based system, MDP cannot obtain the running frequency of reward. These approaches try to define the state as an n-gram, or model the items in the MDP, and define the action as the recommendation between items. This process cannot be applied to large datasets: as the candidate item set grows, the state space grows with it, the transition data become sparse, and the data can only be applied to the related parameters in a specific state [50][51][52][53][54]. Table 1 compares various recommendation systems along with their objectives and advantages. Ten recommendation systems are covered in that table.

Social Media Content Recommendation for E-Learners
The proposed recommendation system comprises two main modules: a recommendation system and the predictive analysis of social media content recommendation.

E-Learners Recommendation System
The proposed recommendation system comprises three parts, the presentation layer, the business layer and the physical layer, which are shown in Figure 2. The presentation layer is responsible for exposing the services to the front end through the user interface. The business layer represents the core functionality of the recommendation system and is divided into two modules: the e-learning system and the reinforcement learning-based social media content recommendation system. The e-learning system is responsible for providing relevant recommendations from social media to the e-learner. The e-learner can get top-ranked articles from the Twitter user network according to the user's interests. The reinforcement learning-based social media content recommendation system uses data mining and machine learning approaches to improve the accuracy of social media content recommendation. Reinforcement learning is used as the machine learning algorithm and is combined with data mining techniques to extract hidden knowledge from users' tweets. Lastly, the physical layer represents the back-end database, which is responsible for storing the data.
Data collection is one of the primary tasks in the knowledge discovery process, which identifies hidden patterns in an enormous amount of data. We perform knowledge discovery by identifying user profiles from the DBLP and Twitter websites. We targeted published articles and uploaded tweets for our experiments, and the data crawling process was customized accordingly. Table 2 shows detailed information on the collected Twitter and DBLP datasets. Figure 3 shows the class diagram of the data collection process. In total, two open-access social media websites were selected, which contain comments, tweets, short texts and research articles. A total of 70% of the dataset was used as the training set and 30% as the test set.
The primary block diagram of the proposed system is shown in Figure 4. The block diagram of the proposed data and predictive analysis model for the Twitter and DBLP platforms is composed of four main sections. The first section is the data collection layer, which contains the datasets of two social media platforms, Twitter and the DBLP library. The dataset collected from Twitter includes tweets, projects, comments, photos, news and conferences. The dataset collected from DBLP includes articles, publication access point, publication time and publication date. The second section is the data pre-processing section, in which two analysis techniques, data analysis and predictive analysis, are applied to the input dataset. The data analysis comprises time series analysis, statistical analysis, tweet analysis and article analysis. The predictive analysis comprises reinforcement learning prediction techniques. The next section is the recommendation layer, which presents the output of the previous steps and the relevant recommendation results based on user preferences. The final section is user feedback, which is the main means by which the system improves the quality of its recommendations. Because a reinforcement learning algorithm is used as the recommendation technique, the system learns from the user's positive and negative responses to the agent and, by repeating this process, improves the recommendation and trust quality of the system.
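The feedback loop described above can be sketched as a simple bandit-style learner. This is a deliberately simplified stand-in, not the reinforcement learning model used in the paper: the class name `FeedbackRecommender`, the reward values of +1 and -1, and the ε-greedy selection rule are all assumptions chosen for illustration.

```python
import random

class FeedbackRecommender:
    """Keeps a running average reward per item and recommends greedily.

    Hypothetical simplification: each candidate item is treated as an
    independent arm, and feedback is a numeric reward (+1 good, -1 bad).
    """

    def __init__(self, items, epsilon=0.1, seed=0):
        self.values = {item: 0.0 for item in items}  # estimated reward
        self.counts = {item: 0 for item in items}    # feedback received
        self.epsilon = epsilon                       # exploration rate
        self.rng = random.Random(seed)

    def recommend(self):
        # Explore with probability epsilon, otherwise exploit the best item.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.values))
        return max(self.values, key=self.values.get)

    def update(self, item, reward):
        # Incremental mean: new = old + (reward - old) / n
        self.counts[item] += 1
        n = self.counts[item]
        self.values[item] += (reward - self.values[item]) / n

agent = FeedbackRecommender(["tweet_a", "tweet_b", "paper_c"], epsilon=0.0)
agent.update("tweet_a", -1.0)  # learner marked the item as irrelevant
agent.update("paper_c", 1.0)   # learner marked the item as relevant
best = agent.recommend()
```

With exploration switched off (`epsilon=0.0`), the agent recommends the item with the best feedback so far and avoids the item that received a negative response.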

Dataset
In this system, the dataset was collected from "Twitter social media platform records (Twitter API)" and "DBLP research library history" in order to analyze and explore hidden information for improving the recommendation system. Data mining approaches and techniques were applied to the dataset to clean and pre-process it and to improve its performance and stability. Moreover, the following steps were performed to provide a better service for e-learners on the social media platform. After the social media dataset had been organized, data pre-processing was applied to normalize the dataset and keep only the necessary information. Data normalization was needed to change the form and structure of the data so that it is convenient for the subsequent steps. Table 3 presents the information and features extracted from the dataset.

Data Mining and Visualization
In the proposed system, we applied data mining techniques to extract the necessary and useful information from the dataset for suitable recommendations based on user interest. The analysis below exploits the collected dataset. Time series analysis is applied to produce new information for article recommendation. The analysis is based on the date and time at which information was shared, both of which are available in the dataset. The time series analysis covers information crawled from the mentioned platforms during 2019. To start the analysis, the data were segregated into two granularities (monthly and daily) to produce the Twitter and DBLP frequencies. Figures 5 and 6 present the daily and monthly recommendations to e-learners. The daily records show the total user activity in one day, and the monthly records show the total user activity in one month.
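The daily and monthly segregation can be sketched as a simple frequency count. The timestamp format `YYYY-MM-DD HH:MM:SS`, the sample records, and the function name `activity_frequency` are assumptions for illustration; the actual schema of the crawled data is not given in the text.

```python
from collections import Counter
from datetime import datetime

def activity_frequency(timestamps):
    """Count records per day and per month.

    timestamps: iterable of strings in 'YYYY-MM-DD HH:MM:SS' form
    (an assumed format for the crawled sharing date and time).
    """
    daily = Counter()
    monthly = Counter()
    for ts in timestamps:
        moment = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        daily[moment.strftime("%Y-%m-%d")] += 1
        monthly[moment.strftime("%Y-%m")] += 1
    return daily, monthly

# Hypothetical 2019 records.
records = [
    "2019-03-01 09:15:00",
    "2019-03-01 18:40:00",
    "2019-03-02 11:05:00",
    "2019-04-10 08:00:00",
]
daily, monthly = activity_frequency(records)
```

Each counter maps a day or month label to the number of recorded activities, which is the quantity that a daily or monthly frequency plot would display.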

Twitter API Analysis Based on Profile Address
In this part, the Twitter API dataset is analyzed by profile address. To visualize the Twitter API data by profile address, street names are extracted from the profiles and applied as location labels to visualize the Twitter frequency. The parameters used as inputs to this visualization include the profile address and the tweet topics. Figure 7 shows the minimum and maximum updates by profile address. The addresses shown were randomly selected from the dataset, and each location represents one area based on the profile addresses.

Discover Patterns and Features
Using data mining techniques improves the extraction of hidden information from the generated results. Table 4 presents the details of the features extracted from the dataset. The prediction process in the proposed system improves system performance and, likewise, recommends highly relevant information to the e-learner.

Interaction Model for the Proposed Recommendation Platform
The workflow of the proposed RL recommendation model is illustrated in Figure 8, which covers the technical infrastructure of the system. Tweets are fetched from Twitter using the Twitter API (tweet environment). Whenever a user uploads an article to the system, the system removes the noise from the title of the article. Afterwards, the system builds a list of words containing the title words and the terms related to them; Grow-bag provides the related terms. For each word in the list, tweets are fetched and saved in the database. When an e-learner requests recommendations from Twitter for an article, the system builds two lists of words. The first list contains the e-learner's interests and the terms related to them. The second list contains the e-learner's local context and the terms related to it. The system retrieves all the tweets that were saved in the database for that article and matches the sub-strings of each tweet with the words of both lists. Whenever a word matches a tweet, one point is added to the score of that tweet, and the top-scoring tweet is recommended to the user.
Figure 8. System work-flow of the proposed recommendation platform.
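The matching steps in this workflow can be sketched as follows. The stop-word list, the `related_terms` dictionary (a hand-written stand-in for the Grow-bag concept graph), and all function names are hypothetical; only the overall procedure of cleaning the title, expanding terms, and adding one point per matching word follows the description above.

```python
import re

# Assumed noise words; the paper does not list what counts as noise.
STOP_WORDS = {"a", "an", "the", "of", "for", "and", "in", "on", "with", "to"}

def clean_title(title):
    """Lower-case the title and drop punctuation and stop words."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return [w for w in words if w not in STOP_WORDS]

def expand_terms(words, related_terms):
    """Append the related terms of each word, keeping order and uniqueness."""
    expanded = list(words)
    for word in words:
        for term in related_terms.get(word, []):
            if term not in expanded:
                expanded.append(term)
    return expanded

def match_score(tweet, keyword_lists):
    """Add one point for every keyword that occurs as a sub-string."""
    text = tweet.lower()
    return sum(1 for keywords in keyword_lists for kw in keywords if kw in text)

def top_tweet(tweets, keyword_lists):
    return max(tweets, key=lambda t: match_score(t, keyword_lists))

# Hypothetical stand-in for related terms supplied by Grow-bag.
related = {"clustering": ["k-means"], "learning": ["neural network"]}
interest = expand_terms(clean_title("Machine Learning"), related)
context = expand_terms(clean_title("An Introduction to Clustering"), related)

tweets = [
    "New k-means clustering tutorial for machine learning",
    "Great coffee this morning",
    "Deep learning news",
]
best = top_tweet(tweets, [interest, context])
```

Plain sub-string matching is used here because the text describes matching "sub-strings of each tweet"; a production system would probably need token-aware matching to avoid accidental hits inside longer words.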
Tweets are ranked using Equation (1).
$K_{LC}$ is the list of keywords from the local context, $K_{UH}$ is the list of keywords from the user history, and $K_{UP}$ is the list of keywords from the user profile. $W_1$, $W_2$ and $W_3$ are constants assigned according to the importance of each parameter. $RS_i$ is the relevancy similarity of tweet $i$.
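Since Equation (1) itself is not reproduced here, the following is only one plausible reading of the symbols defined above: a weighted sum of keyword match counts, with $W_1$ paired with the local context, $W_2$ with the user history, and $W_3$ with the user profile. That pairing, the weight values, and the function names are assumptions, not the authors' formula.

```python
def keyword_matches(tweet, keywords):
    """Number of keywords that occur as sub-strings of the tweet."""
    text = tweet.lower()
    return sum(1 for kw in keywords if kw.lower() in text)

def relevancy_score(tweet, k_lc, k_uh, k_up, w1=0.5, w2=0.2, w3=0.3):
    """Assumed form: RS_i = W1*m(K_LC) + W2*m(K_UH) + W3*m(K_UP).

    The default weights are placeholders, not values from the paper.
    """
    return (w1 * keyword_matches(tweet, k_lc)
            + w2 * keyword_matches(tweet, k_uh)
            + w3 * keyword_matches(tweet, k_up))

def rank_tweets(tweets, k_lc, k_uh, k_up, **weights):
    """Sort tweets from most to least relevant."""
    return sorted(
        tweets,
        key=lambda t: relevancy_score(t, k_lc, k_uh, k_up, **weights),
        reverse=True,
    )

k_lc = ["reinforcement learning"]  # local context keywords (hypothetical)
k_uh = ["python"]                  # user history keywords (hypothetical)
k_up = ["recommender"]             # user profile keywords (hypothetical)

tweets = [
    "Python tips",
    "Reinforcement learning for recommender systems",
    "Reinforcement learning in Python",
]
ranked = rank_tweets(tweets, k_lc, k_uh, k_up)
```

Giving the local context the largest placeholder weight mirrors the evaluation finding reported later in the paper that context carries more weight than profile and history.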

Predictive Analysis of Twitter and DBLP Data Using Reinforcement Learning
The availability of a considerable number of digital articles and tweets poses a challenge for discovering highly relevant content for e-learners. Current search approaches have inherent problems and use a limited set of parameters for searching the metadata, mainly the indexed keywords only. This is not enough for users, who are often frustrated by the huge number of results returned for a query. There is a need for a system, especially for the e-learner community, that can provide online real-time information from social media to the e-learner. E-learners can get top-ranked articles from the Twitter user network according to their interests. A social network recommendation system for e-learners provides learning material to them. The system is a web application that searches for the required data on the Web using different sources, such as the Twitter user network and DBLP. It recommends top-ranked information from these sources depending on the context, profile, and history of e-learners. The application also provides research articles related to the topic searched by the user. The system inputs are the usage and viewing history of users, the user profiles built by the users, and the users' local context. Based on this information, the system finds research articles from DBLP and Twitter users from the Twitter microblogging website.
This section presents the predictive analysis of the knowledge and details generated in the previous sections. The tweet and article recommendation process for e-learners is shown in Figure 9. Reinforcement learning is the machine learning technique applied in the proposed system for recommending tweets and articles to e-learners. The predictive analysis process is divided into three main sections. The first section contains the input data collected from social media platforms. The data provide information on the e-learner's IP, article title, access page, e-learner preference, e-learner click information, access date, article ID, access day, article category, article type and access time. In the second section, before training, pre-processing, feature engineering, data transformation and feature selection are applied to prepare the dataset for further processing. In the third section, after the data are split into training and test sets, a reinforcement learning technique is applied to recommend information based on user preferences.
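The split into training and test sets, using the 70%/30% ratio reported for this dataset, can be sketched as below. The random shuffle, the fixed seed, and the record fields are assumptions for illustration; the text does not say whether the split was random or chronological.

```python
import random

def train_test_split(records, train_fraction=0.7, seed=42):
    """Shuffle a copy of the records and cut it at the training fraction."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical click records with two of the listed fields.
records = [{"article_id": i, "clicked": i % 2} for i in range(10)]
train, test = train_test_split(records)
```

Every record ends up in exactly one of the two sets, so evaluation on the test set is not contaminated by training examples.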

Reinforcement Learning Optimization
A reinforcement learning recommendation system offers various methods for optimizing user interest. In this process, user interest is optimized directly by using FeedRec [65] through a simulation process. To do this, we need to find the "ground truth" of the system that yields the maximum user engagement. On this basis, the algorithm shows whether the optimal policy and the maximized user engagement delay can be obtained. The simulation procedure defines $S(z_t, i_t; \beta_z)$ and trains it with mini-batch SGD on the predicted dataset. This dataset is prepared based on the prediction policy $\pi_b$ and is used directly as the simulator. To account for the prediction policy $\pi_b$, the weighted loss is minimized as in Equations (2) and (3). $N$ is the total number of directions in the predicted dataset. To reduce the disparity, $v_{0:t}$ is defined as the significant ratio between $\pi$ and $\pi_b$, where $\pi$ is the policy extracted from the Q-network, e.g., $\epsilon$-greedy. $\Psi$ denotes the cross-entropy loss function. $c$ is a hyper-parameter that prevents the ratio from becoming too large. The multi-task loss function $\delta_t(\beta_z)$ is defined for evaluating the comparison between the regression loss and the multi-task loss, and $\lambda$ is a hyper-parameter that controls the various tasks. As $\beta_z$ is updated, the policy $\pi$ extracted from the Q-network changes continuously. To keep the standard policies, as well as $\pi$, adaptive, the S network is also updated to ensure optimal accuracy. Previous research mentioned diversity as an effective means of improving user satisfaction with recommendations. Similarly, it is an indirect way to optimize user engagement. With the FeedRec framework described above, user engagement can be optimized directly through diversity in various ways. To generate the simulation data, two types of relationship between user engagement and the list of recommendations are defined.
1. Linear Style: In this style, user satisfaction has a linear relationship with entropy, so the most satisfying results have higher entropy. In this case, the user gets more information and also uses the system more often. The probability that the user keeps using the system and searching for articles is defined in Equation (4).
The article recommendation list is defined as $(\varphi_1, \ldots, \varphi_n)$, and the entropy term is defined as $x\alpha(\varphi_1, \ldots, \varphi_n)$, where $x$ and $y$ take values in the range $[0, 1]$.
2. Quadratic Style: In this style, the highest user satisfaction is produced by moderate entropy. The probability that the user keeps using the system and searching for articles is defined in Equation (5).
The above evaluations show the relationship between the user and the system agent. The output of this process shows that FeedRec is able to fit different types of distribution between the entropy of the recommendation list and user engagement.
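The two styles can be illustrated with a small sketch. Equations (4) and (5) are not reproduced in the text, so the functional forms below are assumptions: a clipped linear function `x * entropy + y` for the linear style and a bell-shaped function peaking at a moderate entropy `mu` for the quadratic style. The entropy is computed over hypothetical item categories and normalized to the range [0, 1]; the parameter values are placeholders.

```python
import math
from collections import Counter

def normalized_entropy(categories):
    """Shannon entropy of the category mix in a list, scaled to [0, 1]."""
    total = len(categories)
    if total <= 1:
        return 0.0
    counts = Counter(categories)
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(total)  # log(total) is the maximum entropy

def stay_probability_linear(entropy, x=0.8, y=0.1):
    """Assumed linear style: higher entropy gives higher engagement."""
    return min(1.0, max(0.0, x * entropy + y))

def stay_probability_quadratic(entropy, mu=0.5, sigma=0.1):
    """Assumed quadratic style: engagement peaks at moderate entropy mu."""
    return math.exp(-((entropy - mu) ** 2) / sigma)

uniform_list = ["ml", "ml", "ml", "ml"]   # no diversity
diverse_list = ["ml", "db", "nlp", "ir"]  # maximal diversity

low = normalized_entropy(uniform_list)
high = normalized_entropy(diverse_list)
```

Under the linear assumption the diverse list yields the higher stay probability, whereas under the quadratic assumption a list with entropy near `mu` would score best and both extremes are penalized.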

Prediction Result of Twitter and DBLP Platform
In this section, the development environment, prediction results and implementation process of the proposed recommendation system for e-learners are presented in detail.

Experimental Environment and Setup
This section presents the implementation environment of the proposed model. Table 5 summarizes the experimental setup. All experiments were carried out on an Intel(R) Core(TM) i7-8700 CPU @ 3.20 GHz with 32 GB of memory. The reinforcement learning technique was used for the recommendation system. The development tool was Jupyter Notebook, and the system was implemented in Python using the WinPython 3.6.2 distribution.

Performance Evaluation
The selected users are bachelor's, master's and PhD students. The tweets ranked by the system are given to the users for evaluation. For each query, nine tweets are given to the users, three from each of the following categories: context-based recommendations, profile-based recommendations, and history-based recommendations. In this evaluation, a form is provided to each user. The evaluation form consists of the user's personal information, a scenario, a keyword, and the ranked tweets relevant to that keyword. The user reads the scenario and checks the relevance of the tweets to the keyword. The user reads the first tweet and marks it as relevant if it is relevant to the given keyword, or as irrelevant otherwise. The user repeats the same steps for all tweets. According to the results, the context has a higher weight than the profile and the history. The weights are calculated using Equations (6)-(8).

$$\mathrm{Context} = \frac{\mathrm{Context}}{\mathrm{Context} + \mathrm{History} + \mathrm{Profile}} \tag{6}$$

$$\mathrm{Profile} = \frac{\mathrm{Profile}}{\mathrm{Context} + \mathrm{History} + \mathrm{Profile}} \tag{7}$$

$$\mathrm{History} = \frac{\mathrm{History}}{\mathrm{Context} + \mathrm{History} + \mathrm{Profile}} \tag{8}$$

Performance evaluation describes the formal procedure for estimating the performance of the model. To quantify the performance of our model, we applied three statistical evaluation measures: Mean Square Error (MSE), Mean Absolute Error (MAE) and Root Mean Square Error (RMSE).

• Mean Square Error: This statistical measure is the average of the squared differences between the predicted values and the actual values, as given in Equation (9).
• Mean Absolute Error: This statistical measure is the average of the absolute differences between the predicted values and the actual values, as given in Equation (10).
• Root Mean Square Error: This statistical measure is the square root of the mean square error and expresses the size of the error in the units of the target value, as given in Equation (11).
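The three measures can be computed with a few lines of Python using their standard definitions. The sample values below are invented purely to show the arithmetic and are not results from the paper.

```python
import math

def mean_squared_error(actual, predicted):
    """Average of the squared differences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def mean_absolute_error(actual, predicted):
    """Average of the absolute differences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def root_mean_squared_error(actual, predicted):
    """Square root of the MSE, in the same units as the target."""
    return math.sqrt(mean_squared_error(actual, predicted))

# Invented example values: the errors are 1, 0, -2 and 0.
actual = [3.0, 5.0, 2.0, 4.0]
predicted = [2.0, 5.0, 4.0, 4.0]

mse = mean_squared_error(actual, predicted)        # (1 + 0 + 4 + 0) / 4
mae = mean_absolute_error(actual, predicted)       # (1 + 0 + 2 + 0) / 4
rmse = root_mean_squared_error(actual, predicted)  # sqrt(mse)
```

Because the errors are squared, MSE and RMSE penalize the single large error of 2 more heavily than MAE does.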

Prediction Results
The prediction results are the output of the experiments based on the above-mentioned machine learning algorithms. The output contains the top 10 related values based on e-learner requests and activities. All experiments and machine learning algorithms were implemented in the WinPython programming environment. Figure 10 presents the efficiency of the models.

Recommendation Results
Following the proposed recommendation process for social media content for e-learners, the reinforcement learning recommendation reports the system output in terms of state, reward, loss and model frequency. Table 6 presents the system response time in detail. The response time comprises three timings: loading time, searching time and execution time. Loading time is the time it takes to load the web page for the user. Searching time is the time it takes the user to search for specific contents, and execution time is the time it takes to show the final search result.
The main interface is shown in Figure 11. It gives two options, "Sign in" and "Sign up", and the user selects the appropriate one. When a user selects "Sign up", the user is redirected to the sign-up page shown in (2), which presents the registration process. The user fills in the personal information and, once the "Save" button is selected, the form is sent to the server and the user account is created. The profile information of users is shown in (3). It gives multiple options, such as "Edit", "Choose File", "View Profile" and "My Articles". Selecting the "Edit" option allows users to update their profile information. Selecting the "My Articles" option shows the articles uploaded by the user, as shown in (4), together with multiple options such as "View", "DBLP", "View Profile", "My Articles" and "Upload Articles". Selecting the "View" option redirects to the same page, where the user can read the selected article. When the user selects the "DBLP" option, it gives multiple options, such as "Search (Button)", "DBLP", "View Profile", "My Articles" and "Upload Articles". The user needs to fill the text box with the appropriate article name. When the user selects "Search (Button)", they are redirected to (9), where the user can see the recommendations from DBLP.
When the user selects the "Upload Articles" option, they are redirected to (6), where they can click on the "Upload Articles" button. It gives multiple options, such as "Save (Button)", "Choose File", "DBLP", "View Profile", "My Articles" and "Upload Articles". In (6), the user needs to fill in the article information to upload articles to the system. Once the user selects the "Save" button, the form is sent to the server, and the articles are uploaded. When the user selects the "Search Articles" option, the system redirects to (5), where they can search for articles in the system. The articles matching the user query are shown in (8), together with multiple options such as "View", "DBLP", "View Profile", "My Articles" and "Upload Articles". When the user selects "View", they are redirected to the same page, where the user can read the selected article. The recommendation from DBLP is shown in (9).

Comparison and Baseline
Several algorithms are compared to assess the performance of the proposed recommendation system. Taking the future reward into account (DDQN) improves the RL recommendation result over DN. Similarly, DBGD is applied as an exploration strategy, together with ε-greedy (EG), to reduce the system loss. Figure 12 shows the system performance in detail. In total, ten techniques are compared: LR, FM, W&D, LinUCB, HLinUCB, DN, DDQN, DDQN+U, DDQN+U+EG and DDQN+U+DBGD. Of these, DDQN+U+DBGD achieves the highest score. Applying EG to DDQN+U did little to improve the accuracy of the system. Table 7 presents the diversity of user clicks, measured using cosine similarity. A smaller value indicates better diversity.
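The diversity measure can be illustrated with a small sketch. It assumes that each clicked item is represented by a numeric feature vector and that diversity is taken as the mean pairwise cosine similarity among a user's clicked items, so a lower value means the clicks are less alike. The function names and the toy vectors are hypothetical, and the exact formula behind Table 7 may differ.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)


def mean_pairwise_similarity(vectors):
    """Average cosine similarity over all unordered pairs of items.

    Lower values indicate a more diverse set of clicked items.
    """
    pairs = list(combinations(vectors, 2))
    if not pairs:
        raise ValueError("need at least two items to measure diversity")
    return sum(cosine_similarity(a, b) for a, b in pairs) / len(pairs)


# Toy feature vectors (illustrative only, not data from the paper).
similar_clicks = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
varied_clicks = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

print(mean_pairwise_similarity(similar_clicks))
print(mean_pairwise_similarity(varied_clicks))
```

In this toy case the identical clicks score higher than the varied clicks, which matches the reading that a smaller value corresponds to better diversity.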
Similarly, some baseline methods (e.g., HLinUCB) achieve comparable recommendation diversity, which demonstrates that UCB can also produce sensible results. Figure 13 shows the preferences based on the system rewards. The reported reward is based on the recommended articles and the total number of available articles. Figure 1 shows the comparison with other studies related to our topic. In this figure, we compare the presented result with four recent research articles on recommendation systems; the proposed system, with an F-measure of 88%, achieves the better result. The compared studies are by Verma et al. [66], Zhang et al. [67], Hsieh et al. [68] and Liu et al. [69].
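For reference, the F-measure is the harmonic mean of precision and recall. The sketch below assumes the balanced F1 variant by default and computes it from counts of true positives, false positives and false negatives. The function name and the example counts are hypothetical and are not the data behind the reported 88%.

```python
def f_measure(true_pos, false_pos, false_neg, beta=1.0):
    """F-measure from raw counts; beta=1.0 gives the balanced F1 score."""
    if true_pos == 0:
        return 0.0  # no relevant item was recommended
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    beta_sq = beta * beta
    return (1.0 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Illustrative counts: 8 relevant recommendations, 2 irrelevant ones,
# and 2 relevant articles that were not recommended.
print(f_measure(true_pos=8, false_pos=2, false_neg=2))
```

With these counts, precision and recall are both 0.8, so the F-measure is also 0.8.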

Conclusions and Future Work
In this paper, we presented a reinforcement learning framework for customized online recommendation of Twitter content and DBLP articles. The main differences between the proposed method and other methods are the efficient modeling of articles, comments and user features, and a design that explicitly aims to reach a high reward. From user clicks on URLs and the user's search process, the system obtains additional information in the form of user feedback. Similarly, the effective exploration strategy used in this framework increases recommendation diversity and yields recommendations with higher reward. Experimental results suggest that the proposed system achieves higher accuracy and recommendation diversity and can be applied to other recommendation systems as well. System quality control and trust rely on user ratings and feedback, which is the central concept of reinforcement learning. In the future, we plan to develop offline recommendation evaluation and to derive other types of methods from the proposed framework.