Design and Application of a Multi-Variant Expert System Using Apache Hadoop Framework

Movie recommender expert systems are valuable tools for providing recommendation services to users. However, existing movie recommenders are technically lacking in two areas: first, the available systems give only general recommendations; second, they use either quantitative data (likes, ratings, etc.) or qualitative data (polarity score, sentiment score, etc.), but not both, to produce movie recommendations. This paper presents a novel approach that not only provides topic-based (fiction, comedy, horror, etc.) movie recommendations but also uses both quantitative and qualitative data to achieve a true and relevant recommendation of a movie for a given topic. The approach relies on SentiWordNet and tf-idf similarity measures to calculate the polarity score from user reviews, which represents the qualitative aspect of the likeness of a movie. Similarly, three quantitative variables (likes, ratings, and votes) are used to obtain a final recommendation score. A fuzzy logic module decides the recommendation category based on this final recommendation score. The proposed approach uses a big data technology, Hadoop, to handle data diversity and heterogeneity efficiently. An Android application collaborates with a web-bot to use the recommendation services and show topic-based recommendations to users.


Introduction
Since the advent of web intelligence, artificial intelligence-based services, frameworks and products have become popular in the World Wide Web. One of the key services of such web intelligence applications is a recommendation system. In recent times, recommendation systems have become popular in the domain of movies, music, books, restaurants, garments, mobile applications and many other fields of life. Such recommendation systems filter huge amount of structured and unstructured data and predict the preference of a user that one would give to an item. In the last decade, a few movie recommendation systems have been presented using conventional methods [1][2][3]. However, previous movie recommender systems lack various features and/or accuracy of true recommendation. A majority of these recommender systems use quantitative variables (likes or ratings) and a few others use qualitative variables (polarity score, etc.) [4][5][6][7]. This paper proposes an intelligent and automated recommendation system that provides two-fold novelty. First, our recommender system uses a multi-variant popularity matrix to recommend a suitable movie to a user on the basis of both quantitative and qualitative variables to achieve true recommendations. Secondly, a fuzzy logic-based module provides the final recommendation of movies in a particular field of user's choice (such as comedy, action, horror, fiction, etc.), whereas the currently available systems give general recommendations.
In our multi-variant recommendation system, one of the challenges was opinion mining of users' reviews to calculate a polarity score that shows the degree to which a user likes or dislikes a movie [8][9][10][11][12]. Such polarity scores capture the qualitative aspect of a user's opinion about a particular movie. Another challenge was how to handle the diversity of heterogeneous data, as the presented approach uses both quantitative and qualitative data. To address this issue, a big data solution involving Hadoop was used in our approach because it handles data heterogeneity and diversity efficiently.
People today own smartphones and depend heavily on mobile applications. Recommendation systems therefore need to communicate with smartphone apps so that users can easily interact with the services and efficiently select the recommended items [9]. In this paper, a recommender system is accordingly coupled with an Android application and a web-bot offering open web services and merging movie data from linked data composed from different external resources.
Big data is defined by four dimensions represented by four V's (volume, variety, velocity, and veracity). Volume is represented by the amount of text data that we use to generate recommendations. Variety represents the different types of data extracted from different sources such as blogs, Facebook, and Twitter, as well as different review and opinion sites. Reviewers can write their reviews, remarks, and feedback in any format (structured, semi-structured, or unstructured), and all of these should be handled by the system. Velocity represents the speed of data generation on the internet. Veracity represents the trustworthiness of the data.
A multi-variant recommendation system can benefit from a NoSQL environment by reducing complexity and handling sparsity through factorization, ensuring scalability through an empowered server machine, and dealing with heterogeneity by using the Hadoop platform to handle big data issues [13][14][15].
Ontologies and linked data can serve as information sources for movie descriptions. They are available to Internet applications and are accessed through semantic queries over standard web technologies, such as URIs, RDF, HTTP and the semantic web. Linked data from Google Places, Trovacinema, Wikipedia and Netflix, or from linked movie databases (http://linkedmdb.org), are useful for recommender systems [16].
The work presented in [13] discusses a movie recommendation system that uses movie ratings to recommend movies only in a general category. However, this work seriously lacks accuracy of true recommendations. The reason for less accuracy in [13] is the use of only numeric data such as likes and ratings. Such numeric features only cover the quantitative aspect of the users' likeness. However, the qualitative aspect of likeness of users is totally ignored in this work, which makes the results of this work questionable. Here, it is important to mention that quantitative and qualitative aspects of likeness of users can provide us with true recommendations. The qualitative aspect of likeness can be achieved from text reviews that were not covered by the approach used in [13].
Moreover, this approach is tested on a single small dataset. Other issues with [13] are tabulated in Table 1. There is a need for a multi-variant approach that involves both quantitative and qualitative aspects to finalize a recommendation and achieve highly accurate results. Table 1 presents the differences between previous approaches and our multi-variant approach.
During the literature review of modern recommender systems, Hsieh's movie recommender system [13] was identified as the most relevant. This system uses the benefits of big data solutions and also provides a mobile app to interact with the recommender system. However, the key shortcoming of this work is the limited approach used to recommend movies. The major issues with this recommendation system are discussed and compared with our approach in Table 2.

Table 2. Difference to Hsieh's work [13].

| # | Hsieh's Work | Our Approach |
|---|---|---|
| 1 | It is a general recommendation system for movies. | Our approach supports topic-wise recommendation of movies such as drama, comedy, action, horror, etc. |
| 2 | This approach only uses quantitative data (ratings and likes) for recommendation, which provides less accuracy. | Our approach uses both quantitative (votes, likes, etc.) and qualitative data (polarity score) for true recommendation of movies. |
| 3 | This approach relies on simplistic calculation on the basis of similarity measures. No real decision-making approach is used, which makes the quality of the results questionable. | Our approach uses a fuzzy logic approach for better decision making and true recommendations of movies. |
| 4 | True likeness of users is not reflected by this approach. | Our approach reflects the true likeness of the users, as the qualitative aspect of likeness is also considered. |
| 5 | This approach is tested only on one limited dataset. | Our approach is tested on three large datasets. |
| 6 | This approach calculates recommendation on two quantitative variables. | Our approach calculates recommendation on three quantitative variables and one qualitative variable. |
The recommender systems field has made significant progress with many new techniques proposed and new systems developed. However, modern systems still require significant improvements to provide better recommendations. The major contribution to knowledge and novelty of the work is outlined below:
i. A topic (action, comedy, horror, etc.) based recommendation is supported.
ii. Multi-variant (ratings, votes, likes and polarity score) parameters are used.
iii. Both quantitative and qualitative data is used for movie recommendations.
iv. Three external data resources are used for datasets (Metacritics, IMDB, and Fandango).
v. A web-bot used to fetch the web contents collaborates with the server.
vi. Filters and integrates movie descriptions from linked data or ontology (linkedmdb).
vii. Recommender system is developed using NoSQL environment with Apache Hadoop.
viii. Fuzzy sets are established for movie ranking categorization.
ix. Front end App collaboration with the movie recommender through web services is supported.
The rest of this paper is organized as follows. Section 2 discusses related work, including recommender systems from different domains that address the inherent problems of processing users' data. Section 3 presents a multi-variant ranking model for a movie recommender using a mobile application and Apache Hadoop. Experiments and results, and the evaluation of the system, are discussed in Sections 4 and 5, respectively. The conclusion and future work are presented in Sections 6 and 7, respectively.

Related Work
Recommendation services typically rely on customer reviews or customer ratings, and such recommendations can provide a useful service for new customers. Emotional expressions, social interaction and behavior changes of users have been studied on Twitter, allowing management to distinguish clients who do or do not return [20]. Vox Civitas obtains responses from social media (e.g., Twitter), which can support journalistic investigations in more effective ways [21]. In [22], anomalies are removed to obtain clean data that reflect the United Kingdom's worst influenza. Detection of noise in text (tweets) from micro-blogs is discussed in [23].
Sentiment analysis approaches can be used to extract sentiments associated with positive or negative polarities for specific subjects from a document, instead of classifying the whole document as positive or negative [24]. An NLP-based methodology for sentiment evaluation of users' comments has been used to retrieve the most relevant YouTube videos. The process works in four steps. First, a review collection and preprocessing component extracts data (comments) from the particular YouTube video, and language preprocessing is undertaken to prepare for the next step. Second, the processed text goes through NLP-based methods to generate data sets. Third, the sentiment classifier (Sentistrength) is applied to the data sets to calculate the positivity and negativity ratings. Finally, the standard deviation is applied to obtain the rating result [25].
Feature-level sentiment analysis, which is based on the idea that an opinion consists of a sentiment (positive or negative) and a feature of a movie, is another approach. Each short comment is represented as a sequence of sentiment words and underlying states [9,12]. A linear regression model (LRM), a supervised machine learning technique, has been used to classify Twitter gossip (positive and negative) and predict the box-office revenue of different movies [8]. A neural network (NN) method has been introduced for sentiment classification of large movie reviews: recursive neural networks, which wrap the earlier sentence-level sentiment classification, are used for sentence-level analysis, and a recurrent neural network is used for whole-passage analysis to produce better results [26]. The vector space model (VSM) was used to implement the instance-based learning (IBL) classification method. Text documents were treated as vectors in IBL algorithms to identify the class (positive or negative review) of the document [27]. The sentiment is negative when the text includes negative words, such as "bad acting, stilted dialog", and positive when the text includes positive words, such as "it's funny". Instead of an exact rating, a suggestion is produced by classifying the sentiment (polarity) of the comments and then aggregating it into a rating score, which is used to select the recommended list of popular movies [11,17]. For example, the hotel management of Starwood Hotels and Resorts uses the strength of social media to stay connected with guests, to guide them, and to seek responses to the services provided [28].
Other related work includes typical recommender frameworks that build their computations on different fuzzy set theoretic similarity measures (the fuzzy set extensions of the Jaccard index, cosine, proximity or correlation similarity measures) and on aggregation techniques for computing recommendation confidence scores (the maximum-minimum or weighted-sum fuzzy set theoretic aggregation methods) [4]. A content-based ranking strategy builds a sentiment graph from the collocation of adjectives and uses the PageRank algorithm with a very small set of adjectives (such as 'good', 'excellent', etc.) to rank different movies using reviews of box office movies written by users of a popular movie review site [18]. With regard to the use of labels (tags) for movie recommendation, the German movie website Moviepilot uses viewers, movie ratings, and labels assigned to every movie. Labels are allotted by a group of moderators, and viewers are then able to rate how well the labels fit each movie [3]. Collaborative filtering was first applied to filtering information in Usenet news [29].
Music recommender systems provide personalized music recommendations, and Ringo Agent was one of the first such applications [30]. Content-based filtering recommends movies based on a comparison between user profile data and the content of movies. Content-based filtering is also called cognitive filtering. The recommendations are generated by matching users and movie content [4]. Collaborative filtering is also called social filtering. The fundamental rule behind collaborative filtering is that if a user liked a certain category of movie in the past, then they may like similar movies in the future. This information is used in deciding which movie to suggest [19,29,30]. Hybrid filtering is a combined technique of content filtering and collaborative filtering [31].
The previous work discussed above suggests that most of the approaches used for recommendation services, especially for movie recommendation, are uni-variant and use variables such as ratings, which tend to provide results with low accuracy. There are other variables, including likes, number of reviews, and the sentiment score of a review, that can help in achieving an accurate and efficient recommendation, and in this paper we aim to use these additional variables for the proposed movie recommendation service.

Multi-Variant Expert System
The proposed approach works on the fetched data (scores and reviews) from a set of movie websites and databases. The collected data is heterogeneous in nature, such as numeric data (for example, number of votes, number of likes and number of ratings) and text data (for example, user reviews of movies). A web crawler was developed to fetch structured and unstructured data and store the fetched data in a NoSQL database on a server machine for further processing.
The approach works in two parallel streams. In the first stream, the text data (such as movie reviews) is preprocessed using NLP modules, tf-idf algorithms and the SentiWordNet auxiliary database to identify polarity scores of terms (lexicons) in the form of negative and positive scores. All the movie reviews are processed to obtain an aggregate polarity score for each movie from each participating external data source. In the second stream, all the numeric scores and weights (rating, votes and likes) of the movies are normalized and combined to achieve a weighted aggregate of polarity scores. The result of a search query for movies is shown in the user interface of an Android app (see Figure A1). The user query is sent to the server, and the server processes the request by forwarding it to the web crawler. The web crawler module responds to the server's request by crawling the web for matching keywords (lexicons) and downloading the webpages to the server, and then the server processes the data to generate a recommendation, as shown in Figure 1.

NLP Module
Real-world data is generally incomplete and noisy, and is likely to contain irrelevant and redundant information or errors. By pre-processing, raw unstructured data can be converted into a structured, understandable form, as shown in Figure 2. Since real-world data can contain ambiguity and anomalies, it is necessary to remove these abnormalities before the actual analysis of the data. The data was pre-processed to remove anomalies, to map abbreviated language to standard English, and to remove reviews written in languages other than English [32][33][34].

Stemming (Lemmatization)
This is optional; most stemming uses the Porter Stemmer. English words like "look" can be inflected with a morphological suffix to produce "looks, looking, looked". These share the same stem, "look" [37,38].

Stop Word Removal
The most frequently used words do not convey much meaning, for example: "the, an, of, for, in ...". We used a small corpus-based library to exclude stop words from the input data. This library is developed in Java.
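As a rough illustration of this step, the sketch below filters stop words from a whitespace-tokenized review. It is written in Python for brevity, not in Java as the library described above, and the stop-word list is a small assumed sample (only "the, an, of, for, in" come from the text); the function name `remove_stop_words` is hypothetical.

```python
# Assumed sample list: only "the, an, of, for, in" appear in the text above;
# the remaining entries are added here purely for illustration.
STOP_WORDS = {"the", "an", "of", "for", "in", "a", "is", "so", "this"}


def remove_stop_words(text):
    """Lower-case the text, split it on whitespace, and drop stop words."""
    tokens = text.lower().split()
    return [token for token in tokens if token not in STOP_WORDS]


if __name__ == "__main__":
    print(remove_stop_words("This movie is so riddled"))
```

A production list would be much larger and would normally be loaded from a corpus file rather than hard-coded.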

POS-Tag Generation
The query was then analyzed and POS tags were generated for all the words in the query. Then, the resulting string of words and their relevant POS (Parts of Speech) tags is tokenized on the basis of spaces. For example, the English sentence "This movie is so riddled" is POS-tagged as [

Lexical Frequency Measuring
For this purpose, we applied a tf-idf frequency measure [40,41]. We first calculated the frequency of the valid/important terms, tf, which represents the number of times that term 't' occurs in the document (review) 'd', as given in Equation (1). After calculating tf, we calculated the idf (inverse document frequency) of the terms to obtain information about how rare or common each term is across the documents (reviews), using Equation (2):

$$\mathrm{idf}(t, D) = \log \frac{N}{d_t} \tag{2}$$

where $d \in D$, $t \in d$, and $N = |D|$ is the total number of documents in the corpus. The end results are then obtained by applying Equation (3).
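The measure can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: tf is taken as the raw term count, $d_t$ is taken to be the number of documents containing the term, the logarithm is natural (the text does not give a base), and the final score is assumed to be the standard product tf × idf. All function names are hypothetical.

```python
import math
from collections import Counter


def term_frequency(term, document):
    """Raw count of `term` in a tokenized document (review)."""
    return Counter(document)[term]


def inverse_document_frequency(term, corpus):
    """log(N / d_t), with d_t assumed to be the number of documents containing `term`."""
    n_docs = len(corpus)
    docs_with_term = sum(1 for document in corpus if term in document)
    if docs_with_term == 0:
        return 0.0  # assumption: a term absent from the corpus gets zero weight
    return math.log(n_docs / docs_with_term)


def tf_idf(term, document, corpus):
    """Assumed standard combination: tf multiplied by idf."""
    return term_frequency(term, document) * inverse_document_frequency(term, corpus)


# Toy corpus of three tokenized reviews (made-up data).
reviews = [
    ["funny", "movie", "funny"],
    ["bad", "acting"],
    ["stilted", "dialog", "movie"],
]

if __name__ == "__main__":
    print(tf_idf("funny", reviews[0], reviews))
    print(tf_idf("movie", reviews[0], reviews))
```

In this toy corpus "funny" occurs twice in one review only, so it is weighted more heavily than "movie", which appears in two of the three reviews.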

Polarity Identification
SentiWordNet 3.0 automatically annotates all WordNet 3.0 synsets according to their degrees of positivity, negativity and neutrality. In this step, the SentiWordNet score was used in the sentiment analysis of the documents (reviews) [42,43]. For this purpose, we applied Equation (4): the SentiWordNetScore (positive or negative) of the term and its frequency were computed to get the overall sentiment of the terms in the documents.
The SentiWordNetScore of each term for all reviews of the movie is calculated and the score (negative or positive) tells us how many terms are positively or negatively important in the review. Then, all the positive terms scores are added to obtain the positive term's weight, and also all the negative terms scores are combined to obtain the negative term's weight in a review. The polarities of all the reviews of the movies from each participating website are calculated as follows.

Polarity of a Term
By applying the sign function $P(t)$ to a term: if the SentiWordNetScore ($S_{t_i}$) of a term ($t_i$) of the review ($r_i$) of movie ($m_j$) is less than zero, then the term lies in the negative pole (nt); if it is greater than zero, then it lies in the positive pole (pt); and if it is equal to zero, then it lies in the neutral pole (tn), as shown in Equation (5).

Polarity of a Document (Reviews)
To calculate the polarity of a document (review) $r_i$, the positive terms $pt_{r_i}$ and negative terms $nt_{r_i}$ are aggregated for each document (review) from each participating website into positive_term$_{r_i}$ (x) and negative_term$_{r_i}$ (y), and their difference is then taken to find the polarity of each document (review) by applying the sign function $f(r)$, as shown in Equations (6) and (7), where $pt_{r_i}, nt_{r_i} \in r_i$ and $r_i \in m_j$. The polarity of a review is calculated as shown in Equations (8) and (9):

$$f(r_i) = \begin{cases} -p_{r_i}, & p_{r_i} < 0 \quad (\mathrm{nr} = \text{negative review}) \\ p_{r_i}, & p_{r_i} = 0 \quad (\mathrm{rn} = \text{neutral review}) \\ +p_{r_i}, & p_{r_i} > 0 \quad (\mathrm{pr} = \text{positive review}) \end{cases} \tag{9}$$

By applying the sign function $f(r)$ to each review: if the difference between the aggregated positive_term$_{r_i}$ (x) of review ($r_i$) of the movie ($m_j$) from website ($w_k$) and the aggregated negative_term$_{r_i}$ (y) is less than zero, then the review lies in the negative pole (nr); if it is greater than zero, then it lies in the positive pole (pr); and if it is equal to zero, then it lies in the neutral pole (rn).

Polarity of a Collection (Movie Reviews)
review_positive_score$_{m_j}$ (a) is the aggregated polarity score of the positive reviews pr, and review_negative_score$_{m_j}$ (b) is the aggregated polarity score of the negative reviews nr, of a particular movie $m_j$ from a particular website ($w_k$). These are used to calculate the polarity of the movie from the participating sites, as given in Equations (10) and (11), where $pr_i, nr_i \in m_j$ and $m_j \in w_k$. The polarity of a collection is calculated using Equation (12), where $g_{m_j}, h_{m_j} \in m_j$ and $m_j \in w_k$. Here $w_k$ is a movie website such as IMDB.
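The three levels of polarity (term, review, collection) can be sketched as follows. This is a hedged Python sketch rather than the authors' code: the term scores are made-up stand-ins for SentiWordNet scores, negative aggregates are stored as magnitudes so that the review polarity is positive minus negative, and the collection-level polarity is assumed to mirror the review-level difference because Equation (12) is not reproduced in the text. All function names are hypothetical.

```python
def term_pole(score):
    """Pole of a term from the sign of its score: nt, pt or tn."""
    if score < 0:
        return "nt"
    if score > 0:
        return "pt"
    return "tn"


def review_polarity(term_scores):
    """Aggregated positive term scores minus aggregated negative magnitudes."""
    positive_term = sum(score for score in term_scores if score > 0)
    negative_term = sum(-score for score in term_scores if score < 0)
    return positive_term - negative_term


def review_pole(polarity):
    """Pole of a review from the sign of its polarity: nr, pr or rn."""
    if polarity < 0:
        return "nr"
    if polarity > 0:
        return "pr"
    return "rn"


def movie_polarity(reviews):
    """Assumed collection-level polarity: positive review scores minus negative ones."""
    polarities = [review_polarity(scores) for scores in reviews]
    review_positive_score = sum(p for p in polarities if p > 0)
    review_negative_score = sum(-p for p in polarities if p < 0)
    return review_positive_score - review_negative_score


# Made-up term scores for two reviews of one movie from one website.
movie_reviews = [[0.5, -0.25, 0.0], [-0.75, 0.25]]

if __name__ == "__main__":
    for scores in movie_reviews:
        print(review_pole(review_polarity(scores)))
    print(movie_polarity(movie_reviews))
```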

Weighted Polarity Manipulation
Opinion mining determines the emotions (positive or negative) expressed in textual communication on social media, typically by simply extracting polarity scores from the review (number of stars, thumbs up/down, votes, etc.). In contrast, we used both the polarity score and the weight score (rating, votes and likes) of the movies. First, we computed the aggregated polarity score of each movie from each participating site, and then we averaged the aggregated polarity over the total number of reviews of the respective movie on that site. Next, the average polarity scores were aggregated. The total likes of the movie were then combined with weighted_average_polarity to find the aggregated_weighted_average_polarity. After that, the final score of the movie was rescaled to obtain the ranked score and category of the movie. In this computation, Equations (13)-(18) are used.
In Equation (13), $g_{m_j} \in m_j$ and $m_j \in w_k$. The weight of a movie is given by

$$\mathrm{weight}_{m_j} = \mathrm{votes}_{w_k} + \mathrm{rating}_{w_k} \tag{14}$$

where $\mathrm{weight}_{m_j} \in m_j$ and $m_j \in w_k$. The weighted average polarity is given by

$$\mathrm{weighted\_average\_polarity}_{m_j}(a) = \frac{g_{m_j}}{n} + \mathrm{weight}_{m_j} \tag{15}$$

where $n$ is the number of reviews of movie ($m_j$) from movie website ($w_k$).
Here $a_{w_k} \in m_j$, $m_j \in w_k$ and $w_k \in N$, where $N$ is the number of movie websites (Metacritic, IMDB and Fandango), each of which has a huge collection of material.
Here $R$ is the rescaled value of the normalized average aggregated score ($r_{m_j}$) of the movie ($m_j$) from the movie websites ($N$), used to rank the top five movies ($M$).
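Equations (14) and (15) can be illustrated with a short Python sketch. It covers only these two equations, since the remaining ones are not reproduced in the text, and it assumes that votes and rating have already been normalized to comparable scales and that Equation (15) reads as $g/n$ plus the weight. The function names and the sample numbers are hypothetical.

```python
def movie_weight(votes, rating):
    """Equation (14): weight of a movie on one website, votes plus rating."""
    return votes + rating


def weighted_average_polarity(aggregated_polarity, n_reviews, votes, rating):
    """Equation (15) as read here: average polarity per review plus the weight."""
    if n_reviews <= 0:
        raise ValueError("n_reviews must be positive")
    return aggregated_polarity / n_reviews + movie_weight(votes, rating)


if __name__ == "__main__":
    # Made-up values: aggregated polarity 12.0 over 4 reviews,
    # normalized votes 0.5 and normalized rating 3.5.
    print(weighted_average_polarity(12.0, 4, votes=0.5, rating=3.5))
```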

WordNet
The WordNet library was used in our approach to find synonyms and alternative forms of query terms, e.g., "weather" = {"weather report", "weather forecast", etc.}. Identification of synonyms in data helps in obtaining better and more accurate results.

Recommendation
The final recommendation is achieved by using the fuzzy logic approach on the following fuzzy set to evaluate the final score and find the category of the movie as follows.
Step 1 if final score ≥ 8 then Category: "A: Recommended"
Step 2 else if final score ≥ 6 then Category: "B: Top Recommended"
Step 3 else if final score ≥ 4 then Category: "C: Recommended Average"
Step 4 else if final score ≥ 2 then Category: "D: Least recommended"
Step 5 else Category: "F: Not recommended"
Figure 3 shows the final recommendations of the movies in a particular category (such as comedy, horror, fiction, etc.) in one of the five different classes. The user interface showing the output is discussed in Appendix A and Figure A1.
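The category rules listed above translate directly into a threshold function. The Python sketch below uses the thresholds and category labels exactly as given in the steps; it applies crisp cut-offs only, since no membership functions are specified in the text, and the function name `recommendation_category` is hypothetical.

```python
def recommendation_category(final_score):
    """Map a final score to one of the five categories listed in the steps."""
    if final_score >= 8:
        return "A: Recommended"
    if final_score >= 6:
        return "B: Top Recommended"
    if final_score >= 4:
        return "C: Recommended Average"
    if final_score >= 2:
        return "D: Least recommended"
    return "F: Not recommended"


if __name__ == "__main__":
    for score in (9.1, 6.0, 4.5, 2.2, 0.7):
        print(score, recommendation_category(score))
```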

NoSQL for Big Data Storage
The number of publicly available test corpora is quite limited, and the corpora are comparatively small with respect to the number of text documents they contain. Thus, producing adequately precise comparisons between reported performances is difficult. We therefore decided to build a new corpus, and for this purpose we used three different external data source websites to extract a large number of reviews, votes, rankings and likes. The data for our corpus was retrieved by a web-bot implemented in PHP. We wrote webpage scraping scripts (the web-bot) that extract movie URLs matching the user's queries: if the query keywords (lexicons) are matched, the crawler downloads the page to the NoSQL environment using Hadoop on a server machine; otherwise the page is discarded [13,14,47-52]. This procedure is depicted in Figure 4. The process of data extraction uses the following steps.
Step 1 Receive URLs for a movie type, i.e., comedy, horror, fiction, etc.
Step 2 Match the keywords from the query to the page
Step 3 If keywords matched Then
Step 4 Download the web page
Step 5 Send it for storage
Step 6 Else discard the page
Step 7 Repeat Steps 2 to 6 until all the matched web pages are found.
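The filtering logic of these steps can be sketched as below. The sketch is in Python rather than PHP, performs no network access (page texts are assumed to be already fetched), and treats a page as matched when any query keyword occurs in it, which is an assumption since the matching rule is not spelled out. The function names and the example.com URLs are hypothetical.

```python
def page_matches(page_text, keywords):
    """True if any query keyword occurs in the page text (case-insensitive)."""
    text = page_text.lower()
    return any(keyword.lower() in text for keyword in keywords)


def filter_pages(pages, keywords):
    """Keep matched pages for storage and discard the rest.

    `pages` maps a URL to the already-fetched page text.
    """
    stored = {}
    for url, text in pages.items():
        if page_matches(text, keywords):
            stored[url] = text  # would be sent to storage
        # unmatched pages are simply discarded
    return stored


if __name__ == "__main__":
    candidate_pages = {
        "https://example.com/a": "A comedy movie review with ratings and votes",
        "https://example.com/b": "Cooking recipes for the weekend",
    }
    print(list(filter_pages(candidate_pages, ["comedy"])))
```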

The web crawler (web-bot) downloads the matched webpages (crawled pages), from which it extracts further content (meta tags) such as movie reviews, ratings, votes and likes; irrelevant pages are discarded, as shown in Figure 4.
Computational processing can occur on data stored either in a file system (unstructured) or in a database (structured), as shown in Figure 5. Apache Hadoop is the de facto data operating system. It is an open-source software framework for processing big data on clusters of commodity hardware. A multi-variant web agent is implemented in the Hadoop environment to handle the big data generated by recommendation systems in order to improve scalability and efficiency. Figure 5 shows the interaction and computation in the Hadoop environment between the Android user app, the web-bot and the external participating sites for data. Each Mapper runs various algorithms, including porterStemmer(), tokenizer(), POStager(), polarityComputation(), weightedRanking(), Webcrawler(), etc. The hardware used in the implementation is discussed in Appendix B, and the specifications of the server machine and the Android device are presented in Table A1.

Experiment and Results
We used three repositories whose reviews are considered trustworthy, namely IMDB, Metacritic and Fandango, which contain all the required data (reviews and scores). These repositories provided data for 1000 of the most popular movies (those with a significant number of votes and ratings) released in 2016 and 2017, together with their reviews, as of 22 March 2017.
We computed the polarity score of the text data (reviews) from the corpus of movie reviews fetched from each participating external data source. This procedure consisted of data preprocessing, tf-idf classification and polarity identification using SentiWordNet to compute the polarity score of each term of each document from each participating data source. The following tables illustrate the fetched movie data after processing and evaluation.
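A minimal sketch of how a tf-idf weighted polarity score per review could be computed is given below. It is an illustration under stated assumptions, not the paper's exact formula: `TOY_LEXICON` is a tiny hand-made stand-in for SentiWordNet with invented scores, the smoothed idf is just one common variant, and the function names are introduced here for the example only.

```python
import math
import re
from collections import Counter
from typing import Dict, List

# Toy stand-in for SentiWordNet: term -> (positive score, negative score).
# These values are illustrative assumptions, not real SentiWordNet entries.
TOY_LEXICON = {
    "great": (0.75, 0.0),
    "funny": (0.5, 0.0),
    "boring": (0.0, 0.625),
}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_weights(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Per-document tf-idf weights using a smoothed idf (one common variant)."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return weights


def polarity_scores(reviews: List[str]) -> List[float]:
    """tf-idf weighted mean of (positive - negative) over lexicon terms.

    Each score lies in [-1, 1]; reviews without lexicon terms score 0.0.
    """
    docs = [tokenize(review) for review in reviews]
    scores = []
    for weights in tfidf_weights(docs):
        numerator = 0.0
        denominator = 0.0
        for term, weight in weights.items():
            if term in TOY_LEXICON:
                pos, neg = TOY_LEXICON[term]
                numerator += weight * (pos - neg)
                denominator += weight
        scores.append(numerator / denominator if denominator > 0 else 0.0)
    return scores


# Hypothetical reviews used only to exercise the sketch.
scores = polarity_scores([
    "A great and funny movie",
    "Boring plot, boring cast",
    "The plot",
])
```

With the real SentiWordNet, each term would additionally need stemming, part-of-speech tagging and sense selection, which this sketch leaves out.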
Table 3 lists the movie titles and their corresponding movie IDs. Table 4 lists the popular external data source sites and their corresponding site IDs for better formulation. Some computed values, such as the polarity scores of the already labeled movie reviews from the participating sites, are shown in Table 5. We then normalized the scores: the "likes", metascore and IMDB rating scores were normalized to a 0-5 scale, because Metacritic and IMDB user ratings are out of ten stars whereas Fandango ratings are out of five stars, so the Metacritic and IMDB ratings were rescaled to five stars. These normalized values are given in Table 6. Metacritic_votes, IMDB_votes and Fandango_votes are the numbers of votes cast by users for a particular movie on the respective movie websites and are presented in Table 7.
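The rescaling to a five-star scale is a simple linear mapping, sketched below. The function name `to_five_star` and the sample ratings are assumptions made for illustration; they are not values from the paper's tables.

```python
def to_five_star(score: float, scale_max: float) -> float:
    """Linearly rescale a rating given on [0, scale_max] to [0, 5]."""
    if scale_max <= 0:
        raise ValueError("scale_max must be positive")
    if not 0 <= score <= scale_max:
        raise ValueError("score must lie within [0, scale_max]")
    return 5.0 * score / scale_max


# Hypothetical ratings for one movie on the three sites.
imdb_rating = to_five_star(7.8, 10)        # ten-star user rating
metacritic_rating = to_five_star(6.4, 10)  # ten-star user rating
fandango_rating = to_five_star(4.5, 5)     # already on five stars
```

Passing the source scale explicitly keeps the same function usable for any site, whatever its native rating range.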
After calculating and aggregating the polarity scores from each participating movie site, we averaged the polarity scores over the total number of movies and then computed the weighted average polarity by adding the weights (normalized ranking and votes) to each average polarity. Likes may also reflect the behavior of users, which affects the movie rating; that is why we included likes in our model to present the multi-variant approach. We added Facebook likes to obtain the aggregated weighted average polarity for better recommendations. The classified scores are shown in Table 8. These multi-variants (votes, ranking and likes) are computed according to our model, which ranks the movies from the corpora of different movie data; the final score and category are presented in Table 9.
The multi-variant score of the m1 movie "Avengers: Age of Ultron" is greater than six, which is why it is categorized as "B"; the score of the m2 movie "Cinderella" is greater than four, so it lies in category "C"; the scores of m3 "Ant-Man" and m4 "Do You Believe?" are greater than six, so these are categorized as "B"; and the score of the m5 movie "Hot Tub Time Machine 2" is greater than four, so it also lies in category "C". Figure 6 shows the final ranking score of movies in a particular category such as fiction, horror, drama, etc. Figure 7 shows the rating scores of a particular movie from the three movie websites. We compared the normalized rating scores among these sites and observed that the Fandango rating is so high that it is not significant individually. In Figure 8, the votes for movies also represent the users' interest in movies on the different participating websites, which indicates a huge difference if only one site is selected. One site is not adequate for a ranking approach, which is the reason we selected multi-variants from different sites. Figure 9 shows the differences in the weighted average polarity scores obtained by computing the multi-variant to show the categories.
Time complexity was computed, and the running time on different machines was also observed. The worst-case time complexity of our approach is O(n), because n datasets are processed and each subsequent operation takes one unit of time, i.e., each of these operations is O(1). The observed computation times are presented in Table 10.

Evaluation
Recall is defined as the number of relevant movies retrieved by a search divided by the total number of existing relevant movies, while precision is defined as the number of relevant movies retrieved by a search divided by the total number of movies retrieved by the search. Precision is the proportion of recommendations that are good recommendations,

$$\text{Precision} = \frac{tp}{tp + fp} \qquad (19)$$

and recall is the proportion of good recommendations that appear in the top recommendations,

$$\text{Recall} = \frac{tp}{tp + fn} \qquad (20)$$

where
tp: the movie is predicted to be interesting and the prediction is correct; it really is interesting.
tn: the movie is predicted to be uninteresting and the prediction is correct; it really is uninteresting.
fp: the movie is predicted to be interesting but the prediction is wrong; it is actually uninteresting.
fn: the movie is predicted to be uninteresting but the prediction is wrong; it is actually interesting.

In the recommendation domain, a perfect precision score of 1.0 means that every movie recommended in the list was good (although this says nothing about whether all good recommendations were suggested), whereas a perfect recall score of 1.0 means that all good movies were suggested in the list. Typically, when a recommender system is tuned to increase precision, recall decreases as a result (or vice versa).

$$\text{F-Score} = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} \qquad (21)$$

Table 11 shows some outcomes of our recommendation system. In Table 12 and Figure 10, for recommendations in this domain, a single value is obtained by combining both the precision and recall measures; it indicates the overall utility of the recommendation list. One thousand movies were used as exemplary data sets. Evaluation is an important part of the recommendation engine building process, because it can be used to empirically discover improvements to a recommendation algorithm.
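Equations (19)-(21) can be checked on a small example with the sketch below. The function name `precision_recall_f` and the movie IDs are hypothetical and serve only to illustrate the arithmetic; returning 0.0 when a denominator is zero is a convention chosen here, not something stated in the paper.

```python
from typing import Set, Tuple


def precision_recall_f(recommended: Set[str],
                       relevant: Set[str]) -> Tuple[float, float, float]:
    """Compute precision, recall and F-score for one recommendation list."""
    tp = len(recommended & relevant)   # recommended and actually interesting
    fp = len(recommended - relevant)   # recommended but not interesting
    fn = len(relevant - recommended)   # interesting but not recommended
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Hypothetical example: four movies recommended, three actually relevant.
p, r, f = precision_recall_f({"m1", "m2", "m3", "m4"}, {"m1", "m2", "m5"})
```

Here tp = 2, fp = 2 and fn = 1, so precision is 2/4, recall is 2/3 and the F-score is 4/7.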
This research used the MovieLens 1K dataset. There are 943 users and 1000 movies; we used the 1000 ratings, votes, likes and views from the users on the films to test the performance of the proposed method. The f-measure results distinguish the accuracy of our work from that of others: with the multi-variant system, an accuracy of about 98.6% was obtained.

Conclusions
This paper presented an intelligent and automated recommender system that provides accurate, topic-based (action, comedy, horror, etc.) recommendations of movies to users. The approach relies on both quantitative and qualitative data to achieve authentic recommendations. The quantitative data includes ratings, votes, likes, etc., and the qualitative data is the polarity score calculated from user reviews using NLP and opinion mining techniques. The developed application was tested on three external data sources, namely Metacritic, IMDB and Fandango, and achieved better results in terms of true recommendations than previous approaches. The presented recommender system was developed using a NoSQL environment with Apache Hadoop to filter and integrate movie descriptions from linked data or ontology (linkedmdb). Our approach used fuzzy logic for movie ranking categorization. A front-end application was designed in Android for the interaction of a user with the movie recommender through web services. When a user searches for a movie through the mobile app, the server responds with a recommended list of movies. Thus, users can make a decision before watching a movie, which saves viewing time and conserves other important resources such as money and energy.

Future Work
Further work is required to enhance the system for both registered and unregistered viewers or users of apps by adding more parameters, such as showbiz industry influence, movie quality, movie trends and a user's profile-based movie recommendation system in a NoSQL distributed environment. Semantic and sentiment computation are required to find the semantic relation between the movies and users as well as the psychological influence of movies.

Funding: This research received no external funding.

Conflicts of Interest:
The authors declare no conflict of interest.