Methodology for Analyzing the Performance of Traditional Algorithms on User Reviews Using Machine Learning Techniques

Android-based applications are used by people almost everywhere around the globe. With the Internet available nearly everywhere at little or no charge, almost half of the world's population is engaged in social networking, social media surfing, messaging, browsing and plugins. In the Google Play Store, one of the most popular application stores on the Internet, users can download thousands of applications and various types of software. In this research study, we scraped thousands of user reviews and ratings of different applications: 148 applications from 14 different categories, for a total of 506,259 reviews accumulated and assessed. Based on the semantics of each review, the reviews were classified as negative, positive or neutral. Different machine-learning algorithms, such as logistic regression, random forest and naïve Bayes, were tuned and tested. We also evaluated the outcomes of term frequency (TF) and inverse document frequency (IDF) features, measured parameters such as accuracy, precision, recall and F1 score (F1), and present the results in the form of bar graphs. Comparing the outcome of each algorithm, we found that logistic regression is the best algorithm for review analysis of the Google Play Store from an accuracy perspective; we were furthermore able to demonstrate that logistic regression is better in terms of speed, accuracy, recall and F1 score. This conclusion was reached after preprocessing the data values in these datasets.


Introduction
In an information era, a large amount of data must be processed every day, every minute and every second, and there is huge demand for computers with processing speeds high enough to deliver accurate results within nanoseconds. It is estimated that approximately 2.5 quintillion bytes of data are generated, manually or automatically, on a daily basis using different tools and applications. This illustrates the importance of text-mining techniques for handling and classifying data in a meaningful way. A variety of applications help classify a string or a text, such as those used to detect user sentiment in comments or tweets and those that classify an e-mail as spam. In this research:
1. We scraped raw data from the Google Play Store, collected these data in chunks and normalized the dataset for our analysis;
2. We compared the accuracy of various machine-learning algorithms and found the best algorithm according to the results;
3. We checked the polarity of sentiment, i.e., whether a review is positive, negative or neutral, and corroborated this using a word-cloud corpus.
This research makes the following key contributions:
• Logistic regression performs better than random forest and naïve Bayes multinomial on multi-class data;
• Good preprocessing affects the performance of machine-learning models;
• After preprocessing, the overall results with term frequency (TF) were better than with term frequency/inverse document frequency (TF/IDF).
Text mining, also referred to as text data mining, is the process of deriving information from text by discovering patterns and trends through pattern learning [5]. Text mining requires structuring the input text through parsing, typically along with the addition of some derived linguistic features and the removal of others, followed by insertion into a database; patterns are then derived from the structured data and, finally, the output is interpreted and analyzed [6]. In text mining, value usually refers to some combination of novelty and interest. Typical text-mining tasks comprise text categorization, text clustering, concept/entity extraction, production of granular taxonomies, opinion analysis, document summarization and relation modeling, i.e., learning the relationships between named entities [7].
Several constraints prevent analysts and development teams from utilizing the information in reviews. User reviews, which demand considerable effort to analyze, are available in app stores in large volumes. Recent research showed that iOS users submit around 22 reviews per day per application [8].
Top-rated apps, like Facebook, receive more than 4000 reviews. Second, the quality of the reviews fluctuates widely, from helpful advice to sardonic comments. Third, a review can be nebulous, which makes it challenging to filter negative from positive feedback. In addition, the star rating of a given application represents the mean over all user reviews, combining positive and negative ratings, and is therefore of limited use to the application development group [9].
In linguistics, semantic evaluation is the process of relating syntactic constructions, at the level of words, phrases, sentences and paragraphs, to their meanings. It also entails removing characteristics specific to particular contexts, to the extent that such a project is possible [10]. Elements of idiom and figurative speech, being cultural, are often also converted into invariant meanings in semantic evaluation. Semantics, although associated with pragmatics, is different in that the former deals with word or sentence choice in a given circumstance, while pragmatics considers the meaning derived from tone or context. In other words, semantics is about the meaning encoded in the words themselves, while pragmatics is about the meaning an audience derives under various conditions [11]. In information retrieval, TF/IDF, short for term frequency/inverse document frequency, is a statistic that is meant to reflect how important a word is to a document in a corpus or collection. It is often employed as a weighting factor in searches, text mining and data retrieval. The TF/IDF value rises with the number of times a word appears in the document and is offset by the number of documents in the corpus that contain the word; this helps correct for the fact that some words appear more frequently in general [12].
TF/IDF is one of the most common weighting schemes today; it is utilized by 83% of text-based recommender systems in digital libraries. In TF/IDF, the more common a word is in the corpus, the less weight it receives; thus, common words such as articles receive a small weight even though they provide little specific information, while words that are assumed to carry more information receive greater weight [13]. Beautiful Soup is a Python library used for extracting data from XML and HTML documents. It works together with a parser of choice to provide ways of navigating, searching and modifying the parse tree [14].
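As a minimal illustration of how Beautiful Soup navigates a parse tree, the fragment below extracts fields from a hand-written HTML snippet. The tag names and class attributes here are invented for the example; real Play Store pages use their own markup.

```python
from bs4 import BeautifulSoup

html = """
<div class="review">
  <span class="author">Alice</span>
  <span class="rating">5</span>
  <p class="body">Great app, works offline too.</p>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
# find() locates the first matching tag; get_text() pulls its content.
author = soup.find("span", class_="author").get_text(strip=True)
rating = int(soup.find("span", class_="rating").get_text(strip=True))
body = soup.find("p", class_="body").get_text(strip=True)
print(author, rating, body)
```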
The re module offers sophisticated approaches to producing and utilizing regular expressions [15]. A regular expression is a kind of formula that specifies patterns in strings. The name "regular expression" stems from the earlier mathematical treatment of regular sets, and the term has persisted. Regular expressions provide a straightforward way to define a set of related strings by describing a pattern; this pattern string can be compiled to efficiently determine whether and where a given string matches the pattern [16,17].
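A small example of compiling and reusing patterns with Python's re module follows; the cleaning rules shown are illustrative, not the study's exact ones.

```python
import re

# Compile once, reuse many times: strip everything except letters and
# whitespace, then collapse runs of whitespace to a single space.
non_letters = re.compile(r"[^a-zA-Z\s]")
multi_space = re.compile(r"\s+")

raw = "Best app EVER!!! 5/5 :-)"
clean = multi_space.sub(" ", non_letters.sub(" ", raw)).strip().lower()
print(clean)  # best app ever
```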

Literature Review
The authors of [18] automatically extracted relevant features from reviews of programs (e.g., information about bugs, plugins and prerequisites) and analyzed the sentiment related to each. Three major building blocks are discussed in that study: (i) topic modeling, (ii) sentiment analysis and (iii) a summarization interface. The topic modeling block aims to discover semantic topics in textual remarks, extracting the attributes based on the words of each topic. The sentiment analysis block detects the sentiment associated with each discovered feature.
The summarization interface provides developers [19] with an intuitive visualization of these features (i.e., topics), along with their associated sentiment, providing more valuable information than a 'star rating'. The analysis demonstrates that the topic modeling block can organize the information supplied by users into subcategories that facilitate understanding of which features have a positive, negative or neutral impact on the overall evaluation of the program. Regarding user satisfaction, the authors observed that, despite the star rating being a coarse measure, the sentiment analysis method was more precise in capturing the opinion conveyed by the consumer in a remark.
The authors of [20] discussed the sentiment analysis of application reviews. Their results show that app-review sentiment analysis approaches are helpful for developers. In another study [25], results showed that semantic frames can enable an economical review-classification process that is quick and exact. Nevertheless, for review summarization tasks, their findings assert that frame-based summarization creates more comprehensive summaries. In closing, the authors introduced MARC 2.0, a review classification and summarization package that implements the algorithms investigated in the analysis.
According to another study [26], the use of apps has increased with the spread of advanced mobile technology. Users prefer mobile phones to any other device for running mobile applications; they download different mobile applications to their phones, use them and leave reviews about them.
The authors of [27] researched how, in the mobile application market, fallacious ranking points may push mobile apps up the popularity list; indeed, it is becoming more common for application developers to use fake ranking mechanisms. That study therefore proposed a semantic analysis of application reviews for fraud detection in mobile apps. First, the authors set out to detect misrepresentation by correctly mining the active periods of the mobile apps, also called leading sessions [27].
The authors of [28] intended to inspect two types of evidence, ranking-based and review-based, and used natural language processing (NLP) to extract action words. Next, the authors converted reviews to ratings and finally performed pattern analysis on the sessions, with application data gathered from the application store. The study thus proposed an approach to validate its effectiveness and show the scalability of the detection algorithm.
The author of [29], in another study, finds user reviews to be an integral part of application markets like the Google Play Store. The question arises: how can one make sense of and summarize millions of consumer reviews? Unfortunately, beyond straightforward summaries like histograms of consumer evaluations, few analytic tools exist that can provide insights into user reviews.
According to the research of [30], this application can (a) find inconsistencies in reviews; (b) identify reasons why consumers prefer or dislike a given program and provide an interactive, zoomable view of evolving user reviews; and (c) offer valuable insights into the entire application market, differentiating important user issues and preferences across different kinds of programs. Like the goods on "amazon.com", mobile applications are constantly evolving, with newer versions immediately superseding older ones. The App Store utilizes an evaluation program, which aggregates each rating assigned into a store rating.
The authors of [31] describe how researchers tracked the store ratings of apps daily to examine whether the store rating captures erratic user-satisfaction levels across application versions. Although the evaluations of many app versions rose or dropped, their store rating was resilient to change once a quantity of raters had accumulated. Store ratings are therefore not dynamic enough to capture varying user-satisfaction levels, a durability problem that could discourage programmers from improving program quality.
In this research, the authors of [32] propose a self-learning-based architecture used in the analysis process. According to that research, this architecture is best for analyzing a huge number of data sources with minimal user interference.
In this study, the authors of [33] examined the problem of sparse principal component analysis (PCA). PCA is a tool commonly used for data analysis and visualization. The study presents an algorithm for the single-factor sparse PCA problem whose performance is slightly better than that of other methods. The authors used different types of datasets, for example, news data and voting data, to obtain their desired results. According to the study, convex relaxation methods give good results, and for good solution quality a greedy algorithm was found to be better.
The authors of [34] recommend a framework for modeling the topical structure of discrete records and documents. In this method, the model allocates each word in a text document to a specific topic. Many-to-many relationships are used between topics and words, as well as between topics and documents. LDA makes this model simple, and it can easily be used with complex architectures. The model is not only feasible for clustering documents by topic, but also reasonable in various other dimensions, and was found to be efficient at improving several traditional models.
In another study, the authors of [35] propose probabilistic latent semantic analysis (PLSA), a method for unsupervised learning based on a latent statistical model. This method is more accurate than normal LSA because of its statistical foundations. The research found tempered expectation maximization to be the dominant method for the fitting procedure. The results prove that the study achieved its desired outcome and that PLSA is an excellent method for use with various applications based on information extraction and text learning.
The authors of [36] propose a new visualization engine called automatic exploration of textual collections for healthcare (ACE-HEALTH), which analyzes medical data using latent Dirichlet allocation (LDA). The engine combines various visualization techniques to provide controllable dashboards. The results show the effectiveness of the engine along with its compactness.
The results propose an encouraging method for the current scenario. Current machine-learning methods are feasible for this dataset and yield better results. Experiments also show that using these combined features can improve machine-learning models and their performance.

Materials and Methods
This section explains the dataset used for data collection, its visualization and the proposed methodology applied to the chosen dataset in our research.

Data Collection Description
In this research, we scraped thousands of user reviews and ratings of different applications based on different categories, as shown in Figure 1. We selected 14 categories from the Google Play Store and scraped different applications from each class, as shown in Table 1. These application categories were: action, arcade, card, communication, finance, health and fitness, photography, shopping, sports, video player and editor, weather, casual, medical and racing. We scraped thousands of reviews and ratings of applications and converted these data into a ".CSV" file format. After this, we applied preprocessing to remove special characters, remove single characters, remove single characters from the start of a string, replace multiple spaces with a single space, remove prefixes and convert the data to lowercase. We applied stop-word removal and a stemming technique to the data in the ".CSV" file. We then evaluated the results using different machine-learning algorithms to find the best algorithm for classification. We downloaded 148 apps across 14 categories from the Google Play Store, fetched the reviews and iterated through the required pages according to the number of reviews. We collected a total of 506,259 reviews from the Google Play Store website, as shown in Figure 1. To fetch the data, in the first step we used the requests library, which allows the user to send HTTP/1.1 requests in Python, add content like headers and process the response data in Python. We used Python's Scikit-learn library (Pedregosa, Varoquaux, Gramfort, Michel, Thirion, Grisel & Vanderplas, 2011) for machine learning because this library provides machine-learning algorithms for classification, regression, clustering, model validation, etc. [32]. We then used the re library for text processing.
A regular expression is a special sequence of characters that helps the user match or find strings or sets of strings, using a specific syntax held in a pattern. After using the re library, we used the Beautiful Soup library to extract data from the HTML and XML files. The counts of scraped reviews from the different categories are shown in Table 1.
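The conversion of scraped records into the ".CSV" format can be sketched with Python's standard csv module; the field names and sample rows below are hypothetical, and an in-memory buffer stands in for the file on disk.

```python
import csv
import io

# Hypothetical scraped records: (category, app, rating, review text).
rows = [
    ("sports", "ExampleApp", 5, "love it"),
    ("sports", "ExampleApp", 1, "keeps crashing"),
]

buf = io.StringIO()  # stands in for an open .csv file on disk
writer = csv.writer(buf)
writer.writerow(["category", "app", "rating", "review"])  # header row
writer.writerows(rows)
print(buf.getvalue())
```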

Methodology
Our classification methodology started with scraping application reviews. Using the AppID in the scrape request to the Google Play Store, we scraped the reviews and ratings of a specific application across several pages. We scraped this dataset to classify user reviews as positive, negative or neutral. After scraping the bulk raw reviews, the next step was preprocessing, through which we normalized the reviews at different levels. These steps involved removing special characters, deleting single characters, removing single characters from the start of a string, replacing multiple spaces with a single space and removing prefixes, then converting the data to lowercase. At the end, stop-word removal and stemming were performed. These were the main steps for refining the reviews. After refining the reviews, the bag-of-words approach was applied. At the next level, we applied term frequency (TF) to the reviews using Python. After this, we applied term frequency-inverse document frequency (TF/IDF), which is often used in information retrieval and text mining. After applying TF/IDF, feature extraction was performed on each application. Using Python, we applied different algorithms for classification (naïve Bayes, random forest and logistic regression), checked the various parameters, such as accuracy, precision, recall and F1 score, and gathered statistical information on these parameters. After analyzing and testing the statistical data, we determined which algorithm had the best accuracy, precision, recall and F1 score. Figure 2 shows which algorithm is best for analyzing reviews for classification.
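The preprocessing steps above can be sketched as a single function. To keep the sketch self-contained, the stop-word set and the suffix-stripping rule below are crude illustrative stand-ins for NLTK's stop-word corpus and Porter stemmer, which the study uses in practice.

```python
import re

STOP_WORDS = {"the", "a", "an", "is", "it", "its", "and", "to", "of"}  # tiny stand-in set

def crude_stem(token: str) -> str:
    # Very rough stand-in for a real stemmer such as NLTK's PorterStemmer.
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def preprocess(review: str) -> str:
    review = re.sub(r"[^a-zA-Z\s]", " ", review)          # remove special characters
    review = re.sub(r"\b[a-zA-Z]\b", " ", review)         # remove single characters
    review = re.sub(r"\s+", " ", review).strip().lower()  # collapse spaces, lowercase
    tokens = [crude_stem(t) for t in review.split() if t not in STOP_WORDS]
    return " ".join(tokens)

out = preprocess("It's THE best app... I loved the updates!")
print(out)
```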

Supervised Machine Learning Models
In this section, we implement different machine-learning algorithms using the Scikit-learn library and NLTK. The three algorithms chosen, naïve Bayes multinomial, random forest and logistic regression, are mainly used for classification and regression problems.

Classifier Used for Reviews
This section contains the implementation of different machine-learning algorithms used in this study.


Logistic Regression
Logistic regression is one of the techniques used for classification. It models the probability of an event occurring versus not occurring. In binary classification there are two choices (0 or 1, yes or no); in multi-class classification, there are more than two categories. The model can be written as

log(p / (1 − p)) = B0 + B1 X1 + B2 X2 + ... + Bp Xp,

where X1, X2, ..., Xp are the independent variables and B0, B1, ..., Bp are the coefficients.
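A minimal sketch of fitting a logistic regression text classifier with Scikit-learn follows; the six labeled toy reviews are invented for illustration, and the classifier learns one coefficient B per vocabulary word over term-frequency features.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labeled reviews: 1 = positive, 0 = negative.
texts = ["love this app", "great and useful", "really love it",
         "terrible crashes", "worst app ever", "awful experience"]
labels = [1, 1, 1, 0, 0, 0]

# CountVectorizer builds the term-frequency (TF) features; logistic
# regression then fits a coefficient for each word in the vocabulary.
clf = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(texts, labels)
pred = clf.predict(["love it", "terrible app"])
print(pred)
```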

Random Forest
Random forest is an ensemble model that is well known for producing accurate prediction results. The first study on random forests [35] explains the concept of an ensemble of decision trees known as a random forest. With a single tree classifier, problems may arise, e.g., outliers, which may affect the overall performance of the classification method. However, because of its built-in randomness, the random forest is a type of classifier that is robust to outliers and noise [36]. The random forest classifier has two kinds of randomness: one with respect to the data and the second with respect to its features. The main features of random forest are bagging and bootstrapping [35].
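The bagging and bootstrapping behavior can be sketched with Scikit-learn's RandomForestClassifier on synthetic data; the data-generating rule below is invented for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic two-class problem: the label depends on the first two features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Each tree is fit on a bootstrap sample of the rows (bagging) and
# considers a random subset of the features at each split.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)
print(forest.score(X, y))
```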

Naïve Bayes Multinomial
The naïve Bayes classifier is predicated on the bag-of-words model. Using the bag-of-words model, we check which words of the text document appear in a negative-words list or a positive-words list. If a word occurs in the positive-words list, the text's score is updated with +1, and vice versa. In the end, if the resulting score is positive, the text is classified as positive, and if it is negative, the text is classified as negative [36].
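In our experiments the classifier is Scikit-learn's MultinomialNB over bag-of-words counts rather than a hand-built word list; a minimal sketch with invented toy reviews:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical labeled reviews.
texts = ["good great excellent", "bad awful poor",
         "great good update", "poor bad crash"]
labels = ["positive", "negative", "positive", "negative"]

# Word counts feed the multinomial likelihoods of each class.
nb = make_pipeline(CountVectorizer(), MultinomialNB())
nb.fit(texts, labels)
print(nb.predict(["excellent good"])[0])  # positive
```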

Result and Discussion
Here, we present statistical information for the different algorithms used in these experiments, based on the various parameters after preprocessing. We compare the different algorithms and identify the best one for classifying and analyzing user reviews.

Analytical Measurement and Visualization after Preprocessing
Below is the statistical information for the different algorithms, based on the various parameters after preprocessing. We compare them to find the best algorithm for the analysis and classification of reviews.

Naïve Bayes Multinomial
Naïve Bayes was used for classification. It assumes that the occurrence of a specific feature is independent of the occurrence of other features. From a prediction perspective, this model is considered very fast compared to other models. We scraped the reviews of 148 apps from 14 categories of the Google Play Store; with 40 reviews per page, we collected a total of 506,259 reviews. We applied the naïve Bayes algorithm for classification on this dataset of reviews and gathered information on the different parameters with respect to TF and TF/IDF. We calculated the classification accuracy of the model for each application category and reported the precision, recall and F1 scores. Figure 3 illustrates a bar chart for the naïve Bayes algorithm in which series 1 indicates the accuracy of the algorithm, series 2 the precision, series 3 the recall and series 4 the F1 score.
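The accuracy, precision, recall and F1 measurements reported throughout this section can be computed with Scikit-learn's metrics module; the toy label vectors below are illustrative, with macro averaging across the three sentiment classes.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Hypothetical true vs. predicted sentiment labels for six reviews.
y_true = ["pos", "neg", "neu", "pos", "neg", "pos"]
y_pred = ["pos", "neg", "pos", "pos", "neu", "pos"]

acc = accuracy_score(y_true, y_pred)  # fraction of exact matches: 4/6
prec, rec, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0)
print(round(acc, 3), round(prec, 3), round(rec, 3), round(f1, 3))
```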

Random Forest Algorithm
The random forest classifier is an ensemble learning method for classification that operates by constructing a multitude of decision trees whose outcomes are calculated based on random selections of the data. In our experiment, we classified the reviews and gathered information on the different parameters with respect to TF and TF/IDF, calculating the classification accuracy for each application category and reporting the precision, recall and F1 scores. Figure 4 illustrates a bar chart for the random forest algorithm in which series 1 indicates the accuracy, series 2 the precision, series 3 the recall and series 4 the F1 score.


Logistic Regression Algorithm
Logistic regression is a reliable statistical model whose essential form uses the logistic function to model a binary dependent variable; many more complex extensions exist. It is an application of binomial regression. In this experiment, we applied the logistic regression algorithm for classification on the review dataset and gathered information on the different parameters with respect to TF and TF/IDF. We calculated the classification accuracy for each application category and reported the precision, recall and F1 scores. Figure 5 illustrates a bar chart for the logistic regression algorithm in which series 1 indicates the accuracy, series 2 the precision, series 3 the recall and series 4 the F1 score.


Different Machine-Learning Algorithm Comparison after Preprocessing
The Google Play Store is an online marketplace that provides users with free and paid applications. Through the Google Play Store, users can choose from over a million apps in various predefined categories. In this research, we scraped thousands of user reviews and application ratings and evaluated the results using different machine-learning algorithms, namely naïve Bayes, random forest and logistic regression, which check the semantics of user reviews of an application to determine whether the reviews are good, bad, reasonable, etc. After preprocessing the raw reviews, we calculated term frequency (TF) results for parameters such as accuracy, precision, recall and F1 score, and compared the statistical results of these algorithms. We visualized these statistical results as a pie chart, as shown in Figure 6. Figure 7 shows the corresponding term frequency (TF) and inverse document frequency (IDF) based results, also as a pie chart. After the comparison, we found that logistic regression was the best algorithm for the semantic analysis of Google application user reviews on both the TF and TF/IDF bases. For example, in the sports category on the TF basis, logistic regression achieved an accuracy of 0.622, a precision of 0.414, a recall of 0.343 and an F1 score of 0.343; the statistics for the other application categories are shown in Table 2. On the TF/IDF basis, logistic regression achieved an accuracy of 0.621, a precision of 0.404, a recall of 0.319 and an F1 score of 0.315; the statistics for the other application categories are shown in Table 3.
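The three-way comparison described above can be sketched as follows (illustrative data and scikit-learn are assumptions; the in-sample scoring here is a simplification of the paper's evaluation):

```python
# Score naive Bayes, random forest and logistic regression on TF vs TF/IDF
# features, collecting one accuracy per (feature, classifier) pair.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

texts = ["love it", "hate it", "works well", "keeps crashing",
         "very handy", "awful design", "nice interface", "full of bugs"]
labels = [1, 0, 1, 0, 1, 0, 1, 0]

results = {}
for feat_name, vec in [("TF", CountVectorizer()), ("TF/IDF", TfidfVectorizer())]:
    X = vec.fit_transform(texts)
    for clf_name, clf in [("naive bayes", MultinomialNB()),
                          ("random forest", RandomForestClassifier(random_state=0)),
                          ("logistic regression", LogisticRegression(max_iter=1000))]:
        clf.fit(X, labels)                            # train...
        acc = accuracy_score(labels, clf.predict(X))  # ...and score (in-sample)
        results[(feat_name, clf_name)] = acc
print(results)
```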

Analytical Measurement and Visualization without Preprocessing of Dataset
These are the statistical information of different algorithms on the base of the various parameters after data collection; compare and find the best algorithm that uses for the analysis and classification of reviews.

Naïve Bayes Multinomial
Naïve Bayes is a commonly used classification algorithm. It assumes that the occurrence of a specific feature is independent of the occurrence of other features, which makes it fast both to build models and to make predictions. We applied the naïve Bayes algorithm to classify the review dataset and obtained results for different parameters with respect to TF and TF/IDF. We calculated the classification accuracy for each application category and, in the statistical results, reported precision, recall and F1 score; all of these parameters are used to measure performance on the dataset. A bar-chart visualization of the naïve Bayes algorithm, in which series1 shows accuracy, series2 precision, series3 recall and series4 the F1 score, is given in Figure 8.
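A minimal multinomial naïve Bayes sketch (the data and scikit-learn usage are illustrative assumptions): the model multiplies per-word likelihoods, which is exactly the feature-independence assumption described above.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ["good app", "bad app", "good design", "bad design"]
labels = [1, 0, 1, 0]
vec = CountVectorizer()
X = vec.fit_transform(texts)
nb = MultinomialNB().fit(X, labels)  # Laplace smoothing (alpha=1) by default
probs = nb.predict_proba(vec.transform(["good bad app"]))[0]
print(probs)  # class probabilities sum to 1
```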


Random Forest Algorithm
The random forest classifier belongs to the class of ensemble methods designed specifically around decision trees. It builds many decision trees based on random selections of data and random selections of variables. We applied the random forest algorithm to classify the review dataset and obtained results for different parameters with respect to TF and TF/IDF. We calculated the classification accuracy for each application category and, in the statistical results, reported precision, recall and F1 score; all of these parameters are used to measure performance on the dataset. A bar-chart visualization of the random forest algorithm, in which series1 shows accuracy, series2 precision, series3 recall and series4 the F1 score, is given in Figure 9.
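The tree-ensemble idea can be sketched as follows (illustrative data; the hyperparameters are assumptions, not the paper's settings): each tree is trained on a bootstrap sample of rows and considers a random subset of features at every split, and the trees vote.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer

texts = ["love it", "hate it", "great game", "boring game"]
labels = [1, 0, 1, 0]
X = CountVectorizer().fit_transform(texts)
rf = RandomForestClassifier(
    n_estimators=100,      # number of trees in the forest
    max_features="sqrt",   # random subset of features tried at each split
    random_state=0,
).fit(X, labels)
print(rf.predict(X))
```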
Algorithms 2020, 13, x FOR PEER REVIEW

Logistic Regression Algorithm
In statistics, logistic regression is a trusted statistical model whose essential form uses the logistic function to model a binary dependent variable; many more complex extensions exist. In regression analysis, logistic regression estimates the parameters of the logistic model; it is an application of binomial regression. We applied the logistic regression algorithm to classify the review dataset and obtained results for different parameters with respect to TF and TF/IDF. We calculated the classification accuracy for each application category and, in the statistical results, reported precision, recall and F1 score; all of these parameters are used to measure performance on the dataset. A bar-chart visualization of the logistic regression algorithm, in which series1 shows accuracy, series2 precision, series3 recall and series4 the F1 score, is given in Figure 10.


Different Machine-Learning Algorithm Comparison without Preprocessing of Dataset
Using the Google Play Store, users can choose from over a million apps in various predefined categories. We evaluated the results using different machine-learning algorithms, namely naïve Bayes, random forest and logistic regression, which check the semantics of user reviews of an application to determine whether the reviews are good, bad, average, etc. After collecting the raw reviews, without preprocessing them, we calculated term frequency (TF) results for parameters such as accuracy, precision, recall and F1 score, and compared the statistical results of these algorithms. We visualized these analytical results as a pie chart, as shown in Figure 11. Figure 12 shows the corresponding term frequency (TF) and inverse document frequency (IDF) based results, also as a pie chart. After the comparison, we found that logistic regression was the best algorithm for the semantic analysis of Google application user reviews on both the TF and TF/IDF bases. For example, in the sports category on the TF basis, logistic regression achieved an accuracy of 0.623, a precision of 0.416, a recall of 0.35 and an F1 score of 0.353; the statistics for the other application categories are shown in Table 4. On the TF/IDF basis, logistic regression achieved an accuracy of 0.629, a precision of 0.416, a recall of 0.331 and an F1 score of 0.328; the statistics for the other application categories are shown in Table 5.


Semantic Analysis of Google Play Store Applications Reviews Using Logistic Regression Algorithm
After checking the different parameters, we found that the logistic regression algorithm was the best algorithm, having the highest accuracy. In this section, we performed the analysis and classified all reviews into different classes: positive, negative or neutral. The target value was set to 1 if the review was positive and to 0 if the review was negative. In addition, we handled the neutral class with a confidence rate: if the confidence rate fell between 0 and 1, the review was classified as neutral. The different parameters in our dataset, such as the application category, application name, application ID, reviews and rating, are shown in Figure 13. However, for checking the semantics of each review, the review text alone was sufficient, which is why we selected only the reviews of all applications.


HTML Decoding
Convert HTML encoding back into text: entities such as '&amp;' and '&quot;' can end up at the start or end of the text field and must be decoded.

URL Links
Remove all URL links that appear in the reviews.

UTF-8 BOM (Byte Order Mark)
Character patterns such as "\xef\xbf\xbd" are byte artifacts of UTF-8 handling: the byte sequence (EF, BB, BF) is the UTF-8 BOM, which allows a reader to identify a file as being encoded in UTF-8, while the bytes EF BF BD encode the replacement character emitted when decoding fails. Both kinds of artifact are stripped from the reviews.

Hashtag/Numbers
Hashtag text can provide useful information about a comment, so it may be unwise to strip all of the text together with the "#"; hashtags, numbers and other special characters therefore need careful handling.
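The cleaning steps above can be combined into one function. This is a minimal sketch; the paper describes the steps only in prose, so the exact rules (keeping hashtag text while dropping the '#', removing bare numbers) are assumptions.

```python
import html
import re

def clean_review(text: str) -> str:
    text = html.unescape(text)                               # HTML decoding: &amp; -> &
    text = text.replace("\ufeff", "").replace("\ufffd", "")  # BOM / replacement char
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)       # URL links
    text = re.sub(r"#(\w+)", r"\1", text)                    # keep hashtag text, drop '#'
    text = re.sub(r"\d+", " ", text)                         # drop bare numbers
    return re.sub(r"\s+", " ", text).strip()                 # collapse whitespace

print(clean_review("Love it &amp; use it daily! #best2020 see www.example.com"))
# Love it & use it daily! best see
```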


Negation Handling
Characters such as '∼' are factors that are not suitable to keep in a review, so negation handling removes them.

Tokenizing and Joining
Parse the whole comment into small pieces (tokens) and then merge them again. After applying the above cleaning rules, the reviews take the cleaned form shown in Figure 14.
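The tokenize-and-join step can be sketched as follows (a simple lowercase word tokenizer is assumed; the paper does not specify its tokenizer):

```python
import re

def tokenize_and_join(comment: str) -> str:
    tokens = re.findall(r"[a-z']+", comment.lower())  # split into word tokens
    return " ".join(tokens)                           # merge again

print(tokenize_and_join("Doesn't work, at ALL!!"))  # doesn't work at all
```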

Find Null Entries from Reviews
There were about 700-800 null entries in the reviews column of the dataset; these likely resulted from the cleaning process. We removed the null entries using commands such as those shown below.
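A pandas sketch of the null-entry removal (the column name and toy data are assumptions): rows whose review is missing or empty after cleaning are filtered out.

```python
import pandas as pd

df = pd.DataFrame({"review": ["great app", None, "", "buggy"]})
mask = df["review"].notna() & (df["review"] != "")  # non-null, non-empty reviews
df = df[mask].reset_index(drop=True)
print(len(df))  # 2
```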

Negative and Positive Words Dictionary
Using a word cloud over the corpus, we made negative and positive word dictionaries based on the occurrence of words in the sentences, to get an idea of which kinds of words are frequent in the corpus, as shown in Figure 15.
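Counting word occurrences per polarity approximates the dictionaries behind the word cloud; this sketch uses illustrative reviews, not the paper's corpus.

```python
from collections import Counter

positive_reviews = ["great app love it", "love the design"]
negative_reviews = ["hate the crashes", "crashes and bugs"]

# Tally how often each word appears within each polarity class.
pos_counts = Counter(w for r in positive_reviews for w in r.split())
neg_counts = Counter(w for r in negative_reviews for w in r.split())
print(pos_counts.most_common(2))  # [('love', 2), ('great', 1)]
```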


Semantic Analysis of Google Play Store Applications Reviews Using Logistic Regression Algorithm
In the results, all reviews are classified into three different classes, and the confidence rate of each review indicates how positive, negative or neutral the comment is. The target value ranges from 0 to 1; the confidence value is checked against this range to determine the class of each review using the logistic regression algorithm, as shown in Figure 16.
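The confidence-based labelling can be sketched as follows. Logistic regression's predicted probability for the positive class plays the role of the confidence rate; the 0.4-0.6 neutral band and the training snippets are assumptions, since the paper only states that in-between confidences map to neutral.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["love this app", "worst app ever", "love the update", "worst bugs"]
labels = [1, 0, 1, 0]
vec = TfidfVectorizer()
clf = LogisticRegression(max_iter=1000).fit(vec.fit_transform(texts), labels)

def review_class(review: str, lo: float = 0.4, hi: float = 0.6) -> str:
    conf = clf.predict_proba(vec.transform([review]))[0, 1]  # P(positive)
    if conf >= hi:
        return "positive"
    if conf <= lo:
        return "negative"
    return "neutral"

print(review_class("love it"), review_class("worst thing"))
```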


Conclusions and Future Work
On the Google Play Store, users may download applications from many different categorized groups. In this research, we considered hundreds of thousands of mobile application user reviews across 14 different application categories, downloading reviews for 148 applications for a total of 506,259 reviews from the Google Play Store. We assessed the results using machine-learning algorithms such as naïve Bayes, random forest and logistic regression, which evaluate whether the semantics of users' application reviews are positive, negative or neutral, and we calculated term frequency (TF) and inverse document frequency (IDF) with various parameters such as accuracy, precision, recall and F1 score, together with the statistical impact of those calculations. We did not face any challenges while using TF, especially with common phrases, for example articles, which receive large weight in the corpus. With regard to TF/IDF, we noticed that the rarer a term is in the corpus, the larger the weight it received: common phrases such as articles received small weights, whereas rare words, which are assumed to carry more detail, received more weight. In our research, we used bar-chart visualizations to simplify the comparison between the different algorithms and the results achieved by each of them. The results reveal that logistic regression can be an excellent algorithm in terms of classification and prediction compared to the other algorithms used, especially with the dataset used in this experiment; it is worth mentioning that, to our knowledge, this is the first time that such a dataset has been used for classification. In addition, logistic regression achieved the best speed, precision, accuracy, recall and F1 score after preprocessing the dataset.
As an example, after preprocessing the dataset and applying the logistic regression algorithm, we achieved good accuracy results for the sports category: on the TF basis, an accuracy of 0.622, a precision of 0.414, a recall of 0.343 and an F1 score of 0.343; on the TF/IDF basis, an accuracy of 0.621, a precision of 0.404, a recall of 0.319 and an F1 score of 0.315. In addition, for the sports category without preprocessing (after data collection only), the logistic regression algorithm achieved, on the TF basis, an accuracy of 0.623, a precision of 0.416, a recall of 0.35 and an F1 score of 0.353, and on the TF/IDF basis, an accuracy of 0.629, a precision of 0.416, a recall of 0.331 and an F1 score of 0.328; the statistics for the other application categories are analyzed in the concluding tables, which support the soundness of this analysis. This analysis classifies all reviews into the classes positive, negative and neutral. In our research, we set target values of 0 and 1 based on users' comments, where 0 identifies negative comments and 1 represents positive comments; values between 0 and 1 are assigned a confidence rate and considered under the neutral class.
In future work, we may consider including more application categories, increasing the number of reviews and comparing the accuracy results of the logistic regression algorithm with further algorithms.
In addition, we may consider generating clusters and checking the relationship between the reviews and the ratings of each application, which would allow each application to be analyzed more precisely.