Profiling and Predicting the Cumulative Helpfulness (Quality) of Crowd-Sourced Reviews

With easy access to the Internet and the popularity of online review platforms, the volume of crowd-sourced reviews is continuously rising. Many studies have acknowledged the importance of reviews in making purchase decisions. Consumer feedback plays a vital role in the success or failure of a business. As the importance of reviews grows, so does the number of studies on predicting helpfulness and ranking reviews. However, previous studies have mainly focused on predicting the helpfulness of the “review” and the “reviewer”. This study aimed to profile the cumulative helpfulness received by a business and then use it for business ranking. The reliability of the proposed cumulative helpfulness for ranking was illustrated using a dataset of 192,606 businesses from Yelp.com. Seven business and four reviewer features were identified to predict cumulative helpfulness using Linear Regression (LNR), Gradient Boosting (GB), and Neural Network (NNet). The dataset was subdivided into 12 datasets based on business categories to predict the cumulative helpfulness. The results showed that business features, including star rating, review count, and days since the last review, are the most important features across all business categories. Moreover, using reviewer features along with business features improves the prediction performance for seven datasets. Lastly, the implications of this study are discussed for researchers, review platforms and businesses.


Introduction
The rapid growth of the Internet and the popularity of crowd-sourced review platforms have introduced electronic Word-of-Mouth (e-WoM) communities that provide a massive amount of User-Generated Content (UGC), i.e., online product reviews [1,2]. Popular review websites, e.g., Yelp, Amazon, TripAdvisor, IMDB, Yahoo, and Google, serve as an essential source of information and help users in evaluating product quality and making purchase decisions [3][4][5][6]. Although these websites differ, e.g., Yelp reviews businesses, Amazon is an e-commerce website that reviews products, and TripAdvisor is a booking website, the principles of review helpfulness are common to all of them [7]. According to Bright Local [8], 86% of consumers read online reviews, whereas 91% of consumers trust online reviews. The volume of online reviews is increasing day by day. Currently, there are more than 730 million reviews on TripAdvisor [9] and more than 184 million on Yelp [10].
The colossal quantity of unstructured data generated by e-WoM communities has become a source of big data for studying real consumer behavior [11][12][13], but it has also introduced many challenges for both businesses and consumers [14]. "Review helpfulness" is an important dimension of online reviews, which shows the subjectivity and quality perceived by the crowd [15,16]. To overcome the problem of information overload and help consumers find helpful reviews among thousands of confusing ones, several solutions have been proposed using statistical modelling and Machine Learning (ML) [17][18][19].
Many researchers have studied the prediction of review helpfulness using similar features, but they have reported inconsistent and contradictory results regarding the performance of different features [20]. Most of the solutions introduced by previous studies were for a specific category, product or platform [21,22]. Researchers have tried to propose a generalized solution for different review platforms and product categories by using only textual features in making the prediction. However, they also suggested utilizing reviewer and product features to enhance the prediction performance [23]. The datasets used for predicting helpfulness in previous studies mostly differ from one another, are small in size, and are overdispersed [21,24]. Diaz and Ng [7] highlighted the disorganized status of research in the area of review helpfulness prediction.
The quality of a business is represented by the average star rating of all reviews. Similarly, the quality of reviews received by a product is reflected by the helpfulness of reviews based on their information value. By using review helpfulness as a tool, customers' ability to assess the quality of a product or business has been greatly improved. Helpful user reviews are of great use to potential consumers, as they provide information about the quality of a product that helps in evaluating and making purchase decisions [25][26][27]. Businesses with more useful reviews are likely to attract more customers and earn more revenue than businesses with less useful reviews [28][29][30].
Review websites usually allow the reader to give feedback on a review, e.g., the helpful/not-helpful vote on Amazon and the Useful vote on Yelp. This simple feedback boosted Amazon revenue by $2.3 billion [31]. Lu et al. [32] reported that a major portion of reviews has very few or no useful votes because the latest reviews have not had enough time to receive them. Hence, the useful votes for individual reviews are too sparse to assess the quality of reviews received by a product [33,34]. There is a huge volume of reviews even for a single business, and it is challenging to see the quality of the reviews received by the business. Moreover, the quality of reviews received by one business differs from that of others, even within the same category. Therefore, similar to the average star rating of the business, the cumulative helpfulness of reviews for a business should be calculated as well. The cumulative helpfulness can be calculated from the perspective of the reviewer as well as the business. "Cumulative helpfulness" is the total number of helpful votes received by all reviews for a specific business or written by a particular reviewer.
Due to the importance of review helpfulness, the number of studies exploring the helpfulness of crowd-sourced reviews is continuously increasing. Despite these rising numbers, the majority of studies have explored the helpfulness of reviews for limited categories, e.g., shopping and restaurants, and platforms, e.g., Amazon.com, while neglecting other review categories, e.g., travel, hotel, and health, and other platforms, e.g., Yelp and TripAdvisor [35]. In addition, researchers have proposed many statistical and ML models for (i) predicting the helpfulness of the "review" and the "reviewer"; and (ii) finding and ranking the top-k helpful "reviews" and "reviewers". However, to the best of our knowledge, there is no published research article attempting to find and predict the cumulative helpfulness (quality) of reviews for a business.
Therefore, the cumulative helpfulness of reviews received by a "business" still needs to be investigated. To fill this gap, this study aimed to find the cumulative helpfulness of reviews received by a business and compared the prediction performance of various ML algorithms on datasets of different sizes and business categories. The main contributions of this paper are summarized as follows: (a) propose and calculate the cumulative helpfulness of reviews received by a business; (b) rank and compare the top-k businesses using cumulative helpfulness, review count and star rating; (c) identify and operationalize the business and reviewer features for predicting the cumulative helpfulness of reviews received by a business; (d) analyze the performance of various learning algorithms in predicting the cumulative helpfulness of reviews for a business using datasets of different sizes and business categories; (e) examine the impact of reviewer features in predicting the cumulative helpfulness of a business; and (f) explore the importance of different business and reviewer features for predicting cumulative helpfulness.
The rest of the paper is organized as follows. Section 2 gives a brief overview of literature related to online review helpfulness prediction. Section 3 illustrates the research methodology. Section 4 reports and discusses the experimental results. Section 5 discusses the implications. Section 6 outlines the limitations and future work. Finally, Section 7 concludes the study.

Literature Review
The literature on predicting the helpfulness of reviews is continuously increasing, as helpfulness has become a critical factor for consumers in making purchase decisions [20,36]. This section provides an overview of the current state of the literature on predicting helpfulness and ranking reviews using multiple features, e.g., review content, reviewer, product/business, and emotions, and various techniques. A study found that review extremity, depth, and type of product affect the perceived helpfulness of reviews by analyzing data collected from Amazon.com. The type of product plays the role of moderator between depth and helpfulness [25]. Cao et al. [15] studied the relation of review features with helpfulness. It was found that reviews with extreme opinions are more helpful when compared with neutral reviews. A study explored the helpfulness of online reviews by using both qualitative features, e.g., reviewer experience, and quantitative features, e.g., word count. The analysis was performed on 1375 reviews and data of the top-ranking 60 reviewers from Amazon.com. The relation of the length of a review with its helpfulness appeared significant up to a certain threshold. In addition, reviewer experience showed no significant relation with helpfulness. However, the past record of reviewer helpfulness can predict future helpfulness. The study reported a changing impact of different review and reviewer related features on perceived helpfulness [37]. The important reviewer features, along with review features, were examined. The performance of popular ML algorithms, i.e., NNet, Random Forest (RandF), Stochastic GB, etc., was compared by performing analysis over three datasets containing 32,434, 109,357, and 59,188 reviews collected from Amazon.com. The proposed review content-related features gave the best performance in comparison with reviewer features and previously proposed models. The linguistic features of reviews along with the reviewer helpfulness per day are also strong predictors of review helpfulness [36].
The features that influence review helpfulness prediction were analyzed using reviews collected from Amazon.com and ML algorithms including Logistic Regression (LGR), Support Vector Regression (SVR), Model tree (M5P) and RandF. The results reported that the relation of different features with the review helpfulness prediction varies across all five categories tested. Moreover, SVR showed the best performance in predicting review helpfulness for all five categories in comparison with LGR, M5P, and RandF. To identify the most helpful review from the massive volume of reviews for a given product or business, a NNet based prediction model was proposed. The results reported the significance of features for predicting the helpfulness of reviews [38]. Wu [39], inspired by communication theories, explored the effectiveness of reviews by taking into consideration review popularity and helpfulness. The results from the analysis performed on Amazon.com reviews showed the importance of review popularity and helpfulness in evaluating the effectiveness of reviews. The review, reviewer, and product-related features were analyzed using ML algorithms. The data collected from Amazon.com, containing 32,434 reviews and 3100 products, were analyzed. The results revealed that the proposed review category and reviewer features are better predictors of review helpfulness. The recency of the reviewer, along with the length of activity, also showed a statistically significant relation with the helpfulness of reviews [40].
The impact of emotions on the helpfulness of online reviews collected from Amazon.com was studied using a Deep Neural Network (DNN). The NRC emotion Lexicon was used to extract the emotions attached to reviews. The features that were previously studied, i.e., reviewer, product and linguistic features, were used for predicting helpfulness. It was evident from the results that emotions were the best predictors of review helpfulness when features were taken individually. Moreover, the mixture of other features and emotions was reported to produce better overall performance [41]. The relation of review title features with review helpfulness has been explored by using data for 475 book reviews from Amazon.com. A model was proposed based on review content, reviewer, readability and title features. The proposed model was tested on the collected dataset of book reviews using ML algorithms, i.e., Decision Tree (DT) and RandF. It was reported that the review title features were not a significant predictor of review helpfulness [42]. A model based on the GB algorithm was proposed to predict review helpfulness by using textual features of reviews, i.e., readability, polarity, and subjectivity. The analysis was performed on reviews related to books, baby products, and electronic products collected from Amazon.in. The results reported that textual features are a better predictor of review helpfulness [19].
Gao et al. [43] studied the consistency and predictability of the rating behavior of reviewers over time along with their review helpfulness. The data collected from TripAdvisor.com were analyzed using econometric models. The results reported that the rating behavior of reviewers is consistent over time. Moreover, the reviewers that currently have higher ratings were reported to be more helpful in future reviews. The results were robust when tested over different product categories. The review content and rating were not significantly related, as reported by previous studies. A review helpfulness prediction model was developed by considering the unexplored features. The analysis was performed by collecting 1500 hotel reviews from TripAdvisor.com. The results reported that many notions in the review and the review type have a varying impact on the helpfulness of hotel reviews [44]. The classification of reviews into helpful and not-helpful was performed using 1,170,246 reviews collected from TripAdvisor.com. The ML classification algorithms used include DT, RandF, LGR, and Support Vector Machine (SVM). Accuracy, sensitivity, specificity, precision, recall, and F-measure were used to evaluate the performance. The results reported that the reviewer features were a good predictor of review helpfulness in comparison with review quality and sentiment [45].
Customer reviews from Amazon.in and Snapdeal.com were analyzed using a two-layered Convolutional Neural Network (CNN) to predict the most helpful review for a given product. Three filters, namely tri-gram, four-gram, and five-gram, were used to extract the textual features for predicting the helpfulness of reviews. As the study relied only on textual features, the proposed approach was reported to be flexible for predicting the helpfulness of reviews for any domain. The results showed better performance for the CNN model in comparison with other ML models [23]. The unexplored assumptions made in previous studies, i.e., star rating, equal review visibility, and the constant status of review and reviewer, were investigated using data collected from TripAdvisor.com. The review visibility features, e.g., days since the review was posted and days the review was displayed on the home page, showed a strong relation with review helpfulness. M5P showed better performance in comparison with LNR and SVR [35]. Saumya et al. [22] proposed an approach for ranking reviews based on their predicted helpfulness. The features related to review content, reviewer and product were extracted from Amazon.in and Snapdeal.com reviews. RandF was used to classify reviews as high-quality or low-quality. Afterwards, a GB regressor was used to calculate the helpfulness score of the high-quality reviews. The top-k reviews were ranked according to the helpfulness score, whereas the low-quality reviews were simply added at the end. The results reported a fair ranking of reviews, as the top ten reviews include a few recent reviews along with a few older reviews.
The impact of numerical and textual review features in predicting review helpfulness was explored by using Amazon reviews. The analysis was performed on the collected data using RandF. It was reported that the numerical features are a significant predictor of review helpfulness for all three types of reviews, i.e., regular, suggestive and comparative reviews. The review length and complexity were also significant predictors of helpfulness. However, the relation of review complexity with helpfulness was inverted U-shaped [46]. The effect of user-controlled features, along with other predictors, was investigated using reviews collected from TripAdvisor.com. The results showed a varying relation of user-controlled filters with selected features. The Recency, Frequency, and Monetary (RFM) model showed consistency among all controlled variables. Moreover, the rating and length of the review were reported as the most important predictors of review helpfulness [47]. The impact of including RFM characteristics of reviewers on the performance of predicting review helpfulness was analyzed using data collected from Amazon.com and Yelp.com. The hybrid approach combining textual features extracted using the Bag-of-Words (BoW) model and RFM features produced the best results [48]. Mohammadiani et al. [49] divided reviewers into two groups based on the strength of their relationship. The analysis performed on data collected from Epinions.com showed that the effect of review helpfulness on the influence of the reviewer is significant for high similarity.
A study introduced a Deep Learning (DL) model to understand the quality of online hotel reviews. The data collected from Yelp.com and TripAdvisor.com were analyzed using CNN and Natural Language Processing (NLP) to explore the relation between photos provided by the user and review helpfulness. The DL models outperformed the other models in predicting the helpfulness of reviews. The results reported that the photos provided by the user alone are not a good predictor of review helpfulness. Moreover, combining the photos with the features of the review text yielded better performance [50]. The influence of the reviewer profile photo on perceived review helpfulness was explored by extracting decorative and information features from photos of 2178 mobile gaming reviews collected from the Google Play store. The experimental results, obtained using a Tobit regression model, reported that the profile photo plays a significant role in the perception of review helpfulness. However, the type of photo did not show any significant impact on review helpfulness. More interestingly, the review length moderates the relation between profile image and review helpfulness, rather than review valence or equivocality [51]. The textual features of the review were examined using ML models, i.e., RandF, Naïve Bayes (NB), etc., to identify the quality of hotel reviews available on TripAdvisor. The stylistic features were reported as a more important determinant of review helpfulness; however, combining stylistic features with content features produced better prediction results [52].
The language used by reviewers in writing product reviews varies a lot. Four stylistic features were identified and analyzed for their relationship with review helpfulness using data collected from Epinions.com. The stylistic features were reported to be a good predictor of review helpfulness in comparison with other features. However, it was suggested to use the stylistic features along with social features to gain better performance [53]. Krishnamoorthy [5] proposed a predictive model to investigate the review features that have an impact on review helpfulness. The data collected from Amazon.com were analyzed using ML algorithms, i.e., NB, SVM, and RandF. The linguistic features extracted from the review content were analyzed along with readability, subjectivity, and metadata. It was concluded from the results that the hybrid set of features produces better accuracy. Moreover, linguistic features were reported as a good predictor for some categories, e.g., books and games. A multilingual technique was introduced to overcome the gap of predicting the review helpfulness for reviews in languages other than English. The dataset of 4248 non-English reviews was collected from Yelp.com. The previously identified features related to review content, business and reviewer were analyzed using regression, i.e., LNR, and classification techniques, i.e., SVM [21]. The analysis of scripts for predicting review helpfulness was performed with the help of human annotators who highlighted the important phrases that make a review helpful. The results showed that the script-enriched model gives better performance than traditional models, e.g., BoW, even with a small training set [54].
This research hypothesized that predicting cumulative helpfulness using business features along with reviewer features gives more accurate results and enhances prediction performance. Moreover, the cumulative helpfulness of a business calculated from online reviews can be used as an alternative to rank businesses efficiently and effectively. This research also explored which ML algorithm gives the best performance and which features are more important in predicting cumulative helpfulness.

Research Methodology
In this section, the stages of data collection, problem definition, feature generation and selection, modeling, and evaluation are described in detail. The research methodology of this study is illustrated in Figure 1.

Data Collection and Pre-Processing
The dataset used in this study was provided by Yelp and spans from 12 October 2004 to 14 November 2018 [55]. The dataset includes information about 192,609 businesses, 6,685,900 reviews, 1,223,094 tips, 200,000 photos, and check-in information for 161,950 businesses. In addition, the dataset also contains information about 1,673,138 users who reviewed the selected businesses. The dataset covers businesses in 10 metropolitan areas across two countries. The database schema created for the shared dataset is illustrated in Figure 2. This study used information from all sources, excluding tips. To generate a dataset for experimentation, the user information was first mapped onto each review. Then, the review information was grouped and mapped onto each business. Along with this, we generated check-in and photo count features for each business. After mapping, the number of businesses was three lower than the original business count because no reviews were found in the reviews table for those businesses. The label H_m (the cumulative helpfulness of business m) was also generated before generating the final dataset of 192,606 businesses. In the final step, we mapped features from the review table and the business table to the final dataset of 192,606 records. The procedure of creating the dataset used in this study is illustrated in Figure 3. Each feature is discussed and described in detail in Section 3.2. The features were normalized using Z-Transformation before being used for predictive modeling. The category of each business was labeled to study its impact, along with different features, on predicting cumulative helpfulness and to analyze the performance of ML models on datasets of different sizes. We created 11 sub-datasets based on the business category, which, together with the full dataset of all categories, gives 12 datasets. The information about the 12 datasets created and used in this study, along with the distribution of businesses by category, is given in Table 1.
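The Z-Transformation mentioned above can be illustrated with a minimal sketch. It assumes the usual definition (subtract the column mean, divide by the column standard deviation); whether the population or the sample standard deviation was used is not stated, so the population form is an assumption here, and the input values are hypothetical.

```python
from statistics import mean, pstdev

def z_transform(values):
    """Standardize one feature column to zero mean and unit variance."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumption)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Hypothetical B_Review_Count values for four businesses.
review_counts = [3, 10, 250, 42]
scaled = z_transform(review_counts)
```

After the transformation, every feature is expressed in standard deviations from its mean, which puts features with very different ranges (for example, star rating and review count) on a comparable scale.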

Problem Formulation and Model Features
The symbols and variables used in this paper are described in Table 2. This study aimed to profile the cumulative helpfulness of a business. Moreover, this study compared the top-k businesses based on star rating, review count and cumulative helpfulness.
In profiling the cumulative helpfulness for each business, there is a set of businesses B = {b_1, b_2, ..., b_m}, a set of users U = {u_1, u_2, ..., u_n} who write the reviews, and a set of reviews R = {r_1, r_2, ..., r_i}. H_m denotes the cumulative helpfulness of a business m and is calculated as in Equation (1), whereas the predicted cumulative helpfulness for business m is denoted by Ĥ_m. H_m,i represents the helpfulness of review i for business m.
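Equation (1) is not reproduced in this text. Based on the definition given in the Introduction (cumulative helpfulness is the total number of helpful votes received by all reviews for a specific business), it presumably has the following form, where R_m, the set of reviews written for business m, is notation introduced here for illustration only:

```latex
H_m = \sum_{i \in R_m} H_{m,i} \tag{1}
```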
B_Stars denotes the star rating, which ranges from 1 to 5. The average number of stars received by a business is represented by B_Stars_m, and the star rating in a single review is given by B_Stars_m,i. B_Stars_n,i is the star rating given by user n in review i. m, n, and i represent the number of businesses, number of users and number of reviews, respectively. The total number of check-ins for a business B_m is denoted by B_Checkin_Count_m. B_Photo_Count_m denotes the total number of photos uploaded for a business B_m. The total number of reviews received by a business B_m is represented by B_Review_Count_m. Moreover, B_Activity_Len_m as in Equation (4), B_First_Review_m as in Equation (2), and B_Last_Review_m as in Equation (3) denote the duration in days between the first and last review posted for a business B_m, the duration in days from the first review until the data collection date, and the duration in days from the last review until the data collection date, respectively.
The average review count of the users who reviewed business B_m is represented by U_Review_Count_m as in Equation (5). U_Fans_Count_m as in Equation (6) denotes the average number of fans of the users who reviewed business B_m. The average number of friends of the users who reviewed business B_m is denoted by U_Friends_Count_m as in Equation (7), whereas U_Compliment_Count_m as in Equation (8) is used for the average number of compliments received by the users who reviewed business B_m.
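The four reviewer features are averages of user profile attributes over the users who reviewed a business. A minimal sketch of that aggregation follows. The record layout and field names are hypothetical and are not the Yelp schema, and averaging once per review (so that a user with two reviews of the same business counts twice) is an assumption, since the text does not say whether users are de-duplicated.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy records; field names are illustrative only.
users = {
    "u1": {"review_count": 10, "fans": 2, "friends": 5, "compliments": 1},
    "u2": {"review_count": 30, "fans": 0, "friends": 15, "compliments": 3},
}
reviews = [
    {"business": "b1", "user": "u1"},
    {"business": "b1", "user": "u2"},
    {"business": "b2", "user": "u2"},
]

def reviewer_features(reviews, users):
    """Average reviewer profile attributes per business."""
    profiles_by_business = defaultdict(list)
    for r in reviews:
        # Map the user's profile onto the review, then group by business.
        profiles_by_business[r["business"]].append(users[r["user"]])
    return {
        b: {
            "U_Review_Count": mean(p["review_count"] for p in profiles),
            "U_Fans_Count": mean(p["fans"] for p in profiles),
            "U_Friends_Count": mean(p["friends"] for p in profiles),
            "U_Compliment_Count": mean(p["compliments"] for p in profiles),
        }
        for b, profiles in profiles_by_business.items()
    }

features = reviewer_features(reviews, users)
```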
B_First_Review_m = Data Collection Date − First Review Date (days) (2)
B_Last_Review_m = Data Collection Date − Last Review Date (days) (3)
This study aimed to predict Ĥ_m for any business B_m using seven business features. These features include B_Stars_m, B_Checkin_Count_m, B_Photo_Count_m, B_Review_Count_m, B_Activity_Len_m, B_First_Review_m, and B_Last_Review_m. In addition to business features, four reviewer features were also used to study the impact of the profile strength of the users who reviewed a business on predicting Ĥ_m for business B_m. The pseudocode in Algorithm 1 was used to map and generate features. The overview of cumulative helpfulness prediction is illustrated in Figure 4.
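The three time-based business features can be sketched as follows. The collection date is an assumption: the last date covered by the dataset (14 November 2018) is used as a stand-in, and the review dates are hypothetical. B_Activity_Len is computed as the number of days between the first and last review, following its verbal definition.

```python
from datetime import date

# Assumption: the last date covered by the dataset serves as the collection date.
COLLECTION_DATE = date(2018, 11, 14)

def temporal_features(review_dates, collection_date=COLLECTION_DATE):
    """Compute the time-based business features, all in days."""
    first, last = min(review_dates), max(review_dates)
    return {
        "B_First_Review": (collection_date - first).days,
        "B_Last_Review": (collection_date - last).days,
        "B_Activity_Len": (last - first).days,
    }

# Hypothetical review dates for one business.
feats = temporal_features([date(2018, 1, 1), date(2018, 11, 4), date(2018, 6, 30)])
```

Under these definitions, B_Activity_Len equals B_First_Review minus B_Last_Review, so the three features are linearly dependent.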

Modeling and Evaluation Metrics
Due to the numerical nature of all features, we selected LNR, GB, and NNet as learning methods in this study. The selection of learning models was also influenced by the use and performance of these models reported by previous studies. The GB algorithm is an ensemble learning technique in which the model is built from an ensemble of trees [56]. For the task of predicting helpfulness on large datasets of Amazon reviews, GB showed better performance than linear regression and NNet [19,36]. A NNet with three layers, namely input, hidden and output, was used for the task of helpfulness prediction and showed better performance than regression methods across all datasets [38]. A few studies reported that linear regression did not perform as well as other methods. However, it is still the most widely adopted method for the task of helpfulness prediction due to its fast execution time and explanatory power when compared with other methods [21,25][57][58][59][60][61][62]. To validate the proposed models, 10-fold cross-validation was used. The evaluation metrics used in this study were Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and Squared Correlation (R^2). In total, 12 (datasets) × 3 (learning algorithms) × 2 (feature sets) = 72 models were developed and their performances were compared in this study.
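The evaluation setup can be sketched in a few lines. The metric definitions below are the standard ones; the fold assignment (interleaved, without shuffling) is an assumption, since the splitting strategy is not described.

```python
import math

def rmse(y_true, y_pred):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

def squared_correlation(y_true, y_pred):
    """Square of the Pearson correlation between actual and predicted values."""
    n = len(y_true)
    mt, mp = sum(y_true) / n, sum(y_pred) / n
    cov = sum((a - mt) * (b - mp) for a, b in zip(y_true, y_pred))
    var_t = sum((a - mt) ** 2 for a in y_true)
    var_p = sum((b - mp) ** 2 for b in y_pred)
    return cov * cov / (var_t * var_p)

def k_fold_indices(n, k=10):
    """Yield (train, test) index lists; each sample is tested exactly once."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]

# 12 datasets x 3 learning algorithms x 2 feature sets
N_MODELS = 12 * 3 * 2
```

Note that squared correlation measures how well predictions track the actual values linearly; unlike RMSE and MAE, it is insensitive to a constant offset or scale error in the predictions.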

Ranking and Comparison of Top-10 Businesses
The top-k business rankings were compared using the following three criteria: (a) The ranking described in Table 3 was based on the star rating and the number of reviews. (b) The ranking based on cumulative helpfulness is given in Table 4. (c) Table 5 shows the ranking of businesses according to the star rating and cumulative helpfulness. Previously, a study classified reviews as high-quality and low-quality. In addition, high-quality reviews were further ranked based on their vote count. The difference between the actual ranking of online reviews from review portals and the predicted review ranking was used for evaluation [22]. However, in our case, no online ranking of businesses exists against which to evaluate the results. To show the difference, we used the above-mentioned three criteria for ranking businesses. The quality of a business is mostly judged using its star rating. However, for ranking, we cannot rely only on the star rating, as many businesses have the same star rating. Similarly, a business that has a five-star rating based on 300 or more reviews cannot be treated or ranked the same as a business with a five-star rating based on one review. To overcome this problem, a simple solution is to rank businesses using the star rating along with the review count. The Top 10 businesses ranked using this criterion are given in Table 3. In this research, we propose the idea of cumulative helpfulness as an indicator of the quality of reviews received by a business. As shown in Table 3, out of the Top 10, five businesses have a cumulative helpfulness lower than their number of reviews. This raises the question of whether ranking businesses using this criterion is valid. As an alternative, cumulative helpfulness alone was used to rank the businesses, and the Top 10 businesses ranked in this way are given in Table 4. The results seem surprising, as no five-star rated business makes it into the Top 10. Moreover, only one 4.5-star business makes it into the Top 10 using this criterion, whereas the rest of the businesses have 3-, 3.5- and 4-star ratings. Ranking using this criterion also did not seem promising, as we only considered the quality of reviews received by a business and ignored the quality of the business itself.
To counter this, businesses were ranked based on the star rating and cumulative helpfulness. The Top 10 businesses based on this ranking are given in Table 5. The results of ranking using this criterion appear to be more promising, as it ranks businesses based on the rating and the quality of the reviews that define the rating. It is also seen that three businesses from the Top 10 using the first ranking criterion also make it into the Top 10 using this criterion. It is interesting to see that the second and third places are taken by businesses from the auto category, which is ignored by the previous two ranking criteria. The first place is taken by a business from the restaurant category in all rankings.
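The three ranking criteria can be sketched as sorts over different keys. The business records below are hypothetical, and combining two fields lexicographically (sort by the first field, break ties with the second) is an assumption about how the criteria were applied.

```python
# Hypothetical businesses; values are illustrative only.
businesses = [
    {"name": "A", "stars": 5.0, "review_count": 1, "helpfulness": 0},
    {"name": "B", "stars": 5.0, "review_count": 300, "helpfulness": 120},
    {"name": "C", "stars": 4.0, "review_count": 900, "helpfulness": 2500},
    {"name": "D", "stars": 5.0, "review_count": 150, "helpfulness": 400},
]

def top_k(items, keys, k=10):
    """Rank by the given fields in descending order, earlier fields first."""
    ranked = sorted(items, key=lambda b: tuple(b[f] for f in keys), reverse=True)
    return [b["name"] for b in ranked[:k]]

by_stars_and_reviews = top_k(businesses, ("stars", "review_count"))     # criterion (a)
by_helpfulness = top_k(businesses, ("helpfulness",))                    # criterion (b)
by_stars_and_helpfulness = top_k(businesses, ("stars", "helpfulness"))  # criterion (c)
```

In this toy data, criterion (a) places B first because of its review count, criterion (b) places the four-star business C first, and criterion (c) places D first, since D has the most helpful reviews among the five-star businesses.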

Cumulative Helpfulness Prediction
We used the 10-fold cross-validation method to evaluate the predictive performance of LNR, GB, and NNet for all 12 datasets using seven business features. The predictions of cumulative helpfulness obtained using business features were then compared with the actual values to compute the performance metrics. To see the impact of reviewer features on cumulative helpfulness, we repeated the above process using the seven business features along with the four reviewer features. The values of RMSE, MAE and R^2 for each dataset are given in Table 6. Figure 5 illustrates the feed-forward NNet model with back propagation for the Nightlife dataset. The input layer takes the seven business features and four reviewer features as input nodes. There are seven nodes in the hidden layer and one node in the output layer. The sigmoid activation function was used in forward propagation for the hidden layer. As we were solving a regression problem, the linear activation function was used for the output layer. The weights and biases were optimized using the back propagation algorithm to minimize the cost function. In the experimental results, we used the RMSE value to report the performance.
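The forward pass of the network described above (11 inputs, 7 sigmoid hidden nodes, 1 linear output) can be sketched as follows. The random weight initialization and its range are assumptions for illustration; training by back propagation is omitted.

```python
import math
import random

N_INPUT, N_HIDDEN = 11, 7  # 7 business + 4 reviewer features; 7 hidden nodes

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One forward pass: sigmoid hidden layer, linear output for regression."""
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out

# Hypothetical initial weights; the actual initialization is not reported.
rng = random.Random(0)
w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(N_INPUT)] for _ in range(N_HIDDEN)]
b_hidden = [0.0] * N_HIDDEN
w_out = [rng.uniform(-0.5, 0.5) for _ in range(N_HIDDEN)]
b_out = 0.0

# Prediction for one z-transformed feature vector (all zeros = all features at their mean).
prediction = forward([0.0] * N_INPUT, w_hidden, b_hidden, w_out, b_out)
```

The linear output lets the network produce any real value, which matches the z-transformed regression target; a sigmoid output would confine predictions to the interval (0, 1).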

Prediction Using Business Features
The results show that GB achieved the lowest RMSE for All categories (0.534), Other (0.587) and Auto (0.643). The lowest RMSE using LNR was achieved for Restaurants (0.486), Shopping (0.605), Health (0.690), Arts, Entertainment and Events (0.412), Travel and Hotel (0.296) and Nightlife (0.345). NNet showed the best performance on Home and Local Services (0.618), Beauty and Fitness (0.626) and Pets (0.564). LNR achieved the lowest RMSE (0.296) among all experiments on the Travel and Hotel dataset. LNR showed the best prediction performance for small- and medium-sized datasets. For the large-sized dataset, GB outperformed LNR and NNet. NNet also performed better on small and medium datasets, as also seen in previous studies [38]. The results are in accordance with the prediction performance of GB for large datasets in comparison with other ML models [19,36]. Overall, LNR showed the best performance on six datasets, while NNet and GB each gave the lowest RMSE for three datasets. The results show that NNet is not suitable for the helpfulness prediction task on large datasets.

Prediction Using Business and Reviewer Features
GB achieved the best performance for the largest dataset, All categories, with the lowest RMSE of 0.530 in comparison to LNR (0.537) and NNet (0.756). In addition, GB also gave the best prediction performance for Other (0.573) and Beauty and Fitness (0.619). For Restaurants (0.486), Shopping (0.608), Auto (0.623), Arts, Entertainment and Events (0.404), and Nightlife (0.337), the lowest RMSE was achieved by LNR. NNet showed the best performance for Home and Local Services (0.628), Health (0.680), Travel and Hotel (0.297) and Pets (0.595). In this experiment, GB showed the lowest RMSE for three datasets, LNR showed the best performance for five datasets and NNet gave the lowest RMSE for four datasets. The results are similar to those in the literature on the performance of GB over large datasets [19,36].

Impact of Reviewer Features on Performance
We explored the impact of reviewer features on the performance of each model by examining the percentage improvement in RMSE. For All categories, the best RMSE (0.530) was achieved by GB using business and reviewer features, an improvement of 0.75%. The Restaurants RMSE (0.486) given by LNR remained the same with both feature sets. Adding reviewer features for Shopping decreased the performance, the best performance (RMSE = 0.605) being achieved using business features. The prediction performance for Home and Local Services also decreased when reviewer features were added. The performance for the Other dataset increased by 2.39% using both business and reviewer features. Adding reviewer features gave a boost to prediction performance for Beauty and Fitness by GB (RMSE of 0.619). For Health, the prediction performance improved with an RMSE of 0.680 given by NNet. By adding reviewer features, the prediction performance for Auto improved with an RMSE of 0.623. The performance for Arts, Entertainment and Events also improved by adding reviewer features.
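The percentage improvement is not defined explicitly in the text; the sketch below assumes it is the relative reduction in the best RMSE with respect to the business-features-only result, a definition that reproduces the percentages quoted for All categories and Other. The function name `rmse_improvement_pct` is hypothetical.

```python
def rmse_improvement_pct(rmse_business, rmse_both):
    """Relative RMSE reduction in percent; positive means reviewer features helped.

    Assumed definition, inferred from the reported percentages.
    """
    return (rmse_business - rmse_both) / rmse_business * 100.0


# Best RMSE values quoted in the text: (business features, business + reviewer features)
quoted = {
    "All categories": (0.534, 0.530),
    "Other": (0.587, 0.573),
}
improvements = {
    name: round(rmse_improvement_pct(before, after), 2)
    for name, (before, after) in quoted.items()
}
# improvements == {"All categories": 0.75, "Other": 2.39}
```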
For Travel and Hotel and for Pets, the performance decreased by adding reviewer features. Moreover, the performance for Nightlife increased by 2.32% using business and reviewer features. The comparisons of RMSE values for Nightlife and Auto are illustrated in Figures 6 and 7, respectively. Overall, out of twelve datasets, adding reviewer features left the prediction performance unchanged for one dataset, decreased it for four datasets and increased it for seven datasets. The highest improvement in prediction was given by NNet (8.9%) for the Travel and Hotel dataset by using both business and reviewer features. The prediction performances of 36 models using business features were compared with the prediction performances of 36 models using both feature sets, as given in Table 6. The results show no change for two models, decreased performance for fifteen models and increased performance for nineteen models. We also see that, in most cases, the model that gave the lowest performance with business features also gave the lowest performance by adding reviewer features. However, in a few cases, the best performing models also changed, e.g., GB to LNR for Auto, LNR to NNet for Travel and Hotel, LNR to NNet for Health, and NNet to GB for Beauty and Fitness. The changes in the performance of models were seen for small and medium datasets. This reflects that adding more features to small datasets can alter the performance of the model in comparison with the larger datasets.
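The dataset-level tally can be checked directly from the best RMSE values quoted in the two preceding subsections. The sketch below only re-counts numbers already stated in the text; the `tally` helper and the data layout are illustrative choices.

```python
# dataset -> ((best model, RMSE) with business features,
#             (best model, RMSE) with business + reviewer features)
best = {
    "All categories": (("GB", 0.534), ("GB", 0.530)),
    "Restaurants": (("LNR", 0.486), ("LNR", 0.486)),
    "Shopping": (("LNR", 0.605), ("LNR", 0.608)),
    "Home and Local Services": (("NNet", 0.618), ("NNet", 0.628)),
    "Other": (("GB", 0.587), ("GB", 0.573)),
    "Beauty and Fitness": (("NNet", 0.626), ("GB", 0.619)),
    "Health": (("LNR", 0.690), ("NNet", 0.680)),
    "Auto": (("GB", 0.643), ("LNR", 0.623)),
    "Arts, Entertainment and Events": (("LNR", 0.412), ("LNR", 0.404)),
    "Travel and Hotel": (("LNR", 0.296), ("NNet", 0.297)),
    "Nightlife": (("LNR", 0.345), ("LNR", 0.337)),
    "Pets": (("NNet", 0.564), ("NNet", 0.595)),
}


def tally(best):
    """Count datasets whose best RMSE improved, worsened or stayed the same,
    and list the datasets whose best-performing model changed."""
    counts = {"improved": 0, "decreased": 0, "unchanged": 0}
    switched = []
    for dataset, ((model_before, rmse_before), (model_after, rmse_after)) in best.items():
        if rmse_after < rmse_before:    # lower RMSE means better performance
            counts["improved"] += 1
        elif rmse_after > rmse_before:
            counts["decreased"] += 1
        else:
            counts["unchanged"] += 1
        if model_before != model_after:
            switched.append(dataset)
    return counts, switched


counts, switched = tally(best)
# counts == {"improved": 7, "decreased": 4, "unchanged": 1}
```

The four datasets in `switched` are the ones named in the text as changing their best model: Beauty and Fitness, Health, Auto, and Travel and Hotel.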

Importance of Features
The importance of each business and reviewer feature varied across the datasets and models used in the experiments. However, to see the overall importance of all features in predicting the cumulative helpfulness of reviews for a business, correlation analysis was performed. Based on the correlations of the features, weights were assigned that reflect the importance of each feature. The importance of each feature is presented in Figure 8.
Among the proposed business features, B_Stars was the most important feature for all datasets, as the weights assigned by correlation analysis were above 0.98. The weights assigned to B_Stars were comparatively higher than those of other features. B_Review_Count was important for all datasets, except All Categories, Restaurants, and Nightlife. B_Checkin_Count was less significant for the Restaurants and Nightlife datasets, but it appeared as an effective feature for the remaining categories. B_Photo_Count showed no importance for Pets, Auto, Health, and Beauty and Fitness, but was important for the remaining datasets. B_Activity_Len and B_First_Review appeared to be the most effective features for Beauty and Fitness, Health and Auto, compared to their importance for the remaining datasets. The importance of these features has also been reported by previous studies on helpfulness prediction of reviews [36]. Lastly, B_Last_Review also appeared to be among the most important features for all datasets.
When looking at the reviewer features, we found that U_Review_Count, U_Friends_Count, and U_Compliment_Count were more significant for Pets, Auto, Health, and Beauty and Fitness in comparison to their importance for the other datasets. U_Fans_Count was an effective feature for Pets, Auto, Health, and Beauty and Fitness, whereas it showed no importance for the other datasets. Overall, B_Stars, B_Review_Count, and B_Last_Review appeared to be the most important features among all datasets.
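The text does not specify how correlations were turned into weights. One common scheme, assumed here purely for illustration, takes the absolute Pearson correlation between each feature and the target and scales the values so that the largest weight equals 1. The helper names and the toy data are hypothetical; only the feature names come from the study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def correlation_weights(features, target):
    """Absolute correlation with the target, scaled so the largest weight is 1.

    Assumed weighting scheme; the source only says weights were based on correlations.
    """
    raw = {name: abs(pearson(column, target)) for name, column in features.items()}
    top = max(raw.values())
    return {name: (w / top if top else 0.0) for name, w in raw.items()}


# Toy data for illustration only, not values from the Yelp dataset.
features = {
    "B_Stars": [1.0, 2.0, 3.0, 4.0, 5.0],
    "B_Review_Count": [10.0, 40.0, 20.0, 50.0, 30.0],
    "B_Photo_Count": [3.0, 3.0, 3.0, 3.0, 3.0],
}
target = [0.2, 0.4, 0.6, 0.8, 1.0]  # stand-in for cumulative helpfulness
weights = correlation_weights(features, target)
```

A weight near 0, as for the constant B_Photo_Count column here, corresponds to a feature described in the text as showing "no importance" for a dataset.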

Implications
This study has both theoretical and practical implications. From a theoretical perspective, the proposed cumulative helpfulness for a business and its use in ranking businesses will encourage researchers and academics to further explore the problem of predicting helpfulness from this perspective. Previously, researchers ranked online reviews and reviewers [22]. However, this study paves a new way of ranking online products, businesses, and services by combining both the average star rating and the cumulative helpfulness of a business. The best prediction results were achieved by the combination of business and reviewer features for the majority of datasets. However, there are a few datasets in which the use of both business and reviewer features reduced the prediction performance. The results of this study will encourage researchers to further explore and verify the proposed features on different datasets. This study also shows the important features for predicting cumulative helpfulness that can be used in future studies. The experimental results validated that GB shows the best performance for the large dataset. In short, this study created a whole new dimension to investigate the problem of helpfulness prediction from a business perspective that was previously ignored. Moreover, it will be interesting to study the impact of the cumulative helpfulness of reviews for a business on review helpfulness prediction and on the ranking of reviews and reviewers.
The practical implications of this study include a new criterion to rank businesses that will be more helpful to users in identifying businesses with higher-quality reviews. Until now, the information available on most review platforms has been the average star rating and review count. The comparison of ranking strategies will encourage review platforms to make the statistics related to a business more useful to viewers by adding a cumulative helpfulness score. To further ease searching and selecting businesses for users, the proposed ranking criteria can be used for ranking businesses or products based on individual category and location. This study gives insights to companies in exploring the importance of factors, making strategic changes and controlling the features that will bring more quality reviews for a business. Moreover, the experimental results and the performance of different ML algorithms for different sizes of datasets will also guide practitioners in selecting an appropriate learning algorithm.

Limitations and Future Work
As with other studies, this study also has several limitations. Firstly, cumulative helpfulness is used for ranking and prediction only for a dataset of businesses from Yelp.com. Future work will consider the application of the proposed features to businesses, products, and services from other popular review platforms and e-commerce websites. Secondly, to evaluate review ranking, previous studies matched it with the actual online ranking of reviews on review portals. However, in this study, there exists no online ranking of businesses with which it can be compared. A future extension of this work will be focused entirely on the ranking of businesses, where a valid measuring metric will be proposed to verify the ranking of businesses. In addition, it will be compared with the existing ranking techniques for reviewers and reviews. Thirdly, this study illustrates the use of cumulative helpfulness for businesses across all categories. However, future studies can rank a business based on the individual category and location. Fourthly, only numerical features of reviews and reviewers were used in this study. The impact of textual features of the reviews received by a business in predicting cumulative helpfulness should be explored. Lastly, the use of the cumulative helpfulness of reviews for a business for ranking purposes will attract fake votes, similar to the fake reviews that affect the overall ranking. Therefore, future research should also take into consideration the detection of fake votes, along with the detection of fake reviews. Researchers can further explore and enhance the performance of predicting cumulative helpfulness by identifying and testing new features using DL models. Future studies can use the cumulative helpfulness of reviews for a business along with other features to perform prediction and ranking tasks associated with crowd-sourced reviews.

Conclusions
The increased volume of reviews makes it difficult for consumers and retailers to evaluate the quality of products. The importance of reviews has encouraged researchers to model and predict the helpfulness of reviews, as it becomes a critical factor for consumers in making purchase decisions. However, previous studies have focused only on the helpfulness of a "review" and a "reviewer". This study proposed the concept of cumulative helpfulness of reviews for a "business", which makes it easy for consumers and business managers to see the overall quality of the reviews received by a business. The applicability of cumulative helpfulness in ranking businesses was illustrated using a real-time dataset of 192,606 businesses from Yelp.com. The ranking of businesses using the star rating along with cumulative helpfulness appears more reliable.
Further, the prediction of cumulative helpfulness was performed using seven business and four reviewer features. The prediction performance of LNR, GB and NNet was compared on datasets of different sizes and categories. GB outperformed LNR and NNet for the large dataset of all categories; however, for small and medium datasets, LNR and NNet performed better than GB. The use of reviewer features along with business features improved the performance of predicting cumulative helpfulness for seven of the twelve datasets. When examining the individual importance of features, business star rating, review count and the number of days since the last review appear to be the most important features. This study will help customers and retailers to see the overall quality of reviews for a business. Review platforms can rank firms in a better way using their cumulative helpfulness score. Future work can evaluate business ranking, validate the impact of the proposed features on helpfulness prediction and ranking tasks, and compare performance with deep learning techniques.

Figure 1. Flow chart of proposed research methodology.

Figure 2. Entity relationship diagram of Yelp database.

Figure 3. Flow chart of steps performed for creating the dataset.

Figure 4. An overview of cumulative helpfulness prediction.

Figure 6. Comparison of RMSE for Nightlife.

Figure 7. Comparison of RMSE for Auto.

Figure 8. The importance of each feature for all datasets.

Table 1. Description of datasets and distribution of businesses by category.

Table 2. Description of symbols and variables.
U_Compliment_Count m,n: compliment count of user u who reviews business m
U_Review_Count m,n: review count of user u who reviews business m
U_Fans_Count m,n: fans count of user u who reviews business m
U_Friends_Count m,n: friends count of user u who reviews business m

Table 3. Top 10 five-star businesses by review count.

Table 4. Top 10 businesses by cumulative helpfulness.

Table 6. Evaluation results, performance comparisons and impact of reviewer features on RMSE.