Machine Learning Approach for Targeting and Recommending a Product for Project Management

Abstract: Conventionally, market research and strategy for a product depend on interviews and an explicit cluster/society to identify the customer's needs. Customer-created information (CCI), such as call-center data, online reviews, and social media posts, provides an opportunity to recognize the customer's needs more efficiently. However, conventional approaches are not compatible with large CCI datasets because most CCI content is repetitive and uninformative. In this paper, a machine learning approach for identifying customer needs from a CCI dataset is proposed, and its performance is evaluated for targeting and recommending a new product for project management. Once the needs of the customer are identified, the information can be used to develop a market strategy, launch new products, position a brand, and support other long- and short-term planning.


Introduction
In the modern scenario, most businesses are innovation-centric and fail to align their brand with customer needs (CNs) [1]. According to one survey, customer-focused companies earn ~60% more profit than companies that are not customer-centric. Generally, the specific CNs are the influencing factors (IFs) that prompt the customer to buy, target, and/or recommend the product. Therefore, to recognize the CNs, it is essential to identify the reasons behind the customer's decisions and to pay attention to their actions. To identify the CNs, a deep market research analysis needs to be carried out, which includes the following key actions: (1) survey (such as call-center data, online reviews, and social media posts): it helps to establish the product's market value and relative position in the market; (2) social listening: it helps to find out how customers value their different interactions with the product; (3) touchpoint maps: they help to uncover hidden satisfiers/non-satisfiers; (4) journey maps between the product and customer use: they show how relevant the product is for the customers; (5) predictive analysis: it helps to discover the unmet CNs; (6) adaptive conjoint analysis: it helps to evaluate how the customer values the product; (7) personas: they help to identify the personality of each customer segment/group; and (8) segments: they help to identify the groups/segments of customers concerning the product [2].
The CNs have to be understood first in order to meet competitive market demands appropriately, which leads to a better experience by exceeding customer expectations. The CNs are classified, based on the customers available in a particular demographic region, into two main verticals (product needs and service needs), as shown in Figure 1, which highlights the key points. Several other key points may also be considered for product needs (e.g., functionality, convenience, experience, design, reliability, performance, efficiency, compatibility, etc.) and service needs (e.g., transparency, control, fairness, option, accessibility, etc.).

Figure 1. Types of CNs and their two verticals: product needs and service needs.
Addressing CNs is a challenge for the business owner, who must focus on customer retention through good customer relationships. In this regard, several key points need to be addressed while doing market research analysis for any product: (1) deliver quality support for customers: customers expect real-time support when it is required (e.g., live chat, live assistance such as video chat/co-browsing, and automatic customer support such as a chatbot deployed on a 24/7 basis); (2) customer journey mapping with the product: to visualize the customer touchpoints via meetings/experiences/loyalties; (3) customer satisfaction measurement: use the correct communication channel to collect the CSAT (customer satisfaction score), NPS (net promoter score), and CES (customer effort score); (4) high customer communication consistency: to avoid a high rate of reported frustration; (5) create a customer-centric culture (CCC): focus on customer experience first and collect every minute touchpoint from the customer; (6) improve the USP (unique selling proposition) of the product: aim for good product quality, which attracts market attention, acquisition, and/or consumption; (7) collect feedback from the customer: it is very important for a successful product business [3].
From the detailed analysis of the literature, most of the work relates to CN analysis from different market and product points of view (such as product opportunity, strategic analysis, product design, portfolio management, attribute identification, services management, etc.) in a general way [4][5][6][7][8][9][10][11][12][13] (Table 1). The conventional methods developed for CN analysis, and their importance for product selection and for targeting a new product, are described in [14][15][16][17][18][19][20][21][22][23] (Table 2). All these methods depend heavily on expert skill, require analyst intervention, and are time-consuming. The customer-created information (CCI)-based text analysis methods in marketing and product development are presented in [24][25][26][27][28][29][30][31][32][33][34][35] (Table 3). These methods process a huge volume of unstructured textual data with intangible attributes, which requires a large amount of memory and processing time. Apart from these, online product configuration, identification, and recommendation systems using advanced techniques/methods are presented in [36][37][38][39][40][41][42][43][44][45][46][47][48][49][50][51][52] (Table 4). In addition, [53][54][55] present recommendation systems based on personal behavior, ML, and surveys, respectively. However, most of the above-mentioned techniques are complex, difficult for operators/researchers to understand and operate, time-consuming, and require a huge volume of data storage. For all cited works, the main contribution, the developed method, and its drawbacks are summarized in Tables 1-4.

Hypothesis-based development of a platform for product portfolios using customer needs. | Outline platform for differentiating modules/Different
[10], 2020 | CNs support to recognize the variables utilized in the conjoint analysis | Theoretical information with the basis establishment on a conjoint application. | Model based on conjoint analysis, i.e., traditional, adaptive, choice, partial profile choice, and its recommendations. | Conjoint analysis for design and pricing/Different
[11], 1998 | CN investigation enhances the available products and their related services | Managerial implications and the consequences of the application for two different ski industries.

Table 3. Summary of the customer-created information (CCI)-based text analysis in marketing and product development methods.

Ref. and Year | Type of Application | Main Contribution and Characteristics | Analyzed Method/Remarks | Recommender System/Similarity with Our Approach
[24], 2016 | Unstructured textual data (UTD)-based address of the managerial questions | Sentence-based text analysis for review data, instead of word-based, for the hotel and restaurant industry only | Introduces a "bag-of-sentences". Analyzes the latent Dirichlet allocation (LDA) method. | Text analysis model/Different
[25], 2012 | User-generated content (UGC)-based impact analysis | Introduces the state-of-the-art for UTD-based analysis in the following areas: "Online Product Opinions: Incidence, Evaluation, and Evolution", "Network Characteristics and the Value of Collaborative UGC", "Evaluating Promotional Activities in an Online Two-Sided Market of UGC", etc. | | Model for household, travel, hotel and service industry/Different (only state-of-the-art)
[26], 2011 | Analyzes UTD word groupings by linking them to sales, sentiment, or product ratings | Decomposes textual reviews into segments describing different product features for two different groups of products (digital cameras and camcorders) | Analyzes the clustering techniques for different products. |

The main aim of this study is to develop a solution that is simple to operate, demands little expert skill and minimal operator intervention, and provides recommendations online by using customer-created information (CCI), such as call-center data, online reviews, and social media posts. The proposed solution provides an opportunity to recognize customer needs more efficiently in order to select a new product for project management.
The paper is organized as follows: the problem and its associated work are introduced in Section 1. The proposed approach and methodology are presented in Section 2, which includes detailed information and the mathematical modeling of the neural network-based recommendation method. Section 3 presents and discusses the results. Finally, the conclusion, followed by the future scope of the study, is given in Section 4.

Methodology and Proposed Approach
The proposed machine learning-based approach for targeting and recommending a new product in the market for project management is shown in Figure 2. It combines seven parts, including: (1) PART-1: acquisition of the review dataset of the market related to a brand/product; (2) PART-2: import of the intelligent data analytics libraries; followed by the data analysis (PART-3), the data pre-processing (PART-4), and the NN-based classifier (PART-6) described in this section.
The proposed approach, which is based on the CCI dataset only, differs from the models reported in the literature [56][57][58]. In [56], the author presents a state-of-the-art review of software project management using ML techniques, based on Web of Science, ScienceDirect, and IEEE Xplore, in which a total of 111 research papers are grouped into four categories. In [57], the author presents a digital twin-based ML application for product optimization in industry. In [58], a statistical and ML approach based on the two Bass models is developed for forecasting demand. The main difference is that none of these three ML approaches [56][57][58] is based on CCI datasets.
In this study, several software/emulation platforms are required to develop the different algorithms and methods that operate on the huge volume of the product review dataset (Amazon dataset of "Cell Phones and Accessories": 10,063,255 reviews for 590,269 products). Therefore, several libraries (pandas, sys, string, numpy, sqlite3, seaborn, os, nltk, json, csv, re, WordCloud, and matplotlib.pyplot, together with the optional NLTK resources obtained with nltk.download('punkt') and nltk.download('wordnet'), etc.) are imported in PART-2 to perform the big-data analysis/analytics (BDA). Generally, BDA is the application of advanced analytic algorithms/techniques to diverse, very large datasets, which include unstructured, semi-structured, and/or structured data of different sizes from different sources [59]. After importing these libraries, data analysis is performed in PART-3 to learn more about the review dataset, such as: (1) the properties of the dataset (i.e., number of rows and columns, count, names of variables, total reviews, rating, null objects, std, min, max, mean, brands, products, prices, type of data, review text, etc.); (2) the total number of brand names and products; (3) the brands with the minimum and maximum ratings; (4) the brand with the maximum number of reviews; and (5) the brand with the maximum rating/review ratio, etc.
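The PART-3 checks listed above (row and column counts, null objects, rating statistics, and the brand with the most reviews) can be illustrated with a minimal sketch. The records, the field names (`brand`, `asin`, `rating`, `text`), and the values are hypothetical and stand in for the real review table; the sketch uses only the Python standard library to stay dependency-free, although the study itself imports pandas for this step.

```python
from collections import Counter
from statistics import mean

# Hypothetical miniature stand-in for the review table (field names are assumptions).
reviews = [
    {"brand": "Samsung", "asin": "A1", "rating": 5, "text": "great phone"},
    {"brand": "Samsung", "asin": "A2", "rating": 3, "text": "ok"},
    {"brand": "Samsung", "asin": "A1", "rating": 1, "text": "bad battery"},
    {"brand": "Apple", "asin": "B1", "rating": 4, "text": "good value"},
    {"brand": "Apple", "asin": "B1", "rating": None, "text": "love it"},
    {"brand": "Google", "asin": "C1", "rating": 5, "text": "excellent"},
    {"brand": "Xiaomi", "asin": "D1", "rating": 5, "text": "works great"},
]

# Size of the dataset: number of rows and names of the columns.
n_rows = len(reviews)
columns = list(reviews[0].keys())

# Null objects per column.
null_counts = {c: sum(r[c] is None for r in reviews) for c in columns}

# Basic rating statistics over the non-null ratings.
ratings = [r["rating"] for r in reviews if r["rating"] is not None]
rating_stats = {
    "count": len(ratings),
    "min": min(ratings),
    "max": max(ratings),
    "mean": mean(ratings),
}

# Number of reviews per brand and the brand with the most reviews.
reviews_per_brand = Counter(r["brand"] for r in reviews)
top_brand, top_count = reviews_per_brand.most_common(1)[0]

print(n_rows, columns, null_counts, rating_stats, top_brand, top_count)
```

With a real dataset, the same quantities would normally come from a data-frame library in a few calls; the explicit loops here only make each statistic visible.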
The data pre-processing (PART-4) is used to clean the data and make it useful (well-formed datasets); it includes four main functions, such as removing the special characters.
The clean and useful data obtained from PART-4 are then used to develop the machine learning-based model for targeting and recommending a new product. In this stage, five main functions are performed: (1) select the cleaned-up review text and its corresponding rating data; (2) create a feature vector for classification; (3) create training and testing feature vectors; (4) develop the AI/ML model; and (5) perform the training, testing, and validation of the developed model to assess its performance for targeting and recommending a new product in the market for project management. In the process of feature vector generation, the independent variables or features are extracted from the review text. The following stepwise procedure is performed to extract the features: (1) implement n-grams to generate features, (2) create a bag-of-words (BoW), and (3) use the bag-of-words to generate the training and testing data files for the classifier model. Using n-grams for feature extraction has more advantages (i.e., it retains the structure and analyzes each word in context) than disadvantages (i.e., the dimension of the feature vector is extremely large). Moreover, the featurization of the data can be performed by three methods (BoW, TF-IDF, and WEM: word embeddings model). The BoW procedure is similar to a count-based representation of the review data. Once the BoW is created, the feature vector is divided into training and test datasets, which are used to interpret the positive or negative trend of the market product. Apart from the BoW method, the TF-IDF (Term Frequency-Inverse Document Frequency) method can be used to featurize the review data; it depends on the normalized TF (Equation (1)) and the IDF (Equation (2)), as given below:

$$\text{Normalized TF} = \frac{\text{number of times a word appears in a document}}{\text{total number of words in that document}} \qquad (1)$$

$$\text{IDF} = \log\left(\frac{\text{number of documents in the corpus}}{\text{number of documents where the specific term appears}}\right) \qquad (2)$$

Moreover, neither method (BoW or TF-IDF) is adept at capturing the context of a word in the document. Therefore, WEM is implemented to capture the context of a word in a document and its relation to other words. This method is widely used in image-based approaches for targeting and recommending a new product in the market. After feature vector extraction, a neural network (NN)-based classifier is designed to classify the positive and negative recommendations, as shown in PART-6 of Figure 2. The mathematical implementation of the three-layered NN architecture follows [60,61]: the input data form the feature vector $f$ for the two-class problem of positive and negative recommendations of the product in the market, and the weight matrices at the input-hidden and hidden-output layers, together with the modeling at the output-layer neurons, follow the three-layer formulation of [60,61]. The performance of the NN depends strongly on the number of hidden-layer neurons, which is chosen by the hit-and-trial method or evaluated by Equation (10). The overall performance accuracy of the NN model is evaluated by Equation (11).
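As an illustration of the feature-extraction steps described above (n-grams, bag-of-words, and the normalized TF and IDF of Equations (1) and (2)), the following is a minimal sketch rather than the implementation used in the study. The three-review corpus is hypothetical, whitespace tokenization is a simplification, the natural logarithm is an assumption because the base is not stated, and the TF-IDF weight is taken as the usual product of the two terms.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(text, max_n=2):
    """Count every 1-gram up to max_n-gram in a whitespace-tokenized text."""
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
    return counts


def normalized_tf(term, tokens):
    """Equation (1): occurrences of the term divided by the document length."""
    return tokens.count(term) / len(tokens)


def idf(term, documents):
    """Equation (2): log(total documents / documents containing the term).

    The natural log is an assumption; the base is not stated in the text.
    Returning 0.0 for a term found in no document is also an assumption.
    """
    containing = sum(1 for doc in documents if term in doc)
    return math.log(len(documents) / containing) if containing else 0.0


# Hypothetical cleaned review texts.
corpus = ["good phone good battery", "bad battery", "good value"]
documents = [text.lower().split() for text in corpus]

bow = bag_of_words(corpus[0])              # unigram and bigram counts
tf_good = normalized_tf("good", documents[0])
tfidf_good = tf_good * idf("good", documents)

print(bow)
print(tf_good, tfidf_good)
```

Including bigrams keeps some word order (for example "good battery"), which is the context-retaining advantage of n-grams noted above, at the cost of a larger feature vector.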

Dataset Used for the Performance Demonstration
The proposed approach is demonstrated and validated using the Amazon review data (2018) [62], which contain 233.1 million product reviews from May 1996 to October 2018. The metadata contain the following basic information for each reviewed product: the product package type (electronics/hardcover, etc.), size (small/large), color (black/white), and product image. The complete review data include the reviews for the following categories: (1) Amazon Fashion data; (2) All Beauty data; (3) Appliances reviews data; (4) Arts, Crafts and Sewing data; (5) Automotive reviews data; (6) Books reviews data; (7) CDs and Vinyl reviews data; (8) Cell Phones and Accessories reviews data; (9) Clothing Shoes and Jewelry reviews data; (10) Digital Music reviews data; (11) Electronics reviews data; (12) Gift Cards reviews data; (13) Grocery and Gourmet Food reviews data; (14) Home and Kitchen reviews data; (15) Industrial and Scientific reviews data; (16) Kindle Store reviews data; (17) Luxury Beauty reviews data; (18) Magazine Subscriptions reviews data; (19) Movies and TV reviews data; (20) Musical Instruments reviews data; (21) Office Products reviews data; (22) Patio, Lawn, and Garden reviews data; (23) Pet Supplies reviews data; (24) Prime Pantry reviews data; (25) Software reviews data; (26) Sports and Outdoors reviews data; (27) Tools and Home Improvement reviews data; (28) Toys and Games reviews data; and (29) Video Games reviews data.
In this study, the cellphone review dataset is used to demonstrate the results and the model performance. The review data related to cell phones are separated from the complete Amazon dataset and used in the rest of the study.

Case Study: Analyzing the Cellphone Market Status
To analyze the cellphone market status for a particular brand and/or product, the Amazon product reviews on cellphone data are used, which include information about the brand, rating, prices, categories, dateAdded, dateUpdated, keys, manufacturer, reviewer id, review text, ASINs (Amazon Standard Identification Numbers), title, URL, image, etc. First, the following analysis is performed for targeting and recommending a cell phone for project management: (1) identify the total number of available brands in the market; (2) analyze and identify the maximum and minimum rating of each brand; (3) analyze the number of reviews; (4) evaluate the MRR (ratio between maximum rating and review); and (5) identify the brands that have the maximum and the minimum number of reviews.
The data analysis carried out to learn more about the data and its properties is presented in Table 5 and Figures 3-18. The table gives the brands, the number of items/products per brand, the product rating information, and the total number of reviews for each brand. Figure 3 shows the percentage (%) rating of the available brands in the market, which indicates that Xiaomi cell phones have the highest percentage rating in the market. Figure 4 shows the average rating given by customers for each brand. Xiaomi has the highest average rating in comparison with other brands, because people like it for its low price and high feature value and because it is a very popular brand. The average number of reviews for each brand is shown in Figure 5: Google has the highest average number of reviews per product, because people write a lot of reviews for Google products. Moreover, Figure 6 shows the brand-wise and rating-wise number of products in the market for review, whereas Figure 7 shows the brand-wise and rating-wise total number of reviews of products in the market. Samsung has launched the highest number of products in the market and hence has received the highest number of customer reviews, followed by the Apple brand. Figure 8 shows that the product with the highest number of reviews (984) is a Google cell phone (Google Pixel XL, Quite Black 32 GB), followed by the Samsung Galaxy S4 (White 16 GB) with 980 reviews.
The MRR value represents the significance/importance of a brand/product, that is, how strongly it is preferred by customers in the market. Figures 9 and 10 present the MRR analysis of the products and brands, respectively. Figure 9 shows the product-wise MRR value as per product rating for all cellphones in the market, whereas Figure 10 shows the brand-wise MRR value as per product rating for all cellphones in the market.
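The brand-level quantities discussed above (average rating per brand, number of products per brand, and average number of reviews per product) amount to a group-by aggregation. The sketch below shows one way to compute them; the `(brand, asin, rating)` triples are hypothetical and chosen only for illustration, not taken from the dataset.

```python
from collections import defaultdict

# Hypothetical (brand, asin, rating) triples; the values are illustrative only.
rows = [
    ("Samsung", "A1", 5), ("Samsung", "A1", 3), ("Samsung", "A2", 4),
    ("Google", "C1", 5), ("Google", "C1", 4), ("Google", "C1", 5),
    ("Xiaomi", "D1", 5),
]

# Group the ratings and the distinct product identifiers by brand.
ratings_by_brand = defaultdict(list)
products_by_brand = defaultdict(set)
for brand, asin, rating in rows:
    ratings_by_brand[brand].append(rating)
    products_by_brand[brand].add(asin)

# Per-brand summary: average rating, product count, review count,
# and average number of reviews per product.
summary = {}
for brand, ratings in ratings_by_brand.items():
    n_products = len(products_by_brand[brand])
    summary[brand] = {
        "avg_rating": sum(ratings) / len(ratings),
        "n_products": n_products,
        "n_reviews": len(ratings),
        "avg_reviews_per_product": len(ratings) / n_products,
    }

best_rated = max(summary, key=lambda b: summary[b]["avg_rating"])
most_reviewed_per_product = max(
    summary, key=lambda b: summary[b]["avg_reviews_per_product"]
)

print(summary)
print(best_rated, most_reviewed_per_product)
```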
The word-cloud analysis of the CCI-based review data is very important: it gives a graphical representation of the words used in the review dataset according to their importance level. The larger a word appears in the plot, the higher its importance, and vice versa. The word-cloud plot of the total number of reviews received per cellphone brand is presented in Figure 11, which shows that Samsung has the highest number of reviews. Similarly, Figure 12 shows the word-cloud plot of the brands and their associated total number of items, which shows that Samsung has the highest number of products in the market. Moreover, Figure 13 shows the word-cloud plot for all products (available in the dataset) and their total number of reviews. The product B00HWEJJSQ (ASIN) has the highest number of reviews provided by the customers. The word-cloud plots as per the rating of the product are presented in Figures 14-18 for the words with review scores one to five, respectively. These figures show the exact words used by the customers in the review text. For example, the phrase "One Star" is used most often by customers reviewing a product with a rating of "1". Similarly, the phrase "Two Stars" is used again and again in the review text of products with a rating of "2". Generally, products rated "1", "2", and sometimes "3" have negative reviews. The words used in these reviews can be seen in the word-cloud plots in Figures 14-16. Apart from the highlighted phrases "One Star", "Two Stars", and "Three Stars", there are several keywords (i.e., not unlocked, not new, disappointed, do not buy, avoid, junk, waste of money, defective, not good, terrible, bad battery, bad phone, poor battery life, charging issue, do not recommended, battery life, not great, etc.) that represent the negative responses of the customers (as shown in the word-cloud plots in Figures 14-16).
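A word-cloud plot is driven by word frequencies, so the per-rating analysis described above can be approximated by counting tokens for each review score. The sketch below is an illustration under stated assumptions: the `(rating, text)` samples and the tiny stop-word list are hypothetical, and the regular-expression tokenizer (which also strips special characters) is a simplification of the pre-processing used in the study.

```python
import re
from collections import Counter, defaultdict

# Tiny illustrative stop-word list (an assumption, not the list used in the study).
STOPWORDS = {"the", "a", "is", "it", "of", "and", "to", "this"}


def tokenize(text):
    """Lower-case the text, keep alphabetic tokens only, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


# Hypothetical (rating, review text) pairs.
samples = [
    (1, "Junk, waste of money. Battery is terrible."),
    (1, "Terrible phone, do not buy."),
    (5, "Great phone, great value!"),
    (5, "Love it, excellent battery life."),
]

# One frequency table per review score; these counts set the word sizes in a cloud.
freq_by_rating = defaultdict(Counter)
for rating, text in samples:
    freq_by_rating[rating].update(tokenize(text))

top_negative = freq_by_rating[1].most_common(1)[0]
top_positive = freq_by_rating[5].most_common(1)[0]

print(top_negative, top_positive)
```

The resulting counters could be passed to a plotting library to draw the clouds; that step is omitted here to keep the sketch self-contained.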
For products with good responses and ratings, the word-cloud plots are presented in Figures 17 and 18 for review scores 4 and 5, respectively. The most frequently used phrases are "Four Stars" and "Five Stars", followed by several positive words such as good phone, great, OK, I like, good value, love it, battery life, excellent, satisfied, great value, price, decent, great deal, value for money, best phone, good product, best phone ever, better than expected, worth of money, works good, works great, wow, etc. (as shown in the word-cloud plots in Figures 17 and 18).
In summary, the following points need to be highlighted for the product review data analysis:
- Seven brands (Apple, ASUS, Google, Motorola, Samsung, Sony, and Xiaomi) have a maximum rating of 5.
- The brand that has the maximum number of reviews is Samsung with 41,660.
- The Samsung brand has the maximum number of items in the market with 397.
- The maximum value of the "rating/review" ratio indicates a good brand that can be recommended. The following brands have maximum ratios: Samsung, Sony, Xiaomi, ASUS, Motorola, and Google.
In the machine learning-based analysis of the CCI customer reviews for a product/brand, the NN model is implemented to identify the positive and negative responses and/or recommendations from the keywords/words available in the review text dataset. The NN model developed for classifying the positive/negative responses is shown in Figure 2, and its validation analysis for the training and testing phases is presented in Figures 19-27. Figure 19 shows the training error with respect to (w.r.t.) the number of epochs. An epoch is one pass of NN training over the complete training dataset. In this case study, the total number of epochs is 142. Most of the time, the error decreases as the number of epochs increases, but it may also increase due to overfitting of the model. In this case, the error decreases and reaches the best performance of 0.015816 at epoch 136.
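The gap between the best epoch (136) and the last epoch (142) is consistent with a stopping rule that halts training after a fixed number of epochs without improvement. The text does not state which rule was used, so the following sketch is an assumption: it shows a generic early-stopping check with a hypothetical `patience` of 6 applied to a made-up error curve.

```python
def early_stopping(val_errors, patience=6):
    """Return (best_epoch, best_error, stop_epoch), using 1-based epochs.

    Training stops once the error has failed to improve on its best value
    for `patience` consecutive epochs, or when the error list is exhausted.
    """
    best_epoch, best_error, fails = 0, float("inf"), 0
    for epoch, err in enumerate(val_errors, start=1):
        if err < best_error:
            # New best value: remember it and reset the failure counter.
            best_epoch, best_error, fails = epoch, err, 0
        else:
            fails += 1
            if fails >= patience:
                return best_epoch, best_error, epoch
    return best_epoch, best_error, len(val_errors)


# Hypothetical error curve: improves until epoch 3, then drifts upward.
errors = [0.5, 0.3, 0.2, 0.21, 0.22, 0.25, 0.23, 0.24, 0.26, 0.1]
best_epoch, best_error, stop_epoch = early_stopping(errors, patience=6)
print(best_epoch, best_error, stop_epoch)
```

In this toy curve the last value (0.1) is never reached because training has already stopped, which is the trade-off such a rule makes to guard against overfitting.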
The error histogram plots the error between the true and predicted values after training the model, which indicates the level of difference between them. The error histogram for the training phase of the NN model is shown in Figure 20 with 20 bins, where the bins are the bars used in the histogram to represent the error values. In this case study, most of the data samples (on the y-axis) lie near the zero-error line (orange color) on the x-axis, which indicates high-performance validation of the model.
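The construction of such an error histogram can be sketched as follows. This is a minimal illustration, assuming that the error is the difference between target and output and that the bins have equal width between the smallest and largest error; the target and output values are hypothetical.

```python
def error_histogram(targets, outputs, n_bins=20):
    """Bin the errors (target minus output) into n_bins equal-width bins.

    Returns (counts, edges), where edges has n_bins + 1 entries.
    """
    errors = [t - y for t, y in zip(targets, outputs)]
    lo, hi = min(errors), max(errors)
    # Fall back to a unit width when all errors are identical.
    width = (hi - lo) / n_bins or 1.0
    counts = [0] * n_bins
    for e in errors:
        # The maximum error falls on the last edge, so clamp it into the last bin.
        idx = min(int((e - lo) / width), n_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return counts, edges


# Hypothetical targets (1 = positive, 0 = negative) and network outputs.
targets = [1, 0, 1, 0, 1]
outputs = [0.9, 0.1, 0.95, 0.05, 0.2]
counts, edges = error_histogram(targets, outputs, n_bins=20)
print(counts)
```

A model that fits well concentrates the counts in the bins around zero error, while isolated counts in the outer bins, such as the last sample here, mark misclassified cases.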
The predicted results from the NN model are summarized in Figure 21 in the form of a confusion matrix obtained during the training phase of the model. In this figure, the numbers of incorrect classifications (IC) and correct classifications (CC) are summarized along with the false classifications of each class (evaluated by Equation (11)). In this case study, the overall classification accuracy of the model is 98.8% during the training phase. Moreover, the training-state plots show the training statistics of the model: Figures 22 and 23 show the gradient search-based learning curve and the validation checks, respectively.
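The relation between a confusion matrix and the overall accuracy can be sketched as below. The sketch assumes that Equation (11) is the usual ratio of correct classifications to all classifications, expressed as a percentage; the labels and predictions are hypothetical.

```python
def confusion_matrix(y_true, y_pred):
    """Count TP, TN, FP, and FN for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    return {
        "TP": sum(1 for t, p in pairs if t == 1 and p == 1),
        "TN": sum(1 for t, p in pairs if t == 0 and p == 0),
        "FP": sum(1 for t, p in pairs if t == 0 and p == 1),
        "FN": sum(1 for t, p in pairs if t == 1 and p == 0),
    }


def accuracy(cm):
    """Overall accuracy in percent: correct classifications over all samples."""
    correct = cm["TP"] + cm["TN"]
    total = sum(cm.values())
    return 100.0 * correct / total


# Hypothetical true labels and NN predictions.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
cm = confusion_matrix(y_true, y_pred)
print(cm, accuracy(cm))
```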
The receiver operating characteristic (ROC) curve is another way to measure the performance of the model and provides a second level of validation; it plots the true-positive rate (TPR) against the false-positive rate (FPR) for each class (as shown in Figure 24 for the training phase). Figure 24 shows the high identification accuracy for the review data analysis. The AUC (area under the ROC curve) summarizes the classification performance over all classification thresholds.
Finally, the performance plots for the testing phase are presented in Figures 25-27 for the confusion matrix, the ROC curve, and the error histogram, respectively, which show a high identification accuracy of 98.4% with a ROC of 1. The error histogram shows that most of the test data lie near the zero-error line (shown in orange), which means the model has high prediction accuracy in the testing phase. Hence, the developed model may be used to analyze the recommendation for other products in the market.
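To make the ROC and AUC description concrete, the sketch below sweeps the decision threshold over a set of classifier scores, collects the (FPR, TPR) points, and integrates them with the trapezoidal rule. The labels and scores are hypothetical, and the sketch assumes that both classes are present in the data.

```python
def roc_points(y_true, scores):
    """Return the (FPR, TPR) points obtained by lowering the decision threshold.

    Assumes binary labels (1 = positive, 0 = negative) with both classes present.
    """
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Hypothetical labels and classifier scores.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
points = roc_points(y_true, scores)
area = auc(points)
print(points, area)
```

An area of 1 corresponds to scores that rank every positive review above every negative one, while 0.5 corresponds to a random ranking.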

Conclusions
Market research and strategy for a product launch are conventionally based on customer interviews, call-center data, social media posts, the targeted society, and its needs. In this study, a machine learning approach for targeting and recommending a product for project management in the market has been proposed using customer-created information (CCI) such as online reviews. The performance of the proposed approach is validated and demonstrated using a real-market review dataset of the cell phone industry, obtained from the Amazon review dataset 2018. The results show high accuracies of 98.8% and 98.4% during the training and testing phases of the proposed model, respectively. From the analysis, the following outcomes have been identified for targeting and recommending a cellphone product in the market: (1) seven brands (Apple, ASUS, Google, Motorola, Samsung, Sony, and Xiaomi) have a maximum rating of 5; (2) four brands (Apple, Motorola, OnePlus, and Samsung) have a minimum rating of 1; (3) a Google cellphone (Google Pixel XL, Quite Black 32 GB) has the maximum number of reviews with 984; (4) the brand with the maximum number of reviews is Samsung with 41,660; (5) the Samsung brand has the maximum number of items in the market with 397; and (6) the maximum MRR value indicates a good brand that can be recommended to the customer. The following brands have maximum MRR: Samsung, Sony, Xiaomi, ASUS, Motorola, and Google.
In the future, the proposed approach can be extended to targeting and recommending other products for project management, such as fashion and beauty products, appliances, arts, crafts and sewing, automotive, books, CDs and vinyl, clothing, shoes and jewelry, digital music, electronics, gift cards, grocery and gourmet food, home and kitchen, industrial and scientific, Kindle Store, magazine subscriptions, movies and TV, musical instruments, office products, patio, lawn and garden, pet supplies, Prime Pantry, software, sports and outdoors, tools and home improvement, toys and games, video games, etc.