A Generalized Flow for B2B Sales Predictive Modeling: An Azure Machine Learning Approach

Predicting the outcome of sales opportunities is core to successful business management and revenue forecasting. Conventionally, this prediction has relied mostly on subjective human evaluations in the process of business-to-business (B2B) sales decision making. Here, we propose a practical Machine Learning (ML) workflow to empower B2B sales outcome (win/lose) prediction within a cloud-based computing platform: Microsoft Azure Machine Learning Service (Azure ML). This workflow consists of two pipelines: 1) an ML pipeline that trains probabilistic predictive models in parallel on closed sales opportunities data, enhanced with an extensive feature engineering procedure, for automated selection and parameterization of an optimal ML model, and 2) a Prediction pipeline that uses the optimal ML model to estimate the likelihood of winning new sales opportunities and to predict their outcome using optimized decision boundaries. The performance of the proposed workflow was evaluated on a real sales dataset of a B2B consulting firm.


I. INTRODUCTION
In Business to Business (B2B) commerce, companies compete to win high-valued sales opportunities to maximize their profitability. In this regard, a key factor for maintaining a successful B2B business is the task of determining the outcome of sales opportunities. The B2B sales process typically demands significant costs and resources and, hence, requires careful evaluation. As a result, quantifying the likelihood of winning sales opportunities at the early stages is an important basis for appropriate resource allocation, both to avoid wasting resources and to sustain the company's financial objectives [1].
Conventionally, outcome prediction is carried out relying on subjective human ratings [2]. Most Customer Relationship Management (CRM) systems allow salespersons to rate the probability of winning new opportunities manually [3]. This probability is then used as a metric to calculate weighted revenue on the opportunity records. Often each salesperson develops a non-systematic rating intuition with little to no quantitative rationale, neglecting the complexity of the business environment's dynamics [4]. Moreover, sales opportunities are often intentionally underrated to avoid internal competition with other sellers, or overrated to relieve pressure from sales management to maintain high performance [5].
Even though the abundance of data and improvements in statistical and Machine Learning (ML) techniques have led to significant enhancements in data-driven decision-making, the literature on B2B sales outcome prediction is scarce. Yan et al. explored predicting win-propensity for sales opportunities using a dynamic clustering technique [5]. Their approach allows for online assessment of opportunities' win rates; however, it relies heavily on regular inputs and updates in the CRM profiles. This does not appear to be a robust source of data, considering that each salesperson often handles multiple opportunities in parallel and puts less effort into making frequent updates to each opportunity record [6].
In a work by Tang et al., winning probability was estimated using a hybrid ML model trained on snapshots of historical data [4]. However, their goal of standardizing this paradigm across multiple companies precluded extensive variable selection and feature engineering in their solution. On top of that, reliance on historical snapshots of data entails expensive computations and requires companies to modify their data collection strategies. Overall, despite some theoretical work in this context, the literature lacks a practical approach with a concrete business implementation.
Here, we propose a thorough workflow for predicting the outcome of B2B sales opportunities by converting this problem into a binary classification framework. In our workflow, first, an ML pipeline extracts, cleans, and imputes sales opportunities data and then extensively trains various types of ML classification models on the data. After optimally parametrizing each model, the ML pipeline eventually outputs a voting ensemble classifier composed of these models. In addition, this pipeline enhances the data using a comprehensive feature engineering step built on statistical analyses of historical data from selected categorical attributes of sales opportunities.
A second Prediction pipeline makes use of the ML model to estimate the likelihood of winning a given sales opportunity. Importantly, this pipeline also includes a statistical analysis step that specifies appropriate decision boundaries based on industry and monetary value segmentation of the sales opportunities. This helps to maximize the interpretability of the ML model's predictions and to increase their credibility and reliability.
To demonstrate the usability of our workflow, it was implemented and deployed to a real B2B consulting firm's sales pipeline using the Azure Machine Learning Service (Azure ML) cloud-based platform. Such a cloud-based workflow allows for a more scalable solution that readily integrates into the existing CRM software applications within each enterprise. Finally, the performance of our solution was evaluated not only in terms of standard statistical measurements (prediction accuracy, AUC, etc.) but also with reference to financial measurements.

A. Data
In this work, we used sales opportunity data extracted from a global multi-business B2B consulting firm's CRM data in three main business segments: Healthcare, Energy, and Financial Services. The data consisted of a total of 25578 closed opportunity records (Fig 1A), of which ~58% were labeled as "Won" in their status record (Fig 1B). The raw CRM dataset contained 20 relevant variables (features) for each opportunity. These features are categorized according to their data type in Table 1. Once a profile is created in the CRM system for any new opportunity, users are required to enter an estimate of the probability of winning that opportunity. User-entered probabilities in the dataset took the discrete values 0, 0.25, 0.5, 0.75, and 0.99. The "Status" of an opportunity takes one of the following values: "Open" for new opportunities, and "Won" or "Lost" for closed opportunities.
First, in order to clean the data, any record with a missing "Status" was dropped. Missing values in each of the other features were inferred and imputed using an appropriate ML model trained on the rest of the features (XGBoost Regressor for continuous features, and XGBoost Classifier for categorical features) [7]. Since most of the selected features were mandatory when creating a CRM profile for an opportunity, less than 1% of the whole dataset contained missing values.

B. Feature Engineering
To enhance the dataset, additional relevant features were calculated and added to the dataset. These additional engineered features were quantified based on three categorical features in the dataset: Sales Lead, Account and Account Location. Feature Engineering was conducted by a simple statistical analysis of these target features.
For each unique value in the target features, a history of the total number of opportunities and the total numbers of won and lost opportunities was calculated. The win rate was then determined as the ratio of won to total opportunities. Next, Total Contract Value was averaged across won opportunities for each unique feature value to record the mean contract value of won opportunities for individual accounts, sales leads, and locations. To capture the variability in the Total Contract Value of won opportunities, the coefficient of variation was also calculated.
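The per-value statistics described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (`sales_lead`, `status`, `total_contract_value`), the helper `build_lookup`, and the use of the population standard deviation for the coefficient of variation are hypothetical choices, not details taken from the original implementation.

```python
from collections import defaultdict
from statistics import mean, pstdev


def build_lookup(opportunities, key):
    """Aggregate win/loss history per unique value of one categorical feature.

    Assumes every record is a closed opportunity whose status is either
    "Won" or "Lost" (hypothetical field names).
    """
    groups = defaultdict(list)
    for opp in opportunities:
        groups[opp[key]].append(opp)

    lookup = {}
    for value, rows in groups.items():
        won_values = [r["total_contract_value"] for r in rows if r["status"] == "Won"]
        total = len(rows)
        won = len(won_values)
        mean_won = mean(won_values) if won_values else 0.0
        # Population standard deviation is an assumption; the source does not say.
        std_won = pstdev(won_values) if won_values else 0.0
        lookup[value] = {
            "total": total,
            "won": won,
            "lost": total - won,
            "win_rate": won / total,
            "mean_won_value": mean_won,
            # Coefficient of variation = standard deviation / mean.
            "cv_won_value": std_won / mean_won if mean_won else 0.0,
        }
    return lookup


# Toy closed opportunities (fabricated purely for illustration).
closed = [
    {"sales_lead": "A", "status": "Won", "total_contract_value": 100.0},
    {"sales_lead": "A", "status": "Won", "total_contract_value": 300.0},
    {"sales_lead": "A", "status": "Lost", "total_contract_value": 500.0},
    {"sales_lead": "B", "status": "Lost", "total_contract_value": 50.0},
]
sales_lead_lookup = build_lookup(closed, "sales_lead")
```

The same function could be called with a different `key` to build the lookups for accounts and account locations; values with no won opportunities fall back to zeros here, which is one of several reasonable conventions.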
The aforementioned statistics were collected and stored in feature engineering lookup dictionaries for Accounts, Sales Leads, and Account Locations (Fig 2 is an example of the lookup dictionary for Sales Lead). In the last step, the Mahalanobis distance between the Total Contract Value of each opportunity and that of the won opportunities was computed for each unique value of the target features, to quantify how far the contract value of an opportunity is from the value distribution of previously won opportunities. Including the engineered features enhanced the dataset and increased the total number of features to 47 for each opportunity (20 features originally from the raw CRM data + 9×3 = 27 engineered features based on the target features).
The final dataset (25578 opportunities) was randomly partitioned into a Training set (70%) and a Testing set (30%). The training set was used to train ML models with a 10-fold cross-validation technique, and the testing set was used to report the testing performance of the trained models. A third Validation set (846 opportunities) was also collected over a period of 3 months after the proposed framework was deployed to the sales pipeline, for further evaluation of the workflow's performance.
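For a single variable such as Total Contract Value, the Mahalanobis distance mentioned above reduces to the absolute deviation from the mean divided by the standard deviation. A minimal sketch follows; the function name `mahalanobis_1d`, the use of the population standard deviation, and the handling of a zero-variance reference set are assumptions made for illustration.

```python
from statistics import mean, pstdev


def mahalanobis_1d(value, reference_values):
    """One-dimensional Mahalanobis distance: |x - mean| / standard deviation.

    `reference_values` stands for the contract values of previously won
    opportunities sharing the same account, sales lead, or location.
    """
    mu = mean(reference_values)
    sigma = pstdev(reference_values)  # population std is an assumption
    if sigma == 0:
        # Degenerate reference distribution: distance is 0 only at the mean.
        return 0.0 if value == mu else float("inf")
    return abs(value - mu) / sigma


# Toy example: won contract values of 100, 200, 300 and a new opportunity at 400.
distance = mahalanobis_1d(400.0, [100.0, 200.0, 300.0])
```

A large distance indicates that the opportunity's contract value is unusual relative to what has historically been won for that feature value.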

III. PROPOSED FLOW AND MODELING
Our approach for predicting the outcome of sales opportunities essentially converts the problem into a supervised binary classification paradigm. Our proposed workflow involves two main pipelines: an ML pipeline and a Prediction pipeline. A pipeline is defined as an executable workflow of data, encapsulated in a series of tasks (steps). All code was custom-written in Python using the Azure Machine Learning Service platform.

A. Machine Learning Pipeline
The main objective of the ML Pipeline was to train predictive models on the data. As illustrated in Fig 3A, the ML pipeline used raw CRM data of all closed opportunities (either won or lost). In the first step, the raw dataset was cleaned, and missing values were imputed using appropriate inference techniques. Next, the dataset was enhanced by adding the engineered features appropriately for each opportunity. For training ML models, under the supervised classification paradigm, features and class labels were extracted from the enhanced CRM dataset for each closed opportunity. All features in the raw CRM dataset were selected except for Probability (user-entered probability) to avoid biasing models with probability estimations from users. Also, Status was used as the binary class labels (won = 1, lost = 0). At this point, the dataset was also partitioned into a training set and a testing set.
Probabilistic classification models, given the feature vector $X$ of an observation, output a conditional probability distribution over the class labels, $P(Y = y \mid X)$ for $y \in \mathcal{Y}$, where $\mathcal{Y} = \{0, 1\}$ in the binary case. This probability simply corresponds to the likelihood that an observation belongs to one of the classes. The predicted class of an observation (here won/lost) can then be determined using the conditional probability of the model:

$$\hat{y} = \underset{y \in \mathcal{Y}}{\operatorname{argmax}} \; P(Y = y \mid X)$$
In other words, for a binary classification, the predicted class is the one with a probability of more than 50% assigned to it. We refer to this probability cutoff threshold as the naïve decision boundary. The performance of a classification model can be evaluated using the metrics defined in Table 2 [8]. For binary classifications, Receiver Operating Characteristic (ROC) curves are generated; the area under the ROC curve (AUC) quantifies the robustness of the classification (a higher AUC suggests more robust classification performance) [9].
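To make the naïve decision boundary and the AUC concrete, the sketch below classifies toy probabilities with a 50% cutoff and computes the AUC through its pairwise-ranking interpretation (the fraction of won/lost pairs in which the won opportunity receives the higher score, counting ties as one half). The function names and the toy data are hypothetical; a production workflow would normally rely on a library routine instead.

```python
def predict_class(p_won, boundary=0.5):
    """Return 1 (won) if the predicted probability exceeds the cutoff, else 0."""
    return 1 if p_won > boundary else 0


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy labels (1 = won, 0 = lost) and predicted winning probabilities.
labels = [1, 0, 1, 1, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2]
predictions = [predict_class(s) for s in scores]
auc = roc_auc(labels, scores)
```

The pairwise loop is quadratic in the number of observations, which is acceptable for a small illustration but not for a dataset of tens of thousands of records.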
For a comprehensive insight into the classification results, we also took a step towards measuring the prediction performance in monetary terms. In particular, we aggregated the total contract values of opportunities in the different classification scenarios (true positives, true negatives, false positives, and false negatives) and defined monetary metrics with a formulation similar to that of the statistical metrics (Table 2). In this regard, monetary precision is the fraction of the contract value of opportunities predicted as won that belongs to opportunities actually won. Likewise, monetary recall is the fraction of the contract value of actually won opportunities that is correctly identified as won.
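One way to read the monetary metrics is as the usual confusion-matrix metrics with every opportunity weighted by its contract value instead of counting as one. The following sketch assumes that reading; the function names and toy numbers are hypothetical, and the exact formulas of Table 2 may differ.

```python
def confusion_totals(labels, predictions, weights=None):
    """Sum weights per confusion-matrix cell (weight 1 gives plain counts)."""
    if weights is None:
        weights = [1.0] * len(labels)
    totals = {"tp": 0.0, "tn": 0.0, "fp": 0.0, "fn": 0.0}
    for y, y_hat, w in zip(labels, predictions, weights):
        if y == 1 and y_hat == 1:
            totals["tp"] += w
        elif y == 0 and y_hat == 0:
            totals["tn"] += w
        elif y == 0 and y_hat == 1:
            totals["fp"] += w
        else:  # actual won, predicted lost (labels assumed binary)
            totals["fn"] += w
    return totals


def metrics(t):
    """Accuracy, precision, and recall from (possibly weighted) cell totals."""
    predicted_won = t["tp"] + t["fp"]
    actual_won = t["tp"] + t["fn"]
    total = sum(t.values())
    return {
        "accuracy": (t["tp"] + t["tn"]) / total if total else 0.0,
        "precision": t["tp"] / predicted_won if predicted_won else 0.0,
        "recall": t["tp"] / actual_won if actual_won else 0.0,
    }


# Toy data: 1 = won, 0 = lost; contract values are illustrative only.
labels = [1, 1, 0, 0]
predictions = [1, 0, 1, 0]
contract_values = [400.0, 100.0, 200.0, 300.0]

statistical = metrics(confusion_totals(labels, predictions))
monetary = metrics(confusion_totals(labels, predictions, contract_values))
```

In this toy case the statistical accuracy is 0.5 while the monetary accuracy is 0.7, because the correctly classified opportunities carry more contract value than the misclassified ones.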

Table 2. Statistical and Monetary Performance Metrics
In order to train classification models on the data, we used the Automated Machine Learning (AutoML) step of Azure ML. In this step, multiple iterations of various types of ML models are trained in parallel and optimally parameterized based on their classification performance [10]. We limited the models to a total of 35 iterations of XGBoost and LightGBM classification models [7]. The training accuracy of the models was calculated based on a 10-fold cross-validation technique. All models were grouped in a Voting Ensemble [11]. A voting ensemble (Fig 4) outputs a soft-voting linear combination of the individual models' probability predictions, weighted based on their accuracy.
The Azure ML platform supports deploying ML models as web services on Azure Kubernetes Service (AKS) [12]. AKS enables a request-response service with low latency and high scalability, which makes it suitable for production-level deployments. In the last step of the ML pipeline, the best performing model in terms of accuracy (the voting ensemble classification model) was deployed as a web service to an AKS cluster.
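As a rough illustration of the soft-voting combination performed by the voting ensemble described earlier in this subsection, the sketch below averages the individual models' winning probabilities with weights proportional to each model's accuracy. The proportional weighting rule, the function name `soft_vote`, and the numbers are assumptions; the weights actually produced by Azure ML's voting ensemble may be derived differently.

```python
def soft_vote(probabilities, accuracies):
    """Accuracy-weighted average of the models' predicted winning probabilities.

    Weights are normalized to sum to 1 (an assumed weighting scheme).
    """
    if len(probabilities) != len(accuracies):
        raise ValueError("need one accuracy per model prediction")
    total = sum(accuracies)
    weights = [a / total for a in accuracies]
    return sum(w * p for w, p in zip(weights, probabilities))


# Three hypothetical models scoring the same opportunity.
model_probabilities = [0.9, 0.6, 0.7]
model_accuracies = [0.80, 0.82, 0.78]
ensemble_probability = soft_vote(model_probabilities, model_accuracies)
```

Because the weights sum to 1, the ensemble output stays within the range spanned by the individual predictions and can still be read as a probability.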

B. Prediction Pipeline
The Prediction Pipeline was designed to predict the winning probability of new opportunities using the classification model found in the ML pipeline. As shown in Fig 3B, CRM data for all opportunities was uploaded to this pipeline. The opportunities were filtered based on their "Status" into open (new opportunities) and closed (either won or lost) categories. Probability predictions were generated for open opportunities, and closed opportunities were used to modify the decision boundaries.
First, the data went through a cleaning process similar to that of the ML pipeline. Afterwards, the feature engineering lookup dictionaries created in the ML pipeline were used to enhance the CRM data. Note that this step also ensured the data was transformed into a format consistent with the data used to train the ML models. The voting ensemble model deployed in the ML pipeline was used to make predictions on the open opportunities. Specifically, the model calculated the probability that an open opportunity belonged to the class of won opportunities. We directly used this probability to infer the likelihood of winning the new opportunities. The probability predicted by the ML model, although informative, required further analysis to support a conclusive decision.
In order to maximize the interpretability of predicted probabilities, the Prediction pipeline generated optimal decision boundaries based on business segments and total contract values of all closed opportunities. For this, we split the total contract value distribution of each business segment's closed opportunities into four quantiles (4 equal-sized groups). For each contract value quantile, we found the cutoff probability decision boundary that maximized the precision of the ML predictions for that quantile (as defined in Table 2). A total of 12 decision boundaries were calculated based on business segments and contract values (4 quantiles × 3 business segments). These modified decision boundaries were used instead of the naïve boundary (50% probability cutoff) as reference points to predict the final Status.
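The boundary search described above can be sketched as a grid search over candidate cutoffs within each segment and contract-value quartile. Everything specific here is an assumption for illustration: the record fields (`segment`, `value`, `p_won`, `won`), the candidate grid, breaking ties in favor of the lowest cutoff, and falling back to 0.5 for an empty group.

```python
from bisect import bisect_left
from collections import defaultdict
from statistics import quantiles


def precision_at(cutoff, rows):
    """Precision of 'won' predictions when p_won > cutoff means predicted won."""
    predicted_won = [r for r in rows if r["p_won"] > cutoff]
    if not predicted_won:
        return 0.0
    return sum(r["won"] for r in predicted_won) / len(predicted_won)


def decision_boundaries(closed, candidates):
    """Best cutoff per (segment, quartile index 0..3) on closed opportunities."""
    by_segment = defaultdict(list)
    for r in closed:
        by_segment[r["segment"]].append(r)

    boundaries = {}
    for segment, rows in by_segment.items():
        # Three cut points split contract values into four groups
        # (statistics.quantiles needs at least two records per segment).
        edges = quantiles([r["value"] for r in rows], n=4)
        groups = defaultdict(list)
        for r in rows:
            groups[bisect_left(edges, r["value"])].append(r)
        for q in range(4):
            if groups[q]:
                # max() keeps the first maximum, i.e. the lowest tied cutoff.
                best = max(candidates, key=lambda c: precision_at(c, groups[q]))
            else:
                best = 0.5  # fall back to the naive boundary
            boundaries[(segment, q)] = best
    return boundaries


# Toy closed opportunities for one segment (values 1..8, two per quartile).
toy = [
    ("Energy", 1, 0.90, 1), ("Energy", 2, 0.70, 0),
    ("Energy", 3, 0.60, 1), ("Energy", 4, 0.90, 1),
    ("Energy", 5, 0.60, 0), ("Energy", 6, 0.55, 1),
    ("Energy", 7, 0.40, 1), ("Energy", 8, 0.20, 0),
]
closed = [{"segment": s, "value": v, "p_won": p, "won": w} for s, v, p, w in toy]
boundaries = decision_boundaries(closed, candidates=[0.3, 0.5, 0.8])
```

Maximizing precision alone tends to push cutoffs upward at the cost of recall, which is why the sketch prefers the lowest cutoff among ties; with three segments in the data, the returned dictionary would hold the 12 boundaries mentioned in the text.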
The Azure ML platform is capable of scheduling automatic pipeline runs [13]. The ML pipeline was scheduled for a weekly rerun in order to retrain the ML models on an updated CRM dataset containing an additional week of newly closed opportunities. This also kept the feature engineering lookup tables updated according to the most recent information. The Prediction pipeline was scheduled for a daily rerun to calculate and store predictions for new opportunities and to update the cutoff decision boundaries.

IV. RESULTS
This section gives an overview of the performance of our proposed workflow in predicting the outcome of sales opportunities, using statistical metrics such as accuracy, F1 score, and ROC curves. In addition, the performance was measured in terms of the opportunities' contract values (Table 2). Finally, ML predictions were compared to the user-entered predictions.

A. Model Training Results
A total number of 35 iterations of XGBoost and LightGBM classification models were trained individually on the data and then combined in a voting ensemble model based on their training accuracy. The voting ensemble model's training accuracy based on a 10-fold cross-validation was equal to 0.82. Further performance metrics are summarized in Fig 5. Note that while training the models, the classification cutoff threshold was set to the naïve decision boundary (50% probability).

B. Modified Decision Boundaries
To predict the final status (win/lose) of an open opportunity, the ML model's predicted probability needs to be compared to a reference decision boundary. We tailored the optimal decision boundary to two features: business segment and contract value. Modified decision boundaries for the contract value quantiles of the healthcare, energy, and finance segments are shown in Fig 6. Interestingly, the cutoff probabilities decrease for opportunities with higher contract values, implying more optimistic decision making for more profitable contracts. Detailed distribution plots of the predicted probabilities for all closed opportunities are shown in Fig S1.

C. Testing Results
The voting ensemble model's predicted probabilities were used together with the modified decision boundaries on the testing dataset. The ML workflow's performance on the testing set was evaluated using appropriate statistical metrics and then compared to the user-entered predictions. Note that the testing set was not used for model training. The workflow's accuracy ranged from 0.82 to 0.87 across business segments. The overall accuracy of the model (0.85) was considerably higher than that of the user-entered predictions (0.67). All metrics are summarized in Table 3.
The monetary accuracy of our workflow on the testing set (0.90) also exceeded that of the manual predictions (0.74). This means the probabilities estimated by the ML workflow not only predict the outcome (win/lose) of opportunities more accurately but also lead to more profitable decisions.

D. Validation Results
A second comparison between the ML workflow and user-entered predictions, on the validation dataset collected over three months, demonstrated the effectiveness of the proposed workflow. All performance metrics were calculated based on the first snapshot of the workflow's prediction for new opportunities and their final status after being labeled as closed (that is, for any new opportunity, the model's prediction was stored before the opportunity was closed and used as training data for the model). The ML workflow retained a higher classification accuracy on the validation set (0.83) compared to user-entered predictions (0.63), while achieving a comparable monetary accuracy.

V. CONCLUSION
In this paper, we proposed a novel machine learning workflow for predicting the winning probability of sales opportunities, implemented in a cloud computing environment. With our approach, sales opportunity data is cleansed, enhanced, and used to train a probabilistic ML classification model. This model is then used to predict the winning probability of new sales opportunities. In addition, optimal decision boundaries are calculated to predict the outcome of new opportunities.
This workflow was evaluated after being deployed to a multi-business B2B consulting firm. The ML workflow resulted in a superior overall performance compared to manual predictions made by salespersons. The proposed workflow combines the cloud-computing platform with ML algorithms for sales outcome prediction which makes it straightforward to integrate into existing sales pipelines.
It is worth mentioning that although data-driven prediction of sales outcomes is more concrete than subjective estimation, it should not entirely override sensible or justifiable judgments regarding a sales opportunity. A data-driven approach, such as our workflow, can provide a reliable reference point for further assessment of the feasibility of a sales opportunity.