Geo-Marketing Segmentation with Deep Learning

Abstract: Spatial clustering is a fundamental instrument in modern geo-marketing. The complexity of handling high-dimensional and geo-referenced data in the context of distribution networks poses important challenges for marketers seeking to identify the right customer segments with useful pattern similarities. The increasing availability of geo-referenced data also places more pressure on the existing geo-marketing methods and makes it more difficult to detect hidden or non-linear relationships between the variables. In recent years, artificial neural networks have been established in different disciplines such as engineering, medical diagnosis, or finance to solve complex problems due to their high performance and accuracy. The purpose of this paper is to perform a market segmentation by using unsupervised deep learning with self-organizing maps in the B2B industrial automation market across the United States. The results of this study demonstrate a high clustering performance (4 × 4 neurons) as well as a significant dimensionality reduction by using self-organizing maps. The high level of visualization of the maps out of the initially unorganized data set allows a comprehensive interpretation of the different clusters and patterns across space. The centroids of the clusters have been identified as footprints for assigning new marketing channels to ensure a better market coverage.


Introduction
Marketing literature considers segmentation, targeting, and positioning (STP) as key pillars of all marketing strategies [1][2][3]. The purpose of market segmentation is to identify relatively homogeneous groups of consumers with similar consumption patterns. A market segment has four components: (1) it must be identifiable, (2) it must be economically reachable, (3) it is more homogeneous in its characteristics than the market as a whole, and (4) it is large enough to be profitable [4]. Customer demand should be considered as a basis for determining the channel structure. The targeting decision, when applied to channel design, entails a choice of whom not to pursue just as much as what segment to pursue. Targeting a channel segment means choosing to focus on that segment, with the goal of achieving significant sales and profits from selling to it [5]. Segmenting a market is not free. There are costs of performing the research, fielding surveys and focus groups, designing multiple packages, and designing multiple advertisements and communication messages. Inadequate segmentation and clustering could lead to missing a strategic marketing opportunity or not cashing in on the rewards of a tactical campaign [6]. From a practical perspective, a lack of well-defined clusters and clear responsibilities often leads to cannibalization and overlap in the sales territories. This can jeopardize the performance of the marketing channels and can often create severe channel conflicts.
Customer data usually contain geographic information which should be considered when clustering the data to create the customer segments and thus to decide where the marketing channels need to be located. Clustering of geo-referenced (or spatial) data has become more popular in geo-marketing; however, traditional clustering methods reveal various limitations in view of the increased requirements for accuracy and predictability. The more accurate and homogeneous the customer segments, the more successful differentiated targeting through the appropriate channels will be.
Collectively, geospatial data available from several sources have grown into petabytes and increase by terabytes in size every day [7]. The increase in the sources of data and their acquisition have been exponential as compared to the development of processing systems which can process the data in real time [8]. Earlier, storage of data was costly, and there was an absence of technology which could efficiently process the data. Now, the storage costs have become cheaper, and the availability of technology to transform Big Data has become real [9]. The volume of geospatial data at the global scale (e.g., at the petabyte-scale) exceeds the capacity of traditional computing technologies and analytical tools designed for the desktop era. The velocity of data acquisition (e.g., terabytes of satellite images a day and tens of thousands of geotagged tweets a minute) pushes the limits of traditional data storage and computing techniques [10].
Most commercially available GIS provide extended functionality to store, manipulate, and visualize geo-referenced data, but rely on the user's ability to perform exploratory data analysis. This approach is not feasible given the large amount and high dimensionality of geographic data, which demand integrated data mining technology [11]. The high dimensionality of a dataset can cause serious problems for most analysis methods. One typical problem to address is that it is unlikely for all variables to interrelate meaningfully. Most analysis methods limit or compress the potential hypothesis space by assuming a simple form of pattern, which can be configured with several parameters. For example, a regression analysis assumes a form of pattern (normally a linear form) and uses data to configure its parameters (e.g., coefficients) in relation to this form. However, the number of possible patterns, which can be of various forms, is practically infinite in a multivariate spatial dataset. Patterns can be linear or non-linear, spatial or non-spatial, with different configurations [12]. In the real world, we encounter problems that exhibit non-linear correlations among the variables; they require complex and non-linear models. To overcome this problem, a new technique called deep learning was proposed. This technique introduces non-linearity into the network [13]. The applicability of deep learning in business is a research field that intends to investigate different models that learn patterns from data in a supervised or unsupervised manner. The field presents high interest for practitioners as well as researchers. Deep learning relates to artificial neural networks with multiple hidden layers, convolutional neural networks, recurrent neural networks, self-organizing maps, Boltzmann machines, and autoencoders [14].
This paper investigates deep learning approaches to spatial clustering and identifies the unsupervised self-organizing map (SOM) to perform an optimized customer segmentation which should provide a comprehensive visualization of the different customer groups with similar patterns while considering their geo-location. Although some previous research papers applied the SOM algorithm to solve complex clustering tasks, its major advantage of comprehensive visualization and knowledge acquisition in geo-marketing has not been fully exploited. A key contribution of this research is the integration of a deep learning unsupervised approach into the channel management environment to enhance the visualization of homogeneous clusters and enable a deep learning approach for the customer segmentation problem. The empirical results of this research demonstrate the importance of using geo-marketing intelligence and visualization in the strategic decision making of manufacturers within the B2B industrial automation market in the US.

Industrial Market Segmentation
B2B markets are considerably more challenging than consumer markets and demand specific skills from marketers. Buyers, with a responsibility to their company and specialist product knowledge, are more demanding than the average consumer [15]. Buyers of B2B products and services often need to deliver a return on investment for their purchase, highlighting the more complex nature of B2B purchases [16]. In the industrial markets, often the same industrial products have multiple applications; likewise, several different products can be used in the same application. Customers differ greatly, and it is hard to discern which differences are important and which are trivial for developing a marketing strategy [17]. Segmenting industrial markets is different and more challenging because of greater complexity in buying processes, buying criteria, and the complexity of industrial products and services themselves [18].
Industrial market segmentation is a decision process that enables a firm to effectively allocate marketing resources to achieve business objectives. The decision process seeks to implement the major tenets of the marketing concept: to define an offering (products and services) that meets the needs of target buyers, while recognizing the behaviors of competitors and other stakeholders that define the market. While there are several decisions to be made in the process of segmentation, they revolve around the identification of groups of potential organizational buying centers that are similar within each group in their response to a marketing program and different between groups in their response [19]. The decision to use B2B market segmentation has three critical components: (1) the market uncertainties faced from the outcome of a situation analysis; (2) the importance of the marketing decisions contemplated; and (3) the organization's readiness to embrace segmentation [20].
The goal of every industrial market segmentation scheme is to identify the most significant differences among current and potential customers that will influence their purchase decisions or buying behavior, while keeping the scheme as simple as possible. This allows industrial marketers to differentiate their prices, programs, or solutions for maximum competitive advantage [21]. Marketers should evaluate a myriad of descriptive characteristics when identifying and selecting business market segments. The critical issues can be understood by analyzing geodemographic attributes or firmographics. These segmentation bases provide important decision-oriented insights about high-tech, industrial, and service markets [22]. Industrial market segmentation is currently based primarily on geographics and demographics [23]. However, this leaves industrial suppliers unsatisfied, since segmentation of the market into homogeneous groups regarding buying behavior has proved to be very difficult based on these criteria [21]. Industrial marketers can hardly be blamed for feeling that segmentation is very difficult for them. Not only has little been written on the subject as it affects industrial markets, but such analysis is also more complex than for consumer markets. The problem is to identify the best variables for segmenting industrial markets [17]. According to Webster, segmentation variables are customer characteristics that relate to some important difference in customer response to marketing effort [24]. The selection of segmentation variables should be based on such conditions as measurability, substantiality, accessibility, and actionability [25].
Johnson and Flodhammer [26] investigated how market segmentation is applied in Swedish industrial firms. They proposed a model for identifying variables to measure and group industrial customers in meaningful segments. In their study, they addressed two major problems that should be considered in industrial market strategy: (1) Is there a need for segmentation? If so, what conditions should be met? (2) How to identify segmentation variables useful and relevant to evaluating industrial markets?
The main conditions are summarized as follows: market segmentation is appropriate for industrial firms when one of the following conditions is met: the more heterogeneous the product assortment, the greater the need for market segmentation; the more the customers differ in buying strategy, the greater the need for segmentation; the more heterogeneous the market, the greater the need for segmentation; and the more the environment changes and the greater the search for expansion into market opportunities, the greater the reason for segmentation concepts. With regards to problem 2, five major criteria are proposed for identifying market segmentation variables: technological, economic, market, competition, and organizational characteristics. Finally, in their study, they suggest a model with segmentation variables suited to industrial markets. These variables include product/process, application (field use), branch (SIC), market size, customer location, buying process, buying center, previous relations of seller to buyer, and end-user (environment).
Plank demonstrated in his review of the industrial segmentation literature that there are three approaches for selecting segmentation bases: (1) unordered segmentation notions (a single segmentation dimension is chosen with no specific rationale for how it was selected), (2) two-step notions (such as macro-micro segmentation), or (3) a multistep approach (such as the nested approach) [27].
Wind and Cardoza indicated that, while the concept of marketing strategy differentiation is widely accepted among industrial firms, there is little evidence to suggest that firms do follow a conscious segmentation strategy to plan or control their marketing activities. They proposed a two-stage approach to industrial segmentation that consists of macrosegments and microsegments. They explained that macrosegments consist of key organizational characteristics such as size of the buying firm, SIC (Standard Industrial Classification) category, geographic location, and usage factors. The second stage involves dividing those macrosegments into microsegments, based on characteristics of decision-making units [28].
Bonoma and Shapiro [29] proposed general guidelines for segmenting industrial markets following a nested approach. Specifically, they distinguished five general categories of segmentation variables: (1) demographics (industry, company size, and customer location), (2) operating variables, (3) purchasing approaches, (4) situational factors, and (5) personal characteristics of the buyers.
The "Nested Approach" is still applicable to industrial markets today in the twenty-first century. It is still relevant, but changes are taking place [30].
Dibb and Simkin [31] argued that one of the major problems associated with segmentation in B2B markets is a failure of businesses to implement plans. They mentioned that there are three main reasons for this failure: (1) Infrastructure barriers: These concern the culture, structure, and resources which can prevent the segmentation process from starting or being completed successfully. (2) Process barriers: These barriers reflect a lack of experience, guidance, and expertise concerning the way in which segmentation is undertaken and managed. (3) Implementation barriers: These are practical barriers concerning a move to a new segmentation model.
However, implementation is not the final step, because the company still needs to react to dynamic internal and external changes. Little attention has been given to the strategy and implementation phases. There have not been many contributions in the area of dynamics, even though it is recognized as an important area [32]. According to Blocker and Flint, markets and market segments can be unstable over time, and there must be at least a conceptual understanding of this concern, if not the methodological rigor in tracking the existence of a segment structure of the market [33]. Wind and Thomas suggested using both an "interactive research approach" to measure changing responses to marketing stimuli and a "panel survey" to assess the changing segment structures regarding products [19].
When the segmentation is finished and the strategies are created, it is important to realize what segment factors are critical for the company's success, and they should be continually monitored. Some of the critical success factors are likely to be the criteria that were critical in the identification of the segments. Others could be critical assumptions made while developing the strategy, for example, the technology. Other critical success factors could include the customer needs, relationships, technological development, and competitor offerings and moves [32].
Business segmentation gained credibility and acceptance led by the research of Bonoma and Shapiro and of Plank. Despite this progress of more than 30 years, B2B segmentation is often misunderstood or poorly utilized by marketers [34]. With regards to the methods used for industrial market segmentation, the literature considers a wide variety of techniques. Statistical models have been widely used for customer clustering [35]. Descriptive methods such as logistic regression or cluster-based techniques face serious limitations. Kotras demonstrated in his study how predictive algorithms can increase the performance of customer segmentation [36].
Machine learning and artificial intelligence have clear advantages over traditional statistical methods when: (a) there are a multitude of variables available for analysis, (b) the associations between the variables are uncertain (and likely to be highly complex), (c) the values of each variable are evolving constantly (such as in the case of a GPS), and (d) understanding correlations between variables is more important than causation. The great strength of machine learning models is in making predictions, especially where an atheoretical prediction will work well. This is the reason that machine learning models are evaluated on criteria such as scalability, real-time implementation, and cross-validated predictive accuracy rather than on internal and external validity and theoretical foundations, which are more suited to the traditional models [37]. Artificial intelligence and machine learning in marketing science are currently gaining more importance to leverage predictive segmentation [38,39].

Artificial Neural Networks
Neural networks learn to do tasks with progressive improvement in performance by considering examples, generally without task-specific programming. They have found use in applications difficult to express in a traditional computer algorithm using rule-based programming. The original goal of the neural network approach was to solve problems in the same way that a human brain would [40]. An artificial neuron is the basic building block of every artificial neural network. Its design and functionalities are derived from observation of a biological neuron, the basic building block of biological neural networks. An artificial neuron can be represented (Figure 1) with its inputs, weights, transfer function, bias, and outputs [41]. An artificial processing neuron receives inputs as stimuli from the environment, combines them in a special way to form a 'net' input, passes that through a linear threshold gate, and transmits the (output) signal to another neuron or the environment [42]. The artificial neuron model can be seen in its mathematical description:

y(k) = F(∑ from i = 0 to m of w_i(k) · x_i(k) + b)

where x_i(k) is the input value in discrete time k, with i going from 0 to m; w_i(k) is the weight value in discrete time k, with i going from 0 to m; b is the bias; F is the transfer function; and y(k) is the output value in discrete time k. The ANNs are also called feedforward neural networks and, recently, deep networks or deep learning [43]. People began referring to them as "deep" when they started utilizing 3-5 layers a few years ago. Now, networks with more than 200 layers are commonplace [44]. The notion of deep learning refers to an artificial neural network model that has multiple layers.
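The weighted-sum-plus-transfer-function behavior of a single artificial neuron can be sketched in a few lines of Python (a minimal illustration only; the input, weight, and bias values are invented, and a sigmoid stands in for the transfer function F):

```python
import numpy as np

def artificial_neuron(x, w, b):
    """Output of one artificial neuron: transfer function applied to
    the weighted sum of the inputs plus a bias."""
    net = np.dot(w, x) + b                 # 'net' input: sum of w_i * x_i + b
    return 1.0 / (1.0 + np.exp(-net))      # sigmoid transfer function F

# Example: three input stimuli with arbitrary weights and bias
x = np.array([0.5, -1.2, 3.0])
w = np.array([0.4, 0.1, -0.2])
b = 0.1
print(artificial_neuron(x, w, b))          # a single output signal in (0, 1)
```

Swapping the sigmoid for a step function recovers the linear threshold gate described above.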
Studies have shown that a multi-layer artificial neural network is capable of deep learning, i.e., it is able to model any non-linear relationship in a system [45]. Deep learning is a subset of a more general field of artificial intelligence called machine learning, which is predicated on the idea of learning from example [46], where learning happens through multiple learned layers of neurons [43]. The architecture or topology of the ANN describes the way the artificial neurons are organized in the group and how information flows within the network [47]. According to Haykin [48], there are three different classes of network architectures: single-layer feedforward networks, multilayer feedforward networks, and recurrent networks. The single-layer feedforward network is the simplest type of neural network. It consists of one output unit and two input units with no hidden layers; thus, it is also known as the single-layer perceptron [9]. In a layered neural network, the neurons are organized in layers. One input layer of source nodes projects directly onto an output layer of neurons, but not vice versa; this network is strictly of a feedforward type [49]. The second class of feedforward neural network distinguishes itself by the presence of one or more hidden layers, whose computation nodes are called hidden neurons. The term "hidden" refers to the fact that this part of the neural network is not seen directly from either the input or the output of the network [48]. Among the main networks using multilayer feedforward architectures are the multilayer perceptron (MLP) and the radial basis function (RBF) network, whose learning algorithms are respectively based on the generalized delta rule and the competitive/delta rule [50]. In the recurrent network, at least one neuron connects with another neuron of the preceding layer and creates a feedback loop.
This type of neural network consists of a self-connection between the neurons of the hidden layer. This functionality provides them with a temporary memory. As a result, activation of the following value is forwarded from the lower layer as well as its previous activation value to the hidden layer neuron [50]. Recurrent networks are used to process time varying data, predict future values, classify time series, predict system behavior, and so on [51].

Learning Algorithms
The performance of a neural network depends to a significant extent on how well it has been trained, and not on the adequacy of assumptions concerning the statistical distribution of the data, as is the case with the maximum likelihood classifier [52]. Only after training does the network become operational, i.e., capable of performing the task it was designed and trained to do [53]. Learning or training is one of the most important characteristics of artificial neural networks. The literature on deep learning recognizes two main types of learning algorithms, either supervised or unsupervised.
In supervised learning, the neural network is trained on a training set consisting of vector pairs. One of these vectors is used as input to the network; the other is used as the desired or target output. During the training, the weights of the NN are adjusted in such a way as to minimize the error between the target and the computed output of the network [53]. Though it is biologically implausible, backpropagation learning is the most popular learning rule for performing supervised learning tasks. It is not only used to train feedforward networks such as the MLP but is also adapted to RNNs [54]. Backpropagation, or back-prop as it is often called, is a more complex way of learning. It reduces the error between the actual output of the network and the desired output by changing the connection weights and biases in such a way that they move slowly toward the correct values [55].
In unsupervised or self-organized learning, there is no external teacher to supervise the learning process. In this type of learning, no specific examples are provided to the network. The desired response is not known, so explicit error information cannot be used to improve network behavior. Since no information is available concerning the correctness or incorrectness of responses, the learning process must somehow be accomplished based on observations of responses to inputs about which the network has very little or no knowledge [56]. The neurons compete to match the input as closely as possible, usually based on Euclidean distance. The neuron closest to the considered input exemplar is the winner taking it all, i.e., adjusting its weights to improve its position and thus move closer to the input [57]. Unsupervised learning involves pattern recognition without the involvement of a target attribute. That is, all the variables used in the analysis are used as inputs, and because of this approach, the techniques are suitable for clustering and association mining techniques [58].
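The competitive, winner-takes-all step amounts to finding the neuron whose weight vector has the smallest Euclidean distance to the input and nudging only that neuron toward it. A minimal sketch with made-up weight vectors:

```python
import numpy as np

weights = np.array([[0.0, 0.0],
                    [1.0, 1.0],
                    [0.0, 1.0]])        # one weight vector per competing neuron
x = np.array([0.9, 1.1])                # input exemplar

# Winner: the neuron whose weights are closest to the input (Euclidean distance)
winner = np.argmin(np.linalg.norm(weights - x, axis=1))

# Only the winner adjusts its weights, moving closer to the input
lr = 0.5
weights[winner] += lr * (x - weights[winner])

print(winner)           # neuron 1 wins
print(weights[winner])  # moved halfway toward the input: [0.95, 1.05]
```

Repeating this over many inputs makes each neuron settle on a region of the input space, which is the basis of the clustering behavior discussed next.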

Literature Review on Spatial Clustering
The existing literature considers clustering and segmentation as key activities in geo-marketing. Clustering is the technical process for unsupervised grouping, while segmentation is the application of creating segments of customers or markets. Thus, clustering can be used to segment consumer groups. Clustering helps firms to identify meaningful customer segments, allowing them to target defined groups rather than having to customize for each individual customer [59]. Clustering aims at grouping consumers in a way that consumers in the same segment (called a cluster) are more similar to each other than to those in other segments (clusters) [60]. Clustering is a descriptive task that seeks to identify homogeneous groups of objects based on the values of their attributes [61]. A cluster is a collection of data objects with higher similarity within the cluster and lower similarity between clusters. The degree of similarity is usually described by the distance between objects: the greater the distance, the smaller the similarity, and vice versa [62]. Due to the complexity and size of spatial databases, clustering methods should be efficient in high-dimensional space, explicit in the consideration of scale, insensitive to a large amount of noise, capable of identifying useful outliers, insensitive to initialization, effective in handling multiple data types, independent of a priori or domain-specific knowledge, and able to detect structures of irregular shapes. Conventional clustering algorithms often fail to fulfill these requirements [63].
As shown in Table 1, clustering algorithms can be classified into the following categories: partitional clustering algorithms, hierarchical clustering methods, density-based clustering algorithms, and grid-based clustering algorithms. Many of these can be adapted to or are specially tailored for spatial data [64]. General purpose high-dimensional clustering methods mainly deal with non-spatial feature spaces and have very limited power in recognizing spatial patterns that involve neighbors [65]. Table 1. Summary of clustering classifications and the common algorithms used to achieve partitioning, hierarchical, density-based, or grid-based approaches [66].

Partitioning
Partitions the data into a user-specified number of groups; each point belongs to exactly one group. Does not work well for irregularly shaped clusters.

Hierarchical
Decomposes the data into a hierarchy of groups, where each larger group contains a set of subgroups. Two methods: agglomerative (builds groups from the observations up) or divisive (starts with one large group and separates it). Common algorithms: balanced iterative reducing and clustering using hierarchies (BIRCH), Chameleon, Ward's method, and nearest neighbor (dendrograms are used to visualize the hierarchy).

Density-based
Useful for irregularly shaped clusters. Clusters grow based on a threshold for the number of objects in a neighborhood.

Grid-based
The region is divided into a grid of cells, and clustering is performed on the grid structure.
In geographic segmentation, data are clustered or categorized according to geographic criteria such as nations, states, regions, countries, cities, neighborhoods, or postal codes. However, during the process of segmentation, a serious overlapping issue may occur, leading to inefficient geospatial analysis. Moreover, geo-marketing is usually active in urban areas and requires clusters to be organized in a three-dimensional (3D) way [67]. For spatial clustering, it is important to be able to identify high-dimensional spatial clusters, which involve both the spatial dimensions and several non-spatial dimensions [12]. Table 2 presents a synthesis of recent studies in the field of spatial clustering. Spatial clustering has also long been used as an important process in geographic analysis [65]. Although clustering is an unsupervised learning technique, firms can utilize segments in predictive modeling by creating separate predictive models for each segment [59]. k-means is one of the simplest unsupervised learning algorithms used for clustering. k-means partitions n observations into k clusters in which each observation belongs to the cluster with the nearest mean. The algorithm aims at minimizing an objective function, in this case, a squared error function [76]. In practice, businesses usually strive to reduce the overlaps and gaps in their market coverage. The k-means algorithm has been widely used in market segmentation. The review of different geo-marketing research revealed that k-means has been popular, especially in clustering applications in combination with GIS [67,74]. Azri argued that segmenting the marketing data could increase time efficiency and sales volume; however, overlapping areas among segmented clusters may lead to inefficient data management. In their research, the authors proposed the k-means++ clustering algorithm to reduce the overlap area. This approach was able to minimize the overlap region during market segmentation [67].
Ezenkwu demonstrated in his paper an application of the k-means algorithm to conduct an efficient customer segmentation. He used a MATLAB implementation of k-means clustering based on data collected from a mega business retail outfit that has many branches. The result of this study shows that the algorithm has a purity measure of 0.95, indicating 95% accurate segmentation of the customers [74]. Even though the k-means algorithm is widely used in geo-clustering tasks, this clustering method has significant disadvantages versus deep-learning algorithms. The main weaknesses of k-means can be summarized as follows: the number of cluster centers needs to be pre-determined; the algorithm fails for non-linear data sets; it is applicable only to numerical data sets and fails for categorical data; and it is unable to handle noisy data and outliers [45,77]. Another clustering algorithm which has been identified in spatial applications is an unsupervised learning approach known as the self-organizing map, also called the Kohonen algorithm.
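The alternating assign/update steps by which k-means minimizes its squared-error objective can be sketched in plain NumPy. The 2-D "customer locations" below are synthetic, and the example also makes one stated weakness visible: the number of clusters k must be fixed in advance.

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical geo-coordinates of customers scattered around three sites
sites = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([s + rng.normal(scale=0.5, size=(50, 2)) for s in sites])

k = 3                                                   # must be chosen up front
centroids = np.array([[1.0, 1.0], [4.0, 4.0], [1.0, 4.0]])  # initial guesses
for _ in range(20):
    # Assignment step: each point joins the cluster with the nearest centroid
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    # Update step: each centroid moves to the mean of its assigned points
    centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])

print(np.round(centroids, 1))  # centroids settle near the three site centers
```

With poorly chosen initial centroids the same loop can converge to a bad local optimum, which is one motivation for the k-means++ initialization mentioned above.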

Data and Methodology
Self-organizing maps are feedforward, unsupervised neural networks and were developed in the 1980s by Kohonen [78]. The self-organizing maps differ from other neural networks as they apply competitive learning as opposed to error-correction learning (such as backpropagation with gradient descent), and in the sense that they use a neighborhood function to preserve the topological properties of the input space [79]. SOM converts complex, nonlinear statistical relationships between high-dimensional data into simple geometric relationships on a low-dimensional display [80]. The SOM provides a topology-preserving mapping from a high-dimensional input space to a lower-dimensional output space. It is often applied for visualization of high-dimensional data [81]. SOM, also known as the Kohonen network, consists of an input layer, which distributes the inputs to each node in a second layer, the so-called competitive layer. Each of the nodes on this layer acts as an output node. Each neuron in the competitive layer is connected to other neurons in its neighborhood and feedback is restricted to neighbors through these lateral connections. Neurons in the competitive layer have excitatory connections to immediate neighbors and inhibitory connections to more distant neurons. All neurons in the competitive layer receive a mixture of excitatory and inhibitory signals from the input layer neurons and from other competitive layer neurons [82]. The Kohonen layer is a "Winner-takes-all" layer. Thus, for a given input vector, only the Kohonen layer output is 1 whereas all others are 0. No training vector is required to achieve this performance. Hence, the name: self-organizing map layer (SOM-layer) [83].
The self-organizing map differs considerably from the feedforward backpropagation neural network in both how it is trained and how it recalls a pattern. The self-organizing map does not use an activation function or a threshold value. In addition, output from the self-organizing map is not composed of output from several neurons; rather, when a pattern is presented to a self-organizing map, one of the output neurons is selected as the "winner". This "winning" neuron provides the output from the self-organizing map. Often, a "winning" neuron represents a group in the data that is represented to the self-organizing map [84].
One further advantage of the SOM is the 2D output grid that can be used for visualization of the results of the SOM. This visualization can enhance the understanding of the underlying dataset [85]. Unlike other neural networks, there is no hidden layer or hidden processing units. As shown in Figure 2, in the SOM, the neurons of the input layer relate to all the neurons of the output layer through synaptic weights. Consequently, the information provided by each neuron of the input layer is transmitted to all the neurons of the output layer. All the neurons of the output layer receive the same set of inputs from the input layer.
The SOM is trained by using a combination of neighborhood size, neighborhood update parameters, and a weight-change parameter. The SOM is formed initially with random weights between the input layer neurons and each of the neurons in the SOM. Each neuron in the input layer is connected to every neuron in the SOM. A neighborhood is the region around a given neuron that will be eligible to have the weights adapted. Neurons outside the region defined by the neighborhood do not undergo any weight adjustment. As the training is performed, the neighborhood size is adjusted downward until it surrounds a single neuron. The use of a neighborhood that reduces in size over time allows the SOM to group similar items together [86]. The training utilizes competitive learning. When a training example is fed to the network, its Euclidean distance to all weight vectors is computed. The neuron whose weight vector is most similar to the input is called the best matching unit (BMU). The weights of the BMU and neurons close to it in the SOM lattice are adjusted towards the input vector. The magnitude of the change decreases with time and with distance (within the lattice) from the BMU [79].
The training process of the Kohonen network algorithm consists of the following steps [82]:

1. Initialize network: Define w_ij(t) (0 ≤ i ≤ n − 1) to be the weight from input i to node j at time t. Initialize the weights from the n inputs to the nodes to small random values. Set the initial radius of the neighborhood around node j, N_j(0), to be large.

2. Present new input: x_i(t) is the input to node i at time t.

3. Calculate distances: Compute the distance d_j between the input and each output node j using d_j = Σ_{i=0}^{n−1} (x_i(t) − w_ij(t))².

4. Select minimum distance: Designate the output node with minimum d_j as j*.

5. Update weights: Update the weights for node j* and its neighbors, defined by the neighborhood size N_j*(t). The new weights are w_ij(t + 1) = w_ij(t) + η(t)(x_i(t) − w_ij(t)) for j in N_j*(t), 0 ≤ i ≤ n − 1. The term η(t), also called the learning rate, is a gain term (0 < η(t) < 1) that decreases in time, thus slowing the weight adaptation. The neighborhood N_j*(t) decreases in size as time goes on, thus localizing the area of maximum activity.

Repeat from step 2.
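The steps above can be sketched in plain NumPy. The grid size, decay schedules, and toy data below are illustrative assumptions only, not the configuration used in this study:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((200, 5))  # toy data set: 200 samples, 5 features in [0, 1]

# Step 1: initialize a small grid of weight vectors w_ij with small random values.
grid_h, grid_w, n_features = 4, 4, X.shape[1]
weights = rng.random((grid_h, grid_w, n_features)) * 0.1

# Lattice coordinates of each neuron, used for the neighborhood N_j*(t).
coords = np.stack(np.meshgrid(np.arange(grid_h), np.arange(grid_w),
                              indexing="ij"), axis=-1)

def quantization_error(weights, X):
    # Mean distance of each sample to its best matching unit (BMU).
    return float(np.mean([np.min(np.linalg.norm(weights - x, axis=-1)) for x in X]))

qe_before = quantization_error(weights, X)

n_iter = 1000
eta0, sigma0 = 0.5, 1.0  # assumed initial learning rate and neighborhood radius
for t in range(n_iter):
    x = X[rng.integers(len(X))]                     # step 2: present new input
    d = np.sum((weights - x) ** 2, axis=-1)         # step 3: distance d_j per node
    bmu = np.unravel_index(np.argmin(d), d.shape)   # step 4: minimum-distance node j*
    eta = eta0 * (1 - t / n_iter)                   # decaying gain term eta(t)
    sigma = sigma0 * (1 - t / n_iter) + 1e-3        # shrinking neighborhood radius
    lattice_d2 = np.sum((coords - np.array(bmu)) ** 2, axis=-1)
    h = np.exp(-lattice_d2 / (2 * sigma ** 2))[..., None]  # Gaussian neighborhood
    weights += eta * h * (x - weights)              # step 5: move w_ij toward the input

qe_after = quantization_error(weights, X)
```

MiniSom's train_random performs essentially this loop with its own decay functions; the sketch is only meant to make the update rule concrete.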
The data are provided by an international manufacturing company that sells industrial parts to other (B2B) companies. The data set with 2881 observations includes customer data, total sales in 2019, sales for 5 different product lines, industry type, and product category (Table 3). The data include numeric as well as categorical variables. The data set also contains the latitude and longitude of the locations of the customers. Figure 3 shows how the 2881 customers are spread across the United States. The company wants to perform a customer segmentation to determine which customers with similar patterns should be grouped based on the product category (finished goods, spare parts, repair), and which customers are more service-sensitive and should be supported through its external marketing channels. The strategic objective is to develop a tool to be used by the marketing department to adjust the go-to-market strategy, i.e., to determine marketing policies for different customer groups based on their patterns and needs.
This study adopts the Jupyter Notebook, which is an open-source, browser-based tool acting as a virtual lab notebook to support scientific workflows, coding, data, and visualizations. The Python programming environment offers different libraries which support deep learning (Geopy, Scipy), clustering (MiniSom, KMeans), and spatial analysis and map visualization (Numpy, Pandas, Matplotlib, Seaborn, Folium). The Python programming language will be used to perform the unsupervised training and classify the unlabeled data. This study provides a framework which enables the mapping and visualization of performance metrics and results.
The data set contains numeric data as well as categorical data. The latter will be transformed into suitable numeric values. Since the variables are measured at different scales, data normalization is necessary when working with machine learning algorithms. The goal of data normalization is to convert the values of numeric variables into a common scale while avoiding any distortion of the data or loss of information. The MinMax scaling method, x′ = (x − min(x)) / (max(x) − min(x)), will be used prior to the model fitting and training process. Before conducting the unsupervised SOM training, k-means clustering will be performed to benchmark the SOM's performance. By using different performance metrics, this study aims to evaluate the efficiency and accuracy of the unsupervised learning.
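As a concrete illustration, the MinMax transformation can be written as a small helper (scikit-learn's MinMaxScaler provides the same behavior; the sample values here are made up):

```python
import numpy as np

def min_max_scale(X):
    """Column-wise MinMax normalization: x' = (x - min(x)) / (max(x) - min(x))."""
    X = np.asarray(X, dtype=float)
    x_min, x_max = X.min(axis=0), X.max(axis=0)
    span = np.where(x_max > x_min, x_max - x_min, 1.0)  # guard against constant columns
    return (X - x_min) / span

# Example: variables on very different scales (e.g., annual sales vs. latitude).
X = np.array([[120000.0, 42.3],
              [ 80000.0, 33.7],
              [450000.0, 40.1]])
X_scaled = min_max_scale(X)
```

After scaling, every column lies in [0, 1], so no single variable dominates the distance computations used by k-means or the SOM.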

Results and Discussions
As discussed earlier, the k-means algorithm aims to partition the data points into a pre-defined number of clusters (k) in which each point belongs to the cluster with the nearest mean. It starts by randomly selecting k centroids and assigning the points to the closest cluster, then it updates each centroid with the mean of all points in the cluster. To determine the optimal number (k) of clusters, the elbow method will be applied. This method plots the cluster variance as a function of the number of clusters and selects the k at which the curve flattens. As presented in Figure 4, the elbow method defines k = 7 as the optimal number of clusters in the data.
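A minimal sketch of the elbow procedure, using a plain NumPy k-means on synthetic points (the data and k range are illustrative; the study itself applies the KMeans library to the customer data):

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain k-means: returns centroids, labels, and inertia (within-cluster SSE)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to the nearest centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each centroid to the mean of its assigned points.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    inertia = float(((X - centroids[labels]) ** 2).sum())
    return centroids, labels, inertia

# Elbow curve: inertia as a function of k; pick the k where the curve flattens.
rng = np.random.default_rng(1)
X = rng.random((300, 2))
inertias = [kmeans(X, k)[2] for k in range(1, 10)]
```

Plotting `inertias` against k = 1…9 (e.g., with Matplotlib) reproduces the elbow plot of Figure 4 in spirit: the inertia drops steeply at first and then levels off.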

Given the number of clusters k = 7 determined above, k-means clustering has been performed by using the Python library KMeans. As a result, seven clusters including their centroids are plotted as a scatter plot shown in Figure 5. Since the data points are geo-referenced, the clusters and their centroids are plotted by using their geographic coordinates. Applying k-means to the data set failed to provide useful clusters or distinguish clear patterns or boundaries between the clusters. Traditional clustering algorithms are often used in marketing analytics to perform market segmentation tasks. Based on the pre-defined number of clusters, marketing managers allocate the support capabilities for these clusters. Consequently, an incorrect assessment of the number of clusters can be misleading and thus affect strategic decision making.
By using the existing Python library MiniSom, the self-organizing maps (SOM) have been implemented. MiniSom accepts two types of training: train_random and train_batch. train_random means that the model will use random samples from the data during training. In train_batch mode, the samples are chosen in the given order inside the data set. In this paper, the train_random method was chosen.
Since the SOM algorithm uses unsupervised learning, the labels of the data will be ignored during the training process. Although an unsupervised training will be performed, the label of the target attribute (product category) will be kept only for the SOM visualization of the clusters but is not factored into the training. To determine the size of the SOM, a range of options was tried. Finally, a map with a size of 4 × 4 neurons and 1000 iterations was selected. This grid size gives the feature map enough space to spread out while displaying some overlaps between the classes. In this study, the SOM uses a hexagonal grid and a Gaussian neighborhood function. The other arguments for the MiniSom function have been set as follows: sigma (1.0) defines the spread of the neighborhood function, and the learning rate of 0.5 defines the initial learning rate for the SOM. The learning rate decreases linearly with the training iterations.
For a self-organizing map to be an accurate model, it must preserve the topology and neighborhoods of the input data while also fitting the data [87]. The quality measures are chosen for this study based on their usefulness. The literature recommends two types of quality measures, namely, quantization error and topological error.
The quantization error is a measure to evaluate the resolution of the mapping that can be considered inherent to the process of modeling [78]. The topological error, also known as topographic error, measures the topology preservation and the continuity of the mapping. It is defined by the proportion of all data vectors for which the BMU and second BMU are not adjacent units [87]. Both quality measures have been computed during the training. At the end of the training with 1000 iterations, the quantization error reached 0.78. To understand how the training evolves, Figure 6 plots the quantization and topographic error of the SOM at each iteration step.
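Both measures can be computed directly from the trained weight grid. The sketch below assumes a rectangular grid with 8-neighbor adjacency; MiniSom's own helpers (quantization_error, topographic_error) account for its actual topology. The toy map and samples are illustrative values only:

```python
import numpy as np

def som_errors(weights, X):
    """Quantization error: mean distance of each sample to its BMU.
    Topographic error: share of samples whose BMU and second-best BMU
    are not adjacent on the grid (8-neighborhood assumed)."""
    h, w, _ = weights.shape
    flat = weights.reshape(h * w, -1)
    qe, te = 0.0, 0
    for x in X:
        d = np.linalg.norm(flat - x, axis=1)
        first, second = np.argsort(d)[:2]
        qe += d[first]
        r1, c1 = divmod(int(first), w)
        r2, c2 = divmod(int(second), w)
        if max(abs(r1 - r2), abs(c1 - c2)) > 1:  # BMUs are not adjacent units
            te += 1
    return qe / len(X), te / len(X)

# Toy 2x2 map and two samples sitting near opposite corner prototypes.
weights = np.array([[[0.0, 0.0], [0.0, 1.0]],
                    [[1.0, 0.0], [1.0, 1.0]]])
X = np.array([[0.1, 0.1], [0.9, 0.9]])
qe, te = som_errors(weights, X)
```

On this toy map every unit is adjacent to every other, so the topographic error is trivially zero; on a 4 × 4 map, non-adjacent first and second BMUs contribute to the error.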
The SOM itself does not explicitly assign data items to clusters, nor does it identify cluster boundaries, as opposed to other clustering methods. Thus, visualizing the mapping created by the SOM is a key factor in supporting the user in the analysis process. A wealth of methods have been developed, mainly to visualize the cluster structures of the data [88].
To visualize the results of the training, the distance map, also called the U-Matrix (Figure 7), uses a pseudo-color display in which the neurons of the map are shown as an array of cells and the color represents the (weight) distance from the neighboring neurons.
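A U-Matrix can be derived from the weight grid by averaging each neuron's weight-space distance to its grid neighbors; MiniSom's distance_map returns a normalized variant of the same idea. The sketch below uses a 4-neighborhood and a made-up 2 × 2 map:

```python
import numpy as np

def u_matrix(weights):
    """Mean weight-space distance from each neuron to its grid neighbors
    (4-neighborhood); low values mark dense regions of similar neurons."""
    h, w, _ = weights.shape
    U = np.zeros((h, w))
    for r in range(h):
        for c in range(w):
            dists = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < h and 0 <= cc < w:
                    dists.append(np.linalg.norm(weights[r, c] - weights[rr, cc]))
            U[r, c] = np.mean(dists)
    return U

# Toy 2x2 map: top row near 0, bottom row near 1 -> large vertical distances.
weights = np.array([[[0.0], [0.0]],
                    [[1.0], [1.0]]])
U = u_matrix(weights)
```

Rendering U with a pseudo-color map (e.g., Matplotlib's pcolor) gives the light/dark regions described below.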
The elements with similar patterns are located close to each other. The map visualizes the neuron density using colors, with markers showing the magnitude. The darker regions show the density and which nodes are close to each other; light shades mean they are far from one another. The clusters are divided by green (finished goods), red (spare parts), and blue (repair).
The SOM has separated each cluster into topologically distinct areas of the map. Some clusters have been located across different regions in the SOM space. Other clusters fill a distinct region and therefore the SOM has been effective. The U-Matrix also shows that regions with a high density of points are co-habited by data from multiple clusters.
To detect which neurons of the map have been activated more frequently, Figure 8 shows the clusters with different colors and reflects the activation frequencies. The SOM map clearly distinguishes the patterns in the data and visualizes that the customers can be segmented into 16 different clusters. In the bottom part of the map, some green and red clusters overlap in some areas, which shows that those clusters are similar to each other. In the center of the map, the clusters are clearly differentiated. In the top side of the map, the clusters related to repair are clearly distinguished, while the top two clusters on the right side share similar properties.
By integrating the geo-coordinates of the data points during the training process, the SOM clusters have been plotted according to their geographic distribution. Figure 10 exhibits the distribution of the 16 SOM clusters as a scatterplot by using their latitudes and longitudes. Likewise, each cluster centroid has been assigned a pair of coordinates in the scatterplot. Figure 11 visualizes the geo-location of the clusters and their centroids across the US map, which displays multiple high-density regions such as along the east coast around Boston-New York and towards the south around North Carolina-Florida. A high concentration of the clusters can also be seen in the eastern states such as Illinois, Indiana, Michigan, and Ohio. The visualization of cluster centroids in the geo-clustering map can be used for assigning the location of the future marketing channels.
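Assigning each cluster centroid a pair of coordinates reduces, in the simplest case, to averaging the member latitudes and longitudes. The labels and coordinates below are hypothetical; in the study, each customer's cluster label comes from its BMU on the trained SOM, and a plain mean ignores spherical geometry, which is acceptable at this regional scale:

```python
import numpy as np

# Hypothetical cluster labels and customer coordinates for illustration.
labels = np.array([0, 0, 1, 1, 1])
lat = np.array([42.3, 41.9, 28.5, 27.9, 30.3])
lon = np.array([-83.0, -87.6, -81.4, -82.5, -81.7])

def cluster_centroids(labels, lat, lon):
    """Mean latitude/longitude per cluster, usable as candidate channel locations."""
    cents = {}
    for c in np.unique(labels):
        mask = labels == c
        cents[int(c)] = (float(lat[mask].mean()), float(lon[mask].mean()))
    return cents

centroids = cluster_centroids(labels, lat, lon)
```

The resulting coordinate pairs can be dropped onto a Folium map alongside the customer points to produce a view like Figure 11.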
The geo-clustering by the SOM enables a deeper understanding of how the clusters are spread out through space. Table 4 highlights how the clusters stretch over the different states. Due to a high industrial concentration in Michigan, Wisconsin, Illinois, Indiana, Ohio, Pennsylvania, and New York, clusters C1 to C9 intersect with each other in at least one of these regions. C10 covers Missouri, Virginia, Tennessee, and overlaps with C11 in Kentucky. C12 extends over a relatively wide territory including Texas, Louisiana, New Mexico, Arizona, and Colorado. C13 and C14 include Arizona, Utah, San Francisco, and Nevada. C15 covers Arkansas, Oklahoma, and Tennessee. C16 covers mainly Florida but also extends with a lower density over Georgia, Alabama, Louisiana, Mississippi, and Tennessee. Most of the clusters cover several states, which requires future marketing channels to provide wide market coverage to extend the reach of their product offering. Overlapping clusters, especially those across the main industrial manufacturing belt between Wisconsin and New York, can be expected to increase the intensity of competition among the marketing channels. In this context, marketing channels are required to introduce more differentiation. Offering extra services can help them avoid price conflicts.
A more precise understanding of the spatial cluster coverage implies the analysis of the relationships between the other segmentation attributes. The results of the SOM customer segmentation are exhibited in Table 5, which shows the sizes and the distinguishing characteristics of each cluster. The size of the clusters is based on the number of customers and the total sales generated in each cluster. C9, C12, and C16 represent the largest clusters. C1, C4, and C15 have a medium size. C2, C3, C5, C6, C7, and C8 are the smallest ones. All clusters contain customers from all industry types (automotive, aerospace, and machining); however, the SOM presents the product category as the main differentiating attribute. The clusters C4, C5, C9, C10, C13, and C14 consist of customers demanding finished goods only. These clusters extend over geographical territories with a high OEM density. The clusters C7, C11, C15, and C16 include only customers for repair services. For instance, C16 covers predominantly Florida, which includes a low concentration of manufacturing companies but a large base of end users with a need for repair services. C8 and C12 reflect customers requiring spare parts. The remaining clusters combine two or three different product categories. In this study, the SOM clustered the customer groups based on geographic, demographic, market-related, and product-related variables.
Most of the clusters consist of customers who are homogeneous in terms of their specific requirements and benefits sought, thus allowing marketing channels to deploy targeted marketing campaigns that promote the products and services suited to each customer segment. Understanding the differentiating attributes of each segment in terms of needs and expectations enables marketers to derive appropriate channel selection criteria. All sixteen clusters consist of customers who belong to the different industry sectors, namely, automotive, aerospace, and machining. Hence, future marketing channels should possess market knowledge and direct access to these industry segments. Palmatier [5] considered customer demand as a basis for determining the channel structure. As mentioned earlier, the demand type is captured in the data through the product category variable, which the SOM identified as a key differentiating attribute during the segmentation process. The promotion of finished goods requires technical competencies such as product sizing and configuration. Furthermore, the marketing channels need to offer physical selling to better support the customers in selecting the right products. Clusters with spare parts customers require marketing channels to maintain inventory centers close to customer locations, ensuring prompt delivery of products intended as replacements for installation on machines that are in service. Moreover, e-commerce capabilities are beneficial in this product category to shorten time to market. In the repair segment, for instance C15 and C16, marketing channels need to have a qualified repair capacity as well as a customer hotline. Repair centers located close to the customers will help shorten travel distance and reduce downtime.
Heterogeneous clusters (C1, C2, C4, C6) with more than one product category require marketing channels to match the different criteria discussed above in order to fulfill the specific customer requirements of each segment.
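The segmentation described above can be illustrated with a minimal self-organizing map sketch. This is not the paper's implementation: the customer data are simulated stand-ins, and the feature set, scaling, and hyperparameters (learning rate, neighbourhood radius, iteration count) are illustrative assumptions. Only the 4 × 4 grid size matches the study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-in for the customer table: two geographic coordinates
# plus three scaled non-spatial attributes (e.g. sales, industry, product
# category). The real data and preprocessing are assumptions here.
X = rng.normal(size=(500, 5))
X = (X - X.mean(axis=0)) / X.std(axis=0)  # z-score all variables

# 4x4 SOM, matching the grid size reported in the study.
rows, cols, dim = 4, 4, X.shape[1]
W = rng.normal(scale=0.1, size=(rows, cols, dim))
grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols),
                            indexing="ij"), axis=-1)

n_iter, lr0, sigma0 = 2000, 0.5, 2.0
for t in range(n_iter):
    x = X[rng.integers(len(X))]
    # Best-matching unit: the neuron whose weights are closest to the sample.
    d = np.linalg.norm(W - x, axis=-1)
    bmu = np.unravel_index(np.argmin(d), (rows, cols))
    # Decay the learning rate and neighbourhood radius over time.
    frac = t / n_iter
    lr = lr0 * (1 - frac)
    sigma = sigma0 * (1 - frac) + 0.5
    # A Gaussian neighbourhood pulls the BMU and its neighbours toward x.
    h = np.exp(-np.sum((grid - np.array(bmu)) ** 2, axis=-1)
               / (2 * sigma ** 2))
    W += lr * h[..., None] * (x - W)

# Each customer is assigned to its winning neuron, i.e. one of up to
# sixteen segments C1..C16.
labels = np.array([
    np.ravel_multi_index(
        np.unravel_index(np.argmin(np.linalg.norm(W - x, axis=-1)),
                         (rows, cols)),
        (rows, cols))
    for x in X
])
print("occupied clusters:", len(np.unique(labels)))
```

In practice a dedicated library (e.g. MiniSom) would replace this hand-rolled loop; the sketch only makes explicit how geographic coordinates and non-spatial variables enter the same distance computation.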

Implications
This study yields several implications for research and practice. First, little attention has been paid in the literature to geo-segmentation in B2B markets. While the existing literature has focused more on B2C markets, this research on B2B segmentation provides insightful implications for predictive segmentation. Moreover, previous studies using unsupervised learning for customer segmentation have not placed emphasis on the cluster centroids and their usefulness for optimizing the market coverage within the different clusters.
This paper also provides several practical implications. Using geo-marketing methods in conjunction with unsupervised deep learning allows the visualization of the spatial cluster distribution and the detection of interdependencies between the different segmentation variables when making marketing decisions. The results of this study have demonstrated how the specific patterns of the clusters can be used to improve B2B channel management strategies. Without a deep understanding of how customers are segmented, firms often lack market focus and fail to allocate their limited resources efficiently. A differentiated cluster strategy should support creating clear roles and functions among the marketing channel members for each segment. Assigning dedicated channels to the different customer clusters and defining clear responsibilities will help minimize channel conflicts and channel cannibalization. The model adopted in this paper demonstrates how the SOM can deploy the cluster centroids to localize where future marketing channels can be ideally established. Moreover, the centroids can be used for an optimal allocation of inventory locations and service centers. Well-defined cluster territories and boundaries will help overcome multichannel challenges such as overlaps and white spots in the market coverage.
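The centroid-based localization idea can be sketched as follows. The cluster assignments and coordinates below are randomly generated placeholders (in the study they would come from the trained SOM), and taking the arithmetic mean of latitude/longitude is a rough planar approximation, assumed here for illustration.

```python
import numpy as np

# Hypothetical cluster assignments and customer coordinates; in the study
# these come from the trained SOM, not from random data as here.
rng = np.random.default_rng(1)
labels = rng.integers(0, 16, size=500)   # segment index per customer (C1..C16)
latlon = np.column_stack([
    rng.uniform(25, 48, 500),            # latitude (contiguous US, roughly)
    rng.uniform(-124, -67, 500),         # longitude
])

# The geographic centroid of each cluster serves as a footprint for
# locating a new marketing channel, inventory point, or service center.
centroids = {c: latlon[labels == c].mean(axis=0) for c in np.unique(labels)}
for c, (lat, lon) in sorted(centroids.items()):
    print(f"C{c + 1}: ({lat:.2f}, {lon:.2f})")
```

For clusters spanning wide territories, a projected coordinate system or a distance-weighted median would give more defensible site candidates than the raw lat/lon mean used in this sketch.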
Due to the increasing dimensionality of customer data, descriptive segmentation methods face serious limitations. User intervention in selecting the number of clusters, or manual handling of the segmentation variables, often leads to biased results and a loss in cluster efficiency. The adoption of deep learning methods in B2B marketing opens new horizons for marketers. In the era of big data, predictive analytics and location intelligence should be considered an integral part of modern geo-marketing. The results of this study provide insightful outputs for marketing managers seeking to improve their current clustering approach. Deep learning has become attractive for classifying unlabeled data, detecting non-linear and hidden relationships, and proposing efficient clusters. Although self-organized learning does not involve an external teacher to supervise the learning process, the model requires technical capabilities such as programming and data analytics.
This requires a change in the skillset of the marketing managers to better leverage the power of artificial intelligence.

Conclusions
This research demonstrates the capability of unsupervised deep learning to perform geospatial clustering in a B2B marketing context. While the k-means algorithm could not find any useful clusters in the data set, the unsupervised SOM has proven its capability of detecting both spatially homogeneous and spatially heterogeneous regions across the US. As shown in this study, after integrating the geographical coordinates into the SOM, combined with various non-spatial variables, the model performs well and produces well-defined clusters, which allow the marketing channels to identify which products and services they should offer and which benefits they should promote.
This research has some limitations that should be highlighted to provide avenues for future research. For the scope and implementation of this study, only data from the year 2019 have been considered. Future research could investigate historical trends, including temporal changes of the patterns inside the clusters. In this context, it would also be useful to investigate how the clusters and their centroids shift position over time.