Analytical Business Model for Sustainable Distributed Retail Enterprises in a Competitive Market

Retail enterprises are organizations that sell goods in small quantities to consumers for personal consumption. In distributed retail enterprises, data is administered per branch. It is important for retail enterprises to make use of data generated within the organization to determine consumer patterns and behaviors. Large organizations find it difficult to ascertain customer preferences by merely observing transactions. This has led to quantifiable losses, such as loss of market share to competitors and targeting the wrong market. Although some enterprises have implemented classical business models to address these issues, they still lack analytics-based marketing programs that would give them a competitive advantage in dealing with likely catastrophic events. This research develops an analytical business (ARANN) model for distributed retail enterprises in a competitive market environment to address the current laxity through the best arrangement of shelf products per branch. The ARANN model is built on association rules, complemented by artificial neural networks, so that each technique strengthens the results of the other. According to experimental analytics, the ARANN model outperforms the state-of-the-art model, implying improved confidence in business information management within the dynamically changing world economy.


Introduction
Business information (BI) analytics are groups of methodologies, organizational techniques and tools used collectively to gain information, analyze it and predict the outcomes of solutions to problems [1]. The field of BI analytics through the use of operational data generated from transactional systems has given business users better insight into the problems they face [2]. These insights can assist business users or managers to make better and informed decisions. BI analytics are commonly applied in sustainable retail enterprises. Retail enterprises purchase goods from manufacturers or wholesalers in large quantities. They break up the bulk and resell those goods in smaller quantities directly to consumers. Consumers can go around the shop, pick the items of their choice from the shop shelves and place them into their baskets; the contents of each basket are then captured into transactional systems. These transactional systems generate data that can be used for analysis purposes. There are two major types of retail enterprises: centralized and distributed retail enterprises. This paper concentrates on distributed retail enterprises as a way of alleviating analytics issues of enterprises in a competitive market environment.
A distributed retail enterprise issues decision rights to the branches or groups nearest to the data collection [3]. Each branch can make its own decisions, depending on the data generated. A distributed retail enterprise often maintains clustered databases for each branch for the storage of data. Data generated in a distributed retail enterprise branch usually reflects the true customer purchasing habits at that particular branch. Data analysis per branch might reveal better results than a centralized data management system. It is, therefore, important to analyze data generated in each branch to realize meaningful patterns. Analysts can apply BI analytics to branch data in order to generate meaningful patterns for each particular branch.
Retail enterprises strive for survival in view of the current challenging sales optimization models. These models affect product arrangements in retail enterprises, leading to a decline in sales levels [4], high research and marketing costs, a decline in market share, wrong product target markets and poor management decisions [5]. Figure 1 presents the quantitative impact of these challenging sales optimization models in retail enterprises. Figure 1a shows the sales decline in Hungarian retail enterprises in June 2013. The sales level of computer equipment and books declined drastically by 4.8%, while sales of non-food items had the lowest level of decline of 0.4%. Figure 1b shows the causes of the reduction in sales level. The highest scoring reason for the reduction in sales was expensiveness (48%), followed by unavailable product features (41%). The least common reason for a reduction in sales was lack of functionality (20%).
Data quality problems also affect the quality of decisions made by managers on different levels of a retail enterprise [5]. Poor data has caused problems in both traditional and e-business companies, as shown in Figure 2. In both types of companies, extra cost to prepare reconciliations was seen as the main problem caused by inadequate data. This was seen to have an impact of 58% and 57% respectively. Inability to deliver orders or loss of sales was also a poor data quality challenge that had a higher impact in e-business (33%) than in traditional (24%) companies. The lowest-scoring problem caused by poor data was failure to meet a significant contractual requirement. An organization implemented an easy-to-use desktop and server analytics software program for the development of several business units and to improve the basis for decision-making [9]. The challenge was to test the most effective BI analytics for solving theoretical business problems. A data consolidation project was undertaken in South Africa by Altron to organize and deliver high-quality data successfully to its executives on their Apple iPads [10]. The smartphones' interfaces were too small for the style and amount of information they wanted to deliver. This approach posed the following challenges: lack of an analytics-based marketing program, failure to make BI a "matchmaker", lack of business-driven analytic strategies and failure to test the most effective BI analytics for solving theoretical business problems.
This paper develops an analytical business (ARANN) model that can be used in distributed retail enterprises within the dynamically changing world economy to implement the best arrangement of shelf products at each branch in order to address the weaknesses highlighted in Figures 1 and 2. The ARANN model is built on a machine learning technique, association rules (AR), complemented by an artificial neural network (ANN) technique to strengthen the results of the individual models. Since sustainability in this context generally requires the ability of a business to sustain itself in times of crisis, similar to competitive markets, ARANN has been specifically designed for sustainable distributed and centralized retail enterprises. The major contributions in this paper are the following:

• Development of a newly proposed analytical ARANN model that could intelligently assist distributed retail enterprise management within competitive markets to arrange products optimally on store shelves so that customers will purchase more products than planned in order to achieve an optimal profit level.
• Detailed experimental evaluations conducted on the sustainable ARANN model as measures of its performance using publicly available data and a volume of real-life retail data sets captured in ever-changing markets.
• Application of a robust business model in terms of (i) deployment scenarios, (ii) distributed and centralized analytics, (iii) time and memory scalability, and (iv) benchmark with classical methods for ease of implementation for managerial practices in IT.
To our knowledge, not enough research has presented user-friendly models and worked examples to make technical information and BI available to professional managers. This paper is structured as follows: Section 2 reviews work done in the area of AR and ANN, Section 3 proposes an intelligent model for distributed retail enterprises, Section 4 focuses on experimental evaluations and finally, Section 5 concludes the paper.

Related Work
Besides the analytics software programs and projects mentioned above, classical applications of AR and ANN are highlighted here. From the research conducted in [11], the authors applied AR to medical data containing combinations of categorical and numerical attributes to discover useful rules and from this experiment, useful and concise AR were discovered for prediction purposes. In [12], the authors implemented a system for the discovery of AR in web log usage data as an object-oriented application and discovered excellent associations within the data. They put forward "interestingness measures" as future work. In [13], the researchers applied an AR algorithm to a large database of customer transactions from a large retailing company to test the effectiveness of the algorithm and it exhibited excellent performance. In the study conducted in [14], it was observed that AR is effective in revealing associations though it does not take into account special interests. A comprehensive survey was conducted in [15] regarding AR on quantitative data in data mining. The authors examined it using different parameters and they concluded that the direct application of AR might produce a large number of redundant rules. This is also supported in the article in [16].
AR was applied in [4] to a sport company struggling with the arrangement of sports items in accordance with customer purchasing patterns. The retail company had no computerized mechanism for providing the best item arrangement. The study was performed to identify purchasing patterns that could be adopted by the retail enterprise. The authors analyzed historical data to identify the associated patterns from transactional data. From the study, they found relationships between sports items purchased and the best ways of arranging items, either side by side or in the same retail area, so that the items were frequently purchased together to yield high sales. In this study, AR was used for mining relationships between items purchased.
AR was applied in [11] to medical data containing combinations of categorical and numerical attributes to discover useful rules and from this experiment, useful and concise associations were discovered for prediction purposes. Ordonez [17] used AR to predict the level of contraction in four arteries and risk factors. The experiment predicted accurate profiles of patients with localized heart problems, specific risk factors and the level of disease in one artery.
ANN have been used in the past to search for patterns and predict future sales [18]. In research conducted in [19], the authors evaluated the predictive accuracy of ANNs and logistic regression (LR) in marketing campaigns of a Portuguese banking institution and their results showed that ANNs are more efficient and faster than LR. In [20], the researchers applied ANNs to a Pima Indians diabetes database and it generated rules with strong associations, thereby enhancing the decision-making process by doctors. In research conducted in [21], ANNs were applied for retail segmentation. The authors compared an ANN technique based on Hopfield networks against k-means and mixture model clustering algorithms. The results showed the usefulness of ANNs in retailing for segmenting markets. Many articles mentioned in [22] consider ANNs to be a promising machine learning technique.
In research conducted in [23], it was observed that the combination of data mining methods and a neural network model can greatly improve the efficiency of data mining methods. Craven and Shavlik [24] also supported ANN in data mining because of the ability to learn the target concept better than when using data mining methods. However, they presented two limitations that make ANNs poor data mining tools: excessive training times and incomprehensible learning. The proposed analytical model seeks to use AR complemented by ANNs to implement the best arrangement of shelf products, branch by branch, in order to use the cooperative result to make managerial decisions.
This research is undertaken to improve the following challenges of current sales optimization models: lack of analytics-based marketing programs, lack of business-driven analytic strategies and failure to leverage BI to become "matchmakers". To our knowledge, not enough research has presented working examples and considered non-expert users in proposing models that are user-friendly to professional managers. Sections 2.2 and 2.3 explain the building blocks of the analytic model where the processed data from different branches is entered.

Association Rules
AR mining is an unsupervised data mining method to find interesting associations in large sets of data items [25]. It was originally derived from point-of-sale data that describes which products are purchased simultaneously. AR discovers interesting associations that are often used by businesses such as retail enterprises for decision-making purposes; an example could be to find out which products are frequently purchased simultaneously by different customers [26]. It is one of the most common and widely used techniques in data mining, aimed at finding interesting relations [27,28] or correlations between large data items [29]. AR provides decision-makers at retail enterprises with marketing insights for cross-selling by providing information about product associations [30]. The most common AR algorithm used in market basket analysis is Apriori. However, the Apriori algorithm has an important drawback of generating numerous candidate item sets that must be repeatedly contrasted with the whole database [31]. We are going to use two measures to quantify the interestingness of a rule: support and confidence.

Support Value
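Support determines how frequently a rule is contained in a given dataset. It is defined as the fraction of transactions that contain $A \cup B$ to the total number of transactions in the database [32], and this can be expressed as shown in Equation (1):

$$\mathrm{support}(A \Rightarrow B) = \frac{n(A \cup B)}{N} \tag{1}$$

where $n(A \cup B)$ is the number of transactions containing $A \cup B$ and $N$ is the total number of transactions. If support$(A \Rightarrow B)$ is greater than or equal to the minimum support threshold (min_sup), then it is a frequent item set. An item set is frequent if support$(A \Rightarrow B) \geq$ min_sup.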

Confidence Value
Confidence is the ratio of the number of transactions containing A and B to the number of transactions containing A, and can be further expressed as shown in Equation (2):

$$\mathrm{confidence}(A \Rightarrow B) = \frac{n(A \cup B)}{n(A)} \tag{2}$$

If confidence$(A \Rightarrow B)$ is greater than or equal to the minimum confidence (min_con), then we are confident about the rule generated.
Furthermore, rules that satisfy both the minimum support threshold (min_sup) and the minimum confidence threshold (min_con) are called strong AR. A rule is strong if support$(A \Rightarrow B) \geq$ min_sup $\wedge$ confidence$(A \Rightarrow B) \geq$ min_con. These two measures are used as inputs in the ANN technique.
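The two measures and the strong-rule test can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names, the toy baskets and the threshold values are all assumptions introduced here.

```python
from typing import Iterable, List, FrozenSet


def support(transactions: List[FrozenSet[str]],
            antecedent: Iterable[str], consequent: Iterable[str]) -> float:
    """Fraction of all transactions that contain every item of A and of B."""
    items = set(antecedent) | set(consequent)
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(transactions: List[FrozenSet[str]],
               antecedent: Iterable[str], consequent: Iterable[str]) -> float:
    """Transactions containing A and B divided by transactions containing A."""
    a = set(antecedent)
    both = a | set(consequent)
    n_a = sum(1 for t in transactions if a <= t)
    n_ab = sum(1 for t in transactions if both <= t)
    return n_ab / n_a if n_a else 0.0  # no evidence for A: treat as 0


def is_strong(transactions, antecedent, consequent,
              min_sup: float, min_con: float) -> bool:
    """A rule is strong when it meets both minimum thresholds."""
    return (support(transactions, antecedent, consequent) >= min_sup
            and confidence(transactions, antecedent, consequent) >= min_con)


# Hypothetical toy baskets, not taken from the paper's data sets.
baskets = [
    frozenset({"bread", "sugar", "drink"}),
    frozenset({"bread", "sugar"}),
    frozenset({"bread", "drink"}),
    frozenset({"rice", "soup"}),
]

sup = support(baskets, {"bread"}, {"sugar"})     # 2 of 4 baskets hold both
con = confidence(baskets, {"bread"}, {"sugar"})  # 2 of the 3 bread baskets
print(sup, con, is_strong(baskets, {"bread"}, {"sugar"}, 0.4, 0.6))
```

Counting with subset tests keeps the sketch close to the definitions; a production miner would normally use a level-wise algorithm such as Apriori instead of rescanning the data for every rule.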

Artificial Neural Networks
ANNs simulate the behavior of biological systems and are used to discover patterns and relationships. They are useful for studying complex relationships between input and output variables in a system [33]. The main advantage of an ANN is the ability to extract patterns and detect trends that are too complex to be noticed by other computer techniques or humans [34]. In [35], the research done shows that ANNs are now commonly used to solve data mining problems because of the following advantages: robustness, self-organizing adaptiveness, parallel processing, distributed storage and a high degree of fault tolerance. The ANN computes the weighted sum of the inputs $x_i$ with the corresponding weights $w_i$ and compares the result with the threshold value, a. The threshold is determined by the inputs used.
Let X be the net weighted input of the neuron, as shown in Equation (3). The decision of X is for discrete cases since it takes only certain values:

$$X = \sum_{i=1}^{n} x_i w_i \tag{3}$$

where $x_i$ is the input signal, $w_i$ is the weight of the input and n is the number of neuron inputs.
If the net input is less than the threshold, the neuron output is −1; if the net input is greater than or equal to the threshold, then the neuron is activated and the output attains a +1.
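The threshold neuron just described can be sketched as follows. The helper names and the numeric inputs, weights and thresholds are hypothetical and serve only to show the two possible outputs.

```python
from typing import Sequence


def net_input(inputs: Sequence[float], weights: Sequence[float]) -> float:
    """Net weighted input X: sum of each input times its weight."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    return sum(x * w for x, w in zip(inputs, weights))


def threshold_output(x: float, threshold: float) -> int:
    """Discrete neuron output: +1 if X reaches the threshold, otherwise -1."""
    return 1 if x >= threshold else -1


# Hypothetical values: two inputs with even weights.
x = net_input([0.5, 0.8], [0.5, 0.5])
print(threshold_output(x, 0.6))  # net input reaches 0.6, neuron fires
print(threshold_output(x, 0.7))  # net input is below 0.7, neuron stays off
```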
Let Y be the ANN output. The decision of Y is for continuous cases, since it can take any value in the range. The actual output of the neuron with the sigmoid activation function is expressed as shown in Equation (4):

$$Y = \frac{1}{1 + e^{-X}} \tag{4}$$

Proposed Methodology for Sustainable Business Enterprises
The proposed intelligent analytics-based framework has the following benefits: reduction in risk of passing misleading results to all branches, no one point of failure, consumption of fewer resources, faster construction of distributed systems and no need for data integration. This proposed analytics-based model can be implemented using the pseudo-code presented in Table 1. Table 1 shows how ARANN generates product arrangement sets that can be used by retail enterprise managers to arrange products on shop shelves so as to attract customers to purchase more products than planned. The pseudo-code is further presented mathematically, as shown in Equations (5)-(14).

Step 6: Generate N1 by summing the inputs multiplied by the corresponding weights and apply the result to the sigmoid function
Step 7: Generate N2 by summing the inputs multiplied by the corresponding weights and apply the result to the sigmoid function
Step 8: Generate the sum of N1 and N2 after the sigmoid function and apply the result to the sigmoid function to obtain the Degree of Belief (DoB)
Step 9: Display product patterns where DoB ≥ ARANN activation
} }

Mathematical description for the ARANN Model
$$\mathrm{Support}\ (\mathrm{Sup}) = \frac{n(A \cup B)}{N} \tag{5}$$

The sup and con values feed N1 as the inputs and are multiplied by the corresponding weights. The output of N1 is obtained after the sigmoid function. The sup and con values also feed N2 as the inputs and are multiplied by the corresponding weights. The output of N2 is obtained after the sigmoid function.
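The flow from support and confidence to the Degree of Belief can be sketched as below, following the wording of Steps 6 to 8. This is an illustration under stated assumptions: the function names are invented here, the even weights of 0.5 are placeholders rather than the paper's values, and the outputs of N1 and N2 are assumed to be added without further weighting before the final sigmoid.

```python
import math
from typing import Tuple


def sigmoid(x: float) -> float:
    """Sigmoid activation: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def degree_of_belief(sup: float, con: float,
                     w_n1: Tuple[float, float] = (0.5, 0.5),
                     w_n2: Tuple[float, float] = (0.5, 0.5)) -> float:
    """Combine support and confidence into a DoB value.

    Assumption: N1 and N2 each take (sup, con) as inputs with their own
    weights, and their sigmoid outputs are summed unweighted.
    """
    n1 = sigmoid(sup * w_n1[0] + con * w_n1[1])  # Step 6
    n2 = sigmoid(sup * w_n2[0] + con * w_n2[1])  # Step 7
    return sigmoid(n1 + n2)                      # Step 8


# Hypothetical support and confidence values for one rule.
dob = degree_of_belief(sup=0.5, con=0.8)
print(round(dob, 3))
```

The weights are exposed as parameters because, as the conclusions note, the DoB values depend on the weights chosen.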

Evaluation Mechanism
The purpose of model evaluation is to assess the performance of the models so as to identify the best-performing model. To test the performance of the models, three sets were used. The confusion matrix shown in Table 2 was used to represent actual values and predictions.
In Table 2, a is the number of sets predicted true when they are true, b is the number of sets predicted false when they are true, c is the number of sets predicted true when they are false and d is the number of sets predicted false when they are false. Error rate is then defined as shown in Equation (15):

$$\text{Error rate} = \frac{b + c}{a + b + c + d} \tag{15}$$

Even weights were applied to each corresponding input to avoid bias on products. The support value was obtained by dividing the count of a_union_b by the number of records within the data set, where a and b are different products. The following ARANN activation was used:
>= 0.75 strongly connected products (strongly accepted)
>= 0.65 moderately connected products (accepted)
< 0.65 weakly connected products (rejected)
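The error rate can be computed directly from the four confusion-matrix counts. The sketch below assumes that Equation (15) is the usual ratio of misclassified sets, (b + c), to all evaluated sets; the function name is hypothetical. The example counts are those reported for AR on Branch 1 in Experiment 2 (a = 5, b = 0, c = 3, d = 2).

```python
def error_rate(a: int, b: int, c: int, d: int) -> float:
    """Share of incorrectly classified sets.

    a: predicted true, actually true    b: predicted false, actually true
    c: predicted true, actually false   d: predicted false, actually false
    """
    total = a + b + c + d
    if total == 0:
        raise ValueError("at least one evaluated set is required")
    return (b + c) / total


# Branch 1 (real life), AR technique: 10 rules evaluated.
print(error_rate(a=5, b=0, c=3, d=2))  # 3 of the 10 rules are misclassified
```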

Experimental Setup
Real-life data was collected from a retail enterprise situated in South Africa with several branches nationwide. The data for the experiments was collected from only eight branches within different demographics of a developing country. The retail enterprise has database servers at each branch for

Experiment 1: Observations of ARANN with Varying Activation in Distributed Analytics
In this experiment, Equation (14) was used to determine the decisions to be applied to Tables 8-11 of the analytical model. This analytical model accepts product patterns defined in Equation (14) and uses the following ARANN activations: DoB < 60%, 60% <= DoB < 70% and DoB >= 70%. The analytical model rejects arrangement sets where the DoB is less than 60% and accepts arrangement sets between 60% and 69%, while those with a DoB greater than or equal to 70% are strongly accepted. To make the decision, ARANN compares the generated DoB value with the ARANN activations. Managers use the decision to determine how products are to be arranged in each branch.
Using ARANN activation of DoB >= 70, the following sets from Table 8 are strongly accepted: {Roll-on, Perfume => Colgate}, {Colgate => Body lotion}, {Bread => Drink} and {Bread => Sugar}; these are strongly connected products. Using ARANN activation of 60 <= DoB < 70, the following examples of sets from Table 8 are accepted: {Colgate, Body lotion => Roll on}, {Rice, Maize meal => Soup} and {Rice => Soup}; these are moderately connected products. The choice is left to every retail enterprise to adopt either moderately or strongly connected products, depending on the market competitiveness and profit levels. Note that the analytical model rejects the sets with DoB < 60 (i.e., weakly connected products), which are not included. In Table 8 for branch 1, the "strongly accepted" sets at the higher activation imply that some specific toiletry products are strongly connected, as are bakery products and refreshments, at this branch.
Applying ARANN activation of DoB >= 70, the following "strongly accepted" set is generated: {Meat => Salt}; these are strongly connected products. When ARANN activation of 60 <= DoB < 70 is used, the following examples of sets are accepted in Table 9: {Meat, Salt => Cooking oil}, {Bread, Rice => Eggs} and {Bread => Eggs}; these are moderately connected products. It is up to the retail enterprise's decision-makers to adopt either moderately or strongly connected products, depending on the market competitiveness and profit levels. Again, the analytical model rejects the sets with DoB < 60 (i.e., weakly connected products), which are not included. In Table 9 for branch 2, the "strongly accepted" sets at the higher activation imply that some specific meat products are strongly connected with salt products at this branch.
When ARANN activation of DoB >= 70 is applied, the following "strongly accepted" sets from Table 10 are generated: {Fish => Canned soup}, {Bread => Chocolate milk} and {Canned soup => Bread}; these products are strongly connected. Using ARANN activation of 60 <= DoB < 70, the following are examples of "accepted" sets that are generated in Table 10: {Fish, Canned soup => Wine}, {Tea, Cookies => Peanuts} and {Wine => Beer}; these are moderately connected products. Every retail enterprise is left with the choice to adopt either moderately or strongly connected products, depending on the market competitiveness and profit levels. Note that the analytical model rejects the sets with DoB < 60 (i.e., weakly connected products), which are not included. In Table 10 for branch 3, the "accepted" product sets at the moderate activation imply that some specific beverages are moderately connected at this branch.
In Table 11 the following "strongly accepted" sets were generated using ARANN activation of DoB >= 70: {Bread => Chocolate milk} and {Fish => Canned soup}, which are strongly connected products. Using ARANN activation of 60 <= DoB < 70, the following examples of sets from Table 11 were accepted: {Fish, Canned soup => Wine}, {Orange juice => Bread} and {Tea, Bread => Orange juice}, which are moderately connected products. The decision-makers of every retail enterprise are left with the choice to adopt either moderately or strongly connected products, depending on the market competitiveness and profit levels. Note that the analytical model rejects the sets with DoB < 60 (i.e., weakly connected products), which are not included. In Table 11 for branch 4, the "strongly accepted" product sets at the higher activation imply that some specific bakery products are strongly connected with dairy products at this branch.
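The decision rule applied throughout this experiment amounts to comparing each DoB value with two activation levels. A minimal sketch follows; the function name, the rule labels and the DoB percentages are hypothetical and are not values from Tables 8-11.

```python
def arann_decision(dob_percent: float,
                   accept_at: float = 60.0,
                   strong_at: float = 70.0) -> str:
    """Map a DoB percentage to the decision used in Experiment 1."""
    if dob_percent >= strong_at:
        return "strongly accepted"  # strongly connected products
    if dob_percent >= accept_at:
        return "accepted"           # moderately connected products
    return "rejected"               # weakly connected products


# Hypothetical DoB values for three unnamed rules.
example_dobs = {"rule 1": 72.0, "rule 2": 65.0, "rule 3": 58.0}
decisions = {rule: arann_decision(dob) for rule, dob in example_dobs.items()}
print(decisions)
```

Keeping the two activation levels as parameters lets a retail enterprise tighten or relax them according to market competitiveness and profit levels.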

Experiment 2: Performance Evaluations of ARANN in Comparison with Classical Methods
Table 12 shows the error rate of the individual AR and ANN techniques against the analytical model. Equation (15) is used to determine the error rate of each technique. The column "No. of patterns" indicates the number of sets evaluated. The column "Correctly classified sets" is composed of sets the analytical model predicted as true when they were actually true (a) and sets predicted as false when they were actually false (d), as shown in Table 2. The column "Incorrectly classified sets" is composed of sets the analytical model predicted as false when they were actually true (b) and sets predicted as true when they were false (c). Randomly generated sets were used to evaluate the performance of the three models. For example, in Branch 1 (real life), 10 rules were used in AR: five rules were predicted as true when they were actually true (a); two were predicted as false when actually false (d); three were predicted as true when actually false (c) and 0 were predicted as false when actually true (b). From the results displayed in Table 12, it is clear that the analytical model (ARANN) has a lower error rate compared to the individual classical methods.
This research compares the performance of the analytical model in a distributed retail enterprise with a centralized retail enterprise. In the distributed retail enterprise, a computer was used to represent a branch and the time taken by the analytical model to generate arrangement patterns was observed. Figure 7a shows raw integration time. Figure 7b shows the time of response (ToR) taken by the analytical model to integrate a number of records from various workstations. Figure 7c shows the ToR taken by the analytical model to generate patterns in distributed and centralized retail enterprises. From the experiment conducted, it was observed that the analytical model performs faster in distributed retail enterprises than in centralized retail enterprises, as shown in Figure 7c. The ToR to integrate data depends on the number of records being integrated: the more records, the more time is needed to integrate them, as observed in Figure 7b. In addition, the time taken by the analytical model depends on the size of the data set being used, as shown in Figure 7d.

Conclusions
In this paper, a sustainable model was proposed that can be used in distributed retail enterprises in an ever-changing economic environment to address the current laxity through the best arrangement of shelf products branch by branch. It can intelligently assist distributed retail enterprise management to arrange products optimally on shelves of shops so that customers will purchase more products than planned, in order to achieve an optimal profit level. The analytical model takes branch data and processes the data to determine the best ways of arranging items on the shelves of a retail enterprise branch by branch. It is built on AR, complemented by ANN.
The proposed analytical model for sustainable business in distributed retail enterprises was developed. A logical demonstration of working scenarios and experiments of the proposed analytical model for management practices in distributed retail enterprises was presented. This was done by inputting support and confidence values from the AR technique into the ANN technique in order to get DoB values of the analytical model. The analytical model accepts product patterns with a DoB greater than or equal to ARANN activation.
In the proposed analytical model performance evaluation experiment, ARANN proved to be better than the classical methods because of its lower error rate, implying improved confidence in the decision-making process in a competitive environment. To get the best results, the weights of the neurons need to be determined appropriately and the quality of data needs to be improved. The DoB values of the analytical model can sometimes be affected by the weights used.
It was observed that sets generated in a distributed retail enterprise portray the real purchasing habits of customers per branch better than in a centralized retail enterprise. In this research, real-life datasets from eight branches of a retail enterprise and public datasets were used to conduct the experiments.
Observations of our distributed BI analytics model are: the proposed model retains complete control of product pattern generation, arrangement sets generated by the analytical model show a lower error rate (Table 12), they reveal the real buying habits of each branch, the model reduces the risk of passing misleading results to all branches (Tables 8-11) and the software runs a single process; there is no need for data integration (Figure 3). In addition, the ARANN incorporates the strengths of the AR and ANN models, improves generation of product arrangement sets, has the ability to discover complex nonlinear associations discreetly among different products, effects a reduction in poor data quality problems and losses, as well as an improvement in the effectiveness of current product sales optimization models. Since sustainability in this context generally requires the ability of a business to sustain itself in times of crisis, similar to competitive markets, ARANN has been specifically designed for sustainable distributed and centralized retail enterprises.
In future, we wish to: (i) improve on ARANN performance by considering nature-inspired algorithms; (ii) investigate a standard method of selecting the threshold; and (iii) integrate a sophisticated learning algorithm into ARANN. The strategy and observations in this research are therefore good for addressing challenges in an ever-changing economic environment.

Figure 1. Impact of current sales optimization models on retail enterprises. (a) Reduction in retail sales. Adapted from [6]; (b) reasons for reduction in sales. Adapted from [7].

Figure 4 shows a scenario of how the analytical model displays placement results in distributed branches. Transactional data from each retail branch is loaded into the ARANN model to determine the arrangement sets.

Figure 4. Intelligent Analytics-based Model for Four Branches.
DoB = 0.70. Product pattern: 0.70 >= 0.65; therefore, it is moderately connected and is accepted.

Figure 7d shows the ToR taken by the analytical model to generate product arrangement patterns across different data sizes.

Figure 7. Comparison of the performance of ARANN in distributed and centralized retail enterprises.

Table 4. Market basket transactional data for branch 4 of a retail enterprise.

Table 8. Real-life ARANN results for branch 1.

Table 9. Real-life ARANN results for branch 2.

Table 10. Public data ARANN results for branch 3.

Table 11. Public data ARANN results for branch 4.

Table 12. Quantitative evaluations of the cooperative model in distributed branches.