Extrapolative Collaborative Filtering Recommendation System with Word2Vec for Purchased Product for SMEs

Abstract: Many small and medium enterprises (SMEs) want to introduce recommendation services to boost sales, but they need sufficient amounts of data to do so. This study proposes an extrapolative collaborative filtering (ECF) system that does not directly share data among SMEs but improves recommendation performance for small and medium-sized companies that lack data through the extrapolation of data, which can provide a magical experience to users. Previously, recommendations were made utilizing only data generated by the merchant itself, so it was impossible to recommend goods to new users. However, our ECF system provides appropriate recommendations to new users as well as existing users based on privacy-preserved payment transaction data. To accomplish this, PP2Vec using Word2Vec was developed by utilizing purchase information only, excluding personal information from payment company data. We then compared the performances of single-merchant models and multi-merchant models. For the merchants with more data than SMEs, the performance of the single-merchant model was higher, while for the SME merchants with fewer data, the multi-merchant model's performance was higher. The ECF system proposed in this study is more suitable for the real-world business environment because it does not directly share data among companies. Our study shows that AI (artificial intelligence) technology can contribute to the sustainability and viability of economic systems by providing high-performance recommendation capability, especially for small and medium-sized enterprises and start-ups.


Introduction
A good recommendation system provides magical experiences for users via relevant and serendipitous recommendations of products or services. Users who receive appropriate recommendations become more loyal to the company, and continuous purchases lead to increased sales. Various recommendation studies have been conducted in which the recommendation system learns past patterns of customer purchases. Recently, recommendation studies based on natural language processing technology have emerged [1]. Large companies such as Amazon, Alibaba, and eBay have been increasing sales via recommendation services [2,3]. However, not all companies can introduce recommendation services. Small and medium-sized enterprises (SMEs) have insufficient resources in all areas, including technology, funds, and manpower, which limits their ability to adapt to environmental change [4]. In Korea, SMEs are classified by two criteria, sales and total assets, according to the Framework Act on Small and Medium Enterprises. The sales standard differs by industry because typical sales volumes differ, whereas the total-asset standard is the same in all industries: a company with total assets of less than 500 million dollars. Because of the lack of data and of AI professionals, it is not easy to develop recommendation systems directly unless the company is large. It is not easy for SMEs to introduce recommendation systems [5] because high-quality data are needed to build them, and collecting such data takes a lot of time and money [6]. In addition, start-ups without a customer base have difficulty entering new markets. As this gap widens day by day, the market becomes increasingly concentrated on large companies. Therefore, we need to develop AI technology and policy support for small and medium-sized companies and start-ups.
Each small or medium-sized company also wants to increase sales through recommendation services but often does not have enough data. In general, the more data are available, the better the recommendation performance. Small and medium-sized enterprises therefore want to obtain data from other merchants to increase their recommendation performance. However, sharing raw data directly among companies is not easy because of both business concerns and legal concerns. To solve this problem, this study proposes a system that does not directly share data but can improve recommendation performance for small and medium-sized enterprises that lack data and technology. We expect this recommendation system to enable small and medium-sized enterprises and start-ups to pursue sustainable development.
In this study, we applied Word2Vec to purchased product (PP) data. PP2Vec was developed by utilizing payment data written in natural language without the user's demographic information. Through the PP2Vec algorithm, SMEs lacking data will be able to provide appropriate recommendation services to users.

Recommendation System
A recommendation system is a type of information filtering (IF) technology that filters information based on user preferences. Recommendation systems are widely studied and applied in many industries because they provide helpful information to users [7,8]. They are used by various services, including Amazon, YouTube, Netflix, and Spotify [3,9,10]. The most common recommendation approaches are collaborative filtering and content-based filtering, and collaborative filtering techniques are generally known to be effective in recommending items [11]. Within collaborative filtering, user-based recommendation measures the similarity between users, whereas item-based recommendation is based on the similarity between items [12,13]. Because sufficient data must be collected in order to utilize a recommendation system, a hybrid filtering methodology using both user-based and item-based recommendation techniques has been developed [14,15].

Word2Vec-Based Recommendation System
Word2Vec, a natural language processing method, was proposed by Google in 2013 [16]. It is a technique for representing words as vectors by embedding words into a vector space. Such methods show excellent performance in natural language processing and have been used in many studies, including those on recommendation systems [17,18]. Item2Vec-based recommendation services are implemented by introducing Word2Vec into item-based CF. Item2Vec is able to analyze the relationships between items without user information [19]. In [20], the next place to visit was recommended based on users' visit history. Product2Vec [21] enhances marketing by using shopping baskets as input. Similarly, the recommendation algorithm we propose leverages users' payment history to build PP2Vec and make appropriate recommendations.

Multi-Merchant
Cross-domain collaborative filtering has emerged as a way to solve the cold-start problem [17,22]. It focuses heavily on patterns of evaluation scores learned in the auxiliary domain (secondary domain) or on the transfer of latent factors to the target domain. The methodology utilizes data from other domains to improve accuracy in the target domain. Existing cross-domain studies assume that data from different domains are shared, which is unrealistic for real-world businesses. Sharing customer data poses a risk of leaking user privacy, so sharing customer data directly among different domains is challenging in business [20,23].
Although there has also been research on cross-domain recommendation methodologies that protect privacy [20], these differ from the method proposed in this study in that they pass the weights of learned items to the target domain. The ECF that we propose in this study shares neither the personal information of customers nor the weights of learned models. Because it does not share such data between businesses, it is a more realistic model.

Motivation

Extrapolative Collaborative Filtering System Scenario for a Merchant
Merchant S is a start-up that has just opened, and most of its customers are new customers. There are not enough data about its customers yet, so there is no proper way to recommend products to customers. Merchant S has signed up for the recommendation service. One day, a new customer, A, visits Merchant S. Using the recommendation service, Merchant S is able to find customers similar to A among existing Merchant S customers and, using this information, to recommend products to A.
SMEs have difficulty in providing recommendation services because they have a small user base and an insufficient budget for developing AI models. However, before data can be shared between merchants, the recommendation service entity must resolve the associated privacy issues. In our system, we assume that for extrapolative collaborative filtering, the recommendation service uses only payment transaction data without user demographic data. Extrapolation is a method of estimating unknown areas beyond the data obtained from past experiments. Herein, we develop an extrapolative collaborative filtering methodology that provides recommendation services by utilizing the user behavior data present in multiple domains, such as user purchases and content viewing history, without utilizing user demographics [24].
The different merchants do not share their internal product codes, but the user's purchase and payment history (e.g., receipts) is stored in natural language for the user's convenience in refunds or account management. We therefore apply Word2Vec to analyze the purchase and payment data written in natural language. The Word2Vec algorithm represents each user as a vector by utilizing user purchase and payment data and finds similar users through these vectors. We name this Word2Vec variant PP2Vec, as it learns the purchasing propensity of users. By utilizing the learned purchasing tendency, users can receive appropriate recommendations even when they visit a new merchant.
We aim to confirm whether an appropriate recommendation service can be provided in an e-commerce environment using only minimal payment transaction data. The recommendation service entity uses only the payment data from all merchants. To verify this methodology, two recommendation situations were defined, and the accuracy of the recommendation models in each situation was compared. The single-merchant method, shown in Figure 1, knows only the purchase information of the users of that one merchant. The multi-merchant method provides recommendation services to users by reflecting purchase information from various merchants. In the single-merchant environment, suppose User A, who has used Merchant A before, visits Merchant A again after a long time. Merchant A is able to recommend to User A a product recently purchased by User B, who has a purchase history similar to that of User A. However, this system cannot make a recommendation to User N, a first-time visitor to Merchant A, because User N has no purchase history there.
On the other hand, the recommendation system in the multi-merchant environment can recommend products even when User N visits Merchant A for the first time. Since User N has a purchase history with Merchant B, the system can search for User C, who has a similar purchase history at Merchant B. A product that User C recently purchased from Merchant A may then be recommended to User N.

Method

Data Description
To evaluate the performance of ECF with PP2Vec, we utilized the purchase transaction data of four Korean merchants [25]. The released data contain the four merchants' sales data and consist of details of products purchased from 2014 to 2015. However, Merchant S ('the Smallest') only has data for 2015. A total of 28,592,566 purchases were used for analysis after preprocessing, with 19,335 users and 4386 product types. As shown in Table 1, Merchant L1 ('the Largest') has the largest number of purchases, and the average number of purchases per user is about 719. Merchant S has the smallest number of purchases, i.e., 105,402, and the lowest number of users at 3791. Merchant S also has the fewest product types of the four merchants, at 145.

Extrapolative Collaborative Filtering (ECF)
Semantic similarities between language items can be quantified and classified based on the distribution properties of data in large-scale linguistic data. Words appearing in the same context tend to have similar meanings and the same tendency [26]. We validated the ECF system in two steps. First, we developed PP2Vec (purchased product to vector) by utilizing the users' transaction data (Figure 2). PP2Vec is composed of PP2Vec on user, which considers the purchasing tendency of users, and PP2Vec on product, which considers the patterns of products being purchased together. In PP2Vec on user, similar users are searched by considering three things: the purchased products, the purchase locations, and the purchase times. Second, we compared the recommendation results of the single-merchant and multi-merchant methods using the developed PP2Vec.
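The three inputs to PP2Vec on user can be pictured as three token sequences per user. The following is a minimal sketch only: the record fields (`user`, `merchant`, `product`, `timestamp`), the ISO-style timestamp, and the hour-of-day time token are illustrative assumptions, since the paper does not specify how transactions are tokenized.

```python
from collections import defaultdict


def build_user_sequences(transactions):
    """Group transactions into per-user, time-ordered token sequences.

    Returns three dicts (products, locations, times), each mapping a user
    to a list of tokens. The hour-of-day token is a hypothetical choice.
    """
    by_user = defaultdict(list)
    # ISO-style timestamps sort chronologically as plain strings.
    for row in sorted(transactions, key=lambda r: r["timestamp"]):
        by_user[row["user"]].append(row)

    products = {u: [r["product"] for r in rows] for u, rows in by_user.items()}
    locations = {u: [r["merchant"] for r in rows] for u, rows in by_user.items()}
    times = {
        u: ["hour_" + r["timestamp"][11:13] for r in rows]
        for u, rows in by_user.items()
    }
    return products, locations, times


# Toy records; all values are invented for illustration.
transactions = [
    {"user": "u1", "merchant": "A", "product": "milk", "timestamp": "2015-03-02T09:10"},
    {"user": "u1", "merchant": "B", "product": "bread", "timestamp": "2015-03-01T18:45"},
    {"user": "u2", "merchant": "A", "product": "coffee", "timestamp": "2015-03-01T08:00"},
]
products, locations, times = build_user_sequences(transactions)
```

Each resulting list could then be passed as one "sentence" to a Word2Vec implementation, with none of the tokens carrying demographic information.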

Word2Vec-Based Hybrid Collaborative Filtering (Hybrid CF)
We implemented a recommendation system by constructing a hybrid CF model that combines user-based CF and item-based CF using the Word2Vec algorithm. PP2Vec on user operates as a user-based CF, and PP2Vec on product operates as an item-based CF. The final PP2Vec combining the two vectors operates as a hybrid CF model. This model excludes the demographic information of the user. To obtain the tendency vector of each user, we used the Gensim library; the product vector, place vector, and time vector created through the Gensim library are denoted $V_P$, $V_L$, and $V_T$, respectively.
The definition of PP2Vec on user is as follows:

$$V_{PP}^{User} = V_P + V_L + V_T \qquad (1)$$

The recommendation system proceeds in two steps (Algorithm 1). First, when a user $u_r$ is input, the recommendation system searches for the most similar user $u_s$ through the PP2Vec calculated previously and recommends the products that this user mainly purchased. Next, to reflect user $u_r$'s recent purchase tendency, the recommendation system recommends products similar to the product last purchased by user $u_r$. By combining the products recommended in these two steps, the recommendation system finally recommends the top-N products. To compare the performance of user-based CF, item-based CF, and hybrid CF combining the two, the recommended product lists were evaluated by hit-rate at top-3, top-5, and top-10.
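As an illustration of how the three vectors can be combined and compared, the sketch below adds a product, location, and time vector element-wise for each user and then finds the nearest user by cosine similarity. The two-dimensional vectors are hand-written toy values standing in for Gensim output, and the function names are hypothetical.

```python
import math


def add_vectors(*vectors):
    """Element-wise sum of equally sized vectors."""
    return [sum(components) for components in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between a and b (0.0 if either is a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar_user(target, user_vectors):
    """Return the user whose combined vector is closest to the target's."""
    target_vec = user_vectors[target]
    scored = (
        (cosine_similarity(target_vec, vec), user)
        for user, vec in user_vectors.items()
        if user != target
    )
    return max(scored)[1]


# Toy product, location, and time vectors per user (invented values).
v_p = {"u1": [1.0, 0.0], "u2": [0.9, 0.1], "u3": [0.0, 1.0]}
v_l = {"u1": [0.5, 0.5], "u2": [0.5, 0.5], "u3": [0.5, 0.5]}
v_t = {"u1": [0.0, 0.2], "u2": [0.1, 0.1], "u3": [0.0, 1.0]}

# Combined user vector: product + location + time.
user_vectors = {u: add_vectors(v_p[u], v_l[u], v_t[u]) for u in v_p}
nearest = most_similar_user("u1", user_vectors)
```

Summing keeps the combined vector in the same space as its parts, which requires the three embeddings to share a dimensionality; that requirement is an assumption of this sketch.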
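The two-step procedure can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the vectors are toy values rather than trained embeddings, "mainly purchased" is interpreted as purchase frequency, each step contributes half of the top-N list, and `recommend_hybrid` is a hypothetical name.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity; returns 0.0 for zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend_hybrid(target, user_vecs, product_vecs, histories, top_n=4):
    """Combine a user-based step and an item-based step, half of top_n each."""
    half = top_n // 2

    # Step 1 (user-based): most frequent purchases of the most similar user.
    neighbour = max(
        (u for u in user_vecs if u != target),
        key=lambda u: cosine(user_vecs[target], user_vecs[u]),
    )
    user_based = [p for p, _ in Counter(histories[neighbour]).most_common()][:half]

    # Step 2 (item-based): products most similar to the target's last purchase.
    last = histories[target][-1]
    ranked = sorted(
        (p for p in product_vecs if p != last),
        key=lambda p: cosine(product_vecs[last], product_vecs[p]),
        reverse=True,
    )
    item_based = [p for p in ranked if p not in user_based][:half]

    return user_based + item_based


# Toy data; every value here is invented for illustration.
user_vecs = {"u_r": [1.0, 0.0], "u_a": [0.9, 0.1], "u_b": [0.0, 1.0]}
histories = {
    "u_r": ["tea", "cake"],
    "u_a": ["coffee", "coffee", "bagel", "tea"],
    "u_b": ["soap"],
}
product_vecs = {
    "cake": [1.0, 0.0], "cookie": [0.9, 0.2], "coffee": [0.5, 0.5],
    "bagel": [0.6, 0.4], "soap": [0.0, 1.0], "tea": [0.2, 0.8],
}
recommended = recommend_hybrid("u_r", user_vecs, product_vecs, histories, top_n=4)
```

With this toy data the user-based step contributes the neighbour's most frequent purchases and the item-based step contributes the nearest neighbours of the last purchased product, skipping anything already in the list.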

Single-Merchant vs. Multi-Merchant Recommendation Comparison and Validation
Based on the hybrid CF model implemented in the previous step, we verified the two recommendation methods: single-merchant and multi-merchant. A single-merchant recommendation system utilizes only its own merchant's data to train the hybrid CF model.
The multi-merchant recommendation system trains the hybrid CF model by referring to other merchants' purchase transactions.
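The difference between the two settings comes down to which transactions are visible at training time. The sketch below is illustrative only: the tuple layout `(user, merchant, product)` and the function name are assumptions, and the toy users mirror the User N and User C scenario described earlier.

```python
def training_histories(transactions, target_merchant, multi_merchant):
    """Build per-user purchase histories for one of the two settings.

    Single-merchant: only the target merchant's transactions are visible.
    Multi-merchant: transactions from all merchants are visible.
    """
    histories = {}
    for user, merchant, product in transactions:
        if multi_merchant or merchant == target_merchant:
            histories.setdefault(user, []).append(product)
    return histories


# Toy transactions as (user, merchant, product); values are invented.
transactions = [
    ("user_a", "A", "noodles"),
    ("user_c", "A", "dumplings"),
    ("user_c", "B", "tea"),
    ("user_n", "B", "tea"),  # user_n has never bought from merchant A
]

single = training_histories(transactions, "A", multi_merchant=False)
multi = training_histories(transactions, "A", multi_merchant=True)
```

In the single-merchant histories, `user_n` is absent, so no similar user can be found on a first visit to Merchant A; in the multi-merchant histories, `user_n` shares a purchase with `user_c`, whose Merchant A purchases then become candidate recommendations.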

Evaluation Metric
Herein, we evaluate the accuracy of the ECF recommendation methodology in multi-merchant and single-merchant situations. We regarded the user's last purchased product as the label and predicted this label from the user's purchase record excluding the last product. To evaluate the accuracy of the proposed algorithm, we used the hit-rate (HR), a popular metric for top-N recommendation methods [27]. Here, $n$ is the total number of users, and the number of hits is the number of products actually purchased by the user from among the recommended products.
The definition of the hit-rate is as follows:

$$\mathrm{HR} = \frac{\text{number of hits}}{n} \qquad (2)$$
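The leave-last-out evaluation can be sketched as below. This is a minimal illustration under assumptions: `recommend` is any hypothetical callable taking a user and that user's history without the last purchase, and users with fewer than two purchases are skipped, a choice the paper does not state.

```python
def hit_rate(histories, recommend, top_n):
    """Leave-last-out hit-rate: fraction of users whose held-out last
    purchase appears among the top_n recommended products."""
    hits = 0
    evaluated = 0
    for user, purchases in histories.items():
        if len(purchases) < 2:
            continue  # assumption: need at least one purchase besides the label
        *past, label = purchases
        recommended = recommend(user, past)[:top_n]
        hits += label in recommended
        evaluated += 1
    return hits / evaluated if evaluated else 0.0


# Toy histories and a fixed stand-in recommender (both invented).
histories = {
    "u1": ["x", "a"],
    "u2": ["y", "z"],
    "u3": ["b", "c"],
    "u4": ["q"],
}


def fixed_recommender(user, past):
    return ["a", "b", "c"]


hr_at_3 = hit_rate(histories, fixed_recommender, top_n=3)
hr_at_1 = hit_rate(histories, fixed_recommender, top_n=1)
```

Here three users are evaluated; two of them have their last purchase in the top-3 list and one of them in the top-1 list.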

Comparison of Various CF Algorithms
Table 2 and Figure 3 show the performance of the three recommendation algorithms. Overall, hybrid CF performed better than item-based CF and user-based CF. This is consistent with prior studies in which hybrid CF showed high performance. Based on these results, we used hybrid CF as the recommendation algorithm of the ECF system when comparing the single-merchant and multi-merchant methods.

Single-Merchant vs. Multi-Merchant Recommendation Algorithms
A comparison of the performance of the single-merchant and multi-merchant recommendation algorithms for each merchant is shown in Table 3 and Figure 4. First, for all three examined hit-rates, the average performance of multi-merchant recommendation was higher than the average performance of single-merchant recommendation. Second, the smallest merchant, Merchant S, showed higher recommendation performance with the multi-merchant recommendation algorithm than with the single-merchant recommendation algorithm. However, for the larger Merchants L1 and L2, the single-merchant recommendation algorithm outperformed the multi-merchant recommendation algorithm overall. For Merchant L1 and Merchant L2, which are rich in data, the data of smaller merchants appear to act as noise in the recommendation system, resulting in decreased performance. For Merchant M and Merchant S, which have relatively insufficient data, the ECF system that includes data from other merchants could improve performance. In particular, the recommendation performance of Merchant S increased significantly more than that of Merchant M. Therefore, we can conclude that the fewer transaction data are available, the greater the effect of the ECF system.

Implementation of the Recommendation System
Harex InfoTech, a payment company in Korea, provides a food ordering and delivery service, Ulsan Pedal. The Ulsan Pedal service holds transaction data relating to various merchants. As a multi-merchant system connecting SMEs, an ECF system application can provide relevant recommendations to users and provide merchants with new revenue opportunities. After configuring the data for the ECF system, the ECF system was applied to Ulsan Pedal. The data used for the simulation were 13,385 real payments made via Ulsan Pedal over two months from March 2021. The number of merchants in these data was 966, but because Ulsan Pedal is a new platform launched in March, it has few transaction data relative to the number of merchants. The data set was created by considering only the purchased product, the purchase location, and the purchase time, excluding user demographic data. The development environment was Anaconda, Python 3.7, and Docker (Table 4). We used Docker container technology, which provides virtualization for stable service development, deployment, and management of Ulsan Pedal.
The system structure for providing the recommendation service is shown in Figure 5. When the purchase history of Ulsan Pedal users was stored in Harex InfoTech's main database, Harex InfoTech transmitted the transaction data to the ECF recommendation system database through an API, and the recommendation algorithm was updated with these transaction data. The recommendation system then derived a product list. The simulation results from applying the recommendation system to this real business are shown in Table 5. The average recommendation response time of this recommendation system was 0.066 ms per case. With the multi-merchant ECF system, various merchants can be mutually recommended. Products from merchants at which a user has never made a purchase can also be recommended, which provides a magical experience to users. In addition, Ulsan Pedal is a shared platform that cooperates with various merchants and can provide benefits to merchants by lowering payment fees. The system makes recommendations at the level of each merchant's products, so the advertising effect is maximized. SMEs have difficulty applying recommendation systems, but the AI divide can be reduced through a shared platform such as Ulsan Pedal.

Conclusions
In this study, we proposed an ECF system that enables companies, especially small and medium-sized companies, to provide an appropriate recommendation service to users. To verify the proposed structure, we compared the recommendation accuracy in two situations, single-merchant and multi-merchant, using an open dataset. We also implemented a recommendation system based on the ECF algorithm using actual payment company data.
Three models were compared: user-based CF using PP2Vec on user, item-based CF using PP2Vec on product, and hybrid CF, a method that combines the two. In a comparative experiment, hybrid CF showed the highest performance. Thus, we selected hybrid CF as the algorithm for the ECF system. Finally, we compared single-merchant and multi-merchant performance using ECF.
The application of the recommendation algorithm showed that the multi-merchant recommendation algorithm has higher performance for relatively small merchants. We implemented a recommendation system for the ECF system using Ulsan Pedal data. The ECF system proposed in this study is more suitable for a real-world business environment because it does not directly share data among companies. Our study shows that the ECF system can contribute to the sustainability and viability of economic systems by providing high-performance recommendation capability to small and medium-sized enterprises and start-ups. This ECF system makes it possible for SMEs and start-ups to introduce recommendation services. In addition, the ECF system will be able to generate revenue through product recommendations relevant to customers.
The theoretical implications of this study are as follows. Through the development of the PP2Vec algorithm, companies in a data-scarce environment can make more relevant recommendations to users. In addition, even if only minimum data are shared, it is possible to make appropriate recommendations to users. As a managerial implication, for SMEs that have difficulty in developing a recommendation system, ECF systems can increase their productivity. However, there is a limitation to this study. Although the algorithm was developed by minimizing the data and excluding the users' personal information, some user data or purchase history data are still shared. In future studies, we intend to develop an algorithm that can make relevant recommendations to users without sharing data by using a federated learning model. We will also study whether this algorithm works in other domains.

Figure 2. Purchased Product to Vector Architecture.

Figure 3. Comparison of Collaborative Filtering Algorithms by Hit-Rate.

Figure 4. Comparison of Single-Merchant and Multi-Merchant Algorithms.

Table 1. Data by Merchant Type.
Algorithm 1. Hybrid Collaborative Filtering Based on Word2Vec
Input:
    User ID $u_r$
    User Metrics Based on Product of Purchased Product $M_P$
    User Metrics Based on Location of Purchased Product $M_L$
    User Metrics Based on Time of Purchased Product $M_T$
    Product Metrics Based on User $M_I$
Output:
    Top-N Recommended Product List
Calculate Product Vector $V_P$ using Gensim($M_P$)
Calculate Location Vector $V_L$ using Gensim($M_L$)
Calculate Time Vector $V_T$ using Gensim($M_T$)
Calculate Purchased Product Vector $V_{PP}^{User} = V_P + V_L + V_T$
Calculate Product Vector $V_{PP}^{Product}$ using Gensim($M_I$)
Initialize Recommended Product $R$
While len($R$) < Top-N do
    For Purchased Product Vector $V_{PP}^{User}$ do
        Calculate Cosine Similarity $s(u_r, V_{PP}^{User})$
    Get Similarity User $u_s$ = Max($s(u_r, V_{PP}^{User})$)
    Recommended Product $R$: Add Only Top-N/2 of $u_s$ Purchased Product List
    Get $u_r$ Recent Purchased Product $p_r$
    For Product Vector $V_{PP}^{Product}$ do
        Calculate Cosine Similarity $s(p_r, V_{PP}^{Product})$
    Get Similarity Product $p_s$ = sorted($s(p_r, V_{PP}^{Product})$), Descending Order
    Recommended Product $R$: Add Only Top-N/2 of $p_s$

Table 2. Comparison of Collaborative Filtering Algorithms.

Table 3. Comparison of Single-Merchant and Multi-Merchant Algorithms.