# Adversarial Learning for Product Recommendation


## Abstract


## 1. Introduction

**Business impact of recommendation.** Online retailing revenue continues to expand each year. The largest online provider of goods and services (Amazon) reported 2019 gross revenue of $280.5B, an increase of 20.4% over the previous year (https://www.marketwatch.com/investing/stock/amzn/financials). Most sizable e-commerce companies use some type of recommendation algorithm to suggest additional items to their customers. The Long Tail proposition asserts that by making consumers aware of rarely noticed products via recommendation, demand for these obscure items will increase, shifting the distribution of demand away from popular items and potentially creating a larger market overall [2]. The goal of personalized recommendation is to produce incremental sales from each customer. These incremental sales are certainly non-trivial, accounting for approximately 35% additional revenue for Amazon and 75% for Netflix, by some estimates [3]. Operating efficiencies within a digital enterprise can also be significantly improved: Netflix saves $1B per year in churn-related costs by employing personalization and recommendation [4].

**Recommender systems.** Recommendation algorithms act as filters to distill very large amounts of data down to a select group of products personalized to match a user’s preferences. Filtering and ranking the recommendations is extremely important; marketing studies have suggested that too many choices can decrease consumer satisfaction and suppress sales [5]. These algorithms can be categorized into a few basic strategies: (1) item- or content-based (return lists of popular items with similar attributes); (2) collaborative (recommend items based on preferences or behaviors of similar users); or (3) some hybrid combination of the first two.

**Related work: Neural recommenders.** Neural or deep learning-based recommendation systems are abundantly represented in the research literature (see reviews in References [6,7]). Compared to classical techniques based on linear matrix decomposition [8], deep models have the capacity to incorporate greater volumes of data of mixed types, extract features, and express user-item-score statistical relationships. Examples of deep architectures applied to recommendation tasks include multilayer perceptrons (MLPs) [9]; autoencoders [10]; recurrent neural networks (RNNs) [11]; graph neural networks [12]; and generative adversarial networks (GANs) [13]. These models aim to predict a user’s preference for new or unseen items from mappings relating user-item ([9,10]), item-feature ([12]) or item-item sequences ([11,13]).

**Contribution of present research.** In this work, we apply a deep conditional, coupled generative adversarial network to a new domain of application: product recommendation in an online retail setting. In the context of previous GAN research [14], and specifically in terms of recommender systems, the model and approach advanced in this research have several novel aspects. These include:

- Mapping: Direct modeling of the joint distribution between product views and buys for a user segment;
- Data structure & semantics: Inputs to the trained generative model are (1) user segment and (2) noise vectors; the outputs are matrices of coupled (view, buy) predictions;
- Coverage: Complete, large-scale product catalogs are represented in each generated distribution;
- Data compression: Application of a linear encoding algorithm to very high-dimensional data vectors, enabling computation and ultimate decoding to product space;
- Commercial focus on transaction (versus rating) for recommended products by design.

## 2. Methods

#### 2.1. Background: Generative Adversarial Networks

**Generative adversarial networks.** A GAN learns to draw samples from a target data distribution ${p}_{data}(\mathbf{x})$ [14]. It consists of two functions: a generator G that converts samples $\mathbf{z}$ from a prior distribution ${p}_{z}(\mathbf{z})$ into candidate examples $G(\mathbf{z})$; and a discriminator D that looks at real samples from ${p}_{data}(\mathbf{x})$ and those synthesized by G, and estimates the probability that a particular example is authentic or fake. G is trained to fool D with artificial samples that appear to be from ${p}_{data}(\mathbf{x})$. The functions G and D therefore have adversarial objectives, which are described by the minimax function used to adapt their respective parameters:

$$\underset{G}{\min}\,\underset{D}{\max}\,V(D,G)={\mathbb{E}}_{\mathbf{x}\sim {p}_{data}(\mathbf{x})}[\log D(\mathbf{x})]+{\mathbb{E}}_{\mathbf{z}\sim {p}_{z}(\mathbf{z})}[\log (1-D(G(\mathbf{z})))].$$

During training, the discriminator tries to make $D(G(\mathbf{z}))$ approach 0; the generator tries to make this quantity approach unity [14].
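As a toy illustration (not the paper's implementation), the value function above can be evaluated directly from discriminator outputs on a batch:

```python
import numpy as np

def gan_value(d_real, d_fake):
    """Minimax value V(D, G) for one batch.

    d_real: discriminator probabilities D(x) on real samples.
    d_fake: discriminator probabilities D(G(z)) on generated samples.
    D ascends this quantity; G descends the second term.
    """
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

# A maximally confused discriminator (all outputs 0.5) yields -2*log(2),
# the value at the theoretical equilibrium of the game.
v_confused = gan_value([0.5, 0.5], [0.5, 0.5])
```

A confident, correct discriminator (D(x) near 1, D(G(z)) near 0) drives the value toward 0, which is why the generator's gradient pushes $D(G(\mathbf{z}))$ back toward 1.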

**Conditional GANs.** Additional information about the input data can be used to condition the GAN model. To learn different segments of the target distribution, an auxiliary input signal $\mathbf{y}$ is presented to both the generator and discriminator functions. The objective function for the conditional GAN [15,16] becomes

$$\underset{G}{\min}\,\underset{D}{\max}\,V(D,G)={\mathbb{E}}_{\mathbf{x}\sim {p}_{data}(\mathbf{x},\mathbf{y})}[\log D(\mathbf{x}|\mathbf{y})]+{\mathbb{E}}_{\mathbf{z}\sim {p}_{z}(\mathbf{z})}[\log (1-D(G(\mathbf{z}|\mathbf{y})))],$$

where real samples are now drawn from the joint distribution ${p}_{data}(\mathbf{x},\mathbf{y})$.

**Coupled GANs.** The coupled GAN [17] learns a joint distribution over paired domains by training two GANs whose networks share weights in certain layers (for G, near the latent input $\mathbf{z}$; for D, near the encoded semantics of class membership). By constraining weights in this manner, the joint distribution $p(v,b)$ between view and buy behaviors can be learned from training data.
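The weight-sharing constraint can be sketched with a toy pair of generators (hypothetical two-layer numpy networks, not the paper's architecture): both reuse the same input-side weight matrix near $\mathbf{z}$ and differ only in their output-side heads, which ties the two learned marginals together.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared input-side weights (near z), per the coupling constraint.
W_shared = rng.normal(size=(8, 16))

# Domain-specific output-side weights: one head per domain (views, buys).
W_view = rng.normal(size=(16, 4))
W_buy = rng.normal(size=(16, 4))

def generator(z, W_out):
    h = np.tanh(z @ W_shared)   # shared layer: common high-level structure
    return np.tanh(h @ W_out)   # domain-specific layer: view or buy output

z = rng.normal(size=(2, 8))
g_view, g_buy = generator(z, W_view), generator(z, W_buy)
```

During training, gradients from both discriminators would flow into `W_shared`, so the two generators cannot drift apart arbitrarily; this is the mechanism that couples the (view, buy) outputs.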

#### 2.2. Model Architecture

The coupled, conditional architecture is shown in Figure 1. The generators (left) produce artificial arrays ${G}_{1}$, ${G}_{2}$. The discriminators (right) are trained alternately with these artificial arrays and real samples ${X}_{1}$, ${X}_{2}$, and try to discern the difference. The error in this decision is backpropagated to update weights in the generators.

#### 2.3. Data Preparation

**Electronic commerce dataset**. The adversarially trained recommender model was developed using a dataset collected from an online retailer (https://www.kaggle.com/retailrocket/ecommerce-dataset). The dataset describes website visitors and their behaviors (view, “add” or buy items available for purchase); the attributes of these items; and a graph describing the hierarchy of categories to which each item belongs. Customers have been anonymized, identifiable only by a unique session number; all items, properties and categories were similarly hashed for confidentiality reasons.

**User segmentation**. In Reference [22], models of user engagement with digital services were developed based on metrics covering three aspects of observable variables: popularity, activity and loyalty. Each of these areas and metrics suggest means for grouping users in an implicit feedback situation.

**Compressed representation**. The e-commerce dataset contained 417,053 distinct items in 1669 product categories. Data matrices covering the full-dimensional $category\times item$ space (${\mathbb{R}}^{1669\times 417,053}$) are prohibitively large for practical computation. This is exacerbated by the number of trainable parameters in the model (>257,000,000).
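To make the scale concrete, a back-of-envelope calculation (assuming 32-bit floats; the paper does not state the storage format) compares the full $category\times item$ matrix against the encoded 1669 × 300 representation used by the model (see Table 1):

```python
# Dimensions stated in the text and in Table 1.
n_categories, n_items = 1669, 417_053
encoded_cols = 300  # encoded width per category (Table 1: 1669 x 300)

full_bytes = n_categories * n_items * 4          # dense float32 matrix
encoded_bytes = n_categories * encoded_cols * 4  # encoded float32 matrix

full_gb = full_bytes / 1e9                # ~2.8 GB per single data matrix
compression = n_items / encoded_cols      # ~1390x reduction in width
```

The dense matrix alone approaches 3 GB per sample array, before accounting for the >257,000,000 trainable parameters, which is why the linear encoding step is essential for practical computation.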

The compressed representations of Views (**V**) and Buys (**B**) can be expressed symbolically as:

#### 2.4. Evaluation Metrics

1. Specific items contained within the overlapping category sets that are both viewed and “bought”—a putative conversion rate;
2. Coherence between categories in the paired $(view,buy)$ recommendations.

**Product conversion rate.** Define the conversion rate as the ratio of the number of items recommended and bought to the count of all items recommended, conditioned on the set of overlapping product categories returned by the system:

**Category similarity.** The average Jaccard similarity between recommended categories $({c}_{v},{c}_{b})$ is given by
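Both metrics reduce to simple set arithmetic; the sketch below uses hypothetical item and category sets, not the paper's data:

```python
def conversion_rate(recommended, bought):
    """Fraction of recommended items that were also bought."""
    recommended, bought = set(recommended), set(bought)
    if not recommended:
        return 0.0
    return len(recommended & bought) / len(recommended)

def jaccard(c_v, c_b):
    """Jaccard similarity between view- and buy-category sets."""
    c_v, c_b = set(c_v), set(c_b)
    union = c_v | c_b
    return len(c_v & c_b) / len(union) if union else 0.0

# Hypothetical example: 4 recommended items, 2 of which were bought.
cvr = conversion_rate({"a", "b", "c", "d"}, {"b", "d", "e"})  # 2/4
j = jaccard({1, 2, 3}, {2, 3, 4})                             # 2/4
```

In the paper these quantities are averaged over generated batches per user segment (Table 3), with random-noise baselines providing the ${CVR}^{rn}$ and ${J}_{c}^{rn}$ columns.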

**Training distribution statistics.** Summary statistics comparing the distributions ${V}_{x},{V}_{z}$ (Figure 2, path #1) are observed to provide qualitative information about the effectiveness of target distribution learning.

**Null hypothesis tests.** A legitimate question to ask upon analyzing the current results is this: “Are the generator realizations samples of the target joint distribution, or do they simply represent random noise?”
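One way to operationalize this question is a simple null test: compare a summary statistic of a generated batch against the same statistic under a noise-only null, and estimate how often pure noise does at least as well. The sketch below is illustrative only (batch mean as the statistic, Gaussian noise as the null); it is not the procedure used in the paper:

```python
import numpy as np

rng = np.random.default_rng(42)

def null_test(generated, noise_sampler, statistic, n_null=1000):
    """Empirical p-value: fraction of noise batches whose statistic is
    at least as large as that of the generated batch."""
    observed = statistic(generated)
    null_stats = np.array([statistic(noise_sampler()) for _ in range(n_null)])
    return float(np.mean(null_stats >= observed))

# A "generated" batch with genuine structure (shifted mean) should be
# easily distinguishable from the zero-mean noise null.
generated = rng.normal(loc=0.5, size=256)
p = null_test(generated, lambda: rng.normal(size=256), np.mean)
```

A small p-value rejects the hypothesis that the generator output is indistinguishable from noise with respect to the chosen statistic; a battery of such statistics gives a stronger (though still not conclusive) check.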

#### 2.5. Recommendation Experiments

**Training.** The system was trained on the encoded data for 1100 epochs, in randomly selected batches of 16 examples each.

**Testing.** Testing a machine learning model refers to evaluation of its output predictions obtained using out-of-sample (unseen) input data. This provides an estimate of the quality and generalization error of the model. Techniques such as cross-validation are often used to assess generalization potential. In contrast to many other learning algorithms, GANs do not have an objective function, rendering performance comparison of different models difficult [31].

**GAN predictions.** After training, the model was stimulated with a noise vector and a user segment conditioning signal, producing a series of coupled $(view,buy)$ predictions $({G}_{1}(z,y),{G}_{2}(z,y))$, as depicted in Figure 1. The discriminators ${D}_{1},{D}_{2}$ serve only to guide training, and are disabled for the inference procedure.

## 3. Results and Discussion

#### 3.1. Main Statistical Results

#### 3.2. Benchmark Comparison Results

#### 3.3. Discussion

#### 3.3.1. Comparison with Other Recommenders

#### 3.3.2. Drawbacks of Current Method

**Numerical efficiency.** A limitation of the approach to recommendation as presented here is the numerical efficiency of the decoding process. The arithmetic coding algorithm used to decode the binary data matrices after training the model involves iteration and is not easily parallelizable. The dimensionality of the full catalog of products is extremely high; decoding compute times are consequently large. This mandates offline processing before deployment.

**Ranking of recommendations.** As discussed above, there is no ranking of recommendation results in the current scheme, as the GAN produces binary valued information upon decoding. Inherent filtering is accomplished by limiting the presented results to those contained within the category intersection set $({c}_{v}\cap {c}_{b})$, as seen in the operational definition of conversion rate (Equation (4)). This set is interpreted as representing the greatest likelihood for completing a transaction. On average over user segments, 13% of all categories are returned; of these, 0.46% of all catalog items are represented.
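In code, this intersection filter reduces to set operations over the decoded outputs; the following sketch uses hypothetical item and category identifiers:

```python
def filter_recommendations(view_items, buy_items, item_category):
    """Restrict recommendations to items whose category lies in the
    intersection of the generated view- and buy-category sets."""
    cats_v = {item_category[i] for i in view_items}
    cats_b = {item_category[i] for i in buy_items}
    shared = cats_v & cats_b
    return {i for i in view_items | buy_items
            if item_category[i] in shared}

# Hypothetical decoded outputs: category c1 appears on both the view
# and buy sides, so only c1's items survive the filter.
item_category = {"i1": "c1", "i2": "c1", "i3": "c2", "i4": "c3"}
recs = filter_recommendations({"i1", "i3"}, {"i2", "i4"}, item_category)
```

Because the surviving items are unordered, any downstream ranking (e.g., by popularity) would have to be supplied by a separate mechanism.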

**Conditioning signal.** The current conditioning signal y is simply based on user dwell time. The information contained in this signal is relatively weak, as indicated by the variation of statistics across segments in Table 3. It is reasonable to anticipate more stringent filtering, and consequent precision and relevance of results, upon the introduction of more robust demographic or behavioral data in the conditioning signal input to the model. This would facilitate a more personalized recommendation experience. The model architecture considered here directly supports such segmentation, and is an important topic to be explored in extensions to this research.

#### 3.3.3. General Discussion Points

**Selection bias and scalability.** The estimation of conversion rates is difficult due to two related key issues: training sample selection bias and data sparsity [28]. Sample selection bias refers to discrepancies in data distribution between model training and inference in conventional recommenders—that is, training data often comprise only “clicked” samples, while inference is made on all impression samples. Selection bias limits the accuracy of inference under the assumption that the user proceeds through the sequence $(impression\to click\to buy)$ [28]. As $clicked$ examples are a small fraction of $views$, a highly imbalanced training set results, biased towards sparse positive examples [36].

**An open question.** Has the true joint distribution been learned? Making inferences about the joint distribution of viewing and buying behavior to inform marketing decisions is the motivation behind this analysis. Investigators have previously shown that GANs may not adequately approximate the target distribution, as the support of the generated distribution was low due to so-called mode collapse [38], where the generator learns to mimic certain modes in the training data in order to trick the discriminator during training.

## 4. Conclusions

## Author Contributions

## Funding

## Conflicts of Interest

## Abbreviations

Abbreviation | Meaning |
---|---|
GAN | Generative adversarial network |
CVR | Conversion rate |
NCF | Neural collaborative filtering |
MLP | Multilayer perceptron |
RNN | Recurrent neural network |

## References

- Gilula, Z.; McCulloch, R.; Ross, P. A direct approach to data fusion. J. Mark. Res. **2006**, XLIII, 73–83.
- Anderson, C. The Long Tail: Why the Future of Business Is Selling Less of More; Hyperion Press: New York, NY, USA, 2006.
- MacKenzie, I.; Meyer, C.; Noble, S. How Retailers Can Keep Up with Consumers. 2013. Available online: https://mck.co/2fGI7Vj (accessed on 9 June 2019).
- Gomez-Uribe, C.; Hunt, N. The Netflix Recommender System: Algorithms, Business Value, and Innovation. ACM Trans. Manag. Inf. Syst. **2015**, 6, 1–19.
- Iyengar, S.; Lepper, M. When Choice is Demotivating: Can One Desire Too Much of a Good Thing? J. Personal. Soc. Psychol. **2001**, 79, 995–1006.
- Batmaz, Z.; Yurekli, A.I.; Bilge, A.; Kaleli, C. A review on deep learning for recommender systems: Challenges and remedies. Artif. Intell. Rev. **2018**, 52, 1–37.
- Zhang, S.; Yao, L.; Sun, A. Deep Learning Based Recommender System: A Survey and New Perspectives. ACM Comput. Surv. **2019**, 52, 1–38.
- Brand, M. Fast online SVD revisions for lightweight recommender systems. In Proceedings of the SIAM International Conference on Data Mining, San Francisco, CA, USA, 1–3 May 2003.
- He, X.; Liao, L.; Zhang, H.; Nie, L.; Hu, X.; Chua, T. Neural Collaborative Filtering. In Proceedings of the 26th International Conference on World Wide Web, Perth, Australia, 3–7 April 2017.
- Sedhain, S.; Menon, A.K.; Sanner, S.; Xie, L. AutoRec: Autoencoders Meet Collaborative Filtering. In Proceedings of the 24th International Conference on World Wide Web, Florence, Italy, May 2015.
- Hidasi, B.; Karatzoglou, A.; Baltrunas, L.; Tikk, D. Session-based Recommendations with Recurrent Neural Networks. arXiv **2015**, arXiv:1511.06939.
- Ying, R.; He, R.; Chen, K.; Eksombatchai, P.; Hamilton, W.L.; Leskovec, J. Graph Convolutional Neural Networks for Web-Scale Recommender Systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, 19–23 August 2018.
- Yoo, J.; Ha, H.; Yi, J.; Ryu, J.; Kim, C.; Ha, J.; Kim, Y.; Yoon, S. Energy-Based Sequence GANs for Recommendation and Their Connection to Imitation Learning. arXiv **2017**, arXiv:1706.09200.
- Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Networks. arXiv **2014**, arXiv:1406.2661.
- Mirza, M.; Osindero, S. Conditional Generative Adversarial Nets. arXiv **2014**, arXiv:1411.1784.
- Gauthier, J. Conditional generative adversarial nets for convolutional face generation. Class Project, Stanford CS231N: Convolutional Neural Networks for Visual Recognition **2014**, 2.
- Liu, M.Y.; Tuzel, O. Coupled Generative Adversarial Networks. In Advances in Neural Information Processing Systems 29; Lee, D.D., Sugiyama, M., Luxburg, U.V., Guyon, I., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2016; pp. 469–477.
- Abadi, M.; Barham, P.; Chen, J.; Chen, Z.; Davis, A.; Dean, J.; Devin, M.; Ghemawat, S.; Irving, G.; Isard, M.; et al. TensorFlow: A system for large-scale machine learning. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), Savannah, GA, USA, 2–4 November 2016; pp. 265–283.
- Chollet, F. Keras. 2015. Available online: https://keras.io (accessed on 31 August 2020).
- Linder-Norén, E. Keras-GAN: Keras Implementations of Generative Adversarial Networks. 2018. Available online: https://github.com/eriklindernoren/Keras-GAN (accessed on 31 August 2020).
- Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A Simple Way to Prevent Neural Networks from Overfitting. J. Mach. Learn. Res. **2014**, 15, 1929–1958.
- Lehmann, J.; Lalmas, M.; Yom-Tov, E.; Dupret, G. Models of User Engagement. In UMAP’12—Proceedings of the 20th International Conference on User Modeling, Adaptation, and Personalization; Springer: Berlin/Heidelberg, Germany, 2012; pp. 164–175.
- MacKay, D. Information Theory, Inference, and Learning Algorithms; Cambridge University Press: Cambridge, UK, 2003.
- McNee, S.M.; Riedl, J.; Konstan, J.A. Being accurate is not enough: How accuracy metrics have hurt recommender systems. In Proceedings of the CHI Extended Abstracts on Human Factors in Computing Systems, Montréal, QC, Canada, 22–27 April 2006.
- Castells, P.; Vargas, S.; Wang, J. Novelty and Diversity Metrics for Recommender Systems: Choice, Discovery and Relevance. In Proceedings of the International Workshop on Diversity in Document Retrieval (DDR-2011), Dublin, Ireland, April 2011.
- Koren, Y. Factorization meets the neighborhood: A multifaceted collaborative filtering model. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, NV, USA, 24–27 August 2008; pp. 426–434.
- Jannach, D.; Jugovac, M. Measuring the Business Value of Recommender Systems. ACM Trans. Manag. Inf. Syst. **2019**, 10, 1–23.
- Wen, H.; Zhang, J.; Wang, Y.; Bao, W.; Lin, Q.; Yang, K. Conversion Rate Prediction via Post-Click Behaviour Modeling. arXiv **2019**, arXiv:1910.07099.
- Bermeitinger, B.; Hrycej, T.; Handschuh, S. Representational Capacity of Deep Neural Networks—A Computing Study. arXiv **2019**, arXiv:1907.08475.
- Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015.
- Salimans, T.; Goodfellow, I.; Zaremba, W.; Cheung, V.; Radford, A.; Chen, X.; Chen, X. Improved Techniques for Training GANs. In Advances in Neural Information Processing Systems 29; Lee, D.D., Sugiyama, M., Luxburg, U.V., Guyon, I., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2016; pp. 2234–2242.
- Karras, T.; Aila, T.; Laine, S.; Lehtinen, J. Progressive Growing of GANs for Improved Quality, Stability, and Variation. arXiv **2017**, arXiv:1710.10196.
- Ogonowski, P. 15 Ecommerce Conversion Rate Statistics. 2020. Available online: https://www.growcode.com/blog/ecommerce-conversion-rate (accessed on 6 April 2020).
- Gilotte, A.; Calauzènes, C.; Nedelec, T.; Abraham, A.; Dollé, S. Offline A/B Testing for Recommender Systems. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining—WSDM ’18, Marina Del Rey, CA, USA, 5–9 February 2018.
- Harper, F.M.; Konstan, J. The MovieLens datasets: History and context. ACM Trans. Interact. Intell. Syst. **2015**, 5, 1–19.
- Hu, Y.; Koren, Y.; Volinsky, C. Collaborative filtering for implicit feedback datasets. In Proceedings of the IEEE International Conference on Data Mining (ICDM 2008), Pisa, Italy, 15–19 December 2008; pp. 263–272.
- Yang, L.; Cui, Y.; Xuan, Y.; Wang, C.; Belongie, S.; Estrin, D. Unbiased Offline Recommender Evaluation for Missing-Not-At-Random Implicit Feedback. In Proceedings of the Twelfth ACM Conference on Recommender Systems (RecSys ‘18), Vancouver, BC, Canada, 2–7 October 2018.
- Arora, S.; Zhang, Y. Do GANs actually learn the distribution? An empirical study. arXiv **2017**, arXiv:1706.08224.
- Dacrema, M.F.; Boglio, S.; Cremonesi, P.; Jannach, D. A Troubling Analysis of Reproducibility and Progress in Recommender Systems Research. arXiv **2019**, arXiv:1911.07698.

**Figure 1.** Coupled, conditional RecommenderGAN architecture. All component networks $({D}_{1},{D}_{2},{G}_{1},{G}_{2})$ are active during training. Samples from the trained model are generated by presenting latent vector z and user segment y to the generators ${G}_{1},{G}_{2}$.

**Figure 3.** Recommended items are drawn from the intersection of outputs generated by the trained GAN, indicated in blue. ${G}_{V}(\mathbf{z}|\mathbf{y})$ and ${G}_{B}(\mathbf{z}|\mathbf{y})$ are the view and buy distributions, respectively.

**Figure 4.** Batch statistics for ${X}_{1}$ and ${G}_{1}$ versus training epoch. Similar statistic values were observed for ${X}_{2}$, ${G}_{2}$. Batch size = 16.

**Table 1.** Configuration details for discriminators (left) and generators (right). The “?” symbol indicates the batch size of the associated tensor. Weights are shared between layers D:(9–14) and G:(6–13).

ID | D Layer | Output Size | ID | G Layer | Output Size |
---|---|---|---|---|---|
1 | Input (y) | (?,1) | 1 | Input (y) | (?,1) |
2 | Embedding | (?,1,500700) | 2 | Embedding | (?,1,100) |
3 | Flatten | (?,500700) | 3 | Flatten | (?,100) |
4 | Reshape | (?,1669,300,1) | 4 | Input (z) | (?,100) |
5 | Input (X) | (?,1669,300,1) | 5 | Multiply (3,4) | (?,100) |
6 | Multiply (4,5) | (?,1669,300,1) | 6,7 | Dense, ReLU | (?,128) |
7 | AvgPooling | (?,834,15,1) | 8 | BatchNorm. | (?,128) |
8 | Flatten | (?,125100) | 9 | Dropout | (?,128) |
9,10 | Dense, ReLU | (?,512) | 10,11 | Dense, ReLU | (?,256) |
11,12 | Dense, ReLU | (?,256) | 12 | BatchNorm. | (?,256) |
13,14 | Dense, ReLU | (?,64) | 13 | Dropout | (?,256) |
15 | Dense | (?,1) | 14 | Dense | (?,500700) |
| | | 15 | Reshape | (?,1669,300,1) |
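As a sanity check on the generator column of Table 1, the layer output shapes can be traced with numpy stand-ins for the Keras layers (batch size 16 in place of “?”; BatchNorm and Dropout omitted as shape-preserving; the final 256 × 500700 weight matrix is too large to materialize casually, so only its shape arithmetic is checked):

```python
import numpy as np

batch = 16
rng = np.random.default_rng(0)

# Layers 1-3: conditioning signal y -> embedding -> flatten.
y = rng.integers(0, 5, size=(batch, 1))      # 5 user segments assumed
embedding_table = rng.normal(size=(5, 100))
emb = embedding_table[y].reshape(batch, 100)

# Layers 4-5: latent z, multiplied elementwise with the embedding.
z = rng.normal(size=(batch, 100))
h = emb * z

# Layers 6-13: Dense/ReLU stacks (BatchNorm, Dropout preserve shape).
h = np.maximum(h @ rng.normal(size=(100, 128)), 0.0)
h = np.maximum(h @ rng.normal(size=(128, 256)), 0.0)

# Layers 14-15: the final Dense maps 256 -> 500700 = 1669 * 300, then a
# Reshape yields the encoded (category x column) output array.
final_units = 1669 * 300
out_shape = (batch, 1669, 300, 1)
```

The dominant parameter cost is the final Dense layer (256 × 500700 weights plus biases, over 128 million parameters in this one layer), consistent with the >257,000,000 total parameter count noted in Section 2.3.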

Segment | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
Count | n/a | 309 | 1182 | 1510 | 7590 |

y | #I | #C | CVR | ${CVR}^{rn}$ | ${J}_{c}$ | ${J}_{c}^{rn}$ |
---|---|---|---|---|---|---|
1 | 1648 | 239 | 1.763 | 0.0005 | 8.19 | 50.66 |
2 | 2037 | 213 | 1.414 | 0.0004 | 7.37 | 51.36 |
3 | 2522 | 190 | 1.323 | 0.0005 | 6.13 | 50.04 |
4 | 1419 | 222 | 1.644 | 0.0004 | 7.57 | 50.81 |

**Table 4.** Experimental conversion rate compared to selected industrial average benchmarks. GAN: current results, segment-wise average; Industry: average over 11 industrial markets; Product: average of 9 product types [33].

GAN | Industry | Product |
---|---|---|
1.536 | 2.089 | 1.827 |

Algorithm | Reference | Recommender Input | Recommender Output |
---|---|---|---|
MLP+Matrix factorization | He et al. [9] | User, item vectors | Item ratings |
Autoencoder | Sedhain et al. [10] | User, item vectors | Item ratings |
Recurrent neural network | Hidasi et al. [11] | Item sequence | Next item |
Graph neural network | Ying et al. [12] | Item/feature graph | Top items |
Sequence GAN | Yoo et al. [13] | Item sequence | Next item |
RecommenderGAN | This work | Noise, user vectors | View, buy matrices |

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Bock, J.R.; Maewal, A.
Adversarial Learning for Product Recommendation. *AI* **2020**, *1*, 376-388.
https://doi.org/10.3390/ai1030025
