Article

A Machine Learning Approach for Solving the Frozen User Cold-Start Problem in Personalized Mobile Advertising Systems

by
Iosif Viktoratos
* and
Athanasios Tsadiras
Department of Economics, Aristotle University of Thessaloniki, 541 24 Thessaloniki, Greece
*
Author to whom correspondence should be addressed.
Algorithms 2022, 15(3), 72; https://doi.org/10.3390/a15030072
Submission received: 3 January 2022 / Revised: 11 February 2022 / Accepted: 17 February 2022 / Published: 22 February 2022
(This article belongs to the Section Evolutionary Algorithms and Machine Learning)

Abstract

Personalized advertisement is a domain that has gained popularity in the past few years. Researchers and developers collect user contextual attributes (e.g., location, time, history, etc.) and apply state-of-the-art algorithms to present relevant ads. A problem occurs when little or no data are available for a user and, therefore, the algorithms cannot work well. This situation is widely referred to in the literature as the ‘cold-start’ case. The aim of this manuscript is to explore this problem and present a prediction approach for personalized mobile advertising systems that addresses the cold-start case, and especially the frozen user case, in which a user has no data at all. The approach consists of three steps: (a) identify existing datasets and specific attributes that could be gathered from a frozen user, (b) train and test machine learning models on these datasets and predict the click-through rate, and (c) develop and use the approach in a real system.

1. Introduction

1.1. Advertisement Personalization Systems

The advertisement domain has gained huge popularity in the past few years, and its annual revenue is estimated in the billions. Thus, mobile applications either use personalization systems that have been created by their developers or, more often, a mediator network to display advertisements (e.g., Google). Personalized advertisement systems aim either (a) to decide whether or not to display an advertisement or (b) to display the right advertisement among a set of available ads, based on the user’s profile. To design and develop such systems, researchers and industries gather and utilize user context (e.g., location, activities, etc.) to determine his/her needs and then apply personalization algorithms (contextual-based advertising) [1,2]. Therefore, to achieve better context-awareness and design efficient advertisement personalization systems, researchers combine this domain with other emerging software-related domains such as artificial intelligence, the semantic web, etc. [3]. Such approaches offer new opportunities for advertisers, users/potential customers, and personalized advertising mediator systems. Apart from efficiency, better customer relationships and more interactive communication between consumers and businesses can be built [4].

1.2. Cold-Start Problem

A very common issue that personalization systems suffer from is the cold-start [5]. The cold-start problem arises in two cases:
  • The new user—When there is a new user in the system who has not yet interacted with enough data objects (e.g., has rated very few movies). Due to the lack of data, the system is not able to generate an accurate model and, thus, it cannot provide adequate recommendations to him/her [6]. A special case of cold-start arises when there is no data at all available for the user (also referred to as ‘frozen start/user’) [7,8]. This situation is even more difficult, and most systems have trouble providing suggestions [9,10,11].
  • The new item—When a new item is inserted into the system (e.g., a new movie) [12,13]. Because this item is not yet associated with any users, personalization models face difficulties when recommending it [14,15].

1.3. Stating the Problem, Motivation, and Objective

The special cold-start case described above is a very common issue in personalized advertisement systems. Often no data about the user are available: for example, the user does not give consent to the use of his/her data for privacy reasons, and most apps do not require an account. Imagine a scenario where a user downloads an application, declines to provide personal info (e.g., location, cookies, etc.), and starts using it. Many such (special cold-start/frozen user) situations arise, and the applications struggle to display relevant advertisements. This is called the “frozen user problem” in the literature [8,9]. The problem is that state-of-the-art personalization models cannot be applied directly to provide accurate suggestions; often a popular or a random advertisement is shown instead.
The motivation of this work is to explore this frozen user problem; the aim is to propose and implement a novel approach for designing personalization systems that can handle the frozen user problem and decide whether or not to display a related advertisement. Section 2 below describes some state-of-the-art solutions to the cold-start problem in the advertisement domain and the contribution of this work. Section 3 describes the design approach, which tries to alleviate the “frozen user problem”, in detail. Section 4 provides details about the experiments that were conducted and the validation and implementation of the approach (for every step). Finally, Section 5 concludes the paper, discusses the findings, and pinpoints future directions.

2. Related Works and Contribution

2.1. Related Works

Several state-of-the-art research papers in the literature inspired our work. These approaches can be grouped into two main categories:
  • Works that focus on the cold-start problem from the new item perspective (when a new advertisement is added into the system).
  • Works that focus on the cold-start problem from the new user viewpoint.
More details are given in the following subsections.

2.1.1. New Item

Starting with the first category, Richardson et al., 2007 used a logistic-regression-based model that takes the user’s search term into account to predict the click-through rate (CTR) for new advertisements [16]. Yi et al., 2016 implemented a movie cold-start recommendation method that optimizes the movie similarity measure by computing similarities among directors and actors using item-based collaborative filtering [12]. Shah et al., 2017 designed a new exploration system adapted to search advertising [8]; in that paper, an ϵ-greedy exploration algorithm (which takes the search term and advertiser bid into account) is used to deal with the new item/advertisement problem. In the same spirit, Aggarwal et al., 2019 propose two domain adaptation techniques, SDA-DANN and SDA-Ranking, to alleviate the partner cold-start by incorporating sub-domain similarities [9]. Pan et al., 2019 propose meta-embedding, a meta-learning-based approach that learns to generate desirable initial embeddings for new ad IDs. The proposed method trains an embedding generator for new ad IDs by making use of previously learned ads through gradient-based meta-learning; when a new ad arrives, the trained generator initializes the embedding of its ID from its contents and attributes [17]. Manchanda et al., 2020 implemented two domain adaptation approaches (interpretable anchored domain adaptation, IADA, and latent anchored domain adaptation, LADA) that leverage the similarity among partners to transfer information from partners with sufficient data to similar partners with insufficient data [6].

2.1.2. New User

The second category of works tries to alleviate the cold-start problem from the new user side, as discussed above.
To begin with, many works try to identify similarities among users and items. For example, Rong et al., 2014 adopt a graph-based approach to simulate preference propagation among users in order to alleviate the data sparsity problem for cold-start users [18]; they propose a Monte Carlo algorithm to estimate the similarity between different users. Shen et al., 2018 propose a method that enables time-sensitive point-of-interest recommendation, in which users’ semantic and spatial similarities are considered. In semantic modeling, they calculate users’ similarity scores by comparing users’ temporal hierarchical semantic trees; in spatial modeling, they use a Gaussian mixture model (GMM) to compute users’ similarity scores. Finally, they combine the check-in data of the target user with those of her top-k most similar users in terms of both semantic and spatial similarity to train a personalized hidden Markov model (HMM) that predicts the most probable venue category for each specified time slot [19].
Furthermore, many researchers propose utilizing external data from other sources or asking for user participation (e.g., answering some questions before using the system). Zhang et al., 2015 proposed a model for cold-start local event recommendation in social networks [20]. They use the event organizer’s existing data (previous events, location, users and their friends, etc.) and, by applying Bayesian Poisson factorization, recommend related events to new users. Wang et al., 2019 obtain user data from other systems (information transferred from an ad platform to an online shopping domain) and apply deep learning techniques to solve the problem [21]. Herce-Zelaya et al., 2020 utilize users’ social media data to extract more information and build their profiles [22]; they then apply classification techniques (decision trees, random forests) to recommend items. One example of an approach that requires user participation is that of Aharon et al., 2015, who present an algorithm (called ExcUseMe) that selects a predefined number of users who are more likely to be interested in interacting with new items [23].
Apart from the above, there are some knowledge-based or hybrid approaches. For example, Viktoratos et al., 2018 implemented a hybrid approach that combines association rules, probability metrics, and the user’s context to alleviate the cold-start problem in rule-based context-aware recommender systems [7].

2.2. Challenges, Limitations, and Contribution

Upon conducting this research and studying the literature, some issues, challenges, and limitations have been identified. These are listed below:
  • Works that try to alleviate the new user cold-start problem need at least a few user-item associations to work (e.g., the user has rated some items). If there is a user with no data at all, a “frozen user”, these models face issues [24]. There is a lack of such works in the literature, although there are many works that focus on the new/“frozen” item problem (the first category above), especially in the advertisement domain.
  • Approaches that collect data from other sources (e.g., social data) cannot be applied if these data are not available, or they face interoperability problems [25].
  • Approaches that require the user to fill in a lot of information before using the system may discourage him/her from using it [7].
The contribution of this research can be summarized in the following points:
  • It presents state-of-the-art approaches to the cold-start problem in advertisement systems and highlights challenges, limitations, and paths for future research. The special case of the ‘frozen user’ problem is discussed.
  • It explores the potential of state-of-the-art personalization algorithms and models for the frozen user problem by conducting experiments on existing datasets. These algorithms have not previously been tested with this input and for the frozen user purpose.
  • It proposes a novel prediction approach for designing personalized advertisement systems that can deal with the ‘frozen user problem’. The philosophy of this approach is to use specific attributes that could be gathered from a new user (attributes associated with the application and the device; they can be called ‘user independent’), existing datasets, and machine learning methods to predict the CTR for a ‘frozen user’. One big advantage is that various datasets can be employed for training and testing, and the approach can easily be applied in any system. The approach includes all the above steps (attributes and data, algorithms/models, development) and is discussed in detail below.
All the related details are described thoroughly in Section 3 below.

3. Approach Description

The proposed approach consists of the following three steps:
  • Identify relevant attributes and datasets.
  • Perform experiments and check the performance of state-of-the-art algorithms and models in order to validate the approach.
  • Development/Application implementation.
In detail:
Step 1.
Identify attributes and relative datasets
The first step of the method is to identify specific attributes that are “user independent” (e.g., no demographics, personal data, history, etc.) and that can be extracted automatically by application developers. Additionally, it is important that these attributes can be found in the various datasets and systems that are available. Such attributes can be:
  • Time (hour, time period, day)
  • Application or/and site category
  • Application name/id
  • Advertisement position (where the advertisement will be displayed)
  • Advertisement form (image, video)
  • Device type, size and model
  • Connection type
  • Advertisement id
  • Advertisement category
Developers can easily collect these attributes when the user starts interacting with an application or a website; the user does not need to fill in any information. Moreover, since these attributes are not sensitive (unlike, e.g., location or personal info), developers do not need user consent or participation. These attributes will be the input to a personalization model.
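As a rough illustration, the following is a minimal sketch of how such a "user independent" record could be assembled when the application starts. The function name, the three metadata dictionaries, and all field names are illustrative assumptions, not taken from the paper.

```python
from datetime import datetime

def build_user_independent_features(app_meta, ad_meta, device_meta):
    """Assemble the 'user independent' attributes listed above into one
    record. All field names here are illustrative placeholders."""
    now = datetime.now()
    return {
        "hour": now.hour,                      # time of the request
        "day_of_week": now.strftime("%A"),     # coarse time period
        "app_id": app_meta["id"],
        "app_category": app_meta["category"],
        "ad_position": ad_meta["position"],    # e.g., banner, interstitial
        "ad_format": ad_meta["format"],        # e.g., image, video
        "device_model": device_meta["model"],
        "device_type": device_meta["type"],    # phone, tablet, ...
        "connection_type": device_meta["connection"],  # wifi, 4g, ...
    }

# Example usage with made-up values:
record = build_user_independent_features(
    app_meta={"id": "com.example.news", "category": "news"},
    ad_meta={"position": "banner_top", "format": "image"},
    device_meta={"model": "pixel_6", "type": "phone", "connection": "wifi"},
)
print(record)
```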
Apart from the above, relevant datasets should be found; in more detail, datasets that contain the above attributes together with whether the user clicked an advertisement (yes or no), so that machine learning techniques can be used to predict the CTR from this input (and to evaluate whether the state-of-the-art algorithms perform well and can provide accurate suggestions). Indeed, some datasets that have been used consistently for CTR prediction contain the attributes described above (e.g., Avazu, which is used by many researchers and in many competitions [26]). Since attributes such as the user id, location, etc. are not used, various datasets can be employed for training and testing.
Step 2.
Conduct experiments and check models’ performances
Afterward, machine learning techniques can be applied to these datasets in order to check the performance of state-of-the-art models and validate the approach. For example, the Python programming language is commonly used for this purpose [27]. Techniques such as logistic regression, SVM, etc., and even deep learning techniques such as FLEN, DeepFM, etc., have been tested on these datasets and have provided very good results [17,18]. The only difference is that the training and testing of these models must now be conducted using only the chosen attributes as input, as sketched below.
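The following is a minimal sketch of this training/evaluation step, assuming a pandas DataFrame `df` that holds only the chosen user-independent categorical attributes plus a binary target column named 'click'. The choice of scikit-learn, the two baseline models, and the column name are assumptions for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score, log_loss

def evaluate_baselines(df: pd.DataFrame, target: str = "click"):
    """Train two baseline classifiers on the user-independent attributes
    and report AUC and log-loss on a held-out 20% test split."""
    X, y = df.drop(columns=[target]), df[target]
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, shuffle=True, random_state=42)
    models = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "gradient_boost": GradientBoostingClassifier(),
    }
    results = {}
    for name, clf in models.items():
        # One-hot encode the categorical attributes, then fit the classifier.
        pipe = make_pipeline(OneHotEncoder(handle_unknown="ignore"), clf)
        pipe.fit(X_tr, y_tr)
        proba = pipe.predict_proba(X_te)[:, 1]
        results[name] = {"auc": roc_auc_score(y_te, proba),
                         "log_loss": log_loss(y_te, proba)}
    return results
```

Deep learning CTR models (FLEN, xDeepFM, etc.) would be trained on the same input columns; only the model definition changes.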
Step 3.
Development/application implementation
The last step after the experiments is the development (to use the selected model in real-world applications). Many applications use advertisement providers (e.g., integrate Google Ads) to display an advertisement to the user [28,29,30]. When the user launches the application, the application (the client) gets an advertisement from this network and displays it.
Following the proposed method, before fetching the advertisement from the network, the client should send the relevant input data (time, application category and id, ad position, device, connection type, etc.) to the application’s server. Then, the server (a) runs the prediction model on the input values and calculates the result, and (b) sends the result back to the client. If the response is positive, the client fetches the advertisement from the network and shows it to the current user. A rough sketch of this server-side flow is given below.
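As a framework-agnostic sketch of this request/response flow (the implementation in Section 4 uses Django and a Java client), the following uses only the Python standard library. The endpoint, port, payload fields, and the `predict_ctr` helper are assumptions; in a real system the helper would wrap the trained model.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict_ctr(features: dict) -> float:
    """Placeholder for the trained model: would return the predicted
    click probability for the given user-independent attributes."""
    return 0.5  # dummy value for the sketch

class AdDecisionHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # (a) read the user-independent attributes sent by the client
        length = int(self.headers.get("Content-Length", 0))
        features = json.loads(self.rfile.read(length) or b"{}")
        # (b) run the prediction model and send the decision back
        probability = predict_ctr(features)
        body = json.dumps({"show_ad": probability >= 0.5,
                           "probability": probability}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8000), AdDecisionHandler).serve_forever()
```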
The next section describes, in detail, the validation and the implementation of the approach that has been performed.

4. Validation and Implementation

Regarding the validation and the implementation of the proposed prediction approach, various paths can be followed. For the implementation carried out in this work, the following was done in each step:
Step 1.
Identify attributes and relative datasets
To begin with, to implement step 1, two datasets were identified and used for performing experiments. The first is the Avazu dataset (https://www.kaggle.com/c/avazu-ctr-prediction/overview) (accessed on 22 September 2021) and the second is the DIGIX dataset (https://www.kaggle.com/louischen7/2020-digix-advertisement-ctr-prediction) (accessed on 23 September 2021). These two datasets were selected because they were suitable for testing the approach and conducting the experiments. Furthermore, both datasets are widely accepted by the scientific community, and many researchers have used them to conduct experiments and test their models [26].
The Avazu dataset contains nearly 1.1 M records/rows and 23 discrete attributes/columns. The attributes selected from this dataset as input to a model (based on the criteria described above) are the following:
  • Time (time period)
  • Application and site category and domain
  • Application name and id
  • Advertisement position (where the advertisement will be displayed)
  • Device type and model
  • Connection type
All the above variables are categorical, and the task was to predict the ‘click’ variable (0 or 1 possible values).
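A minimal loading sketch for this selection is given below, assuming the Kaggle `train.csv` file; the column names are those of the Kaggle release and should be verified against the downloaded copy.

```python
import pandas as pd

# Columns in the Avazu CSV that correspond to the user-independent
# attributes listed above (names as in the Kaggle release; verify them
# against your own copy of the data).
SELECTED = [
    "hour",                                    # time period
    "site_category", "site_domain",            # site category/domain
    "app_id", "app_category", "app_domain",    # application name/category
    "banner_pos",                              # advertisement position
    "device_type", "device_model",             # device
    "device_conn_type",                        # connection type
]

df = pd.read_csv("train.csv", nrows=1_000_000,
                 usecols=SELECTED + ["click"], dtype=str)
df["click"] = df["click"].astype(int)
print(df["click"].value_counts(normalize=True))  # roughly 84% '0', 16% '1'
```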
The second dataset is DIGIX, which contains over 40 M records/rows and 36 different attributes/columns. The attributes that were selected as input to a model from this dataset are the following:
  • Time period
  • Application and site category
  • Display form of an ad material
  • App level 1 category of an ad task
  • App level 2 category of an ad task
  • Application ID of an ad task
  • Application tag of an ad task
  • Application score/rating
  • Device name and size
  • Model release time
  • Connection type
All the above variables are categorical, and the task was to predict the ‘label’ variable (0 or 1 possible values).
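Because both datasets contribute the same kind of user-independent information, the attributes of each dataset can be mapped onto one shared schema so that the same training code is reused. The sketch below illustrates this idea only; the DIGIX column names shown are hypothetical placeholders and must be replaced with the actual names in the downloaded file.

```python
# Hypothetical mapping from dataset-specific column names to a shared
# user-independent schema (placeholder names, not the real DIGIX header).
DIGIX_TO_SHARED = {
    "time_slot": "time_period",
    "slot_id": "ad_position",
    "creative_type": "ad_display_form",
    "app_first_class": "app_level1_category",
    "app_second_class": "app_level2_category",
    "app_score": "app_rating",
    "device_name": "device_name",
    "device_size": "device_size",
    "net_type": "connection_type",
    "label": "click",
}

def to_shared_schema(df, mapping):
    """Keep only the mapped columns and rename them, so that the same
    training pipeline can be applied to any dataset."""
    return df[list(mapping)].rename(columns=mapping)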
Step 2.
Conduct offline experiments and check models’ performances/validation
The next step was to perform offline experiments to explore the performance of state-of-the-art approaches when limited attributes are available. More specifically, the objectives of these experiments were (a) to check whether they perform better than traditional/baseline models when limited attributes are available, and (b) to check whether the results with limited attributes are close enough to the results achieved when all attributes are available (to verify that they perform satisfactorily).
To begin with, the following models have been examined.
  • Traditional/baseline models such as:
    GradientBoost—Gradient boosting is a machine learning technique for regression, classification and other tasks, which produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees. Gradient boosting involves three elements: (a) A loss function to be optimized, (b) a weak learner to make predictions, and (c) an additive model to add weak learners to minimize the loss function [31].
    CatBoost—an algorithm for gradient boosting on decision trees. It handles categorical features using a permutation-driven alternative to the classical boosting algorithm [32].
    Logistic Regression—a statistical model that uses a logistic function to model a binary dependent variable. It is used in statistical software to understand the relationship between the dependent variable and one or more independent variables by estimating probabilities using a logistic regression equation [33].
    LightGBM—Light gradient boosting machine, a distributed gradient boosting framework for machine learning originally developed by Microsoft. It is based on decision tree algorithms and is used for ranking, classification, and other machine learning tasks [34].
  • State-of-the-art deep learning models such as:
    ONN—Operation-aware neural network. Its operation-aware embedding method learns a different representation for each feature when performing different operations, which helps the network capture feature interactions for user response prediction [35].
    xDeepFM—eXtreme deep factorization machine. A compressed interaction network (CIN), which aims to generate feature interactions explicitly and at the vector-wise level, is combined with a classical deep neural network into one unified model. xDeepFM is able to learn certain bounded-degree feature interactions explicitly and arbitrary low- and high-order feature interactions implicitly [36].
    IFM—Input-aware factorization machine (IFM) learns a unique input-aware factor for the same feature in different instances via a neural network [37].
    DCN V2—An improved version of deep and cross network (DCN), which automatically and efficiently learns bounded-degree predictive feature interaction. DCN keeps the benefits of a DNN model, and beyond that, it introduces a novel cross network that is more efficient in learning certain bounded-degree feature interactions. In particular, DCN explicitly applies feature crossing at each layer, requires no manual feature engineering, and adds negligible extra complexity to the DNN model [38].
    FiBiNET—an abbreviation for ‘feature importance and bilinear feature interaction network’. It dynamically learns feature importance and fine-grained feature interactions: on the one hand, FiBiNET learns the importance of features via the squeeze-and-excitation network (SENET) mechanism; on the other hand, it learns feature interactions via a bilinear function [39].
    FLEN—Field-leveraged embedding network (FLEN) devises a field-wise bi-interaction pooling technique. By suitably exploiting field information, the field-wise bi-interaction pooling captures both inter-field and intra-field feature conjunctions with a small number of model parameters and an acceptable time complexity [28].
    Ensembler—Ensembling, i.e., combining the predictions of different models, is a technique used by many researchers to improve results [5,35]. In our case, an average-based ensembler has been built: the six state-of-the-art deep learning models described above are used [40], and after training, the average of their predictions is used as the final prediction (the average probability that a user will accept the advertisement), as sketched below.
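A minimal sketch of this average-based ensembler follows, assuming each trained model exposes a prediction function that maps the encoded input to click probabilities (the exact call depends on the library used for each model).

```python
import numpy as np

def ensemble_predict(predict_fns, X):
    """Average-based ensembler: each element of `predict_fns` maps the
    encoded input X to click probabilities; the final prediction is the
    simple mean across models."""
    probs = np.column_stack([fn(X) for fn in predict_fns])
    return probs.mean(axis=1)

# Usage sketch (the model names are placeholders for the six trained models):
# avg = ensemble_predict([onn.predict, xdeepfm.predict, ifm.predict,
#                         dcn_v2.predict, fibinet.predict, flen.predict], X_test)
# show_ad = avg >= 0.5
```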
Experiments have been conducted on both datasets. For the first dataset (Avazu), the first 1 M rows were used. The data were shuffled randomly, with 80% used for training and 20% for testing (the sample contained nearly 84% ‘0–No’ and 16% ‘1–Yes’ for the dependent/target variable). This is a technique followed by many researchers when testing models on this dataset (other researchers use the data sequentially, without shuffling, with the last day kept for testing) [28]. This approach was followed because the aim was to evaluate the models on a random sample and test the validity of the approach. Regarding the parameters and tuning of the models (learning rate, activation function, etc.), the proposed default values/fine-tuning were used. The area under the curve (AUC) and log-loss were calculated for each model [41]. The AUC measures the ability of a classifier to distinguish between classes and is used as a summary of the ROC curve; the higher the AUC, the better the model distinguishes the positive from the negative class. Log-loss indicates how close the predicted probability is to the corresponding actual/true value (0 or 1 in binary classification); the more the predicted probability diverges from the actual value, the higher the log-loss. These metrics were selected because the vast majority of researchers performing binary classification use them.
Regarding the second dataset, DIGIX, 1 M rows were randomly selected; the sample contained nearly 78% ‘No’ and 22% ‘Yes’ for the dependent variable (so as to be comparable with the first sample). Once again, 80% were used for training and 20% for testing, and the same tuning was followed. Table 1 and Figures 1 and 2 below display the results on these two datasets. It can be noticed that state-of-the-art deep learning models clearly outperform traditional models, providing very satisfactory results. There are no significant differences among them (DCN v2, FLEN, and FiBiNET have slightly better results). It is also worth mentioning that the ensembler improved the results.
Additionally, more experiments have been conducted on these two datasets to further explore the performance of the models for the frozen-start problem. Because these two datasets can be considered slightly imbalanced (with respect to the distribution of the target variable, ‘0’ is the majority class, covering almost 80% of the rows in both samples), an undersampling technique has been used to also conduct experiments on fully balanced datasets (1:1 target distribution in our case) [42]. The undersampling technique was used on both datasets. In a binary (two-class) classification problem where class 0 is the majority class and class 1 is the minority class, as in Avazu and DIGIX, undersampling removes examples of the majority class from the training dataset in order to better balance the class distribution, e.g., reducing the skew from 1:100 to 1:2, or even to a 1:1 class distribution. This was also helpful for checking the performance of the models when fewer records are available [37]. Once again (a minimal undersampling sketch is given after the list below):
  • The balanced data were shuffled, with 80% used for training and 20% for testing.
  • The same tuning was followed for the models.
  • AUC and log-loss were calculated.
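The following is a minimal sketch of the random undersampling used to obtain the 1:1 samples, assuming a DataFrame with the binary target in a 'click' column (the target column is named 'label' in DIGIX).

```python
import pandas as pd

def undersample_to_balance(df: pd.DataFrame, target: str = "click",
                           random_state: int = 42) -> pd.DataFrame:
    """Randomly drop majority-class (0) rows until both classes are
    equally represented (1:1), then shuffle the result."""
    minority = df[df[target] == 1]
    majority = df[df[target] == 0].sample(n=len(minority),
                                          random_state=random_state)
    balanced = pd.concat([minority, majority])
    return balanced.sample(frac=1.0, random_state=random_state)
```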
Table 2 and Figures 3 and 4 below display the results for these two balanced, undersampled datasets. The results show that deep learning techniques outperform traditional/baseline methods in every case. Furthermore, on the Avazu dataset the result of the ensembler is really close to the score that the algorithms achieve with all attributes available; following the experiments conducted by other researchers, the algorithms achieve AUC scores between 0.75 and 0.77 when all attributes are available [28].
Concerning some additional observations, there are no significant differences in the performance of the deep learning models between the standard and the balanced undersampled datasets. Once again, the deep learning models perform very satisfactorily and there are no significant differences among them (DCN v2 performs slightly better). Furthermore, it is worth mentioning that the traditional techniques perform better on the undersampled datasets (although they still cannot outperform the deep learning techniques). Finally, the ensembler was slightly better in every case, as expected.
Step 3.
Development/application implementation
The last step after the offline experiments, as discussed above, is the development (using the selected model in real-world applications). The objective is to verify the technical feasibility and show one possible implementation approach (different implementation approaches are possible, depending on the developers’ expertise). A mobile application for Android devices has been implemented for this purpose, using the Java programming language (https://www.java.com/en/) (accessed on 28 September 2021). The application displays a test advertisement (or not) to the user.
When the user launches the application, the client (the Java mobile app) first collects and then sends the relevant input data (time, application category, ad position, device, connection type, etc., i.e., all the variables contained in the first dataset) to the application’s server. The server then collects these values and starts the calculation process: it (a) runs the model on the input values and calculates the result, and (b) sends the result back to the client. To build the server, Python and the Django framework were used [27]. In more detail, the ensembler was chosen as the prediction model since it had the best results. The six deep learning models that were trained offline on the Avazu dataset run sequentially and calculate the predicted probability for the given input; the average is then used for the final prediction. The server returns the result/response to the client and, based on whether the probability is higher or lower than 0.5, the client shows (or does not show) a test advertisement from the network to the current user. It should be mentioned that Google Ads is used as the advertisement provider to fetch and display the test advertisement [30]. The UML activity diagram of Figure 5 displays the flow of the development step.

5. Conclusions, Discussion, and Future Directions

In this work, a study about the cold-start problem in mobile advertisement systems has been conducted and the issue of the ‘frozen user problem’ is discussed. A novel prediction approach to design personalized advertisement systems that can deal with this problem is proposed. The main steps are as follows:
  • Use device and application-related variables that could be gathered from a new user.
  • Train and test state-of-the-art machine learning models in existing datasets.
  • Implement a client-server architecture that gets user data and predicts his/her interest for the advertisement using a deep learning ensembler.
A validation and an implementation of this approach have been carried out. Beginning with the validation, experiments were conducted on real-world datasets with 1 million records to test the performance of deep learning models for a frozen user (when limited attributes are available). The results showed that the deep learning techniques (and the ensembler model) come really close to the performance that the models achieve with all attributes available and, therefore, they can work very well; state-of-the-art deep learning models achieve high performance even with limited input variables. Regarding the implementation, a mobile application that retrieves advertisements from Google Ads has been implemented (depending on the developers’ expertise, different implementation approaches are possible). The implementation clearly demonstrated that the approach is viable and can easily be incorporated into applications.
Although the cold-start problem is a popular research domain, in this work the special case of the ‘frozen user’ is pinpointed. To the best of our knowledge, there is a gap in the literature, and this manuscript opens paths for future research on the frozen user. Furthermore, the proposed approach will help developers implement efficient personalized advertisement systems when not enough data are available. By following the above steps, they will be able to implement systems that can deal with the frozen user problem. An implementation example has been given in this work, but, depending on their expertise (programming languages, software available, etc.), developers can build their own implementations. This is very important because users often raise privacy concerns, do not connect their social media accounts, insert fake data, etc. [43,44]. Information overload and irritating advertisements can also be avoided [45].
Our future plans include further exploring the frozen user problem and implementing new approaches that take into consideration new technologies such as the IoT (Internet of Things), big data, augmented reality, etc. [46]. Furthermore, an advertisement personalization system (not only for cold-start users but also in general) that combines knowledge from various domains (artificial intelligence, IoT, big data, knowledge-based systems, marketing, psychology, and augmented reality) will be implemented [47,48].

Author Contributions

Conceptualization, I.V. and A.T.; Approach, I.V. and A.T.; Implementation, I.V.; Resources, I.V.; Writing—original draft preparation, I.V.; Writing—review and editing, I.V. and A.T.; Supervision, A.T.; Project administration, I.V. and A.T.; Funding acquisition, I.V. All authors have read and agreed to the published version of the manuscript.

Funding

This research is co-financed by Greece and the European Union (European Social Fund- ESF) through the Operational Programme “Human Resources Development, Education and Lifelong Learning” in the context of the project “Reinforcement of Postdoctoral Researchers–2nd Cycle” (MIS-5033021), implemented by the State Scholarships Foundation (ΙΚΥ). 2019-050-0503-18772.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Datasets can be found at: https://www.kaggle.com/c/avazu-ctr-prediction/overview (accessed on 22 September 2021). https://www.kaggle.com/louischen7/2020-digix-advertisement-ctr-prediction (accessed on 23 September 2021). https://drive.google.com/file/d/173HmG2NGDP5eop19YmXGXMZ8WGgKyBDb/view?usp=sharing (1 M rows that were used in the experiments) (accessed on 24 September 2021). App can be downloaded at: https://drive.google.com/file/d/1EfSG8LbZYRCBjMMTZ4qhebgpW634Rddh/view?usp=sharing (accessed on 14 November 2021).

Acknowledgments

This research is co-financed by Greece and the European Union (European Social Fund- ESF) through the Operational Programme “Human Resources Development, Education and Lifelong Learning” in the context of the project “Reinforcement of Postdoctoral Researchers–2nd Cycle” (MIS-5033021), implemented by the State Scholarships Foundation (ΙΚΥ).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Rula, J.P.; Jun, B.; Bustamante, F.E. Mobile AD(D): Estimating mobile app session times for better ads. In Proceedings of the 16th International Workshop on Mobile Computing Systems and Applications, Santa Fe, NM, USA, 12–13 February 2015; pp. 123–128. [Google Scholar] [CrossRef]
  2. Faroqi, H.; Mesbah, M.; Kim, J. Behavioural advertising in the public transit network. Res. Transp. Bus. Manag. 2020, 32, 100421. [Google Scholar] [CrossRef]
  3. Capurso, N.; Mei, B.; Song, T.; Cheng, X.; Yu, J. A survey on key fields of context awareness for mobile devices. J. Netw. Comput. Appl. 2018, 118, 44–60. [Google Scholar] [CrossRef]
  4. Jiménez, N.; San-Martín, S. Attitude toward m-advertising and m-repurchase. Eur. Res. Manag. Bus. Econ. 2017, 23, 96–102. [Google Scholar] [CrossRef]
  5. Yagci, M.; Gurgen, F. A ranker ensemble for multi-objective job recommendation in an item cold start setting. In Proceedings of the Recommender Systems Challenge 2017, New York, NY, USA, 27 August 2017. Part F1305. [Google Scholar] [CrossRef]
  6. Manchanda, S.; Yadav, P.; Doan, K.; Sathiya Keerthi, S. Targeted display advertising: The case of preferential attachment. In Proceedings of the 2019 IEEE International Conference on Big Data (Big Data), Los Angeles, CA, USA, 9–12 December 2019; pp. 1868–1877. [Google Scholar]
  7. Viktoratos, I.; Tsadiras, A.; Bassiliades, N. Combining community-based knowledge with association rule mining to alleviate the cold start problem in context-aware recommender systems. Expert Syst. Appl. 2018, 101, 78–90. [Google Scholar] [CrossRef]
  8. Ahmed, T.; Srivastava, A. A data-centric and machine based approach towards fixing the cold start problem in web service recommendation. In Proceedings of the 2014 IEEE Students’ Conference on Electrical, Electronics and Computer Science, Bhopal, India, 1–2 March 2014. [Google Scholar] [CrossRef]
  9. Aggarwal, K.; Yadav, P.; Keerthi, S.S. Domain adaptation in display advertising. In Proceedings of the 13th ACM Conference on Recommender Systems, Copenhagen, Denmark, 16–20 September 2019; pp. 178–186. [Google Scholar] [CrossRef]
  10. Ha, I.; Oh, K.J.; Jo, G.S. Personalized advertisement system using social relationship based user modeling. Multimed. Tools Appl. 2015, 74, 8801–8819. [Google Scholar] [CrossRef]
  11. Chen, Y.; Berkhin, P.; Li, J.; Wan, S.; Yan, T.W. Fast and Cost-Efficient Bid Estimation for Contextual Ads. In Proceedings of the 21st International Conference on World Wide Web, Lyon, France, 16–20 April 2012; Association for Computing Machinery: New York, NY, USA, 2012; pp. 477–478. [Google Scholar] [CrossRef] [Green Version]
  12. Yi, P.; Yang, C.; Zhou, X.; Li, C. A movie cold-start recommendation method optimized similarity measure. In Proceedings of the 2016 16th International Symposium on Communications and Information Technologies (ISCIT), Qingdao, China, 26–28 September 2016; pp. 231–234. [Google Scholar] [CrossRef]
  13. Embarak, O.H. Like-minded detector to solve the cold start problem. In Proceedings of the 2018 Fifth HCT Information Technology Trends (ITT), Dubai, United Arab Emirates, 28–29 November 2018; Volume 2, pp. 300–305. [Google Scholar] [CrossRef]
  14. Shah, P.; Yang, M.; Alle, S.; Ratnaparkhi, A.; Shahshahani, B.; Chandra, R. A practical exploration system for search advertising. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, 13–17 August 2017; Part F1296. pp. 1625–1631. [Google Scholar] [CrossRef]
  15. Cao, D.; Wu, X.; Zhou, Q.; Hu, Y. Alleviating the New Item Cold-Start Problem by Combining Image Similarity. In Proceedings of the 2019 IEEE 9th International Conference on Electronics Information and Emergency Communication (ICEIEC), Beijing, China, 12–14 July 2019; pp. 589–595. [Google Scholar] [CrossRef]
  16. Richardson, M.; Dominowska, E.; Ragno, R. Predicting clicks. In Proceedings of the 16th international conference on World Wide Web, Banff, AB, Canada, 8–12 May 2007; p. 521. [Google Scholar] [CrossRef]
  17. Pan, F.; Li, S.; Ao, X.; Tang, P.; He, Q. Warm up cold-start advertisements: Improving CTR predictions via learning to learn ID embeddings. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, Paris, France, 21–25 July 2019; pp. 695–704. [Google Scholar] [CrossRef]
  18. Rong, Y.; Wen, X.; Cheng, H. A Monte Carlo algorithm for cold start recommendation. In Proceedings of the WWW ‘14: Proceedings of the 23rd International Conference on World Wide Web, Seoul, Korea, 7–11 April 2014; pp. 327–336. [Google Scholar] [CrossRef]
  19. Shen, T.; Chen, H.; Ku, W.S. Time-aware location sequence recommendation for cold-start mobile users. In Proceedings of the SIGSPATIAL ’18: Proceedings of the 26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Seattle, WA, USA, 6–9 November 2018; pp. 484–487. [Google Scholar] [CrossRef]
  20. Zhang, W.; Wang, J. A Collective Bayesian Poisson Factorization Model for Cold-start Local Event Recommendation Categories and Subject Descriptors. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, Australia, 10–13 August 2015; pp. 1455–1464. [Google Scholar]
  21. Wang, H.; Hara, T.; Amagata, D.; Niu, H.; Kurokawa, M.; Maekawa, T.; Yonekawa, K. Preliminary investigation of alleviating user cold-start problem in e-commerce with deep cross-domain recommender system. In Proceedings of the WWW ’19: Companion Proceedings of the 2019 World Wide Web Conference, San Francisco, CA, USA, 13–17 May 2019; pp. 398–403. [Google Scholar] [CrossRef]
  22. Herce-Zelaya, J.; Porcel, C.; Bernabé-Moreno, J.; Tejeda-Lorente, A.; Herrera-Viedma, E. New technique to alleviate the cold start problem in recommender systems using information from social media and random decision forests. Inf. Sci. 2020, 536, 156–170. [Google Scholar] [CrossRef]
  23. Aharon, M.; Anava, O.; Avigdor-Elgrabli, N.; Drachsler-Cohen, D.; Golan, S.; Somekh, O. ExcUseMe: Asking Users to Help in Item Cold-Start Recommendations. In Proceedings of the 9th ACM Conference on Recommender Systems, Vienna, Austria, 16–20 September 2015; Association for Computing Machinery: New York, NY, USA, 2015; pp. 83–90. [Google Scholar] [CrossRef]
  24. Son, L.H. Dealing with the new user cold-start problem in recommender systems: A comparative review. Inf. Syst. 2016, 58, 87–104. [Google Scholar] [CrossRef]
  25. Verma, D.; Gulati, K.; Shah, R.R. Addressing the cold-start problem in outfit recommendation using visual preference modelling. In Proceedings of the 2020 IEEE Sixth International Conference on Multimedia Big Data (BigMM), New Delhi, India, 24–26 September 2020. [Google Scholar] [CrossRef]
  26. Wu, S.; Yu, F.; Yu, X.; Liu, Q.; Wang, L.; Tan, T.; Shao, J.; Huang, F. TFNet: Multi-Semantic Feature Interaction for CTR Prediction. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, China, 25–30 July 2020; Association for Computing Machinery: New York, NY, USA, 2020; pp. 1885–1888. [Google Scholar] [CrossRef]
  27. Forcier, J.; Bissex, P.; Chun, W. Python Web Development with Django; Addison-Wesley: Boston, MA, USA, 2008. [Google Scholar]
  28. Chen, W.; Zhan, L.; Ci, Y.; Lin, C. FLEN: Leveraging Field for Scalable CTR Prediction. 2019. Available online: http://arxiv.org/abs/1911.04690 (accessed on 15 September 2021).
  29. Guo, H.; Tang, R.; Ye, Y.; Li, Z.; He, X.; Dong, Z. DeepFM: An End-to-End Wide & Deep Learning Framework for CTR Prediction. arXiv 2018, arXiv:1703.04247. Available online: http://arxiv.org/abs/1804.04950 (accessed on 16 September 2021).
  30. Zainurossalamia Za, S.; Tricahyadinata, I. An Analysis on the Use of Google AdWords to Increase E-Commerce Sales. Int. J. Soc. Sci. Manag. 2017, 4, 60. [Google Scholar] [CrossRef] [Green Version]
  31. Natekin, A.; Knoll, A. Gradient boosting machines, a tutorial. Front. Neurorobot. 2013, 7, 21. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  32. Prokhorenkova, L.; Gusev, G.; Vorobev, A.; Dorogush, A.V.; Gulin, A. Catboost: Unbiased boosting with categorical features. Adv. Neural Inf. Process. Syst. 2018, 2018, 6638–6648. [Google Scholar]
  33. Ma, J.; Chen, X.; Lu, Y.; Zhang, K. A click-through rate prediction model and its applications to sponsored search advertising. In Proceedings of the International Conference on Cyberspace Technology (CCT 2013), Beijing, China, 23 November 2013; pp. 500–503. [Google Scholar]
  34. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.Y. LightGBM: A highly efficient gradient boosting decision tree. Adv. Neural Inf. Process. Syst. 2017, 2017, 3147–3155. [Google Scholar]
  35. Yang, Y.; Xu, B.; Shen, S.; Shen, F.; Zhao, J. Operation-aware Neural Networks for user response prediction. Neural Networks 2020, 121, 161–168. [Google Scholar] [CrossRef] [PubMed]
  36. Lian, J.; Chen, Z.; Zhou, X.; Xie, X.; Zhang, F.; Sun, G. xDeepFM: Combining explicit and implicit feature interactions for recommender systems. In Proceedings of the KDD ’18: The 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, London, UK, 19–23 August 2018; pp. 1754–1763. [Google Scholar] [CrossRef] [Green Version]
  37. Yu, Y.; Wang, Z.; Yuan, B. An input-aware factorization machine for sparse prediction. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, Macao, China, 10–16 August 2019; pp. 1466–1472. [Google Scholar] [CrossRef] [Green Version]
  38. Wang, R.; Shivanna, R.; Cheng, D.; Jain, S.; Lin, D.; Hong, L.; Chi, E. DCN V2: Improved Deep & Cross Network and Practical Lessons for Web-Scale Learning to Rank Systems. Association for Computing Machinery: New York, NY, USA, 2021; Volume 1, ISBN 9781450383127. [Google Scholar]
  39. Huang, T.; Zhang, Z.; Zhang, J. Fibinet: Combining feature importance and bilinear feature interaction for click-through rate prediction. In Proceedings of the 13th ACM Conference on Recommender Systems, Copenhagen, Denmark, 16–20 September 2019; pp. 169–177. [Google Scholar] [CrossRef] [Green Version]
  40. Haider, C.M.R.; Iqbal, A.; Rahman, A.H.; Rahman, M.S. An ensemble learning based approach for impression fraud detection in mobile advertising. J. Netw. Comput. Appl. 2018, 112, 126–141. [Google Scholar] [CrossRef]
  41. Wang, Q.; Liu, F.; Huang, P.; Xing, S.; Zhao, X. A Hierarchical Attention Model for CTR Prediction Based on User Interest. IEEE Syst. J. 2020, 14, 4015–4024. [Google Scholar] [CrossRef]
  42. Bach, M.; Werner, A.; Palt, M. The Proposal of Undersampling Method for Learning from Imbalanced Datasets. Procedia Comput. Sci. 2019, 159, 125–134. [Google Scholar] [CrossRef]
  43. Liu, D.; Xu, S.; Chen, L.; Wang, C. Some observations on online advertising: A new advertising system. In Proceedings of the 2015 IEEE/ACIS 14th International Conference on Computer and Information Science (ICIS), Las Vegas, NV, USA, 28 June–1 July 2015; pp. 387–392. [Google Scholar] [CrossRef]
  44. Chen, S. The Emerging Trend of Accurate Advertising Communication in the Era of Big Data—The Case of Programmatic, Targeted Advertising; Springer: Singapore, 2020; Volume 156, ISBN 9789811397134. [Google Scholar]
  45. Viktoratos, I.; Tsadiras, A.; Bassiliades, N. A context-aware web-mapping system for group-targeted offers using semantic technologies. Expert Syst. Appl. 2015, 42, 4443–4459. [Google Scholar] [CrossRef]
  46. Andronie, M.; Lăzăroiu, G.; Iatagan, M.; Hurloiu, I.; Dijmărescu, I. Sustainable Cyber-Physical Production Systems in Big Data-Driven Smart Urban Economy: A Systematic Literature Review. Sustainability 2021, 13, 751. [Google Scholar] [CrossRef]
  47. Yang, S.; Carlson, J.R.; Chen, S. How augmented reality affects advertising effectiveness: The mediating effects of curiosity and attention toward the ad. J. Retail. Consum. Serv. 2020, 54, 102020. [Google Scholar] [CrossRef]
  48. Nelson, A.; Neguriță, O. Big Data-driven Smart Cities. Geopolit. Hist. Int. Relat. 2020, 12, 37–43. Available online: https://www.jstor.org/stable/26939892 (accessed on 25 September 2021).
Figure 1. AUC scores for the models on the Avazu and DIGIX datasets.
Figure 2. Log-loss scores for the models on the Avazu and DIGIX datasets.
Figure 3. AUC scores for the balanced undersampled Avazu and DIGIX datasets.
Figure 4. Log-loss scores for the balanced undersampled Avazu and DIGIX datasets.
Figure 5. Implementation–development step.
Table 1. Model performances for the Avazu and DIGIX datasets.

Model               | Avazu AUC | Avazu Log-Loss | DIGIX AUC | DIGIX Log-Loss
Gradient Boost      | 0.51      | 6.1            | 0.50      | 7.89
CatBoost            | 0.536     | 5.45           | 0.521     | 7.81
LightGBM            | 0.54      | 5.876          | 0.517     | 7.867
Logistic Regression | 0.54      | 5.46           | 0.528     | 7.83
ONN                 | 0.746     | 0.397          | 0.657     | 0.654
xDeepFM             | 0.753     | 0.383          | 0.655     | 0.655
IFM                 | 0.754     | 0.382          | 0.654     | 0.656
DCN v2              | 0.755     | 0.38           | 0.654     | 0.655
FiBiNET             | 0.755     | 0.381          | 0.655     | 0.656
FLEN                | 0.755     | 0.381          | 0.658     | 0.654
Ensembler           | 0.76      | 0.371          | 0.661     | 0.496
Table 2. Model performances for the balanced undersampled (1:1) Avazu and DIGIX datasets.

Model               | Avazu AUC | Avazu Log-Loss | DIGIX AUC | DIGIX Log-Loss
Gradient Boost      | 0.6657    | 11.55          | 0.6057    | 13.62
CatBoost            | 0.684     | 10.90          | 0.619     | 13.231
LightGBM            | 0.681     | 10.99          | 0.612     | 13.402
Logistic Regression | 0.684     | 10.907         | 0.6067    | 13.585
ONN                 | 0.723     | 0.651          | 0.657     | 0.654
xDeepFM             | 0.738     | 0.624          | 0.655     | 0.655
IFM                 | 0.731     | 0.649          | 0.654     | 0.656
DCN v2              | 0.743     | 0.612          | 0.654     | 0.655
FiBiNET             | 0.733     | 0.648          | 0.655     | 0.656
FLEN                | 0.739     | 0.614          | 0.658     | 0.654
Ensembler           | 0.7513    | 0.593          | 0.66      | 0.652
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
