A Two-Stage Neural Network-Based Cold Start Item Recommender

Tsai, Chieh-Yuan; Chiu, Yi-Fan; Chen, Yu-Jen

doi:10.3390/app11094243


by Chieh-Yuan Tsai *, Yi-Fan Chiu and Yu-Jen Chen

Department of Industrial Engineering and Management, Yuan-Ze University, Taoyuan City 320, Taiwan

* Author to whom correspondence should be addressed.

Appl. Sci. 2021, 11(9), 4243; https://doi.org/10.3390/app11094243

Submission received: 8 April 2021 / Revised: 27 April 2021 / Accepted: 5 May 2021 / Published: 7 May 2021


Abstract:

Nowadays, recommendation systems have been successfully adopted in various online services such as e-commerce, news, and social media. Recommenders give users a convenient and efficient way to find items of interest and increase service providers’ revenue. However, many recommenders suffer from the cold start (CS) problem, in which only a small number of ratings are available for some new items. To overcome this difficulty, this research proposes a two-stage neural network-based CS item recommendation system. The proposed system includes two major components: the denoising autoencoder (DAE)-based CS item rating (DACR) generator and the neural network-based collaborative filtering (NNCF) predictor. In the DACR generator, a textual description of an item is used as auxiliary content information to represent the item. Then, the DAE is applied to extract the content features from high-dimensional textual vectors. With the compact content features, a CS item’s rating can be efficiently derived from the ratings of similar non-CS items. Second, the NNCF predictor is developed to predict the ratings in the sparse user–item matrix. In the predictor, both sparse binary user and item vectors are projected to dense latent vectors in the embedding layer. Next, the latent vectors are fed into multilayer perceptron (MLP) layers for user–item matrix learning. Finally, appropriate item suggestions can be obtained. Extensive experiments show that the DAE can significantly reduce the computational time for item similarity evaluations while keeping the original features’ characteristics. The experiments also show that the proposed NNCF predictor outperforms several popular recommendation algorithms. We further demonstrate that the proposed CS item recommender can achieve up to 8% MAE improvement compared to adding no CS item rating.

Keywords:

recommendation systems; cold start problems; neural networks; collaborative filtering

1. Introduction

Nowadays, recommendation systems play a critical role in promoting sales and services in many online applications. For instance, 80 percent of movies watched on Netflix came from recommendations, and 60 percent of video clicks came from home page recommendations on YouTube [1,2]. Typically, recommendation systems can be classified into content-based (CB), collaborative filtering (CF), and hybrid approaches. Among them, CF is the most popular approach since it does not need to analyze the items’ content. Instead, it relies on the relationship between users and items, typically encoded in a rating preference matrix. Although the CF approach has been successfully applied in many domains, it suffers from the cold start (CS) problem [3]. The CS problem arises when new items have been rated by too few users to be correctly linked with similar items [4]. When little or no preference information is available, the recommendation accuracy drops significantly. To solve this problem, many researchers adopt various auxiliary data such as text descriptions, images, or videos to derive ratings of CS items [5,6,7]. However, most of these auxiliary data are high-dimensional, so evaluating item similarity takes much longer.

In general, matrix factorization (MF) is one of the most popular methods to implement the collaborative filtering (CF) concept [8,9]. One strength of MF is that it can incorporate implicit feedback that is not directly given but can be derived by analyzing user behavior. MF algorithms work by decomposing the user–item matrix into the product of two lower-dimensional rectangular matrices. Much research effort has been devoted to enhancing MF, such as integrating it with neighbor-based models [8], combining it with topic models of item content [10], and extending it to factorization machines for generic modeling of features [11]. Despite the effectiveness of MF for collaborative filtering, it is well known that its performance can be hindered by the simple choice of interaction function, the inner product [12].

This research integrates neural networks and a collaborative filtering method for CS item recommendations to solve the above difficulties. The proposed recommendation system includes two components: the denoising autoencoder (DAE)-based CS item rating (DACR) generator and the neural network-based collaborative filtering (NNCF) predictor. The DACR generator derives the CS item ratings from similar non-CS items using auxiliary textual information. In the generator, the DAE, a neural network-based dimension reduction method, is applied to extract content features from item vectors. With the compact content feature vector, the rating of a CS item can be derived efficiently. The NNCF predictor is designed to deal with the sparse preference prediction problem. In the NNCF predictor, one-hot encoding is applied to convert the representation of users and items into binary sparse vectors. These long sparse vectors are then projected to dense latent vectors in the embedding layer. Next, the latent vectors are fed into multilayer perceptron (MLP) layers for user–item matrix learning. When the target user is specified, the trained NNCF predictor returns the ratings of all items.

The major contributions of this study are summarized below. First, we combine the benefits of neural networks and a collaborative filtering approach to solve the CS item recommendation problem. To the best of our knowledge, this combination has rarely been studied before. Second, we apply the DAE, a neural network-based dimension reduction method, to extract the content features from vectors in the proposed DACR generator. The experiments show that the proposed DACR generator can overcome the sparsity and redundancy of the high-dimensional vectors and greatly reduce the computational time when evaluating item similarity. Third, we perform experiments on real-world datasets and demonstrate the effectiveness of the proposed recommendation system. The remainder of this paper is organized as follows. Section 2 reviews the relevant research. Section 3 introduces the proposed CS item recommendation system, which integrates deep neural networks and a collaborative filtering approach. Section 4 describes an implementation case to show the feasibility and performance of the proposed system. Section 5 presents conclusions and future work suggestions.

2. Related Work

2.1. Cold Start Problems

When the dataset is sparse, it is difficult for recommender systems to provide high-quality recommendations. A method to alleviate the new-user cold start problem for recommender systems applying collaborative filtering was presented by [13]. They proposed a model that combines similarity techniques and prediction mechanisms for retrieving recommendations. A novel approach for alleviating the cold start problem by imputing missing values into the input matrix was proposed by [14]. Their system combined local learning and attribute selection to optimize the recommendation process. They evaluated the proposed framework on one synthetic and two real datasets, using four different matrix factorization algorithms. A novel solution for cross-site cold start product recommendation, which recommends products from e-commerce websites to users of social networking sites in cold start situations, was proposed by [15]. They used the users linked across social networking sites and e-commerce websites as a bridge to map users’ social networking features to another feature representation for product recommendation. A hybrid interactive context-aware recommender system applied to the tourism domain was proposed by [1]. The approach combined case-based reasoning and an artificial neural network to overcome the cold start problem for a new user with few prior ratings. The proposed method can suggest a tour to a user with limited knowledge about their preferences and considers changes in the user’s preferences during the recommendation process. A hybrid recommendation model for dealing with the cold start problem was proposed by [3], in which item features were learned from the item descriptions by a deep learning architecture, the SDAE, and then fed into the timeSVD++ model. Experiments on a movie dataset evaluated the collaborative filtering recommendation model using the RMSE.

A novel crowd-enabled framework called CrowdStart, which utilizes crowds’ wisdom via crowdsourcing, was proposed by [16]. The intuition behind the CrowdStart framework is based on conventional expert systems: the knowledge of domain experts helps solve complex problems that are difficult to solve with machine-only algorithms. The experimental results show that the crowd workers provide relevant, diverse, reliable, and explainable crowd-based neighbors for the new item, and that these neighbors are helpful for new item recommendations. A niche approach that applies interrelationship mining to item-based collaborative filtering (IBCF) was proposed by [17]. The proposed approach utilizes interrelationship mining to extract new binary relations between each pair of item attributes and constructs interrelated attributes to enrich the available information on a new item. A joint personalized Markov chains (JPMC) model to address the cold-start issues of implicit feedback recommendation systems was proposed by [18]. The research first utilizes user embedding to mine network neighbors. A two-level model based on Markov chains at both the user level and the user group level is then proposed to model user preferences dynamically. Useful user selection criteria based on the items’ attributes and users’ rating history were designed by [19] and combined in an optimization framework for selecting users. By exploiting the feedback ratings, users’ previous ratings, and items’ attributes, their research then generates rating predictions for the other unselected users. A user similarity detection engine (USDE) for solving the lack of initial social links for newcomers was proposed by [20]. The work utilizes users’ smart devices, enabling the USDE to extract real-world social interactions between users automatically. The USDE uses a user clustering algorithm to identify similar users based on their profiles and then provides more personalized recommendations.

2.2. Deep Learning-Based Recommendation Systems

With the development of artificial intelligence, much research has shown that deep learning-based methods perform well in recommendation systems. Autoencoders have been applied in CF recommenders in the last few years. A deep learning model, the stacked denoising autoencoder, was adopted by [21] and integrated with probabilistic matrix factorization. To satisfy the need for relational deep learning, they proposed a probabilistic formulation for the stacked denoising autoencoder and then extended it to a relational stacked denoising autoencoder model. An autoencoder framework called AutoRec for collaborative filtering was proposed by [22]. A collaborative denoising auto-encoder (CDAE) method for Top-N recommendation was presented by [23]. A deep autoencoder model trained end-to-end without any layer-wise pre-training for the rating prediction task was proposed by [24]. However, unlike these works, in which autoencoders are integrated into the rating prediction process, the autoencoder in our study is used to extract compact content features from the high-dimensional vectors.

Convolutional neural networks (CNN) were applied to the hashtag recommendation problem by [25]. They proposed a novel attention-based CNN architecture that incorporates a trigger word mechanism, including a local attention channel and a global channel. Experimental results showed that the proposed method could achieve significantly better performance than the state-of-the-art methods. A dual-net deep network model to recommend images to users was designed by [26]. The network consists of two sub-networks, which map an image and user preferences into the same latent semantic space. The work in [27] aimed to alleviate the platform editors’ workload by automating the manual article selection process and recommending a subset of articles that fits the human editor’s taste and interest. They proposed a dynamic attention deep model to address the problems of the editor article recommendation task. The model used character-level text modeling and convolutional neural networks to learn the representation of each article effectively.

A neural architecture called PACE, which bridges collaborative filtering and semi-supervised learning for point of interest (POI) recommendation, was developed by [28]. PACE is a deep neural architecture that jointly learns the embedding of users and POIs to predict user preference over POIs and various contexts associated with users and POIs. A hybrid method called location-aware personalized news recommendation with explicit semantic analysis (LP-ESA), which recommends news using both the user’s interests and geographical contexts, was proposed by [29]. They further proposed a novel method called LP-DSA to exploit recommendation-oriented deep neural networks to extract dense. Experimental results showed that LP-DSA further improves the news recommendation. The inner product was replaced with a neural architecture by [10]. They presented a neural network architecture that models the latent features of users and items to tackle the implicit collaborative filtering problem.

A hybrid neural recommendation model to learn the deep representations for users and items from both ratings and reviews was proposed by [30]. The model has three major components: a rating-based encoder, a review-based encoder, and the prediction module. In addition, the research introduces a novel review-level attention mechanism that incorporates the rating-based representation as a query vector to select valuable reviews. A novel multi-criteria collaborative filtering model based on deep learning was proposed by [31]. The model obtains the users’ and items’ features and uses them as input to the criteria ratings deep neural network, which predicts the criteria ratings. Those criteria ratings are then input to the overall rating deep neural network for rating prediction. A novel hybrid probabilistic matrix factorization model, which tries to model users’ preferences from their auxiliary information and differentiate the effect of the core terms extracted from items’ comments, was proposed by [32]. Two deep learning-based sub-components are designed for this task, and a global objective function that optimizes model parameters under a unified framework is proposed. A deep hybrid recommendation model that integrates matrix factorization with a convolutional neural network (CNN) was proposed by [33]. Furthermore, this research offers an adversarial training framework to learn the hybrid recommendation model, where a generator model is built to learn the distribution over the pairwise ranking pairs. A performance evaluation of a recommending interface (PERI) framework, which automates the adjustment of a recommending interface according to the characteristics of the users and their goals, was proposed by [34]. In the framework, a deep neural network is used to predict the efficiency of a particular recommendation presented in a selected position and with a chosen degree of intensity. A session-based graph convolutional neural network (GCNN)-based product recommendation model that incorporates similarity between multiple users to produce an optimized, accurate, and intelligent recommendation system was proposed by [35]. The experiments showed that the complexity and computational time were decreased by estimating the similarity among nodes and sampling the nodes before training.

3. Methodology

Let U = {u₁, ..., u_N} be the set of N users and V = {v₁, ..., v_M} be the set of M items. The users’ feedback for the items can be represented by an N × M preference matrix R where r_uv is the preference value for item v by user u. In this study, r_uv is explicitly provided by the user in the form of an integer value (e.g., 1–5). Let U(v) = {u ∈ U | r_uv ≠ null} denote the set of users that expressed a preference for item v. An item v is defined as a cold start item (CS item) if |U(v)| ≤ ρ, where |·| is the cardinality of a set and ρ is a given threshold value. Typically, text descriptions of an item can provide beneficial auxiliary information to describe an item’s characteristics. If we can find the non-CS items whose text descriptions are similar to that of a CS item, the CS item’s rating can be derived from the ratings of those similar non-CS items. Therefore, this study’s first goal is to develop a CS item rating generator based on ratings of similar non-CS items. The derived ratings of CS items are then added to the original preference matrix R to form an updated preference matrix R′ ∈ ℝ^{N × M}. Based on R′, the second goal of this study is to develop a robust recommendation model that can deal with sparse preferences and accurately predict the user’s preference.

To fulfill the above goals, this research proposes a two-stage CS item recommender. The major components of the proposed recommender are the denoising autoencoder-based CS item rating (DACR) generator and the neural network-based collaborative filtering (NNCF) predictor. In the DACR generator, textual descriptions of items are collected and used to generate items’ content features. Through the preprocessing tasks of tokenization, stop-word removal, and stemming, a set of meaningful terms is derived from the textual descriptions of all items. Each item is then represented as a vector in which each entry is the occurrence frequency of a term. Next, the DAE, a neural network-based dimension reduction method, is applied to extract compact content features from the high-dimensional vectors. Based on the compact content feature vectors, a CS item’s rating can be derived from the ratings of similar non-CS items more efficiently. In the second stage, the updated user–item rating matrix R′ is used to train the NNCF predictor. First, the unique identifications of users and items are converted to vector format through one-hot encoding. The long sparse vectors are then projected to dense latent vectors in the embedding layer. Next, the latent vectors are fed into multilayer perceptron (MLP) layers for user–item matrix learning. The objective of the NNCF predictor is to minimize the loss between the predicted ratings and the real ratings. When the user ID is specified, the trained NNCF predictor returns the ratings of all items. After sorting the ratings, the Top-N item suggestions are returned to the user. Figure 1 illustrates the framework of the proposed cold start item recommendation system.

3.1. The DACR Generator

The primary tasks in the DACR generator include text preprocessing, content feature extraction, and CS item rating generation.

3.1.1. Content Information and Text Preprocessing

It is straightforward to use the textual description of an item to describe its characteristics. For example, the textual description (movie plot) “The classic Shakespearean play about a murderously scheming king staged in an alternative fascist England setting” can be useful content information for representing the movie “Richard III.” Typically, the textual description of an item contains many words that offer little useful information. Therefore, text preprocessing, including tokenization, stemming, and stop-word removal, is applied to all texts. Tokenization is the procedure of splitting a text into words, phrases, or other meaningful parts. Stemming is the process of reducing a word to its stem by removing affixes such as suffixes and prefixes. Stop words are words that occur commonly in texts regardless of topic (such as conjunctions, prepositions, and articles). After the preprocessing, a set of meaningful terms T = {t₁, t₂, …, t_d} (also called a bag of words) is generated. Based on T, an item v_i can be represented as v_i = <v_{i,1}, v_{i,2}, …, v_{i,d}> where v_{i,j} is the occurrence frequency of term t_j for item v_i.
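The preprocessing steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stop-word list is a tiny hypothetical one, and `naive_stem` is a crude suffix stripper standing in for a real stemmer; neither is taken from the paper.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "about", "in", "of", "and", "to"}


def naive_stem(word):
    """Crude suffix stripping; a stand-in for a real stemming algorithm."""
    for suffix in ("ing", "ly", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Tokenize, drop stop words, then stem each remaining token."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


def build_term_vectors(descriptions):
    """Return the term set T (sorted) and one term-frequency vector per item."""
    processed = [preprocess(d) for d in descriptions]
    terms = sorted({t for doc in processed for t in doc})
    index = {t: k for k, t in enumerate(terms)}
    vectors = []
    for doc in processed:
        vec = [0] * len(terms)
        for term, count in Counter(doc).items():
            vec[index[term]] = count
        vectors.append(vec)
    return terms, vectors


plots = [
    "The classic Shakespearean play about a murderously scheming king",
    "A king and a play about the king",
]
terms, vectors = build_term_vectors(plots)
```

Each row of `vectors` plays the role of one item vector v_i, with one entry per term in T.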

3.1.2. Content Feature Extracting Using DAE

Typically, the vector of item v_i is sparse because it has a large number of dimensions, most of which are zero. This makes similarity evaluation between two items inefficient. In this study, the denoising autoencoder (DAE) neural network [36] is used to reduce the dimensionality of an item’s vector while maintaining its essential characteristics. The autoencoder can be divided into an encoding part and a decoding part. The encoding part encodes the input data into a low-dimensional representation in a few layers. In contrast, the decoding part maps the low-dimensional vector back to its original dimension.

Mathematically, the autoencoder takes an input vector x ∈ [0, 1]^d and maps it to a hidden representation y ∈ [0, 1]^{d′} through a deterministic mapping y = f_θ(x) = s(Wx + b) parameterized by θ = {W, b}, where W is a d′ × d weight matrix, b is a bias vector, and s is an activation function. The resulting latent representation y is then mapped back to a reconstructed vector z ∈ [0, 1]^d where z = g_{θ′}(y) = s(W′y + b′) with θ′ = {W′, b′}. The weight matrix W′ of the reverse mapping may optionally be constrained by W′ = W^T, in which case the autoencoder is said to have tied weights. Each training input x^{(i)} is thus mapped to a corresponding y^{(i)} and a reconstruction z^{(i)}. The parameters of this model are optimized by minimizing the following average reconstruction error:

θ, θ' = \underset{θ, θ'}{\arg \min} \frac{1}{n} \sum_{i = 1}^{n} L (x^{(i)}, z^{(i)}) = \underset{θ, θ'}{\arg \min} \frac{1}{n} \sum_{i = 1}^{n} L (x^{(i)}, g_{θ'} (f_{θ} (x^{(i)})))

(1)

where L is a loss function such as the traditional squared error L(x, z) = ‖x − z‖². This optimization can be carried out by different kinds of methods, such as stochastic gradient descent. An alternative loss, suggested by the interpretation of x and z as either bit vectors or vectors of bit probabilities, is the reconstruction cross-entropy:

L_{H} (x, z) = - \sum_{k = 1}^{d} [x_{k} \log z_{k} + (1 - x_{k}) \log (1 - z_{k})]

(2)
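As a minimal sketch of the mappings and the two losses above, the following NumPy code runs one forward pass of an autoencoder with tied weights. The sigmoid activation, the toy sizes, and the random initialization are assumptions chosen for illustration, not settings from the paper.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def encode(x, W, b):
    """y = s(Wx + b), with W of shape (d', d)."""
    return sigmoid(W @ x + b)


def decode(y, W_prime, b_prime):
    """z = s(W'y + b'), with W' of shape (d, d')."""
    return sigmoid(W_prime @ y + b_prime)


def squared_error(x, z):
    """L(x, z) = ||x - z||^2."""
    return float(np.sum((x - z) ** 2))


def cross_entropy(x, z, eps=1e-12):
    """Reconstruction cross-entropy of Eq. (2); eps guards against log(0)."""
    z = np.clip(z, eps, 1.0 - eps)
    return float(-np.sum(x * np.log(z) + (1.0 - x) * np.log(1.0 - z)))


rng = np.random.default_rng(0)
d, d_hidden = 8, 3                      # toy sizes (assumed)
W = rng.normal(scale=0.1, size=(d_hidden, d))
b = np.zeros(d_hidden)
b_prime = np.zeros(d)
x = rng.random(d)                       # input in [0, 1]^d

y = encode(x, W, b)
z = decode(y, W.T, b_prime)             # tied weights: W' = W^T
```

Training would adjust `W`, `b`, and `b_prime` to reduce the average of either loss over all training inputs, for example with stochastic gradient descent.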

To improve the traditional autoencoder’s effectiveness, a modified autoencoder called the denoising autoencoder (DAE) was proposed [36]. In addition to the encoding part and the decoding part, the denoising autoencoder has a corruption part. The first step of the DAE applies a stochastic mapping x̃ ~ q_D(x̃ | x) to randomly corrupt the input data x. The random corruption forces a portion of the input values to 0, while the other values remain unchanged. Training on these randomly zeroed values teaches the denoising autoencoder to restore the damaged data. Finally, the error between the restored data and the original data is calculated. Figure 2 visually shows the process of the denoising autoencoder. Note that x is the input data and is converted to x̃ by partial corruption. As in the traditional autoencoder, the corrupted data x̃ is reduced in dimension and mapped to the hidden layer y = f_θ(x̃) = s(W x̃ + b). Next, y is reconstructed back to the original dimension as z = g_{θ′}(y) = s(W′y + b′). Finally, the optimal parameters of the denoising autoencoder are found by minimizing the average reconstruction error over the training data.
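The corruption step can be illustrated with masking noise, where each input entry is zeroed independently with some probability. In this sketch the parameter name `corruption_level`, the sigmoid activation, and the squared-error loss are assumptions; the point it shows is that the reconstruction is compared with the clean input x rather than the corrupted x̃.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def corrupt(x, corruption_level, rng):
    """Masking noise: zero each entry independently with probability corruption_level."""
    keep = rng.random(x.shape) >= corruption_level
    return x * keep


def dae_loss(x, W, b, W_prime, b_prime, corruption_level, rng):
    """Squared reconstruction error of a denoising autoencoder for one input."""
    x_tilde = corrupt(x, corruption_level, rng)   # corrupted input
    y = sigmoid(W @ x_tilde + b)                  # encode the corrupted input
    z = sigmoid(W_prime @ y + b_prime)            # decode to the original dimension
    return float(np.sum((x - z) ** 2))            # compare with the clean x


rng = np.random.default_rng(0)
d, d_hidden = 10, 4                               # toy sizes (assumed)
W = rng.normal(scale=0.1, size=(d_hidden, d))
x = rng.random(d)
loss = dae_loss(x, W, np.zeros(d_hidden), W.T, np.zeros(d), 0.3, rng)
```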

3.1.3. CS Item Rating Generation

Based on the trained DAE, item v_i is converted to y_i = <y_{i,1}, y_{i,2}, …, y_{i,d′}> where y_{i,j} is the value of compact content feature j for item v_i. To generate ratings for a CS item, the similarity between the CS and non-CS items should be evaluated first. In this study, Pearson’s correlation coefficient is used to assess the item similarity. Let y_i and y_j be the compact content feature vectors for non-CS item i and CS item j, respectively. The similarity between items i and j is defined as:

S_{i j} = \frac{\sum_{k = 1}^{d'} (y_{i k} - \bar{y}_{i}) \cdot (y_{j k} - \bar{y}_{j})}{\sqrt{\sum_{k = 1}^{d'} (y_{i k} - \bar{y}_{i})^{2} \cdot \sum_{k = 1}^{d'} (y_{j k} - \bar{y}_{j})^{2}}}

(3)

where ȳ_i and ȳ_j are the mean values of vectors y_i and y_j. Next, we can derive the ratings of the CS item from its α most similar non-CS items. The predicted rating for user u on CS item j is formulated as:

{\hat{r}}_{u j} = \frac{\sum_{i \in S^{α} (u, j)} r_{u i} S_{i j}}{\sum_{i \in S^{α} (u, j)} S_{i j}}

(4)

where r_ui is the real rating of user u on non-CS item i, and S^α(u, j) denotes the set of the α non-CS items most similar to CS item j for user u.
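Equations (3) and (4) can be sketched together as follows. The code makes two assumptions that the text does not spell out: S^α(u, j) is taken to be those of the α most similar non-CS items that user u has actually rated, and no rating is produced when that set is empty or the sum of similarities is not positive. The item names and feature values are made up.

```python
import numpy as np


def pearson_similarity(y_i, y_j):
    """Pearson's correlation coefficient between two feature vectors, Eq. (3)."""
    y_i = np.asarray(y_i, dtype=float)
    y_j = np.asarray(y_j, dtype=float)
    c_i = y_i - y_i.mean()
    c_j = y_j - y_j.mean()
    denom = np.sqrt(np.sum(c_i ** 2) * np.sum(c_j ** 2))
    if denom == 0.0:
        return 0.0  # assumption: a constant vector is treated as uncorrelated
    return float(np.sum(c_i * c_j) / denom)


def predict_cs_rating(user_ratings, features, cs_item, non_cs_items, alpha):
    """Similarity-weighted average of Eq. (4) for one user and one CS item.

    user_ratings: dict mapping item id -> rating given by this user
    features:     dict mapping item id -> compact content feature vector
    Returns None when no rating can be derived.
    """
    sims = {i: pearson_similarity(features[i], features[cs_item]) for i in non_cs_items}
    top = sorted(sims, key=sims.get, reverse=True)[:alpha]
    rated = [i for i in top if i in user_ratings]   # assumed reading of S^alpha(u, j)
    denom = sum(sims[i] for i in rated)
    if not rated or denom <= 0:
        return None
    return sum(user_ratings[i] * sims[i] for i in rated) / denom


features = {
    "cs": [1, 2, 3, 4],
    "a": [2, 4, 6, 8],
    "b": [1, 3, 2, 4],
    "e": [4, 3, 2, 1],
}
ratings_of_user = {"a": 5, "b": 3}
pred = predict_cs_rating(ratings_of_user, features, "cs", ["a", "b", "e"], alpha=2)
```

In this toy case the two nearest non-CS items have similarities 1.0 and 0.8, so the prediction is (5 · 1.0 + 3 · 0.8) / 1.8.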

3.2. The NNCF Predictor

In this study, a neural network-based collaborative filtering (NNCF) predictor is proposed to predict the ratings in the updated preference matrix R′ ∈ ℝ^{N × M}, where N is the number of users and M is the number of items. As shown in Figure 3, the core architecture of the NNCF predictor includes an input layer, an embedding layer, multilayer perceptron (MLP) layers, and an output layer. The input layer is composed of two vectors u_i^o and v_j^o, which are the one-hot encodings of the unique identifications of user u_i and item v_j, respectively. Each user vector u_i^o and item vector v_j^o, rendered as a binary sparse vector, is further projected to a dense vector in the embedding layer. The transformed vector is called a K-dimensional latent vector. The user latent vector and item latent vector are fed into the MLP layers. Each layer in the MLP layers can be customized to discover the specific latent structures of user–item interactions. The final output layer gives the predicted rating r̂′_{ij}. The prediction function of the NNCF predictor can be written as:

\hat{r}'_{i j} = f (P^{T} u_{i}^{o}, Q^{T} v_{j}^{o} | P, Q, Θ_{f})

(5)

where P ∈ ℝ^{N × K} and Q ∈ ℝ^{M × K} denote the latent factor matrices for users and items, respectively, and Θ_f denotes the model parameters of the function f.
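Because u_i^o and v_j^o are one-hot vectors, the products P^T u_i^o and Q^T v_j^o simply select row i of P and row j of Q. The NumPy sketch below checks this with toy sizes and random matrices, both chosen only for illustration.

```python
import numpy as np

N, M, K = 5, 4, 3                 # toy numbers of users, items, latent dimensions
rng = np.random.default_rng(0)
P = rng.normal(size=(N, K))       # user latent factor matrix
Q = rng.normal(size=(M, K))       # item latent factor matrix


def one_hot(index, size):
    """Binary sparse vector with a single 1 at position index."""
    v = np.zeros(size)
    v[index] = 1.0
    return v


i, j = 2, 1
p_i = P.T @ one_hot(i, N)         # user latent vector, shape (K,)
q_j = Q.T @ one_hot(j, M)         # item latent vector, shape (K,)
```

This equivalence is why embedding layers are usually implemented as a row lookup rather than an explicit matrix product with a long sparse vector.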

In the NNCF predictor framework, the MLP layers consist of at least three layers of nodes, including the input layer, single or multiple hidden layers, and the output layer. Except for the input nodes, each node uses a nonlinear activation function, and each layer is fully connected to the next layer except the output layer. The MLP utilizes a supervised learning technique called backpropagation for training and tries to minimize the loss against the actual ratings. Let the user latent vector p_i be P^T u_i^o and the item latent vector q_j be Q^T v_j^o. The MLP model under the NNCF framework can be defined as follows:

z_{1} = ϕ_{1} (p_{i}, q_{j}) = \begin{bmatrix} p_{i} \\ q_{j} \end{bmatrix}

(6)

ϕ_{2} (z_{1}) = f_{2} (w_{2}^{T} z_{1} + b_{2})

(7)

\vdots

ϕ_{L} (z_{L - 1}) = f_{L} (w_{L}^{T} z_{L - 1} + b_{L})

(8)

\hat{r}'_{i j} = σ (h^{T} ϕ_{L} (z_{L - 1}))

(9)

where w_x, b_x, and f_x denote the weight matrix, bias vector, and activation function of the x-th layer, respectively; σ and h represent the activation function and edge weights of the output layer, respectively. In this study, the training is performed by minimizing the pointwise loss between r̂′_{ij} and r′_{ij} on the training data. After the model’s optimal parameters are obtained, the rating of user u_i for item v_j can be predicted.
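A forward pass through Equations (6)–(9) can be sketched as below. Several choices here are assumptions made for illustration: the hidden layer sizes, the random weights, the use of a sigmoid for σ, and the linear rescaling of the sigmoid output to a 1–5 rating range.

```python
import numpy as np


def relu(a):
    return np.maximum(a, 0.0)


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def nncf_forward(p_i, q_j, weights, biases, h):
    """Concatenate the latent vectors, apply the hidden layers, then the output unit."""
    z = np.concatenate([p_i, q_j])            # Eq. (6)
    for W, b in zip(weights, biases):         # Eqs. (7)-(8), W has shape (n_in, n_out)
        z = relu(W.T @ z + b)
    return float(sigmoid(h @ z))              # Eq. (9), with sigma assumed to be a sigmoid


K = 32                                        # latent dimension
layer_sizes = [2 * K, 32, 16]                 # hypothetical hidden layer sizes
rng = np.random.default_rng(0)
weights = [
    rng.normal(scale=0.1, size=(n_in, n_out))
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]
h = rng.normal(scale=0.1, size=layer_sizes[-1])

p_i = rng.normal(size=K)
q_j = rng.normal(size=K)
score = nncf_forward(p_i, q_j, weights, biases, h)
rating = 1.0 + 4.0 * score                    # assumed rescaling to the 1-5 range
```

During training, the embedding matrices, the layer weights, and `h` would all be updated by backpropagation to reduce the pointwise loss between predicted and actual ratings.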

4. Implementation and Experiment Results

4.1. Datasets and Data Collection

To demonstrate the feasibility and efficiency of the proposed CS item recommendation system, a real-world dataset released for the Netflix Prize is adopted. The dataset contains 100,498,277 ratings contributed by 480,189 anonymous users for 17,770 movies. The density of the dataset is 1.1778%. Each record in the dataset includes the movie ID, the user ID, the user’s rating for the movie, and the rating date. The average number of ratings per user is 209. The textual descriptions (movie plots) for the movies are scraped from OMDb (Open Movie Database). Since not every movie in the Netflix dataset can be found in OMDb, the movies with missing textual descriptions are removed. In addition, the users who have provided fewer than 100 ratings are also deleted. Finally, the dataset of 49,771,100 ratings provided by 87,121 users for 12,747 movies is used in the following experiments.
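The density and the average number of ratings per user quoted above follow directly from the dataset counts, as the short check below shows. The density of the filtered dataset is also computed; that value is not reported in the text.

```python
def density(n_ratings, n_users, n_items):
    """Share of cells in the user-item matrix that contain a rating."""
    return n_ratings / (n_users * n_items)


# Original Netflix Prize dataset, as described in the text.
full_density = density(100_498_277, 480_189, 17_770)
avg_per_user = 100_498_277 / 480_189

# Filtered dataset used in the experiments (density not reported in the text).
filtered_density = density(49_771_100, 87_121, 12_747)

print(f"full density:     {full_density:.4%}")
print(f"ratings per user: {avg_per_user:.0f}")
print(f"filtered density: {filtered_density:.4%}")
```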

4.2. Recommendation Illustration

Typically, textual descriptions contain many trivial and meaningless words. The preprocessing tasks, including tokenization, stop-word removal, and stemming, are performed to generate a set of valid terms from the textual descriptions. After text preprocessing, 16,444 terms are obtained. Based on the set of terms, every movie is represented as <v_{i,1}, v_{i,2}, …, v_{i,16444}> where v_{i,j} is the occurrence frequency of term j in movie i. The term vectors with 16,444 dimensions cause the sparsity problem and result in a long item similarity evaluation time. Therefore, the DAE is applied to extract compact content features from the high-dimensional vectors. To improve the performance of the DAE, one hidden layer is added between layers x̃ and y, and one hidden layer is added between layers y and z. Table 1 shows the settings of the topological structure and training parameters of the DAE. The dimensionality of the reduced content features (the number of nodes in y) is set to 164 after a set of experiments. The effect of the DAE settings will be further discussed in Section 4.3.

For illustration purposes, we first define a movie receiving no more than four ratings (i.e., the given threshold value ρ = 4) as a CS movie. In this case, 20 movies, such as movie IDs 549, 617, 684, 990, and 1007, are considered CS movies, while the rest of the movies are considered non-CS movies. Pearson’s correlation coefficient in Equation (3) is applied to evaluate the similarity between each pair of a CS movie and a non-CS movie. Based on the ratings of the α most similar non-CS items, the CS items’ ratings for each user can be generated using Equation (4). Table 2 shows the predicted ratings of the five example CS movies for all users when α is set as 30. For example, the predicted ratings of user 7 for movie IDs 549, 617, 684, 990, and 1007 are 4, 5, 4, 3, and 4, respectively. Note that “-” in the table indicates “no predicted rating” since no similar non-CS movies can be found when applying Equation (4). The effect of the α value on the recommendation result will be further discussed in Section 4.3.

Next, the NNCF predictor is built based on the updated preference matrix R′. Since one-hot encoding is applied in the NNCF predictor, the input dimensions of the user vector and movie vector are 87,122 and 12,747, respectively. Both the user vector and the movie vector are projected to latent vectors with 32 dimensions in the embedding layer. A structure with two hidden layers is applied in the MLP, where the ReLU activation function is used. The settings of the topological structure and parameters for the NNCF predictor are shown in Table 3.

The well-trained NNCF predictor is then used to generate predicted ratings for all movies when a target user is specified. The movies with the highest predicted ratings are then returned to the user. Table 4 shows the top 50 ranked movies suggested for User ID 6.
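The ranking step can be sketched as a top-N selection over the predicted ratings. The helper name `top_n`, the toy movie IDs, and the scores are invented for illustration. Excluding movies the user has already rated is an assumption that the text does not state.

```python
import heapq

def top_n(predicted, already_rated, n):
    """Return the n movie IDs with the highest predicted ratings."""
    candidates = ((rating, movie) for movie, rating in predicted.items()
                  if movie not in already_rated)
    return [movie for rating, movie in heapq.nlargest(n, candidates)]

# Toy predicted ratings for one user (invented values).
predicted = {101: 4.2, 102: 4.9, 103: 3.7, 104: 4.5}
recommended = top_n(predicted, already_rated={102}, n=2)
print(recommended)
```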

4.3. Parameter Analysis

In this section, a set of experiments are conducted and analyzed to show how parameter settings affect the proposed recommendation system’s performance.

4.3.1. Dimension Reduction Using the DAE

In this study, the DAE is used to extract the essential features from the textual vector. Two critical factors in the DAE, the number of nodes in the hidden layers and the number of nodes in the latent representation layer (i.e., y), are further studied, while other parameters are the same as those in Table 1. First, the number of nodes in the hidden layers is changed according to a predefined compression ratio where the compression ratio is defined as

\text{compression ratio} = \frac{\text{number of nodes in the input layer}}{\text{number of nodes in the desired layer}} \quad (10)

For example, if the compression ratio is 10, the number of nodes in the hidden layer is 1644 (≈16,444/10). In this experiment, a set of compression ratios (2, 2.5, 3.3, 5, 10, 12.5, 16.7, 25, 50, and 100) is tested. Figure 4a shows the relationship between the validation loss and the epoch for different compression ratios. For most compression ratios, the lowest validation loss occurs between epochs 11 and 16. The validation loss reaches its minimum when the compression ratio is 10 (at epoch 12) and 16.7 (at epoch 13). Figure 4b shows the training time of the DAE when different compression ratios are applied. The training time surges when the compression ratio is small. Therefore, considering the computational efficiency and model accuracy, the number of nodes in the hidden layer is suggested as 987 (≈16,444/16.7).
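The node counts quoted here can be checked with a short calculation. Dividing 16,444 by the rounded ratio 16.7 gives about 985, not 987. One reading that reproduces the reported values, and which is an assumption not stated in the text, is that the listed ratios are rounded reciprocals of retained percentages (50%, 40%, …, 1%), so that 16.7 corresponds to keeping 6% of the input nodes.

```python
INPUT_NODES = 16_444

# Assumed retained percentages whose reciprocals give the listed ratios
# (2, 2.5, 3.3, 5, 10, 12.5, 16.7, 25, 50, 100).
PERCENTAGES = [50, 40, 30, 20, 10, 8, 6, 4, 2, 1]

nodes_by_percentage = {}
for p in PERCENTAGES:
    ratio = 100 / p                          # Equation (10), before rounding
    nodes = round(INPUT_NODES * p / 100)     # nodes in the desired layer
    nodes_by_percentage[p] = nodes
    print(f"ratio {ratio:6.2f} -> {nodes} nodes")

# Direct division by the rounded ratio, for comparison.
print(round(INPUT_NODES / 16.7))
```

Under this reading, the 10%, 6%, and 1% settings give 1644, 987, and 164 nodes, matching the values used in the text and in Table 1.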

Second, the number of nodes in the latent representation layer is also changed according to a set of compression ratios. Figure 5a shows the relationship between the validation loss and the epoch for compression ratios 100, 125, 166.6, 250, 500, and 1000. During the first several epochs, the validation loss is high for the high compression ratio cases. After that, the loss decreases slowly for all compression ratios. However, the computational time for evaluating item similarity in Equation (4) is much longer if a small compression ratio is applied. Figure 5b illustrates the computational time for item similarity evaluation for several compression ratios. Note that compression ratio 1 indicates that no DAE feature reduction is applied. When no feature reduction is used (i.e., the vector’s dimension is 16,444), the time for item similarity evaluation is 11.49 s. Notably, it takes only 2.87 s to complete the item similarity calculation if the dimension of the vector is reduced to 164 (i.e., the compression ratio is 100). Therefore, considering the computational efficiency and model accuracy, the number of nodes in the latent representation layer is suggested as 164. The experiment also shows that applying the DAE for feature reduction in the proposed system can substantially shorten the item similarity evaluation time while keeping the essential characteristics of the original features.

4.3.2. Rating Prediction Using the NNCF Predictor

To show the benefit of the proposed NNCF predictor, the following popular recommendation algorithms are compared:

BaselineOnly algorithm [37] predicts the baseline estimate for a given user and an item.
KNNWithZScore algorithm [37] is a collaborative filtering algorithm taking into account each user’s z-score normalization.
KNNBaseline [37] is a collaborative filtering algorithm taking into account a baseline rating.
SVD algorithm [38] is equivalent to Probabilistic Matrix Factorization.
SVD++ algorithm [39] is an extension of SVD that takes into account implicit ratings.

In the proposed NNCF model, the topological structure of the MLP layers might affect the recommendation performance. Therefore, four versions of the NNCF model are studied. Table 5 shows the architecture and loss function used for each NNCF predictor. Note that the ReLU activation function is applied in each of the MLP layers. A dropout rate of 0.05 is used in each layer of the NNCF models to reduce interdependent learning among the neurons.

The popular MAE (mean absolute error), MSE (mean squared error), and RMSE (root mean squared error) metrics used in recommendation systems are adopted in this study to evaluate the performance of the proposed recommender:

MAE = \frac{1}{N' \times M} \sum_{i=1}^{N'} \sum_{j=1}^{M} \left| \hat{r}'_{ij} - r'_{ij} \right| \quad (11)

MSE = \frac{1}{N' \times M} \sum_{i=1}^{N'} \sum_{j=1}^{M} \left( \hat{r}'_{ij} - r'_{ij} \right)^{2} \quad (12)

RMSE = \sqrt{\frac{1}{N' \times M} \sum_{i=1}^{N'} \sum_{j=1}^{M} \left( \hat{r}'_{ij} - r'_{ij} \right)^{2}} \quad (13)

where \hat{r}'_{ij} denotes the predicted rating, r'_{ij} denotes the real rating, N′ is the total number of testing users, and M is the total number of items.
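The three metrics can be computed as in the following sketch. It operates on flat lists of predicted and real ratings, so the normalizer is the number of rating pairs supplied. Treating this count as the counterpart of N′ × M in Equations (11) to (13) is an assumption of the sketch. The toy ratings are invented.

```python
import math

def mae(predicted, actual):
    """Mean absolute error over paired ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def mse(predicted, actual):
    """Mean squared error over paired ratings."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root mean squared error, i.e. the square root of the MSE."""
    return math.sqrt(mse(predicted, actual))

# Toy ratings (invented for illustration).
predicted = [4.0, 3.0, 5.0, 2.0]
actual = [5.0, 3.0, 4.0, 4.0]
print(mae(predicted, actual), mse(predicted, actual), rmse(predicted, actual))
```

Because the RMSE is the square root of the MSE, large individual errors weigh more heavily in both of them than in the MAE.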

Table 6 summarizes the three metrics for the five popular baseline methods and the four proposed NNCF models. All four NNCF models achieve lower MSE and RMSE than the five baseline algorithms. On MAE, NNCF₂, NNCF₃, and NNCF₄ also outperform every baseline, whereas NNCF₁ (0.648) is slightly behind SVD++ (0.641) but ahead of the other four baselines. Among the four NNCF models, the models with two hidden layers (NNCF₂ and NNCF₄) yield lower MAE, MSE, and RMSE than the models with one hidden layer (NNCF₁ and NNCF₃). Also, the models trained with the categorical cross-entropy loss function (NNCF₃ and NNCF₄) achieve lower errors than those trained with the mean squared error loss function (NNCF₁ and NNCF₂), with the largest gains in MAE and RMSE.

Figure 6 shows the effect of changing the number of nodes (i.e., the latent vector’s dimension) in the embedding layer over 8, 16, 32, 64, and 128. For NNCF₁ and NNCF₂, the MAE reaches its lowest values when the number of nodes is at least 32. For NNCF₃ and NNCF₄, the lowest MAE is found when the number of nodes is 16. This experiment shows that selecting an appropriate number of nodes in the embedding layer is important for achieving good recommendation results.

4.3.3. Performance of the Proposed CS Item Recommender

As mentioned in Section 3, item v is called a cold start item (CS item) if the number of ratings for v is no greater than a given threshold value ρ. For simplification, most previous studies simply ignore the CS items or even remove them from the dataset. Unlike previous works, this study derives the CS item ratings from similar non-CS items using auxiliary textual information by the proposed DACR generator.

To show the benefits of the proposed CS item recommender, the CS items’ ratings under different threshold values are derived and tested. For ease of understanding, DACR_ρ indicates that the DACR generator is applied to the CS items that receive no more than ρ ratings. For example, DACR₄ means that the DACR generator generates the ratings for CS items that receive no more than four ratings in our dataset. Table 7 summarizes the six DACR models and the number of CS items that each model deals with. Note that DACR₀ indicates that no CS item ratings are generated by the proposed DACR generator. In addition to the number of CS items, the CS item ratings might be affected by the number of most similar non-CS movies, α, as shown in Equation (4). Thus, α is varied over 10, 20, 30, 40, and 50 in the following discussion.

Figure 7 illustrates the MAE for different DACR + NNCF combinations. For example, Figure 7a shows the MAE when the six DACR models are applied with NNCF₁. When no CS item rating is added (i.e., DACR₀), the MAE of the proposed recommender is 0.6577. However, when the DACR₄ model is applied, the MAE decreases to 0.6483, a 1.45% improvement (=(0.6577 − 0.6483)/0.6483) for α = 10. Moreover, when the DACR₁₂ model is applied, the MAE decreases to 0.6062, an 8.49% improvement (=(0.6577 − 0.6062)/0.6062) for α = 50. Note that these improvements are computed relative to the improved MAE rather than to the DACR₀ baseline. Figure 7b–d shows similar trends and patterns. Table 8 summarizes the average MAE improvement compared to no CS item rating added when NNCF₁ to NNCF₄ are applied. Based on Figure 7 and Table 8, adding more CS items (from DACR₄ to DACR₁₂) generally leads to lower MAE values. Likewise, the MAEs of the DACR models generally decrease as α increases. However, a larger α makes the computation time much longer when evaluating item similarity.
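The improvement figures quoted in this paragraph can be reproduced with a short calculation. The sketch below evaluates the formula as written in the text, which divides by the MAE obtained with the DACR generator, and, for comparison, the more common convention of dividing by the baseline MAE. The function names are hypothetical, and the second convention is not used in the paper.

```python
def improvement_as_reported(mae_without, mae_with):
    """Formula used in the text: difference divided by the improved MAE."""
    return (mae_without - mae_with) / mae_with

def relative_reduction(mae_without, mae_with):
    """Alternative convention (not the paper's): divide by the baseline MAE."""
    return (mae_without - mae_with) / mae_without

BASELINE = 0.6577  # DACR_0 with NNCF_1, from the text
for label, value in [("DACR_4, alpha = 10", 0.6483),
                     ("DACR_12, alpha = 50", 0.6062)]:
    reported = improvement_as_reported(BASELINE, value)
    reduced = relative_reduction(BASELINE, value)
    print(f"{label}: {reported:.2%} as reported, {reduced:.2%} vs. baseline")
```

From the rounded MAE values, the reported convention gives roughly 1.45% and 8.5%, while dividing by the baseline gives slightly smaller figures of about 1.43% and 7.83%.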

4.3.4. Other Datasets

In addition to the Netflix dataset, two more popular datasets are used to evaluate the proposed CS item recommender: Amazon All Beauty and Amazon CDs & Vinyl [40]. Table 9 shows the features of the two datasets, both of which are very sparse. Figure 8a,b shows the MAE under different DACR settings with the NNCF₄ model for the All Beauty dataset and the CDs & Vinyl dataset, respectively. Note that DACR₀ means that no DACR generator is applied, while DACR₁ means that the DACR generator generates the ratings for CS items that receive no more than one rating in the dataset. Figure 8a shows that the All Beauty dataset follows a trend similar to that of the Netflix dataset: as α increases, the MAE decreases. Figure 8b, however, shows that the MAE increases when α = 50. This suggests that fifty similar non-CS items (neighbors) might be too many for rating generation. Selecting an appropriate number of neighbors for the DACR generator is therefore critical for good performance.
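The sparsity of the two Amazon datasets can be quantified from the counts in Table 9. The sketch below assumes the usual definition of density as the number of ratings divided by the number of user–item pairs, which the text does not define explicitly.

```python
# (users, items, ratings) taken from Table 9.
DATASETS = {
    "Amazon All Beauty": (324_038, 32_586, 371_345),
    "Amazon CDs & Vinyl": (1_944_316, 434_060, 4_543_369),
}

def density(users, items, ratings):
    """Fraction of the user-item matrix that holds an observed rating."""
    return ratings / (users * items)

densities = {name: density(*counts) for name, counts in DATASETS.items()}
for name, value in densities.items():
    print(f"{name}: density = {value:.2e}")
```

Under this definition, fewer than one in ten thousand user–item pairs has a rating in either dataset, and CDs & Vinyl is the sparser of the two.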

5. Conclusions

Recommendation systems now play an essential role in many online applications. Companies such as Amazon, Google, and Netflix have widely applied the technique to their services by estimating their potential customers’ preferences. Although many recommendation methods have been proposed recently, most previous studies suffer from the cold start (CS) problem [41], in which only a small number of ratings are available for some items. To address this difficulty, this research develops a two-stage CS item recommendation system. The system includes two major components: the DACR generator and the NNCF predictor. In the DACR generator, the textual descriptions of items are adopted as auxiliary information for generating the content features of items. Next, a neural network-based dimension reduction method, the denoising autoencoder (DAE), is applied to extract content features from the term vectors. The DAE can compress the vectors effectively while maintaining the characteristics of the original vectors.

Moreover, it can significantly reduce the computational time for item similarity evaluations. Thus, the CS items’ ratings are efficiently derived based on the ratings of similar non-CS items. In the second stage, the NNCF predictor is used to predict the ratings in the updated user–item preference matrix after adding the CS items’ ratings. In the NNCF predictor, the long sparse vectors are projected to dense latent vectors in the embedding layer. Next, the latent vectors are fed into MLP layers for user–item interaction learning. When a user ID is specified, the trained NNCF predictor returns the predicted ratings of all items.

A set of experiments shows that the DAE can significantly reduce the computational time for item similarity evaluations while keeping the original features’ characteristics. Besides, the experiments indicate that the proposed NNCF predictor outperforms several popular baseline algorithms. Finally, we demonstrate that the proposed CS item recommender can achieve up to 8% MAE improvement compared to adding no CS item rating.

Although the proposed system is efficient in solving the CS item recommendation problem, several directions remain for future work. First, this study adopted the textual descriptions of items as auxiliary information to derive CS items’ ratings. It is worthwhile to apply different content information, such as images and videos, to derive items’ content features. Second, although the DAE performs well for vector dimension reduction, different types of DAE can be tested. Third, overfitting might appear in the proposed NNCF recommender. Further study can try different strategies, such as dropout and L1/L2 regularization, to mitigate overfitting. Finally, it might be interesting to apply the proposed system to various applications such as online music and news.

Author Contributions

Conceptualization, C.-Y.T.; Formal analysis, Y.-F.C.; Methodology, C.-Y.T.; Software, Y.-F.C.; Validation, Y.-F.C.; Visualization, Y.-J.C.; Writing–review & editing, Y.-J.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partially supported by the Ministry of Science and Technology of Taiwan, R.O.C., No. 105-2221-E-155-024-MY3.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available online at https://www.kaggle.com/netflix-inc/netflix-prize-data.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Bahramian, Z.; Abbaspour, R.A.; Claramunt, C. A Cold Start Context-Aware Recommender System for Tour Planning Using Artificial Neural Network and Case Based Reasoning. Mob. Inf. Syst. 2017, 2017, 1–18.
2. Yoon, Y.; Fu, Y.; Joo, J. Unintended CSR Violation Caused by Online Recommendation. Sustainability 2021, 13, 4053.
3. Wei, J.; He, J.; Chen, K.; Zhou, Y.; Tang, Z. Collaborative filtering and deep learning based recommendation system for cold start items. Expert Syst. Appl. 2017, 69, 29–39.
4. Bobadilla, J.; Ortega, F.; Hernando, A.; Bernal, J. A collaborative filtering approach to mitigate the new user cold start problem. Knowl. Based Syst. 2012, 26, 225–238.
5. Nilashi, M.; Salahshour, M.; Ibrahim, O.; Mardani, A.; Esfahani, M.D.; Zakuan, N. A new method for collaborative filtering recommender systems: The case of yahoo! movies and tripadvisor datasets. J. Soft Comput. Decis. Support Syst. 2016, 3, 44–46.
6. Leyli-Abadi, M.; Labiod, L.; Nadif, M. Denoising autoencoder as an effective dimensionality reduction and clustering of text data. In Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining, Jeju, Korea, 23–26 May 2017.
7. Sulikowski, P.; Zdziebko, T.; Coussement, K.; Dyczkowski, K.; Kluza, K.; Sachpazidu-Wójcicka, K. Gaze and Event Tracking for Evaluation of Recommendation-Driven Purchase. Sensors 2021, 21, 1381.
8. Koren, Y. Factorization meets the neighborhood: A multifaceted collaborative filtering model. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 24–27 August 2008.
9. He, X.; Zhang, H.; Kan, M.-Y.; Chua, T.-S. Fast matrix factorization for online recommendation with implicit feedback. In Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, Pisa, Italy, 17–21 July 2016.
10. Wang, H.; Wang, N.; Yeung, D.-Y. Collaborative Deep Learning for Recommender Systems. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, Australia, 10–13 August 2015.
11. Rendle, S. Factorization Machines. In Proceedings of the 2010 IEEE International Conference on Data Mining, Sydney, Australia, 9–11 March 2010.
12. He, X.; Liao, L.; Zhang, H.; Nie, L.; Hu, X.; Chua, T.-S. Neural collaborative filtering. In Proceedings of the 26th International Conference on World Wide Web, 3–7 April 2017; pp. 173–182.
13. Lika, B.; Kolomvatsos, K.; Hadjiefthymiades, S. Facing the cold start problem in recommender systems. Expert Syst. Appl. 2014, 41, 2065–2073.
14. Ocepek, U.; Rugelj, J.; Bosnić, Z. Improving matrix factorization recommendations for examples in cold start. Expert Syst. Appl. 2015, 42, 6784–6794.
15. Zhao, W.X.; Li, S.; He, Y.; Chang, E.Y.; Wen, J.-R.; Li, X. Connecting Social Media to E-Commerce: Cold-Start Product Recommendation Using Microblogging Information. IEEE Trans. Knowl. Data Eng. 2015, 28, 1147–1159.
16. Hong, D.-G.; Lee, Y.-C.; Lee, J.; Kim, S.-W. CrowdStart: Warming up cold-start items using crowdsourcing. Expert Syst. Appl. 2019, 138, 112813.
17. Zhang, S.; Yao, L.; Sun, A.; Tay, Y. Deep learning based recommender system: A survey and new perspectives. ACM Comput. Surv. 2019, 52, 1–38.
18. Zhang, Y.; Shi, Z.; Zuo, W.; Yue, L.; Liang, S.; Li, X. Joint Personalized Markov Chains with social network embedding for cold-start recommendation. Neurocomputing 2020, 386, 208–220.
19. Zhu, Y.; Lin, J.; He, S.; Wang, B.; Guan, Z.; Liu, H.; Cai, D. Addressing the Item Cold-Start Problem by Attribute-Driven Active Learning. IEEE Trans. Knowl. Data Eng. 2020, 32, 631–644.
20. Ojagh, S.; Malek, M.R.; Saeedi, S. A Social–Aware Recommender System Based on User’s Personal Smart Devices. ISPRS Int. J. Geo-Inf. 2020, 9, 519.
21. Wang, H.; Shi, X.; Yeung, D.-Y. Relational stacked denoising autoencoder for tag recommendation. In Proceedings of the AAAI Conference on Artificial Intelligence, Austin, TX, USA, 25–30 January 2015.
22. Sedhain, S.; Menon, A.K.; Sanner, S.; Xie, L. Autorec: Autoencoders meet collaborative filtering. In Proceedings of the 24th International Conference on World Wide Web, 18–22 May 2015.
23. Wu, Y.; Dubois, C.; Zheng, A.X.; Ester, M. Collaborative Denoising Auto-Encoders for Top-N Recommender Systems. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining, San Francisco, CA, USA, 22–25 February 2016.
24. Kuchaiev, O.; Ginsburg, B. Training deep autoencoders for collaborative filtering. arXiv 2017, arXiv:1708.01715.
25. Gong, Y.; Zhang, Q. Hashtag recommendation using attention-based convolutional neural network. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence (IJCAI-16), New York, NY, USA, 9–15 July 2016.
26. Lei, C.; Liu, D.; Li, W.; Zha, Z.-J.; Li, H. Comparative Deep Learning of Hybrid Representations for Image Recommendations. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–31 March 2016.
27. Wang, X.; Yu, L.; Ren, K.; Tao, G.; Zhang, W.; Yu, Y.; Wang, J. Dynamic Attention Deep Model for Article Recommendation by Learning Human Editors’ Demonstration. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 13–17 August 2017.
28. Yang, C.; Bai, L.; Zhang, C.; Yuan, Q.; Han, J. Bridging collaborative filtering and semi-supervised learning: A neural approach for poi recommendation. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 13–17 August 2017.
29. Chen, C.; Meng, X.; Xu, Z.; Lukasiewicz, T. Location-Aware Personalized News Recommendation with Deep Semantic Analysis. IEEE Access 2017, 5, 1624–1638.
30. Liu, H.; Wang, Y.; Peng, Q.; Wu, F.; Gan, L.; Pan, L.; Jiao, P. Hybrid neural recommendation with joint deep representation learning of ratings and reviews. Neurocomputing 2020, 374, 77–85.
31. Nassar, N.; Jafar, A.; Rahhal, Y. A novel deep multi-criteria collaborative filtering model for recommendation system. Knowl. Based Syst. 2020, 187, 104811.
32. Zhang, X.; Liu, H.; Chen, X.; Zhong, J.; Wang, D. A novel hybrid deep recommendation system to differentiate user’s preference and item’s attractiveness. Inf. Sci. 2020, 519, 306–316.
33. Zheng, X.; Dong, D. An Adversarial Deep Hybrid Model for Text-Aware Recommendation with Convolutional Neural Networks. Appl. Sci. 2020, 10, 156.
34. Sulikowski, P.; Zdziebko, T. Deep learning-enhanced framework for performance evaluation of a recommending interface with varied recommendation position and intensity based on eye-tracking equipment data processing. Electronics 2020, 9, 266.
35. Shafqat, W.; Byun, Y.-C. Incorporating Similarity Measures to Optimize Graph Convolutional Neural Networks for Product Recommendation. Appl. Sci. 2021, 11, 1366.
36. Vincent, P.; Larochelle, H.; Bengio, Y.; Manzagol, P.-A. Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning, New York, NY, USA, 5–9 July 2008.
37. Koren, Y. Factor in the neighbors: Scalable and accurate collaborative filtering. ACM Trans. Knowl. Discov. Data 2010, 4, 1–24.
38. Salakhutdinov, R.; Mnih, A. Bayesian probabilistic matrix factorization using Markov chain Monte Carlo. In Proceedings of the 25th International Conference on Machine Learning, New York, NY, USA, 5–9 July 2008.
39. Ricci, F.; Rokach, L.; Shapira, B.; Kantor, P.B. Introduction to recommender systems handbook. In Recommender Systems Handbook; Springer: Berlin/Heidelberg, Germany; pp. 1–35.
40. Ni, J.; Li, J.; McAuley, J. Justifying Recommendations using Distantly-Labeled Reviews and Fine-Grained Aspects. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 7 November 2019.
41. Zhang, Z.-P.; Kudo, Y.; Murai, T.; Ren, Y.-G. Addressing Complete New Item Cold-Start Recommendation: A Niche Item-Based Collaborative Filtering via Interrelationship Mining. Appl. Sci. 2019, 9, 1894.

Figure 1. The framework of the proposed cold start item recommendation system.

Figure 2. The schematic structure of the DAE.

Figure 3. The structure of the NNCF predictor.

Figure 4. Experiments for the number of nodes in the hidden layers of the DAE.

Figure 5. Experiments for the number of nodes in the latent representation layer of the DAE.

Figure 6. Effect of the number of nodes in the embedding layer of the NNCF predictors.

Figure 7. The performance of the proposed recommender under different DACR + NNCF settings.

Figure 8. The performance of the proposed recommender for two Amazon datasets under different DACR settings.

Table 1. The structure and parameter settings for the DAE.

Parameter	Value
Number of nodes for x and z	16,444
Numbers of nodes in the two hidden layers	(987, 987)
Number of nodes for y	164
Batch size	10
Loss function	cross entropy

Table 2. The predicted ratings for five example CS movies.

User ID	549	617	684	990	1007
1	-	3	4	5	4
2	4	5	4	3	4
3	4.50	4.50	-	-	4
4	3.50	-	-	-	-
…	…	…	…	…	…
87,120	4	5	-	-	4
87,121	-	5	-	-	3

Table 3. The structure and parameter settings for the NNCF predictor.

Name	Value
Number of nodes for users and movies in the input layer	(87,122, 12,474)
Number of nodes for users and movies in the embedding layer	(32, 32)
Number of hidden layers in the MLP	2
Number of nodes in the hidden layers of the MLP	(50, 10)
Batch size	200
Loss function	Mean squared error

Table 4. Top 50 recommended movies for User ID 6.

Ranking	Movie ID
1	3582
2	10,789
3	5072
…	…
49	587
50	7701

Table 5. The architecture and loss function used for each NNCF predictor.

Model	Architecture in MLP Layers	Loss Function
NNCF₁	One hidden layer: (50) nodes	mean squared error
NNCF₂	Two hidden layers: (50, 10) nodes	mean squared error
NNCF₃	One hidden layer: (50) nodes	categorical crossentropy
NNCF₄	Two hidden layers: (50, 10) nodes	categorical crossentropy

Table 6. The performance for the NNCF predictors.

Method	MAE	MSE	RMSE
BaselineOnly	0.723446	0.850494	0.922222
KNNWithZScore	0.684817	0.786650	0.886933
KNNBaseline	0.691259	0.783414	0.885107
SVD	0.650909	0.691885	0.831796
SVD++	0.641086	0.688568	0.829800
NNCF₁	0.648282	0.638057	0.798785
NNCF₂	0.639093	0.633585	0.795980
NNCF₃	0.602003	0.632619	0.754669
NNCF₄	0.598555	0.630442	0.751286

Table 7. The summary for the six DACR models.

Model	Threshold Value ρ	Number of Generated CS Items
DACR₀	0	None
DACR₄	4	20
DACR₆	6	40
DACR₈	8	73
DACR₁₀	10	145
DACR₁₂	12	285

An item v is defined as a CS item if the number of ratings it has received is no greater than the given threshold value ρ.

Table 8. The average MAE improvement compared to no CS item rating added.

Model	α = 10	α = 20	α = 30	α = 40	α = 50
DACR₄	1.23%	1.31%	1.55%	1.62%	1.61%
DACR₆	1.29%	1.26%	1.46%	1.96%	1.80%
DACR₈	1.55%	2.11%	2.20%	2.63%	2.80%
DACR₁₀	1.57%	2.72%	3.51%	3.88%	4.62%
DACR₁₂	2.04%	4.54%	5.74%	7.03%	8.11%

Table 9. The main features for the two Amazon datasets.

Features	Amazon All Beauty	Amazon CDs & Vinyl
No. of Users	324,038	1,944,316
No. of Items	32,586	434,060
No. of Ratings	371,345	4,543,369

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Tsai, C.-Y.; Chiu, Y.-F.; Chen, Y.-J. A Two-Stage Neural Network-Based Cold Start Item Recommender. Appl. Sci. 2021, 11, 4243. https://doi.org/10.3390/app11094243

