Evolving Hierarchical and Tag Information via the Deeply Enhanced Weighted Non-Negative Matrix Factorization of Rating Predictions

Identifying the hidden features of items and users of a modern recommendation system, wherein features are represented as hierarchical structures, allows us to understand the association between the two entities. Moreover, when tag information that is added to items by users themselves is coupled with hierarchically structured features, the rating prediction efficiency and system personalization are improved. To this effect, we developed a novel model that acquires hidden-level hierarchical features of users and items and combines them with the tag information of items that regularizes the matrix factorization process of a basic weighted non-negative matrix factorization (WNMF) model to complete our prediction model. The idea behind the proposed approach was to deeply factorize a basic WNMF model to obtain hidden hierarchical features of user’s preferences and item characteristics that reveal a deep relationship between them by regularizing the process with tag information as an auxiliary parameter. Experiments were conducted on the MovieLens 100K dataset, and the empirical results confirmed the potential of the proposed approach and its superiority over models that use the primary features of users and items or tag information separately in the prediction process.


Introduction
Recently, with the increase in the availability of data from online content providers, delivering valuable information that gratifies and holds a consumer's interest has attracted significant attention; thus, modeling an effective recommendation system is essential. The primary objective of a recommendation system is to offer suggestions based on user preferences, which are solicited from historical data, such as ratings, reviews, and tags. Recommendations help in accelerating searches and enable users to access more pertinent content. Therefore, web service providers have extensively cogitated about developing recommender systems that analyze and harness user-item interactions to increase customer satisfaction, profits, and personalized suggestions for their services. Several modern-day internet applications have integrated recommendation systems, including Google, Netflix, eBay, and Amazon.
Recommendation systems are designed around the type of information available, so the diversity of that information influences their implementation and structure. Two traditional approaches exist for building recommendation systems: content-based filtering (CBF) and collaborative filtering (CF) [1,2]. The former approach generates recommendations by analyzing the available user-item interaction data, which largely requires collecting explicit information [3-5]. For instance, content-based movie recommendations select movies whose features match those of a user's past preferences. Thus, identifying a connection between the items and users is highly important. However, in recent years, owing to the limitations of this approach, such as privacy concerns and the dearth of supplementary information for items, web services have adopted the CF architecture in recommendation systems. CF algorithms use the items rated by a user to predict ratings for unrated items when offering recommendations, and they automate these predictions by collecting the opinions of like-minded users [2,3,6-10].
Memory- and model-based techniques are commonly used to implement CF recommendations [3,11-15]. Past studies have demonstrated the benefits of memory-based CF, wherein rating predictions are computed from the preferences of similar users via a rating matrix [12,16-19]. Conversely, the model-based CF technique uses the user-item rating matrix to first build a predictive model using deep learning methods and then derives the rating predictions from it [3,20]. CF-based recommendation systems are susceptible to data sparsity and the cold-start problem, which remain open issues in recommendation system research that any recommendation algorithm must address [21]. First, data sparsity arises when users interact with few items, so that the input rating matrix is insufficient to train a model to make predictions; in such cases, only 10-25% of the matrix is populated with ratings. Second, the cold-start problem arises when information about new users or items and their interactions is insufficient to produce suitable recommendations.
One of the most effective implementations of model-based CF is the matrix factorization (MF) method. This method decomposes the user-item rating matrix into two lower-dimensional latent factor sub-matrices of user preferences and item characteristics, respectively; the product of a user's feature vector and an item's feature vector then predicts the user's rating for that item [3,14,22,23]. Moreover, MF readily integrates a mix of implicit and explicit information related to users or items. Factorization methods have since demonstrated substantial efficiency in resolving the issues of data sparsity and the cold start in recommendation systems.
In this study, we aimed to address the two aforementioned issues using hierarchical and tag information through enhanced matrix factorization to eventually improve the performance of recommender systems. Hierarchical information helps to organize information about items meaningfully, such as categories of movie genres on streaming websites (e.g., Netflix and Disney+) or product catalogs on shopping websites (e.g., Amazon, Alibaba, and eBay).
Users and items in practical recommendation systems can exhibit hierarchical structures. For example, a user may usually select movies from the main category "romance," or more precisely, from the sub-category "romance drama." Similarly, an item such as the Apple Watch Series 5 can be placed in the main category "electronics," or more specifically, in the sub-category "smart watches." The classification of an item into appropriate lower-level categories or nodes is conducted sequentially. Items in the same hierarchical level are likely to share similar attributes and are therefore likely to receive similar rating scores. Likewise, users in the same hierarchical level are likely to share similar preferences and are therefore likely to rate certain items similarly [24]. For this reason, methods that evolve the hierarchical structures of items or users have recently been developed to improve recommendation system performance. The importance of hierarchical structures, together with the fact that they are often not explicitly available, motivated us to study the hierarchical structures of users and items for recommendation systems. In this research, we studied how to evolve the hierarchical structures of items and users simultaneously and how to model them mathematically. We also investigated how to integrate tag information with these mathematically modeled hierarchical structures into a single model that forms the basis of a recommendation system.
Symmetry 2020, 12, 1930 3 of 17
In contrast, tag information comprises words or short phrases that a user assigns to items; tags reflect the user's associations or behavior and facilitate predictions when they are passed as input to the prediction algorithm. Researchers have previously reported on the benefits of making recommendations using tags and generating hierarchical information to not only improve results but also tackle issues of data sparsity and cold starts [8,13,24-28]. Furthermore, to the best of our knowledge, despite the significant amount of research that has been conducted on the use of matrix factorization with hierarchical and tag information individually in recommendation systems, the two have rarely been applied in combination.
In this study, we developed a novel MF-based methodology to predict ratings by incorporating both hierarchical and tag information simultaneously. The rationale behind the proposed approach was to deeply enrich a basic MF model to obtain hierarchical relationships for predicting the ratings and then regularize it using tag information. Our main contributions using this approach included the following:
• Deeply extending the basic MF model to identify hierarchical relationships that facilitate the rating predictions.
• Regularizing the resultant model with tag information, as well as hierarchical data.
• Reducing data sparsity and cold-start issues encountered by other CF methods.
The remainder of this paper is structured as follows. In Section 2, works pertaining to tag-based recommendation systems, generating hierarchical features, and existing MF methods are reviewed. In Sections 3 and 4, we discuss the proposed methodology in detail and validate its accuracy via experiments and comparisons with other MF methods. Section 5 presents the conclusions and scope of future work. Finally, the reviewed materials are listed in the references, many of which are recent publications.

Related Work
Several studies in the recent past have harnessed hierarchical and tag information as auxiliary features to address issues related to data sparsity and cold starts in recommendation systems [13,25,28,29]. CF-based recommender systems are commonly employed to predict ratings based on user histories; however, they ignore costly features, which introduce data sparsity and cold starts, which in turn hampers performance. Therefore, various studies have integrated auxiliary information in the recommendation process [30,31].
Auxiliary features often maintain a rich knowledge structure i.e., a hierarchy with dependencies. Yang et al. [13] proposed an MF-based framework with recursive regularization that analyzes the impacts of hierarchically organized features in user-item interactions to improve the recommendation accuracy and eliminate the cold-start problem. Lu et al. [32] developed a framework that exploited these hierarchical relationships to identify more reliable neighbors; moreover, the framework modeled the hierarchical structure based on potential users' preferences. The hierarchical itemspace rank (HIR) algorithm utilizes the intrinsic hierarchical structure of an itemspace to mitigate data sparsity that may affect the quality of recommendations [33].
Most modern recommender systems trawl both explicit and implicit data for useful information, including ratings, images, text (tags), social information, items, and user characteristics, to offer recommendations. We can thus infer that analyzing tag information is important in recommender systems, as they not only recap the characteristics of items but also help in identifying user preferences. For example, food recommendations are made by a model trained on a dataset comprising user preferences that are collated from ratings and tags specified in product forms to indicate their preferred food components and features [25]. Karen et al. [27] proposed a generic method that modifies CF algorithms to accommodate tags and deconstructs 3D correlations into three 2D correlations. Moreover, Wang et al. [34] formulated a novel approach that combined tags and ratings-based CF to discern similar users and items.
Our proposed methodology deviates from these methods in that the tags obtained from user-item interactions are used to regularize the MF process, whereas hierarchical information delivers the rating predictions. In summary, existing MF models that use hierarchical and tag information individually have delivered satisfactory results despite their complexity. However, to the best of our knowledge, no existing work seamlessly incorporates both hierarchical and tag information.

Methodology
This section is devoted to illustrating our proposed methodology that predicts rating scores by evolving hierarchical structures of items and users simultaneously with a mathematically modeled combination of tag information. Specifically, the notations that are used in this paper are first introduced, and then, a basic model that builds the basis of the proposed model is described. After that, we go into the details of the model components that mathematically model the hierarchical structures of items and users simultaneously and the integration of tag information, respectively, the combination of which leads to an optimization problem. Lastly, we come up with an efficient algorithm to solve it. Table 1 enumerates the notations used in this paper.

Basic Matrix Factorization
We modeled our approach on a basic weighted non-negative matrix factorization (WNMF) method owing to its feasible and easy implementation in recommendation systems with large inputs and sparse data. This method factorizes an input rating matrix into two non-negative sub-matrices P and Q of sizes n × r and r × m, respectively.
The rating score given by user $p_i$ to item $q_j$ is then computed as

$$R(i, j) = P(i, :)\,Q(:, j). \tag{1}$$

P and Q are evaluated by solving the following optimization problem:

$$\min_{P \ge 0,\; Q \ge 0} \; \left\| W \odot (R - PQ) \right\|_F^2 + \lambda \left( \|P\|_F^2 + \|Q\|_F^2 \right), \tag{2}$$

where $W$ is the weight matrix that regulates the contribution of $R(i, j)$ in the learning process such that $W(i, j) = 1$ for $R(i, j) > 0$; else, $W(i, j) = 0$. $\odot$ is the Hadamard element-wise multiplication operator, $\lambda$ is the regularization parameter used to moderate the complexity and overfitting during learning, and $\|P\|_F^2$ and $\|Q\|_F^2$ are the squared Frobenius norms of the corresponding matrices [27].
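As an illustration of Equation (2), the following is a minimal sketch of WNMF in Python with NumPy. The multiplicative update rule used here is a common solver for this kind of objective and is an assumption of this sketch, not necessarily the solver used in this paper; the function names `wnmf` and `wnmf_loss`, the default rank `r=2`, and the small constant `eps` are likewise illustrative.

```python
import numpy as np

def wnmf_loss(R, W, P, Q, lam):
    """Objective of Equation (2): weighted squared error plus Frobenius penalty."""
    resid = W * (R - P @ Q)                      # Hadamard product masks unobserved ratings
    return float(np.sum(resid ** 2) + lam * (np.sum(P ** 2) + np.sum(Q ** 2)))

def wnmf(R, r=2, lam=0.1, iters=100, seed=0, eps=1e-9):
    """Factor R (n x m) into non-negative P (n x r) and Q (r x m).

    Assumption: multiplicative updates; eps only guards against division by zero.
    """
    rng = np.random.default_rng(seed)
    n, m = R.shape
    W = (R > 0).astype(float)                    # W(i, j) = 1 where a rating exists
    P = rng.random((n, r)) + 0.1                 # strictly positive initial factors
    Q = rng.random((r, m)) + 0.1
    losses = [wnmf_loss(R, W, P, Q, lam)]
    for _ in range(iters):
        P *= ((W * R) @ Q.T) / ((W * (P @ Q)) @ Q.T + lam * P + eps)
        Q *= (P.T @ (W * R)) / (P.T @ (W * (P @ Q)) + lam * Q + eps)
        losses.append(wnmf_loss(R, W, P, Q, lam))
    return P, Q, losses
```

The predicted rating for user i and item j is then `P[i, :] @ Q[:, j]`, as in Equation (1). Because the updates only multiply by non-negative ratios, P and Q stay non-negative throughout.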

Acquiring the Hierarchical Structured Information
Some features of users and items are hierarchically structured. For instance, as shown in Figure 1b, the genres of movies can be organized into a hierarchical structure. It is very likely that movies in the same detailed genre are more similar to one another than movies that only share a broader genre. For this reason, it is reasonable to recommend a movie that is in the same detailed genre as one that has received a high rating score from the user. Hierarchical structures of users and items involve complementary information, and capturing them simultaneously can further improve the recommendation performance. Therefore, in this subsection, we describe how the hierarchically structured information of users and items is acquired by enhancing the basic weighted non-negative matrix factorization model.
One of the most significant challenges for recommendation systems is to elicit valuable information from the features of highly correlated users and items in a user-item interaction that forms the basis of the prediction process. Typically, this is modeled using the flat attributes of users (for example, gender and age) or items (in the case of a movie, this can include an actor, a producer, release date, language, and country). However, these features may often be represented in a multilevel structure, i.e., a hierarchy, in the form of a tree with nested nodes (for example, movie genres and user occupations). Simple representations of a hierarchical structure include movie genres and product categories on e-commerce websites, as shown in Figure 1.
For example, the movie Godfather (an item) can be classified by traversing the hierarchical tree nodes as follows: main genre→subgenre, per Figure 1b, which yields crime→gangster. Similarly, the Apple Watch Series 5 (an item) can be placed in a hierarchical structure, per Figure 1a, as main category→subcategory→explicit subcategory, which corresponds to electronics→cell phones, smart watches→smart watches. The classification of an item into appropriate lower-level categories or nodes is conducted sequentially.
User preferences are similarly structured. For instance, a user who chooses to rate movies in the crime genre may prefer the gangster subgenre over others, and those who shop for items belonging to a particular hierarchical level of the product catalog may express coincidental preferences by consistently rating items that exhibit similar characteristics.
From Section 3.2, WNMF was adopted as the core model to acquire implicit hierarchical information and thereby predict rating scores. The user-item rating matrix, R, was deconstructed into two lower-dimensional non-negative submatrices, P and Q, constituting user preferences and item characteristics, respectively, and expressed as the flat structures of features. Because P and Q are non-negative, we applied the non-negative matrix factorization to them to interpret the corresponding hierarchically structured information, which then served to predict the rating scores given by Equation (1).
P and Q were extracted such that $P \in \mathbb{R}^{n \times r}$ and $Q \in \mathbb{R}^{r \times m}$ indicate the latent representations of n users and m items in an r-dimensional latent category (space). Owing to their non-negativity, P and Q were further factorized to model the hierarchical structure.
Therefore, in a particular embodiment, P was factorized into two matrices, $P_1 \in \mathbb{R}^{n \times n_1}$ and $\tilde{P}_2 \in \mathbb{R}^{n_1 \times r}$, as follows:

$$P \approx P_1 \tilde{P}_2, \tag{3}$$

where $n$ is the number of users, $r$ is the number of latent categories (space) in the first hierarchical level, and $n_1$ is the number of subcategories in the second hierarchical level. Thus, $P_1 \in \mathbb{R}^{n \times n_1}$ is the relationship of $n$ users to $n_1$ subcategories. $\tilde{P}_2$ denotes the second level of the hierarchical structure of users, obtained from the relationship between the $r$ latent categories (space) in the first hierarchical level and the $n_1$ latent subcategories in the second hierarchical level.
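The single factorization step of Equation (3) can be sketched with a plain non-negative matrix factorization. The sketch below is an assumption-laden illustration: it uses the standard multiplicative NMF updates, and the function name `nmf_split` and its parameters are hypothetical, not taken from this paper.

```python
import numpy as np

def nmf_split(P_tilde, inner, iters=200, seed=0, eps=1e-9):
    """Split a non-negative matrix P_tilde (a x r) into P_i (a x inner)
    and P_next (inner x r) so that P_tilde ~= P_i @ P_next.

    Assumption: unweighted NMF with multiplicative updates.
    """
    rng = np.random.default_rng(seed)
    a, r = P_tilde.shape
    P_i = rng.random((a, inner)) + 0.1           # user-to-subcategory relationship
    P_next = rng.random((inner, r)) + 0.1        # subcategory-to-category relationship
    for _ in range(iters):
        P_i *= (P_tilde @ P_next.T) / (P_i @ P_next @ P_next.T + eps)
        P_next *= (P_i.T @ P_tilde) / (P_i.T @ P_i @ P_next + eps)
    return P_i, P_next
```

Applying `nmf_split` again to the returned `P_next` would give the next level of the hierarchy in the same way.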
To compute the third level of the hierarchical structure of users, $\tilde{P}_2$ is further factorized into $P_2 \in \mathbb{R}^{n_1 \times n_2}$ and $\tilde{P}_3 \in \mathbb{R}^{n_2 \times r}$:

$$P \approx P_1 P_2 \tilde{P}_3, \tag{4}$$

where $n_2$ is the number of subcategories in the third hierarchical level. Therefore, deep factorization on P serves to obtain the xth level of the hierarchical structure of users, $P_x$, which is accomplished by factorizing $\tilde{P}_{x-1}$, the latent category relationship matrix of the (x − 1)th level of the hierarchical structure, into non-negative matrices, as follows:

$$P \approx P_1 P_2 \cdots P_{x-1} P_x, \tag{5}$$

where $P_i \ge 0$ for $i \in \{1, 2, \ldots, x\}$, $P_1$ is an $n \times n_1$ matrix, $P_i$ is an $n_{i-1} \times n_i$ matrix, and $P_x$ is an $n_{x-1} \times r$ matrix.

The above factorization process, as illustrated in Figure 2, is repeated for Q to obtain the levels of the hierarchical structure of items. For this, the relationship of m items with r-dimensional latent categories (space) is represented as $Q \in \mathbb{R}^{r \times m}$, which is further factorized into $Q_1 \in \mathbb{R}^{m_1 \times m}$ and $\tilde{Q}_2 \in \mathbb{R}^{r \times m_1}$ to describe the second level of items in the hierarchy given by:

$$Q \approx \tilde{Q}_2 Q_1, \tag{6}$$

where $m_1$ is the number of sub-categories in the second hierarchical level and $Q_1 \in \mathbb{R}^{m_1 \times m}$ is the relationship of m items to the $m_1$ latent subcategories. The latent category relationship matrix $\tilde{Q}_2 \in \mathbb{R}^{r \times m_1}$ of the second hierarchical level is defined as the affiliation between the r-dimensional latent categories (space) in the first hierarchical level and the $m_1$ latent subcategories in the second hierarchical level. Equation (7) gives the third level of the hierarchical structure of items, where $\tilde{Q}_2$ is also factorized into $Q_2 \in \mathbb{R}^{m_2 \times m_1}$ and $\tilde{Q}_3 \in \mathbb{R}^{r \times m_2}$, and $m_2$ is the number of subcategories in the third hierarchical level:

$$Q \approx \tilde{Q}_3 Q_2 Q_1. \tag{7}$$

Deep factorization on Q, as illustrated in Figure 3, secures the yth level of the hierarchical structure of items, $Q_y$, which is accomplished by factorizing $\tilde{Q}_{y-1}$, in the (y − 1)th level of the hierarchy, as follows:

$$Q \approx Q_y Q_{y-1} \cdots Q_2 Q_1, \tag{8}$$

where $Q_j \ge 0$ for $j \in \{1, 2, \ldots, y\}$, $Q_1$ is an $m_1 \times m$ matrix, $Q_j$ is an $m_j \times m_{j-1}$ matrix, and $Q_y$ is an $r \times m_{y-1}$ matrix.

Finally, the following optimization problem needs to be solved effectively to build a model that outlines the hierarchical structures of users and items:

$$\min_{P_1, \ldots, P_x,\; Q_1, \ldots, Q_y} \; \left\| W \odot (R - P_1 \cdots P_x Q_y \cdots Q_1) \right\|_F^2 + \lambda \left( \sum_{i=1}^{x} \|P_i\|_F^2 + \sum_{j=1}^{y} \|Q_j\|_F^2 \right), \tag{9}$$

where $P_i \ge 0$ for $i \in \{1, 2, \ldots, x\}$ and $Q_j \ge 0$ for $j \in \{1, 2, \ldots, y\}$.

The rating prediction process that involves the acquired hierarchically structured information of users and items is represented in Figure 4.
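To make the shapes of the deep factorization concrete, the sketch below chains the user factors P_1 ... P_x and the item factors Q_y ... Q_1 and evaluates the hierarchical objective referred to as Equation (9). The helper names `chain`, `predict`, and `hierarchical_loss`, and the convention of passing the item factors as the list [Q_1, ..., Q_y], are assumptions made for this illustration.

```python
import numpy as np
from functools import reduce

def chain(mats):
    """Multiply a list of matrices from left to right."""
    return reduce(np.matmul, mats)

def predict(Ps, Qs):
    """Ps = [P_1, ..., P_x], Qs = [Q_1, ..., Q_y].
    Returns the product P_1 ... P_x Q_y ... Q_1 (an n x m matrix)."""
    return chain(Ps) @ chain(Qs[::-1])           # reverse so that Q_y comes first

def hierarchical_loss(R, W, Ps, Qs, lam):
    """Weighted reconstruction error plus Frobenius penalties on every factor."""
    resid = W * (R - predict(Ps, Qs))
    reg = sum(np.sum(M ** 2) for M in Ps + Qs)
    return float(np.sum(resid ** 2) + lam * reg)
```

With x = y = 2, `Ps` holds an n × n_1 and an n_1 × r matrix, and `Qs` holds an m_1 × m and an r × m_1 matrix, so the chained product has the same n × m shape as the rating matrix.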

Incorporating Tag Information
Tag information was incorporated into our proposed methodology to derive an association between the supplementary information solicited from WNMF and tag repetitiveness in items [3]. For example, an "organized crime" tag assigned to the movie "The Godfather" (item) by a user may also apply to other items with similar characteristics, which is reflected in the degree of repetitiveness. Therefore, the matrix factorization process of a basic WNMF model is regularized using the tag information to complete our prediction model. In short, we aimed to make the item-specific latent feature vectors produced by the MF process of our WNMF model as similar as possible for two items that share common tag information. For a tag information matrix T, each of its components $T_{it}$ for item i and tag t is a tf * idf value [35]:

$$T_{it} = \mathrm{tf}(i, t) \cdot \log\!\left( \frac{m}{\mathrm{df}(t)} \right), \tag{10}$$

where tf(i, t) is the normalized frequency of t occurring in i, df(t) is the number of items that contain t, and m is the total number of items. Thus, the similarity between items i and j is computed using the cosine similarity metric, as follows:

$$S_{ij} = \frac{\sum_{t \in T_{ij}} T_{it} T_{jt}}{\sqrt{\sum_{t \in T_{ij}} T_{it}^2}\; \sqrt{\sum_{t \in T_{ij}} T_{jt}^2}}, \tag{11}$$

where $T_{ij}$ is the index set of tags occurring in both i and j. The item-specific latent feature vectors of similar items are then drawn together by adding an item similarity regularization term to the WNMF model, as follows:

$$\frac{\beta}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} S_{ij} \left\| q_i - q_j \right\|^2 = \frac{\beta}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \left[ S_{ij} \sum_{r} \left( q_i^r - q_j^r \right)^2 \right] = \beta \, \mathrm{tr}\!\left( Q L Q^T \right), \tag{12}$$

where $S_{ij}$ defines the similarity between i and j; $q_1, q_2, \ldots, q_m$ are the latent characteristic vectors that populate Q; r indexes the dimensions of each item vector, i.e., $q_i^r$ and $q_j^r$ are the values of the vectors of items i and j in the rth dimension; L denotes the Laplacian matrix given by L = D − S for a diagonal matrix D such that $D_{ii} = \sum_j S_{ij}$; tr(·) is the trace of a matrix; and β is an extra regularization parameter that controls the contribution of the tag information [36].
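The tag pipeline described above (tf * idf weights, cosine similarity over shared tags, and the Laplacian L = D − S) can be sketched as follows. This is an illustrative sketch under stated assumptions: the natural logarithm is used for the idf factor because the base is not specified here, the diagonal of S is left at zero, and the function names `tfidf`, `tag_similarity`, and `laplacian` are hypothetical.

```python
import numpy as np

def tfidf(tf):
    """tf: (items x tags) matrix of normalized tag frequencies."""
    m = tf.shape[0]                              # total number of items
    df = np.maximum((tf > 0).sum(axis=0), 1)     # items containing each tag (guard against 0)
    return tf * np.log(m / df)                   # assumption: natural logarithm

def tag_similarity(T, present):
    """Cosine similarity restricted to the tags that both items carry.
    present: boolean (items x tags) matrix marking which tags each item has."""
    m = T.shape[0]
    S = np.zeros((m, m))                         # assumption: self-similarity left at 0
    for i in range(m):
        for j in range(i + 1, m):
            common = present[i] & present[j]     # tags occurring in both i and j
            a, b = T[i, common], T[j, common]
            denom = np.linalg.norm(a) * np.linalg.norm(b)
            if denom > 0:
                S[i, j] = S[j, i] = (a @ b) / denom
    return S

def laplacian(S):
    """L = D - S, where D is diagonal with D_ii = sum_j S_ij."""
    return np.diag(S.sum(axis=1)) - S
```

For a symmetric S, half the similarity-weighted sum of squared distances between item vectors equals tr(Q L Q^T), which is why the regularizer can be written compactly in trace form.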
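The rating predictions were made by combining Equations (9) and (12) and utilizing the following objective function for the minimization task:

$$\min_{P_1, \ldots, P_x,\; Q_1, \ldots, Q_y} \; \left\| W \odot (R - P_1 \cdots P_x Q_y \cdots Q_1) \right\|_F^2 + \lambda \left( \sum_{i=1}^{x} \|P_i\|_F^2 + \sum_{j=1}^{y} \|Q_j\|_F^2 \right) + \beta \, \mathrm{tr}\!\left( Q L Q^T \right), \tag{13}$$

where $P_i \ge 0$ for $i \in \{1, 2, \ldots, x\}$ and $Q_j \ge 0$ for $j \in \{1, 2, \ldots, y\}$.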

Optimization Problem
The optimization problem is complicated owing to the non-convexity of the objective function, but solving it also helps in validating the method that is administered in a recommendation system. Our optimization method modified the approach in [37] in that all variables of the objective function given in Equation (13) were updated alternately: when all other variables are fixed, the function is convex in the variable being updated, which is not the case when all variables are considered jointly.

The Basis of Updating P i
When $P_i$ is updated, terms unrelated to $P_i$ are discarded by fixing the other variables, and the resulting objective function is expressed as Equation (14), where $A_i$ and $H_i$, for $1 \le i \le x$, are defined in Equations (15) and (16). The Lagrangian function of Equation (14) is given in Equation (17), where M is the Lagrangian multiplier, and the derivative of $L(P_i)$ with respect to $P_i$ is given in Equation (18). By setting the derivative to zero and employing the Karush-Kuhn-Tucker complementary condition [37], i.e., $M(s, t) P_i(s, t) = 0$, we obtain Equation (19). Finally, the update rule of $P_i$ is given by Equation (20).

Similarly, for $Q_i$, the unrelated terms are initially discarded by fixing the other variables, and the resulting objective function is expressed as Equation (21), where $B_i$ and $K_i$, for $1 \le i \le y$, are defined in Equations (22) and (23). The update rule for $Q_i$, Equation (24), is then computed in the same way as that for $P_i$.

The optimization with the above update rules for $P_i$ and $Q_j$ seeks to approximate the factors in the proposed model. Each hierarchical level is pre-trained to get an initial approximation of the matrices $P_i$ and $Q_j$. The input user-item rating matrix is first factorized into $\tilde{P}_1 \tilde{Q}_1$ by solving Equation (2). Then, $\tilde{P}_i$ and $\tilde{Q}_i$ are further factorized into $\tilde{P}_i \approx P_i \tilde{P}_{i+1}$ and $\tilde{Q}_i \approx \tilde{Q}_{i+1} Q_i$, respectively. The factorization step is continued until the xth user and yth item hierarchical levels are obtained. The fine-tuning process is performed by updating $P_i$ and $Q_i$ using Equations (20) and (24) separately; this step first updates the $Q_i$ in sequence and then the $P_i$ in sequence. Finally, the predicted rating matrix is $\hat{R} = P_1 \cdots P_x Q_y \cdots Q_1$.
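The steps for $P_i$ can be sketched from the objective in Equation (13). The derivation below is a reconstruction under stated assumptions rather than a transcription of Equations (14)-(20): it assumes that $A_i$ and $H_i$ collect the fixed factors to the left and right of $P_i$, that W is binary so that $W \odot W = W$, and that the tag term does not depend on $P_i$. The square-root multiplicative rule in the last line is one rule whose fixed points satisfy the complementary condition; whether it matches Equation (20) exactly is an assumption.

```latex
% Sketch only. Assumed definitions: A_1 = I, and for i = x, H_x = Q_y \cdots Q_1.
\begin{aligned}
A_i &= P_1 \cdots P_{i-1}, \qquad
H_i = P_{i+1} \cdots P_x \, Q_y \cdots Q_1 \\
J(P_i) &= \left\| W \odot (R - A_i P_i H_i) \right\|_F^2 + \lambda \left\| P_i \right\|_F^2 \\
\mathcal{L}(P_i) &= J(P_i) - \operatorname{Tr}\!\left( M^{\top} P_i \right) \\
\frac{\partial \mathcal{L}(P_i)}{\partial P_i}
  &= 2 A_i^{\top} \left[ W \odot (A_i P_i H_i - R) \right] H_i^{\top} + 2 \lambda P_i - M \\
0 &= \left[ A_i^{\top} \left( W \odot (A_i P_i H_i - R) \right) H_i^{\top} + \lambda P_i \right](s,t) \; P_i(s,t) \\
P_i(s,t) &\leftarrow P_i(s,t)
  \sqrt{ \frac{ \left[ A_i^{\top} (W \odot R) H_i^{\top} \right](s,t) }
              { \left[ A_i^{\top} \left( W \odot (A_i P_i H_i) \right) H_i^{\top} + \lambda P_i \right](s,t) } }
\end{aligned}
```

At a fixed point of the last line, either $P_i(s,t) = 0$ or the numerator equals the denominator, which is exactly the complementary condition in the line above it.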

Convergence Analysis
The convergence of the proposed model was examined as follows. The assistant function in [38] was used to prove the convergence of the model.

Definition 1. The assistant function [38] is defined as $G(h, h')$ for $F(h)$ if the conditions

$$G(h, h') \ge F(h), \qquad G(h, h) = F(h)$$

are satisfied.

Assumption 1. If G [38] is an assistant function for F, then F is non-increasing under the update:

$$h^{(t+1)} = \arg\min_{h} G\!\left(h, h^{(t)}\right).$$

Assumption 2. [39] For any matrices $A \in \mathbb{R}_+^{n \times n}$, $B \in \mathbb{R}_+^{k \times k}$, $S \in \mathbb{R}_+^{n \times k}$, and $S' \in \mathbb{R}_+^{n \times k}$, where A and B are symmetric, the following inequality holds:

$$\sum_{s=1}^{n} \sum_{t=1}^{k} \frac{(A S' B)(s, t)\, S^2(s, t)}{S'(s, t)} \ge \mathrm{Tr}\!\left( S^T A S B \right). \tag{28}$$

The objective function in Equation (14) can be rewritten by developing the quadratic terms and removing terms that are unrelated to $P_i$.

Theorem 1. An assistant function $G(P_i, P_i')$ for $J(P_i)$ can be constructed from this expanded form. Moreover, it is a convex function in $P_i$, and its global minimum yields the update rule in Equation (20).

Proof. The proof is similar to that in [40] and thus the details are omitted.

Theorem 2. Updating $P_i$ with Equation (20) will monotonically decrease the value of the objective in Equation (13).

Proof. With Assumption 1 and Theorem 1, we have:

$$J\!\left(P_i^{(0)}\right) = G\!\left(P_i^{(0)}, P_i^{(0)}\right) \ge G\!\left(P_i^{(1)}, P_i^{(0)}\right) \ge J\!\left(P_i^{(1)}\right) \ge \cdots$$

That is, $J(P_i)$ decreases monotonically. Equivalently, the update rule for $Q_i$ will also monotonically decrease the value of the objective in Equation (13). Since the value of the objective in Equation (13) is bounded below by zero, the optimization technique of the proposed method converges.

Time Complexity Analysis
The most expensive operations in the proposed model are the initialization and fine-tuning processes, which therefore determine the efficiency of the model. Namely, the time complexity of the decomposition of $\tilde{P}_i \in \mathbb{R}^{n_{i-1} \times r}$ into $P_i \in \mathbb{R}^{n_{i-1} \times n_i}$ and $\tilde{P}_{i+1} \in \mathbb{R}^{n_i \times r}$ is $O(k n_{i-1} n_i r)$ for $1 < i < x$ and $O(k n n_1 r)$ for $i = 1$, where k is the number of iterations in the decomposition process. Hence, the cost of initializing the $P_i$'s is $O(kr(n n_1 + n_1 n_2 + \ldots + n_{x-2} n_{x-1}))$. Likewise, the cost of initializing the $Q_i$'s is $O(kr(m m_1 + m_1 m_2 + \ldots + m_{y-2} m_{y-1}))$. The computational costs of fine-tuning $P_i$ and $Q_i$ in each iteration are $O(n n_{i-1} n_i + n n_i m + n_{i-1} n_i m)$ and $O(m m_{i-1} m_i + m m_i n + m_{i-1} m_i n)$, respectively. Let $n_0 = n$, $m_0 = m$, and $n_x = m_y = r$; then the time complexity of fine-tuning is

$$O\!\left( k_f \left[ \sum_{i=1}^{x} \left( n n_{i-1} n_i + n n_i m + n_{i-1} n_i m \right) + \sum_{j=1}^{y} \left( m m_{j-1} m_j + m m_j n + m_{j-1} m_j n \right) \right] \right),$$

where $k_f$ is the number of iterations in the fine-tuning process. The time complexity of computing the item similarities and L is $O(m^2 t)$, where m is the total number of items and t is the total number of tags. Hence, the total time complexity is the sum of the costs of the initialization, fine-tuning, and computing the item similarities. It is interesting to note that, in practice, two hierarchical levels of users and items, x = 2 and y = 2, already give a performance improvement over MF and WNMF. When x > 2 and y > 2, the performance of the proposed model is better than that for x = 2 and y = 2, but the time complexity grows. Therefore, x and y were set to 2 in practice because the time complexity is then not larger than that for MF and WNMF.
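The per-iteration fine-tuning cost stated above can be evaluated for concrete layer sizes with a few lines of Python. This is only an arithmetic sketch of the stated terms; the function name `fine_tuning_cost` and the convention of passing the sizes as lists [n_0, ..., n_x] and [m_0, ..., m_y] are assumptions made for illustration.

```python
def fine_tuning_cost(user_sizes, item_sizes):
    """Count the operations of one fine-tuning iteration.

    user_sizes = [n_0, n_1, ..., n_x] with n_0 = n and n_x = r
    item_sizes = [m_0, m_1, ..., m_y] with m_0 = m and m_y = r
    """
    n, m = user_sizes[0], item_sizes[0]
    cost = 0
    # Terms for each user factor P_i: n*n_{i-1}*n_i + n*n_i*m + n_{i-1}*n_i*m
    for prev, cur in zip(user_sizes, user_sizes[1:]):
        cost += n * prev * cur + n * cur * m + prev * cur * m
    # Terms for each item factor Q_j: m*m_{j-1}*m_j + m*m_j*n + m_{j-1}*m_j*n
    for prev, cur in zip(item_sizes, item_sizes[1:]):
        cost += m * prev * cur + m * cur * n + prev * cur * n
    return cost
```

Multiplying the result by the number of fine-tuning iterations gives the operation count that the fine-tuning bound describes, which makes it easy to see how adding a hierarchical level changes the cost.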

Dataset
To evaluate the performance of our model, an experiment was performed with the latest small MovieLens 100K dataset. The dataset comprises 100,000 movie ratings and 3683 tags that are essentially user-generated metadata (a single word or short phrase) about movies. The ratings are scored on a scale of 0.5 to 5.0 stars, and movies and users are drawn from a total of 19 genres and 21 occupation categories, respectively. The genres and occupations provide the hierarchical information of the movies and users, whereas the tags provide the tag information.

Measurement Metric
The dataset was randomly divided in two ways: 60% for training with the remaining 40% for testing, and 80% for training with the remaining 20% for testing. The prediction accuracy of the proposed model was measured using the popular mean absolute error (MAE) metric. MAE returns the average absolute deviation of the prediction from the ground truth:

$$\mathrm{MAE} = \frac{1}{|\tau|} \sum_{(i, j) \in \tau} \left| R(i, j) - \hat{R}(i, j) \right|,$$

where τ is a set of ratings, and $R$ and $\hat{R}$ are the true and predicted ratings, respectively. The smaller the value of MAE, the more accurate the prediction; hence, lower MAE values are preferred.
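A minimal sketch of the MAE computation is given below. The function name `mae` and the boolean `mask` argument, which marks the test ratings that form the set τ, are assumptions made for this illustration.

```python
import numpy as np

def mae(R_true, R_pred, mask):
    """Mean absolute error over the entries selected by mask (the set tau)."""
    R_true = np.asarray(R_true, dtype=float)
    R_pred = np.asarray(R_pred, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    # Average the absolute deviations over the selected ratings only
    return float(np.mean(np.abs(R_true[mask] - R_pred[mask])))
```

Restricting the average to the masked entries matters because unobserved entries of the rating matrix are stored as zeros and must not be counted as errors.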

Results
We evaluated the model using two indicators: the rating prediction error (i.e., MAE) for the predetermined weights of the tag information and the extent of mitigating the item cold-start problem.

Prediction Accuracy with Tag Information Weights
It is worth noting that the proposed method completed the entire rating prediction workflow only for items that have tag information; for the remaining instances, it reduced to a basic WNMF model, i.e., without solving Equations (10)-(13). To demonstrate the advantage of our approach, two baseline methods were selected for comparison; the results are summarized in Table 2.
1. Matrix factorization: Proposed by Koren et al. [3], this method factorizes a user-item rating matrix and learns the resulting user and item latent feature vectors to minimize the error between the true and predicted ratings.
2. Weighted non-negative matrix factorization: Also chosen as the base model for the proposed approach, WNMF factorizes a weighted user-item rating matrix into two non-negative submatrices to minimize the error between the true and predicted ratings.
The results were obtained with the parameter $r$ set to 20, $n_1$ taking values in {50, 100, 150, 200, 250}, and $m_1$ taking values in {100, 200, 300, 400, 500}. The numbers of hierarchical layers $x$ and $y$ were both equal to 2, so that $W \odot R \approx W \odot (P_1 P_2 Q_1 Q_2)$, where $P_1 \in \mathbb{R}^{n \times n_1}$, $P_2 \in \mathbb{R}^{n_1 \times r}$, $Q_1 \in \mathbb{R}^{r \times m_1}$, and $Q_2 \in \mathbb{R}^{m_1 \times m}$. Overall, as the values of the dimensions rose, the model performance tended to improve at first and subsequently declined. The extra regularization parameter, β, controls the contribution of tag information in learning the item latent feature vector. In other words, for β = 0, our methodology adopted the basic WNMF to compute Equation (13) and thereafter predict ratings, whereas for non-zero β values, the weight of the tag information affected the predictions, as illustrated in Figure 5. Although this reflects a certain degree of reliance on β for the proposed approach, it also demonstrates the benefit of combining hierarchical and tag information. The relationship between MAE and β for β in the range of 0.05-3.0 was plotted for the 80% training dataset; the accuracy improved as β increased and peaked for β between 1.0 and 2.1, where the lowest MAE was recorded.
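The shapes involved in the two-level factorization can be checked with a short sketch. The toy sizes, the random initialization, and the function name `weighted_loss` are illustrative assumptions; the update rules of the model (Equations (10)-(13)) are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (hypothetical): n users, m items, hidden sizes n1 and m1, rank r.
n, m = 30, 40
n1, m1, r = 10, 12, 5

# Non-negative factors with the shapes given in the text.
P1 = rng.random((n, n1))
P2 = rng.random((n1, r))
Q1 = rng.random((r, m1))
Q2 = rng.random((m1, m))

# Toy ratings and a 0/1 weight matrix marking the observed entries.
R = rng.integers(1, 6, size=(n, m)).astype(float)
W = (rng.random((n, m)) < 0.2).astype(float)


def weighted_loss(W, R, P1, P2, Q1, Q2):
    """Squared error of W * (R - P1 P2 Q1 Q2), elementwise weighted."""
    R_hat = P1 @ P2 @ Q1 @ Q2
    return np.sum((W * (R - R_hat)) ** 2)


if __name__ == "__main__":
    print((P1 @ P2 @ Q1 @ Q2).shape)  # (n, m), same as R
    print(weighted_loss(W, R, P1, P2, Q1, Q2))
```

Because W is zero on unobserved entries, only the known ratings contribute to the reconstruction error.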

Mitigation of the Item Cold Start
One of the main challenges encountered when building a recommendation system is the cold-start problem, which arises when a new user or item is introduced for which no past interactions are available. Collaborative filtering algorithms are particularly prone to this problem. As basic models, matrix factorization algorithms (WNMF and MF) perform poorly in cold-start settings owing to the lack of preference information [26,27,41]. Provided that tag information is available, our proposed model can mitigate the cold-start problem by incorporating the tag information into the recommendation. Tag information not only describes the items but also conveys the sentiment of users. In particular, the proposed method tries to make two item-specific latent feature vectors as similar as possible if the two items have a similar tagging history. It can give recommendations to new users who have no preference for any items. In such cases, the proposed approach helped to alleviate the cold-start problem by integrating tag information, where other comparable methods failed.
To test this, the ratings of 50 and 100 randomly selected items from the 80% training dataset were discarded so that these items were treated as new (cold-start) items in the recommendation system. In the cold-start experiments, the model parameters were set to the optimal values of β = 1.8 and r = 20, with the numbers of hierarchical layers x and y both equal to 2. The comparative results are presented in Table 3, which shows that the proposed method outperformed the MF and WNMF models and indicates that tag information can be used to make recommendations for cold-start items. In both instances (50 and 100 cold-start items), the proposed method mitigated the cold-start problem for new items significantly better than its competitors.
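The idea of pulling together the latent vectors of items with similar tagging histories can be sketched as a graph-Laplacian penalty. This is a minimal illustration under stated assumptions, not the exact formulation of the model: the binary item-tag matrix `T`, the cosine similarity, the Laplacian L = D - S, and the function names are all assumptions made for this example.

```python
import numpy as np


def tag_laplacian(T):
    """Cosine item-item similarity S from an (items x tags) matrix, and L = D - S."""
    T = np.asarray(T, dtype=float)
    norms = np.linalg.norm(T, axis=1, keepdims=True)
    norms[norms == 0] = 1.0          # untagged items get zero similarity
    U = T / norms
    S = U @ U.T                      # costs on the order of m^2 * t operations
    np.fill_diagonal(S, 0.0)         # ignore self-similarity
    D = np.diag(S.sum(axis=1))
    return S, D - S


def tag_penalty(Q, L, beta):
    """beta * tr(Q L Q^T), where the columns of Q are item latent vectors."""
    return beta * np.trace(Q @ L @ Q.T)


if __name__ == "__main__":
    # Items 0 and 1 share both tags; item 2 has a different tag.
    T = np.array([[1, 1, 0], [1, 1, 0], [0, 0, 1]])
    S, L = tag_laplacian(T)
    Q = np.array([[1.0, 2.0, 5.0], [0.0, 0.0, 7.0]])  # r = 2, m = 3
    print(tag_penalty(Q, L, beta=1.8))
```

The trace equals the sum over item pairs of S_jk times the squared distance between their latent vectors, so minimizing it moves a cold-start item with no ratings toward items that share its tags.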

Top-N Recommendation Results
Along with providing superior MAE results for rating predictions, the proposed model also performed well on the top-N recommendation task. In this task, the model identified the items that best fit each user's personal tastes, as obtained from their hierarchically structured features and tagging history.
To evaluate the top-N performance of the proposed model, the 80% training dataset was used to generate a ranked list of N items for each user. The proposed method and the two baseline methods were compared on the MovieLens 100K dataset, as indicated in Figure 6. The comparison was performed for three values of N: 5, 10, and 15. For the top-5, the MAE of the MF method was 0.748, while that of the WNMF method was 0.01 higher. The proposed model outperformed both, achieving the lowest error of 0.736 for the top-5 and 0.752 for the top-10, whereas MF and WNMF showed 0.757 and 0.772 for the top-10, respectively. For the top-15, the proposed method had a slightly higher error than the MF method, which we attribute to the expensive initialization and fine-tuning operations that our approach requires. Overall, these experiments show that the proposed method outperformed both baselines for the top-5 and top-10 and remained competitive for the top-15.
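Generating a ranked list of N items per user from a matrix of predicted ratings can be sketched as follows. The function name `top_n` is hypothetical, and excluding items the user has already rated is an assumption of this example rather than a detail stated in the experiments.

```python
import numpy as np


def top_n(R_hat, rated_mask, n_items):
    """Indices of the n_items highest-scoring unrated items for each user."""
    # Already-rated items are pushed to the end of the ranking.
    scores = np.where(rated_mask, -np.inf, R_hat)
    order = np.argsort(-scores, axis=1, kind="stable")
    return order[:, :n_items]


if __name__ == "__main__":
    R_hat = np.array([[4.5, 3.0, 5.0, 1.0],
                      [2.0, 4.0, 3.5, 4.8]])
    rated = np.array([[False, False, True, False],
                      [False, True, False, False]])
    print(top_n(R_hat, rated, 2))
```

The same function covers the top-5, top-10, and top-15 settings by changing `n_items`.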

Conclusions
Although personalized recommendation systems continue to develop rapidly, data sparsity, cold starts, and the improvement of recommendation performance remain open challenges in the area. In this study, we proposed a novel rating prediction model with enhanced matrix factorization using hierarchical and tag information to address these issues. Experimental results revealed that hierarchical and tag information used in combination has a significant influence in alleviating data sparsity and item cold starts compared to established MF techniques. The entire rating prediction workflow of our proposed model was completed only for items that have tag information together with the hierarchical information of users and items. In particular, deep factorization of the user preference and item characteristic matrices, which is possible owing to their non-negativity, yielded hidden-level hierarchically structured features, while tag information was used to regularize the matrix factorization process of a basic WNMF model to complete our prediction model. From the experiments, we concluded that as the values of the dimensions increased, the performance of the proposed model tended to increase at first and then decrease. Despite the superiority of the proposed approach, several problems were encountered, especially with regard to advances in the domain that focus on the high volume of data available for making recommendations. Therefore, future research could explore more sophisticated models for estimating the importance of the hidden features of users and items, where the features are represented as hierarchical structures, as well as of tag information preferences, by using recent deep learning methods and algorithms. Additionally, future research might also develop an explainable and interpretable recommendation system based on the above hidden features.