2.1. Establishment of Latent Factor Model
The enhanced latent-factor framework in this paper differs from standard latent factor models in three key ways. First, it integrates a baseline bias correction component to offset systematic skew in user or item ratings. Second, it incorporates a dimension-reduced semantic projection optimized jointly through singular value decomposition and gradient descent, enabling stable convergence even under extreme sparsity (99.5%). Third, the model adopts a whole-matrix evaluation scheme instead of a conventional truncation, aligning the training objective more closely with performance in low-density recommendation environments. The baseline method [28] is mainly used to provide a reference for the other models, so that the model can better approach the benchmark level. One of its important advantages is that it can generate recommendations for newly registered users for whom no rating information is available. This paper uses the gradient descent method to solve the benchmark prediction model.
The prediction rating of user $u$ for item $i$ is denoted $\hat{r}_{ui}$, and the prediction error is $e_{ui} = r_{ui} - \hat{r}_{ui}$. Equation (1) shows the first-order partial derivatives with respect to $b_u$ and $b_i$:
where $C$ represents the loss function. The parameters $b_u$ and $b_i$ are the deviations of user $u$ and item $i$ from the average level, respectively.
Moving in the direction opposite to the gradient, we obtain the iterative Equation (2):
Here, $\alpha$ represents the learning rate, and $\lambda$ denotes the regularization parameter. Their optimal values are determined through multiple experiments according to the actual situation.
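The bodies of Equations (1) and (2) do not appear above; a plausible reconstruction, assuming the standard baseline predictor $\hat{r}_{ui} = \mu + b_u + b_i$ (with $\mu$ the global mean rating) and a squared loss with $L_2$ regularization, is:

$C = \sum_{(u,i) \in K} \left[ (r_{ui} - \mu - b_u - b_i)^2 + \lambda (b_u^2 + b_i^2) \right]$

$\dfrac{\partial C}{\partial b_u} = -2 e_{ui} + 2 \lambda b_u, \qquad \dfrac{\partial C}{\partial b_i} = -2 e_{ui} + 2 \lambda b_i$

$b_u \leftarrow b_u + \alpha (e_{ui} - \lambda b_u), \qquad b_i \leftarrow b_i + \alpha (e_{ui} - \lambda b_i)$

where $K$ denotes the set of observed ratings and the constant factor of 2 is absorbed into the learning rate $\alpha$.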
The main goal of using factor models to generate predicted ratings is to reveal hidden item features that can explain the observed ratings. Examples of this class of model include the PLSA model [29], the neural network model [30], and the latent Dirichlet allocation (LDA) model [31]. Recently, the matrix factorization model has become increasingly popular because of its accuracy and stability.
The idea of establishing the factorization model comes from the singular value decomposition (SVD) principle [32]. The SVD-based scheme decomposes the original matrix, retains the first $k$ singular values as the eigenvalues of the prediction matrix, and generates a new low-dimensional matrix. This matrix approximates the original matrix, and its entries are used as predicted ratings.
Given a user–item rating matrix $R$ of size $m \times n$ (where $m$ is the number of users and $n$ is the number of items), SVD yields three matrices, $U$, $\Sigma$, and $V$, such that Equation (3) holds:
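Equation (3) is not reproduced here; given the matrix sizes described below, it presumably takes the standard SVD form:

$R_{m \times n} = U_{m \times m} \, \Sigma_{m \times n} \, V^T_{n \times n}$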
In the above equation, $\Sigma$ is the singular value diagonal matrix, and its size is $m \times n$. The elements on the diagonal of $\Sigma$ are the singular values of $R$ and satisfy the ordering $\sigma_1 \ge \sigma_2 \ge \dots \ge 0$. The matrices $U$ and $V$ are orthogonal matrices. The size of $U$ is $m \times m$ and it satisfies $U^T U = I$, and the size of $V$ is $n \times n$ and it satisfies $V^T V = I$. The schematic diagram of the singular value decomposition of the matrix $R$ is shown in Figure 1.
In the process of dimension reduction using singular value decomposition, it is necessary to determine how many dimensions are retained. This parameter needs to be adjusted according to the actual situation. The specific operation of dimension reduction is to retain the first $k$ singular values (i.e., $\sigma_1, \sigma_2, \dots, \sigma_k$) of the singular value matrix $\Sigma$ to obtain a new matrix $\Sigma_k$. We then select the corresponding $k$ singular vectors from the matrices $U$ and $V$ to form the matrices $U_k$ and $V_k$, and finally synthesize a new matrix $R_k$, as shown in Equation (4).
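Equation (4) is not shown in the text; under the truncation just described, it presumably reads:

$R_k = U_k \, \Sigma_k \, V_k^T$

where $U_k \in \mathbb{R}^{m \times k}$, $\Sigma_k \in \mathbb{R}^{k \times k}$, and $V_k \in \mathbb{R}^{n \times k}$, so that $R_k$ has the same size as $R$.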
Then the entry $(R_k)_{ui}$ represents the predicted rating of user $u$ on item $i$. The SVD decomposition and dimension reduction process is shown in Figure 2.
The steps of using singular value decomposition to complete rating prediction are as follows (a code sketch of these steps is given after the list):
The rating matrix $R$ is decomposed into $U$, $\Sigma$, and $V$ by the singular value decomposition algorithm.
We take the first $k$ singular values of the matrix $\Sigma$ to form $\Sigma_k$.
We select the corresponding singular vectors from $U$ and $V$ to form $U_k$ and $V_k$.
The prediction matrix $R_k$ is synthesized from $U_k$, $\Sigma_k$, and $V_k$.
We obtain the prediction rating $(R_k)_{ui}$ of user $u$ on item $i$.
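The steps above can be illustrated with a short sketch. The code below uses NumPy and hypothetical names (`ratings`, `k`); it assumes missing entries have already been pre-filled (e.g., with the global mean), since plain SVD requires a complete matrix.

```python
import numpy as np

def svd_predict(ratings: np.ndarray, k: int) -> np.ndarray:
    """Rank-k SVD approximation of a (pre-filled) rating matrix.

    ratings: m x n matrix with missing entries already imputed.
    k:       number of singular values to retain.
    Returns an m x n matrix whose entries serve as predicted ratings.
    """
    # Step 1: decompose the rating matrix into U, Sigma, V^T.
    U, s, Vt = np.linalg.svd(ratings, full_matrices=False)

    # Steps 2-3: keep the first k singular values and the matching singular vectors.
    U_k = U[:, :k]          # m x k
    S_k = np.diag(s[:k])    # k x k
    Vt_k = Vt[:k, :]        # k x n

    # Steps 4-5: synthesize the low-rank matrix; entry (u, i) is the predicted rating.
    return U_k @ S_k @ Vt_k

# Toy usage with a 3 x 4 rating matrix, keeping 2 singular values.
R = np.array([[5.0, 3.0, 4.0, 4.0],
              [3.0, 1.0, 2.0, 3.0],
              [4.0, 3.0, 4.0, 5.0]])
print(np.round(svd_predict(R, k=2), 2))
```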
The SVD algorithm is a common way to address data sparsity in recommendation systems and achieves the goal of dimension reduction. However, this method has the following two problems:
It occupies too much storage space. A real system has many users and items, and the matrix produced after generating the predicted ratings requires a large amount of storage.
Its computational efficiency is low. The algorithm must decompose the full matrix; for the very high-dimensional matrices encountered in practice, this requires a large amount of computation and takes a long time.
These two problems limit the application of the SVD algorithm. Through subsequent research, Simon Funk improved the SVD algorithm based on the gradient descent method [33] and proposed Funk SVD [34], i.e., a factorization model.
The factor decomposition model utilizes scoring data from the e-commerce platform as its primary input. These data encompass information on all users, items, and the corresponding user–item ratings. In
Figure 3, to illustrate the model’s functionality, we consider a simplified scenario with ratings from 3 users on 4 items, assuming 3 hidden features.
In this factor decomposition model, the rating matrix $R$ serves as the foundation, where $r_{ui}$ denotes user $u$'s preference for item $i$. The model's main function is to find latent item features from the original data, which are then used for item classification and rating prediction. The matrix $R$ undergoes decomposition into a matrix $P$ and a matrix $Q$.
Here, $P$ is the user–feature matrix, with $p_{uf}$ indicating user $u$'s preference level for hidden feature $f$. Conversely, $Q$ denotes the feature–item matrix, where $q_{fi}$ signifies the weight of item $i$ on feature $f$.
The predicted preference of user $u$ for item $i$ is calculated using Equation (5):
In Equation (5), $p_u$ represents user $u$'s preference vector (a row in matrix $P$), while $q_i$ denotes item $i$'s weight vector (a column in matrix $Q$).
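Equation (5) is not reproduced in the text; assuming $F$ hidden features, the predicted preference is presumably the inner product:

$\hat{r}_{ui} = p_u \cdot q_i = \sum_{f=1}^{F} p_{uf} \, q_{fi}$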
This decomposition approach offers several advantages:
It autonomously extracts and utilizes latent item attributes for classification, eliminating the need for manual item categorization.
The model’s granularity is flexible and determined by the number of hidden features, allowing for adjustable levels of refinement.
Rather than explicit item categorization, the model assigns weights to each item across various classes.
To optimize the vectors $p_u$ and $q_i$, we employ a loss-function minimization approach. The input data are partitioned into training and test sets, and gradient descent is utilized to iteratively refine $p_u$ and $q_i$, thereby reducing the loss function value and improving prediction accuracy.
The initial loss function is defined as Equation (6):
where $K$ is the set of known user–item interactions.
To mitigate overfitting, we introduce a regularization term, modifying the loss function to Equation (7):
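Equations (6) and (7) are not shown; plausible forms, assuming the usual squared error over the observed ratings in $K$ and an $L_2$ penalty, are:

$C = \sum_{(u,i) \in K} \left( r_{ui} - \sum_{f=1}^{F} p_{uf} \, q_{fi} \right)^2$

$C = \sum_{(u,i) \in K} \left( r_{ui} - \sum_{f=1}^{F} p_{uf} \, q_{fi} \right)^2 + \lambda \left( \lVert p_u \rVert^2 + \lVert q_i \rVert^2 \right)$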
Here, $\lambda$ serves as the regularization parameter, fine-tuned through empirical testing. The optimization process employs gradient descent, involving the following steps:
Compute the partial derivatives of the loss function with respect to $p_u$ and $q_i$, as shown in Equation (8):
Update $p_u$ and $q_i$ iteratively, as in Equation (9):
Parameter $\alpha$ denotes the learning rate, and $e_{ui}$ represents the prediction error. The model's implementation requires the following (a code sketch is given after this list):
Initializing the vectors $p_u$ and $q_i$ based on the dataset.
Tuning the parameters, including the learning rate $\alpha$, the regularization parameter $\lambda$, the iteration count, and the hidden feature count.
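A minimal training-loop sketch of this procedure is given below. It assumes a list of (user, item, rating) triples and hypothetical parameter names (`n_factors`, `alpha`, `lam`, `n_epochs`), and implements the per-sample updates described in Equations (8) and (9), with the constant factor from the derivative absorbed into the learning rate.

```python
import numpy as np

def funk_svd(triples, n_users, n_items, n_factors=3,
             alpha=0.01, lam=0.1, n_epochs=50, seed=0):
    """Funk-SVD-style factorization trained with stochastic gradient descent.

    triples: iterable of (u, i, r) with 0-based user/item indices.
    Returns P (n_users x n_factors) and Q (n_factors x n_items).
    """
    rng = np.random.default_rng(seed)
    # Initialize the user-feature and feature-item matrices with small random values.
    P = rng.normal(scale=0.1, size=(n_users, n_factors))
    Q = rng.normal(scale=0.1, size=(n_factors, n_items))

    for _ in range(n_epochs):
        for u, i, r in triples:
            e = r - P[u] @ Q[:, i]          # prediction error e_ui
            p_old = P[u].copy()
            # Move opposite to the gradient of the regularized squared loss.
            P[u] += alpha * (e * Q[:, i] - lam * P[u])
            Q[:, i] += alpha * (e * p_old - lam * Q[:, i])
    return P, Q

# Toy usage: 3 users, 4 items, a handful of observed ratings.
data = [(0, 0, 5), (0, 2, 4), (1, 1, 1), (2, 3, 5), (2, 0, 4)]
P, Q = funk_svd(data, n_users=3, n_items=4)
print(np.round(P @ Q, 2))   # predicted rating matrix
```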
This factor decomposition approach allows for the incorporation of corrective measures to address practical issues, such as bias in user scoring patterns or consistent item overestimation.
Equation (10) depicts the ultimate predictive model, known as the implicit semantic model [35].
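Equation (10) is not reproduced here; combining the baseline terms with the factor model, the implicit semantic (biased latent factor) model presumably takes the form:

$\hat{r}_{ui} = \mu + b_u + b_i + p_u^T q_i$

where $\mu$ is the global mean rating and $b_u$, $b_i$ are the user and item biases defined earlier.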
To optimize the model parameters ($b_u$, $b_i$, $p_u$, $q_i$), we minimize the following regularized squared loss function, as shown in Equation (11):
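Equation (11) is not shown; a plausible form of this loss over the parameters listed above, assuming a single regularization parameter $\lambda$, is:

$C = \sum_{(u,i) \in K} \left( r_{ui} - \mu - b_u - b_i - p_u^T q_i \right)^2 + \lambda \left( b_u^2 + b_i^2 + \lVert p_u \rVert^2 + \lVert q_i \rVert^2 \right)$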
The optimization process employs stochastic gradient descent, with the update rules shown in Equation (12). To adjust the parameters for a given training sample, we move in the direction opposite to the gradient, as illustrated in Equation (13). By fine-tuning the parameters $\alpha$ and $\lambda$, the model achieves enhanced prediction accuracy.
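Equations (12) and (13) are not reproduced; with prediction error $e_{ui} = r_{ui} - \hat{r}_{ui}$ and the constant factor of the gradient absorbed into $\alpha$, the per-sample updates presumably read:

$b_u \leftarrow b_u + \alpha (e_{ui} - \lambda b_u), \qquad b_i \leftarrow b_i + \alpha (e_{ui} - \lambda b_i)$

$p_u \leftarrow p_u + \alpha (e_{ui} \, q_i - \lambda p_u), \qquad q_i \leftarrow q_i + \alpha (e_{ui} \, p_u - \lambda q_i)$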