MovieDIRec: Drafted-Input-Based Recommendation System for Movies

Abstract: In a DNN-based recommendation system, the input selection of a model and design of an appropriate input are very important in terms of the accuracy and reflection of complex user preferences. Since what the layers learn toward the goal of the model depends on the input, the more closely the input is related to the goal, the less the model needs to learn unnecessary information. In relation to this, the term Drafted-Input, defined in this paper, is input data that have been appropriately selected and processed to meet the goals of the system, and is a subject that is updated while continuously reflecting user preferences along with the learning of model parameters. In this paper, the effects of properly designed and generated inputs on accuracy and usability are verified using the proposed systems. Furthermore, the proposed method and user–item interaction are compared with state-of-the-art systems using simple embedding data as the input, and a model suitable for a practical client–server environment is also proposed.


Introduction
Recently, various studies related to recommendation systems have been actively conducted. There are two main themes. The first is methods to solve the fundamental problems of the recommendation system such as the first rater, the cold start problem, overspecialization, and protection of user privacy [1][2][3], and the second is the improvement of the accuracy of the recommendation system [4][5][6][7][8][9].
Currently, various methods combining a DNN (Deep Neural Network) with collaborative filtering, content-based filtering, etc., including pure DNN-based methods, are being studied and advanced to improve accuracy [2][10][11][12]. Most recent methods infer preference trends from similar users, as in collaborative filtering, based on vectors embedding user-item interaction. In this case, the model learns the user-item preference distribution. However, since the data describing the item are limited to the embedding vector dimension, there is a tendency to generalize the user's complex preferences. On the other hand, methods using content-based filtering are also being actively studied. These studies mainly address the cold-start problem or the first-rater problem and are introduced as solutions with improved accuracy compared to pure content-based filtering.
The method we propose is an approach using a DNN, which uses the Drafted-Input data with potential meanings to omit the inaccurate and generalized semantic transformation process from the model and induces learning only the necessary meanings. Certain parts of the drafting process of Drafted-Input clearly fall under feature engineering or pre-processing. However, it means more than preprocessing, in that it is continuously updated by learning and each role exists within the model network. Drafted-Input is data processed by analyzing/selecting related meta data according to the purpose pursued by

• It is the subject of training and is input data that are continuously updated through learning once they are created ≈ preliminary and updatable object.
• These are the input data selected to describe the target in terms of the goal of the model ≈ select for a certain purpose.
In this paper, we propose a movie recommendation system that focuses on the latter, improving accuracy, and also providing an appropriate configuration to apply it in a practical client-server environment. The reason why the scope is limited to movies is that the proposed method has the characteristic of operating only in one domain, and the data are relatively open compared to multiple domains. In this paper, movie data of IMDB and rating data of MovieLens-20m were used [13]. We propose two types of DNN-based models that learn through the user's implicit feedback and provide personalized recommendations; the first is a model that is as rough as possible so that it is easy to understand and modify concepts and can be applied in a centralized environment. Second, the Auto-Encoder concept is applied to the first model in consideration of the practical environment so that the server and the client divide and share the operation. It is a model that reduces the traffic for inference as much as possible.
Our Contributions. The major contributions are summarized as follows:
• Verify the effect of Drafted-Input of movie/user on training and recommendation accuracy;
• Propose an inference resource distribution method based on Auto-Encoder considering the client-server environment;
• Propose a method to personalize by paying attention to specific preference features of items in the network using the user's weights extracted through a specific method.

DNN-Based Recommendation System
So far, various approaches using a DNN have been proposed in the field of recommendation systems. A well-known example is the recommendation system of the YouTube platform [11]. Due to the vast amount of content on YouTube, a lot of calculations are required for recommendations. Therefore, YouTube applies a candidate model that filters candidates out of a huge amount of content and a ranking model that infers the ranking among the filtered content. The ranking model, which is similar to the candidate model, infers the user's action over the hundreds of videos passed on as impressions by the candidate model. YouTube uses implicit feedback based on whether the user watched or not, instead of explicit feedback such as 'like' and 'dislike', which are functions provided by the platform. This is because, in general, explicit feedback is very sparse, and more user history can be utilized using implicit feedback. Furthermore, YouTube addresses the cold-start problem to some extent by including the user's geographic or demographic information in the input features.
Another approach that is rapidly emerging is a method using a Variational Auto Encoder (VAE) [4][6][7][8]. Unlike the commonly used Auto Encoder, one purpose of which is dimensionality reduction of the input, a VAE aims to generate new data using the learned distribution. To apply this to the recommendation system, these studies suggest improvements such as applying the reparameterization trick or modifying the loss function of the existing VAE to effectively tune the parameters. By applying the multinomial distribution, which had not been widely used in recommendation systems, a VAE can address problems of linear latent factor models (e.g., Matrix Factorization), such as overfitting caused by sparse data or slowdown due to model size. However, as mentioned above, if a single shared model is used, distribution learning using embedding vectors tends to generalize user preferences. In addition, a vector that maps interaction to 0 and 1 for the distribution calculation cannot reflect negative experiences in the user's preference. Due to these characteristics, the recommendations are insensitive to the user's dislikes.

TF-IDF
Term Frequency-Inverse Document Frequency (TF-IDF) is a text vectorization technique that weighs the importance of each word in a document through its word frequency and inverse document frequency. When the document is d and the word is t, the TF-IDF weight is the product of tf(t, d) and idf(t). Here, tf(t, d) is the frequency of the term in the document, and idf(t) is the inverse document frequency, which becomes smaller as the term appears in more documents. This property prevents frequently occurring words that carry little meaning, such as "this" and "the", from affecting the vector expressing the meaning of the document.
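As a minimal sketch of the weighting described above, the following Python code computes TF-IDF for tokenized documents. It assumes the unsmoothed variant idf(t) = log(N / df(t)), which is only one of several common definitions and is not necessarily the one used in this paper; the function name `tf_idf` and the toy plots are illustrative.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    # Assumed variant: idf(t) = log(N / df(t)); libraries often add smoothing.
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts within this document
        vectors.append({term: tf[term] * idf[term] for term in tf})
    return vectors


# Toy plots (hypothetical data).
docs = [
    ["the", "space", "crew", "finds", "the", "signal"],
    ["the", "detective", "finds", "the", "killer"],
    ["the", "crew", "robs", "the", "bank"],
]
weights = tf_idf(docs)
```

With this variant, a word such as "the" that occurs in every document receives weight zero, while a word unique to one plot receives the largest idf.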

Truncated-SVD
Singular Value Decomposition (SVD) means decomposing a given m × n matrix A as follows:
$$A = U \Sigma V^{T}$$
In the above equation, U is an m × m orthogonal matrix obtained by the eigendecomposition of $AA^{T}$, and its columns are called the left singular vectors of A. V is an n × n orthogonal matrix obtained by the eigendecomposition of $A^{T}A$, and its columns are the right singular vectors. Finally, Σ is an m × n rectangular diagonal matrix whose diagonal elements are the square roots of the eigenvalues of $AA^{T}$ and $A^{T}A$. This method is called Full-SVD. There are also the Compact-SVD, in which the portion corresponding to singular values of zero is removed, and the Thin-SVD, in which only the rows of Σ outside the diagonal block are removed. However, these are difficult to use in a recommendation system environment with sparse data.
Another method is the Truncated-SVD (t-SVD) method, which extracts and uses only the top t singular values of the SVD. This is aimed at approximation rather than restoration: by setting t at a level that does not negatively affect the approximation, a reduced $U \Sigma V^{T}$ is obtained, and a vector compressed to t dimensions can be obtained through it. Because of these characteristics, it is better suited to sparse data. Both SVD and t-SVD are frequently used as matrix factorization techniques for collaborative filtering, and studies using them are still being actively conducted [14][15][16][17][18].
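In symbols, the truncation can be written as below. The subscripted names $U_t$, $\Sigma_t$, and $V_t$ are notation introduced here for illustration and do not appear in the original text.

```latex
A \approx A_t = U_t \Sigma_t V_t^{T},
\qquad U_t \in \mathbb{R}^{m \times t},\quad
\Sigma_t \in \mathbb{R}^{t \times t},\quad
V_t \in \mathbb{R}^{n \times t}
```

Here $U_t$ and $V_t$ keep only the columns belonging to the t largest singular values. Under this notation, row i of $U_t \Sigma_t$ can serve as the t-dimensional compressed vector for row i of A.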

Auto Encoder
Unlike the VAE described above, an Auto Encoder is generally used for dimensionality reduction. The model consists of an encoder that compresses input features and a decoder that approximates the original by expanding the compressed code. Whereas SVD is a linear dimensionality reduction algorithm, the Auto Encoder is non-linear, so it can operate smoothly even for complex data compression. Methods using the data compression and approximation characteristics of an Auto Encoder in the recommendation system have been actively proposed [19,20]. These methods aim to create a new latent vector for a user or item, and it was confirmed that they generate a vector of better quality than the traditional generation methods. However, their overall performance was lower than that of recent state-of-the-art systems using a VAE.

Our Approach
In this chapter, our proposed DNN-based approaches are described in detail. As stated in the introduction, we propose two models: the MovieDIRec, which is rough and easily configured, and MovieDIRec+, which can be applied to a practical environment by applying an Auto Encoder to MovieDIRec. Furthermore, the Drafted-Input creation method and outline proposed in this paper are described in detail.

Drafted-Input
Drafted-Input is created through a series of preprocessing steps that combine the user's rating vector with the movie's metadata obtained through IMDB. Drafted-Input is data that describe the object in more detail than a simple embedding vector and is composed of data closely related to user preferences. We used director, actor, genre, release date, votes, plot, etc., from the movie information provided by IMDB, and the Drafted-Input data generated from it consisted of the following:
• $M$: Data that describe the movie, consisting of Director, Actor, Genre, Story, and Popularity.
• $U_W$: User's preferred weights corresponding to each feature of $M$ except for popularity.
• $U_P$: User's preferred weight for popularity.
• $U_F$: User features that express user preferences through the t-SVD method.
The draft process to create Drafted-Input is carried out before learning and when new movies and users are introduced. In this paper, TF-IDF and t-SVD were used during the draft process. According to [6], t-SVD, a type of matrix factorization method, has the following disadvantages: (1) Speed decrease due to model size. (2) Not applicable to new user/item. (3) A large amount of user feedback is concentrated on well-known content, so overfitting is easy.
However, in the proposed system, the draft process, train, and runtime processes were separated, so the speed of the t-SVD model did not affect the runtime. In addition, it was possible to apply a new user through the draft process that could be connected to provisioning. Lastly, even if it was a new item, the Actor and Director features created by the t-SVD were shared and learned between movies, so they could be directly applied to new movies.

Movie Features
Movie Features $M$ describe a movie and carry information related to preferences. Genre and Story are fixed features, whereas Director, Actor, and Popularity can be interpreted as continuously changing features, which are defined as features that need to be updated by user feedback. We converted the rating data into rating data corresponding to each Actor/Director, applied appropriately set differential weights, and compressed them through the t-SVD to transform them into matrices with latent meaning. Popularity $P$ consisted of a single scalar value, for which a normalized rank according to the number of ratings per release date was used.
It should be noted that, in the case of P, it required a different treatment from other features of M. Popularity is a scalar value indicating the popularity of the movie, and it was judged that separate treatment was necessary because it was the only verified value directly related to the quality of a movie among features and an important value that affects the overall preference. Therefore, the user's preferred weight corresponding to popularity was also treated independently.
In the case of Genre, vectorization was performed through TF-IDF. The reason that simple count vectorization was not applied is that count vectors contain very sparse information and the genre distribution tends to be very different.
In the case of the story, the following pre-processing was performed in advance and, then, vectorization was performed:

1. Remove numbers and symbols that are not related to the story;
2. Remove person names and attach NER (Named Entity Recognition) tags using the pre-trained BERT model [21];
3. Remove movies with five or fewer descriptive words;
4. Keep only words included in the movie plot, and conduct TF-IDF only for words that appear three or more times across all movies.
While genre and story features were mapped 1:1 with a movie, the created actor and director features were shared to all movies, and they were mapped as N:N, because a movie often includes multiple actors/directors. Therefore, in Drafted-Input, the actor and director features were replaced with two features weighted and averaged according to the written order. When updating, the gradient for the weighted averaged feature was obtained, so the update was performed by applying it to the target features at once. Figure 1 is a visualization of the process of Drafted-Input for the movie being input to the model.
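The order-weighted averaging and the shared update described above can be sketched as follows. The source does not give the weights or the exact update rule, so the reciprocal-rank weights in `order_weights`, the learning rate, and the helper names are all hypothetical. By default the gradient of the averaged feature is applied to every contributing feature as it is; the optional flag scales it by each weight, which is what the chain rule would give.

```python
def order_weights(n):
    """Hypothetical weights that decay with credit order and sum to 1."""
    raw = [1.0 / (rank + 1) for rank in range(n)]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_average(features, weights):
    """Collapse several actor (or director) vectors into one movie-level vector."""
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]


def shared_update(features, weights, grad, lr, scale_by_weight=False):
    """One SGD step applied to every feature that contributed to the average."""
    updated = []
    for w, f in zip(weights, features):
        factor = w if scale_by_weight else 1.0  # 1.0 = apply the gradient as it is
        updated.append([x - lr * factor * g for x, g in zip(f, grad)])
    return updated


# Three actors of one movie, listed in credit order (toy 2-d features).
actors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w = order_weights(len(actors))
movie_actor_feature = weighted_average(actors, w)
grad = [0.5, -0.5]  # gradient with respect to the averaged feature
updated = shared_update(actors, w, grad, lr=0.1)
```

Because the actor and director vectors are shared between movies, an update made through one movie changes the input of every other movie featuring the same person.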

User Features
User features consist of $U_W$, $U_P$, and $U_F$. This is information expressing user preference. $U_W$ is a preference weight array corresponding to (actors, directors, genre, story), respectively, and $U_P$ is a single weight corresponding to popularity. Lastly, $U_F$ is a feature that directly expresses the user rather than the previous features and was obtained by applying t-SVD to the User × Movie rating matrix. Since these features all contain changing preferences, they must be continuously updated not only during training, but also at runtime.
$U_W$ should express how much each target movie feature affects the user's preference. We tested the following methods to obtain the user's weight. First, we tried to obtain the user's preference bias by using the characteristic of the L2 norm, in which the result grows as outliers become larger. In this case, a sparse count vector set $E$, not Movie Features $M$, was used for the weight calculation. Each vector in $E$ had a size determined by the number of distinct values of its feature: 151,859 actors, 48,459 directors, 28 genres, and 12,624 story words. For user $u$, there exist $R_u$, the rating set for the specific movies $I$ rated by the user, and $E_{w,I}$, the count vectors of those movies for each of the four types of metadata $W$. The preference weight $U_w$ was generated from $R_u$ and $E_{w,I}$ with the L2 norm.
However, this formula tended to significantly lower the values of other features when the user had a large bias towards a particular feature. This was due to the characteristic of the L2 norm, whose value is dominated by large outliers because each element is squared. We therefore tried adjusting the scale of the value with a second formula so that no single feature had an excessive influence on the result.
With the adjusted formula, the difference caused by the scale of the values dropped significantly, but for features such as actors and stories, the many embedded items with small values entered the average and tended to lower the result. Therefore, we chose the coefficient of variation as a method to detect the bias caused by outliers more appropriately without being affected by the scale between features. Since the coefficient of variation uses the standard deviation of the target vector, the scale became more uniform than with the methods using the square. In addition to obtaining the variance within the vector, it divides by the arithmetic mean, so the bias toward outliers is obtained in a flatter form.
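The scale argument can be illustrated with a small sketch. It compares the L2 norm and the coefficient of variation (standard deviation divided by arithmetic mean) on a toy preference vector. The vectors are hypothetical, the population standard deviation is an assumption, and the sketch does not reproduce the exact vector to which the paper applies the measure.

```python
import math
import statistics


def l2_norm(values):
    """Euclidean norm; grows with the overall scale of the vector."""
    return math.sqrt(sum(v * v for v in values))


def coefficient_of_variation(values):
    """Population standard deviation divided by the arithmetic mean."""
    mean = statistics.fmean(values)
    if mean == 0:
        return 0.0
    return statistics.pstdev(values) / mean


biased = [5.0, 1.0, 1.0, 1.0]            # one strongly preferred item
biased_scaled = [v * 10 for v in biased]  # same shape, ten times the scale
flat = [2.0, 2.0, 2.0, 2.0]               # no preference bias at all

cv_biased = coefficient_of_variation(biased)
cv_scaled = coefficient_of_variation(biased_scaled)
cv_flat = coefficient_of_variation(flat)
```

In this toy case the L2 norm grows tenfold when the vector is rescaled, while the coefficient of variation stays the same and is zero for the flat vector, which is the scale independence the text relies on.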
In the case of $U_P$, it was created as an average of differentially weighted popularity, in the same way as the actor and director features. This method was judged suitable for explaining a user's rating tendency according to popularity.
Let $U_{P,u}$ be the $U_P$ of a specific user, $M_p$ the movie's popularity, $I$ the rated item set, and $R$ the weighted rating set. $U_{P,u}$ was then obtained as the average of $M_p$ over $I$ weighted by $R$. $U_P$ represents how much the user depends on the popularity of a movie, and by using it, the degree of attenuation and amplification according to popularity can be adjusted. That is, since users with a high $U_P$ tend to give high scores to movies with high popularity, the influence on preference should be amplified. On the other hand, a person with a low $U_P$ has a negative view of most movies, so the influence on preference should be attenuated. Considering this, given $P_i$, the popularity of a specific movie $i$, and $U_{P,u}$, the $U_P$ of a specific user $u$, the above amplification and attenuation effect could be achieved simply by multiplying the previous layer's output by $P_i \times U_{P,u}$.
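A minimal sketch of this idea follows. The exact formula is not recoverable from the text, so `popularity_weight` assumes a plain mean of rating weight times popularity over the rated items; the dictionaries and both function names are hypothetical.

```python
def popularity_weight(rating_weights, popularity):
    """Assumed form: mean over rated items of (rating weight * popularity)."""
    if not rating_weights:
        return 0.0
    total = sum(w * popularity[item] for item, w in rating_weights.items())
    return total / len(rating_weights)


def apply_popularity(layer_output, p_i, u_p):
    """Amplify or attenuate the previous layer's output by P_i * U_{P,u}."""
    return [x * p_i * u_p for x in layer_output]


# Toy data: rating weights in [0, 1] and normalized popularity per movie.
rating_weights = {"movie_a": 1.0, "movie_b": 0.5}
popularity = {"movie_a": 0.9, "movie_b": 0.2}

u_p = popularity_weight(rating_weights, popularity)
scaled_output = apply_popularity([2.0, -1.0], p_i=0.8, u_p=u_p)
```

A user who rates popular movies highly ends up with a larger `u_p`, so the output for popular movies is amplified; a low `u_p` attenuates it.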
Because of these characteristics of $U_{P,u}$, it is important to guide the user to input preferred movies at a higher rate than unfavorable movies in the provisioning stage, where the user's explicit feedback is provided.

Proposed Methods: MovieDIRec and MovieDIRec+
The first model proposed is shown in the left of Figure 2. The output layer of the model was a Softmax classifier that predicted implicit feedback. Implicit feedback was implemented by converting four or more points into positive in the rating data and binarizing them. Therefore, in the final layer of the model, Softmax performed binary classification to predict the user's positive interaction. The model consisted of fully connected layers that simply compressed or expanded the input, and multiply and concatenation layers that connected U and M.
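The label construction and the two-class output can be sketched as follows. The threshold of four points comes from the text, while the function names, the toy ratings, and the logits are illustrative assumptions.

```python
import math


def to_implicit(rating, threshold=4.0):
    """Binarize an explicit rating: four or more points counts as positive."""
    return 1 if rating >= threshold else 0


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


labels = [to_implicit(r) for r in [5.0, 4.0, 3.5, 1.0]]

# Two output units: index 0 = negative, index 1 = positive interaction.
probs = softmax([0.2, 1.3])
positive_probability = probs[1]
```

The positive-class probability is the value that is later sorted to rank movies for a user.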
Drafted-Input, which refers to the preferences of actors, directors, and users, must change over time. Therefore, Drafted-Inputs except for $M_G$, $M_S$, and $M_P$ were included in the learning target together with the model parameters during training. As shown in Figure 1, $M_A$ and $M_D$ were features combined into one dimension by weighting all actors/directors included in the target movie. Therefore, when updating, all actor and director features included in the target movie were updated using the gradient of $M_A$ and $M_D$ as it was.
We constructed the model network so that the higher the values of $U_P$ and $U_W$, the more attention was paid to the $M$ features corresponding to $U_P$ and $U_W$. The features that received attention according to the user's weight were finally combined with the user's latent meaning $U_F$ to interpret the preference, and, at this point, $U_F$ played the role of expanding the relevant part so that the model could focus on the user's interest.
MovieDIRec+ in the right of Figure 2 was the second model proposed in this paper. Additionally, it is a model configured so that the client and server can share the recommendation load in a practical environment. As shown in Figure 3, by placing an Auto-Encoder in the MovieDIRec model, the encoding part of Movie features could be configured as the Server-side and the decoding part as the Client-side.
This configuration had the following advantages in a client-server environment. First, the coded $M_o$ could be used as a compressed feature for a movie, and the processing speed was improved as the input was reduced and the encoder layers were omitted. If $U_F$ existed in the client environment, the client could proceed with inference using only $M_o$, which is a coded feature, and $M_P$, which is a scalar value. Second, learning can be distributed by a separate client-server model. This is split learning: the client model achieves the same effect as performing batch learning based on the user simply by passing the gradient for $M_o$ to the server side. Finally, by using compressed information, a candidate group could be derived by methods such as nearest neighbor search, which can dramatically improve speed [11].
However, according to [22][23][24], matrix factorization methods do not satisfy the triangle inequality because they rely on the dot product. Since a matrix composed by t-SVD was included among the components, it was necessary to check that similarity comparison between the transformed vectors was valid. There was one interesting point here. Figure 4 is a graph of the experimental results: at each epoch, 50 random $M_o$ were sampled, the triangle inequality was checked for all their combinations, and the cases in which it was not satisfied were visualized. As shown in the graph, the $M_o$ feature was updated in a direction that satisfied the triangle inequality through the nonlinear transformation. This result is all the more meaningful because the test used a model with fixed t-SVD features, that is, without input training.
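A check of this kind can be sketched as below. The source does not state which dissimilarity was used, so the cosine dissimilarity (1 minus cosine similarity) is an assumption chosen because, like other dot-product-based measures, it can violate the triangle inequality; the function names and toy codes are hypothetical.

```python
import math
from itertools import combinations


def cosine_dissimilarity(a, b):
    """1 - cosine similarity; a dot-product-based measure, not a true metric."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def count_violations(vectors, dist, tol=1e-12):
    """Count triples in which one side exceeds the sum of the other two."""
    violations = 0
    for a, b, c in combinations(vectors, 3):
        sides = (dist(a, b), dist(b, c), dist(a, c))
        perimeter = sum(sides)
        if any(side > perimeter - side + tol for side in sides):
            violations += 1
    return violations


# Toy codes standing in for sampled movie codes.
codes = [(1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
cosine_violations = count_violations(codes, cosine_dissimilarity)
euclidean_violations = count_violations(codes, math.dist)
```

On these three vectors the cosine dissimilarity produces one violating triple while the Euclidean distance produces none. Counting such triples per epoch over sampled codes gives a curve of the kind the text attributes to Figure 4.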

Experiments
In this chapter, we compared and verified the learning results of MovieDIRec and MovieDIRec+ models with various state-of-the-art recommendation systems.

Metric
As metrics for performance evaluation, we used Recall@k and NDCG@k, which are ranking-based metrics commonly used in most approaches, in order to compare with various state-of-the-art recommendation systems [4][5][6][7][8][22][25]. Before verification, the probabilities predicted for the implicit positive class by the Softmax output were sorted and ranked.
Recall@k was used for verification with the following notation: $r$ denotes the set of items predicted as positive implicit feedback, that is, the items whose sorted Softmax output is greater than 0.5. $R$ is the ground truth, which is the set of the user's implicit feedback. Finally, $I[\cdot]$ denotes the indicator function.
Appl. Sci. 2021, 11, 10412
NDCG (Normalized Discounted Cumulative Gain) is a metric mainly used in ranking-based systems. Since a metric of the same scale was required for comparison, NDCG was normalized by dividing DCG by IDCG, which is the DCG value of the ideal result. For the binary-class implicit feedback used in the proposed method, DCG@k sums a logarithmically discounted gain over the top k ranked items.
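Both metrics can be sketched for binary implicit feedback as follows. The equations are not recoverable from the text, so this sketch assumes a form widely used in VAE-based recommendation work: recall normalized by min(k, |R|) and a gain of 1 / log2(rank + 1) for each hit. It ranks items directly and does not apply the 0.5 threshold mentioned above; all function names and toy data are illustrative.

```python
import math


def recall_at_k(ranked_items, relevant, k):
    """Hits in the top k divided by min(k, number of relevant items)."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked_items[:k] if item in relevant)
    return hits / min(k, len(relevant))


def dcg_at_k(ranked_items, relevant, k):
    """Binary-gain DCG: each hit at 0-based position p adds 1 / log2(p + 2)."""
    return sum(
        1.0 / math.log2(pos + 2)
        for pos, item in enumerate(ranked_items[:k])
        if item in relevant
    )


def ndcg_at_k(ranked_items, relevant, k):
    """DCG divided by the DCG of an ideal ranking (all hits placed first)."""
    ideal_hits = min(k, len(relevant))
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg_at_k(ranked_items, relevant, k) / idcg if idcg > 0 else 0.0


# Items sorted by predicted positive probability, highest first (toy data).
ranked = ["a", "b", "c", "d"]
relevant = {"a", "c"}

recall_2 = recall_at_k(ranked, relevant, 2)
ndcg_2 = ndcg_at_k(ranked, relevant, 2)
```

Per-user values computed this way would then be averaged over all users.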

Setup
The output of the model was composed of a Softmax output layer as a binary probability predicting implicit feedback. Binary cross entropy was used as the loss function. The parameters of the model were updated by Adam and, in the case of input training, the updates were applied by SGD. The learning rate of both was set to 0.0001, Adam's epsilon was set to $1 \times 10^{-8}$, and the betas were set to (0.9, 0.999).
As comparison methods, we selected systems with high accuracy for which Recall@k and NDCG@k results on the ML-20M dataset have been reported, so that accuracy could be compared between methods.

• VASP [4]: The authors proposed a model ensembled by element-wise multiplication of NEASE and FLVAE as a VAE-based Top-N recommendation system.
• RaCT [5]: An efficient and scalable learning-to-rank algorithm was proposed by borrowing the actor-critic idea of reinforcement learning to approximate the ranking metric.
• RecVAE [6]: The authors followed the Mult-VAE model, which uses a multinomial distribution instead of the Gaussian and Bernoulli distributions generally used as the likelihood function in a VAE, but improved the performance by modifying the Evidence Lower Bound (ELBO) formula and detailed architecture.
• CML [22]: By combining metric learning algorithms with collaborative filtering, the authors proposed a method that learns using user-user and user-item similarity, and achieves significant speedup and approximate nearest-neighbor search with a slight decrease in accuracy.

Dataset
In the case of the Rating Dataset, the preprocessing was performed as follows, considering the negative impact on learning: (1) Filtering to have at least 5 ratings per user among 20 million data of ML-20m. (2) Among the filtered data, the movie to be evaluated consisted of only movies included in the IMDB dataset. The data were finally filtered to 51,869 users, 4714 movies, and 6,429,862 ratings. (3) Finally, the distribution was as follows, so that there was no overlapping rating in each process.
• D@1: All rating data;
• D@2, 3: Data (D@2) randomly extracted at 10 ratings per user from D@1 to make Drafted-Input, and the remaining data (D@3);
• D@4, 5: Data (D@4) of 10,000 held-out users extracted from D@3, and the rest of the data (D@5);
• D@6, 7: Data (D@6) extracted at a rate of 10% from D@5 for training, and data (D@7) extracted at 20% of the remaining data for testing input training;
• D@8: Data extracted at a rate of 12% from D@4 for validation/test.
When 80% was used as train data, the density was about 2.6%, roughly 9 times higher than 0.28%, the average density of the train data used by the comparison methods. For an accurate comparison, the train data D@6 were therefore extracted at a rate of 10% to lower the density to 0.26%.
On the other hand, in the case of test data D@4, the Recall@k and NDCG@k metrics used for comparison produced different results depending on the target average number of ratings and the ratio of positives, so it was essential to set them to the same level as the comparison methods. Therefore, it was extracted at a rate of 12% to approximate 14, which was the average size of validation of the comparison methods.
The steps for verification and comparison were as follows: First, we created Drafted-Input using D@2. Afterward, the entire model was trained using Drafted-Input and D@6. Finally, validation was carried out with D@8; the input train model was tested with D@7 instead, because the effect of the learned Drafted-Input had to be utilized. Table 1 shows the data settings of the baselines and the proposed system. In the baselines, the density of the entire dataset was similar, so no separate processing was required, whereas the distribution for the proposed method changed greatly after filtering by IMDB movies. Therefore, we matched the target density and the number of interactions per user in the test through a fixed ratio.
Table 2 compares the metrics with the target baselines. For a more detailed comparison, we divided the two models into four models, with or without input training. Each metric was calculated using D@7 and D@8, respectively, and the models were trained for only 10 epochs using D@6 to avoid overfitting. The numbers in the metric columns were calculated for each user and averaged over all users. The proposed models performed differently depending on the environment of the model. For the models undergoing input training, the NDCG values increased, but the Recall values tended to decrease. We repeated training many times to interpret this phenomenon and came to the following conclusions: First, examining the output of the models subjected to Drafted-Input learning showed that positive predictions, including both false positives and true positives, were reduced by about 10%. In addition, NDCG is a metric that is more sensitive to ranking than Recall, and Recall tends to score high when there are many positive predictions. Therefore, a higher NDCG under a distribution with fewer positive predictions was interpreted as an improvement in recommendation accuracy.
Table 3 reports the metrics in a rich rating data environment, without matching the density to the baselines. For this, we separately created an 80% train set and a 20% validation set and trained the models for 30 epochs. The more data were available, the further the learning progressed. In addition, in the environment of Table 3, as in Table 2, the models tended to become more personalized when input training was carried out.
Figure 5a is a graph of the candidate group test result of MovieDIRec+, which was obtained as follows: First, a compressed M_o was generated for all movies using a pre-trained server model, and 100 clusters and centroid vectors representing each cluster were generated through a Gaussian mixture. Afterwards, preference inference was performed on all centroid vectors for the 10,000 Held-out users who were not used in training, and the results were sorted for each user. Finally, inference on the movies in each cluster was performed for each ranked cluster.
Figure 5b: Time consumed to recommend a candidate group for a specific user as k changes, measured using MovieDIRec+. Candidate@k means that the top k among the candidates for the user were used for recommendation. The x-axis is k, and the y-axis is time (seconds).
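The candidate group procedure described for Figure 5a can be sketched as a two-stage ranking. This is a simplified illustration, not the MovieDIRec+ implementation: the clusters and centroids are assumed to be precomputed (the paper obtains them with a Gaussian mixture), and `score`, here a plain dot product, is a hypothetical stand-in for the trained model's preference inference. All names and vectors are invented for the example.

```python
def dot(u, v):
    """Stand-in preference score (assumption; the paper uses a trained model)."""
    return sum(a * b for a, b in zip(u, v))


def recommend_with_candidates(user_vec, centroids, clusters, movie_vecs,
                              k, top_n, score=dot):
    """Rank cluster centroids first, then score only movies in the top-k clusters."""
    # Stage 1: infer the user's preference for every centroid, keep the best k.
    ranked_clusters = sorted(centroids,
                             key=lambda c: score(user_vec, centroids[c]),
                             reverse=True)[:k]
    # Stage 2: run inference only on the movies inside the selected clusters.
    candidates = [m for c in ranked_clusters for m in clusters[c]]
    ranked_movies = sorted(candidates,
                           key=lambda m: score(user_vec, movie_vecs[m]),
                           reverse=True)
    return ranked_clusters, ranked_movies[:top_n]


# Toy data: 3 clusters and 5 movies in a 2-dimensional feature space.
centroids = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [-1.0, 0.0]}
clusters = {0: ["m1", "m2"], 1: ["m3"], 2: ["m4", "m5"]}
movie_vecs = {"m1": [0.9, 0.1], "m2": [1.2, -0.1], "m3": [0.1, 1.0],
              "m4": [-1.0, 0.2], "m5": [-0.8, -0.1]}

top_clusters, top_movies = recommend_with_candidates(
    [1.0, 0.5], centroids, clusters, movie_vecs, k=2, top_n=2)
print(top_clusters, top_movies)
```

With Candidate@k, the second stage only touches the movies in k of the clusters, which is where the reduction in inference targets comes from.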

Results and Analysis
As a result of the test, a large proportion of the positive interactions were gathered in the top-ranked clusters. This candidate group recommendation had a great advantage in terms of speed. Table 4 summarizes the speed comparison of the entire model and the client's model for a specific user, with and without the candidate group recommendation. All tests were run on the same server for an accurate comparison of speed, and the targets were 91,514 movies, including unrated movies. The candidate group recommendation was run with Candidate@10. According to the table, the candidate process and the split model approach were very efficient in terms of load sharing. Limiting the candidate group to 10 reduced the number of inference targets by about 10 times, yet inference became more than 20 times faster than the comparative model. The gain beyond what the reduced number of targets explains was judged to come from avoiding the load of handling the huge input data during the entire inference. Figure 5b visualizes this by measuring the time consumed for inference while increasing the k of Candidate@k from 1 to 100. For large k, the measured time became longer than 2.9961 s, the value in the second column of Table 4, which was the time taken by the client model that did not use the candidate process. This was caused by the operation of indexing the movies in each cluster, together with the load of handling the huge input data of the total inference.

Conclusions
In this paper, we proposed MovieDIRec and MovieDIRec+, which use the Drafted-Input defined in this paper. By its configuration, the model prevented overfitting towards identity and enabled personalized recommendations according to user characteristics. The proposed systems were compared with state-of-the-art methods, both with and without input training, and were confirmed to score highly on the Recall@k and NDCG@k metrics.
The proposed system operates specifically for the movie domain. However, we judge that if the characteristics of Drafted-Input are well generated and trained, comparable performance can be achieved in other domains. In particular, in terms of distribution using a split model, our approach can dramatically reduce the load on the server, and even if on-device learning is not performed, it can recommend at very high speed in fields that require real-time recommendation.
We plan the following future work for the system proposed in this paper. Combined with meta learning, the system will be improved to update the client-side model individually in a federated environment. We expect this to provide an advantage in terms of personalization and user satisfaction by adapting through few-shot explicit feedback. In addition, we plan to add a configuration that allows the training of the model to largely reflect the user's recent preferences. Similar to YouTube's mechanism, changing preferences can be tracked if the user's recent interactions contribute significantly to updating the user features.