Article

Matching Users’ Preference under Target Revenue Constraints in Data Recommendation Systems

1 Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
2 School of Electronic and Information Engineering, Nanjing University of Information Science and Technology, Nanjing 210044, China
* Authors to whom correspondence should be addressed.
Entropy 2019, 21(2), 205; https://doi.org/10.3390/e21020205
Submission received: 30 January 2019 / Revised: 18 February 2019 / Accepted: 18 February 2019 / Published: 21 February 2019
(This article belongs to the Special Issue Entropy and Information in Networks, from Societies to Cities)

Abstract:
This paper focuses on the problem of finding a data recommendation strategy based on user preference and an expected system revenue. To this end, we formulate the problem as an optimization: the recommendation mechanism is designed to be as close to the user behavior as possible subject to a revenue constraint. In fact, the optimal recommendation distribution is the one that is closest to the utility distribution in the sense of relative entropy while satisfying the expected revenue. We show that the optimal recommendation distribution follows the same form as the message importance measure (MIM) if the target revenue is reasonable, i.e., neither too small nor too large. Therefore, the optimal recommendation distribution can be regarded as the normalized MIM, where the parameter, called the importance coefficient, represents the concern of the system and switches the attention of the system over data sets with different occurring probabilities. By adjusting the importance coefficient, our MIM-based framework of data recommendation can then be applied to systems with various requirements and data distributions. The obtained results thus illustrate the physical meaning of MIM from the data recommendation perspective and validate the rationality of MIM in one aspect.


1. Introduction

Intelligent data recommendation has been one of the most fundamental techniques in the wireless mobile Internet, and it has become increasingly crucial in the era of big data. With the explosive growth of data, it is difficult to deliver all the data to users within a tolerable time using traditional data processing technology [1,2]. By sending data in advance when the system is idle, rather than waiting for an explicit request from users, the delay for users to acquire their desired data can be reduced significantly [3]. It is also arduous for users to find desired data among the mass of data available on the Internet [4]. In general, users can access the data they are interested in faster and more easily with the help of data recommendation systems, since these systems are usually well designed based on the preferences of users [5,6]. Compared to search engines, recommendation systems are more convenient since they demand less action, skill, and knowledge from the user [4]. For example, many mobile phone applications recommend data based on the user's interests to improve user experience. In addition, Internet enterprises can make a profit through mobile networks by using push-based advertisement technology [7].
Most previous works mainly discussed data push based on content to solve the problem of data delivery [3,4,5,6,7,8,9,10]. Data push is usually regarded as a strategy of data delivery in distributed information systems [3]. Architectures for mobile push systems were proposed in [4,8]. In addition, Ref. [9] put forward an effective wireless push system for high-speed data broadcasting, and push and pull techniques for time-varying data networks were discussed in [10]. On this basis, various content-based recommendation systems were proposed as information-filtering systems that push data to users based on knowledge about their preferences [11,12]. Ref. [11] discussed joint content recommendation, and privacy-enhancing technology in recommendation systems was investigated in [12]. In addition, Ref. [13] put forward a personalized social image recommendation method, and recommendation technology was also used to solve problems in multimedia big data [14]. Recently, user customization has become crucial. The personalized concern of users can usually be characterized by many data properties, such as data format and keywords [4,15,16]. Instead of discussing data delivery based on content, we shall discuss the distribution of the recommendation sequence based on users' preference when a certain revenue of the recommendation system is required. In particular, we choose the frequency with which data is used to describe the preference of users, since it is independent of the concrete content.
Furthermore, we take the revenue of the recommendation system into account. That is, a recommendation process consumes resources while generating benefits, and different recommendations may bring different results. For example, the cost of omitting data that should be pushed may be much smaller than that of erroneously pushing invalid data to a user. On the one hand, users rate the recommendation system higher if the desired data is correctly pushed. On the other hand, pushing information that users do not ask for, such as advertisements, may seriously degrade the user experience; nevertheless, pushing advertisements does bring higher revenue. To balance the loss of different types of push errors, we shall weigh them differently, similarly to cost-sensitive learning [17,18,19,20].
For different application scenarios, the system focuses on different events according to its needs. For example, small-probability events capture our attention in minority-subset detection [21,22], while high-probability events are preferred in support vector machines (SVMs) [23]. For applications in the wireless mobile Internet, in the stage of user expansion, recommending the desired data accurately to attract more users is more important. However, in the mature stage of an application, more advertisements are pushed to earn more revenue, although this may degrade the user experience to some extent. This paper will discuss both of these two cases.
We also present some new findings by employing the message importance [24,25,26,27,28], which characterizes the degree of concern for events. The message importance measure (MIM) was proposed to characterize the message importance of events described by discrete random variables in scenarios where people pay most attention to the small-probability ones, and it highlights the importance of minority subsets [29]. In fact, it is an extension of Shannon entropy [30,31] and Rényi entropy [32] from the perspective of Faddeev's postulates [32,33]. That is, the first three postulates are satisfied by all of them, and MIM weakens the fourth postulate on the foundation of Rényi entropy. Moreover, the logarithmic form and the polynomial form are adopted in Shannon entropy and Rényi entropy, respectively, while MIM uses the exponential form. Ref. [34] showed that MIM focuses on a specific event by choosing a corresponding importance coefficient. In fact, MIM has a wide range of applications in big data, such as compressed storage and communication [35] and mobile edge computing [36].
Note that a superior recommendation sequence should resemble one generated by the user himself, which means that the data recommendation mechanism agrees with user behavior. To this end, the probability distribution of the recommendation sequence and the utilization frequency of the user data should be as close as possible in the statistical sense. According to [37], this means that the relative entropy between the distribution of the recommendation sequence and that of the user data should be minimized. In this paper, we assume that the recommendation model pursues the best user experience under a certain revenue guarantee.
In this paper, we first find the recommendation distribution that maximizes the probability that the recommendation sequence matches the utilization frequency of the user data when a required expected revenue is given. Then, its main properties, such as monotonicity and geometric characteristics, are fully discussed. This optimal recommendation system can be regarded as an information-filtering system, and the importance coefficient determines which events the system prefers to recommend. The results also show that an excessively low target revenue cannot constrain the recommendation distribution, while an exorbitant one makes the recommendation system infeasible. The constraint on the recommendation distribution is active if the minimum average revenue is neither too small nor too large, and there is a trade-off between the recommendation accuracy and the expected revenue.
It is also noted that the form of this optimal recommendation distribution is the same as that of MIM when the minimum average revenue is neither too small nor too large. The optimal recommendation distribution is determined by the proportion of the recommendation value of the corresponding event in the total recommendation value, where the recommendation value is a special weight factor. The recommendation value can be seen as a measure of message importance, since it satisfies the postulates of message importance. Due to this common form, the optimal recommendation probability can be given by the normalized message importance measure when MIM is used to characterize the concern of the system. Furthermore, when the importance coefficient is positive, the small-probability events are given more attention; that is, the importance index of small-probability events is magnified and that of high-probability events is lessened. Therefore, we confirm the rationality of MIM from another perspective, characterizing its physical meaning through the data recommendation system rather than through information theory. In addition, we extend MIM to the general case, regardless of the probability of the events the system is interested in. Since the importance coefficient determines which event sets the system is interested in, we can switch among different application scenarios by means of it. That is, advertising systems are discussed if the importance coefficient is positive, while noncommercial systems are adopted if it is negative. Compared with previous works on MIM [29,34,35], most properties of the optimal recommendation distribution are the same, but a clear definition of the desired event set can be given in this paper. The relationship between the utility distribution and MIM was preliminarily discussed in [38].
The main contributions of this paper can be summarized as follows. (1) We put forward an optimal recommendation distribution that makes the recommendation mechanism agree with user behavior under a certain revenue guarantee, which can improve the design of recommendation strategies. (2) We show that this optimal recommendation distribution is the normalized message importance when we use MIM to characterize the concern of the system, which presents a new physical explanation of MIM from the data recommendation perspective. (3) We extend MIM to the general case, and we also discuss the selection of the importance coefficient as well as its relationship with the events the system focuses on.
The rest of this paper is organized as follows. The setup of optimal recommendation is introduced in Section 2, including the system model and the discussion of the constraints. In Section 3, we solve the problem of optimal recommendation in our system model and give complete solutions. Section 4 investigates the properties of this optimal recommendation distribution, and its geometric interpretation is also discussed there. Then, we discuss the relationship between this optimal recommendation distribution and MIM in Section 5; it is noted that the recommendation distribution can be seen as the normalized message importance in this case. Numerical results are shown and discussed to corroborate our theoretical results in Section 6. Section 7 concludes the paper. The main notations used in this paper are listed in Table 1.

2. System Model

We consider a recommendation system with N classes of data, as shown in Figure 1. In fact, data is often stored by category for the convenience of indexing. For example, a news website usually classifies news into categories such as politics, entertainment, business, science, and sports. At each time instant, the information source generates a piece of data, which belongs to a certain class with probability distribution Q. In general, the generated data sequence does not match the preference of the user. To optimize the information transmission process, therefore, a recommendation unit is used to determine whether the generated data should be pushed to the user, with some deliberately designed probability distribution P. On the one hand, the recommendation unit can predict the user's needs and push some data to the user before he actually starts the retrieval process. In doing so, the transmission delay can be largely reduced, especially when the data amount is large. On the other hand, the recommendation unit enables non-expert users to search for and access their desired data much more easily. Furthermore, we can profit more by pushing some advertisements to the user.

2.1. Data Model

We refer to the empirical probability mass function of the class indexes over the whole data set as the raw distribution and denote it as $Q = \{q_1, q_2, \ldots, q_N\}$. We refer to the probability mass function of users' preferences over the classes as the utility distribution and denote it as $U = \{u_1, u_2, \ldots, u_N\}$. That is, each piece of data belongs to class $i$ with probability $q_i$ and is preferred by the user with probability $u_i$. To fit the preference of the user under some target revenue constraint, the system makes random recommendations according to a recommendation distribution $P = \{p_1, p_2, \ldots, p_N\}$.
We assume that each piece of data belongs to one and only one of the $N$ sets. That is, $S_i \cap S_j = \emptyset$ for $i \neq j$, where $S_i$ is the set of data belonging to the $i$-th class. Thus, the whole data set is $S = S_1 \cup S_2 \cup \cdots \cup S_{N-1} \cup S_N$. The raw distribution can be expressed as $q_i = \Pr\{d \in S_i\} = \mathrm{card}(S_i)/\mathrm{card}(S)$. In addition, the utility distribution $U$ can be obtained by studying the data-using behavior of a specific group of users and is thus assumed to be known a priori in this paper.
For traditional data push, one usually expects to make $|u(t) - s(t)|$ smaller than a given value [10]. Different from such approaches, we do not consider this problem based on content. As an alternative, our goal is to find the optimal recommendation distribution $P$ so that the recommended data fits the preference of the user as much as possible. To be specific, each recommended sequence of data should resemble the desired data sequence of the user in the statistical sense. For a sequence of the user's favorite data, let $u^n$ be the corresponding class indexes. As $n$ goes to infinity, it is clear that $u^n \in T(U)$ with probability one, where $T(U)$ is the typical set under distribution $U$. That is, $\Pr\left\{\frac{1}{n}\log\Pr(u^n) + H(U) \to 0\right\} = 1$, where $\Pr(u^n)$ is the occurring probability of $u^n$ and $H(U)$ is the entropy of $U$ [37]. Since the class-index sequence $r^n$ of the recommended data is actually generated with distribution $P$, the probability that $r^n$ falls in the typical set $T(U)$ of distribution $U$ is $\Pr\{r^n \in T(U)\} \doteq 2^{-nD(P\|U)}$, where $D(P\|U)$ is the relative entropy between $P$ and $U$ [37]. It is clear that the optimal $P$ maximizes the probability $\Pr\{r^n \in T(U)\}$, which is equivalent to minimizing the relative entropy $D(P\|U)$.
In particular, our desired recommendation distribution P is not exactly the same as the utility distribution of the user because we would also like to intentionally push some advertisements to the users to increase our profit.
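The matching criterion above is straightforward to evaluate numerically. The following minimal Python sketch (the distributions are made-up examples, not from the paper) computes $D(P\|U)$ and the corresponding typical-set matching exponent:

```python
import numpy as np

def kl_divergence(p, u):
    """Relative entropy D(P||U) in bits; assumes u_i > 0 wherever p_i > 0."""
    p, u = np.asarray(p, float), np.asarray(u, float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / u[mask])))

# Hypothetical distributions: the closer P is to U, the larger the
# probability that a length-n recommendation sequence falls in T(U).
U = [0.05, 0.15, 0.30, 0.50]
P = [0.10, 0.20, 0.30, 0.40]
n = 100
d = kl_divergence(P, U)
print(f"D(P||U) = {d:.4f} bits, Pr{{r^n in T(U)}} ~ 2^(-nD) = {2**(-n * d):.3e}")
```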

2.2. Revenue Model

We assume that the user divides the whole data set into two parts, i.e., the desired data and the unwanted data (e.g., advertisements). At the recommendation unit, the data can also be classified into two types according to whether it is recommended to the user. Different push outcomes may lead to strikingly different results. For example, the cost of omitting data that should be pushed may be much smaller than that of erroneously pushing invalid data to a user. The user experience is enhanced if the data needed by the user is correctly pushed, whereas pushing unneeded content, such as advertisements, may seriously degrade the user experience but can still bring in advertising revenue for the content delivery enterprise. Using a revenue model similar to that in cost-sensitive learning [17,18,19,20], we evaluate the revenue of the recommendation system as follows:
  • The cost of making a recommendation is $C_p$;
  • The revenue of a recommendation when the pushed data is liked by the user is $R_p$;
  • The cost of a recommendation when the pushed data is not liked is $C_n$;
  • The revenue of a recommendation when the pushed data is not liked (but can serve as an advertisement) is $R_{ad}$;
  • The cost of failing to recommend a piece of the user's desired data is $C_m$.
Therefore, the revenue of recommending a piece of data belonging to class i can be summarized in Table 2.
Moreover, the corresponding matrix of occurring probability is given by Table 3.
In this paper, we assume for simplicity that $C_p$, $R_p$, $C_n$, $R_{ad}$, and $C_m$ are constants for a given recommendation system. The expected system revenue can then be expressed as
$$\underline{R}(P) = \sum_{i=1}^{N} \left[ (R_p - C_p)\, p_i u_i + (R_{ad} - C_p - C_n)\, p_i (1-u_i) - C_m (1-p_i) u_i \right]$$
$$= -(R_p + C_n + C_m - R_{ad}) \left( 1 - \sum_{i=1}^{N} p_i u_i \right) + R_p - C_p$$
$$= -(R_p + C_n + C_m - R_{ad}) \sum_{i=1}^{N} p_i (1-u_i) + R_p - C_p .$$
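As a sanity check on the algebra above, the following sketch (with made-up parameter values) computes the expected revenue term by term and verifies that it agrees with the compact form in the last line:

```python
import numpy as np

def expected_revenue(p, u, Cp, Rp, Cn, Rad, Cm):
    """Expected revenue, summed outcome by outcome as in Tables 2 and 3."""
    p, u = np.asarray(p, float), np.asarray(u, float)
    return float(np.sum((Rp - Cp) * p * u
                        + (Rad - Cp - Cn) * p * (1 - u)
                        - Cm * (1 - p) * u))

# Made-up numbers; p and u must each sum to one for the identity to hold.
Cp, Rp, Cn, Rad, Cm = 4.5, 2.0, 2.0, 11.0, 2.0
u = np.array([0.1, 0.2, 0.3, 0.4])
p = np.array([0.4, 0.3, 0.2, 0.1])
direct = expected_revenue(p, u, Cp, Rp, Cn, Rad, Cm)
compact = -(Rp + Cn + Cm - Rad) * np.sum(p * (1 - u)) + Rp - Cp
assert abs(direct - compact) < 1e-9
print(direct)
```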

2.3. Problem Formulation

In this paper, we consider the following three kinds of recommendation systems: (1) the advertising system, where recommending unwanted advertisements yields higher revenue; (2) the noncommercial system, where recommending the user's desired data brings higher revenue; (3) the neutral system, where the system revenue is independent of the recommendation probability $P$. For each given target revenue constraint $\underline{R}(P) \ge \beta$ and each kind of system, we shall optimize the recommendation distribution $P$ by minimizing $D(P\|U)$. In particular, the following auxiliary variable is used:
$$\alpha = \frac{\beta + C_p - R_p}{R_{ad} - R_p - C_n - C_m} .$$

2.3.1. Advertising Systems

In an advertising system, recommending a piece of the user's unwanted data (an advertisement) yields higher revenue. Since the revenue of recommending an advertisement is the main source of income in this case and exceeds the other revenues and costs, an advertising system satisfies the following condition:
$$C_1: \quad R_p + C_n + C_m - R_{ad} < 0 .$$
By combining Labels (4)–(6), it is clear that the constraint $\underline{R}(P) \ge \beta$ is equivalent to
$$\sum_{i=1}^{N} p_i (1-u_i) \ge \alpha .$$
For advertising systems, therefore, the feasible set of recommendation distributions $P$ can be expressed as
$$E_1 = \left\{ P: \sum_{i=1}^{N} p_i (1-u_i) \ge \alpha \;\middle|\; R_{ad} > R_p + C_n + C_m \right\} .$$
For a given target revenue $\beta$, we shall solve for the optimal recommendation distribution $P$ of advertising systems from
$$P_1: \quad \arg\min_{P \in E_1} D(P\|U)$$
$$\text{s.t.} \quad \sum_{i=1}^{N} p_i (1-u_i) \ge \alpha ,$$
$$\qquad \sum_{i=1}^{N} p_i = 1 .$$

2.3.2. Noncommercial Systems

A noncommercial system is defined as a recommendation system where the revenue $R_p - C_p$ of recommending a piece of desired data is larger than the sum of the revenue $R_{ad} - C_p - C_n$ of recommending an advertisement and the cost $C_m$ of not recommending a piece of desired data. That is,
$$C_2: \quad R_p + C_n + C_m - R_{ad} > 0 .$$
Accordingly, the constraint $\underline{R}(P) \ge \beta$ is equivalent to
$$\sum_{i=1}^{N} p_i (1-u_i) \le \alpha .$$
Therefore, the feasible set of recommendation distributions $P$ for noncommercial systems can be expressed as
$$E_2 = \left\{ P: \sum_{i=1}^{N} p_i (1-u_i) \le \alpha \;\middle|\; R_{ad} < R_p + C_n + C_m \right\} .$$
Afterwards, we can solve for the optimal recommendation distribution $P$ through the following optimization problem:
$$P_2: \quad \arg\min_{P \in E_2} D(P\|U)$$
$$\text{s.t.} \quad \sum_{i=1}^{N} p_i (1-u_i) \le \alpha ,$$
$$\qquad \sum_{i=1}^{N} p_i = 1 .$$

2.3.3. Neutral Systems

For the case $R_p + C_n + C_m - R_{ad} = 0$, the corresponding expected system revenue degrades to
$$\underline{R}(P) = R_p - C_p$$
and is independent of the recommendation distribution $P$. As long as the target revenue satisfies $\beta < R_p - C_p$, the constraint $\underline{R}(P) \ge \beta$ can be met by any recommendation distribution. Therefore, the recommendation distribution can be chosen as $P = U$.

2.3.4. Discussion of Systems

We note that in the scenario of the wireless mobile Internet, each new application needs to attract more users (i.e., grab larger market share) with excellent user experience in its early stage. After the market share and the user groups being stable, the application can earn money by pushing some advertisements at the cost of some degradations in user experience.
Noncommercial systems usually appear in the stage of user expansion. In order to enlarge market share, the main tasks of this stage are to attract more users and to win their trust in the recommendation system through excellent user experience. Therefore, the revenue of recommending the users' desired data should be larger than that of recommending advertisements. To be specific, the desired data of users would be recommended with higher probability to increase the target revenue $\underline{R}(P)$. Since $\underline{R}(P)$ is decreasing with $\sum_{i=1}^{N} p_i(1-u_i)$, the high-probability events are more important in this case.
Advertising systems are usually adopted in the mature stage of an application, where users have become accustomed to the recommendation system and some advertisements are acceptable. Therefore, the application would increase the revenue of pushing advertisements and the actual number of advertisement recommendations. To be specific, the desired data of users should be recommended with relatively small probability, while advertisements should be recommended with higher probability. In this sense, the small-probability events are more important here.
Remark 1.
Since $\{P: \underline{R}(P) \ge \beta\} = E_1 \cup E_2 \cup \{P: \underline{R}(P) \ge \beta \mid R_{ad} = R_p + C_n + C_m\}$, these three kinds of recommendation systems cover all the cases of this problem.
Different from other recommendation systems based on content [11,13,14], the recommendation systems in this paper concern the distribution of recommendations based on the preference of users and the target revenue of the recommendation system. Thus, we focus on the integrated planning of a sequence of recommendations rather than the recommendation of a specific piece of data.

3. Optimal Recommendation Distribution

In this part, we shall present the optimal recommendation distribution for both advertising systems and noncommercial systems explicitly. We define an auxiliary variable and an auxiliary function as follows:
$$\gamma_u = \sum_{i=1}^{N} u_i^2 ,$$
$$g(\varpi, V) = \frac{\sum_{i=1}^{N} v_i^2 e^{\varpi(1-v_i)}}{\sum_{i=1}^{N} v_i e^{\varpi(1-v_i)}} ,$$
where $\varpi \in (-\infty, +\infty)$ is a constant and $V = \{v_1, v_2, \ldots, v_N\}$ is a general probability mass function. Actually, we have $\gamma_u = e^{-H_2(U)}$, where $H_2(U)$ is the Rényi entropy $H_\alpha(\cdot)$ with $\alpha = 2$ [39].
In particular, we have the following lemma on g ( ϖ , V ) .
Lemma 1.
The function $g(\varpi, V)$ is monotonically decreasing with $\varpi$.
Proof of Lemma 1.
Refer to Appendix A. □
Lemma 2.
$g(0, V) = \sum_{i=1}^{N} v_i^2$, $g(-\infty, V) = v_{\max}$, and $g(+\infty, V) = v_{\min}$, where $v_{\max} = \max\{v_1, v_2, \ldots, v_N\}$ and $v_{\min} = \min\{v_1, v_2, \ldots, v_N\}$.
Proof of Lemma 2.
Refer to Appendix B. □
It is clear that $g(0, P) = \gamma_p$ and $g(0, U) = \gamma_u$.
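A minimal numerical sketch of $g(\varpi, V)$ is given below; note that the common factor $e^{\varpi}$ cancels in the ratio, which helps avoid overflow for large $|\varpi|$. The checks mirror Lemma 1 and Lemma 2 (the distribution $V$ is a made-up example):

```python
import numpy as np

def g(w, v):
    """g(w, V): the common factor e^w cancels in the ratio, so we work
    with e^{-w v_i}, rescaled to avoid overflow at large |w|."""
    v = np.asarray(v, float)
    e = np.exp(-w * v - np.max(-w * v))
    return float(np.sum(v**2 * e) / np.sum(v * e))

V = [0.1, 0.2, 0.3, 0.4]
print(g(0.0, V))     # = sum_i v_i^2 (Lemma 2)
print(g(-200.0, V))  # -> v_max = 0.4
print(g(+200.0, V))  # -> v_min = 0.1
vals = [g(w, V) for w in np.linspace(-50, 50, 101)]
assert all(a >= b for a, b in zip(vals, vals[1:]))  # Lemma 1: decreasing
```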

3.1. Optimal Advertising System

Theorem 1.
For an advertising system with $R_p + C_n + C_m - R_{ad} < 0$, the optimal recommendation distribution is the solution of Problem $P_1$ and is given by
$$p_x^* = \begin{cases} u_x & \text{if } \alpha \le 1 - \gamma_u , \\[4pt] \dfrac{u_x e^{\varpi^*(1-u_x)}}{\sum_{i=1}^{N} u_i e^{\varpi^*(1-u_i)}} & \text{if } 1 - \gamma_u < \alpha \le 1 , \\[4pt] \mathrm{NaN} & \text{if } \alpha > 1 , \end{cases}$$
for $1 \le x \le N$, where $\alpha$ is defined in Label (5), $\gamma_u$ is defined in Label (15), NaN means no solution exists, and $\varpi^* > 0$ is the solution to $g(\varpi, U) = 1 - \alpha$.
Proof of Theorem 1.
First, if $\alpha > 1$, no solution exists since $\sum_{i=1}^{N} p_i(1-u_i)$ is always smaller than or equal to one and the constraint (9a) can never be satisfied.
Second, if $\alpha \le 1 - \gamma_u$, we have $\sum_{i=1}^{N} u_i(1-u_i) = 1 - \gamma_u \ge \alpha$. That is, the constraint (9a) is satisfied by $P = U$, which is the unconstrained minimizer of $D(P\|U)$. Thus, the solution to Problem $P_1$ is $P = U$.
Third, if $1 - \gamma_u < \alpha \le 1$, we shall solve Problem $P_1$ based on the following Karush–Kuhn–Tucker (KKT) conditions:
$$\nabla_P L(P, \lambda, \mu) = \nabla_P \left[ \sum_{i=1}^{N} p_i \ln \frac{p_i}{u_i} + \lambda \left( \alpha - \sum_{i=1}^{N} p_i(1-u_i) \right) + \mu \left( \sum_{i=1}^{N} p_i - 1 \right) \right] = 0 ,$$
$$\lambda \left( \alpha - \sum_{i=1}^{N} p_i(1-u_i) \right) = 0 ,$$
$$\sum_{i=1}^{N} p_i - 1 = 0 ,$$
$$\alpha - \sum_{i=1}^{N} p_i(1-u_i) \le 0 ,$$
$$\lambda \ge 0 .$$
Differentiating $L(P, \lambda, \mu)$ with respect to $p_x$ and setting the derivative to zero, we have
$$\ln p_x^* + 1 - \ln u_x - \lambda(1-u_x) + \mu = 0 ,$$
and thus $p_x^* = u_x e^{\lambda(1-u_x) - \mu - 1}$. Together with constraint (18b), we further have $e^{\mu+1} = \sum_{i=1}^{N} u_i e^{\lambda(1-u_i)}$ and
$$p_x^* = \frac{u_x e^{\lambda(1-u_x)}}{\sum_{i=1}^{N} u_i e^{\lambda(1-u_i)}} ,$$
where $\lambda \ge 0$ is the solution to $\sum_{i=1}^{N} p_i^*(1-u_i) = \alpha$, i.e., $g(\lambda, U) = 1 - \alpha$. By substituting $\varpi^*$ for $\lambda$, the desired result in Label (17) is obtained.
The condition $1 - \gamma_u < \alpha \le 1$ implies $g(\varpi^*, U) = 1 - \alpha < \gamma_u = g(0, U)$. Since $g(\varpi, U)$ has been shown to be monotonically decreasing with $\varpi$ in Lemma 1, we then have $\varpi^* > 0$. This completes the proof of Theorem 1. □
Remark 2.
We denote
$$\beta_0 = (1 - \gamma_u)(R_{ad} - R_p - C_n - C_m) + R_p - C_p ,$$
$$\beta_{ad} = -(R_{ad} - R_p - C_n - C_m)\, u_{\min} + R_{ad} - C_p - C_n - C_m ,$$
and have
  • $\alpha \le 1 - \gamma_u$ is equivalent to $\beta \le \beta_0$, which means that the target revenue is low and can be achieved by pushing data exactly according to the preference of the user.
  • $1 - \gamma_u < \alpha \le 1$ is equivalent to $\beta_0 < \beta \le \beta_{ad}$, which means that the target revenue can only be achieved when some advertisements are pushed according to probability $p_x^*$.
  • $1 < \alpha$ is equivalent to $\beta > \beta_{ad}$, which means that the target is too high to achieve, even when advertisements are pushed with probability one.
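Since $g(\varpi, U)$ is continuous and strictly decreasing (Lemma 1), the equation $g(\varpi, U) = 1 - \alpha$ can be solved by simple bisection. The sketch below (our own illustration, not the authors' code) implements Theorem 1 and, with a negative bracket, also covers Theorem 2 in the next subsection; it assumes a feasible $\alpha$, i.e., one for which a finite $\varpi^*$ exists:

```python
import numpy as np

def g(w, u):
    u = np.asarray(u, float)
    e = np.exp(-w * u - np.max(-w * u))
    return float(np.sum(u**2 * e) / np.sum(u * e))

def optimal_p(u, alpha, advertising=True, w_max=500.0, tol=1e-10):
    """Optimal recommendation distribution via bisection on
    g(w, U) = 1 - alpha (Theorem 1 for advertising systems, w* > 0;
    Theorem 2 for noncommercial systems, w* < 0)."""
    u = np.asarray(u, float)
    gamma_u = float(np.sum(u**2))
    if advertising and alpha <= 1 - gamma_u:
        return u.copy()                 # constraint inactive: P* = U
    if not advertising and alpha >= 1 - gamma_u:
        return u.copy()
    lo, hi = (0.0, w_max) if advertising else (-w_max, 0.0)
    target = 1 - alpha
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid, u) > target:          # g decreases in w, so move right
            lo = mid
        else:
            hi = mid
    w = 0.5 * (lo + hi)
    weights = u * np.exp(w * (1 - u) - np.max(w * (1 - u)))
    return weights / np.sum(weights)

U = np.array([0.1, 0.2, 0.3, 0.4])
p_star = optimal_p(U, alpha=0.8, advertising=True)
print(p_star, np.sum(p_star * (1 - U)))  # constraint holds with equality
```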

3.2. Optimal Noncommercial System

Theorem 2.
For a noncommercial system with $R_p + C_n + C_m - R_{ad} > 0$, the optimal recommendation distribution is the solution of Problem $P_2$ and is given by
$$p_x^* = \begin{cases} \mathrm{NaN} & \text{if } \alpha < 0 , \\[4pt] \dfrac{u_x e^{\varpi^*(1-u_x)}}{\sum_{i=1}^{N} u_i e^{\varpi^*(1-u_i)}} & \text{if } 0 \le \alpha < 1 - \gamma_u , \\[4pt] u_x & \text{if } 1 - \gamma_u \le \alpha , \end{cases}$$
for $1 \le x \le N$, where $\alpha$ is defined in Label (5), $\gamma_u$ is defined in Label (15), NaN means no solution exists, and $\varpi^* < 0$ is the solution to $g(\varpi, U) = 1 - \alpha$.
Proof of Theorem 2.
First, if $\alpha < 0$, no solution exists since $\sum_{i=1}^{N} p_i(1-u_i)$ is positive and cannot be smaller than a negative number, and thus the constraint (13a) can never be satisfied.
Second, if $\alpha \ge 1 - \gamma_u$ and we set $p_x^* = u_x$, we have $\sum_{i=1}^{N} p_i^*(1-u_i) = 1 - \gamma_u \le \alpha$, i.e., constraint (13a) is satisfied. Since setting $p_x^* = u_x$ minimizes $D(P\|U)$, $P = U$ is the solution of Problem $P_2$.
Third, if $0 \le \alpha < 1 - \gamma_u$, we shall solve Problem $P_2$ using the following KKT conditions:
$$\nabla_P L(P, \lambda, \mu) = \nabla_P \left[ \sum_{i=1}^{N} p_i \ln \frac{p_i}{u_i} + \lambda \left( \sum_{i=1}^{N} p_i(1-u_i) - \alpha \right) + \mu \left( \sum_{i=1}^{N} p_i - 1 \right) \right] = 0 ,$$
$$\lambda \left( \sum_{i=1}^{N} p_i(1-u_i) - \alpha \right) = 0 ,$$
$$\sum_{i=1}^{N} p_i - 1 = 0 ,$$
$$\sum_{i=1}^{N} p_i(1-u_i) - \alpha \le 0 ,$$
$$\lambda \ge 0 .$$
Differentiating $L(P, \lambda, \mu)$ with respect to $p_x$ and setting the derivative to zero, we have
$$\ln p_x^* + 1 - \ln u_x + \lambda(1-u_x) + \mu = 0 ,$$
and $p_x^* = u_x e^{-\lambda(1-u_x) - \mu - 1}$. Together with constraint (24b), we then have $e^{\mu+1} = \sum_{i=1}^{N} u_i e^{-\lambda(1-u_i)}$ and
$$p_x^* = \frac{u_x e^{-\lambda(1-u_x)}}{\sum_{i=1}^{N} u_i e^{-\lambda(1-u_i)}} ,$$
where $\lambda > 0$ is the solution to $\sum_{i=1}^{N} p_i^*(1-u_i) = \alpha$.
By denoting $\varpi^* = -\lambda$, (26) turns into $p_x^* = \frac{u_x e^{\varpi^*(1-u_x)}}{\sum_{i=1}^{N} u_i e^{\varpi^*(1-u_i)}}$, which is the desired result in Label (23).
Moreover, the condition $0 \le \alpha < 1 - \gamma_u$ implies $g(\varpi^*, U) = 1 - \alpha > \gamma_u = g(0, U)$. Since $g(\varpi, U)$ is monotonically decreasing with $\varpi$ (cf. Lemma 1), we see that $\varpi^* < 0$. Thus, Theorem 2 is proved. □
Remark 3.
We denote
$$\beta_{no} = -(R_{ad} - R_p - C_n - C_m)\, u_{\max} + R_{ad} - C_p - C_n - C_m ,$$
and have the following observations.
  • $\alpha < 0$ is equivalent to $\beta > \beta_{no}$, which means that the target revenue is too high and cannot be achieved by any recommendation distribution.
  • $0 \le \alpha < 1 - \gamma_u$ is equivalent to $\beta_0 < \beta \le \beta_{no}$, which means that the target revenue is not too high and the information is pushed according to probability $p_x^*$. The user experience is limited by the target revenue in this case.
  • $1 - \gamma_u \le \alpha$ is equivalent to $\beta \le \beta_0$, in which case the target can be achieved by pushing data exactly according to the preference of the user.
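The bisection sketch given after Remark 2 covers this case as well, using the negative bracket; for example (hypothetical numbers, with $\alpha$ chosen so that a finite $\varpi^* < 0$ exists):

```python
# Noncommercial example: alpha in [1 - u_max, 1 - gamma_u) forces a
# finite w* < 0, so mass shifts toward the high-probability classes.
U = np.array([0.1, 0.2, 0.3, 0.4])        # gamma_u = 0.3, 1 - gamma_u = 0.7
p_star = optimal_p(U, alpha=0.65, advertising=False)
print(p_star)                             # boosted toward u_max = 0.4
print(np.sum(p_star * (1 - U)))           # equals alpha = 0.65
```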

3.3. Short Summary

We further denote
$$\beta_{ne} = R_p - C_p ,$$
and the optimal recommendation distributions for the various systems and target revenues can then be summarized in Table 4.
Cases ③, ⑤, and ⑧ are extreme cases where the target revenue is beyond the reach of the system. For cases ①, ④, and ⑦, the target revenue is low ($\beta < \beta_{no}$ or $\beta < \beta_{ad}$) and thus easy to achieve. In particular, the constraints (9a) and (13a) are actually inactive, so the optimal recommendation distribution is exactly the same as the utility distribution.
Cases ② and ⑥ are more practical and meaningful due to the appropriate target revenues involved. To further study the properties of the optimal recommendation distribution in these two cases, the following function is introduced:
$$f(\varpi, x, U) = \frac{u_x e^{\varpi(1-u_x)}}{\sum_{i=1}^{N} u_i e^{\varpi(1-u_i)}} ,$$
where $\varpi \in (-\infty, +\infty)$.
In doing so, the optimal recommendation distribution of cases ② and ⑥ can be jointly expressed as
$$p_x^* = f(\varpi^*, x, U) ,$$
where $\varpi^*$ is the solution to $g(\varpi, U) = 1 - \alpha$.
In particular, $f(\varpi^*, x, U)$ presents the optimal solution of case ② if $\varpi^* > 0$ and that of case ⑥ if $\varpi^* < 0$. Moreover, when $\varpi = 0$, we have
$$f(0, x, U) = u_x ,$$
which can be considered as the solution to cases ①, ④, and ⑦.

4. Property of Optimal Recommendation Distributions

In this section, we shall investigate how the optimal recommendation distribution diverges from the utility distribution in various systems and under various target revenue constraints. To do so, we first study the properties of the function $f(\varpi, x, U)$, where $\varpi_x$ and $\tilde{\varpi}_x \neq 0$ are, respectively, the solutions to the following equations:
$$u_x = g(\varpi, U) ,$$
$$u_x = f(\varpi, x, U) .$$
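Both roots are easy to obtain numerically once $g$ and $f$ are implemented; a sketch using SciPy's bracketing root finder (the distribution is a made-up example):

```python
import numpy as np
from scipy.optimize import brentq

def g(w, v):
    v = np.asarray(v, float)
    e = np.exp(-w * v - np.max(-w * v))
    return float(np.sum(v**2 * e) / np.sum(v * e))

def f(w, x, u):
    u = np.asarray(u, float)
    e = np.exp(w * (1 - u) - np.max(w * (1 - u)))
    return float(u[x] * e[x] / np.sum(u * e))

U = np.array([0.1, 0.2, 0.3, 0.4])
x = 1                                   # u_x = 0.2 < gamma_u = 0.3
w_x = brentq(lambda w: g(w, U) - U[x], -200.0, 200.0)
w_tilde_x = brentq(lambda w: f(w, x, U) - U[x], 1e-6, 200.0)  # nontrivial root
print(w_x, w_tilde_x)                   # both positive since u_x < gamma_u
```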

4.1. Monotonicity of $f(\varpi, x, U)$

Theorem 3.
$f(\varpi, x, U)$ has the following properties:
(1) $f(\varpi, x, U)$ is monotonically increasing with $\varpi$ in $(-\infty, \varpi_x)$;
(2) $f(\varpi, x, U)$ is monotonically decreasing with $\varpi$ in $(\varpi_x, +\infty)$;
(3) $\varpi_x$ is decreasing with $u_x$, i.e., $\varpi_y < \varpi_x$ if $u_y > u_x$;
(4) $\varpi_x < 0$ if $u_x > \gamma_u$; $\varpi_x = 0$ if $u_x = \gamma_u$; $\varpi_x > 0$ if $u_x < \gamma_u$.
Proof of Theorem 3.
(1) and (2) The derivative of $f(\varpi, x, U)$ with respect to $\varpi$ can be expressed as
$$\frac{\partial f(\varpi, x, U)}{\partial \varpi} = \frac{u_x e^{\varpi(1-u_x)} (1-u_x) \sum_{i=1}^{N} u_i e^{\varpi(1-u_i)} - u_x e^{\varpi(1-u_x)} \sum_{i=1}^{N} u_i e^{\varpi(1-u_i)} (1-u_i)}{\left[ \sum_{i=1}^{N} u_i e^{\varpi(1-u_i)} \right]^2}$$
$$= \frac{u_x e^{\varpi(1-u_x)} \sum_{i=1}^{N} u_i (u_i - u_x) e^{\varpi(1-u_i)}}{\left[ \sum_{i=1}^{N} u_i e^{\varpi(1-u_i)} \right]^2} .$$
Since $u_x e^{\varpi(1-u_x)} > 0$ and $\left[ \sum_{i=1}^{N} u_i e^{\varpi(1-u_i)} \right]^2 > 0$, the sign of the derivative depends only on the term $\sum_{i=1}^{N} u_i (u_i - u_x) e^{\varpi(1-u_i)}$.
Lemma 1 and Lemma 2 show that $g(\varpi, U)$ is monotonically decreasing with $\varpi$ and that $\varpi_x$ is the unique solution to the equation $u_x = g(\varpi, U)$. For $\varpi < \varpi_x$, therefore, we have $g(\varpi, U) > g(\varpi_x, U) = u_x$, which is equivalent to $\sum_{i=1}^{N} u_i (u_i - u_x) e^{\varpi(1-u_i)} > 0$. Thus, $\frac{\partial f(\varpi, x, U)}{\partial \varpi} > 0$ and $f(\varpi, x, U)$ is increasing with $\varpi$ if $\varpi < \varpi_x$. Likewise, it can readily be proved that $f(\varpi, x, U)$ is decreasing with $\varpi$ if $\varpi > \varpi_x$.
(3) Suppose $g(\varpi_x, U) = u_x$, $g(\varpi_y, U) = u_y$, and $u_x < u_y$; then $g(\varpi_x, U) < g(\varpi_y, U)$. Since $g(\varpi, U)$ is decreasing with $\varpi$ (cf. Lemma 1), we have $\varpi_x > \varpi_y$, i.e., $\varpi_x$ is decreasing with $u_x$.
(4) First, we have $g(0, U) = \gamma_u$ by the definition of $g(\varpi, U)$ (cf. (16)). Using the monotonicity of $g(\varpi, U)$ with respect to $\varpi$ (cf. Lemma 1), we have $\varpi_x < 0$ if $u_x > \gamma_u$ and $\varpi_x > 0$ if $u_x < \gamma_u$.
Thus, the proof of Theorem 3 is completed. □
In particular, according to Lemma 2, $\varpi_x$ approaches positive infinity if $u_x = u_{\min}$, and negative infinity if $u_x = u_{\max}$.
Remark 4.
(1) $f(\varpi, x, U)$ is monotonically decreasing with $\varpi$ if $u_x = u_{\max}$;
(2) $f(\varpi, x, U)$ is monotonically increasing with $\varpi$ if $u_x = u_{\min}$.
Remark 5.
We denote
$$\beta_x = -(R_{ad} - R_p - C_n - C_m)\, u_x + R_{ad} - C_p - C_n - C_m ,$$
and the relationships between $f(\varpi, x, U)$ and $\beta$ in the different systems are as follows.
  • For advertising systems,
    (1) $f(\varpi, x, U)$ is monotonically decreasing with $\beta$ in $(\beta_0, \beta_{ad})$ if $u_x \ge \gamma_u$;
    (2) $f(\varpi, x, U)$ is monotonically increasing with $\beta$ in $(\beta_0, \beta_x)$ and monotonically decreasing in $(\beta_x, \beta_{ad})$ if $u_x < \gamma_u$.
  • For noncommercial systems,
    (1) $f(\varpi, x, U)$ is monotonically decreasing with $\beta$ in $(\beta_0, \beta_{no})$ if $u_x \le \gamma_u$;
    (2) $f(\varpi, x, U)$ is monotonically increasing with $\beta$ in $(\beta_0, \beta_x)$ and monotonically decreasing in $(\beta_x, \beta_{no})$ if $u_x > \gamma_u$.

4.2. Discussion of Parameters

Assume that there is a unique minimum $u_{\min}$ and a unique maximum $u_{\max}$ in the utility distribution $U$. Without loss of generality, let $u_1 < u_2 \le \cdots \le u_t \le \gamma_u < u_{t+1} \le \cdots \le u_{N-1} < u_N$, so that $u_{\min} = u_1$ and $u_{\max} = u_N$. $P^* = (p_1^*, p_2^*, \ldots, p_N^*)$ is used to denote the optimal recommendation distribution. In addition, in this part we only discuss the relationship between the optimal recommendation distribution and the parameters ($\beta$ and $\varpi$) in Cases ② and ⑥.
In fact, we have the following proposition on ϖ ˜ x .
Proposition 1.
$\tilde{\varpi}_x$ has the following properties:
(1) $\tilde{\varpi}_x$ exists when $u_x \neq \gamma_u$, $u_x \neq u_{\max}$, and $u_x \neq u_{\min}$;
(2) $\tilde{\varpi}_x < 0$ if $u_x > \gamma_u$; $\tilde{\varpi}_x > 0$ if $u_x < \gamma_u$;
(3) $\tilde{\varpi}_x$ is decreasing with $u_x$, i.e., $\tilde{\varpi}_y < \tilde{\varpi}_x$ if $u_y > u_x$.
Proof of Proposition 1.
Refer to Appendix C. □
For convenience, we denote $\tilde{\varpi}_1 \triangleq +\infty$ and $\tilde{\varpi}_N \triangleq -\infty$.
For advertising systems, the optimal recommendation distribution is given by (17) with $\varpi^* > 0$ (cf. Theorem 1). As $\varpi^* \to +\infty$, we have
$$p_1^* = f(+\infty, 1, U) = \lim_{\varpi^* \to +\infty} \frac{u_{\min} e^{\varpi^*(1-u_{\min})}}{\sum_{i=1}^{N} u_i e^{\varpi^*(1-u_i)}}$$
$$= \lim_{\varpi^* \to +\infty} \frac{u_{\min} e^{\varpi^*(1-u_{\min})}}{u_{\min} e^{\varpi^*(1-u_{\min})} + \sum_{u_i \neq u_{\min}} u_i e^{\varpi^*(1-u_i)}}$$
$$= \lim_{\varpi^* \to +\infty} \frac{1}{1 + \sum_{u_i \neq u_{\min}} \frac{u_i}{u_{\min}} e^{\varpi^*(u_{\min}-u_i)}}$$
$$= 1 .$$
Obviously, $p_k^* = f(+\infty, k, U) = 0$ when $k \ge 2$. Therefore, the optimal recommendation distribution $P^*$ is $(1, 0, 0, \ldots, 0)$ here.
Based on Proposition 1, if $0 < \tilde{\varpi}_{N_3+1} < \varpi^* < \tilde{\varpi}_{N_3}$ ($1 \le N_3 \le N-1$), we have $p_x^* > u_x$ for $1 \le x \le N_3$ and $p_x^* < u_x$ for $N_3+1 \le x \le N$. The number of optimal recommendation probabilities that are larger than the corresponding utility probabilities is $N_3$. Let $N_4 = N - N_3$. $N_3$ decreases with increasing $\varpi^*$ (cf. Proposition 1). In particular, if $\varpi^* > \tilde{\varpi}_2 > 0$, only the recommendation probability of the event with the smallest utility probability is enlarged, while that of the other events is reduced, compared to the corresponding utility probabilities. As the parameter $\varpi^*$ approaches positive infinity, the recommendation distribution tends to $(1, 0, 0, \ldots, 0)$. In conclusion, this recommendation system is a special information-filtering system that prefers to push the small-probability events.
For noncommercial systems, we obtain the optimal recommendation distribution from (23) with $\varpi^* < 0$ (cf. Theorem 2). As $\varpi \to -\infty$, it is noted that
$$p_N^* = f(-\infty, N, U) = \lim_{\varpi \to -\infty} \frac{u_{\max} e^{\varpi(1-u_{\max})}}{\sum_{i=1}^{N} u_i e^{\varpi(1-u_i)}}$$
$$= \lim_{\varpi \to -\infty} \frac{u_{\max} e^{\varpi(1-u_{\max})}}{u_{\max} e^{\varpi(1-u_{\max})} + \sum_{u_i \neq u_{\max}} u_i e^{\varpi(1-u_i)}}$$
$$= \lim_{\varpi \to -\infty} \frac{1}{1 + \sum_{u_i \neq u_{\max}} \frac{u_i}{u_{\max}} e^{\varpi(u_{\max}-u_i)}}$$
$$= 1 ,$$
and $p_k^* = f(-\infty, k, U) = 0$ when $k \le N-1$. Hence, $P^* = (0, 0, \ldots, 0, 1)$ in this case.
If $\tilde{\varpi}_{K+1} < \varpi < \tilde{\varpi}_K < 0$ ($1 \le K \le N-1$), then $p_x^* < u_x$ for $1 \le x \le K$ and $p_x^* > u_x$ for $K+1 \le x \le N$. The number of optimal recommendation probabilities that are larger than the corresponding utility probabilities is $N - K$. Let $N_2 = N - K$ and $N_1 = K$. It is noted that $N_2$ decreases with decreasing $\varpi$ (cf. Proposition 1). In particular, if $\varpi < \tilde{\varpi}_{N-1} < 0$, only the recommendation probability of the event with the largest utility probability is enlarged, while that of the other events is reduced, compared to the corresponding utility probabilities. As the parameter $\varpi$ approaches negative infinity, the push distribution tends to $(0, 0, \ldots, 0, 1)$. In this case, the high-probability events are preferred by this recommendation system.
Let the optimal recommendation distribution be equal to the utility distribution, i.e., $P^* = U$; then we have
$$u_j = \frac{u_j e^{\varpi(1-u_j)}}{\sum_{i=1}^{N} u_i e^{\varpi(1-u_i)}} = f(\varpi, j, U) , \quad 1 \le j \le N .$$
It is noted that $\varpi = 0$ is a solution of (38) according to (31). Since $f(\varpi, 1, U)$ is monotonically increasing with $\varpi$ (cf. Remark 4), $f(\varpi, 1, U) \neq f(0, 1, U) = u_1$ when $\varpi \neq 0$. Thus, there exists one and only one root of (38), namely $\varpi = 0$, and $P^* = U$ in this case. Here, all the data types are treated fairly.
For convenience, the relationship between parameters, the optimal recommendation distribution and the utility distribution is summarized in Table 5.
In fact, the recommendation system can be regarded as an information-filtering system that pushes data based on the preferences of users [12]. The input and output of this information-filtering system can be seen as the utility distribution and the optimal recommendation distribution, respectively. For advertising systems, compared to the input, on the output port the recommendation probability of data belonging to the set $\{S_i \mid 1 \le i \le K\}$ is amplified and that belonging to the set $\{S_i \mid K+1 \le i \le N\}$ is diminished, where $1 \le K \le N-1$ and $0 < \tilde{\varpi}_{K+1} < \varpi < \tilde{\varpi}_K$. Since $u_1 < u_2 \le \cdots \le u_{N-1} < u_N$, the advertising system is a special information-filtering system that prefers to push the small-probability events. For noncommercial systems, the data with higher utility probability, i.e., $\{S_i \mid K+1 \le i \le N\}$, is more likely to be pushed, and the data with smaller utility probability, i.e., $\{S_i \mid 1 \le i \le K\}$, tends to be overlooked, where $\tilde{\varpi}_{K+1} < \varpi < \tilde{\varpi}_K < 0$ and $1 \le K \le N-1$. Since $u_1 < u_2 \le \cdots \le u_{N-1} < u_N$, the high-probability events are preferred by the noncommercial system.
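This filtering behavior is easy to inspect numerically: for a given $\varpi$, compare $f(\varpi, x, U)$ with $u_x$ class by class. A small sketch (the utility distribution is the made-up example used in Section 6.3):

```python
import numpy as np

def f_all(w, u):
    """Vector of f(w, x, U) over all classes x."""
    u = np.asarray(u, float)
    e = np.exp(w * (1 - u) - np.max(w * (1 - u)))
    return u * e / np.sum(u * e)

U = np.array([0.03, 0.07, 0.12, 0.24, 0.25, 0.29])   # sorted ascending
for w in (15.0, -5.0):
    p = f_all(w, U)
    print(f"w = {w:+.0f}: boosted classes {np.flatnonzero(p > U)}")
# w > 0 boosts the small-probability (advertising) end of U,
# w < 0 boosts the high-probability (noncommercial) end.
```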
Remark 6.
The recommendation system can be regarded as an information-filtering system, and the parameter $\varpi$ is an indicator that reflects the system preference. If $\varpi > 0$, the system is an advertising system, which prefers to recommend the small-probability events; if $\varpi < 0$, the system is a noncommercial system, which is more likely to recommend the high-probability events. In particular, the system pushes data exactly according to the preference of the user if $\varpi = 0$.

4.3. Geometric Interpretation

In this part, we give a geometric interpretation of the optimal recommendation distribution by means of the probability simplex. Let $\mathcal{P}$ denote the set of all probability distributions on the alphabet $\{S_1, S_2, \ldots, S_N\}$, and let point $U$ denote the utility distribution. In addition, $P$ denotes a recommendation distribution and $P^*$ denotes the optimal recommendation distribution. With the help of the method of types [37], Figure 2 characterizes the generation of the optimal recommendation distribution.
In Figure 2, all cases can be grouped into three major categories, i.e., $R_p + C_n + C_m - R_{ad} < 0$, $R_p + C_n + C_m - R_{ad} = 0$, and $R_p + C_n + C_m - R_{ad} > 0$. These three categories are the advertising, neutral, and noncommercial systems, denoted by the triangles $\Delta ABU$, $\Delta BCU$, and $\Delta ACU$, respectively. The triangle $\Delta A_1B_1C_1$ denotes the region where the average revenue is equal to or larger than a given value. In this paper, our goal is to find the optimal recommendation distribution so that the recommended data fits the preference of the user as much as possible while the average revenue satisfies the predefined requirement. Since the matching degree between the recommended data and the user's preference is characterized by $2^{-nD(P\|U)}$ (cf. Section 2.1), $P^*$ should be the recommendation distribution on the border of or inside the triangle $\Delta A_1B_1C_1$ that is closest to the utility distribution $U$ in relative entropy. Therefore, there is no solution if the recommendation distribution $P$ falls within regions ③, ⑤, and ⑧.
For advertising systems, $P$ can be divided into regions ①, ②, and ③. Obviously, $P^* = U$ for region ① since $U$ falls within region ①. $P^*$ in region ② is the recommendation distribution closest to $U$ in relative entropy; it is noted that $2^{-nD(P^*\|U)} > 2^{-nD(P\|U)}$ if $P \neq P^*$ and $P$ is in region ②. There is no solution in region ③ since $P \notin \Delta A_1B_1C_1$. There is a similar situation for noncommercial systems, which are composed of regions ⑥, ⑦, and ⑧. There are only two regions for neutral systems: there is no solution when $P$ falls in region ⑤, and $P^* = U$ when $P$ falls in region ④.
Furthermore, triangle $\Delta ABU$ characterizes the set $E_1$ and triangle $\Delta ACU$ characterizes the set $E_2$.

5. Relationship between Optimal Recommendation Strategy and MIM

5.1. Normalized Recommendation Value

In this section, we shall focus on the relationship between the optimal recommendation distribution and MIM in Cases ② and ⑥. In fact, the optimal recommendation distributions in the other cases are invariable, and it makes little sense to discuss the relationship for them.
Within a period of time, the parameters $C_p$, $R_p$, $C_n$, $R_{ad}$, and $C_m$ can be seen as invariants of a given recommendation system, and they do not change with the users. The recommendation strategy is determined by the personalized user concern and the expected revenue in this paper; the former is characterized by the utility distribution, and the latter is denoted by $\beta$. Based on the discussion in Section 3.3, there is a one-to-one mapping between the parameter $\varpi$ and the minimum average revenue $\beta$. Thus, once the recommendation environment is determined, we only need to select the proper parameter $\varpi^*$ to obtain the optimal recommendation distribution based on the utility distribution, where $\varpi^*$ is chosen to satisfy the expectation of the average revenue.
The form of the optimal recommendation distribution in Label (29) suggests that we allocate recommendation proportions by the weight factor $u_i e^{\varpi^*(1-u_i)}$, where $U = \{u_1, u_2, \ldots, u_N\}$ is the utility distribution. In a sense, we take the weight factor $u_i e^{\varpi^*(1-u_i)}$ as the recommendation value of data belonging to the $i$-th class, and we generate the optimal recommendation distribution based on the recommendation value. It is noted that the optimal recommendation distribution is the normalized recommendation value. Furthermore, the total recommendation value of this user is $\sum_{i=1}^{N} u_i e^{\varpi^*(1-u_i)}$.
In fact, the recommendation values are subjective quantities rather than objective ones; they show the relative level of recommendation tendency. If all the recommendation values are multiplied by the same constant, the suggested recommendation distribution does not change.
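In code, the recommendation value and its normalization take one line each; the following sketch (with a made-up utility distribution) also verifies the scaling invariance just mentioned:

```python
import numpy as np

def recommendation_values(u, w):
    """Recommendation values u_i e^{w(1-u_i)}, up to a common positive
    factor (which the normalization below removes anyway)."""
    u = np.asarray(u, float)
    return u * np.exp(w * (1 - u) - np.max(w * (1 - u)))

U = np.array([0.1, 0.2, 0.3, 0.4])   # made-up utility distribution
vals = recommendation_values(U, w=5.0)
p_star = vals / vals.sum()           # normalized MIM = optimal distribution
print(p_star)
# Scaling all recommendation values by a constant changes nothing:
assert np.allclose(p_star, (3.7 * vals) / (3.7 * vals).sum())
```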

5.2. Optimal Recommendation and MIM

Generally speaking, there must be a reason why the recommendation strategy prefers specific data. According to the discussion in Remark 6, advertising systems prefer the small-probability events (i.e., advertisements), and they push as many small-probability events as possible as the minimum average revenue increases, since pushing small-probability events is the main source of income for such systems. Noncommercial systems prefer the high-probability events, and they push as many high-probability events as possible as the minimum average revenue increases, since recommending a piece of desired data is the main source of income for such systems. The preference of the recommendation system means that the system has its own evaluation of the degree of importance of different data. The system prefers to recommend the important data in order to achieve the recommendation target; thus, the recommendation distribution is determined by the importance of the data. That is, the recommendation probability of data belonging to the $i$-th class is the proportion of the recommendation value $u_i e^{\varpi^*(1-u_i)}$ in the total recommendation value. In this sense, the recommendation value gives an intuitive description of message importance, and the relative magnitudes of recommendation values are more significant than their absolute magnitudes. For a given parameter, the recommendation value is a quantity determined by the probability distribution. Hence, the recommendation value can be seen as a measure of message importance from this viewpoint, which agrees with the main general principles for the definition of importance in the previous literature [24,25,26,27,28]. Moreover, the total recommendation value characterizes the total message importance.
In fact, the form of this importance measure is extremely similar to that of MIM [29]. MIM was proposed to measure the message importance in the case where small-probability events contain the most valuable information, and the parameter $\varpi$ in MIM is called the importance coefficient. The importance coefficient is always larger than zero in MIM. Furthermore, MIM highlights the importance of minority subsets and ignores the high-probability events by taking them for granted. As stated in Remark 6, the small-probability events are highlighted when $\varpi > 0$. Therefore, MIM is consistent with the conclusions of this paper.
In addition, as the parameter increases and becomes sufficiently large, the message importance of the event with minimum probability dominates the MIM according to [29,35]. The same holds for the recommendation value, since $\lim_{\varpi \to +\infty} \frac{u_{\min} e^{\varpi(1-u_{\min})}}{\sum_{i=1}^{N} u_i e^{\varpi(1-u_i)}} = 1$ (cf. (36)). Furthermore, if $\varpi$ is not a very large positive number, the form of $f(\varpi, x, U)$ ($1 < x < N$) is similar to that of Shannon entropy, as discussed in [40]. Ref. [34] discussed the selection of the importance coefficient and pointed out that the event with probability $u_j$ becomes the principal component in MIM when $\varpi = 1/u_j$. Due to the common form, the optimal recommendation distribution admits the same conclusion. That is, $f(1/u_j, j, U)$ is larger than $f(1/u_j, i, U)$ if $i \neq j$, which is shown in Figure 3.
Remark 7.
The optimal recommendation distribution is the normalized message importance in terms of MIM. In other words, the normalized message importance can also be seen as the distribution that is closest to the utility distribution in relative entropy when the average revenue of the recommendation system is larger than or equal to a predefined value.
Although MIM was proposed based on information theory, it can also be derived from the viewpoint of data recommendation. In fact, Remark 7 characterizes the physical meaning of MIM from the data recommendation perspective, which confirms the rationality of MIM in one aspect.
Remark 8.
We also extend MIM to the general case, regardless of the probability of the events the system is interested in. The importance coefficient $\varpi$ plays a switching role over the event sets the system attends to; that is,
(1) If $\varpi > 0$, the importance index of small-probability events is magnified while that of high-probability events is lessened. In this case, the system prefers to recommend the small-probability events;
(2) If $\varpi < 0$, the importance index of high-probability events is magnified while that of small-probability events is lessened. In this case, the system prefers to recommend the high-probability events;
(3) If $\varpi = 0$, the importance of all events is the same.
In addition, the value of the parameter $\varpi$ can also give a clear definition of the event set in which the recommendation system is interested. For example, for a given utility distribution $U$ ($u_1 < u_2 \le \cdots \le u_{N-1} < u_N$), if $0 < \tilde{\varpi}_{K+1} < \varpi < \tilde{\varpi}_K$ ($1 \le K \le N-1$), the sparse events are focused on, and the set of these events is $\{S_i \mid 1 \le i \le K\}$. In fact, $\{S_i \mid 1 \le i \le K\}$ gives an unambiguous description of the definition of small-probability events in MIM. On the contrary, $\{S_i \mid K+1 \le i \le N\}$ is the set of high-probability events, which is highlighted if $\tilde{\varpi}_{K+1} < \varpi < \tilde{\varpi}_K < 0$ ($1 \le K \le N-1$). In particular, if $\varpi > \tilde{\varpi}_2 > 0$, only the event with minimum probability is focused on.

6. Numerical Results

In this section, some numerical results are presented to validate the theoretical findings of this paper.

6.1. Property of $g(\varpi, V)$

Figure 4 depicts the function $g(\varpi, V)$ versus the parameter $\varpi$. The probability distributions are $V_1 = \{0.1, 0.2, 0.3, 0.4\}$, $V_2 = \{0.05, 0.15, 0.3, 0.5\}$, $V_3 = \{0.13, 0.17, 0.34, 0.36\}$, $V_4 = \{0.01, 0.11, 0.12, 0.76\}$, and $V_5 = \{0.22, 0.25, 0.25, 0.28\}$. The parameter $\varpi$ varies from $-100$ to $100$.
Some observations can be made in Figure 4. The functions in all cases are monotonically decreasing with $\varpi$. When $\varpi$ is small enough, i.e., $\varpi = -100$, $g(\varpi, V_i)$ is close to $\max(V_i)$ for $1 \le i \le 5$, while $g(\varpi, V_i)$ approaches $\min(V_i)$ for $1 \le i \le 5$ when $\varpi = 100$. In fact, we observe $\min(V_i) \le g(\varpi, V_i) \le \max(V_i)$. These results validate Lemma 1 and Lemma 2. Furthermore, the rate of change of $g(\varpi, V)$ in the region $-20 < \varpi < 20$ is larger than that in the regions $\varpi < -20$ and $\varpi > 20$. The KL distances between these probability distributions and the uniform distribution are $0.1536$, $0.3523$, $0.1230$, $0.9153$, and $0.0052$, respectively. Thus, the amplitude of variation of $g(\varpi, V)$ decreases as the KL distance to the uniform distribution decreases.
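The curves in Figure 4 can be regenerated from the definition of $g(\varpi, V)$; a sketch printing the endpoint and midpoint values for the five distributions:

```python
import numpy as np

def g(w, v):
    v = np.asarray(v, float)
    e = np.exp(-w * v - np.max(-w * v))
    return float(np.sum(v**2 * e) / np.sum(v * e))

Vs = {"V1": [0.10, 0.20, 0.30, 0.40], "V2": [0.05, 0.15, 0.30, 0.50],
      "V3": [0.13, 0.17, 0.34, 0.36], "V4": [0.01, 0.11, 0.12, 0.76],
      "V5": [0.22, 0.25, 0.25, 0.28]}
for name, v in Vs.items():
    print(name, f"g(-100) = {g(-100.0, v):.3f}",   # ~ max(V)
          f"g(0) = {g(0.0, v):.3f}",               # = sum v_i^2
          f"g(+100) = {g(100.0, v):.3f}")          # ~ min(V)
```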

6.2. Optimal Recommendation Distribution

Next, we focus on the optimal recommendation distribution proposed in this paper. The parameters of the recommendation system are $D_1 = \{C_p = 4.5, R_p = 2, C_n = 2, R_{ad} = 11, C_m = 2\}$ and $D_2 = \{C_p = 1, R_p = 9, C_n = 2, R_{ad} = 3, C_m = 2\}$. The minimum average revenue $\beta$ varies from $0$ to $3$. The utility distributions are $U_1 = \{0.1, 0.2, 0.3, 0.4\}$ and $U_2 = \{0.05, 0.15, 0.3, 0.5\}$. Let $U_0 = \{0.25, 0.25, 0.25, 0.25\}$. Some results are listed in Table 6.
Figure 5 shows the relationship between the optimal recommendation distribution and the minimum average revenue. The optimal recommendation distribution can be divided into three phases. In phase one, where the minimum average revenue is small ($\beta < \beta_0$), the optimal recommendation distribution is exactly the same as the utility distribution. In phase two, the minimum average revenue is neither too small nor too large ($\beta_0 < \beta < \beta_{no}$ or $\beta_0 < \beta < \beta_{ad}$); in this case, the optimal recommendation distribution changes as the minimum average revenue increases. In phase three, where the minimum average revenue is too large ($\beta > \beta_{no}$ or $\beta > \beta_{ad}$), there is no appropriate recommendation distribution.
The optimal recommendation probability $p_1^*$ versus the average revenue is depicted in subgraph one of Figure 5. If $D_1$ is adopted, i.e., an advertising system, $p_1^*$ increases with the minimum average revenue when $\beta_0 < \beta < \beta_{ad}$, and $p_1^*$ is larger than $u_1$ in this case; $p_1^*$ approaches one when the average revenue is close to $\beta_{ad}$. If $D_2$ is adopted, $p_1^*$ decreases with increasing minimum average revenue in the region $\beta_0 < \beta < \beta_{no}$, and $p_1^*$ is smaller than $u_1$; $p_1^*$ approaches zero as the average revenue increases to $\beta_{no}$. In addition, $p_1^* = u_1$ when $\beta < \beta_0$.
Subgraph two of Figure 5 depicts the optimal recommendation probability $p_2^*$ versus the minimum average revenue. If $\beta < \beta_0$, the optimal recommendation probability is equal to the corresponding utility probability. As $\beta$ approaches $\beta_{no}$ or $\beta_{ad}$, $p_2^*$ in the four cases decreases from $u_2$ toward zero. In the intermediate region, $p_2^*$ increases at the early stage and then decreases for ($D_1$, $U_1$) and ($D_1$, $U_2$), while $p_2^*$ is monotonically decreasing for ($D_2$, $U_1$) and ($D_2$, $U_2$). Since $u_2 < \gamma_u$, $R_p + C_n + C_m - R_{ad} < 0$ for $D_1$, and $R_p + C_n + C_m - R_{ad} > 0$ for $D_2$, these numerical results agree with the discussion in Section 4.1. Subgraph three is similar to subgraph two.
Subgraph four, which shows the optimal recommendation probability $p_4^*$ versus the minimum average revenue, behaves contrary to subgraph one. If $\beta < \beta_0$, $p_4^*$ is equal to $u_4$. For $D_1$, i.e., advertising systems, if the minimum average revenue is larger than $\beta_0$ and smaller than $\beta_{ad}$, $p_4^*$ is smaller than $u_4$ and decreases to zero as $\beta \to \beta_{ad}$. In contrast, for $D_2$, i.e., noncommercial systems, $p_4^*$ is larger than $u_4$ and increases to one as $\beta \to \beta_{no}$.
Figure 6 illustrates the minimum KL distance between the recommendation distribution and the utility distribution versus the minimum average revenue when $\beta_0 < \beta < \beta_{no}$ or $\beta_0 < \beta < \beta_{ad}$. It is noted that the constraint on the minimum average revenue is active in this region. The minimum KL distance increases with the minimum average revenue. This figure also shows that the minimum average revenue can be read off for a given minimum KL distance when the utility distribution and the recommendation system parameters are fixed. Since this KL distance represents the accuracy of the recommendation strategy, there is a trade-off between the recommendation accuracy and the minimum average revenue.
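The trade-off curve of Figure 6 follows directly from the machinery of Section 3: sweep $\beta$, map it to $\alpha$ via Label (5), solve for $P^*$, and record $D(P^*\|U)$. A sketch for parameter set $D_1$ and utility distribution $U_1$, reusing the optimal_p function from the sketch after Remark 2:

```python
import numpy as np

# Parameter set D1 (advertising) and utility distribution U1 from the text.
Cp, Rp, Cn, Rad, Cm = 4.5, 2.0, 2.0, 11.0, 2.0
U = np.array([0.1, 0.2, 0.3, 0.4])
gamma_u = np.sum(U**2)
beta0 = (1 - gamma_u) * (Rad - Rp - Cn - Cm) + Rp - Cp
beta_ad = -(Rad - Rp - Cn - Cm) * U.min() + Rad - Cp - Cn - Cm

for beta in np.linspace(beta0, beta_ad, 6)[:-1]:   # stop short of beta_ad
    alpha = (beta + Cp - Rp) / (Rad - Rp - Cn - Cm)
    p = optimal_p(U, alpha, advertising=True)      # see Section 3.1 sketch
    kl = float(np.sum(p * np.log(p / U)))          # D(P*||U) in nats
    print(f"beta = {beta:.2f} -> D(P*||U) = {kl:.4f}")
```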

6.3. Property of $f(\varpi, x, U)$

Figure 3 shows the function $f(\varpi, x, U)$ versus the importance coefficient $\varpi$. The utility distribution is $\{0.03, 0.07, 0.12, 0.24, 0.25, 0.29\}$, and the importance coefficient varies from $-40$ to $40$. It is easy to check that $u_1 < u_2 < u_3 < \sum_{i=1}^{6} u_i^2 < u_4 < u_5 < u_6$. We denote $\{\varpi_2, \varpi_3, \varpi_4, \varpi_5\} = \{24.6, 13.6, -3.3, -6.3\}$.
$f(\varpi, 1, U)$ is monotonically increasing with the importance coefficient, and $f(\varpi, 1, U)$ is close to zero when $\varpi < -30$. $f(\varpi, i, U)$ ($i = 2, 3, 4, 5$) increases with $\varpi$ when $\varpi < \varpi_i$ and then decreases with $\varpi$ when $\varpi > \varpi_i$; it achieves its maximum value when $\varpi = \varpi_i$. It is also noted that $\varpi_5 < \varpi_4 < 0 < \varpi_3 < \varpi_2$. If $\varpi < -30$, $f(\varpi, i, U)$ ($i = 2, 3$) is very close to zero, while $f(\varpi, i, U) > 0.05$ ($i = 2, 3$) if $\varpi > 30$, which means it changes faster when $\varpi < 0$. On the contrary, $f(\varpi, i, U)$ ($i = 4, 5$) changes faster with $\varpi$ in $(0, +\infty)$ than in $(-\infty, 0)$, since it is still larger than $0.05$ when $\varpi = -40$ and approaches zero when $\varpi > 30$. In addition, $f(\varpi, 6, U)$ decreases monotonically with the importance coefficient, and $f(\varpi, 6, U)$ is close to zero when $\varpi > 30$.
Some other observations can also be made. When $\varpi = 0$, we have $f(\varpi, i, U) = u_i$ ($i = 1, 2, 3, 4, 5, 6$). Without loss of generality, we take $f(\varpi, 5, U)$ as an example. There is a non-zero importance coefficient $\tilde{\varpi}_5$ which makes $f(\varpi, 5, U) = u_5$. If $0 > \tilde{\varpi}_4 > \varpi > \tilde{\varpi}_5$, we obtain $f(\varpi, i, U) > u_i$ for $i = 5, 6$ and $f(\varpi, i, U) < u_i$ for $i = 1, 2, 3, 4$. Compared with the utility distribution, the probabilities of the elements in the set $\{S_5, S_6\}$ are enlarged, and those in the set $\{S_1, S_2, S_3, S_4\}$ are diminished. The difference between these two sets is that the utility probability of $S_5$ or $S_6$ is larger than that of any element in $\{S_1, S_2, S_3, S_4\}$. In addition, it is also noted that $f(\varpi, i, U) > u_i$ for $i = 6$ and $f(\varpi, i, U) < u_i$ for $i = 1, 2, 3, 4, 5$ when $\varpi < \tilde{\varpi}_5$. Here, only the function output of the element in the high-probability set $\{S_6\}$ is larger than the corresponding utility probability.
Furthermore, when $\varpi = 1/u_i \in \{33.3333, 14.2857, 8.3333, 4.1667, 4, 3.4483\}$ for $1 \le i \le 6$, we have $f(\varpi, i, U) > f(\varpi, j, U)$ for $j \neq i$.

7. Conclusions

In this paper, we discussed the optimal data recommendation problem when the recommendation model pursues the best user experience under a certain revenue guarantee. First, we described the system model and formulated the problem as an optimization. Then, we gave the explicit solution of this problem in the different cases, i.e., advertising systems and noncommercial systems, which can improve the design of data recommendation strategies. In fact, the optimal recommendation distribution is the one that is closest to the utility distribution in relative entropy while satisfying the expected revenue, and there is a trade-off between the recommendation accuracy and the expected revenue. In addition, the properties of this optimal recommendation distribution, such as monotonicity and its geometric interpretation, were also discussed. Furthermore, the optimal recommendation system can be regarded as an information-filtering system, and the importance coefficient determines which events the system prefers to recommend.
We also showed that the optimal recommendation probability of each class is the proportion of its recommendation value in the total recommendation value when the minimum average revenue is neither too small nor too large. In fact, the recommendation value is a special weighting factor in determining the optimal recommendation distribution, and it can be regarded as a measure of importance. Since its form and properties are the same as those of MIM, the optimal recommendation distribution is exactly the normalized MIM, where MIM characterizes the concern of the system. The parameter in MIM, i.e., the importance coefficient, acts as a switch over the event sets the system attends to. That is, the importance index of high-probability events is enlarged for a negative importance coefficient (i.e., noncommercial systems), while the importance index of small-probability events is magnified for a positive importance coefficient (i.e., advertising systems). In particular, only the maximum-probability event or the minimum-probability event is focused on as the importance coefficient approaches negative infinity or positive infinity, respectively. These results give a new physical explanation of MIM from the data recommendation perspective, which validates the rationality of MIM in one aspect. MIM is also extended to the general case, whatever the probability of the events of interest is. One can adjust the importance coefficient to focus on the desired data type. Compared with previous versions of MIM, the set of desired events can be defined precisely. These results can help formulate appropriate data recommendation strategies in different scenarios.
In the future, we may consider its applications in next-generation cellular systems [41,42], wireless sensor networks [43] and high-speed railway communication systems [44] by taking the signal transmission mode into account.

Author Contributions

Conceptualization, S.L., Y.D. and P.F.; Formal analysis, S.L., Y.D., P.F., R.S. and S.W.; Methodology, S.L., Y.D. and P.F.; Writing—original draft, S.L.; Writing—review and editing, S.L., Y.D. and P.F.

Funding

The authors are very thankful for the support of the National Natural Science Foundation of China (NSFC) No. 61771283 and No. 61701247, and the China Major State Basic Research Development Program (973 Program) No. 2012CB316100(2).

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
MIM — message importance measure
KL — Kullback–Leibler
SVM — support vector machine

Appendix A. Proof of Lemma 1

The derivative of g(ϖ, V) with respect to ϖ is given by

$$
\begin{aligned}
\frac{\partial g(\varpi,V)}{\partial \varpi}
&= \frac{\sum\limits_{i=1}^{N} v_i^2(1-v_i)e^{\varpi(1-v_i)}\sum\limits_{j=1}^{N} v_j e^{\varpi(1-v_j)}-\sum\limits_{i=1}^{N} v_i^2 e^{\varpi(1-v_i)}\sum\limits_{j=1}^{N} v_j(1-v_j)e^{\varpi(1-v_j)}}{\left(\sum\limits_{i=1}^{N} v_i e^{\varpi(1-v_i)}\right)^2}\\
&= \frac{\sum\limits_{i=1}^{N}\sum\limits_{j=1}^{N} v_i^2(1-v_i)v_j e^{\varpi(2-v_i-v_j)}-\sum\limits_{i=1}^{N}\sum\limits_{j=1}^{N} v_i^2 v_j(1-v_j)e^{\varpi(2-v_i-v_j)}}{\left(\sum\limits_{i=1}^{N} v_i e^{\varpi(1-v_i)}\right)^2} && \text{(A1a)}\\
&= \frac{\sum\limits_{i=1}^{N}\sum\limits_{j=1}^{N} v_i^2 v_j (v_j-v_i)\, e^{\varpi(2-v_i-v_j)}}{\left(\sum\limits_{i=1}^{N} v_i e^{\varpi(1-v_i)}\right)^2} && \text{(A1b)}\\
&= \frac{\sum\limits_{v_i<v_j} v_i^2 v_j (v_j-v_i)\, e^{\varpi(2-v_i-v_j)}+\sum\limits_{v_i>v_j} v_i^2 v_j (v_j-v_i)\, e^{\varpi(2-v_i-v_j)}}{\left(\sum\limits_{i=1}^{N} v_i e^{\varpi(1-v_i)}\right)^2} && \text{(A1c)}\\
&= \frac{\sum\limits_{v_j<v_i} v_j^2 v_i (v_i-v_j)\, e^{\varpi(2-v_j-v_i)}+\sum\limits_{v_i>v_j} v_i^2 v_j (v_j-v_i)\, e^{\varpi(2-v_i-v_j)}}{\left(\sum\limits_{i=1}^{N} v_i e^{\varpi(1-v_i)}\right)^2} && \text{(A1d)}\\
&= \frac{\sum\limits_{v_i>v_j} (v_j-v_i)\, v_i v_j (v_i-v_j)\, e^{\varpi(2-v_i-v_j)}}{\left(\sum\limits_{i=1}^{N} v_i e^{\varpi(1-v_i)}\right)^2} && \text{(A1e)}\\
&= \frac{-\sum\limits_{v_i>v_j} (v_i-v_j)^2\, v_i v_j\, e^{\varpi(2-v_i-v_j)}}{\left(\sum\limits_{i=1}^{N} v_i e^{\varpi(1-v_i)}\right)^2} < 0. && \text{(A1f)}
\end{aligned}
$$
Here, (A1d) is obtained by exchanging the subscripts i and j in the first sum, and (A1f) follows from the fact that v_i ≥ 0 for 1 ≤ i ≤ N and not all of the v_i are zero.
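As a quick sanity check on Lemma 1, the monotonicity can also be probed numerically; a minimal sketch of ours, using a random distribution:

```python
import numpy as np

def g(w, V):
    """g(w, V) = sum_i v_i^2 e^{w(1-v_i)} / sum_i v_i e^{w(1-v_i)} (cf. Table 1)."""
    t = V * np.exp(w * (1.0 - V))
    return float((V @ t) / t.sum())

# Lemma 1: g is strictly decreasing in w whenever the v_i are not all equal
rng = np.random.default_rng(0)
V = rng.dirichlet(np.ones(6))        # a random probability distribution
ws = np.linspace(-20.0, 20.0, 401)
gs = np.array([g(w, V) for w in ws])
assert np.all(np.diff(gs) < 0.0)     # strictly decreasing on the sampled grid
```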

Appendix B. Proof of Lemma 2

When ϖ = 0, we have
$$
g(0,V)=\frac{\sum_{i=1}^{N} v_i^2}{\sum_{i=1}^{N} v_i}=\sum_{i=1}^{N} v_i^2. \tag{A2}
$$
Let g(−∞, V) ≜ lim_{ϖ→−∞} g(ϖ, V); we then obtain
$$
\begin{aligned}
g(-\infty,V)&=\lim_{\varpi\to-\infty}\frac{\sum_{i=1}^{N} v_i^2 e^{\varpi(1-v_i)}}{\sum_{i=1}^{N} v_i e^{\varpi(1-v_i)}}
=\lim_{\varpi\to-\infty}\frac{v_{\max}^2 e^{\varpi(1-v_{\max})}+\sum_{v_i\neq v_{\max}} v_i^2 e^{\varpi(1-v_i)}}{v_{\max} e^{\varpi(1-v_{\max})}+\sum_{v_i\neq v_{\max}} v_i e^{\varpi(1-v_i)}}\\
&=\lim_{\varpi\to-\infty}\frac{v_{\max}+\sum_{v_i\neq v_{\max}}\frac{v_i^2}{v_{\max}}\, e^{\varpi(v_{\max}-v_i)}}{1+\sum_{v_i\neq v_{\max}}\frac{v_i}{v_{\max}}\, e^{\varpi(v_{\max}-v_i)}}
= v_{\max}.
\end{aligned}\tag{A3}
$$
Let g(+∞, V) ≜ lim_{ϖ→+∞} g(ϖ, V); it is noted that
$$
\begin{aligned}
g(+\infty,V)&=\lim_{\varpi\to+\infty}\frac{\sum_{i=1}^{N} v_i^2 e^{\varpi(1-v_i)}}{\sum_{i=1}^{N} v_i e^{\varpi(1-v_i)}}
=\lim_{\varpi\to+\infty}\frac{v_{\min}^2 e^{\varpi(1-v_{\min})}+\sum_{v_i\neq v_{\min}} v_i^2 e^{\varpi(1-v_i)}}{v_{\min} e^{\varpi(1-v_{\min})}+\sum_{v_i\neq v_{\min}} v_i e^{\varpi(1-v_i)}}\\
&=\lim_{\varpi\to+\infty}\frac{v_{\min}+\sum_{v_i\neq v_{\min}}\frac{v_i^2}{v_{\min}}\, e^{\varpi(v_{\min}-v_i)}}{1+\sum_{v_i\neq v_{\min}}\frac{v_i}{v_{\min}}\, e^{\varpi(v_{\min}-v_i)}}
= v_{\min}.
\end{aligned}\tag{A4}
$$
Hence, v_min < g(ϖ, V) < v_max for ϖ ∈ (−∞, +∞), according to Lemma 1.
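The limits (A3) and (A4) can likewise be checked numerically; since e^{ϖ(1−v_i)} overflows in floating point for large |ϖ|, the sketch below (ours) factors out the largest exponent, which cancels in the ratio:

```python
import numpy as np

def g_stable(w, V):
    """g(w, V) computed with the maximum exponent factored out to avoid overflow."""
    e = w * (1.0 - V)
    t = V * np.exp(e - e.max())      # the common factor exp(e.max()) cancels in the ratio
    return float((V @ t) / t.sum())

V = np.array([0.1, 0.2, 0.3, 0.4])
print(g_stable(0.0, V))       # sum_i v_i^2 = 0.30   (A2)
print(g_stable(-200.0, V))    # -> v_max = 0.4       (A3)
print(g_stable(+200.0, V))    # -> v_min = 0.1       (A4)
```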

Appendix C. Proof of Proposition 1

(1) First, if u_x = u_max or u_x = u_min, the non-zero solution of Equation (33) does not exist, since f(ϖ, x, U) is monotonic in ϖ (cf. Remark 4).
Second, if u_x = γ_u, u_x ≠ u_max and u_x ≠ u_min, no non-zero solution exists, since f(0, x, U) > f(ϖ, x, U) for ϖ ≠ 0 according to Theorem 3.
Third, if u_x < γ_u, u_x ≠ u_max and u_x ≠ u_min, according to Theorem 3 we have ϖ_x > 0, and f(ϖ, x, U) is increasing in (0, ϖ_x) while decreasing in (ϖ_x, +∞), where ϖ_x is the solution to g(ϖ, U) = u_x. It is easy to check that f(+∞, x, U) − f(0, x, U) = −u_x < 0 (cf. (36)) and f(ϖ_x, x, U) − f(0, x, U) > 0. According to the zero point theorem, the non-zero solution to f(ϖ, x, U) = u_x, i.e., ϖ̃_x (cf. (33)), can be found in (ϖ_x, +∞).
Fourth, likewise, the non-zero solution ϖ̃_x can be found in (−∞, ϖ_x) if u_x > γ_u, u_x ≠ u_max and u_x ≠ u_min.
(2) and (3) Consider u_y < u_x < γ_u first. According to Theorem 3, we have ϖ_x > 0, and f(ϖ, x, U) is increasing in (0, ϖ_x) while decreasing in (ϖ_x, +∞), where ϖ_x is the solution to g(ϖ, U) = u_x. Since f(0, x, U) = u_x, the non-zero solution to f(ϖ, x, U) = u_x, i.e., ϖ̃_x (cf. (33)), can only be found in (ϖ_x, +∞). Likewise, we obtain ϖ̃_y > ϖ_y > 0. Since ϖ_y > ϖ_x when u_y < u_x (cf. Theorem 3), we have ϖ̃_x > ϖ_x > 0 and ϖ̃_y > ϖ_y > ϖ_x > 0.
Furthermore, f(ϖ̃_y, y, U) = u_y implies e^{ϖ̃_y(1−u_y)} / Σ_{i=1}^{N} u_i e^{ϖ̃_y(1−u_i)} = 1. Hence, we have
$$
\begin{aligned}
f(\tilde{\varpi}_y, x, U) &= \frac{u_x e^{\tilde{\varpi}_y(1-u_x)}}{\sum_{i=1}^{N} u_i e^{\tilde{\varpi}_y(1-u_i)}} && \text{(A5a)}\\
&= u_x e^{\tilde{\varpi}_y(u_y-u_x)} \cdot \frac{e^{\tilde{\varpi}_y(1-u_y)}}{\sum_{i=1}^{N} u_i e^{\tilde{\varpi}_y(1-u_i)}} && \text{(A5b)}\\
&< u_x = f(\tilde{\varpi}_x, x, U),
\end{aligned}
$$
where the inequality after (A5b) follows from e^{ϖ̃_y(1−u_y)} / Σ_{i=1}^{N} u_i e^{ϖ̃_y(1−u_i)} = 1, u_x > u_y, and ϖ̃_y > 0.
Since f(ϖ, x, U) is decreasing with ϖ in (ϖ_x, +∞) according to Theorem 3, we then have ϖ̃_y > ϖ̃_x.
Second, likewise, we can prove that ϖ ˜ x < ϖ ˜ y < 0 if u x > u y > γ u .
Third, if u y < γ u < u x , we shall have ϖ ˜ x < 0 and ϖ ˜ y > 0 . Obviously, ϖ ˜ y > ϖ ˜ x in this case.
Thus, the proof of Proposition 1 is completed.
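The proof also suggests a practical way to compute ϖ̃_x numerically: first locate the peak ϖ_x from g(ϖ, U) = u_x, then bisect on (ϖ_x, +∞), where f is decreasing. The sketch below is our own illustration (valid for u_x < γ_u, reusing the utility distribution of Section 6.3):

```python
import numpy as np

U = np.array([0.03, 0.07, 0.12, 0.24, 0.25, 0.29])

def f(w, x, U):
    """f(w, x, U) = u_x e^{w(1-u_x)} / sum_i u_i e^{w(1-u_i)}, with x 1-based."""
    t = U * np.exp(w * (1.0 - U))
    return float(t[x - 1] / t.sum())

def g(w, U):
    """g(w, U) = sum_i u_i^2 e^{w(1-u_i)} / sum_i u_i e^{w(1-u_i)}."""
    t = U * np.exp(w * (1.0 - U))
    return float((U @ t) / t.sum())

def w_peak(x, U, lo=-60.0, hi=60.0):
    """w_x solving g(w, U) = u_x; g is strictly decreasing by Lemma 1."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid, U) > U[x - 1] else (lo, mid)
    return 0.5 * (lo + hi)

def w_tilde(x, U):
    """Non-zero root of f(w, x, U) = u_x, searched in (w_x, +inf) for u_x < gamma_u."""
    lo = w_peak(x, U)               # f(lo, x, U) > u_x at the peak
    hi = lo + 1.0
    while f(hi, x, U) > U[x - 1]:   # expand until f falls below u_x
        hi += 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid, x, U) > U[x - 1] else (lo, mid)
    return 0.5 * (lo + hi)

# u_2 < u_3 < gamma_u, so Proposition 1 predicts w~_2 > w~_3 > 0
print(w_tilde(2, U), w_tilde(3, U))
```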

References

1. Chen, M.; Mao, S.; Zhang, Y.; Leung, V.C.M. Definition and features of big data. In Big Data: Related Technologies, Challenges and Future Prospects; Springer: New York, NY, USA, 2014; pp. 2–5.
2. Bi, S.; Zhang, R.; Ding, Z. Wireless communications in the era of big data. IEEE Commun. Mag. 2015, 53, 190–199.
3. Franklin, M.; Zdonik, S. Data in your face: Push technology in perspective. In Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data, Seattle, WA, USA, 1–4 June 1998; pp. 516–519.
4. Hauswirth, M. Internet-Scale Push Systems for Information Distribution—Architecture, Components, and Communication. Available online: http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=2C5856A9798C3085378770287B32D626?doi=10.1.1.7.4907&rep=rep1&type=pdf (accessed on 20 February 2019).
5. Tadrous, J.; Eryilmaz, A.; Gamal, H.E. Proactive content download and user demand shaping for data networks. IEEE/ACM Trans. Netw. 2015, 23, 1917–1930.
6. Shoukry, O.; ElMohsen, M.A.; Tadrous, J. Proactive scheduling for content pre-fetching in mobile networks. In Proceedings of the 2014 IEEE International Conference on Communications, Sydney, Australia, 10–14 June 2014; pp. 2848–2854.
7. Kim, Y.; Lee, J.; Park, S.; Choi, B. Mobile advertisement system using data push scheduling based on user preference. In Proceedings of the IEEE Wireless Telecommunications Symposium (WTS), Prague, Czech Republic, 22–24 April 2009; pp. 1–5.
8. Podnar, I.; Hauswirth, M.; Jazayeri, M. Mobile push: Delivering content to mobile users. In Proceedings of the 22nd IEEE International Conference on Distributed Computing Systems Workshops, Vienna, Austria, 2–5 July 2002; pp. 563–568.
9. Nicopolitidis, P.; Papadimitriou, G.I.; Pomportsis, A.S. An adaptive wireless push system for high-speed data broadcasting. In Proceedings of the 14th IEEE Workshop on Local & Metropolitan Area Networks, Crete, Greece, 18 September 2005; pp. 1–5.
10. Bhide, M.; Deolasee, P.; Katkar, A.; Panchbudhe, A.; Ramamritham, K.; Shenoy, P. Adaptive push-pull: Disseminating dynamic web data. IEEE Trans. Comput. 2002, 51, 652–668.
11. Li, Y.; Chen, L.; Shi, H.; Hong, X.; Shi, J. Joint content recommendation and delivery in mobile wireless networks with outage management. Entropy 2018, 20, 64.
12. Parra-Arnau, J.; Rebollo-Monedero, D.; Forné, J. Optimal forgery and suppression of ratings for privacy enhancement in recommendation systems. Entropy 2014, 16, 1586–1631.
13. Zhang, J.; Yang, Y.; Tian, Q.; Zhuo, L.; Liu, X. Personalized social image recommendation method based on user-image-tag model. IEEE Trans. Multimed. 2017, 19, 2439–2449.
14. Zhou, P.; Zhou, Y.; Wu, D.; Jin, H. Differentially private online learning for cloud-based video recommendation with multimedia big data in social networks. IEEE Trans. Multimed. 2016, 18, 1217–1229.
15. Verbert, K.; Manouselis, N.; Ochoa, X. Context-aware recommender systems for learning: A survey and future challenges. IEEE Trans. Learn. Technol. 2012, 5, 318–335.
16. Cheng, Z.; Shen, J. On effective location-aware music recommendation. ACM Trans. Inf. Syst. 2016, 34, 13.
17. Elkan, C. The foundations of cost-sensitive learning. In Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence, Seattle, WA, USA, 4–10 August 2001; pp. 973–978.
18. Zhou, Z.; Liu, X. Training cost-sensitive neural networks with methods addressing the class imbalance problem. IEEE Trans. Knowl. Data Eng. 2006, 18, 63–77.
19. Lomax, S.; Vadera, S. A survey of cost-sensitive decision tree induction algorithms. ACM Comput. Surv. 2013, 45, 16:1–16:35.
20. Du, J.; Ni, E.A.; Ling, C.X. Adapting cost-sensitive learning for reject option. In Proceedings of the 19th ACM International Conference on Information and Knowledge Management (CIKM), Toronto, ON, Canada, 26–30 October 2010; pp. 1865–1868.
21. Zieba, A. Counterterrorism systems of Spain and Poland: Comparative studies. Prz. Politol. 2015, 3, 65–78.
22. Ando, S.; Suzuki, E. An information theoretic approach to detection of minority subsets in database. In Proceedings of the Sixth International Conference on Data Mining (ICDM), Hong Kong, China, 18–22 December 2006; pp. 1–10.
23. Vapnik, V.; Kotz, S. Estimation of Dependences Based on Empirical Data; Jordan, M., Kleinberg, J., Schölkopf, B., Eds.; Springer: New York, NY, USA, 2006.
24. Ivanchev, J.; Aydt, H.; Knoll, A. Information maximizing optimal sensor placement robust against variations of traffic demand based on importance of nodes. IEEE Trans. Intell. Transp. Syst. 2016, 17, 714–725.
25. Kawanaka, T.; Rokugawa, S.; Yamashita, H. Information security in communication network of memory channel considering information importance. In Proceedings of the IEEE International Conference on Industrial Engineering and Engineering Management (IEEM), Singapore, 10–13 December 2017; pp. 1169–1173.
26. Mönks, U.; Lohweg, V. Machine conditioning by importance controlled information fusion. In Proceedings of the IEEE 18th Conference on Emerging Technologies & Factory Automation (ETFA), Cagliari, Italy, 10–13 September 2013; pp. 1–8.
27. Li, Y.; Zhang, M.; Geng, X. Leveraging implicit relative labeling-importance information for effective multi-label learning. In Proceedings of the IEEE International Conference on Data Mining (ICDM), Atlantic City, NJ, USA, 14–17 November 2015; pp. 251–260.
28. Masnick, B.; Wolf, J. On linear unequal error protection codes. IEEE Trans. Inf. Theory 1967, 13, 600–607.
29. Fan, P.; Dong, Y.; Lu, J.; Liu, S. Message importance measure and its application to minority subset detection in big data. In Proceedings of the IEEE Globecom Workshops (GC Wkshps), Washington, DC, USA, 4–8 December 2016; pp. 1–6.
30. Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 623–656.
31. Verdu, S. Fifty years of Shannon theory. IEEE Trans. Inf. Theory 1998, 44, 2057–2078.
32. Rényi, A. On measures of entropy and information. In Proceedings of the 4th Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, CA, USA, 20 June–30 July 1961; pp. 547–561.
33. Fadeev, D.K. Zum Begriff der Entropie eines endlichen Wahrscheinlichkeitsschemas. In Arbeiten zur Informationstheorie I; Deutscher Verlag der Wissenschaften: Berlin, Germany, 1957; pp. 85–90.
34. She, R.; Liu, S.; Dong, Y.; Fan, P. Focusing on a probability element: Parameter selection of message importance measure in big data. In Proceedings of the IEEE International Conference on Communications (ICC), Paris, France, 20–26 May 2017; pp. 1–6.
35. Liu, S.; She, R.; Fan, P.; Letaief, K.B. Non-parametric Message Important Measure: Storage Code Design and Transmission Planning for Big Data. IEEE Trans. Commun. 2018, 66, 5181–5196.
36. She, R.; Liu, S.; Fan, P. Recognizing Information Feature Variation: Message Importance Transfer Measure and Its Applications in Big Data. Entropy 2018, 20, 401.
37. Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed.; Wiley: Hoboken, NJ, USA, 2006.
38. Liu, S.; She, R.; Wan, S.; Fan, P.; Dong, Y. A Switch to the Concern of User: Importance Coefficient in Utility Distribution and Message Importance Measure. In Proceedings of the IEEE International Wireless Communications & Mobile Computing Conference (IWCMC), Limassol, Cyprus, 25–29 June 2018; pp. 1–6.
39. Van Erven, T.; Harremoës, P. Rényi divergence and Kullback–Leibler divergence. IEEE Trans. Inf. Theory 2014, 60, 3797–3820.
40. She, R.; Liu, S.; Fan, P. Information Measure Similarity Theory: Message Importance Measure via Shannon Entropy. arXiv 2019, arXiv:1901.01137.
41. Xiong, K.; Fan, P.; Lu, Y.; Letaief, K.B. Energy efficiency with proportional rate fairness in multirelay OFDM networks. IEEE J. Select. Areas Commun. 2016, 34, 1431–1447.
42. Xiong, K.; Chen, C.; Qu, G.; Fan, P.; Letaief, K.B. Group cooperation with optimal resource allocation in wireless powered communication networks. IEEE Trans. Wirel. Commun. 2017, 16, 3840–3853.
43. Wang, Q.; Wu, D.O.; Fan, P. Delay-constrained optimal link scheduling in wireless sensor networks. IEEE Trans. Veh. Technol. 2010, 59, 4564–4577.
44. Zhang, C.; Fan, P.; Xiong, K.; Fan, P. Optimal power allocation with delay constraint for signal transmission from a moving train to base stations in high-speed railway scenarios. IEEE Trans. Veh. Technol. 2015, 64, 5775–5788.
Figure 1. System model.
Figure 2. Probability simplex and optimal recommendation. Region ⓘ denotes Case ⓘ in Table 3 for 1 ≤ i ≤ 8.
Figure 3. Function f(ϖ, x, U) vs. importance coefficient ϖ, when the utility distribution is U = {0.03, 0.07, 0.12, 0.24, 0.25, 0.29}.
Figure 4. g(ϖ, P) vs. ϖ.
Figure 5. The optimal recommendation distribution vs. the minimum average revenue. The parameter sets {C_p, R_p, C_n, R_ad, C_m} are denoted by D1 and D2, where D1 = {4.5, 2, 2, 11, 2} and D2 = {1, 9, 2, 3, 2}. The utility distributions are U1 = {0.1, 0.2, 0.3, 0.4} and U2 = {0.05, 0.15, 0.3, 0.5}.
Figure 6. Minimum KL distance between the recommendation distribution and the utility distribution vs. the minimum average revenue. The parameter sets {C_p, R_p, C_n, R_ad, C_m} are denoted by D1 and D2, where D1 = {4.5, 2, 2, 11, 2} and D2 = {1, 9, 2, 3, 2}. The utility distributions are U1 = {0.1, 0.2, 0.3, 0.4} and U2 = {0.05, 0.15, 0.3, 0.5}.
Table 1. Notations.

Notation | Description
S | The set of all the data
N | The number of data classes
S_i | The set of data belonging to the i-th class; S_i ∩ S_j = ∅ for i ≠ j, and S = S_1 ∪ S_2 ∪ ⋯ ∪ S_{N−1} ∪ S_N
Q = {q_1, q_2, …, q_N} | Raw distribution: the probability distribution of the information source
P = {p_1, p_2, …, p_N} | Recommendation distribution: the probability distribution of the recommended data
U = {u_1, u_2, …, u_N} | Utility distribution: the probability distribution of the user's preferred data
D(P‖U) | The relative entropy or Kullback–Leibler (KL) distance between P and U
C_p | The cost of a single data push
R_p | The earning when the pushed data is liked by the user
C_n | The cost when the pushed data is not liked by the user
R_ad | The advertising revenue when the pushed data is not liked by the user
C_m | The cost of missing to push a piece of the user's desired data
β | The target revenue of a single data push
ϖ | The importance coefficient
α | α = (β + C_p − R_p) / (R_ad − R_p − C_n − C_m)
γ_u | γ_u = Σ_{i=1}^{N} u_i²
g(ϖ, V) | g(ϖ, V) = Σ_{i=1}^{N} v_i² e^{ϖ(1−v_i)} / Σ_{i=1}^{N} v_i e^{ϖ(1−v_i)}
f(ϖ, x, U) | f(ϖ, x, U) = u_x e^{ϖ(1−u_x)} / Σ_{i=1}^{N} u_i e^{ϖ(1−u_i)}
x | x = 1, 2, …, N is the index of data classes
Table 2. The revenue matrix.

Preference \ Action | Recommend | Not Recommend
Desired | R_p − C_p | −C_m
Unwanted | R_ad − C_p − C_n | 0
Table 3. The matrix of occurring probability.

Preference \ Action | Recommend | Not Recommend
Desired | p_i u_i | (1 − p_i) u_i
Unwanted | p_i (1 − u_i) | (1 − p_i)(1 − u_i)
Table 4. The optimal recommendation distribution.

Case | β | α | p_x*
Advertising system | β ≤ β_0 | α ≤ 1 − γ_u | u_x
Advertising system | β_0 < β ≤ β_ad | 1 − γ_u < α ≤ 1 | u_x e^{ϖ(1−u_x)} / Σ_{i=1}^{N} u_i e^{ϖ(1−u_i)}
Advertising system | β > β_ad | α > 1 | NaN
Neutral system | β ≤ β_ne | NaN | u_x
Neutral system | β > β_ne | NaN | NaN
Noncommercial system | β_0 < β ≤ β_no | 0 ≤ α < 1 − γ_u | u_x e^{ϖ(1−u_x)} / Σ_{i=1}^{N} u_i e^{ϖ(1−u_i)}
Noncommercial system | β ≤ β_0 | α ≥ 1 − γ_u | u_x
Noncommercial system | β > β_no | α < 0 | NaN
Table 5. Optimal recommendation distribution with parameters. ↓ is used to denote p_x* < u_x and ↑ is used to denote p_x* > u_x.

β | ϖ | x = 1 | 2 ≤ x ≤ t | t + 1 ≤ x ≤ N − 1 | x = N | P*
β_no | −∞ | 0 | p_x* < u_x | p_x* < u_x | 1 | (0, 0, …, 0, 1)
(β_0, β_no) | (−∞, 0) | p_1* < u_1 | p_x* < u_x | / | p_N* > u_N | (↓, …, ↓ (N_1 terms), ↑, …, ↑ (N_2 terms))
(β_0, β̃_x) | (−∞, ϖ̃_x) | / | / | p_x* < u_x | p_N* > u_N |
(β̃_x, β_no) | (ϖ̃_x, 0) | / | / | p_x* > u_x | p_N* > u_N |
β_0 | 0 | p_1* = u_1 | p_x* = u_x | p_x* = u_x | p_N* = u_N | (u_1, u_2, …, u_N)
(β_0, β̃_x) | (0, ϖ̃_x) | p_1* > u_1 | p_x* > u_x | / | / | (↑, …, ↑ (N_3 terms), ↓, …, ↓ (N_4 terms))
(β̃_x, β_ad) | (ϖ̃_x, +∞) | p_1* > u_1 | p_x* < u_x | / | / |
(β_0, β_ad) | (0, +∞) | p_1* > u_1 | / | p_x* < u_x | p_N* < u_N |
β_ad | +∞ | 1 | p_x* < u_x | p_x* < u_x | 0 | (1, 0, …, 0, 0)
Table 6. The auxiliary variables in the optimal recommendation distribution.

Case | R_p + C_n + C_m − R_ad | β_0 | β_no | β_ad | γ_u
D1, U1 | −5 | 1 | / | 2 | 0.3
D2, U1 | 10 | 1 | 2 | / | 0.3
D1, U2 | −5 | 0.675 | / | 2.25 | 0.365
D2, U2 | 10 | 1.65 | 3 | / | 0.365
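The entries of Table 6 can be recomputed from the definition of α in Table 1. The sketch below is our own cross-check; it assumes, consistently with all four rows, that β_0, β_no, and β_ad correspond to α = 1 − γ_u, α = 1 − u_max, and α = 1 − u_min, respectively.

```python
import numpy as np

def thresholds(Cp, Rp, Cn, Rad, Cm, U):
    # alpha = (beta + Cp - Rp) / (Rad - Rp - Cn - Cm)  (Table 1), solved for beta
    denom = Rad - Rp - Cn - Cm
    beta = lambda alpha: alpha * denom + Rp - Cp
    gamma_u = float(np.sum(U ** 2))
    beta_0 = beta(1.0 - gamma_u)                           # optimum coincides with U
    beta_no = beta(1.0 - U.max()) if denom < 0 else None   # noncommercial system
    beta_ad = beta(1.0 - U.min()) if denom > 0 else None   # advertising system
    return Rp + Cn + Cm - Rad, beta_0, beta_no, beta_ad, gamma_u

U1 = np.array([0.1, 0.2, 0.3, 0.4])
U2 = np.array([0.05, 0.15, 0.3, 0.5])
D1 = (4.5, 2, 2, 11, 2)   # {C_p, R_p, C_n, R_ad, C_m}
D2 = (1, 9, 2, 3, 2)
for D, U in ((D1, U1), (D2, U1), (D1, U2), (D2, U2)):
    print(thresholds(*D, U))   # rows of Table 6: (-5, 1, None, 2, 0.3), ...
```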
