
Positive-Unlabeled Learning in Implicit Feedback from Data Missing-Not-At-Random Perspective

1 KUKA Robotics China Co., Ltd., Shanghai 201702, China
2 Department of Biostatistics, School of Public Health, Peking University, Beijing 100871, China
3 School of Artificial Intelligence, Anhui University, Hefei 230601, China
* Author to whom correspondence should be addressed.
Entropy 2026, 28(1), 41; https://doi.org/10.3390/e28010041
Submission received: 4 November 2025 / Revised: 14 December 2025 / Accepted: 17 December 2025 / Published: 29 December 2025
(This article belongs to the Special Issue Causal Inference in Recommender Systems)

Abstract

The lack of explicit negative labels is a prevalent challenge in numerous domains, including computer vision (CV), natural language processing (NLP), and recommender systems (RSs). To address this challenge, many negative-sample completion methods have been proposed, such as optimizing the sample distribution through pseudo-negative sampling and confidence screening in CV, constructing reliable negative examples by leveraging textual semantics in NLP, and supplementing negative samples via sparsity analysis of user interaction behaviors and preference inference for handling implicit feedback in RS. However, most existing methods fail to adequately address the Missing-Not-At-Random (MNAR) nature of the data and the potential presence of unmeasured confounders, which compromises model robustness in practice. In this paper, we first formulate the prediction task in RS with implicit feedback as a positive-unlabeled (PU) learning problem. We then propose a two-phase debiasing framework consisting of exposure status imputation followed by debiasing through the proposed doubly robust estimator. Moreover, our theoretical analysis shows that existing propensity-based approaches are biased in the presence of unmeasured confounders. To overcome this, we incorporate a robust deconfounding method in the debiasing phase to effectively mitigate the impact of unmeasured confounders. We conduct extensive experiments on three widely used real-world datasets to demonstrate the effectiveness and potential of the proposed methods.

1. Introduction

Positive-unlabeled (PU) learning addresses scenarios where the training data comprise only labeled positive entries and unlabeled entries, the latter being a mixture of unobserved true negatives and unrecognized positives [1,2]. Such challenges arise in various real-world machine learning tasks [3,4,5,6], such as computer vision [7], natural language processing [8], and healthcare analytics [9]. Ignoring this mixture and simply treating all unlabeled entries as negatives leads to biased estimation and hurts performance [10].
Among these domains, recommender systems (RSs), which play a key role in many modern websites and mobile applications by filtering information that may interest users, are another representative scenario [11,12,13]. In general, there are two types of user feedback for training the prediction model in an RS: explicit feedback and implicit feedback. Explicit feedback (e.g., ratings) directly reflects users' preferences but requires additional effort and cost to collect, making it often inaccessible in practice [14]. In contrast, implicit feedback (e.g., clicks or purchases) only partially reveals user preferences but can be easily obtained by simply recording user behavior logs [11,15,16,17]; however, users usually interact only with what interests them.
Training a recommendation model with implicit feedback is challenging, primarily due to two core issues: missing-negative-feedback (MNF) and missing-not-at-random (MNAR) data [18,19]. For example, in information retrieval, suppose a click denotes positive feedback; one then cannot tell whether an unclicked event represents negative feedback (not relevant) or merely an unexposed item, i.e., true negatives and potentially positive feedback are entangled in the unclicked events [20]. In addition, since users tend to click the items they prefer (selection bias) and the system prefers to recommend popular items (exposure bias), user feedback is not missing completely at random. This MNAR problem is further intensified by unmeasured confounders, hidden factors that affect both exposure and feedback [21]. For instance, a user's latent financial status may sway preference towards affordable items and simultaneously affect their exposure via platform ranking policies. Similarly, an item's inherent but unmeasured popularity can create a feedback loop where it is both preferred more and shown more frequently. Such confounders create spurious correlations, leading to over-recommending popular or low-priced items. Ignoring this problem incurs bias and leads to sub-optimal performance [22,23,24]. Our work addresses these biases while accounting for such hidden factors under the MNAR setting.
A large body of literature focuses on addressing the MNF issue in RS. For instance, some works treat unclicked events as negatives and assign them lower weights [20,25,26]; others sample a few representative items from unclicked events to approximate true negatives [27,28,29]. It was not until [30] that implicit feedback in RS was formally formulated as a PU learning problem. In recent years, there has been increasing interest in tackling the MNF and MNAR problems together. Based on causal inference, refs. [14,31] propose a new ideal loss and an inverse propensity score (IPS) method for unbiased learning. Refs. [32,33] extend IPS by combining it with a joint-learning method and by handling the bias in missing feedback, respectively. Beyond selection bias in implicit feedback, refs. [18,34] further point out other biases, such as exposure bias and bias due to sensitive privacy attributes, together with their corresponding debiasing methods.
Despite these advances of existing debiasing methods for the PU/MNF problem, we identify two critical limitations, not just for implicit feedback in RS: (1) Previous studies have not fully explored the MNAR problem. Works within the PU framework, like [35,36], were limited to the selected-at-random (SAR, or equivalently, missing-at-random, MAR) setting. Furthermore, they presented only IPS estimators, with no exploration of doubly robust (DR) methods, a classic extension of IPS. Even the causal RS works studying the MNF issue [14,32,33] addressed the MNAR problem via an IPS estimator and omitted a specification of the DR method. Given that the DR method [37] and its variants [38,39,40,41] have achieved state-of-the-art performance for unbiased learning from explicit feedback, developing DR-based methods for PU scenarios with MNAR is highly desirable; (2) No existing methods explicitly address unmeasured confounders. While the methods in [4,15,30,42] tackled biased feedback with the MNF issue, they did not consider unmeasured confounders, which are unavoidable in RS due to technical difficulties, privacy restrictions [43,44,45], etc. Our theoretical analysis shows that propensity score-based methods become biased under unmeasured confounders, making it necessary to develop targeted solutions for implicit feedback in RS.
To address these research gaps, we first formulate the implicit feedback problem as a PU learning task to enhance prediction robustness [2,4,30,35].
A key insight of this formulation is that all previous debiasing methods for explicit feedback can be extended to implicit feedback, provided that the exposure status (or equivalently, the treatment) is properly imputed. This insight motivates our proposed two-phase debiasing framework, which consists of a treatment imputation phase and a debiasing phase. We cast the treatment imputation step as a one-class classification problem and adopt a commonly used method, Support Vector Data Description (SVDD) [46]. For the debiasing phase, beyond applying IPS/DR explicit-feedback debiasing methods under MNAR, we also account for the impact of unmeasured confounders. Notably, the proposed framework provides a flexible paradigm that seamlessly bridges the debiasing methods for explicit and implicit feedback, in particular enabling DR methods for implicit feedback under the PU learning problem. Our main contributions are summarized as follows:
  • We propose a two-phase debiasing framework that provides a PU learning paradigm with corresponding IPS and DR estimators to predict preference using implicit feedback.
  • We reveal the risk of unmeasured confounders for implicit feedback and propose a debiasing method to robustly mitigate the bias.
  • We conduct extensive experiments to validate the effectiveness of the proposed framework and methods.

2. Related Works

Various biases affect recommender systems [47], both in explicit feedback data [48] and implicit feedback data [14], including popularity bias [49], model selection bias [50], user self-selection bias [51], position bias [52], conformity bias [53], etc. To address these, causal inference methods have been widely adopted to develop debiasing estimators. The IPS approach and its self-normalized variant (SNIPS) were pioneering methods for debiasing explicit feedback [22]. Subsequent work introduced the DR estimator [37], which provides robustness against misspecification of either the propensity or the imputation model and has smaller variance than the IPS method [16,54]. Enhanced DR methods were later proposed, such as MRDR [38], SDR [55], TDR [56], and, notably, a robust deconfounding approach (RD-IPS/RD-DR) that handles unmeasured confounders through adversarial training [57]. For a review, [47] provided a thorough discussion of recent progress on debiasing tasks in RS, and [23] established a unified causal analysis framework and formal causal definitions of various biases in RS based on violations of the assumptions adopted in causal analysis.
In parallel, implicit feedback poses a unique challenge due to the absence of true negative labels and the mixture of potential positives and negatives among unlabeled entries. Early heuristic approaches included WMF [15], which down-weights unobserved samples; BPR [42], which employs pairwise ranking; and ExpoMF [58], which incorporates exposure modeling. However, these methods lack unbiasedness guarantees under PU data, and they do not discuss the additional MNAR problem. Later, propensity-based methods emerged to tackle the MNF imbalance issue and discuss unbiasedness properties: Rel-MF [14] presented an IPS loss, UBPR further handled bias in missing feedback [59], and CJMF [32] combined IPS with joint learning; refs. [18,34,60] additionally handled MNAR due to selection bias, exposure bias, and bias due to privacy attributes, respectively. However, these methods were not formalized as PU learning and did not specify the corresponding DR estimators.
WeaklyRec [30], as a tipping point, formally framed implicit RS feedback as a PU learning problem, together with PURL [61], which improved upon it via class prior estimation. Both relied on weak supervision (not propensity-based methods) and did not address any additional bias or MNAR issues. In general PU learning, two paradigms dominate [30]: (1) re-weighting for unbiased estimation of the ideal loss, including uPU [62] and its convex/non-negative variants [2,3,6], further studied under imbalanced data and selection bias; (2) pseudo-labeling, using techniques like confidence scores [63], generative models [64], clustering analysis [65], and reinforcement learning [66] to identify reliable negatives/positives from unlabeled data. Although some PU methods offered debiasing estimators with an unbiasedness property via propensity-based methods [3,6,35,36], most considered only MAR/SAR settings, with refs. [4,63] considering additional selection bias. In summary, a critical gap persists. Our proposed framework aims to address the following limitations: missing debiasing methods for the PU learning problem (not just for RS implicit feedback), a lack of formalized DR loss, and insufficient discussion of the potential impact of MNAR and unmeasured confounders.

3. Problem Setup

Let $\mathcal{U} = \{u\}$, $\mathcal{I} = \{i\}$, and $\mathcal{D} = \mathcal{U} \times \mathcal{I}$ denote the sets of users, items, and user–item pairs, respectively. For each user–item pair, let $x_{u,i}$ represent its feature vector and $y_{u,i} \in \{0,1\}$ denote the implicit feedback, where $y_{u,i} = 1$ or $0$ indicates whether user $u$ clicks item $i$ or not. In the setting of implicit feedback, $y_{u,i} = 1$ represents positive feedback, but we cannot determine whether $y_{u,i} = 0$ represents true negative feedback (i.e., the user dislikes the item) or potential positive feedback (i.e., the item is relevant but never shown to the user). To resolve this ambiguity, we introduce two unobserved variables, the exposure status $o_{u,i}$ and the relevance $r_{u,i}$: $o_{u,i} = 1$ or $0$ indicates whether item $i$ is exposed to user $u$ or not, and $r_{u,i} = 1$ or $0$ indicates whether user $u$ and item $i$ are relevant or not. With these notations, existing works [4,14,30,35] assume a user clicks an item only if the item is both exposed and relevant, leading to the assumption below:
Assumption 1.
$y_{u,i} = o_{u,i} \cdot r_{u,i}$, for all $(u,i) \in \mathcal{D}$.
Under Assumption 1, if we intervene to set $o_{u,i} = 1$, then $r_{u,i} = y_{u,i}$, i.e., the observed feedback directly reveals the true relevance. Unobserved entries may be missing data with either positive labels (i.e., user and item are truly relevant) or negative labels (i.e., user and item are non-relevant). In real-world applications, each user can only interact with (e.g., click, view) a very limited number of items, making such missing data ubiquitous.

PU Learning for Recommender Systems

Unlike most prior literature, we formulate the implicit feedback problem using the positive-unlabeled framework following the work of [30], which is also widely adopted in weakly and semi-supervised learning [4,6,35,36]. In a PU learning problem, a full dataset $\mathcal{D}$ consists of entries arising from two marginal distributions: $p_P(x) = p(x \mid r = 1)$, the feature distribution of truly relevant (positive) user–item pairs, and $p_N(x) = p(x \mid r = 0)$, the feature distribution of truly irrelevant (negative) user–item pairs. The overall data distribution of the full dataset $\mathcal{D}$ can be expressed as a mixture of these two distributions:
$$p_D(x) = \pi \cdot p_P(x) + (1 - \pi) \cdot p_N(x), \tag{1}$$
where $\pi = P(r = 1 \mid x)$ denotes the class prior of the positive label.
Notably, the true relevance $r$ is hidden and only the partially labeled $y$ is observed; the implicit feedback dataset follows the core PU property, i.e., all observed labels are positive feedback and correspond to true positives, while the unlabeled entries mix true negatives and unexposed positives. Therefore, the goal of PU learning in this paper is to train a binary classification model $f(x)$ that predicts the true relevance $r_{u,i}$. Let $\hat r_{u,i}$ denote the prediction of $r_{u,i}$.
Owing to this reformulation of the data distribution, we can define the ideal loss for implicit feedback, which matches the ideal loss used in explicit feedback studies if we could observe all pairs of users’ preferences [2]. This ideal loss decomposes as
$$\mathbb{E}[L_{ideal}] = \mathbb{E}_{p_D}\big[\delta_r(\hat r)\big] = \pi\,\mathbb{E}_{p_P}\big[\delta_1(\hat r)\big] + (1 - \pi)\,\mathbb{E}_{p_N}\big[\delta_0(\hat r)\big], \tag{2}$$
where $\delta_r(\hat r) = \ell(r, \hat r)$ and $\ell(\cdot)$ is a pre-specified loss, e.g., the cross-entropy loss $\ell(r_{u,i}, \hat r_{u,i}) = -r_{u,i}\log \hat r_{u,i} - (1 - r_{u,i})\log(1 - \hat r_{u,i})$. For brevity, we write the user–item level loss as $\delta_{u,i,r_{u,i}}$. Empirically, we aim to train a model that minimizes the risk
$$L_{ideal} = \frac{1}{|\mathcal{D}|}\sum_{(u,i)\in\mathcal{D}} \big[ r_{u,i}\,\delta_{u,i,1} + (1 - r_{u,i})\,\delta_{u,i,0} \big]. \tag{3}$$
This ideal loss aligns with those defined in the explicit feedback literature if all $r_{u,i}$ were observable. In practice, however, $L_{ideal}$ is infeasible because we cannot observe all $r_{u,i}$. A common workaround is to construct an unbiased estimator of $L_{ideal}$, like IPS [22], but this is challenging here, since classical debiasing methods for explicit feedback require true negative samples, which are unavailable in implicit feedback.
Conversely, if $o_{u,i}$ can be properly inferred, then the implicit feedback data on the exposed events $\{(u,i) \in \mathcal{D} : o_{u,i} = 1\}$ become explicit feedback data by Assumption 1, and thus the debiasing approaches for explicit feedback can be applied to implicit feedback. This insight inspires us to decompose the whole task into two phases: (1) impute the unobserved exposure status $o_{u,i}$; (2) perform debiased learning using the imputed exposure information. We detail this framework in the next section.

4. Proposed Method

In this section, we propose a two-phase debiasing framework for implicit feedback that provides a flexible paradigm seamlessly linking the debiasing approaches for explicit feedback and those for implicit feedback.

4.1. PU Learning for Implicit Feedback

For data with a PU structure, the unlabeled data distribution is a mixture of the positive and negative data distributions:
$$p_U(x) = \pi\, p_P(x) + (1 - \pi)\, p_N(x). \tag{4}$$
In practice, however, we only observe the label $y$ rather than the true relevance $r$. Therefore, we alternatively calculate $\delta_y(\hat r) = \ell(y, \hat r)$, simplifying the notation of the user–item level loss to $\delta_{u,i,y_{u,i}}$. Based on this formulation, the error on the mixed negatives can be expressed in terms of the unlabeled and positive entries:
$$(1 - \pi)\,\mathbb{E}_{p_N}\big[\delta_0(\hat r)\big] = \mathbb{E}_{p_U}\big[\delta_0(\hat r)\big] - \pi\,\mathbb{E}_{p_P}\big[\delta_0(\hat r)\big]. \tag{5}$$
We subsequently derive a surrogate of the ideal risk expressed in terms of PU implicit feedback:
$$\mathbb{E}_{p_D}[L_{PU}] = \pi\left(\mathbb{E}_{p_P}\big[\delta_1(\hat r)\big] - \mathbb{E}_{p_P}\big[\delta_0(\hat r)\big]\right) + \mathbb{E}_{p_U}\big[\delta_0(\hat r)\big]. \tag{6}$$
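To make this surrogate concrete, the sketch below evaluates an empirical counterpart of Equation (6) from model logits in PyTorch. It is a minimal illustration under our own naming, assuming the class prior $\pi$ is known and supplied externally; it is not the paper's reference implementation.

```python
import torch
import torch.nn.functional as F

def pu_risk(logits_pos, logits_unl, prior):
    """Empirical counterpart of Eq. (6):
    pi * (E_P[delta_1] - E_P[delta_0]) + E_U[delta_0].
    logits_pos / logits_unl: model scores on labeled-positive and unlabeled pairs;
    prior: assumed class prior pi = P(r = 1) (hypothetical, externally supplied)."""
    delta1_p = F.binary_cross_entropy_with_logits(
        logits_pos, torch.ones_like(logits_pos), reduction="none")
    delta0_p = F.binary_cross_entropy_with_logits(
        logits_pos, torch.zeros_like(logits_pos), reduction="none")
    delta0_u = F.binary_cross_entropy_with_logits(
        logits_unl, torch.zeros_like(logits_unl), reduction="none")
    return prior * (delta1_p.mean() - delta0_p.mean()) + delta0_u.mean()
```

In practice, the non-negative variants mentioned in Section 2 additionally clip the estimated negative risk at zero to prevent overfitting.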

4.2. Two-Phase Debiasing Framework

The proposed two-phase debiasing framework consists of two stages: the treatment imputation phase and the debiasing phase. Figure 1 presents the structure of the proposed framework.
In the first, treatment imputation phase, unlike previous studies [25,27] that used negative sampling of $y_{u,i}$, we perform positive sampling of $o_{u,i}$. Then, for the exposed events $\{(u,i) \in \mathcal{D} : o_{u,i} = 1\}$, we know the exact value of $r_{u,i} = y_{u,i}$ according to Assumption 1. This leads to a data structure similar to explicit feedback, as shown in the middle of Figure 1. The advantages of this phase are threefold: (1) it provides a flexible paradigm that links the debiasing methods for explicit feedback with those for implicit feedback, whereas existing methods like [14,18,32] had no DR losses due to the lack of treatment and consequently needed to handle the missingness of $r_{u,i}$ and $o_{u,i}$ simultaneously; (2) it provides a chance to obtain more accurate estimates of the propensities; (3) it gives a more accurate imputation of $r_{u,i}$ through the fact that $y_{u,i} = o_{u,i} r_{u,i}$.
After imputing the treatment, although we know the explicit values of $r_{u,i}$ for exposed events, the values of $r_{u,i}$ for unexposed events are still missing. Ignoring this missingness incurs bias and leads to sub-optimal performance; thus, the debiasing phase is necessary. Owing to the treatment imputation phase, all previous debiasing methods for explicit feedback, such as DR-JL [37] and MRDR [38], can be applied in parallel to implicit feedback. More importantly, we also consider the existence of unmeasured confounders, which invalidate most existing debiasing methods and have not been studied in the literature on implicit feedback.
Another potential strength is accounting for unmeasured confounders in the debiasing phase. The treatment imputation phase inevitably introduces bias from inaccurate imputation, which degrades prediction performance. We can reasonably regard this bias as part of the influence of unmeasured confounders. On the other hand, unmeasured information should not dominate the observed information. Thus, while the imputation phase may introduce some bias, it provides good starting points for the debiasing phase.

4.3. Treatment Imputation Phase

The treatment imputation phase targets the problem of missing treatment, i.e., the missing $o_{u,i}$ and, as a result, the inability to identify truly negative samples among exposed events. As mentioned in Section 1, one of the biggest differences between explicit and implicit feedback is the mixture of truly negative and potentially positive feedback. Fortunately, if we had the true value of $o_{u,i}$, we could identify the true negative samples among the exposed events according to Assumption 1, and this inspires the first phase, treatment imputation.
Since, according to Assumption 1, we only have samples with $o_{u,i} = 1$ in the group with $y_{u,i} = 1$, and we want to infer the value of $o_{u,i}$ for the observed negative samples, treatment imputation is a one-class classification (OCC) problem. A range of well-established OCC methods can be utilized to solve it: one-class support vector machines (OC-SVM) [67], support vector data description (SVDD) [46], density-based methods [68], principal component analysis (PCA) [69], (one-class) K-means [70], autoencoder-based reconstruction methods [71], density-based isolation forests [72], etc. These methods all aim to capture the characteristic pattern of positive samples to classify unlabeled data into the labeled class or the unlabeled class. In this work, we adopt SVDD, one of the most widely used OCC methods. The idea behind SVDD is to find an optimal hyper-sphere of radius $R$ that includes as many positive samples as possible. This hyper-sphere has a clear physical interpretation: its center $a$ represents the "typical feature vector" of exposed user–item pairs, such as common attributes of items that are recommended and interacted with, while the radius $R$ quantifies the similarity threshold for exposure. This transparency allows tracing why an unlabeled sample is imputed as a true negative.
In our framework, we first train an SVDD classifier and then predict the exposure status of the negative samples. If a negative sample falls inside the hyper-sphere, we label it with $o_{u,i} = 1$, and we set $o_{u,i} = 0$ for those outside. Then, for the newly labeled samples with $o_{u,i} = 1$, we know $r_{u,i} = 0$ according to Assumption 1, and we can form a new training set from these samples together with the observed positive samples, which completes the treatment imputation phase. Essentially, the SVDD classifier assumes that user–item pairs with similar features should have similar exposure status, which parallels the idea of collaborative filtering used in modern recommender systems, where similar users are assumed to behave similarly.
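As a concrete illustration, the following sketch realizes this step with scikit-learn's `OneClassSVM`, which with an RBF kernel is equivalent to SVDD; the function name and the feature-matrix interface are our illustrative assumptions rather than the exact implementation used in the experiments.

```python
import numpy as np
from sklearn.svm import OneClassSVM

def impute_exposure(x_clicked, x_unclicked, nu=0.5):
    """Phase 1 sketch: clicked pairs have o = 1 by Assumption 1, so we fit a
    one-class boundary on their features and impute exposure for unclicked pairs.
    `nu` upper-bounds the fraction of training (clicked) pairs left outside."""
    occ = OneClassSVM(kernel="rbf", nu=nu).fit(x_clicked)
    inside = occ.predict(x_unclicked) == 1   # +1: inside the hyper-sphere
    o_hat = inside.astype(int)               # imputed exposure for pairs with y = 0
    # Pairs with y = 0 and o_hat = 1 are true negatives (r = 0) by Assumption 1.
    return o_hat
```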

4.4. Debiasing Phase

As discussed in Section 3, debiasing approaches previously developed for explicit feedback can be applied to implicit feedback after the treatment has been imputed. To accommodate a wider range of application scenarios, we consider two cases in the debiasing phase: the absence of unmeasured confounders and the presence of unmeasured confounders.

4.4.1. In the Absence of Unmeasured Confounders

When no unmeasured confounders are present, we develop our method based on propensity-based approaches for learning from PU data under missing-at-random (MAR) or selected-at-random (SAR) labeling mechanisms [14]. In PU learning, we define the propensity score as the probability of being labeled positive conditional on being a true positive,
$$p_{u,i} = P(y_{u,i} = 1 \mid r_{u,i} = 1, x_{u,i}),$$
since the exposure indicator is not fully observable; we will later show that this equals the standard propensity defined in the causal inference framework under certain conditions.
Recall that for explicit feedback training, the inverse propensity score (IPS) [22] and doubly robust (DR) methods [37,38,73] are the two main debiasing strategies for addressing the MNAR problem. However, their formulations cannot be directly applied to implicit feedback exhibiting the PU problem, because the probability of labeling negative examples is zero [35], i.e., the exposure indicator is not fully observed [14]. Thus, we need to adjust the weighting strategy. The key insight is that each labeled positive represents $1/p_{u,i}$ true positives, of which $1/p_{u,i} - 1$ remain unexposed and thus unlabeled, while true negatives are never labeled. Therefore, based on Equation (6) and Assumption 1, the IPS loss is formulated as
$$L_{IPS} = \frac{1}{|\mathcal{D}|}\sum_{(u,i)\in\mathcal{D}} \left[ y_{u,i}\Big( \frac{1}{\hat p_{u,i}}\,\delta_{u,i,1} + \Big(1 - \frac{1}{\hat p_{u,i}}\Big)\delta_{u,i,0} \Big) + (1 - y_{u,i})\,\delta_{u,i,0} \right] = \frac{1}{|\mathcal{D}|}\sum_{(u,i)\in\mathcal{D}} \left[ \frac{y_{u,i}}{\hat p_{u,i}}\,\delta_{u,i,1} + \Big(1 - \frac{y_{u,i}}{\hat p_{u,i}}\Big)\delta_{u,i,0} \right]. \tag{7}$$
As discussed in Section 1, some PU learning studies have derived the corresponding IPS estimator but have not addressed it specifically for debiasing implicit feedback in recommender systems.
We define $\delta^{imp}_{u,i} = \ell(r^{imp}_{u,i}, \hat r_{u,i})$ as the imputed loss, where $r^{imp}_{u,i}$ is the imputed value of $r_{u,i}$ provided by an imputation model. The corresponding loss can then be decomposed as
$$\mathbb{E}_{p_D}[L_{DR}] = \pi\left(\mathbb{E}_{p_P}\big[\delta_1(\hat r) - \delta^{imp}\big] - \mathbb{E}_{p_P}\big[\delta_0(\hat r) - \delta^{imp}\big]\right) + \mathbb{E}_{p_U}\big[\delta_0(\hat r) - \delta^{imp}\big] + \mathbb{E}_{p_D}\big[\delta^{imp}\big] \tag{8}$$
by Equations (1) and (4). Note that the imputation loss $\delta^{imp}_{u,i}$ should not be canceled out here; we attach the corresponding label subscript, writing $\delta^{imp}_{u,i,1}$ and $\delta^{imp}_{u,i,0}$, to avoid confusion.
Unlike IPS, the DR loss has not previously been discussed in the PU learning setting, and we propose to construct it as follows:
$$L_{DR} = \frac{1}{|\mathcal{D}|}\sum_{(u,i)\in\mathcal{D}} \left[ y_{u,i}\Big( \frac{1}{\hat p_{u,i}}\big(\delta_{u,i,1} - \delta^{imp}_{u,i,1}\big) + \Big(1 - \frac{1}{\hat p_{u,i}}\Big)\big(\delta_{u,i,0} - \delta^{imp}_{u,i,0}\big) \Big) + (1 - y_{u,i})\big(\delta_{u,i,0} - \delta^{imp}_{u,i,0}\big) + \delta^{imp}_{u,i} \right] = \frac{1}{|\mathcal{D}|}\sum_{(u,i)\in\mathcal{D}} \left[ \frac{y_{u,i}}{\hat p_{u,i}}\big(\delta_{u,i,1} - \delta^{imp}_{u,i,1}\big) + \Big(1 - \frac{y_{u,i}}{\hat p_{u,i}}\Big)\big(\delta_{u,i,0} - \delta^{imp}_{u,i,0}\big) + \delta^{imp}_{u,i} \right]. \tag{9}$$
Note that the imputation loss $\delta^{imp}_{u,i}$ should not be canceled out here, since it implicitly corresponds to a label $y_{u,i}$, and the two label values cannot co-occur for a single observed user–item datum. It is well known that the unbiasedness of both IPS and DR relies on the assumption of no unmeasured confounders, i.e.,
Assumption 2.
$o_{u,i} \perp r_{u,i} \mid x_{u,i}$.
Under Assumption 2, from the perspective of causal inference for explicit feedback, $L_{IPS}$ is unbiased with respect to $L_{ideal}$ if $\hat p_{u,i} = p_{u,i}$, and $L_{DR}$ is unbiased if either $\hat p_{u,i} = p_{u,i}$ or $\delta^{imp}_{u,i} = \delta_{u,i}$ [23,57]. In addition, we note that the existing propensity score-based debiasing methods [14,18] for implicit feedback also rely on Assumptions 1 and 2. This is because they invoke the factorization
$$P(y_{u,i} = 1 \mid x_{u,i}) = P(r_{u,i} = 1 \mid x_{u,i}) \cdot P(o_{u,i} = 1 \mid x_{u,i}); \tag{10}$$
otherwise, $P(o_{u,i}\, r_{u,i} = 1 \mid x_{u,i})$ does not equal $P(r_{u,i} = 1 \mid x_{u,i}) \cdot P(o_{u,i} = 1 \mid x_{u,i})$ in general. In addition, according to Assumptions 1 and 2, we also have
$$p_{u,i} = P(y_{u,i} = 1 \mid r_{u,i} = 1, x_{u,i}) = P(o_{u,i} = 1 \mid r_{u,i} = 1, x_{u,i}) = P(o_{u,i} = 1 \mid x_{u,i}), \tag{11}$$
which matches the propensity definition in the causal recommendation framework. Then,
$$P(y_{u,i} = 1 \mid x_{u,i}) = \pi_{u,i} \cdot p_{u,i}, \tag{12}$$
where $\pi_{u,i} = P(r_{u,i} = 1 \mid x_{u,i})$.
Proposition 1.
In the absence of unmeasured confounders, i.e., under Assumptions 1 and 2, the IPS loss for implicit feedback is unbiased if $\hat p_{u,i}$ accurately estimates $p_{u,i}$.
Proof. 
Recall $\pi_{u,i} = P(r_{u,i} = 1 \mid x_{u,i})$. Under Assumptions 1 and 2, we have Equation (11) and $\mathbb{E}[y_{u,i} \mid x_{u,i}] = \pi_{u,i}\, p_{u,i}$; therefore,
$$\mathbb{E}_o[L_{IPS}] = \frac{1}{|\mathcal{D}|}\sum_{(u,i)\in\mathcal{D}} \left[ \pi_{u,i} p_{u,i}\Big( \frac{1}{\hat p_{u,i}}\,\delta_{u,i,1} + \Big(1 - \frac{1}{\hat p_{u,i}}\Big)\delta_{u,i,0} \Big) + \big(1 - \pi_{u,i} p_{u,i}\big)\,\delta_{u,i,0} \right] = \frac{1}{|\mathcal{D}|}\sum_{(u,i)\in\mathcal{D}} \left[ \frac{\pi_{u,i} p_{u,i}}{\hat p_{u,i}}\,\delta_{u,i,1} + \Big(1 - \frac{\pi_{u,i} p_{u,i}}{\hat p_{u,i}}\Big)\delta_{u,i,0} \right],$$
which equals the expectation of the ideal loss (Equation (2)) when $\hat p = p$. □
Proposition 2.
In the absence of unmeasured confounders, i.e., under Assumptions 1 and 2, the DR loss for implicit feedback is unbiased if either $\delta^{imp}_{u,i}$ accurately estimates $\delta_{u,i}$ or $\hat p_{u,i}$ accurately estimates $p_{u,i}$.
Proof. 
Similarly, if there is no unmeasured confounding and the relationship between explicit and implicit feedback is $y_{u,i} = o_{u,i} \cdot r_{u,i}$, then
$$\mathbb{E}_o[L_{DR}] = \frac{1}{|\mathcal{D}|}\sum_{(u,i)\in\mathcal{D}} \left[ \frac{\pi_{u,i} p_{u,i}}{\hat p_{u,i}}\big(\delta_{u,i,1} - \delta^{imp}_{u,i,1}\big) + \Big(1 - \frac{\pi_{u,i} p_{u,i}}{\hat p_{u,i}}\Big)\big(\delta_{u,i,0} - \delta^{imp}_{u,i,0}\big) + \pi_{u,i}\,\delta^{imp}_{u,i,1} + (1 - \pi_{u,i})\,\delta^{imp}_{u,i,0} \right],$$
where the expectation of $\delta^{imp}_{u,i}$ implicitly corresponds to a relevance $r_{u,i}$ and is computed analogously to that of the ideal loss. The bias of $L_{DR}$ equals
$$\mathrm{Bias}(L_{DR}) = \mathbb{E}_o[L_{DR}] - \mathbb{E}_o[L_{ideal}] = \frac{1}{|\mathcal{D}|}\sum_{(u,i)\in\mathcal{D}} \left[ \pi_{u,i}\Big(\frac{p_{u,i}}{\hat p_{u,i}} - 1\Big)\big(\delta_{u,i,1} - \delta^{imp}_{u,i,1}\big) + \pi_{u,i}\Big(1 - \frac{p_{u,i}}{\hat p_{u,i}}\Big)\big(\delta_{u,i,0} - \delta^{imp}_{u,i,0}\big) \right].$$
Therefore, the bias reduces to zero when $\hat p = p$ or $\delta^{imp} = \delta$, establishing the double robustness property. □
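For concreteness, a minimal PyTorch sketch of the two estimators in Equations (7) and (9) is given below; the tensor names are illustrative assumptions, and the per-pair losses $\delta$ (and imputed losses $\delta^{imp}$) are assumed to be precomputed, e.g., as cross-entropy against labels 1 and 0.

```python
import torch

def ips_loss(y, p_hat, delta1, delta0):
    """Eq. (7): IPS loss for PU implicit feedback.
    y: click indicators in {0, 1}; p_hat: estimated propensities;
    delta1/delta0: losses of the prediction against labels 1 and 0."""
    w = y / p_hat
    return (w * delta1 + (1.0 - w) * delta0).mean()

def dr_loss(y, p_hat, delta1, delta0, delta1_imp, delta0_imp, delta_imp):
    """Eq. (9): DR loss; delta1_imp/delta0_imp come from the imputation model,
    and delta_imp is the standalone imputed loss attached to each pair."""
    w = y / p_hat
    return (w * (delta1 - delta1_imp)
            + (1.0 - w) * (delta0 - delta0_imp)
            + delta_imp).mean()
```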
Moreover, a variance analysis of the IPS estimator was not conducted in previous studies on positive-unlabeled learning [35,74], let alone of the DR estimator. Thus, we provide the variances of the IPS and DR estimators in the following propositions.
Proposition 3.
In the absence of unmeasured confounders, under Assumptions 1 and 2, the variance of the IPS loss for implicit feedback is
$$\mathbb{V}[L_{IPS}] = \frac{1}{|\mathcal{D}|^2}\sum_{(u,i)\in\mathcal{D}} \frac{\pi_{u,i}\,p_{u,i}\,(1 - \pi_{u,i} p_{u,i})}{\hat p_{u,i}^2}\,\big(\delta_{u,i,1} - \delta_{u,i,0}\big)^2. \tag{13}$$
Proof. 
First, we define $Z_{u,i} = y_{u,i}\big( \frac{1}{\hat p_{u,i}}\,\delta_{u,i,1} + (1 - \frac{1}{\hat p_{u,i}})\,\delta_{u,i,0} \big) + (1 - y_{u,i})\,\delta_{u,i,0}$. Its variance can be written as
$$\mathbb{V}[Z_{u,i}] = \underbrace{\mathbb{E}\big[Z_{u,i}^2\big]}_{(a)} - \underbrace{\big(\mathbb{E}[Z_{u,i}]\big)^2}_{(b)}.$$
For simplicity, we drop the subscript $(u,i)$ in the remainder of the proof. Under Assumptions 1 and 2, we have $y \sim \mathrm{Ber}(\pi p)$. Therefore,
$$Z^2 = \frac{y^2}{\hat p^2}\,\delta_1^2 + \frac{2y}{\hat p}\Big(1 - \frac{y}{\hat p}\Big)\delta_1\delta_0 + \Big(1 - \frac{y}{\hat p}\Big)^2\delta_0^2 = \frac{y}{\hat p^2}\,\delta_1^2 + 2\Big(\frac{y}{\hat p} - \frac{y}{\hat p^2}\Big)\delta_1\delta_0 + \Big(1 - \frac{2y}{\hat p} + \frac{y}{\hat p^2}\Big)\delta_0^2,$$
where we use $y^2 = y$. Taking expectations leads to
$$(a) = \delta_0^2 + \frac{2\pi p}{\hat p}\big(\delta_1\delta_0 - \delta_0^2\big) + \frac{\pi p}{\hat p^2}\big(\delta_1^2 + \delta_0^2 - 2\delta_1\delta_0\big).$$
Next, $(b)$ is calculated as
$$(b) = \Big( \frac{\pi p}{\hat p}\,\delta_1 + \Big(1 - \frac{\pi p}{\hat p}\Big)\delta_0 \Big)^2 = \frac{\pi^2 p^2}{\hat p^2}\,\delta_1^2 + \Big(1 - \frac{\pi p}{\hat p}\Big)^2\delta_0^2 + \frac{2\pi p}{\hat p}\Big(1 - \frac{\pi p}{\hat p}\Big)\delta_1\delta_0.$$
Therefore,
$$\mathbb{V}[Z_{u,i}] = (a) - (b) = \frac{\pi_{u,i}\,p_{u,i}\,(1 - \pi_{u,i} p_{u,i})}{\hat p_{u,i}^2}\,\big(\delta_{u,i,1} - \delta_{u,i,0}\big)^2.$$
Since $L_{IPS} = \frac{1}{|\mathcal{D}|}\sum_{(u,i)\in\mathcal{D}} Z_{u,i}$ and the $Z_{u,i}$ are independent random variables, we have $\mathbb{V}[L_{IPS}] = \frac{1}{|\mathcal{D}|^2}\sum_{(u,i)\in\mathcal{D}} \mathbb{V}[Z_{u,i}]$. □
Similarly, we show the variance of the DR estimator in the following proposition.
Proposition 4.
In the absence of unmeasured confounders, under Assumptions 1 and 2, the variance of the DR loss for implicit feedback is
$$\mathbb{V}[L_{DR}] = \frac{1}{|\mathcal{D}|^2}\sum_{(u,i)\in\mathcal{D}} \frac{\pi_{u,i}\,p_{u,i}\,(1 - \pi_{u,i} p_{u,i})}{\hat p_{u,i}^2}\,\Big[\big(\delta_{u,i,1} - \delta^{imp}_{u,i,1}\big) - \big(\delta_{u,i,0} - \delta^{imp}_{u,i,0}\big)\Big]^2. \tag{14}$$
Proof. 
Similarly, we define $Z_{u,i} = \frac{y_{u,i}}{\hat p_{u,i}}\big(\delta_{u,i,1} - \delta^{imp}_{u,i,1}\big) + \big(1 - \frac{y_{u,i}}{\hat p_{u,i}}\big)\big(\delta_{u,i,0} - \delta^{imp}_{u,i,0}\big) + \delta^{imp}_{u,i}$. Note that $\delta^{imp}$ implicitly corresponds to a relevance label, so $\mathbb{E}(\delta^{imp}) = \pi\,\delta^{imp}_1 + (1 - \pi)\,\delta^{imp}_0$. For simplicity, we omit the subscript $(u,i)$ and let
$$C = \delta^{imp} + \delta_0 - \delta^{imp}_0, \qquad D = \frac{1}{\hat p}\Big[\big(\delta_1 - \delta^{imp}_1\big) - \big(\delta_0 - \delta^{imp}_0\big)\Big],$$
so that $Z = C + yD$ and $Z^2 = C^2 + 2CyD + yD^2$, using $y^2 = y$. Substituting $\mathbb{E}(\delta^{imp})$ into $C$ gives
$$C = \pi\,\delta^{imp}_1 + (1 - \pi)\,\delta^{imp}_0 + \delta_0 - \delta^{imp}_0 = \delta_0 + \pi\big(\delta^{imp}_1 - \delta^{imp}_0\big).$$
Now the first and second moments are
$$\mathbb{E}_o[Z] = C + \pi p\, D = \delta_0 + \pi\big(\delta^{imp}_1 - \delta^{imp}_0\big) + \frac{\pi p}{\hat p}\Big[\big(\delta_1 - \delta^{imp}_1\big) - \big(\delta_0 - \delta^{imp}_0\big)\Big], \qquad \mathbb{E}_o\big[Z^2\big] = C^2 + 2C\pi p\, D + \pi p\, D^2.$$
Then, the variance (with respect to $o$) is
$$\mathbb{V}_o(Z) = \mathbb{E}_o\big[Z^2\big] - \big(\mathbb{E}_o[Z]\big)^2 = \pi p(1 - \pi p)\, D^2.$$
Substituting $D$ explicitly, we get
$$\mathbb{V}_o(Z) = \frac{\pi p(1 - \pi p)}{\hat p^2}\Big[\big(\delta_1 - \delta^{imp}_1\big) - \big(\delta_0 - \delta^{imp}_0\big)\Big]^2.$$
For the variance of $L_{DR}$, we sum the variances of the independent $Z_{u,i}$ and scale by $1/|\mathcal{D}|^2$. □
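The unbiasedness and variance expressions above can be checked numerically. The following illustrative simulation (entirely ours, with arbitrary synthetic values) draws $y_{u,i} \sim \mathrm{Ber}(\pi_{u,i}\, p_{u,i})$ as in the proofs and verifies Propositions 1 and 3 for a correctly specified propensity:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 2000, 5000
pi = rng.uniform(0.2, 0.8, n)        # relevance priors pi_{u,i}
p = rng.uniform(0.3, 0.9, n)         # true (and estimated) propensities
d1 = rng.uniform(0.0, 1.0, n)        # per-pair losses delta_{u,i,1}
d0 = rng.uniform(0.0, 1.0, n)        # per-pair losses delta_{u,i,0}

ideal = np.mean(pi * d1 + (1 - pi) * d0)        # expectation of L_ideal
y = rng.binomial(1, pi * p, size=(reps, n))     # y ~ Ber(pi * p) under Assumptions 1-2
w = y / p                                       # IPS weights with p_hat = p
l_ips = np.mean(w * d1 + (1 - w) * d0, axis=1)  # Eq. (7), one value per replication

print(ideal, l_ips.mean())        # nearly equal: Proposition 1 (unbiasedness)
v_theory = np.sum(pi * p * (1 - pi * p) / p**2 * (d1 - d0) ** 2) / n**2
print(v_theory, l_ips.var())      # nearly equal: Proposition 3 (variance)
```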

4.4.2. In the Presence of Unmeasured Confounders

As noted by [57,75], unmeasured confounders frequently appear in real-world recommender systems. For instance, financial status may affect user preferences and feedback but remains unobservable to most recommender systems, potentially leading to over-recommendation of cheap items due to this confounding effect [57]. Figure 2 depicts this scenario for implicit feedback within the causal inference framework, where $H$ denotes the unmeasured confounders affecting both $O$ and $R$ through the path $O \leftarrow H \rightarrow R$. In the presence of unmeasured confounders $h_{u,i}$, we specify the following Assumption 3 to replace Assumption 2.
Assumption 3.
$o_{u,i} \perp r_{u,i} \mid (x_{u,i}, h_{u,i})$, and $o_{u,i} \not\perp r_{u,i} \mid x_{u,i}$.
Clearly, Assumption 3 is a significant relaxation of Assumption 2 used in previous work [14,18]. Consequently, our framework relies only on minimal assumptions and applies to a wider range of situations. As discussed in Section 4.4.1, most existing propensity score-based methods (including IPS and DR) for both explicit and implicit feedback become invalid when unmeasured confounders prevent us from disentangling the effect of $o_{u,i}$ on $y_{u,i}$. We summarize this result in Theorem 1. The proof is provided in Appendix A.
Theorem 1.
In the presence of unmeasured confounders, the debiasing methods based on Equation (9) for implicit feedback are biased.
To address the impact of unmeasured confounders, we employ the idea of adversarial training based on the robust deconfounder (RD) framework [57]. Specifically, we define the true propensity score in the PU setting as
$$\tilde p_{u,i} = P(y_{u,i} = 1 \mid r_{u,i} = 1, x_{u,i}, h_{u,i}), \tag{15}$$
and refer to $p_{u,i} = P(y_{u,i} = 1 \mid r_{u,i} = 1, x_{u,i})$ as the nominal propensity score. Under Assumptions 1 and 3, we have
$$\tilde p_{u,i} = P(y_{u,i} = 1 \mid r_{u,i} = 1, x_{u,i}, h_{u,i}) = P(o_{u,i} = 1 \mid x_{u,i}, h_{u,i}), \tag{16}$$
analogous to Equation (11). Furthermore, under Assumption 1, denoting $\tilde\pi_{u,i} = P(r_{u,i} = 1 \mid x_{u,i}, h_{u,i})$, we have
$$P(y_{u,i} = 1 \mid x_{u,i}, h_{u,i}) = P(r_{u,i} = 1 \mid x_{u,i}, h_{u,i}) \cdot P(o_{u,i} = 1 \mid x_{u,i}, h_{u,i}) = \tilde\pi_{u,i} \cdot \tilde p_{u,i}. \tag{17}$$
Theorem 2.
In the presence of unmeasured confounders, if $\hat p_{u,i}$ accurately estimates $\tilde p_{u,i}$, then:
(a) The IPS and DR losses for explicit feedback are unbiased.
(b) Under Assumptions 1–3, the debiasing methods based on Equation (16) for implicit feedback are unbiased.
Theorem 2 indicates that the true propensity score is critical for restoring the unbiasedness of propensity-based methods for both explicit and implicit feedback. The proof is provided in Appendix A. However, obtaining an accurate estimate of $\tilde p_{u,i}$ requires access to both the measured confounders $x_{u,i}$ and the unmeasured confounders $h_{u,i}$, which is generally impossible without stringent assumptions due to the unavailability of $h_{u,i}$.
To mitigate the impact of unmeasured confounders on the estimation of the propensity score in the IPS/DR estimators, [57] employed the approach of [76] to study the robustness of recommendation debiasing under unmeasured confounders. Instead of completely removing the unmeasured confounding, the RD framework seeks bounds on $\tilde p_{u,i}$ under a mild assumption and then applies an adversarial learning technique that varies the propensity scores within these bounds. Without loss of generality, we assume $p_{u,i} = P(o_{u,i} = 1 \mid x_{u,i}) = \sigma(m(x_{u,i}))$ and $\tilde p_{u,i} = P(o_{u,i} = 1 \mid x_{u,i}, h_{u,i}) = \sigma(\tilde m(x_{u,i}, h_{u,i}))$, where $m$ and $\tilde m$ are two arbitrary functions and $\sigma(x) = \exp(x)/\{1 + \exp(x)\}$. The following Assumption 4 establishes the relationship between the nominal propensity score $p_{u,i}$ and the true propensity score $\tilde p_{u,i}$.
Assumption 4.
$|\tilde m(x_{u,i}, h_{u,i}) - m(x_{u,i})| \le \log(\Gamma)$ for some $\Gamma \ge 1$.
The hyper-parameter $\Gamma$ measures the influence of unmeasured confounding: $\Gamma = 1$ means no influence, while a larger $\Gamma$ means the model tolerates stronger impacts from unmeasured confounders. Under Assumption 4, we can derive the bounds of $\tilde w_{u,i} = 1/\tilde p_{u,i}$ given in Lemma 1. The proof is provided in Appendix A, following the logic of [57].
Lemma 1.
Under Assumption 4, we have that
$$a_{u,i} \le \tilde w_{u,i} \le b_{u,i}, \qquad a_{u,i} = 1 + \big(1/p_{u,i} - 1\big)/\Gamma, \qquad b_{u,i} = 1 + \big(1/p_{u,i} - 1\big)\,\Gamma. \tag{18}$$
Lemma 1 yields an uncertainty set for the true propensity score. Based on Lemma 1, we define
$$\mathcal{W} = \big\{ W \in \mathbb{R}_+^{|\mathcal{D}|} : \hat a_{u,i} \le w_{u,i} \le \hat b_{u,i} \big\}, \tag{19}$$
where $W = (w_{1,1}, \dots, w_{|\mathcal{U}|,|\mathcal{I}|})$, and $\hat a_{u,i}$ and $\hat b_{u,i}$ are estimates of $a_{u,i}$ and $b_{u,i}$. The parameter $\Gamma$ provides a mechanism to encode assumptions about the magnitude of unmeasured confounding. By placing explicit bounds $[a_{u,i}, b_{u,i}]$ on the propensity weights, the method makes its sensitivity to unmeasured confounders quantitatively traceable, allowing practitioners to assess how recommendations might vary under different confounding scenarios.
To mitigate the influence of unmeasured confounders, the RD-IPS loss, under Assumptions 1, 3, and 4 for the PU learning framework, is defined as
$$L_{RD\text{-}IPS} = \max_{W \in \mathcal{W}}\; \frac{1}{|\mathcal{D}|}\sum_{(u,i)\in\mathcal{D}} \Big[ y_{u,i}\big( w_{u,i}\,\delta_{u,i,1} + (1 - w_{u,i})\,\delta_{u,i,0} \big) + (1 - y_{u,i})\,\delta_{u,i,0} \Big]. \tag{20}$$
Similarly, the RD-DR loss, under Assumptions 1, 3, and 4 for the PU learning framework, is defined as
$$L_{RD\text{-}DR} = \max_{W \in \mathcal{W}}\; \frac{1}{|\mathcal{D}|}\sum_{(u,i)\in\mathcal{D}} \Big[ y_{u,i}\big( w_{u,i}\,(\delta_{u,i,1} - \delta^{imp}_{u,i,1}) + (1 - w_{u,i})\,(\delta_{u,i,0} - \delta^{imp}_{u,i,0}) \big) + (1 - y_{u,i})\big(\delta_{u,i,0} - \delta^{imp}_{u,i,0}\big) + \delta^{imp}_{u,i} \Big]. \tag{21}$$
Intuitively, the RD-IPS/RD-DR losses are robust to unmeasured confounders because they control the worst-case scenario the confounders can induce. Moreover, a similar strategy can be applied to any propensity-based method, such as DR-JL [37], MRDR [38], DR-MSE [24], etc.
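Because the objective in Equation (20) is linear in each $w_{u,i}$, the inner maximization over the box $[\hat a_{u,i}, \hat b_{u,i}]$ admits a closed form for a fixed prediction model. The sketch below is our illustrative PyTorch rendering of RD-IPS based on this observation; [57] instead realizes the maximization with adversarial gradient updates, which also covers the RD-DR case.

```python
import torch

def rd_ips_loss(y, p_hat, delta1, delta0, gamma):
    """Sketch of Eq. (20). Lemma 1 bounds 1/p_tilde in [a, b]; since the loss
    is linear in w, the inner max picks b when the coefficient y*(delta1 - delta0)
    is positive and a otherwise. gamma (>= 1) is the confounding strength Gamma."""
    a = 1.0 + (1.0 / p_hat - 1.0) / gamma
    b = 1.0 + (1.0 / p_hat - 1.0) * gamma
    coef = y * (delta1 - delta0)
    w = torch.where(coef > 0, b, a).detach()   # adversarial weights, no gradient
    return (y * (w * delta1 + (1.0 - w) * delta0) + (1.0 - y) * delta0).mean()
```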
We now analyze the generalization bounds of the RD-IPS and RD-DR loss functions. Let $\mathcal{H}$ be the finite hypothesis space of the prediction model $\hat r$. For clarity, for any prediction model $\hat r^h \in \mathcal{H}$, we write $L_{ideal}$ as $L_{ideal}(\hat r^h)$ to highlight its dependence on $\hat r^h$, and similarly write $L_{RD\text{-}IPS}(\hat r^h)$ and $L_{RD\text{-}DR}(\hat r^h)$.
Theorem 3.
Suppose that $\tilde w_{u,i} \in [\hat a_{u,i}, \hat b_{u,i}]$, $\delta_{u,i} \le C_1$, and $\tilde w_{u,i} \le C_2$. Then for any prediction model $\hat r^h \in \mathcal{H}$ and $\eta > 0$, with probability at least $1 - \eta$,
$$L_{ideal}(\hat r^h) \le L_{RD\text{-}IPS}(\hat r^h) + C_1(C_2 + 1)\sqrt{\frac{2\log(|\mathcal{H}|/\eta)}{|\mathcal{D}|}}. \tag{22}$$
In addition, given the imputed errors $\delta^{imp}_{u,i}$ and assuming that $|\delta_{u,i} - \delta^{imp}_{u,i}| \le C_3$, with probability at least $1 - \eta$,
$$L_{ideal}(\hat r^h) \le L_{RD\text{-}DR}(\hat r^h) + C_3(C_2 + 1)\sqrt{\frac{2\log(|\mathcal{H}|/\eta)}{|\mathcal{D}|}}. \tag{23}$$
Proof. 
For any prediction model $\hat r^h \in \mathcal{H}$, since $\tilde w_{u,i} \in [\hat a_{u,i}, \hat b_{u,i}]$, we have
$$L_{ideal}(\hat r^h) - L_{RD\text{-}IPS}(\hat r^h) \le L_{ideal}(\hat r^h) - \frac{1}{|\mathcal{D}|}\sum_{(u,i)\in\mathcal{D}} \Big[ y_{u,i}\big( \tilde w_{u,i}\,\delta_{u,i,1} + (1 - \tilde w_{u,i})\,\delta_{u,i,0} \big) + (1 - y_{u,i})\,\delta_{u,i,0} \Big] = \frac{1}{|\mathcal{D}|}\sum_{(u,i)\in\mathcal{D}} \Big[ \big(r_{u,i} - y_{u,i}\tilde w_{u,i}\big)\delta_{u,i,1} + \big(y_{u,i}\tilde w_{u,i} - r_{u,i}\big)\delta_{u,i,0} \Big].$$
Under Assumption 3, the expectation of $(\delta_{u,i,1} - \delta_{u,i,0})(r_{u,i} - y_{u,i}\tilde w_{u,i})$ is 0. Note that $|(\delta_{u,i,1} - \delta_{u,i,0})(r_{u,i} - y_{u,i}\tilde w_{u,i})| \le C_1(C_2 + 1)$, since $r_{u,i}, y_{u,i} \in \{0,1\}$. Applying Hoeffding's inequality yields
$$P\left( \frac{1}{|\mathcal{D}|}\sum_{(u,i)\in\mathcal{D}} \big(r_{u,i} - y_{u,i}\tilde w_{u,i}\big)\big(\delta_{u,i,1} - \delta_{u,i,0}\big) > \epsilon \right) \le \exp\left( -\frac{\epsilon^2 |\mathcal{D}|}{2 C_1^2 (C_2 + 1)^2} \right).$$
Then
$$\begin{aligned} P\big( L_{ideal}(\hat r^h) - L_{RD\text{-}IPS}(\hat r^h) \le \epsilon \big) &= 1 - P\big( L_{ideal}(\hat r^h) - L_{RD\text{-}IPS}(\hat r^h) > \epsilon \big) \\ &\ge 1 - P\Big( \sup_{\hat r^h \in \mathcal{H}} \big[ L_{ideal}(\hat r^h) - L_{RD\text{-}IPS}(\hat r^h) \big] > \epsilon \Big) \\ &\ge 1 - \sum_{h=1}^{|\mathcal{H}|} P\big( L_{ideal}(\hat r^h) - L_{RD\text{-}IPS}(\hat r^h) > \epsilon \big) \\ &\ge 1 - \sum_{h=1}^{|\mathcal{H}|} P\left( \frac{1}{|\mathcal{D}|}\sum_{(u,i)\in\mathcal{D}} \big(r_{u,i} - y_{u,i}\tilde w_{u,i}\big)\big(\delta_{u,i,1} - \delta_{u,i,0}\big) > \epsilon \right) \\ &\ge 1 - |\mathcal{H}| \exp\left( -\frac{\epsilon^2 |\mathcal{D}|}{2 C_1^2 (C_2 + 1)^2} \right). \end{aligned}$$
Letting $|\mathcal{H}| \exp\{-\epsilon^2 |\mathcal{D}| / (2 C_1^2 (C_2 + 1)^2)\} = \eta$ leads to
$$\epsilon = C_1(C_2 + 1)\sqrt{\frac{2\log(|\mathcal{H}|/\eta)}{|\mathcal{D}|}}.$$
Thus, with probability at least $1 - \eta$,
$$L_{ideal}(\hat r^h) \le L_{RD\text{-}IPS}(\hat r^h) + C_1(C_2 + 1)\sqrt{\frac{2\log(|\mathcal{H}|/\eta)}{|\mathcal{D}|}}.$$
Next, we turn to the generalization error bound of RD-DR. Observe that
$$L_{ideal}(\hat r^h) - L_{RD\text{-}DR}(\hat r^h) \le L_{ideal}(\hat r^h) - \frac{1}{|\mathcal{D}|}\sum_{(u,i)\in\mathcal{D}} \Big[ y_{u,i}\big( \tilde w_{u,i}(\delta_{u,i,1} - \delta^{imp}_{u,i,1}) + (1 - \tilde w_{u,i})(\delta_{u,i,0} - \delta^{imp}_{u,i,0}) \big) + (1 - y_{u,i})\big(\delta_{u,i,0} - \delta^{imp}_{u,i,0}\big) + \delta^{imp}_{u,i} \Big] = \frac{1}{|\mathcal{D}|}\sum_{(u,i)\in\mathcal{D}} \Big[ \big(r_{u,i} - y_{u,i}\tilde w_{u,i}\big)\big(\delta_{u,i,1} - \delta^{imp}_{u,i,1}\big) + \big(y_{u,i}\tilde w_{u,i} - r_{u,i}\big)\big(\delta_{u,i,0} - \delta^{imp}_{u,i,0}\big) \Big].$$
Given the imputed errors $\delta^{imp}_{u,i}$ and assuming $|\delta_{u,i} - \delta^{imp}_{u,i}| \le C_3$ (such an upper bound exists since $\delta \in [0,1]$), the expectation of $(\delta_{u,i} - \delta^{imp}_{u,i})(r_{u,i} - y_{u,i}\tilde w_{u,i})$ is 0 under Assumption 3, and $|(\delta_{u,i} - \delta^{imp}_{u,i})(r_{u,i} - y_{u,i}\tilde w_{u,i})| \le C_3(C_2 + 1)$. By exactly the same argument as in the IPS case, we obtain that with probability at least $1 - \eta$,
$$L_{ideal}(\hat r^h) \le L_{RD\text{-}DR}(\hat r^h) + C_3(C_2 + 1)\sqrt{\frac{2\log(|\mathcal{H}|/\eta)}{|\mathcal{D}|}}. \qquad \square$$
Theorem 3 holds for any $\hat r^h$, including the minimizers $\hat r^{h*} = \arg\min_{\hat r^h} L_{RD\text{-}IPS}(\hat r^h)$ and $\hat r^{h\dagger} = \arg\min_{\hat r^h} L_{RD\text{-}DR}(\hat r^h)$. It shows that the generalization error of the prediction model trained with the RD-IPS or RD-DR loss is bounded by the corresponding loss plus a negligible term. The negligible term vanishes at a rate of $O(1/\sqrt{|\mathcal{D}|})$ as $|\mathcal{D}|$ goes to infinity, which suggests that the RD-IPS or RD-DR loss itself can control the generalization bound well, even in the presence of unobserved confounders. This theoretically demonstrates the effectiveness of the proposed method.
Overall, the structural design of the proposed two-phase framework clarifies how the estimator operates in the presence of missing exposures and unmeasured confounding. The treatment imputation module reconstructs latent exposure patterns by identifying, via SVDD one-class classification, user–item pairs that align geometrically with the observed logging mechanism, thereby producing an expanded training sample that reflects the selection dynamics of the underlying system. Although such reconstruction may introduce moderate discrepancies relative to the true exposure distribution, the subsequent debiasing phase explicitly regulates the influence of unmeasured confounding through bounded inverse-propensity adjustments. These bounds determine how uncertainty in the exposure mechanism propagates into the weighted optimization objective and compensate for potential inaccuracies created during imputation. Conversely, the debiasing phase benefits from the imputation phase because effective correction requires initial propensity values that retain meaningful structure from the logged data rather than being dominated by unobserved variation. The two phases thus form a complementary procedure in which the imputation step provides informative starting values and the debiasing step refines them to achieve a controlled adjustment of both exposure noise and hidden confounding. Since both phases operate at the level of exposure modeling and weighting rather than depending on specific prediction architectures, the framework integrates seamlessly with propensity-based estimators originally designed for partially observed explicit feedback, yielding a coherent and transparent estimation pipeline.

5. Experiments

In this section, several experiments are conducted on real-world datasets to verify the effectiveness of the proposed two-phase implicit feedback debiasing framework, which combines SVDD for treatment imputation and adversarial learning for debiasing. We aim to answer the following research questions (RQs):
RQ1. 
Does the proposed method outperform the existing debiasing methods on real-world datasets?
RQ2. 
How does each of the stages affect the debiasing performance of the proposed method?
RQ3. 
Does our method stably outperform the baselines across different SVDD sampling ratios for positive treatment imputation?
RQ4. 
How does the adversarial strength affect the debiasing performance?
RQ5. 
Given the model complexity, is there a potential overfitting issue?

5.1. Experimental Setup

  • Datasets. Measuring debiasing performance requires a dataset containing both missing-not-at-random (MNAR) ratings and missing-at-random (MAR) ratings. Three public datasets meet this requirement. The first is Coat Shopping (https://www.cs.cornell.edu/~schnabts/mnar/, accessed on 3 November 2025), which contains 6960 MNAR ratings and 4640 MAR ratings in total, both generated by 290 users on 300 items. Each user rates their favorite 24 products to generate the former, while each user randomly rates 16 products to make up the latter. The second is Yahoo! R3 (https://www.kaggle.com/datasets/limitiao/yahoor3, accessed on 3 November 2025), which contains ratings from 15,400 users on 1000 items. Each user rates several items to generate the 311,704 MNAR ratings, while the first 5400 users are asked to randomly rate 10 items each, making up the 54,000 MAR ratings. The third is KuaiRec (https://kuairec.com/, accessed on 3 November 2025), a public large-scale dataset collected from a video-sharing platform, which consists of 4,676,570 video-watching-ratio records from 1411 users and 3327 videos. To further demonstrate the generalizability of our method, we utilize its unbiased portion, in which a subset of users is asked to rate uniformly sampled items, providing 13,454 MAR ratings and leaving 201,171 interactions to serve as MNAR data.
  • Pre-processing. The rating prediction task differs between the implicit and explicit feedback settings in that, under implicit feedback, the rating is unobservable when the user does not like the item. Following previous studies [37,38,59], we binarize the ratings as negative if the rating is less than 3 and as positive otherwise. We therefore remove the samples with ratings less than 3 from the three datasets, leaving 3622 positive samples for the Coat dataset, 174,208 positive samples for the Yahoo! R3 dataset, and 105,186 positive samples for the KuaiRec dataset.
  • Baselines. To validate the effectiveness of the proposed debiasing framework, we compare our methods with the following baselines:
  • Base model: Matrix Factorization (MF) [77] is frequently used as the base model in debiasing recommendation studies.
  • WMF: WMF [15] is a classic method for implicit feedback. It up-weights the loss on all positive feedback to reflect greater confidence in positive feedback than in negative feedback.
  • BPR: BPR [42] is another classic method for implicit feedback. It assumes that unclicked items are less preferred than clicked items and, therefore, maximizes the posterior probability under this assumption.
  • Rel-MF: Rel-MF [14] is a debiasing method for implicit feedback, which is an extension of the classic IPS methods [22] for implicit feedback.
  • IPS [22]: IPS is a classic debiasing method that weights the loss function by the corresponding inverse propensity scores to reduce the bias.
  • DR [59]: DR is another classic debiasing method that includes both the propensity and the imputation model and aims to minimize the DR loss to reduce bias.
  • DR-JL [37]: DR-JL is based on the DR method. It applies a joint learning algorithm between the prediction and the imputation model.
  • MRDR-JL [38]: MRDR-JL is a variation of DR-JL, which changes the original DR imputation loss to MRDR imputation loss.
  • Negative Sampling for Baselines. Note that the existing methods require training with both positive and negative samples, whereas we can only observe positive samples in the implicit feedback setting. Thus, we apply negative sampling to the baseline methods: we randomly select user–item pairs from the missing data and mark them as negative samples, and the propensities are estimated by logistic regression.
  • Experimental protocols and details. Following previous studies [22,60], MSE, AUC, NDCG@5, and NDCG@10 are used as evaluation metrics. All experiments are implemented in PyTorch v2.6.0 with Adam as the optimizer (for all experiments, we use a GeForce RTX 2060 as the computing resource). To mitigate overfitting, we employ L2 regularization via weight decay and early stopping during training, which is terminated when the relative loss decrease falls below a $1\times10^{-4}$ tolerance threshold for 5 consecutive epochs. We tune the weight decay in $\{1\times10^{-5}, 2.5\times10^{-5}, 5\times10^{-5}, \dots, 7.5\times10^{-3}, 1\times10^{-2}\}$ and the learning rate in $\{0.005, 0.01, 0.05, 0.1\}$. In addition, we tune the sample ratio between the imputed negative and positive samples in $\{0.6, 0.8, 1.0, 1.2, 1.4\}$ for RQ3 and set it to 1.0 for the other RQs. For the RD-based methods, we tune the adversarial strength $\Gamma$ in $\{1.0, 1.05, 1.1, 1.15, 1.2\}$ for RQ4 and set it to 1.1 for the other RQs. We set the batch size to 128 for Coat and 2048 for Yahoo! R3 and KuaiRec, and use the default parameter values of the SVDD function in the scikit-learn v1.3.0 package for the SVDD-based methods on all datasets.

5.2. Real-World Performance (RQ1)

We conducted experiments on three real-world datasets, Coat, Yahoo! R3, and KuaiRec, to verify the effectiveness of the proposed two-phase debiasing framework for implicit feedback. Table 1 compares the previous debiasing methods and their prediction performance when combined with the two-phase debiasing framework. Since the Naive method simply uses observed samples without estimating propensities, it cannot be combined with RD for unmeasured confounders. On the one hand, the models combined with a debiasing method all significantly outperform the Naive method, which illustrates the necessity of introducing debiasing, as well as the effectiveness and flexibility of the proposed framework in combining with various debiasing methods from explicit feedback. On the other hand, when dealing with the implicit feedback problem, the debiasing methods combining both phases significantly improve prediction performance on all metrics compared with the original debiasing methods used in explicit feedback. This is explained by the fact that the treatment imputation in the first stage helps the estimation of propensities and the accurate imputation of negative samples, while the second stage further adjusts for unmeasured confounders using the RD technique based on the information provided by the first stage.
To ensure a direct and fair evaluation focused on debiasing performance, we follow the established practice in the debiasing literature [14] by employing MF as the predictive model backbone for all primary comparisons. This provides a consistent benchmark to isolate and evaluate the efficacy of the debiasing components. To further verify the robustness of our framework across different representation learners, we conducted an exploratory analysis using the VAE-CF method (Variational Autoencoder for Collaborative Filtering [78]) as an alternative on the Coat dataset. The MRDR-JL-based results are presented in Appendix B, Table A1. They indicate that the relative performance improvements achieved by our two-stage debiasing framework are maintained when using such advanced architecture, demonstrating its general applicability.

5.3. Ablation Study (RQ2)

We further perform an ablation study to investigate the effect of each phase on the debiasing performance for implicit feedback. As shown in Figure 1, across all baseline methods, adding either of the two phases significantly improves performance on MSE, AUC, and NDCG@K. For phase 1, note that the proposed SVDD performs positive sampling of the treatment rather than direct negative sampling of ratings as in previous studies. The advantage is that treatment imputation not only provides a more accurate estimate of the propensities but also yields accurate inferences about the relevance of interest, i.e., for negative samples with $y_{u,i} = 0$, the additional information $o_{u,i} = 1$ obtained in phase 1 leads to the conclusion $r_{u,i} = 0$. In addition, the SVDD algorithm yields more convincing positive treatment samples for labeling $r_{u,i} = 0$ than the previous random negative sampling method. For phase 2, incorporating RD not only adjusts the direct effect of unmeasured confounders on propensity estimation through adversarial techniques but also mitigates the degradation in debiasing performance caused by inaccurate treatment imputation in phase 1. Optimal prediction performance is achieved when both debiasing phases are combined, owing to the simultaneous use of treatment imputation for propensity estimation and relevance prediction, together with the adjustment for unmeasured confounders using RD.

5.4. In-Depth Analysis (RQ3 and RQ4)

  • Effect of the SVDD Sampling Ratio in the First Stage. In the first stage, SVDD provides more reliable imputations of positive treatment, leading to more accurate propensity estimation and relevance prediction. Figure 3 shows the AUC, NDCG@5, and NDCG@10 of the various debiasing methods at different sampling ratios of positive treatment imputations. Remarkably, the proposed method significantly outperforms the baseline at all SVDD sampling ratios. In addition, predictive performance reaches an optimum for moderate positive sampling ratios between 0.8 and 1.2. This is explained by the fact that too many or too few imputations of positive treatment cause data imbalance, and the positive imputations produced by SVDD carry lower confidence as their number increases.
  • Sensitivity to the Adversarial Strength in the Second Stage. For the second stage, we conducted repeated experiments to investigate the effect of different adversarial strengths of RD on the AUC, NDCG@5, and NDCG@10 of the various debiasing methods; the results are shown in Figure 4, where the yellow dashed line marks the performance of the baseline method without RD. The proposed method stably outperforms the baselines at almost all adversarial strengths, validating the effectiveness of RD's adjustment for unmeasured confounders. In addition, a higher adversarial strength increases the variance, and a suitable adversarial strength between 1.05 and 1.15 leads to optimal performance. These empirical results are compatible with the theoretical guarantee: a larger adversarial strength not only has a stronger effect on the adjustment for unmeasured confounders but also enlarges the hypothesis space of the prediction model, leading to higher variance and worse performance of the model given by the minimax method. When the adversarial strength is set to 1, no adjustment for unmeasured confounders is made, and the method degenerates to its counterpart without RD.

5.5. Overfitting Analysis (RQ5)

  • Overfitting Analysis. We evaluate whether the proposed two-phase debiasing framework leads to overfitting by tracking the IPS-weighted training and validation losses across epochs. The trajectories in Figure 5 show that SVDD-only variants within the IPS, DR-JL, and MRDR-JL families converge slowly and display clear upward trends in validation loss, reflecting that treatment imputation alone increases data coverage but does not correct the confounding structure of the logged data, making the models sensitive to pseudo-negative noise. Adding the RD phase mitigates this issue: RD-only variants converge more quickly and exhibit noticeably reduced overfitting, which is consistent with RD’s ability to regulate propensity adjustments and limit the influence of unmeasured confounders. The full two-phase method combining SVDD and RD further stabilizes the trajectories, reaching convergence more rapidly and maintaining lower validation losses near convergence.
The DR family shows a different pattern. Although RD-only and RD-SVDD variants converge faster than their SVDD-only counterparts, both exhibit larger increases in validation loss in later epochs. This stems from the DR objective’s separate estimation of propensity weighting and imputation components, where small mismatches between the two accumulate and amplify variance, making DR more sensitive to noisy pseudo-negatives than DR-JL or MRDR-JL. Nevertheless, RD-SVDD remains more stable than RD-only, indicating that incorporating treatment imputation continues to provide benefit within the DR family.
Overall, the loss trajectories demonstrate that the SVDD-RD framework and its DR-JL and MRDR extensions do not induce harmful overfitting. Since this loss corresponds directly to the optimization objective, the close agreement between training and validation losses shows that the proposed methods generalize reliably despite their increased structural complexity.
  • Runtime Analysis. In addition, we investigate the potential computational overhead and conduct a comprehensive runtime analysis across all three datasets. The total training time of each debiasing method is summarized in Table 2. A key observation is that our complete two-phase methods, such as RD-IPS-SVDD and RD-DR-SVDD, exhibit training times comparable to those of their single-phase or non-robust counterparts, which we attribute to the stabilized training dynamics and early stopping. These results demonstrate that the superior prediction performance reported in Table 1 is achieved at a measurable yet manageable computational cost.
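For reference, the sketch below reproduces the two risk functionals tracked in the overfitting analysis: the IPS-weighted empirical risk and its doubly robust counterpart. Evaluating both on the training and validation splits each epoch yields curves like those in Figure 5, where a widening train–validation gap signals overfitting. Binary cross-entropy as the base error and all variable names are our illustrative assumptions.

```python
import numpy as np

def bce(y, y_hat, eps=1e-7):
    """Pointwise binary cross-entropy error e_{u,i}."""
    p = np.clip(y_hat, eps, 1.0 - eps)
    return -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

def ips_risk(y, o, p_hat, y_pred):
    """IPS-weighted risk: errors on observed entries (o = 1) are
    reweighted by inverse propensities; unobserved entries drop out."""
    return np.mean(o * bce(y, y_pred) / p_hat)

def dr_risk(y, o, p_hat, y_pred, e_imputed):
    """Doubly robust risk: imputed error on every entry plus an
    inverse-propensity-weighted correction on the observed entries."""
    e = bce(y, y_pred)
    return np.mean(e_imputed + o * (e - e_imputed) / p_hat)

# Synthetic per-epoch evaluation on one split (illustrative only).
rng = np.random.default_rng(1)
n = 1_000
o = rng.binomial(1, 0.3, n)            # exposure indicator
y = o * rng.binomial(1, 0.6, n)        # implicit label y = o * r
p_hat = rng.uniform(0.1, 0.9, n)       # estimated propensities
y_pred = rng.uniform(0.05, 0.95, n)    # model predictions
e_imp = np.full(n, 0.5)                # stand-in imputed errors
print(ips_risk(y, o, p_hat, y_pred), dr_risk(y, o, p_hat, y_pred, e_imp))
```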

6. Conclusions

The positive-unlabeled (PU) learning problem appears widely in modern machine learning tasks, including computer vision, natural language processing, and recommender systems, where the lack of explicit negative labels and the missing-not-at-random (MNAR) nature of feedback make unbiased estimation difficult. This work proposes a two-phase debiasing framework that addresses both missing exposure and unmeasured confounding and extends classical propensity-based estimators to PU settings. In the first phase, exposure imputation is formulated as a one-class classification problem, where SVDD provides quantitative signals describing how unlabeled entries relate to the observed logging mechanism. In the second phase, a robust deconfounding component introduces bounded propensity adjustments to correct for unmeasured confounders and to adjust for errors propagated from the imputation step. The resulting RD versions of the loss estimators restore unbiasedness under weakened assumptions and are supported by generalization guarantees that reflect the controlled impact of hidden confounding. Together, the two phases form a coherent procedure in which surrogate exposure reconstruction and confounding adjustment act through explicit and structurally defined components, well aligned with the PU learning problem.
In practical deployment, this framework is most effective when the observed labels reflect true preferences. The foundational y = o · r assumption can be violated by label noise, but we can still presume it holds for the true labels, which may be unobservable in that scenario. Extending the framework to jointly achieve debiasing and denoising, for example by identifying the noise-label transition matrix, is a valuable direction for future research. Learning the robustness parameter Γ for unmeasured confounding remains another practical question; one promising direction is to fuse the framework with a small unbiased dataset, such as those available in the three real-world datasets used here. Moreover, the performance of our method relies on having sufficient data for reliable exposure imputation. Although the underlying DR estimator is doubly robust, i.e., unbiased whenever either the imputed errors or the propensities are accurate, the accuracy of the exposure prediction still influences the estimation variance.
Experiments on three real-world datasets verify the effectiveness of this design. Across classic propensity-based methods, adding either SVDD-based exposure reconstruction or RD-based adjustment improves performance, and combining both consistently yields the strongest results. The ablation analyses show that SVDD offers more informative exposure imputations than random negative sampling, while RD stabilizes weighting and reduces noise introduced during imputation. Sensitivity studies demonstrate that moderate SVDD sampling ratios and moderate RD adversarial strengths provide the best balance between reconstruction accuracy and variance control. Training–validation loss trajectories further confirm that the full two-phase variants converge efficiently and resist overfitting.
Taken together, this work contributes a general, theoretically grounded, and practically effective framework for PU learning that integrates exposure modeling and confounding adjustment into a unified estimation pipeline. Beyond implicit-feedback recommendation, the modular structure of the two phases allows the framework to be combined with a broad class of debiasing objectives and prediction models in PU settings where exposure uncertainty and unmeasured confounding coexist. The results highlight the importance of combining the treatment reconstruction with the debiasing step and demonstrate how this joint consideration yields predictable behavior, improved robustness, and consistent gains across metrics and datasets.

Author Contributions

Conceptualization, T.X.; Methodology, S.W. and T.X.; Software, S.W.; Validation, S.W. and T.X.; Formal analysis, S.W.; Investigation, S.W.; Resources, T.X.; Data curation, S.W. and T.X.; Writing—original draft, S.W. and T.X.; Writing—review & editing, S.W., T.X. and L.Y.; Visualization, S.W.; Supervision, T.X. and L.Y.; Project administration, L.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data that support the findings of this study are publicly available from open-access sources, including the Coat Shopping dataset accessed via https://www.cs.cornell.edu/~schnabts/mnar/ (accessed on 3 November 2025), the Yahoo! R3 dataset available at https://www.kaggle.com/datasets/limitiao/yahoor3 (accessed on 3 November 2025), and the KuaiRec dataset openly accessible through https://kuairec.com/.

Conflicts of Interest

Author Sichao Wang was employed by the company KUKA Robotics China Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Appendix A. Proofs

Proof of Theorem 1.
It suffices to show that Equation (9) does not hold under Assumption 3. This follows immediately from the fact that $r_{u,i}$ is associated with $o_{u,i}$ given $x_{u,i}$, due to the existence of the unmeasured confounders $h_{u,i}$ (see Figure 2). □
Proof of Theorem 2.
The conclusion of Theorem 2(a) can be found in Theorem 3.1 of [57]. The result of Theorem 2(b) holds since Equation (16) is implied directly by Assumptions 1 and 3, together with Propositions 1 and 2. □
Proof of Lemma 1.
By definition, it is easy to see that
$$ m(x_{u,i}) = \log\frac{p_{u,i}}{1-p_{u,i}}, \qquad \tilde{m}(x_{u,i}, h_{u,i}) = \log\frac{\tilde{p}_{u,i}}{1-\tilde{p}_{u,i}}. \quad \text{(A1)} $$
Following the sensitivity analysis logic in [57] and the principle mentioned in [76], we have that $|\tilde{m}(x_{u,i}, h_{u,i}) - m(x_{u,i})| \le \log\Gamma$, and hence
$$ \frac{1}{\Gamma} \le e^{\tilde{m}(x_{u,i}, h_{u,i}) - m(x_{u,i})} \le \Gamma. \quad \text{(A2)} $$
Plugging Equation (A1) into (A2) yields
$$ \frac{1}{\Gamma} \le \frac{p_{u,i}}{1-p_{u,i}} \Big/ \frac{\tilde{p}_{u,i}}{1-\tilde{p}_{u,i}} \le \Gamma. $$
By setting $\tilde{w}_{u,i} = 1/\tilde{p}_{u,i}$ and after some simple algebra, we get that
$$ 1 + \frac{1/p_{u,i} - 1}{\Gamma} \;\le\; \tilde{w}_{u,i} \;\le\; 1 + \left(1/p_{u,i} - 1\right)\Gamma. $$
□
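As a small numeric check of this interval (the helper below is our illustration, not part of the paper's code): for a nominal propensity $p_{u,i}$ the nominal weight is $1/p_{u,i}$, and $\Gamma = 1$ collapses the interval to that single point, consistent with the degeneracy noted in the sensitivity analysis.

```python
def rd_weight_bounds(p, gamma):
    """Interval for the deconfounded weight w~_{u,i} = 1/p~_{u,i}
    implied by the last display, given nominal propensity p and
    adversarial strength Gamma >= 1."""
    spread = 1.0 / p - 1.0
    return 1.0 + spread / gamma, 1.0 + spread * gamma

# Nominal propensity 0.25 gives nominal weight 4. Gamma = 1 yields the
# degenerate interval [4, 4]; Gamma = 1.1 yields roughly [3.73, 4.30].
for gamma in (1.0, 1.1, 1.5):
    print(gamma, rd_weight_bounds(0.25, gamma))
```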

Appendix B. Exploratory Experiment

Table A1. Exploratory results of MRDR-JL-based methods with VAE-CF backbone on the Coat dataset. Best performance in bold.

Method | MSE | AUC | NDCG@5 | NDCG@10
+ MRDR-JL | 0.2312 | 0.6324 | 0.6397 | 0.7078
+ MRDR-JL-SVDD | 0.2309 | 0.6403 | 0.6432 | 0.7117
+ RD-MRDR-JL | 0.2313 | 0.6434 | 0.6577 | 0.7139
+ RD-MRDR-JL-SVDD | 0.2215 | 0.6521 | 0.6626 | 0.7252

References

1. Denis, F.; Gilleron, R.; Letouzey, F. Learning from positive and unlabeled examples. Theor. Comput. Sci. 2005, 348, 70–83.
2. Du Plessis, M.; Niu, G.; Sugiyama, M. Convex formulation for learning from positive and unlabeled data. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 1386–1394.
3. Kiryo, R.; Niu, G.; du Plessis, M.C.; Sugiyama, M. Positive-unlabeled learning with non-negative risk estimator. Adv. Neural Inf. Process. Syst. 2017, 30, 1674–1684.
4. Kato, M.; Teshima, T.; Honda, J. Learning from positive and unlabeled data with a selection bias. In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019.
5. Bekker, J.; Davis, J. Learning from positive and unlabeled data: A survey. Mach. Learn. 2020, 109, 719–760.
6. Su, G.; Chen, W.; Xu, M. Positive-unlabeled learning from imbalanced data. In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI-21), Virtual, 19–26 August 2021; pp. 2995–3001.
7. Chapel, L.; Alaya, M.Z.; Gasso, G. Partial optimal transport with applications on positive-unlabeled learning. Adv. Neural Inf. Process. Syst. 2020, 33, 2903–2913.
8. Vinay, M.; Yuan, S.; Wu, X. Fraud detection via contrastive positive unlabeled learning. In Proceedings of the 2022 IEEE International Conference on Big Data (Big Data), Osaka, Japan, 17–20 December 2022; pp. 1475–1484.
9. Sengupta, S.; Loomba, J.; Sharma, S.; Chapman, S.A.; Brown, D.E. Determining risk factors for long COVID using positive unlabeled learning on electronic health records data from NIH N3C. In Proceedings of the 2023 International Conference on Machine Learning and Applications (ICMLA), Jacksonville, FL, USA, 15–17 December 2023; pp. 430–436.
10. Wang, W.; Feng, F.; He, X.; Nie, L.; Chua, T.S. Denoising implicit feedback for recommendation. In Proceedings of the 14th ACM International Conference on Web Search and Data Mining, Virtual, 8–12 March 2021; pp. 373–381.
11. Jannach, D.; Lerche, L.; Zanker, M. Recommending based on implicit feedback. In Social Information Access; Springer: Berlin/Heidelberg, Germany, 2018; pp. 510–569.
12. Zhang, S.; Zhang, Y.; Chen, J.; Sui, H. Addressing Correlated Latent Exogenous Variables in Debiased Recommender Systems. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 2, Toronto, ON, Canada, 3–7 August 2025.
13. Lin, J.; Dai, X.; Shan, R.; Chen, B.; Tang, R.; Yu, Y.; Zhang, W. Large language models make sample-efficient recommender systems. Front. Comput. Sci. 2025, 19, 194328.
14. Saito, Y.; Yaginuma, S.; Nishino, Y.; Sakata, H.; Nakata, K. Unbiased recommender learning from missing-not-at-random implicit feedback. In Proceedings of the 13th International Conference on Web Search and Data Mining, Houston, TX, USA, 3–7 February 2020; pp. 501–509.
15. Hu, Y.; Koren, Y.; Volinsky, C. Collaborative filtering for implicit feedback datasets. In Proceedings of the 2008 Eighth IEEE International Conference on Data Mining, Pisa, Italy, 15–19 December 2008; pp. 263–272.
16. Wang, H.; Chen, Z.; Liu, Z.; Li, H.; Yang, D.; Liu, X.; Li, H. Entire space counterfactual learning for reliable content recommendations. IEEE Trans. Inf. Forensics Secur. 2025, 20, 1755–1764.
17. Pan, H.; Zheng, C.; Wang, W.; Jiang, J.; Li, X.; Li, H.; Feng, F. Batch-Adaptive Doubly Robust Learning for Debiasing Post-Click Conversion Rate Prediction Under Sparse Data. ACM Trans. Inf. Syst. 2025, 1341–1350.
18. Lee, J.W.; Park, S.; Lee, J.; Lee, J. Bilateral Self-unbiased Learning from Biased Implicit Feedback. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, 11–15 July 2022.
19. Zhou, C.; Li, H.; Yao, L.; Gong, M. Counterfactual implicit feedback modeling. In Proceedings of the Thirty-Ninth Annual Conference on Neural Information Processing Systems, San Diego, CA, USA, 30 November–7 December 2025.
20. Lian, D.; Chen, J.; Zheng, K.; Chen, E.; Zhou, X. Ranking-based Implicit Regularization for One-Class Collaborative Filtering. IEEE Trans. Knowl. Data Eng. 2021, 34, 5951–5963.
21. VanderWeele, T.J.; Shpitser, I. On the definition of a confounder. Ann. Stat. 2013, 41, 196.
22. Schnabel, T.; Swaminathan, A.; Singh, A.; Chandak, N.; Joachims, T. Recommendations as treatments: Debiasing learning and evaluation. In Proceedings of the International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016; pp. 1670–1679.
23. Wu, P.; Li, H.; Deng, Y.; Hu, W.; Dai, Q.; Dong, Z.; Sun, J.; Zhang, R.; Zhou, X.H. On the opportunity of causal learning in recommendation systems: Foundation, estimation, prediction and challenges. arXiv 2022, arXiv:2201.06716.
24. Dai, Q.; Li, H.; Wu, P.; Dong, Z.; Zhou, X.H.; Zhang, R.; He, X.; Zhang, R.; Sun, J. A Generalized Doubly Robust Learning Framework for Debiasing Post-Click Conversion Rate Prediction. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 14–18 August 2022.
25. Bayer, I.; He, X.; Kanagal, B.; Rendle, S. A Generic Coordinate Descent Framework for Learning from Implicit Feedback. In Proceedings of the 26th International Conference on World Wide Web, Perth, Australia, 3–7 April 2017; pp. 1341–1350.
26. Chen, J.; Lian, D.; Zheng, K. Improving one-class collaborative filtering via ranking-based implicit regularizer. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 37–44.
27. Chen, T.; Sun, Y.; Shi, Y.; Hong, L. On Sampling Strategies for Neural Network-based Collaborative Filtering. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, 13–17 August 2017; pp. 767–776.
28. He, X.; Liao, L.; Zhang, H.; Nie, L.; Hu, X.; Chua, T.S. Neural collaborative filtering. In Proceedings of the 26th International Conference on World Wide Web, Perth, Australia, 3–7 April 2017.
29. Lian, D.; Liu, Q.; Chen, E. Personalized Ranking with Importance Sampling. In Proceedings of the Web Conference, Taipei, Taiwan, 20–24 April 2020; pp. 1093–1103.
30. Wang, H.; Chen, Z.; Wang, H.; Tan, Y.; Pan, L.; Liu, T.; Chen, X.; Li, H.; Lin, Z. Unbiased recommender learning from implicit feedback via weakly supervised learning. In Proceedings of the Forty-Second International Conference on Machine Learning, Vancouver, BC, Canada, 13–19 July 2025.
31. Saito, Y. Unbiased Pairwise Learning from Biased Implicit Feedback. In Proceedings of the 2020 ACM SIGIR on International Conference on Theory of Information Retrieval, Virtual, 14–17 September 2020.
32. Zhu, Z.; He, Y.; Zhang, Y.; Caverlee, J. Unbiased Implicit Recommendation and Propensity Estimation via Combinational Joint Learning. In Proceedings of the 14th ACM Conference on Recommender Systems, Virtual, 22–26 September 2020.
33. Lee, J.W.; Park, S.; Lee, J. Dual Unbiased Recommender Learning for Implicit Feedback. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual, 11–15 July 2021.
34. Escobedo, G.; Penz, D.; Schedl, M. Debiasing Implicit Feedback Recommenders via Sliced Wasserstein Distance-based Regularization. In Proceedings of the Nineteenth ACM Conference on Recommender Systems, Prague, Czech Republic, 22–26 September 2025; pp. 1153–1158.
35. Bekker, J.; Robberechts, P.; Davis, J. Beyond the selected completely at random assumption for learning from positive and unlabeled data. In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Würzburg, Germany, 16–20 September 2019; pp. 71–85.
36. De Block, S.; Bekker, J. Bagging propensity weighting: A robust method for biased PU learning. In Proceedings of the Fourth International Workshop on Learning with Imbalanced Domains: Theory and Applications 2022, Grenoble, France, 23 September 2022; pp. 23–37.
37. Wang, X.; Zhang, R.; Sun, Y.; Qi, J. Doubly robust joint learning for recommendation on data missing not at random. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 10–15 June 2019; pp. 6638–6647.
38. Guo, S.; Zou, L.; Liu, Y.; Ye, W.; Cheng, S.; Wang, S.; Chen, H.; Yin, D.; Chang, Y. Enhanced doubly robust learning for debiasing post-click conversion rate estimation. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual, 11–15 July 2021; pp. 275–284.
39. Li, H.; Xiao, Y.; Zheng, C.; Wu, P.; Cui, P. Propensity matters: Measuring and enhancing balancing for recommendation. In Proceedings of the International Conference on Machine Learning, Honolulu, HI, USA, 23–29 July 2023; pp. 20182–20194.
40. Li, H.; Zheng, C.; Wang, W.; Wang, H.; Feng, F.; Zhou, X.H. Debiased recommendation with noisy feedback. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Barcelona, Spain, 25–29 August 2024.
41. Zhang, S.; Xia, T. CBPL: A unified calibration and balancing propensity learning framework in causal recommendation for debiasing. In Proceedings of the IJCAI 2025 Workshop Causal Learning RecSys, Montreal, QC, Canada, 16–22 August 2025.
42. Rendle, S.; Freudenthaler, C.; Gantner, Z.; Schmidt-Thieme, L. BPR: Bayesian Personalized Ranking from Implicit Feedback. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, Montreal, QC, Canada, 18–21 June 2009; pp. 452–461.
43. Meng, X.; Wang, S.; Shu, K.; Li, J.; Chen, B.; Liu, H.; Zhang, Y. Personalized privacy-preserving social recommendation. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018.
44. Wang, W.; Feng, F.; He, X.; Wang, X.; Chua, T.S. Deconfounded recommendation for alleviating bias amplification. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, Singapore, 14–18 August 2021; pp. 1717–1725.
45. Li, H.; Wu, K.; Zheng, C.; Xiao, Y.; Wang, H.; Geng, Z.; Feng, F.; He, X.; Wu, P. Removing hidden confounding in recommendation: A unified multi-task learning approach. Adv. Neural Inf. Process. Syst. 2023, 36, 54614–54626.
46. Tax, D.M.; Duin, R.P. Support vector data description. Mach. Learn. 2004, 54, 45–66.
47. Chen, J.; Dong, H.; Wang, X.; Feng, F.; Wang, M.; He, X. Bias and debias in recommender system: A survey and future directions. ACM Trans. Inf. Syst. 2023, 41, 1–39.
48. Wang, Z.; Chen, X.; Wen, R.; Huang, S.L.; Kuruoglu, E.; Zheng, Y. Information theoretic counterfactual learning from missing-not-at-random feedback. In Proceedings of the 34th International Conference on Neural Information Processing Systems, Red Hook, NY, USA, 6–12 December 2020; pp. 1854–1864.
49. Zhang, Y.; Feng, F.; He, X.; Wei, T.; Song, C.; Ling, G.; Zhang, Y. Causal intervention for leveraging popularity bias in recommendation. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual, 11–15 July 2021; pp. 11–20.
50. Li, H.; Zheng, C.; Ding, S.; Feng, F.; He, X.; Geng, Z.; Wu, P. Be aware of the neighborhood effect: Modeling selection bias under interference for recommendation. arXiv 2024, arXiv:2404.19620.
51. Zheng, C.; Pan, H.; Zhang, Y.; Li, H. Adaptive structure learning with partial parameter sharing for post-click conversion rate prediction. In Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, Padua, Italy, 13–18 July 2025.
52. Ai, Q.; Bi, K.; Luo, C.; Guo, J.; Croft, W.B. Unbiased learning to rank with unbiased propensity estimation. In Proceedings of the 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, Ann Arbor, MI, USA, 8–12 July 2018; pp. 385–394.
53. Liu, Y.; Cao, X.; Yu, Y. Are you influenced by others when rating? Improve rating prediction by conformity modeling. In Proceedings of the 10th ACM Conference on Recommender Systems, Boston, MA, USA, 15–19 September 2016; pp. 269–272.
54. Li, H.; Zheng, C.; Wang, S.; Wu, K.; Wang, E.; Wu, P.; Geng, Z.; Chen, X.; Zhou, X.H. Relaxing the accurate imputation assumption in doubly robust learning for debiased collaborative filtering. In Proceedings of the Forty-First International Conference on Machine Learning, Vienna, Austria, 21–27 July 2024.
55. Li, H.; Zheng, C.; Zhou, X.H.; Wu, P. Stabilized Doubly Robust Learning for Recommendation on Data Missing Not at Random. arXiv 2022, arXiv:2205.04701v2.
56. Li, H.; Lyu, Y.; Zheng, C.; Wu, P. TDR-CL: Targeted doubly robust collaborative learning for debiased recommendations. In Proceedings of the International Conference on Learning Representations, Kigali, Rwanda, 1–5 May 2023.
57. Ding, S.; Wu, P.; Feng, F.; He, X.; Wang, Y.; Liao, Y.; Zhang, Y. Addressing Unmeasured Confounder for Recommendation with Sensitivity Analysis. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 14–18 August 2022.
58. Liang, D.; Charlin, L.; McInerney, J.; Blei, D.M. Modeling user exposure in recommendation. In Proceedings of the 25th International Conference on World Wide Web, Montreal, QC, Canada, 11–15 April 2016; pp. 951–961.
59. Saito, Y. Doubly robust estimator for ranking metrics with post-click conversions. In Proceedings of the Fourteenth ACM Conference on Recommender Systems, Virtual, 22–26 September 2020; pp. 92–100.
60. Yang, L.; Cui, Y.; Xuan, Y.; Wang, C.; Belongie, S.; Estrin, D. Unbiased offline recommender evaluation for missing-not-at-random implicit feedback. In Proceedings of the 12th ACM Conference on Recommender Systems, Vancouver, BC, Canada, 2–7 October 2018; pp. 279–287.
61. Wang, H.; Chen, Z.; Wang, H.; Tan, Y.; Pan, L.; Liu, T.; Chen, X.; Li, H.; Lin, Z. Unbiased recommender learning from implicit feedback via progressive proximal transport. In Proceedings of the Forty-Second International Conference on Machine Learning, Vancouver, BC, Canada, 13–19 July 2025.
62. du Plessis, M.C.; Niu, G.; Sugiyama, M. Analysis of learning from positive and unlabeled data. Adv. Neural Inf. Process. Syst. 2014, 27, 703–711.
63. Northcutt, C.G.; Wu, T.; Chuang, I.L. Learning with confident examples: Rank pruning for robust classification with noisy labels. arXiv 2017, arXiv:1705.01936.
64. Hou, M.; Chaib-Draa, B.; Li, C.; Zhao, Q. Generative adversarial positive-unlabeled learning. arXiv 2017, arXiv:1711.08054.
65. Gong, C.; Shi, H.; Liu, T.; Zhang, C.; Yang, J.; Tao, D. Loss decomposition and centroid estimation for positive and unlabeled learning. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 43, 918–932.
66. Luo, C.; Zhao, P.; Chen, C.; Qiao, B.; Du, C.; Zhang, H.; Wu, W.; Cai, S.; He, B.; Rajmohan, S.; et al. PULNS: Positive-unlabeled learning with effective negative sample selector. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; Volume 35, pp. 8784–8792.
67. Schölkopf, B.; Platt, J.C.; Shawe-Taylor, J.; Smola, A.J.; Williamson, R.C. Estimating the support of a high-dimensional distribution. Neural Comput. 2001, 13, 1443–1471.
68. Parzen, E. On estimation of a probability density function and mode. Ann. Math. Stat. 1962, 33, 1065–1076.
69. Gewers, F.L.; Ferreira, G.R.; Arruda, H.F.D.; Silva, F.N.; Comin, C.H.; Amancio, D.R.; Costa, L.d.F. Principal component analysis: A natural approach to data exploration. ACM Comput. Surv. 2021, 54, 1–34.
70. Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction; Springer: Berlin/Heidelberg, Germany, 2009.
71. Vincent, P.; Larochelle, H.; Bengio, Y.; Manzagol, P.A. Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning, Helsinki, Finland, 5–9 July 2008; pp. 1096–1103.
72. Liu, F.T.; Ting, K.M.; Zhou, Z.H. Isolation forest. In Proceedings of the 2008 Eighth IEEE International Conference on Data Mining, Pisa, Italy, 15–19 December 2008; pp. 413–422.
73. Zhang, W.; Bao, W.; Liu, X.; Yang, K.; Lin, Q.; Wen, H.; Ramezani, R. Large-scale Causal Approaches to Debiasing Post-click Conversion Rate Estimation with Multi-task Learning. In Proceedings of the Web Conference, Taipei, Taiwan, 20–24 April 2020; pp. 2775–2781.
74. Li, X.L.; Liu, B. Learning from positive and unlabeled examples with different data distributions. In Proceedings of the European Conference on Machine Learning, Porto, Portugal, 3–7 October 2005; pp. 218–229.
75. Wang, Y.; Liang, D.; Charlin, L.; Blei, D.M. The deconfounded recommender: A causal inference approach to recommendation. arXiv 2018, arXiv:1808.06581.
76. Rosenbaum, P.R. Design of Observational Studies; Springer: Berlin/Heidelberg, Germany, 2010; Volume 10.
77. Koren, Y.; Bell, R.; Volinsky, C. Matrix factorization techniques for recommender systems. Computer 2009, 42, 30–37.
78. Liang, D.; Krishnan, R.G.; Hoffman, M.D.; Jebara, T. Variational autoencoders for collaborative filtering. In Proceedings of the 2018 World Wide Web Conference, Lyon, France, 23–27 April 2018.
Figure 1. Two-phase debiasing framework for implicit feedback. 1: Observed positive label; 0: Unlabeled entry; ?: Entry with unknown status for explicit feedback; 1/0: Imputed label after treatment imputation; 1/0: Predicted label after debiasing.
Figure 2. A typical causal graph for implicit feedback with unmeasured confounders H.
Figure 3. The effect of the SVDD treatment imputation positive sampling ratio on AUC, NDCG@5, and NDCG@10 for the proposed two-phase debiasing framework.
Figure 4. AUC, NDCG@5, and NDCG@10 on MAR data with different adversarial strengths. The top row shows the IPS and DR methods; the bottom row shows the DR-based methods with the joint learning algorithm.
Figure 5. Training and validation loss curves for SVDD-only, RD-only, and RD-SVDD variants of the IPS, DR, DR-JL, and MRDR-JL estimators. The RD-SVDD configuration provides the most stable convergence across families. In the DR family, intrinsic variance of the DR objective produces more pronounced overfitting, though RD-SVDD remains more stable than RD-only.
Table 1. MSE, AUC, NDCG@5 (N@5), and NDCG@10 (N@10) on the MAR data of Coat, Yahoo! R3, and KuaiRec. We bold the outperforming IPS-based, DR-based, and MRDR-based models. Each dataset block lists MSE / AUC / N@5 / N@10.

Method | Coat (MSE / AUC / N@5 / N@10) | Yahoo! R3 (MSE / AUC / N@5 / N@10) | KuaiRec (MSE / AUC / N@5 / N@10)
Base Model (MF) | 0.2414 / 0.6038 / 0.6294 / 0.6998 | 0.2089 / 0.6329 / 0.7104 / 0.8062 | 0.1837 / 0.8095 / 0.5315 / 0.5498
+ WMF | 0.2447 / 0.6289 / 0.6356 / 0.7060 | 0.2105 / 0.6487 / 0.7279 / 0.8174 | 0.1818 / 0.8180 / 0.5348 / 0.5544
+ BPR-MF | 0.2492 / 0.6136 / 0.6378 / 0.7061 | 0.2057 / 0.6433 / 0.7102 / 0.8074 | 0.1848 / 0.8178 / 0.5342 / 0.5447
+ Rel-MF | 0.2404 / 0.6312 / 0.6368 / 0.7025 | 0.2128 / 0.6655 / 0.7138 / 0.8069 | 0.1786 / 0.8204 / 0.5357 / 0.5486
+ IPS | 0.2350 / 0.6203 / 0.6246 / 0.7005 | 0.2034 / 0.6328 / 0.7053 / 0.8050 | 0.1752 / 0.8133 / 0.5130 / 0.5374
+ IPS-SVDD | 0.2339 / 0.6301 / 0.6373 / 0.7037 | 0.2116 / 0.6641 / 0.7267 / 0.8152 | 0.1733 / 0.8205 / 0.5269 / 0.5472
+ RD-IPS | 0.2328 / 0.6327 / 0.6314 / 0.7099 | 0.2097 / 0.6658 / 0.7215 / 0.8125 | 0.1728 / 0.8193 / 0.5295 / 0.5521
+ RD-IPS-SVDD | 0.2280 / 0.6497 / 0.6532 / 0.7161 | 0.1934 / 0.6663 / 0.7297 / 0.8184 | 0.1691 / 0.8265 / 0.5401 / 0.5603
+ DR | 0.2473 / 0.6274 / 0.6352 / 0.7047 | 0.2049 / 0.6379 / 0.7161 / 0.8110 | 0.1775 / 0.8305 / 0.5337 / 0.5541
+ DR-SVDD | 0.2405 / 0.6299 / 0.6425 / 0.7138 | 0.2054 / 0.6565 / 0.7252 / 0.8166 | 0.1757 / 0.8287 / 0.5357 / 0.5533
+ RD-DR | 0.2438 / 0.6337 / 0.6426 / 0.7146 | 0.2057 / 0.6675 / 0.7227 / 0.8144 | 0.1787 / 0.8315 / 0.5371 / 0.5530
+ RD-DR-SVDD | 0.2410 / 0.6378 / 0.6645 / 0.7323 | 0.1998 / 0.6706 / 0.7358 / 0.8224 | 0.1777 / 0.8324 / 0.5458 / 0.5544
+ DR-JL | 0.2339 / 0.6316 / 0.6239 / 0.7028 | 0.2024 / 0.6329 / 0.7066 / 0.8050 | 0.1761 / 0.8182 / 0.5206 / 0.5369
+ DR-JL-SVDD | 0.2326 / 0.6368 / 0.6277 / 0.7052 | 0.2036 / 0.6346 / 0.7166 / 0.8105 | 0.1736 / 0.8160 / 0.5232 / 0.5444
+ RD-DR-JL | 0.2324 / 0.6342 / 0.6427 / 0.7097 | 0.1976 / 0.6419 / 0.7098 / 0.8064 | 0.1761 / 0.8182 / 0.5223 / 0.5463
+ RD-DR-JL-SVDD | 0.2288 / 0.6590 / 0.6449 / 0.7165 | 0.1958 / 0.6477 / 0.7166 / 0.8122 | 0.1733 / 0.8197 / 0.5232 / 0.5464
+ MRDR-JL | 0.2367 / 0.6267 / 0.6351 / 0.7080 | 0.2037 / 0.6313 / 0.7077 / 0.8062 | 0.1770 / 0.8142 / 0.5238 / 0.5394
+ MRDR-JL-SVDD | 0.2350 / 0.6273 / 0.6400 / 0.7091 | 0.1992 / 0.6377 / 0.7165 / 0.8116 | 0.1759 / 0.8166 / 0.5302 / 0.5489
+ RD-MRDR-JL | 0.2333 / 0.6363 / 0.6405 / 0.7060 | 0.2013 / 0.6369 / 0.7197 / 0.8127 | 0.1757 / 0.8163 / 0.5257 / 0.5387
+ RD-MRDR-JL-SVDD | 0.2288 / 0.6454 / 0.6559 / 0.7201 | 0.1990 / 0.6466 / 0.7217 / 0.8144 | 0.1689 / 0.8190 / 0.5331 / 0.5557
Table 2. Training duration (seconds) comparison on the Coat, Yahoo! R3, and KuaiRec datasets.

Method | Coat | Yahoo! R3 | KuaiRec
MF | 7.61 | 15.31 | 8.62
MF-SVDD | 4.37 | 9.59 | 4.71
IPS | 7.85 | 8.23 | 13.35
IPS-SVDD | 6.28 | 11.44 | 11.42
RD-IPS | 3.54 | 59.66 | 51.84
RD-IPS-SVDD | 4.48 | 58.09 | 54.31
DR | 4.28 | 50.89 | 29.26
DR-SVDD | 4.81 | 57.08 | 26.30
RD-DR | 4.32 | 104.22 | 29.24
RD-DR-SVDD | 4.37 | 99.02 | 29.88
DR-JL | 10.99 | 74.61 | 88.13
DR-JL-SVDD | 20.58 | 68.04 | 69.75
RD-DR-JL | 16.90 | 125.52 | 125.16
RD-DR-JL-SVDD | 12.14 | 132.56 | 114.56
MRDR-JL | 10.63 | 68.97 | 47.56
MRDR-JL-SVDD | 13.89 | 79.24 | 78.00
RD-MRDR-JL | 17.20 | 185.23 | 165.26
RD-MRDR-JL-SVDD | 12.93 | 189.68 | 157.06
