Absolute Value Inequality SVM for the PU Learning Problem
Abstract
1. Introduction
 (1) Our approach employs TEDA (typicality and eccentricity data analytics) to extract reliable negative examples from the unlabeled dataset. Unlike alternative methods for negative example selection, TEDA makes no prior assumptions and relies solely on the spatial distribution of the data. The method is also computationally simple and remains effective even with few positive examples.
 (2) Our approach is the first to use the convex absolute value inequality technique to solve PU problems. This technique allows the optimization model to be solved by a successive linearization algorithm, which reduces the computational cost.
 (3) Our approach uses the hyperparameter optimization method HORD to set the parameters of the algorithm, which reduces the effort spent on tuning them manually.
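As a rough illustration of how TEDA can rank unlabeled examples, the sketch below computes the standard batch eccentricity with Euclidean distance, $\xi(x) = 1/k + \lVert \mu - x \rVert^2 / (k\sigma^2)$, and orders the unlabeled points by their eccentricity relative to the positive set. The function names, the toy data, and the ranking rule are assumptions made for illustration only; the selection rule actually used in Algorithm 1 may differ.

```python
from typing import List, Sequence

Point = Sequence[float]


def eccentricity(points: List[Point]) -> List[float]:
    """Batch TEDA eccentricity of every point, using Euclidean distance."""
    k = len(points)
    dim = len(points[0])
    mean = [sum(p[j] for p in points) / k for j in range(dim)]
    # Squared distance of each point to the sample mean.
    sq_dist = [sum((p[j] - mean[j]) ** 2 for j in range(dim)) for p in points]
    var = sum(sq_dist) / k  # mean squared distance to the mean
    if var == 0.0:
        # Degenerate case (all points coincide): spread the total of 2 evenly.
        return [2.0 / k] * k
    return [1.0 / k + d / (k * var) for d in sq_dist]


def rank_by_eccentricity(positives: List[Point], unlabeled: List[Point]) -> List[int]:
    """Indices of unlabeled points, most eccentric w.r.t. the positives first.

    Hypothetical rule: each unlabeled point is scored by its eccentricity
    in the set formed by the positives plus that single point.
    """
    scores = []
    for i, u in enumerate(unlabeled):
        ecc_u = eccentricity(list(positives) + [u])[-1]
        scores.append((ecc_u, i))
    return [i for _, i in sorted(scores, reverse=True)]


# Toy data (assumed): positives form a unit square, unlabeled points vary in distance.
positives = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
unlabeled = [(0.5, 0.5), (10.0, 10.0), (2.0, 2.0)]
ranking = rank_by_eccentricity(positives, unlabeled)
print(ranking)
```

Points at the head of `ranking` lie farthest from the bulk of the positive examples and would be the natural candidates for reliable negatives under this assumed rule. The eccentricities of any data set sum to 2, which gives a quick sanity check on the implementation.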
2. Related Work
3. Preliminaries
3.1. Convex Absolute Value Inequality with SVM
3.2. Typicality and Eccentricity Data Analytics
3.3. HORD Algorithm
4. Proposed Approach
4.1. Reliable Negative Samples
Algorithm 1 Generation of reliable negative samples 

4.2. AVISVM Formulation for the Resulting Semi-Supervised Problem
Algorithm 2 Successive linearization algorithm for (20) 

4.3. Performance Metric
4.4. Parameter Tuning
Algorithm 3 Parameter tuning by HORD 

4.5. AVISVM Algorithm
Algorithm 4 AVISVM algorithm for the PU learning problem 

5. Numerical Experiments
5.1. Algorithms for Comparison
 LUHC [9]: It exploits both the geometrical and the discriminant properties of the examples to improve classification performance. This algorithm follows the one-class classification scheme.
 B${l}_{1}$NPSVM [14]: It replaces the ${L}_{2}$ norm in NPSVM with an ${L}_{1}$ regularization term to address the PU learning problem, achieving satisfactory results in both classification and feature selection. This algorithm follows the biased support vector machine scheme.
 kNNSVM [20]: It sorts the unlabeled examples by the sum of their distances to the k-nearest positive examples and selects those at the largest distance as reliable negative examples. An iterative SVM is then used to train the classifier. This algorithm is a representative of the two-step strategy.
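The first step of the kNNSVM baseline, as described above, can be sketched as follows. This is a minimal illustration and not the reference implementation from [20]: the function names, the parameter `n_neg` (how many reliable negatives to keep), and the toy data are assumptions, and the subsequent iterative SVM training is not shown.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def knn_distance_sum(u: Point, positives: List[Point], k: int) -> float:
    """Sum of Euclidean distances from u to its k nearest positive examples."""
    dists = sorted(math.dist(u, p) for p in positives)
    return sum(dists[:k])


def select_reliable_negatives(
    unlabeled: List[Point], positives: List[Point], k: int, n_neg: int
) -> List[int]:
    """Indices of the n_neg unlabeled points farthest from the positives."""
    order = sorted(
        range(len(unlabeled)),
        key=lambda i: knn_distance_sum(unlabeled[i], positives, k),
        reverse=True,  # largest distance sum first
    )
    return order[:n_neg]


# Toy data (assumed): positives form a unit square.
positives = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
unlabeled = [(0.5, 0.5), (10.0, 10.0), (2.0, 2.0)]
print(select_reliable_negatives(unlabeled, positives, k=2, n_neg=2))
```

Unlike the TEDA-based selection, this rule requires choosing both `k` and the number of negatives to keep, which is one reason a selection method without such parameters is attractive.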
5.2. Experimental Results
5.2.1. Illustration on the Iris Dataset
5.2.2. Results on Other UCI Datasets
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
Dataset     # Examples  # Features  # Positive  # Negative
Sonar       208         60          97          111
Hearts      270         13          120         150
Haberman    306         3           225         81
BUPA        345         6           145         200
Australian  690         14          383         307
German      1000        24          700         300
Banknote    1372        4           610         762
Spambase    4601        57          1813        2788
Twonorm     7400        20          3703        3697
HTRU2       17898       8           1639        16259

Dataset     Metric   LUHC   B${l}_{1}$NPSVM  kNNSVM  AVISVM
Sonar       Acc (%)  53.23  57.10            58.71   51.32
            F-score  0.512  0.607            0.555   0.567
Hearts      Acc (%)  62.47  60.49            65.43   74.44
            F-score  0.695  0.741            0.696   0.791
Haberman    Acc (%)  73.15  71.15            70.14   72.39
            F-score  0.584  0.757            0.809   0.832
BUPA        Acc (%)  45.77  47.12            44.33   43.27
            F-score  0.585  0.592            0.593   0.561
Australian  Acc (%)  55.56  67.21            72.22   83.09
            F-score  0.773  0.723            0.778   0.822
German      Acc (%)  65.67  70.10            68.24   70.67
            F-score  0.723  0.700            0.828   0.819
Banknote    Acc (%)  92.23  87.62            93.06   93.71
            F-score  0.921  0.900            0.913   0.933
Spambase    Acc (%)  52.10  50.36            49.71   54.64
            F-score  0.492  0.567            0.479   0.634
Twonorm     Acc (%)  97.84  90.14            92.18   97.39
            F-score  0.978  0.921            0.932   0.975
HTRU2       Acc (%)  54.64  45.54            52.10   43.96
            F-score  0.165  0.189            0.207   0.243

Dataset     Metric   LUHC   B${l}_{1}$NPSVM  kNNSVM  AVISVM
Sonar       Acc (%)  54.84  57.58            64.35   54.11
            F-score  0.550  0.534            0.574   0.580
Hearts      Acc (%)  70.37  61.73            56.79   76.79
            F-score  0.699  0.762            0.724   0.802
Haberman    Acc (%)  76.09  73.80            65.22   73.91
            F-score  0.630  0.802            0.844   0.846
BUPA        Acc (%)  43.27  43.17            44.71   43.37
            F-score  0.595  0.602            0.596   0.569
Australian  Acc (%)  62.80  69.64            78.82   83.57
            F-score  0.754  0.821            0.829   0.843
German      Acc (%)  68.73  65.33            73.08   71.33
            F-score  0.738  0.728            0.828   0.823
Banknote    Acc (%)  94.66  88.92            94.64   96.87
            F-score  0.924  0.923            0.937   0.975
Spambase    Acc (%)  60.97  52.88            54.84   61.45
            F-score  0.493  0.585            0.669   0.674
Twonorm     Acc (%)  97.61  92.24            92.65   97.52
            F-score  0.978  0.961            0.940   0.977
HTRU2       Acc (%)  55.11  51.06            57.80   59.04
            F-score  0.267  0.255            0.289   0.304
