PAC–Bayes Guarantees for Data-Adaptive Pairwise Learning
Abstract
1. Introduction
- We bound the generalization gap of randomized pairwise learning algorithms that operate with arbitrary data-dependent sampling, within a PAC–Bayesian framework, under a sub-exponential stability condition (the classical PAC–Bayes bound that this line of work refines is recalled after this list).
- We apply this general result to Pairwise SGD and Pairwise SGDA with arbitrary sampling, verifying sub-exponential stability for both algorithms in smooth as well as non-smooth problems.
- We exemplify how our bounds can inform algorithm design, and we demonstrate the resulting algorithms in illustrative experiments.
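For orientation (background, not this paper's new result): the classical PAC–Bayes bound for i.i.d. pointwise losses taking values in [0, 1], in its relaxed Maurer-type form, states that for any data-free prior $\pi$ and any $\delta \in (0,1)$, with probability at least $1-\delta$ over a sample of size $n$, simultaneously for all posteriors $\rho$,

$$
\mathbb{E}_{w \sim \rho}\big[R(w)\big] \;\le\; \mathbb{E}_{w \sim \rho}\big[R_S(w)\big] \;+\; \sqrt{\frac{\mathrm{KL}(\rho \,\|\, \pi) + \ln(2\sqrt{n}/\delta)}{2n}} .
$$

The bounds developed here replace the i.i.d. pointwise setting with pairwise losses and data-dependent sampling, which is where the sub-exponential stability condition enters.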
2. Related Work
3. Preliminaries
3.1. Pairwise Learning and U-Statistics
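As a reminder of the standard setup (standard notation, which may differ from the paper's): for a sample $S=\{z_1,\dots,z_n\}$ of i.i.d. observations and a pairwise loss $f(w; z, z')$, the population risk and its natural empirical counterpart, a U-statistic of order two, are

$$
R(w) = \mathbb{E}_{z, z'}\big[f(w; z, z')\big], \qquad
R_S(w) = \frac{1}{n(n-1)} \sum_{i \neq j} f(w; z_i, z_j) .
$$

Because each example appears in many pairs, the summands are dependent, which is the core difficulty separating pairwise learning from the classical i.i.d. analysis.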
3.2. Connection with the PAC–Bayesian Framework
3.3. Connection with the Algorithmic Stability Framework
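For comparison with the sub-exponential stability notion (Definition 2) used here, recall the classical uniform-stability condition adapted to pairwise losses (following Bousquet and Elisseeff, and Agarwal and Niyogi): a deterministic algorithm $A$ is $\beta$-uniformly stable if, for all samples $S, S'$ differing in a single example,

$$
\sup_{z, z'} \big| f(A(S); z, z') - f(A(S'); z, z') \big| \;\le\; \beta ,
$$

with randomized versions taking an expectation over the algorithm's internal randomness. Sub-exponential stability, roughly, relaxes this worst-case control to a tail-decay condition on the random stability gap.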
4. Main Results
4.1. Stability and Generalization of Pairwise SGD
- (1) At the t-th iteration, we have sub-exponential stability (Definition 2).
- (2) If, in addition, the loss is also smooth (Assumption 3 holds), then with a suitably chosen step size, sub-exponential stability (Definition 2) holds at the t-th iteration.
- (1) After T iterations, we have the resulting generalization bound (Theorem 1 (1)).
- (2) If, in addition, the loss is also smooth (Assumption 3 holds), then with a suitably chosen step size, we have the resulting generalization bound (Theorem 1 (2)). A minimal implementation sketch of the Pairwise SGD update follows this list.
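A minimal sketch of the Pairwise SGD update with uniform pair sampling, under illustrative assumptions of my own: all function and parameter names (`pairwise_sgd`, `grad_pair_loss`, the linear-scorer loss) are hypothetical, not the paper's notation.

```python
import numpy as np

def pairwise_sgd(grad_pair_loss, Z, dim, T, eta, rng=None):
    """Pairwise SGD with uniformly sampled pairs (illustrative sketch).

    grad_pair_loss(w, z, zp): gradient of the pairwise loss f(w; z, z') in w.
    Z: (n, ...) array of examples; dim: parameter dimension;
    eta: callable t -> step size, e.g. lambda t: 0.1 / np.sqrt(t + 1).
    """
    rng = np.random.default_rng() if rng is None else rng
    n = Z.shape[0]
    w = np.zeros(dim)
    for t in range(T):
        i, j = rng.choice(n, size=2, replace=False)  # uniform draw of a pair, i != j
        w = w - eta(t) * grad_pair_loss(w, Z[i], Z[j])
    return w

# Illustrative pairwise loss: logistic ranking loss for a linear scorer,
# with each row of Z storing the features followed by a scalar label.
def grad_logistic_rank(w, z, zp):
    x, y = z[:-1], z[-1]
    xp, yp = zp[:-1], zp[-1]
    s = np.sign(y - yp)
    diff = x - xp
    margin = s * (w @ diff)
    return -s * diff / (1.0 + np.exp(margin))  # gradient of log(1 + exp(-margin))
```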
4.2. Stability and Generalization of Pairwise SGDA
- (1) At the t-th iteration, we have sub-exponential stability (Definition 2).
- (2) If, in addition, the loss is also smooth (cf. Assumption 5), then sub-exponential stability (Definition 2) holds at the t-th iteration.
- (1) After T iterations, we have the resulting generalization bound (Theorem 2 (1)).
- (2) If, in addition, the loss is also smooth (cf. Assumption 5), we have the resulting generalization bound (Theorem 2 (2)). A matching sketch of the Pairwise SGDA update follows this list.
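A matching sketch of Pairwise SGDA for a pairwise minimax objective $\min_w \max_v F(w, v; z, z')$, as arises in adversarial pairwise learning. The simultaneous-update variant and all names are my illustrative choices, not necessarily the paper's exact scheme.

```python
import numpy as np

def pairwise_sgda(grad_w, grad_v, Z, dim_w, dim_v, T, eta, rng=None):
    """Pairwise SGDA with simultaneous updates (illustrative sketch).

    grad_w(w, v, z, zp): gradient of F(w, v; z, z') in the min-variable w.
    grad_v(w, v, z, zp): gradient of F(w, v; z, z') in the max-variable v.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = Z.shape[0]
    w, v = np.zeros(dim_w), np.zeros(dim_v)
    for t in range(T):
        i, j = rng.choice(n, size=2, replace=False)  # uniform draw of a pair
        step = eta(t)
        g_w = grad_w(w, v, Z[i], Z[j])
        g_v = grad_v(w, v, Z[i], Z[j])
        w, v = w - step * g_w, v + step * g_v  # descend in w, ascend in v
    return w, v
```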
5. Algorithmic Implications and Illustrative Experiments
Algorithm 1: Pairwise SGD-Q / Pairwise SGDA-Q.
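The "-Q" suffix presumably denotes a data-dependent sampling distribution Q over pairs, consistent with the arbitrary-sampling setting above. As a purely hypothetical instance, not the paper's exact scheme, the sketch below draws a pair with probability proportional to its current loss (a common importance-sampling heuristic); all names are illustrative, and the resulting pair would be fed into an update such as `pairwise_sgd` above.

```python
import numpy as np

def sample_pair_by_loss(pair_loss, w, Z, n_candidates=64, rng=None):
    """Draw one pair with probability proportional to its current loss,
    from a random candidate set (hypothetical loss-proportional Q).

    pair_loss(w, z, zp): nonnegative pairwise loss f(w; z, z')."""
    rng = np.random.default_rng() if rng is None else rng
    n = Z.shape[0]
    cand = [tuple(rng.choice(n, size=2, replace=False)) for _ in range(n_candidates)]
    losses = np.array([pair_loss(w, Z[i], Z[j]) for i, j in cand])
    total = losses.sum()
    # Fall back to uniform weights when all candidate losses vanish.
    q = losses / total if total > 0 else np.full(len(cand), 1.0 / len(cand))
    i, j = cand[rng.choice(len(cand), p=q)]
    return i, j
```

Any such Q changes the probability that an update touches a given example, which is how data-dependent sampling enters stability-based bounds of this type.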
Numerical Results in Pairwise Preference Learning
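For concreteness, a standard loss in pairwise preference learning, offered here as an assumed illustrative choice rather than the paper's confirmed experimental setup, is the logistic ranking loss for a scorer $h_w$ on examples $z = (x, y)$:

$$
f(w; z, z') = \log\!\Big(1 + \exp\big(-\operatorname{sign}(y - y')\,(h_w(x) - h_w(x'))\big)\Big) .
$$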
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Conflicts of Interest
Appendix A. Proof of Lemma 1
- The proof verifies the equivalence of standard sub-exponential-type characterizations (a tail bound, a moment-growth bound, and moment-generating-function bounds), together with the corresponding almost-sure (a.s.) conditional statements; the standard characterizations are recalled below.
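For reference, a sub-exponential random variable $X$ admits the following standard equivalent characterizations (cf. Vershynin, High-Dimensional Probability, Proposition 2.7.1), with constants $K_1, \dots, K_5$ that agree up to absolute multiplicative factors:

- $\mathbb{P}(|X| \ge t) \le 2\,e^{-t/K_1}$, for all $t \ge 0$.
- $(\mathbb{E}|X|^p)^{1/p} \le K_2\, p$, for all $p \ge 1$.
- $\mathbb{E}\, e^{\lambda |X|} \le e^{K_3 \lambda}$, for all $0 \le \lambda \le 1/K_3$.
- $\mathbb{E}\, e^{|X|/K_4} \le 2$.
- If $\mathbb{E} X = 0$: $\mathbb{E}\, e^{\lambda X} \le e^{K_5^2 \lambda^2}$, for all $|\lambda| \le 1/K_5$.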
Appendix B. Proofs for Pairwise SGD and Pairwise SGDA with Adaptive Sampling
Appendix B.1. Pairwise Stochastic Gradient Descent
Appendix B.2. Pairwise Stochastic Gradient Descent Ascent
- (1) If Assumption 4 holds, then the corresponding stability estimate follows.
- (2) If Assumption 5 holds (smoothness), then the corresponding stability estimate follows. (A standard proof step of this kind is sketched below.)
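Proofs of this type typically track the distance between iterates trained on neighboring samples. A standard step, assuming a convex, $G$-Lipschitz, $\beta$-smooth loss and step sizes $\eta_t \le 2/\beta$ (so that the gradient map is non-expansive, as in the expansion argument of Hardt, Recht, and Singer), is

$$
\|w_{t+1} - w'_{t+1}\| \;\le\; \|w_t - w'_t\| \;+\; 2 G \eta_t\, \mathbb{1}\{\text{the sampled pair involves the replaced example}\} ,
$$

so the final stability gap is controlled by $2G \sum_t \eta_t$ weighted by the sampling-dependent probability of hitting the replaced example; this is where the data-dependent sampling distribution enters the bound.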
Algo. | Asm. | Time T and Step Size | Rates
---|---|---|---
Pairwise SGD | L, C | see Theorem 1 (1) | see Theorem 1 (1)
Pairwise SGD | L, S, C | see Theorem 1 (2) | see Theorem 1 (2)
Pairwise SGDA | L, C | see Theorem 2 (1) | see Theorem 2 (1)
Pairwise SGDA | L, S, C | see Theorem 2 (2) | see Theorem 2 (2)

Here L, S, and C abbreviate the Lipschitz, smoothness, and convexity-type assumptions, respectively; the time horizon T, step sizes, and rates are those stated in the referenced theorems.