Quantifying Privacy Risk of Mobile Apps as Textual Entailment Using Language Models
Abstract
1. Introduction
- First, we design a machine learning-based framework to determine the privacy risk of mobile applications in terms of requested permissions;
- Second, we implement the framework and assess the privacy risk of mobile applications with empirical studies.
- RQ1: How do various language models perform in quantifying the legitimacy of permissions formulated as textual entailment?
- RQ2: What is the impact of employing different loss functions and data augmentation strategies on handling imbalanced data within the context of this study?
- RQ3: How does training with multiple permission requests or utilizing pretraining with app descriptions influence the performance of the model?
2. Related Work
2.1. Security of Mobile Applications
2.2. Language Models for Natural Language Processing (NLP) Tasks
2.3. Strategies to Handle Imbalanced Datasets
3. Problem Formulation
4. Privacy Risk Quantification
- The use of more elaborate permission descriptions to assist with fine-tuning and inference;
- Fine-tuning a model on a single permission versus on multiple permissions drawn from (1) the same category with the same majority class; (2) unrelated categories with the same majority class; and (3) the same category with opposite majority classes;
- Pretraining the model with either the collected data or similar data from previous studies [6] to make the model better suited to the task.
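The textual-entailment framing behind these variations can be sketched as follows. This is a minimal illustration, not the author's code: it assumes that each example pairs an app's Play Store description (the premise) with the description of one requested permission (the hypothesis), labelled Entail or Not Entail as in Appendix A, and that the hypothesis is either the short Play Store label or the more detailed developer-reference text. The names `EntailmentPair`, `PERMISSIONS` and `build_pair`, and the 1/0 label encoding, are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical label encoding (assumption): 1 = Entail, 0 = Not Entail.
LABELS = {"Entail": 1, "Not Entail": 0}


@dataclass(frozen=True)
class EntailmentPair:
    premise: str     # app description from the Play Store (assumed premise)
    hypothesis: str  # permission description (assumed hypothesis)
    label: int       # 1 = permission judged legitimate given the description


# Short and detailed texts for permission ID 14, copied from the Appendix A table.
PERMISSIONS = {
    14: {
        "short": "record audio",
        "detailed": (
            "Allows the app to record audio with the microphone. This permission "
            "allows the app to record audio at any time without your confirmation."
        ),
    },
}


def build_pair(app_description: str, permission_id: int, label: str,
               detailed: bool = False) -> EntailmentPair:
    """Pair one app description with one of its requested permissions."""
    texts = PERMISSIONS[permission_id]
    hypothesis = texts["detailed"] if detailed else texts["short"]
    return EntailmentPair(app_description, hypothesis, LABELS[label])


pair = build_pair("Voice memo app for recording lectures.", 14, "Entail", detailed=True)
print(pair.label, pair.hypothesis[:30])  # 1 Allows the app to record audio
```

Pairs built this way for several permission IDs can simply be concatenated to fine-tune one model on multiple permissions; encoders such as BERT then read the premise and hypothesis as a single sentence pair separated by a [SEP] token.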
5. Experiment Setup
5.1. Data Collection
5.2. Data Labeling
5.3. Metrics
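The result tables report P4, MCC and AUROC. The sketch below computes them from binary predictions, taking P4 as the harmonic mean of precision, recall, specificity and negative predictive value (its usual definition) and treating Entail as the positive class; both choices, and the convention of returning 0 when a score is undefined, are assumptions rather than details confirmed in this section. The example reproduces the failure mode visible in Appendix A for heavily imbalanced permissions such as ID 21 and ID 27, where P4 falls to 0.0000 while other scores stay near 0.99.

```python
import math
from typing import Sequence, Tuple


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN), treating label 1 (Entail) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def f1(tp: int, fp: int, tn: int, fn: int) -> float:
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def p4(tp: int, fp: int, tn: int, fn: int) -> float:
    """Harmonic mean of precision, recall, specificity and NPV, in closed form
    4*TP*TN / (4*TP*TN + (TP + TN) * (FP + FN)); 0.0 when undefined."""
    num = 4 * tp * tn
    denom = num + (tp + tn) * (fp + fn)
    return num / denom if denom else 0.0


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs both classes")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Majority-class collapse: 589 Entail vs. 11 Not Entail, everything predicted Entail.
counts = confusion([1] * 589 + [0] * 11, [1] * 600)
print(round(f1(*counts), 4), p4(*counts), mcc(*counts))  # 0.9907 0.0 0.0
```

F1 rewards always predicting the majority class here, whereas P4 and MCC both drop to zero because no Not Entail sample is ever recognised. For real experiments, `sklearn.metrics.matthews_corrcoef` and `sklearn.metrics.roc_auc_score` provide tested implementations of the last two scores.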
6. Experiment Results and Discussion
6.1. Baseline Performance
6.2. RQ1: How Do Various Language Models Perform in Quantifying the Legitimacy of Permissions Formulated as Textual Entailment?
6.3. RQ2: What Is the Impact of Employing Different Loss Functions and Data Augmentation Strategies on Handling Imbalanced Data Within the Context of This Study?
6.3.1. Effects of Different Loss Functions
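Focal loss (FL in the result tables) rescales cross-entropy so that well-classified examples, which dominate when one class is in the majority, contribute less to training. A minimal binary sketch following Lin et al.'s definition FL(p_t) = -α_t (1 - p_t)^γ log(p_t) is given below. The defaults γ = 2 and α = 0.25 are the values Lin et al. found to work well for object detection and serve only as placeholders, since the hyperparameters used in this study are not stated here; the function names are illustrative.

```python
import math
from typing import Optional, Sequence


def binary_focal_loss(probs: Sequence[float], labels: Sequence[int],
                      gamma: float = 2.0, alpha: Optional[float] = 0.25,
                      eps: float = 1e-12) -> float:
    """Mean binary focal loss (Lin et al., 2017).

    probs: predicted probability of the positive class (here assumed to be Entail)
    labels: 1 for Entail, 0 for Not Entail
    """
    total = 0.0
    for p, y in zip(probs, labels):
        p_t = p if y == 1 else 1.0 - p  # probability assigned to the true class
        if alpha is None:
            a_t = 1.0
        else:
            a_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
        # (1 - p_t) ** gamma shrinks the loss of easy, confidently correct examples
        total += -a_t * (1.0 - p_t) ** gamma * math.log(max(p_t, eps))
    return total / len(probs)


def binary_cross_entropy(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Plain cross-entropy is the special case gamma = 0 without alpha weighting."""
    return binary_focal_loss(probs, labels, gamma=0.0, alpha=None)


easy, hard = [0.95], [0.30]  # predicted P(Entail) for two Entail examples
print(binary_cross_entropy(easy, [1]) / binary_cross_entropy(hard, [1]))
print(binary_focal_loss(easy, [1], alpha=None) / binary_focal_loss(hard, [1], alpha=None))
```

With γ = 2, the easy example's loss relative to the hard one shrinks by a factor of (0.7 / 0.05)^2 = 196 compared with cross-entropy. For actual training, `torchvision.ops.sigmoid_focal_loss` offers a vectorised implementation with the same `alpha` and `gamma` parameters.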
6.3.2. Effects of Data Augmentation
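SMOTE, one of the augmentation strategies compared in the appendix tables, creates synthetic minority-class points by interpolating between a minority sample and one of its k nearest minority-class neighbours. Because SMOTE operates on numeric vectors, applying it to text requires some embedding step first; that step, the Euclidean distance, k = 5 (the neighbourhood size used by Chawla et al.) and the helper names below are assumptions of this sketch, not details taken from the study.

```python
import math
import random
from typing import List, Optional, Sequence

Vector = Sequence[float]


def k_nearest(x: Vector, pool: Sequence[Vector], k: int) -> List[Vector]:
    """The k points of `pool` closest to `x` (Euclidean), excluding `x` itself."""
    others = [p for p in pool if p is not x]
    return sorted(others, key=lambda p: math.dist(x, p))[:k]


def smote(minority: Sequence[Vector], n_new: int, k: int = 5,
          rng: Optional[random.Random] = None) -> List[List[float]]:
    """Generate `n_new` synthetic minority samples (Chawla et al., 2002)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        neighbour = rng.choice(k_nearest(x, minority, k))
        lam = rng.random()  # position along the segment from x to neighbour
        synthetic.append([xi + lam * (ni - xi) for xi, ni in zip(x, neighbour)])
    return synthetic


# Toy 2-D "embeddings" of three Not Entail examples (hypothetical values).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(smote(minority, n_new=3, rng=random.Random(42)))
```

Every synthetic point lies on a segment between two real minority points, so SMOTE densifies the minority region rather than adding genuinely new text. The `imbalanced-learn` package provides a maintained implementation as `imblearn.over_sampling.SMOTE` with a `k_neighbors` parameter.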
6.4. RQ3: How Does Training with Multiple Permission Requests or Utilizing Pretraining with App Descriptions Influence the Performance of the Model?
6.4.1. Effects of Training with Multiple Permission Requests
6.4.2. Effects of Pretraining Models Using Data from the Target Domain
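Pretraining on target-domain text (app descriptions) before fine-tuning follows the idea of domain-adaptive pretraining (Gururangan et al.). For BERT-style encoders such continued pretraining is commonly done with the masked-language-modelling objective. The sketch shows BERT's rule of selecting 15% of tokens and replacing 80% of them with [MASK], 10% with a random token and leaving 10% unchanged, on whitespace-split words for readability. That the study used exactly this objective and ratio is an assumption (DeBERTaV3, for instance, is pretrained with an ELECTRA-style replaced-token-detection objective), and `mask_for_mlm` is a hypothetical helper.

```python
import random
from typing import List, Optional, Sequence, Tuple

MASK = "[MASK]"


def mask_for_mlm(tokens: Sequence[str], vocab: Sequence[str], rng: random.Random,
                 mask_prob: float = 0.15) -> Tuple[List[str], List[Optional[str]]]:
    """BERT-style masking (Devlin et al., 2019).

    Returns (inputs, targets): targets[i] is the original token the model must
    predict at position i, or None where no prediction is made.
    """
    inputs: List[str] = []
    targets: List[Optional[str]] = []
    for tok in tokens:
        if rng.random() >= mask_prob:
            inputs.append(tok)  # untouched and not predicted
            targets.append(None)
            continue
        targets.append(tok)
        r = rng.random()
        if r < 0.8:
            inputs.append(MASK)               # 80%: replace with [MASK]
        elif r < 0.9:
            inputs.append(rng.choice(vocab))  # 10%: replace with a random token
        else:
            inputs.append(tok)                # 10%: keep the original token
    return inputs, targets


description = "record lectures and share voice notes with classmates".split()
inputs, targets = mask_for_mlm(description, vocab=description, rng=random.Random(3))
print(inputs)
print(targets)
```

With Hugging Face `transformers`, `DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)` applies the same rule to subword token IDs.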
6.5. Overall Observations
7. Conclusions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A
| ID | Entail | Not Entail | ID | Entail | Not Entail | ID | Entail | Not Entail |
|---|---|---|---|---|---|---|---|---|
| 1 | 93 | 489 | 14 | 92 | 507 | 25 | 26 | 568 |
| 2 | 375 | 214 | 15 | 139 | 460 | 26 | 27 | 571 |
| 5 | 559 | 27 | 16 | 584 | 16 | 27 | 585 | 5 |
| 6 | 569 | 20 | 17 | 167 | 432 | 29 | 18 | 566 |
| 7 | 438 | 140 | 18 | 166 | 434 | 30 | 15 | 90 |
| 8 | 353 | 246 | 19 | 123 | 476 | 31 | 137 | 457 |
| 9 | 579 | 21 | 20 | 388 | 197 | 32 | 43 | 541 |
| 10 | 82 | 518 | 21 | 589 | 11 | 33 | 130 | 459 |
| 11 | 237 | 363 | 22 | 469 | 131 | 34 | 42 | 555 |
| 12 | 382 | 218 | 23 | 282 | 311 | 35 | 134 | 448 |
| 13 | 26 | 574 | 24 | 173 | 420 | 38 | 573 | 18 |
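The counts above also fix each permission's majority class, which is used to group permissions in the multi-permission experiments, and show how skewed many labels are. The short sketch below derives the majority class, the imbalance ratio, and the accuracy a trivial always-majority classifier would obtain; the selection of IDs and the `summarize` helper are illustrative only.

```python
from typing import Dict, Tuple

# (Entail, Not Entail) counts for a few permission IDs, copied from the table above.
COUNTS: Dict[int, Tuple[int, int]] = {
    5: (559, 27), 8: (353, 246), 21: (589, 11), 25: (26, 568), 29: (18, 566),
}


def summarize(entail: int, not_entail: int) -> Tuple[str, float, float]:
    """Majority class, imbalance ratio, and accuracy of always predicting the majority."""
    majority = "Entail" if entail >= not_entail else "Not Entail"
    big, small = max(entail, not_entail), min(entail, not_entail)
    return majority, big / small, big / (entail + not_entail)


for pid, (e, n) in COUNTS.items():
    majority, ratio, acc = summarize(e, n)
    print(f"ID {pid:>2}: majority={majority:<10} ratio={ratio:5.1f} "
          f"majority-baseline acc={acc:.3f}")
```

For ID 21, for example, always answering Entail is already correct for 589 of 600 samples (about 98%), which is one reason balance-aware scores such as P4 and MCC are more informative than accuracy for these permissions.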
| ID | | | P4 | MCC | AUROC |
|---|---|---|---|---|---|
| 1 | 0.6130 | 0.6606 | 0.7395 | 0.5752 | 0.7705 |
| 2 | 0.8246 | 0.8212 | 0.7452 | 0.5325 | 0.7530 |
| 5 | 0.9792 | 0.9767 | 0.6892 | 0.5470 | 0.7535 |
| 6 | 0.9884 | 0.9847 | 0.7282 | 0.6030 | 0.7474 |
| 7 | 0.8859 | 0.8755 | 0.7210 | 0.5123 | 0.7476 |
| 8 | 0.8116 | 0.8012 | 0.7612 | 0.5348 | 0.7635 |
| 9 | 0.9846 | 0.9806 | 0.5698 | 0.4749 | 0.7124 |
| 10 | 0.5558 | 0.5616 | 0.6901 | 0.5067 | 0.7565 |
| 11 | 0.8504 | 0.8531 | 0.8759 | 0.7595 | 0.8780 |
| 12 | 0.7680 | 0.7830 | 0.6787 | 0.4101 | 0.7053 |
| 13 | 0.3433 | 0.4009 | 0.4156 | 0.3626 | 0.6489 |
| 14 | 0.4892 | 0.5088 | 0.6266 | 0.4397 | 0.7162 |
| 15 | 0.7227 | 0.7327 | 0.8082 | 0.6463 | 0.8197 |
| 16 | 0.9846 | 0.9836 | 0.4611 | 0.3446 | 0.6515 |
| 17 | 0.5787 | 0.6434 | 0.6906 | 0.4790 | 0.7140 |
| 18 | 0.8003 | 0.8117 | 0.8580 | 0.7345 | 0.8637 |
| 19 | 0.7056 | 0.7299 | 0.7996 | 0.6492 | 0.8143 |
| 20 | 0.8887 | 0.8657 | 0.8082 | 0.6409 | 0.8032 |
| 21 | 0.9898 | 0.9849 | 0.0000 | −0.0017 | 0.5992 |
| 22 | 0.8832 | 0.8669 | 0.6163 | 0.3803 | 0.6728 |
| 23 | 0.8661 | 0.8530 | 0.8666 | 0.7445 | 0.8701 |
| 24 | 0.7525 | 0.7507 | 0.8159 | 0.6616 | 0.8314 |
| 25 | 0.4530 | 0.5106 | 0.5218 | 0.4656 | 0.7003 |
| 26 | 0.2043 | 0.1987 | 0.2671 | 0.2094 | 0.6282 |
| 27 | 0.9956 | 0.9931 | 0.0000 | 0.0000 | 0.8000 |
| 29 | 0.0625 | 0.0543 | 0.0995 | 0.0426 | 0.5289 |
| 30 | 0.4667 | 0.5762 | 0.5737 | 0.4648 | 0.6778 |
| 31 | 0.8612 | 0.8416 | 0.9065 | 0.8226 | 0.9232 |
| 32 | 0.4729 | 0.4941 | 0.5861 | 0.4549 | 0.7155 |
| 33 | 0.9290 | 0.9246 | 0.9530 | 0.9086 | 0.9568 |
| 34 | 0.5864 | 0.6977 | 0.7273 | 0.6043 | 0.7407 |
| 35 | 0.7720 | 0.7794 | 0.8436 | 0.7121 | 0.8546 |
| 38 | 0.9911 | 0.9889 | 0.8310 | 0.7287 | 0.8240 |
| avg. | 0.7291 | 0.7421 | 0.6447 | 0.5137 | 0.7558 |
| stdev. | 0.2425 | 0.2291 | 0.2462 | 0.2191 | 0.0936 |

| ID | | | P4 | MCC | AUROC |
|---|---|---|---|---|---|
| 1 | 0.7043 | 0.7880 | 0.8057 | 0.6957 | 0.8074 |
| 2 | 0.8473 | 0.8414 | 0.7706 | 0.5754 | 0.7772 |
| 5 | 0.9830 | 0.9763 | 0.5766 | 0.4880 | 0.7006 |
| 6 | 0.9867 | 0.9810 | 0.5498 | 0.4632 | 0.6732 |
| 7 | 0.8844 | 0.8723 | 0.6726 | 0.4846 | 0.7156 |
| 8 | 0.8148 | 0.8149 | 0.7704 | 0.5550 | 0.7745 |
| 9 | 0.9828 | 0.9789 | 0.5670 | 0.4600 | 0.6866 |
| 10 | 0.5091 | 0.5918 | 0.6525 | 0.4889 | 0.7058 |
| 11 | 0.9026 | 0.8990 | 0.9192 | 0.8416 | 0.9222 |
| 12 | 0.8383 | 0.8280 | 0.7627 | 0.5511 | 0.7709 |
| 13 | 0.0000 | 0.0000 | 0.0000 | −0.0046 | 0.4991 |
| 14 | 0.7101 | 0.6822 | 0.8093 | 0.6613 | 0.8530 |
| 15 | 0.7278 | 0.7567 | 0.8136 | 0.6616 | 0.8181 |
| 16 | 0.9863 | 0.9863 | 0.5593 | 0.4409 | 0.7015 |
| 17 | 0.7832 | 0.7825 | 0.8440 | 0.7082 | 0.8568 |
| 18 | 0.8039 | 0.8413 | 0.8622 | 0.7477 | 0.8575 |
| 19 | 0.7600 | 0.8327 | 0.8428 | 0.7255 | 0.8248 |
| 20 | 0.8409 | 0.8386 | 0.7410 | 0.5235 | 0.7572 |
| 21 | 0.9915 | 0.9875 | 0.2134 | 0.1867 | 0.6442 |
| 22 | 0.8910 | 0.8673 | 0.6372 | 0.4263 | 0.6733 |
| 23 | 0.8843 | 0.8994 | 0.8922 | 0.7893 | 0.8929 |
| 24 | 0.8035 | 0.8494 | 0.8610 | 0.7465 | 0.8538 |
| 25 | 0.3954 | 0.4186 | 0.5104 | 0.4109 | 0.7135 |
| 26 | 0.4584 | 0.5251 | 0.5643 | 0.4759 | 0.7012 |
| 27 | 0.9939 | 0.9923 | 0.0000 | −0.0025 | 0.7983 |
| 29 | 0.0750 | 0.0577 | 0.1077 | 0.0759 | 0.5635 |
| 30 | 0.5314 | 0.5304 | 0.5543 | 0.5208 | 0.7625 |
| 31 | 0.8713 | 0.8777 | 0.9147 | 0.8356 | 0.9145 |
| 32 | 0.3910 | 0.4190 | 0.5138 | 0.4098 | 0.7182 |
| 33 | 0.9049 | 0.9182 | 0.9373 | 0.8800 | 0.9330 |
| 34 | 0.4947 | 0.5548 | 0.6100 | 0.4859 | 0.7088 |
| 35 | 0.6880 | 0.7660 | 0.7744 | 0.6608 | 0.7926 |
| 38 | 0.9884 | 0.9869 | 0.4809 | 0.4049 | 0.6888 |
| avg. | 0.7402 | 0.7558 | 0.6391 | 0.5265 | 0.7594 |
| stdev. | 0.2552 | 0.2489 | 0.2520 | 0.2238 | 0.0991 |
| ID | Category | Description (as in Play Store) | Permission Name (as in AndroidManifest.xml) | No. of Apps Requested 1 | Permission Description Details (as in https://developer.android.com/reference/android/Manifest.permission) 2 |
|---|---|---|---|---|---|
| 1 | Device and app history | retrieve running apps | android.permission.GET_TASKS | 8535 | Allows the app to retrieve information about currently and recently running tasks. This may allow the app to discover information about which applications are used on the device. |
| 2 | Wi-Fi connection information | view Wi-Fi connections | android.permission.ACCESS_WIFI_STATE | 101,727 | Allows the app to view information about Wi-Fi networking, such as whether Wi-Fi is enabled and the name of the connected Wi-Fi devices. |
| 5 | Storage | read the contents of your USB storage | android.permission.READ_EXTERNAL_STORAGE | 135,861 | Allows the app to test a permission for the SD card that will be available on future devices. |
| 6 | Storage | modify or delete the contents of your USB storage | android.permission.WRITE_EXTERNAL_STORAGE | 127,762 | Allows the app to write to the SD card. |
| 7 | Other | receive data from Internet | com.google.android.c2dm.permission.RECEIVE | 120,229 | Allows apps to accept cloud to device messages sent by the app’s service. Using this service will incur data usage. Malicious apps could cause excess data usage. |
| 8 | Other | full network access | android.permission.INTERNET | 207,864 | Allows the app to create network sockets and use custom network protocols. The browser and other applications provide means to send data to the internet, so this permission is not required to send data to the internet. |
| 9 | Other | prevent device from sleeping | android.permission.WAKE_LOCK | 172,854 | Allows the app to prevent the phone from going to sleep. |
| 10 | Other | disable your screen lock | android.permission.DISABLE_KEYGUARD | 3843 | Allows the app to disable the keylock and any associated password security. For example, the phone disables the keylock when receiving an incoming phone call, then re-enables the keylock when the call is finished. |
| 11 | Other | control vibration | android.permission.VIBRATE | 97,442 | Allows the app to control the vibrator. |
| 12 | Other | view network connections | android.permission.ACCESS_NETWORK_STATE | 198,058 | Allows the app to view information about network connections such as which networks exist and are connected. |
| 13 | Other | run at startup | android.permission.RECEIVE_BOOT_COMPLETED | 116,194 | Allows the app to have itself started as soon as the system has finished booting. This can make it take longer to start the phone and allow the app to slow down the overall phone by always running. |
| 14 | Microphone | record audio | android.permission.RECORD_AUDIO | 35,595 | Allows the app to record audio with the microphone. This permission allows the app to record audio at any time without your confirmation. |
| 15 | Location | approximate location (network-based) | android.permission.ACCESS_COARSE_LOCATION | 57,838 | Allows the app to obtain your approximate location. This location is derived by location services using network location sources such as cell towers and Wi-Fi. These location services must be turned on and available to your device for the app to use them. Apps may use this to determine approximately where you are. |
| 16 | Identity | add or remove accounts | android.permission.MANAGE_ACCOUNTS | 3259 | Allows the app to perform operations like adding and removing accounts, and deleting their password. |
| 17 | Camera | take pictures and videos | android.permission.CAMERA | 67,875 | Allows the app to take pictures and videos with the camera. This permission allows the app to use the camera at any time without your confirmation. |
| 18 | Other | close other apps | android.permission.RESTART_PACKAGES, android.permission.KILL_BACKGROUND_PROCESSES | 2838 | Allows the app to end background processes of other apps. This may cause other apps to stop running. |
| 19 | Other | pair with Bluetooth devices | android.permission.BLUETOOTH | 29,571 | Allows the app to view the configuration of the Bluetooth on the phone, and to make and accept connections with paired devices. |
| 20 | Other | change network connectivity | android.permission.CHANGE_NETWORK_STATE | 12,018 | Allows the app to change the state of network connectivity. |
| 21 | Other | use accounts on the device | android.permission.USE_CREDENTIALS | 5801 | Allows the app to request authentication tokens. |
| 22 | Other | connect and disconnect from Wi-Fi | android.permission.CHANGE_WIFI_STATE | 13,672 | Allows the app to connect to and disconnect from Wi-Fi access points and to make changes to device configuration for Wi-Fi networks. |
| 23 | Other | change your audio settings | android.permission.MODIFY_AUDIO_SETTINGS | 25,244 | Allows the app to modify global audio settings such as volume and which speaker is used for output. |
| 24 | Other | access Bluetooth settings | android.permission.BLUETOOTH_ADMIN | 14,764 | Allows the app to configure the local Bluetooth phone, and to discover and pair with remote devices. |
| 25 | Other | manage document storage | android.permission.MANAGE_DOCUMENTS | 1371 | Allows the app to manage document storage. |
| 26 | Other | read Google service configuration | com.google.android.providers.gsf.permission.READ_GSERVICES | 17,798 | Allows this app to read Google service configuration data. |
| 27 | Other | Google Play license check | com.android.vending.CHECK_LICENSE | 15,424 | Market license check |
| 29 | Device ID & call information | read phone status and identity | android.permission.READ_PHONE_STATE | 41,888 | Allows the app to access the phone features of the device. This permission allows the app to determine the phone number and device IDs, whether a call is active, and the remote number connected by a call. |
| 30 | Other | SmartcardService Permission label | org.simalliance.openmobileapi.SMARTCARD | 102 | Enables Android applications to communicate with Secure Elements, e.g. SIM card, embedded Secure Elements, Mobile Security Card or others. |
| 31 | Other | read battery statistics | android.permission.BATTERY_STATS | 630 | Allows an application to read the current low-level battery use data. May allow the application to find out detailed information about which apps you use. |
| 32 | Other | draw over other apps | android.permission.SYSTEM_ALERT_WINDOW | 23,800 | Allows the app to draw on top of other applications or parts of the user interface. They may interfere with your use of the interface in any application, or change what you think you are seeing in other applications. |
| 33 | Other | modify system settings | android.permission.WRITE_SETTINGS | 9642 | Allows the app to modify the system’s settings data. Malicious apps may corrupt your system’s configuration. |
| 34 | Other | send sticky broadcast | android.permission.BROADCAST_STICKY | 3112 | Allows the app to send sticky broadcasts, which remain after the broadcast ends. Excessive use may make the phone slow or unstable by causing it to use too much memory. |
| 35 | Other | allow Wi-Fi Multicast reception | android.permission.CHANGE_WIFI_MULTICAST_STATE | 6350 | Allows the app to receive packets sent to all devices on a Wi-Fi network using multicast addresses, not just your phone. It uses more power than the non-multicast mode. |
| 38 | Contacts | find accounts on the device | android.permission.GET_ACCOUNTS | 16,073 | Allows the app to obtain the list of accounts known by the phone. This may include any accounts created by applications you have installed. |
References
1. Dagan, I.; Glickman, O.; Magnini, B. The PASCAL recognising textual entailment challenge. In Proceedings of the Machine Learning Challenges Workshop; Springer: Berlin/Heidelberg, Germany, 2005; pp. 177–190.
2. Dagan, I.; Roth, D.; Zanzotto, F.; Sammons, M. Recognizing Textual Entailment: Models and Applications; Springer Nature: Berlin/Heidelberg, Germany, 2022.
3. Felt, A.P.; Finifter, M.; Chin, E.; Hanna, S.; Wagner, D. A survey of mobile malware in the wild. In Proceedings of the 1st ACM Workshop on Security and Privacy in Smartphones and Mobile Devices, Chicago, IL, USA, 17 October 2011; pp. 3–14.
4. Lin, J.; Amini, S.; Hong, J.I.; Sadeh, N.; Lindqvist, J.; Zhang, J. Expectation and purpose: Understanding users’ mental models of mobile app privacy through crowdsourcing. In Proceedings of the 2012 ACM Conference on Ubiquitous Computing, Pittsburgh, PA, USA, 5–8 September 2012; pp. 501–510.
5. Viennot, N.; Garcia, E.; Nieh, J. A measurement study of Google Play. In Proceedings of the 2014 ACM International Conference on Measurement and Modeling of Computer Systems, Austin, TX, USA, 16–20 June 2014; pp. 221–233.
6. Feng, Y.; Chen, L.; Zheng, A.; Gao, C.; Zheng, Z. AC-Net: Assessing the consistency of description and permission in Android apps. IEEE Access 2019, 7, 57829–57842.
7. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the NAACL-HLT, Minneapolis, MN, USA, 2–7 June 2019; Volume 1, p. 2.
8. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 5998.
9. Gururangan, S.; Marasović, A.; Swayamdipta, S.; Lo, K.; Beltagy, I.; Downey, D.; Smith, N.A. Don’t stop pretraining: Adapt language models to domains and tasks. arXiv 2020, arXiv:2004.10964.
10. Liu, Y. RoBERTa: A robustly optimized BERT pretraining approach. arXiv 2019, arXiv:1907.11692.
11. He, P.; Liu, X.; Gao, J.; Chen, W. DeBERTa: Decoding-enhanced BERT with disentangled attention. arXiv 2020, arXiv:2006.03654.
12. Zhang, Z.; Han, X.; Liu, Z.; Jiang, X.; Sun, M.; Liu, Q. ERNIE: Enhanced language representation with informative entities. arXiv 2019, arXiv:1905.07129.
13. Zhou, J.; You, C.; Li, X.; Liu, K.; Liu, S.; Qu, Q.; Zhu, Z. Are all losses created equal: A neural collapse perspective. arXiv 2022, arXiv:2210.02192.
14. Gunel, B.; Du, J.; Conneau, A.; Stoyanov, V. Supervised contrastive learning for pre-trained language model fine-tuning. arXiv 2020, arXiv:2011.01403.
15. Oord, A.v.d.; Li, Y.; Vinyals, O. Representation learning with contrastive predictive coding. arXiv 2018, arXiv:1807.03748.
16. Khosla, P.; Teterwak, P.; Wang, C.; Sarna, A.; Tian, Y.; Isola, P.; Maschinot, A.; Liu, C.; Krishnan, D. Supervised contrastive learning. Adv. Neural Inf. Process. Syst. 2020, 33, 18661–18673.
17. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2999–3007.
18. Li, X.; Sun, X.; Meng, Y.; Liang, J.; Wu, F.; Li, J. Dice loss for data-imbalanced NLP tasks. arXiv 2019, arXiv:1911.02855.
19. Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. 2002, 16, 321–357.
20. Ma, E.K. nlpaug: Data Augmentation for NLP. Available online: https://github.com/makcedward/nlpaug (accessed on 10 February 2025).
21. He, P.; Gao, J.; Chen, W. DeBERTaV3: Improving DeBERTa using ELECTRA-style pre-training with gradient-disentangled embedding sharing. arXiv 2021, arXiv:2111.09543.
22. Loshchilov, I.; Hutter, F. Decoupled weight decay regularization. arXiv 2017, arXiv:1711.05101.
23. Wei, J.; Wang, X.; Schuurmans, D.; Bosma, M.; Xia, F.; Chi, E.; Le, Q.V.; Zhou, D. Chain-of-thought prompting elicits reasoning in large language models. Adv. Neural Inf. Process. Syst. 2022, 35, 24824–24837.
24. Brown, T.B.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language models are few-shot learners. arXiv 2020, arXiv:2005.14165.
25. Wei, J.; Bosma, M.; Zhao, V.Y.; Guu, K.; Yu, A.W.; Lester, B.; Du, N.; Dai, A.M.; Le, Q.V. Finetuned language models are zero-shot learners. arXiv 2021, arXiv:2109.01652.
26. Kojima, T.; Gu, S.S.; Reid, M.; Matsuo, Y.; Iwasawa, Y. Large language models are zero-shot reasoners. Adv. Neural Inf. Process. Syst. 2022, 35, 22199–22213.




| ID | Value | ID | Value | ID | Value | ID | Value |
|---|---|---|---|---|---|---|---|
| 1 | 0.6862 | 12 | 0.6519 | 21 | 0.0000 | 31 | 0.5671 |
| 2 | 0.7265 | 13 | 0.4787 | 22 | 0.1589 | 32 | 0.2334 |
| 5 | 0.0000 | 14 | 0.7159 | 23 | 0.6813 | 33 | 0.4679 |
| 6 | 0.5931 | 15 | 0.6398 | 24 | 0.5899 | 34 | 0.0848 |
| 7 | 0.7439 | 16 | 0.1286 | 25 | 0.2000 | 35 | 0.4326 |
| 8 | 0.6000 | 17 | 0.6390 | 26 | 0.6667 | 38 | 0.0889 |
| 9 | 0.3671 | 18 | 0.7963 | 27 | 0.0000 | | |
| 10 | 0.2693 | 19 | 0.6561 | 29 | 0.2231 | | |
| 11 | 0.6217 | 20 | 0.3473 | 30 | 0.4211 | | |
| Model | P4 | MCC | AUROC |
|---|---|---|---|
| BERT + CE (Cross-Entropy Loss) | 0.0099 | 0.0451 | 0.0213 |
| BERT + SCL (Supervised Contrastive Loss) | −0.6063 | −0.4790 | −0.2397 |
| BERT + InfoNCE (Contrastive Loss) | −0.6824 | −0.6372 | −0.2882 |
| BERT + FL (Focal Loss) | −0.0031 | 0.0432 | 0.0095 |
| RoBERTa + CE (Cross-Entropy Loss) | 0.0407 | 0.0705 | 0.0282 |
| RoBERTa + SCL (Supervised Contrastive Loss) | −0.6970 | −0.5294 | −0.2483 |
| RoBERTa + InfoNCE (Contrastive Loss) | −0.6928 | −0.5611 | −0.2592 |
| RoBERTa + FL (Focal Loss) | −0.0184 | 0.0327 | 0.0016 |
| DeBERTa + CE (Cross-Entropy Loss) | 0.0831 | 0.1258 | 0.0672 |
| DeBERTa + SCL (Supervised Contrastive Loss) | −0.6970 | −0.5294 | −0.2483 |
| DeBERTa + InfoNCE (Contrastive Loss) | −0.5894 | −0.4479 | −0.2141 |
| DeBERTa + FL (Focal Loss) | 0.0403 | 0.1012 | 0.0480 |
| ERNIE + CE (Cross-Entropy Loss) | 0.0424 | 0.0769 | 0.0340 |
| ERNIE + SCL (Supervised Contrastive Loss) | −0.6799 | −0.6372 | −0.2949 |
| ERNIE + InfoNCE (Contrastive Loss) | −0.6970 | −0.5294 | −0.2483 |
| ERNIE + FL (Focal Loss) | −0.0153 | 0.0463 | 0.0194 |
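The contrastive objectives in the table above (SCL and InfoNCE) shape the embedding space, pulling same-class representations together and pushing others apart, instead of scoring the label directly. A minimal sketch of the supervised contrastive loss of Khosla et al. (the variant that averages over positives outside the log) on L2-normalised embeddings is given below; with exactly one positive per anchor, each term takes the InfoNCE form of Oord et al. The temperature of 0.1, the toy batch and the function names are assumptions for illustration.

```python
import math
from typing import List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def supcon_loss(embeddings: Sequence[Sequence[float]], labels: Sequence[int],
                temperature: float = 0.1) -> float:
    """Supervised contrastive loss (Khosla et al., 2020), 'L_out' variant.

    Averaged over anchors that have at least one same-label partner in the batch.
    """
    z = [_normalize(e) for e in embeddings]
    n = len(z)
    # Scaled cosine similarities between every pair of samples.
    sim = [[sum(a * b for a, b in zip(z[i], z[j])) / temperature for j in range(n)]
           for i in range(n)]
    per_anchor = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        positives = [j for j in others if labels[j] == labels[i]]
        if not positives:
            continue  # no same-class partner in this batch: anchor is skipped
        # Numerically stable log of the denominator sum over all non-anchor samples.
        m = max(sim[i][j] for j in others)
        log_denom = m + math.log(sum(math.exp(sim[i][j] - m) for j in others))
        per_anchor.append(-sum(sim[i][p] - log_denom for p in positives) / len(positives))
    return sum(per_anchor) / len(per_anchor) if per_anchor else 0.0


batch = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(supcon_loss(batch, [0, 0, 1, 1]))  # classes already separated: low loss
print(supcon_loss(batch, [0, 1, 0, 1]))  # labels disagree with geometry: high loss
```

Because positives and negatives are drawn only from the current batch, an anchor with no same-class partner contributes nothing and the loss depends on batch composition; the appendix accordingly reports the contrastive runs at more than one batch size.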
| ID | Focal (P4) | SCL (P4) | InfoNCE (P4) | ID | Focal (MCC) | SCL (MCC) | InfoNCE (MCC) |
|---|---|---|---|---|---|---|---|
| 1 | −0.0759 | −0.6991 | −0.6398 | 1 | −0.0541 | −0.6590 | −0.5308 |
| 2 | −0.0347 | −0.7452 | −0.6586 | 2 | −0.0646 | −0.5325 | −0.6421 |
| 5 | 0.0198 | −0.6892 | −0.5546 | 5 | 0.0381 | −0.5470 | −0.6031 |
| 6 | 0.0479 | −0.7282 | −0.5881 | 6 | 0.0598 | −0.6030 | −0.4833 |
| 7 | −0.0316 | −0.7126 | −0.6797 | 7 | −0.0394 | −0.5283 | −0.4834 |
| 8 | −0.0275 | −0.7612 | −0.7612 | 8 | −0.0544 | −0.5348 | −0.5348 |
| 9 | 0.1136 | −0.4814 | −0.5698 | 9 | 0.0836 | −0.4214 | −0.4749 |
| 10 | 0.0015 | −0.6388 | −0.6474 | 10 | −0.0040 | −0.5432 | −0.5699 |
| 11 | −0.0119 | −0.8759 | −0.8759 | 11 | −0.0266 | −0.8691 | −0.7595 |
| 12 | −0.0024 | −0.6787 | −0.6787 | 12 | 0.0125 | −0.4101 | −0.4619 |
| 13 | −0.0716 | −0.4156 | −0.3959 | 13 | −0.0975 | −0.3626 | −0.3628 |
| 14 | 0.0688 | −0.5888 | −0.6266 | 14 | 0.0786 | −0.3961 | −0.4397 |
| 15 | −0.0032 | −0.8082 | −0.6288 | 15 | 0.0137 | −0.6463 | −0.5279 |
| 16 | −0.0845 | −0.4611 | −0.3323 | 16 | −0.0500 | −0.3446 | −0.2564 |
| 17 | 0.0593 | −0.6350 | −0.6906 | 17 | 0.0620 | −0.5327 | −0.4790 |
| 18 | 0.0308 | −0.7791 | −0.6815 | 18 | 0.0503 | −0.9910 | −0.6563 |
| 19 | 0.0035 | −0.7996 | −0.5143 | 19 | −0.0100 | −0.6492 | −0.4310 |
| 20 | −0.0446 | −0.6560 | −0.8082 | 20 | −0.0770 | −0.5326 | −0.7291 |
| 21 | 0.1423 | 0.1106 | 0.0000 | 21 | 0.0987 | 0.0580 | −0.0862 |
| 22 | −0.0661 | −0.5829 | −0.6163 | 22 | −0.0180 | −0.4139 | −0.3803 |
| 23 | −0.0080 | −0.6674 | −0.8513 | 23 | −0.0096 | −0.8223 | −1.0030 |
| 24 | 0.0171 | −0.7652 | −0.8159 | 24 | 0.0316 | −0.8270 | −0.6616 |
| 25 | −0.3489 | −0.5218 | −0.5218 | 25 | −0.3139 | −0.4656 | −0.4656 |
| 26 | −0.0726 | −0.2671 | −0.2671 | 26 | −0.0580 | −0.2094 | −0.2094 |
| 27 | 0.0000 | 0.1683 | 0.0000 | 27 | −0.0017 | 0.0965 | 0.0000 |
| 29 | −0.0995 | −0.0995 | 0.0250 | 29 | −0.0488 | −0.0529 | 0.0453 |
| 30 | −0.2804 | −0.4430 | −0.4515 | 30 | −0.2467 | −0.3759 | −0.4019 |
| 31 | −0.0142 | −0.9065 | −0.7796 | 31 | −0.0175 | −0.8226 | −0.8298 |
| 32 | −0.1424 | −0.5861 | −0.5350 | 32 | −0.1029 | −0.4549 | −0.4254 |
| 33 | −0.0361 | −0.9530 | −0.9530 | 33 | −0.0645 | −0.9086 | −0.9949 |
| 34 | 0.0037 | −0.7110 | −0.7091 | 34 | −0.0152 | −0.5941 | −0.5947 |
| 35 | 0.0168 | −0.8283 | −0.6839 | 35 | 0.0296 | −0.8729 | −0.5803 |
| 38 | −0.1605 | −0.8276 | −0.5986 | 38 | −0.1261 | −0.8269 | −0.5426 |
| avg. | −0.0331 | −0.6071 | −0.5785 | avg. | −0.0285 | −0.5332 | −0.5017 |
| stdev. | 0.0969 | 0.2633 | 0.2391 | stdev. | 0.0860 | 0.2629 | 0.2343 |
| Model | P4 | MCC | AUROC |
|---|---|---|---|
| RoBERTa + CE (Cross-Entropy Loss) (batch size = 4) | 0.0459 | 0.0513 | 0.0238 |
| RoBERTa + CE + Detailed Permission Description (batch size = 4) | 0.0532 | 0.0577 | 0.0305 |
| RoBERTa + SCL (Supervised Contrastive Loss) (batch size = 4) | −0.6447 | −0.5137 | −0.2520 |
| RoBERTa + SCL (Supervised Contrastive Loss) (batch size = 24) | −0.5748 | −0.4961 | −0.2755 |
| RoBERTa + InfoNCE Loss (batch size = 4) | −0.6447 | −0.5137 | −0.2445 |
| RoBERTa + InfoNCE Loss (batch size = 24) | −0.4431 | −0.4432 | −0.1592 |
| RoBERTa + InfoNCE Loss + Detailed Permission Description (batch size = 24) | −0.5075 | −0.4050 | −0.1797 |
| RoBERTa + FL (Focal Loss) (batch size = 4) | 0.0396 | 0.0428 | 0.0211 |
| RoBERTa + FL + Detailed Permission Description (batch size = 4) | 0.0509 | 0.0547 | 0.0282 |
| Model | P4 | MCC | AUROC |
|---|---|---|---|
| DeBERTa + CE (Cross-Entropy Loss) (batch size = 4) | 0.1088 | 0.1186 | 0.0663 |
| DeBERTa + CE + Detailed Permission Description (batch size = 4) | 0.0632 | 0.0837 | 0.0377 |
| DeBERTa + CE (Cross-Entropy Loss) (batch size = 16) | 0.0753 | 0.0932 | 0.0472 |
| DeBERTa + CE + Detailed Permission Description (batch size = 16) | 0.0680 | 0.0781 | 0.0359 |
| DeBERTa + SCL (Supervised Contrastive Loss) (batch size = 4) | −0.5576 | −0.4464 | −0.2128 |
| DeBERTa + SCL (Supervised Contrastive Loss) (batch size = 16) | −0.6447 | −0.5137 | −0.2498 |
| DeBERTa + InfoNCE Loss (batch size = 4) | −0.5642 | −0.4519 | −0.2084 |
| DeBERTa + InfoNCE Loss (batch size = 16) | −0.4863 | −0.4842 | −0.2244 |
| DeBERTa + InfoNCE Loss + Detailed Permission Description (batch size = 16) | −0.6350 | −0.6241 | −0.2982 |
| DeBERTa + FL (Focal Loss) (batch size = 4) | 0.0922 | 0.1016 | 0.0564 |
| DeBERTa + FL + Detailed Permission Description (batch size = 4) | 0.0761 | 0.0928 | 0.0439 |
| Pretrained DeBERTa + CE (batch size = 4) | −0.2697 | −0.2375 | −0.1201 |
| Pretrained DeBERTa + CE (batch size = 8) | −0.0835 | −0.0987 | −0.0585 |
| Model | P4 | MCC | AUROC |
|---|---|---|---|
| ERNIE + CE (Cross-Entropy Loss) (batch size = 4) | 0.0393 | 0.0439 | 0.0232 |
| ERNIE + CE + Detailed Permission Description (batch size = 4) | 0.0380 | 0.0469 | 0.0228 |
| ERNIE + CE (Cross-Entropy Loss) (batch size = 24) | 0.0482 | 0.0520 | 0.0175 |
| ERNIE + CE + Detailed Permission Description (batch size = 24) | 0.0511 | 0.0516 | 0.0247 |
| ERNIE + SCL (Supervised Contrastive Loss) (batch size = 4) | −0.6447 | −0.5137 | −0.2528 |
| ERNIE + SCL (Supervised Contrastive Loss) (batch size = 24) | −0.5201 | −0.4147 | −0.1984 |
| ERNIE + InfoNCE Loss (batch size = 4) | −0.5020 | −0.5220 | −0.2544 |
| ERNIE + InfoNCE Loss (batch size = 24) | −0.5003 | −0.5828 | −0.2915 |
| ERNIE + InfoNCE Loss + Detailed Permission Description (batch size = 24) | −0.6373 | −0.6404 | −0.3116 |
| ERNIE + FL (Focal Loss) (batch size = 4) | 0.0612 | 0.0658 | 0.0328 |
| ERNIE + FL + Detailed Permission Description (batch size = 4) | 0.0490 | 0.0475 | 0.0257 |
| Pretrained ERNIE + CE (batch size = 4) | −0.0166 | −0.0103 | −0.0087 |
| Pretrained ERNIE + CE (batch size = 24) | 0.0211 | 0.0251 | 0.0068 |
| ID | SMOTE (P4) | SynAug 0.5% (P4) | SynAug 2% (P4) | SynAug 4% (P4) | ID | SMOTE (MCC) | SynAug 0.5% (MCC) | SynAug 2% (MCC) | SynAug 4% (MCC) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | −0.0114 | −0.0747 | −0.1188 | −0.2298 | 1 | −0.0149 | −0.0715 | −0.1339 | −0.1796 |
| 2 | 0.0003 | N/A | N/A | N/A | 2 | 0.0025 | N/A | N/A | N/A |
| 5 | −0.1088 | 0.1185 | 0.0188 | 0.0995 | 5 | −0.0759 | 0.1515 | 0.0495 | 0.1271 |
| 6 | 0.0516 | 0.0479 | −0.0123 | 0.0471 | 6 | 0.0748 | 0.0598 | −0.0179 | 0.0717 |
| 7 | −0.0065 | −0.0843 | −0.0836 | −0.0369 | 7 | −0.0022 | −0.0518 | −0.0615 | −0.0396 |
| 8 | −0.0079 | N/A | N/A | N/A | 8 | 0.0022 | N/A | N/A | N/A |
| 9 | −0.1167 | 0.1037 | −0.0095 | 0.0599 | 9 | −0.1479 | 0.0791 | −0.0355 | 0.0343 |
| 10 | −0.0240 | −0.1059 | −0.0459 | −0.0865 | 10 | −0.0188 | −0.0757 | 0.0067 | −0.0342 |
| 11 | 0.0226 | N/A | N/A | N/A | 11 | 0.0412 | N/A | N/A | N/A |
| 12 | 0.0128 | N/A | N/A | N/A | 12 | 0.0649 | N/A | N/A | N/A |
| 13 | −0.1098 | −0.1100 | −0.0257 | −0.4156 | 13 | −0.1110 | −0.1285 | −0.0413 | −0.3626 |
| 14 | 0.0708 | −0.0162 | −0.1024 | −0.0156 | 14 | 0.0674 | −0.0060 | −0.0833 | −0.0032 |
| 15 | −0.0110 | −0.0340 | 0.0092 | −0.0059 | 15 | −0.0142 | 0.0043 | 0.0227 | 0.0048 |
| 16 | −0.0776 | 0.1245 | 0.0979 | 0.0980 | 16 | −0.0568 | 0.1152 | 0.0793 | 0.0980 |
| 17 | 0.0631 | −0.0933 | −0.0213 | −0.0060 | 17 | 0.0727 | −0.0397 | −0.0031 | 0.0084 |
| 18 | 0.0189 | 0.0232 | 0.0169 | 0.0458 | 18 | 0.0373 | 0.0411 | 0.0295 | 0.0802 |
| 19 | 0.0321 | −0.0899 | −0.0715 | −0.0542 | 19 | 0.0479 | −0.1171 | −0.1071 | −0.0493 |
| 20 | −0.0431 | N/A | N/A | N/A | 20 | −0.0656 | N/A | N/A | N/A |
| 21 | 0.3745 | 0.2996 | 0.0996 | 0.2996 | 21 | 0.2896 | 0.2896 | 0.0896 | 0.2896 |
| 22 | 0.0628 | −0.0026 | −0.0929 | −0.1442 | 22 | 0.0717 | −0.0141 | −0.0672 | −0.0742 |
| 23 | −0.0265 | N/A | N/A | N/A | 23 | −0.0549 | N/A | N/A | N/A |
| 24 | −0.0947 | −0.0038 | −0.0361 | −0.0955 | 24 | −0.1061 | −0.0027 | −0.0414 | −0.0980 |
| 25 | −0.2221 | −0.1411 | −0.2563 | −0.4688 | 25 | −0.1984 | −0.1583 | −0.2550 | −0.4492 |
| 26 | −0.0989 | −0.2671 | −0.0565 | 0.0431 | 26 | −0.0851 | −0.2094 | −0.0445 | 0.0286 |
| 27 | N/A | 0.0000 | 0.0000 | 0.0000 | 27 | N/A | −0.0017 | −0.0017 | −0.0017 |
| 29 | 0.0331 | −0.0995 | −0.0995 | −0.0995 | 29 | 0.0290 | −0.0426 | −0.0426 | −0.0470 |
| 30 | −0.2204 | −0.0847 | −0.1497 | −0.1049 | 30 | −0.1924 | −0.0179 | −0.0958 | −0.0542 |
| 31 | 0.0008 | −0.0321 | −0.0162 | −0.0160 | 31 | 0.0056 | −0.0457 | −0.0259 | −0.0270 |
| 32 | −0.0267 | −0.1066 | −0.0768 | −0.0783 | 32 | −0.0133 | −0.0873 | −0.0859 | −0.0767 |
| 33 | −0.0044 | −0.0303 | 0.0006 | −0.0084 | 33 | −0.0054 | −0.0452 | 0.0025 | −0.0131 |
| 34 | −0.0096 | −0.0431 | −0.1448 | −0.0315 | 34 | −0.0310 | −0.0766 | −0.1367 | −0.0426 |
| 35 | 0.0138 | 0.0067 | 0.0183 | 0.0151 | 35 | 0.0307 | 0.0145 | 0.0339 | 0.0430 |
| 38 | −0.1666 | 0.0096 | −0.1884 | −0.1970 | 38 | −0.1431 | 0.0184 | −0.1780 | −0.1917 |
| avg. | −0.0197 | −0.0254 | −0.0499 | −0.0514 | avg. | −0.0156 | −0.0155 | −0.0424 | −0.0355 |
| stdev. | 0.1046 | 0.1062 | 0.0802 | 0.1522 | stdev. | 0.0943 | 0.1000 | 0.0777 | 0.1424 |
| Permission Request Combination | IDs | ID | P4 | MCC | AUROC |
|---|---|---|---|---|---|
| Same category and same majority class | 5, 6 | 5 | 0.0725 | 0.0861 | 0.0025 |
| | | 6 | −0.2179 | −0.1627 | −0.0403 |
| | 7, 8 | 7 | 0.0333 | 0.0613 | 0.0153 |
| | | 8 | 0.0144 | 0.0425 | 0.0200 |
| Unrelated categories but same majority class | 5, 7 | 5 | 0.0302 | 0.0905 | −0.0220 |
| | | 7 | −0.0167 | 0.0139 | −0.0245 |
| | 5, 8 | 5 | 0.0810 | 0.1041 | 0.0305 |
| | | 8 | −0.0027 | 0.0066 | 0.0009 |
| | 6, 7 | 6 | −0.2955 | −0.2240 | −0.0232 |
| | | 7 | −0.0071 | 0.0340 | −0.0173 |
| | 6, 8 | 6 | −0.0302 | −0.0128 | 0.0085 |
| | | 8 | −0.0366 | −0.0509 | −0.0162 |
| Same category but opposite majority classes | 5, 25 | 5 | −0.0405 | −0.0162 | −0.0739 |
| | | 25 | −0.2713 | −0.2741 | −0.1240 |
| | 6, 25 | 6 | −0.0296 | 0.0185 | −0.0065 |
| | | 25 | 0.0221 | −0.0408 | −0.0169 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Ma, C.Y.T. Quantifying Privacy Risk of Mobile Apps as Textual Entailment Using Language Models. J. Cybersecur. Priv. 2025, 5, 111. https://doi.org/10.3390/jcp5040111

