Article

Examining the Integrity of Apple’s Privacy Labels: GDPR Compliance and Unnecessary Data Collection in iOS Apps

by Zaid Ahmad Surma 1,*, Saiesha Gowdar 1 and Harshvardhan J. Pandit 2,*
1 School of Computing, Dublin City University, D09 V209 Dublin, Ireland
2 ADAPT Centre, Dublin City University, D09 V209 Dublin, Ireland
* Authors to whom correspondence should be addressed.
Information 2024, 15(9), 551; https://doi.org/10.3390/info15090551
Submission received: 15 August 2024 / Revised: 29 August 2024 / Accepted: 2 September 2024 / Published: 9 September 2024

Abstract

This study investigates the effectiveness of Apple’s privacy labels, introduced in iOS 14, in promoting transparency around app data collection practices with respect to the GDPR. Specifically, we address two key research questions: (1) What special categories of personal data, as regulated by the GDPR, are collected and used by apps, and for which purposes? (2) What disparities exist between app-stated permissions and the apparent unnecessary data gathering across various categories in the iOS App Store? By analyzing a comprehensive dataset of 541,662 iOS apps, we identify common practices related to prevalent use of sensitive and special categories of personal data, revealing widespread instances of unnecessary data collection, misuse, and potential GDPR violations. Furthermore, our analysis uncovers significant inconsistencies between the permissions stated by apps and the actual data they gather, highlighting a critical gap in user privacy protection within the iOS ecosystem. These findings underscore the need for stricter regulatory oversight of app stores and the necessity of effective privacy notices to build accountability and trust and ensure transparency. This study offers actionable insights for regulators, app developers, and users towards creating secure and transparent digital ecosystems.

1. Introduction

Privacy has become a major societal concern in modern times due to the pervasiveness of technology in our daily lives. While smartphones and their associated apps have significantly improved communication and convenience, they have also given rise to serious concerns regarding the security of personal data. When people use a variety of apps, each requesting access to their personal information, the opaqueness of the practices governing data usage is a serious worry. Privacy regulations are more important than ever, as people demand greater transparency and control over their personal data. Nevertheless, consumers frequently find it difficult to fully understand these practices due to the complex nature of typical privacy policies. Privacy labels surfaced as a possible remedy to these issues. Privacy nutrition labels were first introduced by Kelley et al. [1] with the goal of providing a clear and succinct summary of privacy policies to improve users’ visual comprehension.
Apple introduced its App Privacy Labels in the App Store in 2020, and Google released Data Safety Sections in the Google Play Store soon after. Apple’s privacy labels purportedly help users better understand an app’s privacy practices before they download the app on any Apple platform [2]. All software products available on the App Store now have privacy labels, including desktop applications; however, in this paper, we discuss only mobile apps. Apple divides data into fourteen categories, each with a unique name and icon, to make it easier for developers to summarize how an app handles private, sensitive data. These data types group related or comparable pieces of information; for instance, the identifiers category includes User ID and Device ID [3]. The privacy labels page then organizes the app’s data practices into three primary categories, each displayed on a distinct card according to how the data are used: “data used to track you”, “data linked to you” and “data not linked to you” [4].
These labels are intended to make it easier for end users to understand how apps handle data [5]; they provide an alternative to wading through lengthy privacy policies that are rarely read. However, just as they frequently do with privacy policies, users may ignore these new, simplified privacy labels. This could result in a false sense of security or a lack of awareness regarding the impact on their privacy, which varies greatly from person to person. Furthermore, developers may fail to disclose their true data practices, even where the labels themselves appear to comply with the GDPR.
The EU General Data Protection Regulation (GDPR), in Article 9, defines special categories of data [6], which require higher protection due to their sensitive nature. These categories include data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, or trade union membership; genetic data; biometric data processed for the purpose of uniquely identifying a natural person; data concerning health; and data concerning a natural person’s sex life or sexual orientation. The GDPR prohibits the processing of these categories unless one of the exceptions is fulfilled, such as obtaining explicit consent. The GDPR, in Article 35 [7], also requires conducting a data protection impact assessment (DPIA) to assess the risks to the “rights and freedoms” of individuals that the processing of such data may produce.
Additionally, from the perspective of individuals, certain categories are considered particularly sensitive. These include data related to photos and videos, finance and navigation. Photos and videos can reveal a lot about a person’s private life, finance data include sensitive information about an individual’s economic activities and status, and navigation data can track a person’s location and movements. The importance of protecting such sensitive data highlights the need for effective privacy labels and regulations to ensure that users are fully aware of how their data are being collected and used.
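To make this distinction concrete for the analysis that follows, the snippet below gives a minimal Python sketch of the grouping used in this paper; the assignment of App Store categories to “special” and “sensitive” follows the groupings adopted later in Section 5.1, and the function name is purely illustrative.

```python
# Minimal sketch; the groupings follow those stated in Section 5.1 of this paper.
GDPR_SPECIAL = {"medical", "health and fitness"}          # GDPR Article 9 territory
SENSITIVE_NON_SPECIAL = {"finance", "photo and video", "navigation"}

def classify_category(app_category: str) -> str:
    """Classify an App Store category as 'special', 'sensitive' or 'other'."""
    name = app_category.strip().lower()
    if name in GDPR_SPECIAL:
        return "special"
    if name in SENSITIVE_NON_SPECIAL:
        return "sensitive"
    return "other"

print(classify_category("Medical"))     # special
print(classify_category("Navigation"))  # sensitive
print(classify_category("Games"))       # other
```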
The labels also raise new research questions, such as examining the effectiveness, usefulness and usability of the labels in real-world settings, and the discrepancy between the privacy choices that mobile app users have access to in permission managers and the disclosures made on the labels. However, little research has been conducted to determine how well the content of current mobile app privacy labels addresses people’s privacy concerns and inquiries.
In light of the aforementioned findings, the following research questions were identified and are examined in this paper:
  • What sensitive categories of personal data, as might be understood by individuals, and special categories, as regulated by the GDPR, are collected and used by apps, and for which purposes?
  • What disparities exist between app-stated permissions and the apparent unnecessary data gathering across various app categories in the iOS App Store?

2. Background

Privacy labels on Apple’s App Store [8] provide users with clear information about the data collected by apps, including fourteen distinct types of data, such as identifiers, location data, contact info, user content and financial information. These data types, each represented with a unique name and icon, help users understand what personal information an app might collect. Data collected by apps are categorized into three main groups to clarify how personal information is handled (Figure 1):
Data Used to Track You: This includes data used for tracking user or device data across different companies for targeted advertising, such as browsing history or app usage patterns.
Data Linked to You: This involves data that can be directly tied to the user’s identity, including account, device or contact details like email addresses and phone numbers.
Data Not Linked to You: This covers anonymized data from which references to user identity have been removed, such as anonymized usage statistics.
Each category is designed to highlight the varying privacy risks associated with different types of data usage. The privacy labels also indicate the purpose behind data collection, which can range from analytics and app functionality to product personalization and advertising. Users can find detailed information about the collected data and their purposes by accessing the “details” section on the privacy labels page.
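To illustrate this structure programmatically, the following is a minimal Python sketch of one app’s privacy label as described above; the class and field names are our own illustration and do not correspond to Apple’s internal schema.

```python
from dataclasses import dataclass, field

@dataclass
class PrivacyLabel:
    """Illustrative in-memory model of one app's privacy label."""
    app_id: str
    # Each of the three cards maps a declared data type (e.g., "Identifiers",
    # "Location") to the purposes declared for it (e.g., "Analytics",
    # "App Functionality", "Third-Party Advertising").
    data_used_to_track_you: dict[str, list[str]] = field(default_factory=dict)
    data_linked_to_you: dict[str, list[str]] = field(default_factory=dict)
    data_not_linked_to_you: dict[str, list[str]] = field(default_factory=dict)

label = PrivacyLabel(
    app_id="com.example.fitnessapp",  # hypothetical app
    data_linked_to_you={"Health & Fitness": ["App Functionality", "Analytics"]},
    data_used_to_track_you={"Identifiers": ["Third-Party Advertising"]},
)
print(label.data_linked_to_you)
```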
With the introduction of iOS 14.5, Apple required app developers to seek user consent for tracking through the App Tracking Transparency (ATT) framework, ensuring users have control over their privacy settings.
Under GDPR Article 9 [9], certain personal data are classified as “special categories” due to their sensitive nature, requiring enhanced protection. These include data on racial or ethnic origin, political opinions, religious beliefs, genetic and biometric data, health information and data concerning sexual orientation. Processing such data is generally prohibited unless explicit consent is obtained or the processing is necessary for reasons of substantial public interest.
In contrast, “sensitive data” refers to personal data that, while not classified as special under the GDPR, still require protection due to their nature, such as financial data or contact details. The regulatory requirements for such data vary depending on the jurisdiction.
Special categories under the GDPR are strictly regulated due to their potential impact on individual rights and freedoms, whereas sensitive data, though important, do not fall under these stringent GDPR conditions unless they relate to the defined special categories. This distinction underlines that while all GDPR special categories are sensitive, not all sensitive data are categorized as special under the GDPR.

3. Literature Review and State of the Art

In this section, we discuss material pertinent to our study across four primary themes: the literature on iOS privacy labels, comparisons of iOS and Android labels, whether privacy labels are helpful to users, and why low-popularity apps have missing privacy labels.

3.1. iOS Privacy Labels

Understanding privacy labels is essential for users to make informed decisions. The four-level hierarchy of privacy labels provides a structured way to interpret the information presented. Users can start by checking the top-level privacy types to determine whether an app collects any user data. If an app falls under the “No Data Collected” category, it means the app does not collect any user data [10]. However, numerous iOS applications continue to collect data that could be used to track users, according to Kollnig et al. (2022) [5]. In that study [5], 22.2% of apps stated that they did not gather user data, a claim that is frequently refuted by closer investigation. Analysis revealed a disparity between stated and actual practices: 68.6% of these apps transferred data to a tracking domain upon first launch, and 80.2% of these apps contained at least one tracker library. These apps, on average, had fewer tracking libraries and made less contact with tracking companies than those that acknowledged collecting data; this indicates a notable lack of transparency in privacy practices. Reference [3] also observed that, on average, there were 37,450 newly added apps and 38,053 removed apps per week. By the end of the collection period, 60.5% of apps had a privacy label and the remaining 39.5% did not. Considering only the apps with privacy labels, the authors noticed that the number of apps declaring that they track user data increased by 3733 apps on average each week, for a total increase of 47,658 apps.
The number of apps that link data to users’ identities increased by 9169 apps on average each week, for a total increase of 103,886 apps. However, most new apps use the “Data Not Collected” privacy type, which increased on average by 12,597 apps per week, for a total increase of 125,333 apps [3]. Users have a difficult time finding apps that are genuinely concerned about their privacy, as seen by the limited percentage of such apps featured in the App Store charts. The discrepancy between apps’ stated permissions and their actual data collection practices offers a critical entry point for our research on unnecessary data gathering across various iOS App Store categories. This gap not only raises questions about the integrity of app privacy disclosures but also underscores the importance of scrutinizing the privacy practices of apps that claim not to collect data, revealing a crucial area for deeper investigation and analysis in the quest for genuine transparency and user protection in digital environments.
Scoccia et al. (2022) [4] found that, among free applications, 40.55% are associated with developer advertising, 78.92% with data collection and 36.41% with third-party advertising. Free applications declare, on average, 2.72 uses of linked (as opposed to anonymized) data, whereas paid applications declare only 0.43 [4]. Therefore, it can be said that paid applications fare better than free applications on the App Store in this respect. The results in [11] show that 51.6% of iOS applications did not have any privacy labels as of 2021. Although 35.5% of applications had already created privacy labels, just 2.7% of iOS applications had created privacy labels without being prompted by an app update (Li et al. 2022) [11]. Moreover, the rate of change appeared to slow over time. The overall low level of privacy label adoption makes the label system comparatively less useful for users. The fact that roughly half of iOS applications lack privacy labels underscores a critical problem in how app developers approach the transparency and disclosure of their data collection practices. This situation is further compounded by developers’ passive attitude towards privacy [12] and a lack of awareness or misconceptions about how to create effective privacy labels [11]. Such attitudes and misunderstandings can lead to inadequate compliance with privacy regulations, including the General Data Protection Regulation (GDPR), which is especially critical when it comes to handling sensitive data categories.

3.2. How Developers Talk about Personal Data and What It Means for User Privacy: A Case Study of a Developer Forum on Reddit

Ref. [12] examines the discussions on personal data by Android developers on the /r/androiddev forum on Reddit, exploring how these discussions relate to user privacy. The paper employs qualitative analysis of 207 threads (4772 unique posts) to develop a typology of personal data discussions and identify when privacy concerns arise. The research highlights that developers rarely discuss privacy concerns in the context of specific app designs or implementation problems unless prompted by external events like new privacy regulations or OS updates. When privacy is discussed, developers often view these issues as burdensome, citing high costs with little personal benefit.
Ref. [12] suggests that privacy-related discussions are reactive rather than proactive among developers, who tend to address privacy concerns only when forced to do so by external factors. Risky data practices, such as sharing data with third parties or transferring data off the device, are frequently mentioned without corresponding discussions of their privacy implications. The study concludes by offering recommendations for improving privacy practices, such as better communication of privacy rationales by the Android OS and app stores, and encouraging more privacy-focused discussions in developer forums.
The research contributes to understanding the challenges developers face regarding privacy and highlights the need for better tools, guidance and community practices to support more privacy-conscious app development. The study underscores the importance of proactive privacy discussions and offers actionable suggestions for enhancing privacy practices within the Android development community.

3.3. Comparison of Privacy Labels in the iOS App Store and Google Play Store

There has been research comparing privacy labels in the Apple App Store and Google Play Store. Since these are two different platforms, their privacy label policies also differ. A comparison between the two can help users understand how each platform handles data collection and privacy practices, and which platform better helps them prioritize their privacy.
In [10,13], the authors analyzed the privacy labels on the Apple App Store and Google Play Store to understand how mobile apps handle user privacy. Reference [10] compares the privacy labels on both platforms, whereas ref. [13] deep-dives into the practices reported in privacy labels in addition to comparing labels across the platforms. Given that both papers are based on a comparison of privacy labels across platforms, it is observed that [13] provides more detailed insights, based on app popularity, age rating and price as well. Since the authors were comparing the two platforms, it was essential first to identify the apps present on both. In [10], the authors were only able to compare 822 apps present on both platforms, and the findings showed mismatches in the data types declared on the iOS App Store and the Google Play Store. Precise location is collected more on iOS and less on Android, whereas approximate location is collected more by Android than iOS. This suggests that iOS apps are more oriented towards precise location and Android apps towards approximate location. Another mismatch concerns device ID and user ID: Android apps collect device ID more than iOS apps, while iOS apps collect user ID more than Android apps.
The figure below shows the difference in the data types collected on the two platforms, as reported in [10].
As we can see in Figure 2, there is a large gap for precise location, coarse location, user ID and device ID. Also, when it comes to sensitive info, iOS has only 7 apps whose disclosures match, while the other 16 differ. In [13], the authors filtered the data and performed a comparison analysis on 100k apps. They found that, when comparing Apple privacy labels with Google Play Store privacy labels, 60% of the cross-listed apps had at least one inconsistency. They further report that inconsistencies are highest for the sensitive information, browsing history, and email or text message data types [13]. The authors also identified inconsistencies wherein developers report data collection for two different purposes: for the app Twitch TV, the purchase history data type is listed on the Google Play Store as collected for app functionality, whereas on the iOS App Store it is listed for analytics and personalization purposes.
The overall conclusion in [13] suggests that Apple’s privacy label does not distinguish between data collection and sharing. Apple’s privacy label is more explicit about data practices such as linkability, third-party advertising and tracking. In contrast, data safety sections lack these details but do inform users about the safety of their data (data encryption) and the choices they have with developers (data deletion option) [13]. In terms of security practices on the Google Play Store, the authors revealed that 23% of apps do not provide any details of their security practices and 65% of apps encrypt the data they collect or share while in transit [13]. It was also found that 42% of apps declare data collection on the Google Play Store, while 58% do so on the iOS platform. In terms of popularity, for the Google Play Store, 76% of high-popularity apps have privacy labels, while only 42% of low-popularity apps do. The app KineMaster—Video Editor, a video editing application with over 400M+ downloads on the Google Play Store, claims not to collect any data on the Play Store, but on the App Store it asserts the collection of sensitive data such as location and identifiers [13]. In terms of price, it was noticed that on the Google Play Store, 68% of paid apps have labels whereas only 46% of free apps do; a similar trend was observed on the Apple iOS App Store, where paid apps have privacy labels in greater numbers than free apps [13]. We can therefore conclude that, beyond the privacy label comparison, [13] deep-dived into more granular details than [10]. Both papers showed the data practices reported on both platforms; however, neither examined whether the data practices disclosed on either platform are in line with the GDPR. Also, refs. [10,13] did not provide any findings that would help identify whether any of the apps were performing unnecessary data collection.

3.4. Do Privacy Labels Help Users?

It is very important to understand whether privacy labels can answer the questions users have, since the main purpose of introducing privacy labels is to let users know an app’s data practices. For this reason, Shikun Zhang and Norman Sadeh from Carnegie Mellon University published a study [14] in 2023 analyzing whether privacy labels resolve users’ privacy questions. The authors used a corpus of questions published in a research paper [15] combining computational and legal perspectives. They first sought to understand the nature of those questions and organized them into themes, ending with 18 themes and 67 codes. The study aimed to evaluate whether the questions asked by users regarding privacy could be answered using the privacy labels on the Google Play Store and Apple iOS App Store. The authors analyzed each code to determine whether questions under each sub-theme could be answered using the privacy labels provided by Google or Apple. The authors of [14] found that the most common questions asked by users related to data collection. Other themes comprised app security, data sharing, data selling, permissions and app-specific privacy features. It was found that approximately 40% of question themes could be answered by the labels. Google Play labels provided more coverage, addressing additional data types and security-related questions compared to iOS labels. However, iOS labels provided more information regarding data selling practices. Several question themes, such as permissions, data retention, external access, account requirements and cookies policy, were not addressed by either iOS or Google Play labels. These themes represent areas where users’ privacy concerns may not be adequately addressed by current label designs. The findings provide insights into potential improvements needed in label design to better align with users’ mental models and address their privacy concerns effectively. Since only 40% of the question themes could be answered by current privacy labels, there is clearly still room for improvement. Reference [14] indicates that there are significant gaps in addressing users’ privacy concerns; thus, we attempt to fill this research gap through an understanding of user perception. In our research, we explore the formats of privacy labels across different apps and identify inconsistencies. As part of our research, we also employ campaigns/surveys that help us understand user perception, raise awareness among users in terms of understanding privacy labels, and empower users to make more informed decisions about their app usage.

3.5. Why Do Low-Popularity Apps Have Missing Privacy Labels?

One striking observation in [13] is that 76% of high-popularity apps had privacy labels, whereas only 42% of low-popularity apps did. While popular apps provide robust privacy disclosure, a disparity exists in that less popular apps lack privacy label disclosures: reference [13] makes clear that more than half of the less popular apps had privacy labels missing. This highlights the need to analyze the factors that influence privacy labelling practices in less popular apps. Reference [16], published in 2022, helps us understand the challenges faced by small and medium-sized enterprises (SMEs) regarding privacy labels. The authors conducted interviews with three SMEs from the retail, culture and media sectors. Two sessions were arranged for each SME: the objective of the first session was to introduce the SMEs to the online SERIOUS tools, which helped in creating a sensible SERIOUS privacy label; the second session was an interview with each SME, with questions based on the first session, the use of privacy labels and the facilitation of privacy label deployment.
The results of the second session highlighted several points. The first was that, since the participants belonged to different sectors, they should use sector-specific privacy labels. The second was that these SMEs did not have in-house capacity for privacy label generation and thus relied on online third parties. It was also noticed that the SMEs did not have a specific person responsible for handling privacy labels. The authors recommended that service providers develop automated systems, tools and architectures that help estimate privacy practices/labels based on the operational behavior of the corresponding service [16]. They noted that privacy labels, and the tools used to develop them, need to be adopted by both label-issuing enterprises and label-consuming parties, and expressed the need for further research on possible ways of enhancing label adoption in label-issuing enterprises.
The research shows that there is still a need for better privacy labels. SMEs providing online services and applications need to offer transparency regarding each privacy practice. Moreover, there is a need for a trusted third party to monitor and supervise ongoing processes.

3.6. How Usable Are iOS Privacy Labels?

Reference [17] provides a comprehensive analysis of the usability and effectiveness of Apple’s iOS app privacy labels, introduced with iOS 14, within the broader landscape of privacy communication. These labels were designed as a more accessible alternative to traditional privacy policies, which are often criticized for their length, complexity and general lack of engagement from users. The study situates itself within the “Notice and Choice” framework of U.S. privacy law, which traditionally relies on users being informed and making choices based on detailed privacy policies.
The survey begins by reviewing the key literature on privacy notices, identifying essential criteria for effective privacy communication: readability, comprehensibility, salience, relevance and actionability. It highlights the shortcomings of traditional privacy policies and explores the emergence of alternative approaches, such as standardized and simplified notices, which aim to make privacy information more digestible and actionable for users.
Ref. [17] then focuses specifically on the concept of privacy labels, which are intended to function similarly to nutrition labels on food products, offering a concise summary of how an app handles user data. The authors in [17] discuss the theoretical potential of these labels to improve user understanding and control over their privacy, particularly in the mobile app ecosystem, where privacy concerns are especially pertinent.
To evaluate the real-world effectiveness of iOS privacy labels, the authors conducted an empirical study involving in-depth interviews with 24 iPhone users. The findings reveal a range of user experiences, with many participants expressing confusion and frustration with the labels. Despite their intended purpose, the labels often failed to clearly communicate the necessary information or to empower users to make informed privacy decisions. Common issues included misunderstandings of the labels’ content, perceived inconsistencies with app behaviors, and a general lack of actionable guidance.
The survey concluded by offering recommendations for enhancing the design and implementation of privacy labels. These included using clearer and more straightforward language, better integrating the labels with app permission settings, and refining the visual and interactive aspects of the labels to make them more user-friendly. The paper’s findings contribute to the ongoing discussion on improving privacy notice design, particularly in the context of mobile applications, and highlight the persistent challenges in creating privacy tools that effectively bridge the gap between technical information and user understanding.

3.7. Helping Mobile Application Developers Create Accurate Privacy Labels

Reference [18] explores the complex challenge of ensuring that mobile application developers can produce accurate privacy labels, a requirement introduced by Apple in December 2020. These privacy labels are designed to inform users about the data collection and sharing practices of apps they use. However, many developers struggle to complete these labels accurately due to a lack of expertise in privacy regulations and the complexities introduced by third-party software development kits (SDKs) and libraries, which are often integral to app development.
The authors identify several key obstacles that developers face when attempting to create these privacy labels. One significant issue is the difficulty in understanding the data collection behaviors of third-party components within their apps. These components can introduce data practices that developers might not be fully aware of, leading to inaccuracies in the privacy labels. Additionally, developers often lack the necessary privacy expertise to correctly interpret and implement the requirements for these labels, resulting in widespread inaccuracies that could have legal, regulatory, and reputational consequences.
To address these challenges, the authors developed and evaluated a tool called Privacy Label Wiz (PLW). PLW is an enhanced version of an earlier tool, Privacy Flash Pro, and is designed to help iOS developers create more accurate privacy labels by integrating static code analysis with interactive user prompts. The tool scans the app’s codebase to identify potential data collection practices and then guides the developer through a series of questions and prompts to clarify and confirm these practices. This process is intended to help developers better understand their apps’ data flows and ensure that the privacy labels they produce accurately reflect these practices.
Reference [18] details the iterative development process of PLW, which involved gathering feedback from semi-structured interviews with developers. These interviews provided valuable insights into the difficulties developers face and informed several key design decisions for PLW. For example, the tool was designed to integrate seamlessly into developers’ existing workflows, minimizing disruption and making it easier for developers to use it effectively. The authors also discuss the tool’s evaluation, which showed that PLW could significantly improve the accuracy of the privacy labels generated by developers.
In addition to describing the tool and its development, the paper makes several broader contributions to privacy engineering. It highlights the need for tools that are tailored to the specific challenges developers face when working with privacy regulations and underscores the importance of aligning these tools with typical software development practices. The paper concludes with suggestions for future work, including further refinement of tools like PLW and expanding support for other mobile platforms beyond iOS.
Overall, the study emphasizes the importance of providing developers with the right tools and resources to help them navigate the complexities of privacy regulations, thereby improving the accuracy of privacy labels and enhancing user trust in mobile applications.

3.8. Keeping Privacy Labels Honest

Reference [19] explores the effectiveness and reliability of Apple’s privacy labels, which were introduced in December 2020. These labels require app developers to disclose the types of data their apps collect and the purposes for which the data are used. The study primarily investigates whether these privacy labels accurately reflect the data collection practices of the apps and whether developers comply with these self-declared labels.
The authors conducted an exploratory statistical analysis of 11,074 apps across 22 categories from the German App Store. They found that a significant number of apps either did not provide privacy labels or self-declared that they did not collect any data. A subset of 1687 apps was selected for a “no-touch” traffic collection study. This involved analyzing the data transmitted by these apps to determine if they matched the information disclosed in their privacy labels. The study revealed that at least 276 of these apps violated their privacy labels by transmitting data without declaring it. Reference [19] also assessed the apps’ compliance with the General Data Protection Regulation (GDPR), particularly regarding the display of privacy consent forms. Numerous potential violations of the GDPR were identified. The authors developed infrastructure for large-scale iPhone traffic interception and a system for automatically detecting privacy label violations through traffic analysis.
Ref. [19] concluded that Apple’s privacy labels are often inaccurate, with many apps transmitting data not disclosed in their labels. The findings suggest that there is no validation of these labels during the Apple App Store approval process, leading to potential privacy violations and non-compliance with GDPR. The paper emphasizes the need for more rigorous enforcement and verification of privacy labels to protect users’ data effectively. The study provides a critical evaluation of the effectiveness of privacy labels and highlights significant gaps in their implementation and enforcement.

3.9. ATLAS: Automatically Detecting Discrepancies between Privacy Policies and Privacy Labels

Ref. [20] introduces a novel tool, ATLAS (Automated Privacy Label Analysis System), which is designed to identify discrepancies between privacy policies and privacy labels in iOS apps using advanced natural language processing (NLP) techniques. The study reveals a concerning finding: 88% of the apps analyzed that have both a privacy policy and a privacy label available exhibit at least one discrepancy, with an average of 5.32 potential issues per app. These discrepancies often involve the types of data collected, the purposes for data use, and data-sharing practices, pointing to significant gaps between what apps disclose in their privacy labels and what is outlined in their privacy policies.
The ATLAS serves as a critical resource for developers, regulators and researchers, providing a way to automatically detect and address these inconsistencies, thereby improving privacy transparency and compliance in the mobile app ecosystem. The study highlights the potential of the ATLAS to enhance user trust by ensuring that privacy labels accurately reflect the practices detailed in privacy policies, thus supporting regulatory efforts to protect user privacy.
In conclusion, ref. [20] underscores the importance of addressing the identified discrepancies to improve the accuracy of privacy disclosures in mobile apps. The researchers suggest that the ATLAS could be further developed to cover more platforms and languages, potentially broadening its impact. They also call for stronger regulatory oversight to ensure that privacy labels are not just a formality but a true reflection of an app’s data practices. The authors believe that by using tools like the ATLAS, the industry can move towards greater transparency and accountability, ultimately fostering a more privacy-respecting digital environment.

4. Study Design

The main aim of our study is to explore how iOS apps on the Apple App Store handle special and sensitive data, providing insights that can aid regulators, app store operators and users in making better-informed decisions. This study covers 541,662 apps published on the iOS App Store as of November 2023. The goal of this study is to analyze privacy labels to identify any limitations they pose.
Initially, we conducted a literature review to identify existing research on privacy labels. Although we found relevant studies, none of them made their underlying data available. This led us to contact other academics who were also collecting data on the Apple App Store. These academics provided us with data in JSON format, collected in November 2023, encompassing privacy information for 541,662 iOS apps. Each app’s privacy details are stored in a separate JSON file specifying the categories of data collected. To promote reproducibility and further reuse, we will upload these data to GitHub.
The dataset we received comprised 541,662 individual JSON files, one for each app. We developed a Python function to extract information from these JSON files and convert it into a structured format suitable for analysis, combining the respective fields such as data linked to you, data not linked to you and data used to track you. This process involved parsing each JSON file to identify various privacy-related aspects, such as the types of data collected (e.g., location, contact info), the purposes of data collection (e.g., analytics, app functionality) and the categories of data usage (e.g., data used to track you, data linked to you, data not linked to you). We then used binary encoding to indicate the presence or absence of specific data types and purposes within each app’s privacy label. For instance, if an app collected location data, it was encoded as “1”; otherwise, it was encoded as “0”. Similarly, data used for analytics were encoded as “1” if applicable. The encoded data were then stored in a .csv file for further analysis.
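A simplified sketch of this extraction and binary-encoding step is shown below. The JSON field names (appId, category, privacyTypes, dataCategories, purposes) are assumptions about the dataset’s structure and may differ from the files we received; the lists of data types and purposes are abbreviated for illustration.

```python
import csv
import json
from pathlib import Path

# Abbreviated lists for illustration; the full encoding covers all fourteen
# data types and every declared purpose.
DATA_TYPES = ["Location", "Contact Info", "Identifiers", "Health & Fitness"]
PURPOSES = ["Analytics", "App Functionality", "Product Personalization",
            "Developer's Advertising or Marketing", "Third-Party Advertising"]

def encode_app(json_path: Path) -> dict:
    """Flatten one app's privacy-label JSON into a row of 0/1 indicators.

    Assumes each file looks roughly like:
    {"appId": ..., "category": ..., "privacyTypes": [
        {"privacyType": "Data Linked to You",
         "dataCategories": ["Location", ...],
         "purposes": ["Analytics", ...]}, ...]}
    The real field names in the dataset may differ.
    """
    record = json.loads(json_path.read_text(encoding="utf-8"))
    row = {"app_id": record.get("appId", json_path.stem),
           "category": record.get("category", "")}
    declared_types, declared_purposes = set(), set()
    for card in record.get("privacyTypes", []):
        declared_types.update(card.get("dataCategories", []))
        declared_purposes.update(card.get("purposes", []))
    for dtype in DATA_TYPES:
        row[dtype] = int(dtype in declared_types)        # 1 = declared, 0 = not
    for purpose in PURPOSES:
        row[purpose] = int(purpose in declared_purposes)
    return row

def encode_all(json_dir: str, out_csv: str) -> None:
    """Encode every per-app JSON file in json_dir into one CSV row each."""
    rows = [encode_app(path) for path in sorted(Path(json_dir).glob("*.json"))]
    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(
            f, fieldnames=["app_id", "category", *DATA_TYPES, *PURPOSES])
        writer.writeheader()
        writer.writerows(rows)

encode_all("privacy_labels_json/", "privacy_labels_encoded.csv")
```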
The choice of binary encoding was driven by its simplicity and clarity, which aids in straightforward statistical analyses and visualizations, and ensures consistency across the dataset, making comparisons between different apps and categories feasible. Additionally, this method enhances analytical flexibility, allowing for various analytical techniques such as frequency analysis, cross-tabulations, and visual representations. This comprehensive approach enables a thorough examination of data practices. Lastly, the encoded data in the .csv file were stored in an SQLite database, which facilitated efficient querying and management, helping aggregate data and extract meaningful insights regarding privacy practices across different app categories.
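As an example of this storage step, the sketch below loads the encoded CSV into SQLite and runs a simple per-category aggregation; the table, column and file names follow the hypothetical encoding above rather than our actual database schema, and pandas is used here only for convenient CSV loading.

```python
import sqlite3

import pandas as pd  # convenience only; not a claim about our actual tooling

# Load the binary-encoded labels into an SQLite table (names are illustrative).
df = pd.read_csv("privacy_labels_encoded.csv")
conn = sqlite3.connect("privacy_labels.db")
df.to_sql("privacy_labels", conn, if_exists="replace", index=False)

# Example aggregation: how many apps per category declare location collection?
query = """
    SELECT category, SUM("Location") AS apps_declaring_location
    FROM privacy_labels
    GROUP BY category
    ORDER BY apps_declaring_location DESC;
"""
for category, count in conn.execute(query):
    print(f"{category}: {count}")
conn.close()
```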
The collected and structured data were then analyzed by identifying specific queries through analysis of the literature and performing a requirements analysis. In this exercise, we analyzed the information available to us in the App Store and the privacy label, and formulated questions whose answers would illuminate the state of data collection within the app’s use of data and the impact on privacy. To ensure the questions were diverse and reflected a well-grounded research approach, we compared them with the existing literature that analyzed iOS apps to identify which questions were repeated (i.e., prior work had also investigated them) and which were novel. For the repeated questions, we sought to validate existing analyses in terms of findings, and for the novel ones, we attempted to formulate theories as to why they occur and what their impacts are.

5. Analysis and Results

The results of our analysis are presented in this section, arranged according to the relevant research question. Table 1 shows the information related to our dataset.

5.1. RQ1: Use of Sensitive and Special Categories

5.1.1. “Data Linked to You” Using Sensitive and Special Categories

Based on the GDPR, we found that the “medical” and “health and fitness” categories contain special categories of personal data, while the “finance”, “photo and video” and “navigation” categories contain sensitive categories of personal data. Here is a detailed breakdown of the data usage within these categories:
Medical: In this category (Table 2), data linked to users are distributed across various purposes, with the majority (83.4%) being used for developer advertising. This significant portion, amounting to 109,917 apps, highlights the emphasis on supporting in-app advertisements and marketing efforts by developers. Additionally, 9.6% (12,610 apps) of the data ensures app functionality, crucial for the app’s operational effectiveness. Analytics data account for 3.3% (4373 apps), used to understand user interaction and improve the app experience. Minor portions are dedicated to other purposes (1.0%, 1340 apps), product personalization (2.3%, 3089 apps) and third-party advertising (0.4%, 463 apps), ensuring a tailored user experience and supporting external marketing.
Health and Fitness: In contrast, the health and fitness category demonstrates a more balanced data distribution. App functionality is the largest segment, comprising 53.1% (65,535 apps) of the data, emphasizing the importance of maintaining a smooth and effective service. Analytics data make up 21.3% (26,242 apps), crucial for monitoring and enhancing user interaction and performance. Product personalization, accounting for 12.3% (15,149 apps), plays a significant role in tailoring the user experience. Developer advertising constitutes 9.4% (11,588 apps), and other purposes take up 1.9% (2290 apps). Third-party advertising is relatively minimal, at 2.1% (2609 apps), indicating a lesser focus on external marketing compared to internal app functionality and personalization efforts.
In comparison to these special categories, the sensitive categories of finance, photo and video, and navigation exhibit different patterns of data usage.
Finance: In the finance category (Table 3), data linked to users are distributed as follows: 47.87% (46,201 apps) ensures app functionality, highlighting the critical need for operational effectiveness. Developer advertising constitutes 13.26% (12,789 apps), supporting in-app advertisements and marketing efforts. Analytics data account for 19.25% (18,580 apps), used to understand user interaction and improve the app experience. Other purposes cover 3.83% (3700 apps), while product personalization takes up 13.86% (13,379 apps) to tailor the user experience. Third-party advertising represents a smaller portion at 1.93% (1861 apps).
Photo and Video: In this category, the data distribution is as follows: App functionality comprises 34.97% (3048 apps), essential for the app’s smooth operation. Developer advertising accounts for 7.33% (639 apps), and analytics to enhance user interaction make up 26.54% (2313 apps). Other purposes represent 2.93% (255 apps), while product personalization is 9.66% (842 apps). Third-party advertising constitutes a significant portion at 18.56% (1618 apps), indicating a strong emphasis on external marketing.
Navigation: In the navigation category, data usage is primarily focused on app functionality, which makes up 51.56% (4519 apps), reflecting the importance of operational efficiency. Developer advertising accounts for 6.32% (554 apps), and analytics for performance enhancement constitute 18.74% (1643 apps). Other purposes cover 3.62% (317 apps), while product personalization represents 12.29% (1077 apps). Third-party advertising is relatively small at 7.47% (655 apps).
In summary, while both special categories (medical, and health and fitness) and sensitive categories (finance, photo and video, and navigation) prioritize app functionality and developer advertising, their focus and distribution patterns vary. The medical category is heavily skewed towards advertising, with minimal emphasis on personalization and other purposes. The health and fitness category balances its data usage across functionality, analytics and personalization, with a smaller proportion dedicated to advertising. In contrast, the sensitive categories like finance, photo and video, and navigation exhibit a more varied distribution, reflecting different operational, marketing and user engagement strategies employed by apps in these categories. These differences highlight the distinct approaches taken by apps in managing and utilizing user data to meet their specific operational and marketing objectives.
In the landscape of iOS app categories, medical apps and health and fitness apps stand out for their notably abundant data collection practices. These categories gather significantly more data compared to other app categories such as finance, navigation, and photo and video. This trend suggests a strong emphasis on analytics, app functionality, and advertising, leading to privacy concerns among users. The findings reveal that medical apps and health and fitness apps collect significantly more data—almost 10–15 times more than navigation apps and photo and video apps. Such extensive data collection raises serious privacy concerns, underlining the need for app developers to adopt transparent data usage policies and enhance user consent mechanisms. Users are advised to exercise caution with permissions and settings when using these apps due to their high data collection rates.
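The purpose breakdowns above can be reproduced as a simple frequency analysis over the encoded data. A minimal pandas sketch is shown below; it assumes the “data linked to you” declarations have been reshaped into a long-format table with one row per (app, purpose) pair and columns app_id, category and purpose, and the file name is hypothetical.

```python
import pandas as pd

# Hypothetical long-format export of "data linked to you" declarations:
# columns app_id, category, purpose (one row per declared purpose per app).
linked = pd.read_csv("data_linked_to_you_long.csv")

special_and_sensitive = ["Medical", "Health & Fitness", "Finance",
                         "Photo & Video", "Navigation"]
subset = linked[linked["category"].isin(special_and_sensitive)]

# Count declarations per (category, purpose) and convert to row percentages,
# mirroring the per-category purpose shares reported above (e.g., Tables 2 and 3).
counts = subset.groupby(["category", "purpose"]).size().unstack(fill_value=0)
shares = counts.div(counts.sum(axis=1), axis=0) * 100
print(shares.round(2))
```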

5.1.2. “Data Not Linked to You” Using Sensitive and Special Categories

For data not linked to you, we used the same approach as outlined in Section 5.1.1 to identify special and sensitive categories of data.
Medical Category: In the medical category (Table 4), data not linked to users are distributed across various purposes. The majority (47.09%) are used for app functionality, amounting to 6042 apps. Analytics data account for 35.10% (4505 apps), used to understand user interaction and improve the app experience. Minor portions are dedicated to other purposes (5.67%, 728 apps), product personalization (7.64%, 980 apps) and third-party advertising (2.43%, 312 apps). Developer advertising constitutes the smallest portion (2.06%, 264 apps). The total number of data declarations across these apps is 12,831, with app functionality being the highest count at 6042.
Health and Fitness Category: In the health and fitness category, data not linked to users are distributed with app functionality taking the largest share at 46.61% (16,631 apps). Analytics data account for 32.63% (11,641 apps), used for monitoring and enhancing user interaction. Product personalization makes up 8.72% (3110 apps) and developer advertising constitutes 3.29% (1173 apps). Other purposes cover 4.08% (1454 apps), and third-party advertising represents 4.68% (1670 apps). The total number of data declarations across these apps is 35,679, with app functionality being the highest count at 16,631.
In comparison to these special categories, the sensitive categories of finance, photo and video, and navigation show the following patterns:
Finance Category: In the finance category (Table 5), data not linked to users are distributed as follows: 45.09% (13,681 apps) ensure app functionality, highlighting the critical need for operational effectiveness. Analytics data constitute 32.23% (9780 apps), used to understand user interaction and improve the app experience. Developer advertising accounts for 3.20% (972 apps), while other purposes cover 9.01% (2732 apps). Product personalization takes up 7.05% (2140 apps), and third-party advertising represents a smaller portion at 3.42% (1037 apps). The total number of data declarations across these apps is 30,342, with app functionality being the highest count at 13,681.
Photo and Video: In the photo and video category, the data distribution is as follows: App functionality comprises 31.61% (3579 apps) of the total. Analytics data make up 37.26% (4218 apps), highlighting their importance in understanding user behavior. Developer advertising accounts for 5.16% (584 apps), while other purposes cover 3.19% (361 apps). Product personalization represents 6.00% (679 apps), and third-party advertising constitutes 16.79% (1900 apps). The total number of data declarations across these apps is 11,321, with analytics being the highest count at 4218.
Navigation: In the navigation category, data not linked to users are mainly focused on app functionality, which makes up 45.61% (3149 apps). Analytics data represent 33.14% (2288 apps), crucial for improving app performance. Developer advertising accounts for 2.74% (189 apps), and other purposes cover 4.01% (277 apps). Product personalization constitutes 6.68% (461 apps), while third-party advertising represents 7.82% (540 apps). The total number of data declarations across these apps is 6904, with app functionality being the highest count at 3149.
While both special categories (medical, and health and fitness) and sensitive categories (finance, photo and video, and navigation) prioritize app functionality and developer advertising, their focus and distribution patterns vary. The medical category is more focused on app functionality and analytics, with minimal emphasis on advertising and other purposes. The health and fitness category balances its data usage across functionality, analytics and personalization, with a smaller proportion dedicated to advertising. In contrast, the sensitive categories like finance, photo and video, and navigation exhibit a more varied distribution, reflecting different operational, marketing and user engagement strategies employed by apps in these categories. These differences highlight the distinct approaches taken by apps in managing and utilizing user data to meet their specific operational and marketing objectives.

5.1.3. “Data Used to Track You” Using Sensitive and Special Categories

Based on the GDPR, we analyzed the data used to track users across various categories. Here is a detailed breakdown of the data usage within these categories:
Medical Category: In the medical category (Table 6), data used to track users are distributed across various purposes. Identifiers make up the largest portion with 27.18% (296 apps). Usage data follow closely with 26.91% (293 apps), crucial for understanding user interaction. Diagnostics data account for 11.84% (129 apps), while contact information constitutes 10.28% (112 apps). Location data are 10.19% (111 apps), and other data types make up 3.95% (43 apps). Minor portions are dedicated to user content (2.30%, 25 apps), purchases (2.02%, 22 apps), search history (1.28%, 14 apps), sensitive information (1.10%, 12 apps), browsing history (0.92%, 10 apps), financial information (0.83%, 9 apps), health and fitness (0.64%, 7 apps) and contacts (0.55%, 6 apps). The total number of data declarations across these apps is 1089.
Health and Fitness Category: In the health and fitness category, identifiers represent the largest segment at 30.14% (1502 apps), followed by usage data at 28.27% (1409 apps). Diagnostics data constitute 11.44% (570 apps) and contact information accounts for 10.96% (546 apps). Location data make up 9.37% (467 apps), while purchases are 3.97% (198 apps). Other data types represent 1.59% (79 apps), user content is 1.34% (67 apps), health and fitness data are 0.82% (41 apps), sensitive information is 0.60% (30 apps), search history is 0.58% (29 apps), browsing history is 0.46% (23 apps), financial information is 0.26% (13 apps) and contacts are 0.20% (10 apps). The total number of data declarations across these apps is 4984.
Photo and Video: In the photo and video category (Table 7), identifiers constitute the largest portion at 36.04% (973 apps). Usage data follow with 34.48% (931 apps). Diagnostics data make up 12.07% (326 apps) and location data are 9.26% (250 apps). Purchases are 2.89% (78 apps), other data types account for 2.22% (60 apps), contact information is 1.59% (43 apps) and user content is 0.74% (20 apps). Browsing history constitutes 0.41% (11 apps), contacts are 0.15% (4 apps), search history is 0.11% (3 apps) and sensitive information is 0.04% (1 app), while financial information and health and fitness data are not tracked (0 apps). The total number of data declarations across these apps is 2700.
Finance Category: In the finance category, identifiers represent the largest segment at 30.36% (857 apps), followed by usage data at 29.86% (843 apps). Diagnostics data make up 11.19% (316 apps), and location data are 10.55% (298 apps). Contact information constitutes 8.96% (253 apps), user content is 2.16% (61 apps) and other data types are 2.05% (58 apps). Financial information represents 1.77% (50 apps), purchases are 1.31% (37 apps), browsing history is 0.60% (17 apps), contacts are 0.60% (17 apps), search history is 0.46% (13 apps), sensitive information is 0.07% (2 apps) and health and fitness data are minimal at 0.04% (1 app). The total number of data declarations across these apps is 2823.
Navigation Category: In the navigation category, identifiers make up the largest portion with 28.65% (310 apps). Usage data follow with 25.23% (273 apps), and location data are 20.89% (226 apps). Diagnostics data constitute 11.37% (123 apps), while contact information represents 5.82% (63 apps). Other data types make up 3.42% (37 apps), purchases are 1.38% (15 apps), search history is 1.20% (13 apps) and user content is 0.92% (10 apps). Browsing history, contacts and financial information each constitute 0.37% (4 apps), and health and fitness and sensitive information data are not tracked (0 apps). The total number of data declarations across these apps is 1082.
The analysis reveals distinct patterns in data usage for tracking users across different categories. In the special categories, the medical category primarily uses identifiers and usage data, reflecting a focus on personal identification and user interaction. Similarly, the health and fitness category is dominated by identifiers and usage data, with additional emphasis on diagnostics and contact information to monitor and enhance user experience. In the sensitive categories, finance relies heavily on identifiers and usage data, essential for secure and efficient financial transactions, with significant roles for diagnostics and location data to ensure operational efficiency. The photo and video category also sees identifiers and usage data as predominant, highlighting the need for user identification and interaction tracking, while diagnostics and location data support app performance and user engagement. The additional finance category mirrors this distribution, with identifiers and usage data as the largest segments, crucial for secure financial operations and user experience monitoring. These differences highlight the distinct approaches taken by apps in managing and utilizing user data to meet their specific operational and marketing objectives.

5.2. RQ2: Disparity between App-Stated Permissions and Apparent Unnecessary Data Gathering

To address RQ2, we began by analyzing a comprehensive set of app categories available on the iOS App Store. Our study encompasses a total of 25 distinct app categories, including books, music, travel, social networking, shopping, games, entertainment, reference, medical, lifestyle, sports, finance, education, business, news, navigation, health and fitness, photo and video, utilities, productivity, food and drink, graphics and design, weather, magazines and newspapers, and developer tools. This broad classification enables us to explore data tracking practices across a wide array of application types, providing a thorough examination of how different categories justify or do not justify their data collection practices. By considering such a diverse range of app categories, we aim to gain a nuanced understanding of data tracking trends and justifications within the mobile app ecosystem.
Categories such as sports, education, books, medical, business, news, utilities, reference, productivity, graphics and design, magazines and newspapers, and developer tools typically do not require location tracking to deliver their core functionalities. Therefore, we specifically examined the data tracking practices for apps within these categories using the dataset from the iOS App Store’s “Data Used to Track You” feature. The rationale behind excluding location tracking for these categories is that their primary functions are not inherently dependent on geographic information. For instance, an app designed for productivity or reference purposes does not need to access a user’s location to offer its services effectively. By focusing on these categories, we aimed to identify and understand the types of data being collected and to assess whether such practices align with the actual needs of the app’s functionality.
Additionally, we investigated categories such as travel, social networking, entertainment, navigation, health and fitness, photo and video, and weather. Although location tracking may seem justifiable for some of these categories, we assessed whether it was indeed necessary or overextended. Our analysis aimed to reveal the extent and nature of data tracking within these categories to better understand how often location data are collected and whether their use aligns with the intended functionality of the apps.
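As a rough illustration of this filtering step, the sketch below flags entries where an app in one of the categories classified above as not needing location nonetheless declares location under "data used to track you", and computes the per-category share of such entries. It reuses the hypothetical privacy_labels_tracking.csv layout from the previous sketch; the spellings of the category names in the set are assumptions and would need to match the store's actual naming.

```python
import pandas as pd

# Categories classified above as not requiring location for core functionality
# (names are illustrative assumptions).
NO_LOCATION_NEEDED = {
    "Sports", "Education", "Books", "Medical", "Business", "News", "Utilities",
    "Reference", "Productivity", "Graphics & Design", "Magazines & Newspapers",
    "Developer Tools",
}

# Same hypothetical layout as before: one row per declared (app, data type) pair.
labels = pd.read_csv("privacy_labels_tracking.csv")  # columns: app_id, category, data_type

in_scope = labels[labels["category"].isin(NO_LOCATION_NEEDED)]
flagged = in_scope[in_scope["data_type"].str.lower() == "location"]

# Share of declared data items that are location entries, per category,
# mirroring the per-category percentages reported in Section 5.2.1.
location_share = (
    flagged.groupby("category").size() / in_scope.groupby("category").size() * 100
).fillna(0).round(2)
print(location_share.sort_values(ascending=False))
```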

5.2.1. Justification for Tracking

Books: Justified data tracking includes usage data, identifiers, purchases and diagnostics. However, unjustified tracking includes location (18.83%), contact info (8.83%), other data (3.83%), browsing history (2%), sensitive info (0.17%) and financial info (0.17%). The high percentage of location tracking is especially concerning, as book apps typically do not need location data to function effectively.
Music: Justified data tracking consists of usage data, identifiers, user content, purchases and diagnostics. However, the high percentages of location (38.17%) and contact info (5.52%) tracking are unjustified, given that music apps generally do not need such data. Additionally, other data (3.27%), browsing history (1.52%), search history (1.31%), contacts (0.15%), financial info (0.07%) and sensitive info (0.07%) also appear to be tracked unnecessarily.
Games: Justified data tracking consists of usage data, identifiers, contact info, user content, purchases, and diagnostics. However, the tracking of location (33.69%), other data (16%), browsing history (0.68%), search history (0.40%), financial info (0.24%), contacts (0.09%), sensitive info (0.06%) and health and fitness (0.04%) is largely unjustified for game apps.
Reference: Justified data tracking consists of usage data, identifiers, contact info, user content, purchases and diagnostics. However, tracking location (40.35%), other data (3.53%), browsing history (1.56%), search history (1.14%), sensitive info (0.31%) and contacts (0.21%) is unwarranted.
Medical: Justified data tracking includes health and fitness, location, contacts, identifiers, usage data, diagnostics and sensitive info. Unjustified tracking includes location (19.51%), other data (7.56%), user content (4.39%), browsing history (1.76%) and financial info (1.58%).
Lifestyle: Justified data tracking consists of usage data, identifiers, contact info, purchases and diagnostics. Unjustified tracking involves location (21.50%), other data (3.48%), browsing history (2.48%), search history (2.31%), financial info (1.23%), sensitive info (0.66%), contacts (0.63%) and health and fitness (0.11%).
Sports: Justified data tracking includes usage data, identifiers, contact info, purchases, diagnostics, and health and fitness. Unjustified tracking includes location (28.38%), other data (3.54%), browsing history (2.97%), search history (0.95%), financial info (0.32%), sensitive info (0.32%) and contacts (0.19%).
Finance: Justified data tracking consists of financial info, usage data, identifiers, contact info, purchases and diagnostics. Unjustified tracking involves location (19.91%), other data (3.87%), browsing history (1.14%), contacts (1.14%), search history (0.87%), sensitive info (0.13%) and health and fitness (0.07%).
Education: Justified data tracking includes usage data, identifiers, contact info, purchases, diagnostics, other data, and health and fitness. Unjustified tracking includes location (28.25%), other data (6.30%), browsing history (1.40%), search history (0.75%), financial info (0.25%), contacts (0.22%) and sensitive info (0.10%).
Business: Justified data tracking consists of usage data, identifiers, contact info, purchases, diagnostics and other data. Unjustified tracking involves location (26.39%), other data (5.46%), search history (2.89%), browsing history (2.48%), financial info (1.26%), contacts (0.95%), sensitive info (0.41%) and health and fitness (0.18%).
News: Justified data tracking includes usage data, identifiers, contact info and other data. Unjustified tracking includes location (24.54%), browsing history (2.67%), other data (2.33%), search history (1.39%), contacts (0.54%), financial info (0.10%) and sensitive info (0.05%).
Utilities: Justified data tracking includes usage data, identifiers, diagnostics, other data and contacts. Unjustified tracking includes location (22.37%), other data (4.95%), financial info (0.67%) and search history (0.67%).
Productivity: Justified data tracking consists of usage data, identifiers, diagnostics, other data and contacts. Unjustified tracking involves location (24.59%), other data (4.31%), purchases (3.25%), browsing history (1.06%), search history (0.69%), financial info (0.37%), sensitive info (0.37%) and health and fitness (0.11%).
Graphics and Design: Justified data tracking consists of usage data, identifiers, diagnostics, user content and location. Unjustified tracking involves location (13.08%), purchases (6.27%), other data (5.72%), user content (1.36%), search history (0.54%) and financial info (0.27%).
Magazines and Newspapers: Justified data tracking includes location, usage data, identifiers, diagnostics and user content. However, unjustified tracking involves location (13.81%), purchases (6.67%), search history (3.81%), user content (1.90%), other data (0.95%) and browsing history (0.48%).
Developer Tools: Justified data tracking consists of usage data, identifiers, diagnostics, user content and contacts. However, unjustified tracking involves location (14.89%), other data (4.26%), search history (2.13%), user content (2.13%) and contacts (2.13%).
Travel: Justified data tracking includes location, usage data, identifiers, purchases, contact info, user content and diagnostics. Despite this, other data (3.82%), search history (2.57%), browsing history (1.45%), contacts (1.18%) and sensitive info (0.20%) are tracked beyond what is typically needed for travel-related functionality.
Social Networking: Justified data tracking consists of location, contact info, identifiers, user content, search history, usage data, purchases, diagnostics and contacts. However, the presence of other data (5.25%), financial info (0.91%), sensitive info (0.76%), browsing history (0.76%) and health and fitness (0.23%) tracking is concerning and seems unnecessary.
Shopping: Justified data tracking includes location, contact info, identifiers, user content, search history, usage data, purchases, financial info and diagnostics. Unjustified tracking includes contacts (5.18%), browsing history (2.22%), other data (2.06%), sensitive info (0.32%) and health and fitness (0.04%), which are not essential for shopping apps.
Entertainment: Justified data tracking includes usage data, identifiers, contact info, user content, purchases, diagnostics and location. The tracking of other data (3.92%), browsing history (2.76%), search history (1.31%), financial info (0.46%), contacts (0.23%), sensitive info (0.12%) and health and fitness (0.04%) appears to be unnecessary.
Navigation: Justified data tracking consists of location, usage data, identifiers and diagnostics. Unjustified tracking involves other data (7.52%), search history (2.64%), financial info (0.81%) and contacts (0.81%).
Health and Fitness: Justified data tracking includes health and fitness, usage data, identifiers, diagnostics, location and sensitive info. Unjustified tracking includes financial info (0.53%), other data (3.21%), contacts (0.41%) and search history (1.18%).
Photo and Video: Justified data tracking consists of usage data, identifiers, diagnostics, user content and location. Unjustified tracking involves other data (4.28%), contacts (0.29%), search history (0.21%) and sensitive info (0.07%).
Food and Drink: Justified data tracking includes purchases, usage data, location, identifiers and diagnostics. Unjustified tracking includes other data (1.48%), browsing history (3.67%), contacts (0.19%) and sensitive info (0.15%).
Weather: Justified data tracking includes location, usage data, identifiers, diagnostics and user content. Unjustified tracking includes other data (2.60%), purchases (2.60%), contacts (2.02%), search history (0.58%) and browsing history (0.58%).
Stickers: No data tracking was declared for apps in the stickers category.

5.2.2. High Incidence of Location Tracking

  • Categories with High Incidence: Sports, education, books, medical, business, news, utilities, reference, productivity, graphics and design, magazines and newspapers, and developer tools.
  • Observation: In these categories, location tracking seems largely unjustified, as the core functionalities of these apps typically do not require location data.

5.2.3. Tracking Using “Other Information”

  • Categories Affected: In categories such as travel, social networking, entertainment, navigation, health and fitness, photo and video, and weather, "other data" is the most frequently tracked data type among those not clearly justified by core functionality.
  • Concern: The "other data" category is vague and non-specific, raising concerns about the transparency and necessity of the data being collected.

5.3. Identified Issues in Privacy Labels and App Listings

5.3.1. Missing Privacy Policy URL

Out of the 541,662 iOS apps analyzed, 237 were identified as having a significant gap in their privacy labels. Specifically, these 237 apps declare data under "data linked to you" and "data used to track you", yet they lack the mandatory privacy policy URL. This omission raises several critical concerns regarding user privacy and compliance with regulatory standards.

5.3.2. Missing Privacy Labels

Among the 541,662 iOS apps examined, 83,618 were found to have no privacy labels at all. This significant gap suggests a widespread lack of transparency and compliance in how apps disclose their data management practices. Privacy labels are crucial for informing consumers about what data apps collect and how those data are used, and thereby for promoting transparency and building user trust.
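Both issues in this section reduce to simple filters over app-level metadata: an app is flagged either when it has no privacy label at all, or when its label declares "data linked to you" or "data used to track you" but the listing provides no privacy policy URL. The minimal sketch below illustrates these checks under an assumed schema (apps_metadata.csv with columns app_id, privacy_policy_url and label_cards, the latter a text field listing the app's label cards); it is not the study's actual tooling or data format.

```python
import pandas as pd

# Hypothetical app-level metadata; file and columns are illustrative assumptions.
apps = pd.read_csv("apps_metadata.csv")  # columns: app_id, privacy_policy_url, label_cards

# Check 1: apps with no privacy label at all (empty or missing label_cards field).
no_label = apps[apps["label_cards"].fillna("").str.strip() == ""]

# Check 2: apps whose labels declare "data linked to you" or "data used to
# track you" but whose listing has no privacy policy URL.
declares_collection = apps["label_cards"].fillna("").str.contains(
    "data linked to you|data used to track you", case=False, regex=True
)
missing_policy = apps[declares_collection & apps["privacy_policy_url"].fillna("").str.strip().eq("")]

print(f"Apps with no privacy label: {len(no_label)}")
print(f"Apps declaring collection/tracking without a privacy policy URL: {len(missing_policy)}")
```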

6. User Survey on App Usage and Privacy Concerns

To relate our analysis of privacy labels to how individuals actually use apps, we created a user survey asking which apps people commonly use and what privacy concerns they have. Through this survey, we aimed to determine which of the analyzed privacy labels, and which of their shortcomings, had the most impact on individuals, and which of the identified privacy concerns were legitimate and could be addressed by the privacy label.
This study focused on people between the ages of 20 and 35 to identify app usage trends and privacy issues among Irish app users. This age range was selected because it represents a digitally active demographic of young adults who are frequently early adopters of technology and digital trends. Fifty individuals in this age range living in Ireland participated, allowing us to document the preferences and behaviors specific to this demographic. Data were gathered through online questionnaires distributed via social media platforms in order to reach and engage the intended audience. Responses were analyzed using quantitative techniques to enable a thorough investigation of common app usage patterns and the major privacy concerns of participants.
The survey responses highlight the respondents' app usage patterns and a number of primary privacy concerns, summarized briefly below.
The survey reveals the most frequently used categories of apps among respondents (Figure 3):
  • Social media: 63.6% of respondents primarily use social media apps, such as Facebook, Instagram, Twitter and Snapchat.
  • Entertainment: 17.1% favor entertainment apps, including streaming services and gaming apps.
  • Shopping: 7.3% of respondents predominantly use shopping apps.
  • News and information, health and fitness, travel and navigation: each of these categories is used by 2.4% of respondents, indicating a lower frequency of use compared to the other app categories.
Concern about privacy (Figure 4): The degree of concern about privacy among app users is as follows:
  • Very concerned: 50% of respondents are very concerned about the privacy of their personal information.
  • Somewhat concerned: 45.2% express some concern, showing that a significant majority hold reservations about privacy.
  • Other levels of concern: a minority are slightly concerned, neutral, or not concerned, underlining that privacy remains a predominant issue for most.
Familiarity with privacy labels (Figure 5): Respondents' familiarity with privacy labels in the iOS App Store is categorized as follows:
  • Very familiar: 39% are well acquainted with the privacy labels.
  • Heard of them but do not know much: 43.9% have heard of them but lack detailed knowledge.
  • Not familiar: 17.1% are not familiar with privacy labels, suggesting a need for increased awareness and educational efforts.
Many respondents expressed significant concerns about the privacy of their personal information while using apps. Specifically, they were worried about how their data are collected, stored and used. Key concerns included the types of data being collected, the potential for these data to be shared with third parties, and how securely the data are stored to prevent unauthorized access or breaches. There was also apprehension about whether apps track users' activities across other applications and services, and whether these practices comply with existing privacy regulations. To address these concerns, respondents want greater transparency about privacy practices and more control over their data through app settings, indicating a strong desire for assurance that apps adhere to legal and ethical standards in handling personal information.
Detailed App Analysis: In response to the survey, we conducted a detailed analysis of the three apps most frequently mentioned by respondents, which collectively represent 55% of the total usage among participants: Leap Top Up, TFI Live, and AIB. Leap Top Up allows users to manage their Leap Card, a smart card used for public transportation across Ireland; it provides functionalities such as easy top-ups and balance checks without tracking user data, catering to privacy-conscious consumers. TFI Live, operated by Transport for Ireland, offers real-time bus route information and uses a user's location to provide relevant route directions, but it does not engage in continuous location tracking; this approach prioritizes user privacy and challenges the conventional norms of location-based services. Lastly, the AIB app from Allied Irish Banks delivers a comprehensive suite of mobile banking services, including balance checks, fund transfers and bill payments; it is designed to protect user data and ensure transaction security without tracking user activities, thus building trust among its users. These apps demonstrate how various sectors are increasingly considering user privacy in their service offerings.

7. Conclusions and Future Work

This work analyzed a large corpus of iOS apps (n = 541,662) and identified the prevalence of sensitive and special categories of personal data, as defined by the GDPR, being collected and used by apps. Our work shows that a large number of apps use such sensitive/special categories and, in many cases, do so without a sufficient apparent justification for why the app needs that information. It also shows the prevalence of using these categories to track individuals without any transparent information, in contravention of the requirements of the GDPR.
This study identified significant flaws in the implementation of iOS privacy labels, revealing substantial data collection under the guise of app functionality and data analytics, with the health and fitness category posing particular concerns due to high levels of data collection both linked and not linked to users. Such excessive data gathering poses a grave risk in the event of a breach, potentially leading to misuse of sensitive health information, identity theft, fraud and unwanted exposure of private data, as underscored by the 2018 MyFitnessPal breach that compromised 150 million accounts. Users must proactively safeguard their data by managing app permissions, reviewing privacy labels before downloading apps, and disabling the "Allow Apps to Request to Track" option in privacy settings. Additionally, this study observed unjustified location tracking in categories such as sports, education and business. The TFI Live app serves as a positive example of delivering location-based services without intrusive tracking, suggesting that many apps engage in unnecessary data collection practices. This calls for Apple's App Store to bolster its app review process, ensuring developers provide compelling justifications for location tracking, with regular audits to ensure compliance with privacy standards.
Our research advances the state of the art by providing an empirical analysis of data collection practices across app categories, confirming and extending the findings of previous work by Scoccia et al. (2022) [4]. We highlight inconsistencies in privacy labels and user behavior regarding privacy settings, emphasizing the need for improved transparency and user vigilance. In conclusion, our study underscores the urgent need for better privacy practices in the app ecosystem, offering insights and recommendations to create a more secure and privacy-conscious environment, ultimately aiming to enhance user trust and protect sensitive information from potential breaches and misuse.
Future Work: This study identified numerous areas for future research. First and foremost, longitudinal studies are required to assess how the efficacy of privacy labels changes over time and how it impacts developer practices and user behavior. Furthermore, broadening the scope of the investigation to encompass a greater number of app categories would offer a more thorough understanding of data-gathering practices throughout the App Store. Examining alternative privacy label designs and user education techniques could improve knowledge of and control over app permissions. The effects of improved app review procedures and regulatory changes on privacy label accuracy and developer compliance should also be investigated in future studies. Lastly, researching how well-publicized data breaches affect user privacy settings and trust could provide insight into how to increase security and transparency.

Author Contributions

Conceptualization: all authors (Z.A.S., S.G., H.J.P.); Methodology: all authors (Z.A.S., S.G., H.J.P.); Software, Validation, Formal Analysis, Investigation, Resources, Data Curation, Visualization, Writing—Original Draft Preparation: Z.A.S. and S.G.; Writing—Review and Editing: all authors (Z.A.S., S.G., H.J.P.); Supervision and Project Administration: H.J.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

This study was approved by the Ethics Committee of Dublin City University for studies involving a survey of humans.

Informed Consent Statement

Informed consent was obtained from all participants involved in the user survey.

Data Availability Statement

Data were shared with us by authors Balash, David; Ali, Mir Masood; Wu, Xiaoyuan; Kanich, Chris; and Aviv, Adam (2022): Longitudinal Analysis of Privacy Labels in the Apple App Store. https://doi.org/10.48550/arXiv.2206.02658 [3].

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Kelley, P.G.; Bresee, J.; Cranor, L.F.; Reeder, R.W. A “Nutrition Label” for Privacy. In Proceedings of the 5th Symposium on Usable Privacy and Security (SOUPS ’09), Mountain View, CA, USA, 15–17 July 2009; Association for Computing Machinery: New York, NY, USA, 2009. 12p. [Google Scholar] [CrossRef]
  2. Xiao, Y.; Li, Z.; Qin, Y.; Guan, J.; Bai, X.; Liao, X.; Xing, L. Lalaine: Measuring and Characterizing Non-Compliance of Apple Privacy Labels at Scale. arXiv 2022, arXiv:2206.06274. [Google Scholar] [CrossRef]
  3. Balash, D.G.; Ali, M.M.; Wu, X.; Kanich, C.; Aviv, A.J. Longitudinal Analysis of Privacy Labels in the Apple App Store. arXiv 2022, arXiv:2206.02658. [Google Scholar] [CrossRef]
  4. Scoccia, G.L.; Autili, M.; Stilo, G.; Inverardi, P. An empirical study of privacy labels on the Apple iOS mobile app store. In Proceedings of the 9th IEEE/ACM International Conference on Mobile Software Engineering and Systems, Pittsburgh, PA, USA, 17–24 May 2022; pp. 114–124. [Google Scholar] [CrossRef]
  5. Kollnig, K.; Shuba, A.; Van Kleek, M.; Binns, R.; Shadbolt, N. Goodbye tracking? Impact of iOS app tracking transparency and privacy labels. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, Seoul, Republic of Korea, 21–24 June 2022; pp. 508–520. [Google Scholar]
  6. Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation) (2016) OJ L 119/1. Available online: https://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:32016R0679 (accessed on 14 August 2024).
  7. Article 35—Data Protection Impact Assessment. Available online: https://gdprhub.eu/Article_35_GDPR#:~:text=Article%2035%20requires%20the%20controller,and%20freedoms%20of%20natural%20persons (accessed on 26 July 2024).
  8. Apple Inc. App Privacy Details on the App Store. 2020. Available online: https://developer.apple.com/app-store/app-privacy-details/ (accessed on 26 July 2024).
  9. Data Protection Commission. Special Category Data. 2024. Available online: https://gdpr-info.eu/art-9-gdpr/ (accessed on 26 July 2024).
  10. Rodriguez, D.; Jain, A.; Del Alamo, J.M.; Sadeh, N. Comparing Privacy Label Disclosures of Apps Published in both the App Store and Google Play Stores. In Proceedings of the 2023 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW), Delft, The Netherlands, 3–7 July 2023; pp. 150–157. [Google Scholar] [CrossRef]
  11. Li, Y.; Chen, D.; Li, T.; Agarwal, Y.; Cranor, L.F.; Hong, J.I. Understanding iOS privacy nutrition labels: An exploratory large-scale analysis of app store data. In Proceedings of the CHI Conference on Human Factors in Computing Systems Extended Abstracts, New Orleans, LA, USA, 29 April–5 May 2022; pp. 1–7. [Google Scholar] [CrossRef]
  12. Li, T.; Louie, E.; Dabbish, L.; Hong, J.I. How Developers Talk About Personal Data and What It Means for User Privacy: A Case Study of a Developer Forum on Reddit. Proc. ACM Hum.-Comput. Interact. 2021, 4, 1–28. [Google Scholar] [CrossRef]
  13. Khandelwal, R.; Nayak, A.; Chung, P.; Fawaz, K. Comparing Privacy Labels of Applications in Android and iOS. In Proceedings of the 22nd Workshop on Privacy in the Electronic Society (WPES ’23), Copenhagen, Denmark, 26 November 2023; Association for Computing Machinery: New York, NY, USA, 2023; pp. 61–73. [Google Scholar] [CrossRef]
  14. Zhang, S.; Sadeh, N. Do Privacy Labels Answer Users’ Privacy Questions? In Proceedings of the Symposium on Usable Security and Privacy (USEC), San Diego, CA, USA, 27 February 2023. [Google Scholar] [CrossRef]
  15. Ravichander, A.; Black, A.W.; Wilson, S.; Norton, T.; Sadeh, N. Question answering for privacy policies: Combining computational and legal perspectives. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 4949–4959. [Google Scholar] [CrossRef]
  16. Bargh, M.S.; van de Mosselaar, M.; Rutten, P.; Choenni, S. On using privacy labels for visualizing the privacy practice of SMEs: Challenges and research directions. In Proceedings of the 23rd Annual International Conference on Digital Government Research, Virtual Event, 15–17 June 2022; pp. 166–175. [Google Scholar] [CrossRef]
  17. Zhang, S.; Feng, Y.; Yao, Y.; Cranor, L.F.; Sadeh, N. How Usable Are iOS App Privacy Labels? Proc. Priv. Enhancing Technol. 2022, 2022, 204–228. [Google Scholar] [CrossRef]
  18. Gardner, J.; Feng, Y.; Reiman, K.; Lin, Z.; Jain, A.; Sadeh, N. Helping Mobile Application Developers Create Accurate Privacy Labels. In Proceedings of the 2022 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW), Genoa, Italy, 6–10 June 2022; pp. 212–230. [Google Scholar] [CrossRef]
  19. Koch, S.; Wessels, M.; Altpeter, B.; Olvermann, M.; Johns, M. Keeping Privacy Labels Honest. Proc. Priv. Enhancing Technol. 2022, 2022, 486–506. Available online: https://api.semanticscholar.org/CorpusID:251384903 (accessed on 26 July 2024). [CrossRef]
  20. Jain, A.; Rodriguez, D.; del Alamo, J.; Sadeh, N. ATLAS: Automatically Detecting Discrepancies between Privacy Policies and Privacy Labels. In Proceedings of the 2023 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW), Delft, The Netherlands, 3–7 July 2023; pp. 94–107. [Google Scholar] [CrossRef]
Figure 1. An example of Apple’s App Store privacy labels.
Figure 2. The number of data items collected in privacy labels based on 100k apps [10].
Figure 3. Most used categories of apps.
Figure 4. Concern about privacy.
Figure 5. Familiarity with privacy labels.
Table 1. Overview of the dataset by privacy label category.
Category | Count
Data linked to you | 123,984
Data not linked to you | 136,630
Data used to track you | 62,638
No privacy label | 83,618
No data collected | 134,792
Total | 541,662
Table 2. Special category distribution in “data linked to you” with percentage breakdown.
Purpose | Medical | Health and Fitness
Analytics | 3.32% (4373) | 21.27% (26,242)
App functionality | 9.56% (12,610) | 53.09% (65,535)
Developer advertising | 83.42% (109,917) | 9.39% (11,588)
Other purposes | 1.02% (1340) | 1.86% (2290)
Product personalization | 2.34% (3089) | 12.27% (15,149)
Third-party advertising | 0.35% (463) | 2.11% (2609)
Total | 131,792 | 123,413
Table 3. Sensitive category distribution in “data linked to you” with percentage breakdown.
Purpose | Finance | Photo and Video | Navigation
Analytics | 19.25% (18,580) | 26.54% (2313) | 18.74% (1643)
App functionality | 47.89% (46,201) | 34.96% (3048) | 51.54% (4519)
Developer advertising | 13.25% (12,789) | 7.33% (639) | 6.32% (554)
Other purposes | 3.83% (3700) | 2.93% (255) | 3.62% (317)
Product personalization | 13.86% (13,379) | 9.66% (842) | 12.28% (1077)
Third-party advertising | 1.93% (1861) | 18.54% (1618) | 7.47% (655)
Total | 96,510 | 8715 | 8765
Table 4. Special category distribution in “data not linked to you” with percentage breakdown.
Purpose | Medical | Health and Fitness
Analytics | 35.10% (4505) | 32.63% (11,641)
App functionality | 47.08% (6042) | 46.63% (16,631)
Developer advertising | 2.06% (264) | 3.29% (1173)
Other purposes | 5.67% (728) | 4.08% (1454)
Product personalization | 7.64% (980) | 8.72% (3110)
Third-party advertising | 2.43% (312) | 4.68% (1670)
Total | 12,831 | 35,679
Table 5. Sensitive category distribution in “data not linked to you” with percentage breakdown.
Purpose | Finance | Photo and Video | Navigation
Analytics | 32.23% (9780) | 37.25% (4218) | 33.14% (2288)
App functionality | 45.09% (13,681) | 31.61% (3579) | 45.62% (3149)
Developer advertising | 3.20% (972) | 5.16% (584) | 2.74% (189)
Other purposes | 9.00% (2732) | 3.19% (361) | 4.01% (277)
Product personalization | 7.05% (2140) | 5.99% (679) | 6.68% (461)
Third-party advertising | 3.42% (1037) | 16.78% (1900) | 7.82% (540)
Total | 30,342 | 11,321 | 6904
Table 6. Special category distribution in “data used to track you” with percentage breakdown.
Data | Health and Fitness | Medical
Identifiers | 30.14% (1502) | 27.18% (296)
Usage data | 28.27% (1409) | 26.92% (293)
Diagnostics | 11.44% (570) | 11.84% (129)
Contact info | 10.96% (546) | 10.28% (112)
Location | 9.37% (467) | 10.19% (111)
Other data | 1.59% (79) | 3.95% (43)
User content | 1.34% (67) | 2.30% (25)
Purchases | 3.97% (198) | 2.02% (22)
Search history | 0.58% (29) | 1.29% (14)
Sensitive info | 0.60% (30) | 1.10% (12)
Browsing history | 0.46% (23) | 0.92% (10)
Financial info | 0.26% (13) | 0.83% (9)
Health and fitness | 0.82% (41) | 0.64% (7)
Contacts | 0.20% (10) | 0.55% (6)
Total | 4984 | 1089
Table 7. Sensitive category distribution in “data used to track you” with percentage breakdown.
Data | Photo and Video | Finance | Navigation
Identifiers | 36.04% (973) | 30.36% (857) | 28.65% (310)
Usage data | 34.48% (931) | 29.86% (843) | 25.23% (273)
Diagnostics | 12.07% (326) | 11.19% (316) | 11.37% (123)
Location | 9.26% (250) | 10.56% (298) | 20.89% (226)
Purchases | 2.89% (78) | 1.31% (37) | 1.39% (15)
Other data | 2.22% (60) | 2.05% (58) | 3.42% (37)
Contact info | 1.59% (43) | 8.96% (253) | 5.82% (63)
User content | 0.74% (20) | 2.16% (61) | 0.92% (10)
Browsing history | 0.41% (11) | 0.60% (17) | 0.37% (4)
Contacts | 0.15% (4) | 0.60% (17) | 0.37% (4)
Search history | 0.11% (3) | 0.46% (13) | 1.20% (13)
Sensitive info | 0.04% (1) | 0.07% (2) | 0% (0)
Financial info | 0% (0) | 1.77% (50) | 0.37% (4)
Health and fitness | 0% (0) | 0.04% (1) | 0% (0)
Total | 2700 | 2823 | 1082
