Impact, Compliance, and Countermeasures in Relation to Data Breaches in Publicly Traded U.S. Companies

: A data breach is the unauthorized disclosure of sensitive personal data, and it impacts millions of individuals annually in the United States, as reported by Privacy Rights Clearinghouse. These breaches jeopardize the physical safety of the individuals whose data are exposed and result in substantial economic losses for the affected companies. To diminish the frequency and severity of data breaches in the future, it is imperative to research their causes and explore preventive measures. In pursuit of this goal, this study considers a dataset of data breach incidents affecting companies listed on the New York Stock Exchange and NASDAQ. This dataset has been augmented with additional information regarding the targeted company. This paper employs statistical visualizations of the data to clarify these incidents and assess their consequences on the affected companies and individuals whose data were compromised. We then propose mitigation controls based on established frameworks such as the NIST Cybersecurity Framework. Additionally, this paper reviews the compliance scenario by examining the relevant laws and regulations applicable to each case, including SOX, HIPAA, GLBA, and PCI-DSS, and evaluates the impacts of data breaches on stock market prices. We also review guidelines for appropriately responding to data leaks in the U.S., for compliance achievement and cost reduction. By conducting this analysis, this work aims to contribute to a comprehensive understanding of data breaches and empower organizations to safeguard against them proactively, improving the technical quality of their basic services. To our knowledge, this is the first paper to address compliance with data protection regulations, security controls as countermeasures, financial impacts on stock prices, and incident response strategies. Although the discussion is focused on publicly traded companies in the United States, it may also apply to public and private companies worldwide.


Introduction
Data leakage severely threatens the operations of enterprises, including private corporations and government agencies.The loss of sensitive information can result in substantial reputational damage and financial losses and jeopardize the long-term stability of an organization.Commonly leaked data include employee and customer information, intellectual property, and confidential medical records.Recently, numerous high-profile data breaches have cost companies billions of dollars [1].This trend has been further amplified by the rapid growth of data in the digital age, making data leaks more frequent than ever, particularly with the rise of Industry 4.0.As an illustration of the scale of cybersecurity threats, in January 2024, 26 billion records were exposed, marking it as the largest known data breach, termed the Mother of All Breaches (MOAB).This breach comprised 12 TB of data leaked from platforms such as LinkedIn, Twitter, Weibo, and Tencent [2].Thus, preventing the

Contributions and Limitations of the Work
In this work, we assessed the statistics of data exposures that have affected publicly listed U.S. companies and we used the observed patterns and study cases as the foundation of a discussion regarding the following: (i) compliance with data protection regulations in both the geographical and sectoral scopes; (ii) security, technical, and administrative controls applicable to protecting the data while in use, in transit, and at rest for different attack vectors and throughout their entire life cycle: create, store, use, share, archive, and destroy; (iii) guidelines for adequately responding to this kind of incident; (iv) the consequences of an exfiltration for both the data subject and the data owner.To our knowledge, this is the first study to comprehend these four areas.
For this analysis, we have also improved the dataset proposed by [6] with the industrial sector of the target companies.As a consequence, the paper evaluates the frequency and impact of data breaches on different economic sectors, providing information regarding the prevalence of these incidents and advancing the discourse on prioritizing resources allocation.This is presented in Section 4. In Section 5, we review the literature on effective security measures to safeguard sensitive information against the studied attack vectors, providing a roadmap for organizations to enhance their resilience against data breaches.The paper also discusses compliance aspects, reviewing the role of data protection regulations in protecting organizations against data breaches.
Although it did not interfere with the compliance, mitigation, and response discussion, the dataset used was restricted to companies in the U.S., and the statistics presented may not represent the data breach trends worldwide.Additionally, the study's focus on publicly listed companies may not fully represent the broader corporate landscape, as privately held firms may face distinct cybersecurity challenges.Another limitation of this work was the potential under-reporting of data breaches, as not all incidents may have been publicly disclosed.

Structure of the Work
The structure of the remaining sections in this paper is as follows: Section 2 reviews relevant literature pertaining to data breaches; Section 3 concisely presents the dataset used along with the new attribute proposed; exploratory data analysis is presented in Section 4; Section 5 correlates the observed statistics of the dataset with various cybersecurity factors; Section 6 concludes the paper and outlines potential directions for future research.

Related Works
In the literature review, an examination of prior research that has investigated the assessment of data breaches was undertaken, aligning with certain aspects of this study.Reinforcing the importance of the study of data breaches in the United States, Sood and Cor [8] estimated that 82.84% of Americans have had their data breached at some time, with each person being victimized by three breaches on average.
To attain a better perception regarding data breaches, studies have used exploratory data analysis to reveal meaningful observations about this type of security incident.In that regard, Hammouchi et al. [9] used the PCR dataset, in the time range between 2005 and 2018, to assess several data breach metrics, such as inter-arrival time and trend analysis.They found that breaches ensuing hacking activity are the most significant regarding leaked data and financial impact.They also concluded that the most targeted economic sectors were medical and manufacturing/technology/communications.This paper bases its discussion on a similar analysis, although restricted to publicly traded companies and with additional investigations, such as geographical statistics.
Similarly, Raghupathi et al. [10] used a dataset gathered from the U.S. Department of Health and Human Services (HHS) Office for Civil Rights (OCR) to visualize patterns and trends in data breaches related to medical and health data.They also noticed the noteworthy contribution of hacking-type breaches in companies in this branch.Text mining methods have also been applied in the HHS OCR dataset, using co-occurrences of keywords to identify characteristics, vulnerabilities, impacts, and responses to insider threats in the healthcare industry [11].Specifically to companies in the medical and health sector, data protection policies and controls implemented must be compliant with the Health Insurance Portability and Accountability Act (HIPAA) and the Health Information Technology for Economic and Clinical Health Act (HITECH).
Due to the sensitiveness of medical data, several authors have studied the risks pertaining to healthcare privacy globally, such as in India [12], in China [13], and in general [14].In this work, we discuss the compliance and regulatory aspects of data protection applicable in the United States for different types of data, including medical, and in effect in different U.S. States.Recent studies have demonstrated that compliance with these standards assists in cybersecurity risk management [15].
With respect to risk management and assessment, Algarni et al. [16] stated that cybersecurity risks had not yet been formally addressed in regard to data breaches and proposed a model for estimating the likelihood and cost of a data breach, which are key components for a quantitative risk assessment.They remarked that not using encryption and not involving the Business Continuity Management team in the incident response raise both the cost and the probability of a data breach.Additionally, they observed the increased probability and cost of exposures due to malicious attacks compared to human error and system glitches and that the healthcare/pharmaceutical and technology/software industries endure greater costs.It has also been concluded that the longer the retention time of sensitive data, the higher the cost of data leakage.Bayesian frameworks have also been used in cyber security data breach risk management, with a suggestion for insurers to use this model to price their products [17].
To anticipate data breaches, Barati and Yankson [18] employed Poisson and negative binomial models in the Privacy Rights Clearinghouse dataset.Their findings suggest that their proposed system demonstrates the ability to predict data breach incidents with minimal deviation from the actual numbers.
After a breach has occurred, an inquiry must be conducted to contain it, identify the root cause, and mitigate the incident.Considering these response steps, Masuch et al. [19] studied the impact on companies' stock prices after different approaches adopted by them: either justify the breach, in which case the severity of the incident is minimized, or apologize for it, admitting guilt.The results have shown that a justification typically does not impact the stock market, whereas an apology negatively influences it.Other authors have also studied the reaction of a company's stock prices after it breaches data [1,20,21].It has been observed that companies subjected to a data breach tend to increase cybersecurity risk factor disclosures, a mandatory notification outlined by the US Securities and Exchange Commission (SEC), especially in the case of larger breaches [22].We also present guidelines for effectively responding to these incidents and briefly show possible financial consequences on affected companies and the challenges regarding studying stock price reaction to cybersecurity breaches.
Studying the customer's reaction after having their data leaked through a company is also relevant.Lulandala [23] studied the change of behavior of Facebook users approaching advertisements after a data leak.The authors have remarked that these security incidents reduce customers' trust and lower ad engagement.
Molitor et al. [24] conducted research to identify the key terms and legal implications of data breaches through Machine Learning (ML) and text mining in litigation cases.The results showed that the litigants were concerned about significant topics such as identity theft, hackers, negligence, insurance, phone devices, credit cards, and privacy.
Schlackl et al. [25] reviewed relevant academic papers about data breaches, emphasizing their antecedents and consequences, summarizing the literature on what influences a data breach and the subsequent repercussions.Schlackl et al. [25] also indicated which fields are more studied than others.Equivalently, Patterson et al. [26] systematically reviewed the literature on data breaches and identified future research, pointing out that organizations have not fully maximized the potential benefits of learning from incidents and have not conducted thorough evaluations to determine the effectiveness of their learning processes.Ref. [27] also reviewed the literature on data breaches, focusing on risk management and briefly reviewing data breach causes, prevention, containment, and recovery.
Addressing data exfiltration countermeasures, Ullah et al. [28] reviewed 108 papers, mapping the countermeasures to different attack vectors from malicious external threats, and they concluded that there is no emphasis in research on the data in rest and in transit states and on investigative countermeasures.In this work, we promote discussion on all data states and on the improvement of accountability and digital forensics, encompassing both internal and external threats.Aslam et al. [29] discussed countermeasures and compliance of data breaches, mainly centered on Smart Cities and the Internet of Things (IoT), presenting a taxonomy of prevention and the consequences of these incidents, seen in Figure 1.This work addresses all prevention taxonomy elements, while focusing on the consequences for asset prices in the impact part.The impact on customers is discussed in specific study cases and in the works reviewed.Also, despite the emphasis of this paper on publicly listed U.S. companies, its contributions and the promoted discussion apply to public and private organizations worldwide.

Data Breach Datasets
Other authors have contributed to the field by publishing datasets focused on data breaches.For instance, Neto et al. [30] created a dataset encompassing global data breach incidents that transpired between 2018 and 2019.Among their findings, they observed a significantly higher frequency of data breach incidents in Europe, with approximately 160,000 reported incidents during that period, in contrast to the United States, where there were approximately 10,000 such incidents.
Considering the sensitiveness of medical data breaches, Ronquillo et al. [31] published a dataset comprised of healthcare data breaches in the United States, observing that, in this sector, hacking activities were responsible for roughly 25% of the incidents but compromised nearly 85% of the records.This indicates that hack data breaches caused an excellent average of records leaked per incident in this dataset.
Park [32] provides a dataset on data breaches that occurred between 2012 and 2016 in California, including information such as whether the company was sued, the duration of time the free credit monitor service was provided to the affected customers, the economy sector of the breached company, the attack vector of the incident, and the size of the breach.
Data breaches and ransomware attacks that occurred in Australia between 2004 and early 2020 were provided by Tsen et al. [33], along with information on technical and administrative countermeasures employed by the affected organizations, such as the use of encryption, established security policies, and improper network segmentation.Also, in Australia, Biddle et al. [34] conducted a survey to understand better the Australian public's attitudes towards data governance, including the level of concern about data breaches.Additionally, Ikegami and Kikuchi [35] proposed a probabilistic model that estimated a given company's risk of a data breach.They referenced two datasets associated with data breaches in Japan from 2005 to 2018.
The United Kingdom's Information Commissioner's Office publishes information about data breach cases quarterly (https://ico.org.uk/about-the-ico/our-information/complaints-and-concerns-data-sets/self-reported-personal-data-breach-cases/ accessed on 7 April 2024).Likewise, the United States Government's open data website supplies a dataset covering data breach incidents that affect at least 500 Washington State residents (https://catalog.data.gov/dataset/data-breach-notifications-affecting-washingtonresidentsaccessed on 7 April 2024).
These datasets, however, either fall outside the geographical scope of this paper or are overly constrained for our research purposes.At the time of this writing and to the best of our knowledge, 12 other works have referenced the work of Rosati and Lynn [6].These studies are briefly presented in Table 1.
The dataset offered by Privacy Rights Clearinghouse, while more comprehensive in terms of the types of companies breached and the time frame covered compared to Rosati and Lynn [6], was deemed excessively broad in scope for the focus of this paper, which centered on publicly listed companies.Furthermore, the new fields incorporated by Rosati and Lynn [6] into the dataset greatly enhanced our comprehension of the risks associated with these incidents.
This paper adopted the dataset published by Rosati and Lynn [6].This choice was primarily based on the dataset's concentration on data breaches in publicly traded U.S. companies over an extended time frame and the new fields added by the authors, which aligned with the scope and objectives of our research.
Table 1.Publications that cited the work of [6].
Reference Description [36] Assesses the effects of cybersecurity investments within the realm of data breaches.[37] Explores the connection between cybersecurity performance influencing factors that lead to human errors and the subsequent occurrence of data breaches.[38] Investigates the impact of non-financial disclosure laws obligations to breached companies.[39] Investigates the responses of organizations to data breach incidents.
[40] Presents statistical distributions of malicious and negligent data breaches.
[41] Assesses the implications of implementing an inverted firm strategy with the use of Application Programming Interfaces (APIs), highlighting the elevated risk of data breaches associated with the use of public APIs.
[42] Proposes a cybersecurity risk-quantification-and-classification framework designed for application to real-world data breaches.[43] Examines the compliance strategies that IT business leaders within nonprofit organizations should adopt to reduce the risk of cyber threats that could potentially lead to data breaches.
[44] Reviews predictive algorithms and analyzes trends of breach incidents.
[45] Performs a brief exploratory data analysis of the dataset.
[46] Studies the effectiveness of K-Means Featurization (KMF) in addressing the challenges presented by complex datasets.

The Dataset
The dataset used in this work, published by Rosati and Lynn [6], was originally retrieved from the Privacy Rights Clearinghouse repository.The authors subsequently filtered the PRC dataset to encompass data breach events that impacted companies listed on the NYSE or NASDAQ exclusively.Therefore, the work of Rosati and Lynn [6] published and described the dataset, which this study analyses and discusses.
According to PRC, the data breaches were primarily gathered from U.S. Attorneys General and the Department of Health and Human Services, and they do not constitute an exhaustive list of all breaches, reflecting only the data breaches that have been reported and made publicly available in the United States.The list is even less exhaustive in the case of this work, as it comprehends publicly listed U.S. companies exclusively.Nevertheless, the dataset was deemed suitable for the objectives of this study, which were to estimate breach sizes and explore various security facets related to data breaches within NYSE-and NASDAQ-listed companies.
The dataset is presented in a tabular format, in which each column contains specific information about these events.In total, the table comprises 506 rows, each representing a data breach, and 15 columns.To enhance the analysis conducted, the dataset was enhanced with information regarding the economic sector in which each company operates, using Yahoo!Finance API.This augmentation allowed for a more comprehensive examination of the factors pertinent to data breaches.However, for this paper, only nine of the original columns were used, with the addition of the economy sector column.A description of these 10 dataset fields is presented in Table 2. Examples: Financial, Energy, Healthcare

Exploratory Data Analysis
This section presents several remarks concerning the dataset provided by Rosati and Lynn [6].

Geographical View
As outlined in Section 3, the dataset pertains to data breaches reported in the United States, and, therefore, the geographical analysis is restricted to this country.
As noted in Figure 2, the States that experienced the highest number of reported breaches were California, with 79 breaches, and New York, with 75.Conversely, during the period covered by the dataset, there were no reported breaches in the following States: Hawaii, Mississippi, Montana, New Mexico, North Dakota, South Dakota, West Virginia, and Wyoming.Among the States where breaches occurred, Alaska, Iowa, Maine, and Vermont registered the fewest incidents, with only one occurrence each.The Maine exposure transpired in 2012 and targeted New York State Electric & Gas (NYSEG) and Rochester Gas and Electric (RG&E), subsidiaries of Iberdrola USA.This breach resulted in the disclosure of 5100 records containing Social Security numbers, dates of birth, and financial institution account numbers [48].In response to the breach, the company later offered a creditmonitoring-assistance membership (https://oag.ca.gov/ecrime/databreach/reports/sb24-22146 accessed on 7 April 2024).
Consistent with the choropleth map shown in Figure 2, the distribution of breach types among the 10 most affected States, as depicted in Figure 3, was predominantly led by California and New York.This chart also shows similar breach-type proportions in the States.It is remarked, however, that while hacking breaches represented a significant share of exposures in these States, New Jersey had only one reported case of this type.
This case was the Heartland Payment Systems data breach that took place in 2009.This attack was orchestrated by international hackers and involved a U.S. Secret Service informant who returned to his criminal life as a hacker and exploited an SQL injection vulnerability [49].The company waited a year to announce the disclosure of the credit card information of 130 million customers, causing a drop in its stock price by almost 80% [50].

Companies
The dataset comprises 506 data breaches distributed among 274 unique companies.Figure 4 represents the 10 companies that experienced the highest number of violations.A translation between the company ticker and its name is provided in Table 3, which also indicates the company sector.Among the 10 companies that experienced the most breaches, 4 operated within the financial sector.As outlined in Section 4.5, this is the most targeted economic sector.
With the aim of identifying areas of increased vulnerability within these companies, Figure 5 illustrates the various incident types for each corporation.Consequently, it is worth noting, for example, that while Sears Holding had a somewhat even distribution of types, all security breaches at Apple were attributed to hacking activities, while a significant proportion of McDonald's disclosures resulted from malicious insider actions.Furthermore, debit card and credit card fraud incidents were only observed in financial companies, and stationary computer loss, inappropriately accessed, discarded, or stolen, successfully breached Walgreens only.

Breach Sizes
The total sum of records breached in the dataset, which considered exclusively publicly traded companies, resulted in roughly 1.07410 × 10 9 , which was approximately 3 times the population of the country in 2013, at around 3.16128 × 10 8 people [51].This may have been due to the breach of data of deceased people and to the same person's data being leaked multiple times.
The 10 most significant breaches, in terms of records leaked, in the dataset and their corresponding types are depicted in Figure 6.A dictionary between each of these affected companies' ticker and their name is provided in Table 4.
In the case of the LinkedIn incident, which was the most voluminous data exposure examined within this study, it was reported that unsalted SHA-1 hashed passwords were leaked.However, limited information is available regarding the specific method by which the data were stolen [7].Poornachandran et al. [52] were able to successfully crack approximately 2.5% of these passwords.In a noteworthy incident in 2014 that occurred in California, a spear-phishing campaign effectively compromised approximately 145 million records stored by eBay [53].As outlined by the author, this breach exposed sensitive data, including customer names, email addresses, physical addresses, phone numbers, and birth dates, all in unencrypted plaintext, resulting in an estimated cost of $300 million to the company.
Minkus and Ross [54] demonstrated the severity of this breach, highlighting that, at that time, it enabled the adversary to retrieve the complete purchase history for a known username.As a consequence, it facilitated the identification of buyers of sensitive items, such as firearms and pregnancy and HIV tests.
Other cases present in Figure 6 that affected the Heartland Payment Systems, Home Depot, and Target cases are discussed in Sections 4.1 and 5.2.1.
Figure 7 illustrates the cumulative sum of breach sizes for each announcement type and the corresponding case counts.This analysis allows for a better understanding of how the timing and nature of announcements relate to the scale of data breaches.Rodrigues et al. [45], however, suggest that previous announcements do not significantly impact the amount of data leaked.
While only 185 out of the 506 breaches were preceded by an announcement, it is noteworthy that the most significant announcement type was 'Earnings'.This observation could indicate a potential financial motive by the attackers.Additionally, it is worth highlighting that despite a lower case count for 'Statement' announcements, the breach sizes associated with this category were more substantial compared to those related to 'Investigation'.

Breach Types
As observed in Figure 8, the most prevalent breach types are PORT (139 events) and HACK (118), collectively accounting for 50.79% of the incidents within the dataset.This understanding reinforces the significance of these breach types in the overall landscape of data breaches.Cases with unknown causes (UNKN) may indicate either an ineffective forensic investigation or a lack of transparency.Determining whether the problem of data breaches is worsening and identifying the prevailing trends are critical concerns.It is important to emphasize that these trends are not always immediately apparent.Thus, there is a need for rigorous data analyses to ascertain whether any discernible trends exist.Moreover, when possible, such studies can help make predictions of the trajectory of data breaches.
To improve the understanding of the Tactics, Techniques, and Procedures (TTPs) employed by attackers in data breaches and their evolving trends over time, Figure 9a provides a visualization of the total breaches reported by year for each breach type.When correlating Figures 8 and 9a, it becomes evident that while PORT breaches are the most frequent category of disclosures, HACK attacks have grown significantly since 2012.The disclosure of data as a consequence of PORT TTPs was more prominent during the early years of the dataset, with 2006 standing out, especially with 43 breaches in this category.This number was nearly four times greater than the second-most-relevant type, DISC, which had 11 breaches.These trends offer relevant information about attackers' shifting tactics and priorities over time.
The statement that humans are often considered the weakest link in cybersecurity is well-documented.Building upon this, Hammouchi et al. [9] concluded that breaches originating from human factors were on the decline, potentially due to increased awareness among personnel.A similar analysis was carried out on companies listed on the NYSE and NASDAQ to validate this assertion in this scenario.
In this context, we focused on breach types INSD, PHYS, PORT, STAT, and DISC, which are associated with human factors.Figure 9b displays the cumulative count of breaches related to these types over the years.This analysis led us to conclude that human involvement in data breaches is diminishing in publicly listed companies.However, it is important to note that this reduction in cases may also be influenced by the decline in PORT cases, as previously discussed, which were particularly numerous in 2006.
In Section 5.2, we further explore potential causes for the declining trend in humanrelated data breaches.Additionally, we present a more comprehensive discussion of mitigation strategies specific to each breach type.This analysis provides a deeper understanding of the factors influencing the reduction in such breaches and reviews best practices on how organizations can effectively address these vulnerabilities.

Company Sector
By combining data breach type with the sector in which a company operates, it is possible to verify whether a predisposition of a TTP exists when targeting a specific economic sector.Although no preference for attack vector is evidenced by Figure 10, which shows a similar distribution of types in the company sectors, some other observations are made.For instance, it is noted that the healthcare industry was not targeted by any hacking activity in the time frame covered by the dataset, nor were companies in the industrial sectors leaked via paper documents (PHYS).These observations provide valuable context regarding the distribution of breach types within different economic sectors and may help inform security strategies for these industries.
Furthermore, a more robust relationship is found between fraud involving debit/credit cards (CARD), which is the second-least-frequently-known breach type (Figure 8), and financial companies.This correlation aligns with the nature of financial operations, demonstrating a concentration of this specific exposure type in the financial sector.Differently, attacks targeting stationary devices (STAT), the least-frequently-known breach type, were primarily present in the four most-breached sectors: namely, Financial Services, Consumer Cyclical, Industrials, and Technology.These findings contrast with those of Hammouchi et al. [9], who, in their analysis of the PCR dataset from 2005 to 2019, noted that the most-targeted companies were in the sectors of healthcare and manufacturing/technology/communications, categorized as "BSO" by PRC, due to the sensitiveness of the data they hold.However, Figure 10 shows that when restricting this analysis to the publicly listed companies in the dataset this scenario changes, and that financial companies, which also hold sensitive data, are the most breached.
Figure 11 illustrates that the substantial number of breaches in this sector, shown in Figure 10, was not concentrated in a specific period but was relatively constant throughout the time contemplated in the dataset.This provides valuable information about the persistence of security challenges faced by companies in the financial sector.In Figure 10, the string "-" represents companies to which the Yahoo!Finance API could not identify the pertaining sector.It is important to emphasize that the decrease in the number of cases in 2015 was due to the incompleteness of the data for that year, which only went up to March.On the other hand, the breach cases for the 'Industrials' companies were more frequent in 2006 and have demonstrated a descending trend since then.
An association between the company sector and data protection standards is pertinent.This connection arises from the close relationship between a company's industry and the type of data it stores, influencing the data owner's responsibilities and obligations.This relationship and an overview of compliance will be further explored in Section 5.1.

Stock Market
Data breaches can have significant implications for the stock market, often leading to fluctuations in the share prices of affected companies.When a data breach occurs, investor confidence may fade due to concerns about the company's ability to protect sensitive information, resulting in a decline in share value.
As examples, we mention the historical stock price trajectories of three noteworthy companies within the dataset: Citigroup Inc., which endured the highest number of breaches; JPMorgan Chase, the second-most-affected company; and LinkedIn.com,which suffered the most voluminous breach in the dataset.They are represented in Figure 12, and it is possible to observe different market reactions to these data exposures, allowing for the observation of varying stock price responses to these data exposures.
While, in some cases, a data breach can indeed contribute to a stock price drop, other factors can also be at play.A company's stock price decline can result from various factors, including data breaches, financial crises, regulatory modifications, and broader macroeconomic dynamics.Because of that, it is essential to emphasize that the connection between data breaches and stock price depreciation is not straightforward.In certain instances, a data breach can trigger a decline in share value by diminishing investor confidence in the company.This may be due to concerns that the company is incapable of adequately safeguarding the sensitive information of its customers and employees, or due to the apprehension that the breach might inflict financial or reputational harm.
Additionally, poor management practices following a data breach, such as inadequate responses or compromised data security measures, can further exacerbate the situation and contribute to declines in share prices.
Conversely, there are scenarios in which a drop in stock price can be attributed to other factors, such as financial crises or regulatory alterations.Likewise, if a company confronts new regulatory requirements, investors may opt to sell their shares amid concerns that they might struggle to comply with the updated rules.Understanding the nature of these relationships is essential when evaluating the impact of data breaches on a company's financial standing and its effect on share prices.
Moreover, it is vital to consider that inadequate management practices can set a detrimental cycle that exacerbates crises and precipitates a decline in share values.These decisions, like implementing cost-cutting measures that compromise data security, can potentially catalyze new data breaches, further worsening the financial situation and resulting in a subsequent dip in share prices.

Discussion
In this section, we discuss some compliance and security aspects regarding the observed statistics based on the data.This includes a review of relevant laws and standards applicable to the companies included in the study, exploring security controls to mitigate the underlying causes of breaches, and examining response strategies employed when incidents occur.

Compliance and Pertinent Legislation
Data protection laws are regulations enacted to protect people's privacy, identity, reputation, and autonomy.These regulations have a global presence and include significant standards such as the European General Data Protection Regulation (GDPR), Brazil's General Law for the Protection of Personal Data (LGPD), and China's Personal Information Protection Law (PIPL) [55].This legislative development reflects the evolving data protection and privacy regulation landscape globally.
Notably, the United States currently lacks a federal-level enacted data protection law.There is, however, ongoing discussion in Congress regarding a proposed bill known as the American Data Privacy and Protection Act (ADPPA) [56].

Standards by Sector
Nonetheless, the United States has a data protection framework for various economic sectors, as outlined in Table 5.This table also details the number of breaches in the dataset that are associated to each law or standard.
It is worth noting that the U.S. has additional laws regulating data protection, but these pertain to data categories that fall outside the scope of the dataset under study.For instance, laws like the Family Educational Rights and Privacy Act (FERPA) pertain to students' data, the Driver's Privacy Protection Act (DPPA) relates to driver's records, and the Federal Information Security Modernization Act (FISMA) applies to government data.Furthermore, laws such as the Children's Online Privacy Protection Act (COPPA) protect children's data.However, the dataset does not provide sufficient information to determine whether these laws were infringed in the reported breaches.

Data Breach Notification Acts
In addition to those standards, all 50 States have established laws mandating private enterprises to inform individuals regarding security breaches that compromise personally identifiable information [56].The majority of them also include government entities in the commitment.
However, some of those laws were not in effect during the time frame observed in the dataset.For instance, Alabama's Breach Notification Act, the most recent State law, was enacted in 2018.Similarly, South Dakota (2018) and New Mexico (2017) introduced their respective laws after the temporal scope of the dataset.Thus, their areas are depicted in gray in Figure 13, which displays the chronological order of the start of the effect of breach notification laws.The work of Coie [57] offers a comprehensive breakdown of the specifics of each State Act.It details critical aspects such as their definitions of data breaches and personal information, the timing and structure of breach notifications, their applicability, and other vital provisions that form the foundation of these State-level data breach notification laws.

Data Protection Acts
Not all states have enacted data protection laws, and some have yet to take effect, as seen in Table 6.These data may also be geographically visualized in Figure 14, helping illustrate the varying degrees of legislative coverage across different States.It is important to highlight that the dataset under examination covers the period between 2005 and 2015 and that the first data protection law took effect in 2020.Therefore, none of the breached data analyzed in this paper were subject to regulation by a data protection law.Another observation is that New York, the second-most-breached State in the dataset, has not enacted any data protection law.In contrast, Montana did not have any reported breach in the analyzed data and has enacted the Montana Consumer Data Privacy Act (MTCDPA).However, it has not yet commenced its effects.Elements of the California Consumer Privacy Act (CCPA) include the rights of the data subjects, encompassing the right to be informed about the nature of collected data and its selling practices, to request data deletion, to opt out of data sales, to access their data, and to not be discriminated against in service and pricing when exercising their privacy rights.

Vulnerabilities Mitigation
To ensure the proper protection of customers' data and compliance with these regulations, a company must effectively implement security controls that mitigate the vulnerabilities that could lead to a breach.Different incident types require different security countermeasures as mitigation.
As seen in Figure 15, data breaches in 2023 were most frequently caused by phishing attacks, and the most costly originated with malicious insiders (INSDs) [58].However, when categorizing these attack vectors into the types of the dataset, it was observed that the majority of them were related to hacking activities (HACK).The 'Accidental' label in the figure may be associated with the union of the DISC, PHYS, STAT, and PORT types of the dataset, while 'Physical' may be associated with the union of PHYS, STAT, and PORT.Social engineering and phishing, which may be considered a type of social engineering, are not strictly related to a category of the dataset.
This also complements the analysis of the dataset, informing the financial impact for different attack vectors, which is important to consider in a risk assessment and in prioritizing vulnerabilities mitigation.As an example, according to Section 4.4, HACK data breaches represent the second-most-frequent cause, and also correspond to significant average costs, as seen in Figure 15.This suggests a prioritization for mitigating HACKrelated vulnerabilities.
This section reviews some suitable security controls that can mitigate these and other vectors.Such countermeasures could have reduced the likelihood and/or the impact of the breaches in the scope of this study.
To improve overall security, companies may adopt a structured framework, such as the NIST Cybersecurity Framework (CSF), which is an agnostic framework categorizing several security controls in five cores, as shown in Table 7.These core categories provide an organized approach to improving cybersecurity measures.
Especially for vulnerability mitigation, the Protect core function presents some valuable recommendations divided into categories: Identity Management, Authentication and Access Control, Awareness and Training, Maintenance, Protective Technology, Information Protection Processes and Procedures, and Data Security.The latter, more pertinent to this study, is then divided into subcategories, which are as follows: data at rest are protected; data in transit are protected; assets are formally managed throughout removal, transfers, and disposition; adequate capacity to ensure availability is maintained; integrity-checking mechanisms are used to verify software, firmware, and information integrity; the development and testing environments are separate from the production environment; integrity-checking mechanisms are used to verify hardware integrity; and protections against data leaks are implemented.
Specific to the NIST SP 800-53, which reviews security and privacy controls for information systems and organizations, the framework mentions the sections regarding information flow enforcement (AC-4), separation of duties (AC-5), the principle of the least privilege (AC-6), personnel screening (PS-3), access agreements (PS-6), boundary protection (SC-7), transmission confidentiality and integrity (SC-8), cryptographic protection (SC-13), covert channel analysis (SC-31), system monitoring (SI-4), and protection from information leakage due to electromagnetic emanation (PE-19).Regarding the latter, the TEMPEST is a valuable specification regarding equipment shielding against non-intentional leakage of radio or electric signals, sounds, and vibrations.

Core Function Description
Identify Help determine the current cybersecurity risk to the organization.Protect Use safeguards to prevent or reduce cybersecurity risk.

Detect
Find and analyze possible cybersecurity attacks and compromises.

Respond
Take action regarding a detected cybersecurity incident.

Recover
Restore assets and operations that were impacted by a cybersecurity incident.
As these countermeasures are closely related to the attack vector, we considered the breach_type of the dataset for reviewing them.
However, it is essential to note that listing all security measures for mitigating data breach-related vulnerabilities is impractical due to the vast number of attack vectors.
Therefore, this section provides general good practices against common attack vectors, not an exhaustive list of available security controls.

CARD
The Payment Card Industry Data Security Standard (PCI-DSS) is a critical debit cardand credit card-related data security standard.The PCI requires technical and operational controls to be put in place by any entity that stores, processes, or transmits credit card data.
The PCI-DSS delineates specific requirements for protecting payment card data.These requirements are detailed in Table 8, providing a comprehensive overview of the PCI-DSS standards and their associated requirements.This is enforced by three ongoing steps: an assessment, identifying all locations of cardholder data, an inventory of assets, and analyzing them for vulnerabilities that could expose cardholder data.The following stage is to repair the vulnerabilities found, and, lastly, to report the assessment and remediation details and submit the resulting document to entities the company does business with.Track and monitor all access to network resources and cardholder data.
Regularly test security systems and processes.Maintain an information security policy.
Maintain a policy that addresses information security for all personnel.
Regarding encryption, PCI-DSS requires compliance with the Point-to-Point Encryption (P2PE) standard by using one of their listed validated solutions.Two relevant data breaches involving debit card and credit card information were the Home Depot and Target Corp. breaches, displayed in Figure 6, which represent the most voluminous breaches.
The most probable cause of the breach at Target was an infection by the Citadel malware [62].This malware, which is based on its predecessor Zeus, executes a Man-in-the-Browser attack.Another malware used in the attack was BlackPOS, which aims at Points of Sale (POS) devices [63].Section 5.2.2 provides more information on banking malware.
Consequently, approximately 40 million credit card and debit card records were leaked, including their encrypted PINs and other PII.
According to the studied dataset, Home Depot was also infected by BlackPOS [63], leaking 56 million payment records.The two companies were also PCI-DSS compliant at the time of the breach [64], although Table 8 shows some requirements that could have prevented a malware infection if successfully implemented, such as regularly updating antivirus software and maintaining secure systems and applications.Additionally, the Home Depot data breach could have been prevented using P2PE and network segregation.Hence, it is observed that PCI-DSS serves as a reliable foundation for credit card security, but for better security it should not be solely implemented.
One additional technology that may be used to improve transaction security is the Europay, MasterCard, and Visa (EMV) micro-processing chip, increasing complexity and costs for card counterfeiting, which is known as skimming.
Counterfeiting cases are steeply lowering in areas where EMV is implemented, but there has been a consequential rise in Card Not Present (CNP) crime [65].A CNP crime is the unauthorized use of another individual's payment details for a transaction, mainly through online means.The payment information may be obtained after a data breach: for example, Bodker et al. [66] describe the script followed by criminals in a CNP crime, which allows a more reasoned consideration for mitigation strategies.
Furthermore, both Target and Home Depot breaches were initiated with a phishing attack [63], which reinforces the need for Security Education, Training, and Awareness (SETA), as discussed in Section 5.2.7.

HACK
As evidenced in the cases examined in this paper, such as eBay, Target, and Home Depot, phishing is a common threat vector used to initiate data breaches.A practical approach for reducing the success rate of these attacks is implementing a robust SETA program, which is discussed in more detail in Section 5.2.7.Such a program plays a fundamental role in enhancing employees' ability to recognize and thwart security threats, such as phishing attempts.
Naqvi et al. [67] reviewed the literature on phishing mitigation procedures through different vectors, such as e-mail and websites.Most proposed techniques rely on Machine Learning or training and awareness.Multifactor Authentication (MFA) may protect the user account even after successful phishing, as the user's identification and password obtained with the technique would not suffice for logging in, requiring an extra factor.
As related in Section 5.2.1, in Target and Home Depot breaches, after successfully phishing credentials, the attacker used banking malware to exfiltrate data.As the financial sector represents the most significant contribution portion in the dataset, we find it convenient to discuss this type of malware.
However, it is relevant to note that certain malware strains are region-focused, such as Guildma, Grandoreiro, and Javali, which primarily targeted Brazilian entities [69].A threat intelligence project may be needed to identify common malicious activities within the organization's operational domain.
Even after a successful malware infection, the data breach may be prevented if the company effectively applies other security measures, such as encryption and access control.This was not the case, for example, with LinkedIn, which, as discussed in Section 4.3, had millions of unsalted password hashes leaked, highlighting the importance of comprehensive security measures to protect sensitive data.
Several factors compounded LinkedIn's security vulnerabilities.Firstly, the company employed the SHA-1 hashing algorithm, which has been demonstrated to be vulnerable to various attacks.NIST SP 800-131A revision 2 has disallowed the use of SHA-1, permitting it for non-digital signature applications only.Currently, SHA-2 and SHA-3 are secure message-digesting algorithms.
Secondly, LinkedIn's security was compromised by the absence of a salt algorithm to enhance the security of hashed passwords.When the same password is processed using the same message-digesting algorithm, it consistently generates the same hash value, which increases predictability and susceptibility to brute-force attacks.Salting algorithms involve appending a unique string to the password before hashing it, significantly improving its security.Additionally, password leak bases should be continually monitored, in search of credentials in use at the organization.Section 4.1 briefly introduced the Heartland Payment System breach, which relied on SQL injection.For application-level vulnerabilities, such as the one exploited at HPY, the Open Worldwide Application Security Project (OWASP) is a well-known reference.They regularly publish the Top 10 vulnerabilities in the application security scope, along with their mitigation strategies, such as input validation, Web Application Firewall (WAF), and software testing.
Although not in the studied dataset, attackers successfully intruded on cloud providers in the 2020 SolarWinds hack case, exposing and breaching their customers' data [70].In this incident, adversaries inserted arbitrary code in the source code of a company product called Orion.Afterward, SolarWinds distributed the malicious code to its customers as part of the product, infecting over 18,000, including government entities and private companies [71].This example reinforces the importance of Supply Chain Management (SCM) in cybersecurity.
In large and technologically complex companies, keeping the systems up to date may be challenging.As a consequence, attackers may exploit known vulnerabilities in the systems.Thus, it is fundamental to establish a patch-management program to timeously update and secure the organization's assets.
For zero-day attacks, which explore previously unknown vulnerabilities, a security patch has not yet been published by the product developer, and signature detection is ineffective.As an alternative, ML methods can detect such intrusions based on the perception of suspicious activities that differ from the expected baseline, which may enhance the detectability for novel attacks.
Data Loss Prevention (DLP) solutions also contribute to data security and avoidance of data breaches.Such technologies detect and deter unauthorized data transfers, including preventing PII data breaches.However, it is important to note that DLP is ineffective in detecting data exfiltration through steganographic techniques.
When there is a need to publish statistical metrics related to a dataset, but concerns about preserving the privacy of the individuals within the dataset are paramount, leveraging differential privacy can be a valuable and appropriate approach to addressing this challenge.Differential privacy, introduced by Dwork [72], provides a framework for releasing aggregate information about a dataset while adding noise or perturbation to the data so that individual records remain private and indistinguishable.This ensures that sensitive information is protected and that statistical knowledge can be derived without compromising the privacy of the data subjects, whilst maintaining the utility of the data [73].
Because of this balance between utility and privacy enhancement, differential privacy has applicability in several areas, and is used by the US Census and by big companies such as Google, Apple, and Microsoft [74].

INSD
An insider threat is anyone with authorized access to or knowledge of an organization's resources.The company trusts this person, who knows the company's fundamentals and has access to its assets.Because of that, a malicious insider can potentially cause great damage to the company imperceptibly.The average time taken for a company to detect an insider's malicious actions is 85 days [75].
An insider may be classified as unintentional or intentional.As unintentional insider threats are more suitable, within the scope of this work, to DISC, PHYS, PORT, and START, in this section, we discuss mainly intentional malicious insiders.
The main motivations for conducting an insider attack are financial benefits or espionage [76].Insider-incident action is privilege abuse, while the actions are undertaken mainly via privilege abuse.Because of that, the least-privilege policy and Privileged Access Management (PAM) technology are helpful tools for preventing insider leakage.
The leading adopted technologies used for mitigating insider threats are Data Loss Prevention (DLP), Privileged Access Management, User and Entity Behavior Analytics (UEBA), Security Information and Event Management (SIEM), Endpoint Detection and Response (EDR), and Insider Threat Management (ITM) [75].
Administrative security controls may also be implemented.A background check and an employee screening upon hiring may reveal a mischievous past history for the candidate, enabling the company to cancel the employment process.If a person passes this investigation, enforcing a Non-Disclosure Agreement (NDA) signing is an additional countermeasure, as it will legally constitute their liability.
After an employee is hired, other security measures should still be adopted.One is assessing the need to know for each employee, enforced through an access control mechanism.Granting more access to knowledge and data than the employee needs to perform their usual tasks exposes the information unnecessarily.A similar control is based on the principle of the least privilege, which grants a worker the minimum necessary privileges.
Separation of duties is another form of mitigating inside intentional threats, which divides critical tasks among several employees, as per their department in the organization, for example.A job rotation policy, although sometimes infeasible, may also help manifest fraud, sabotage, or espionage.Terminating the contract with the employer is another critical step in preventing data leakage, and the company must ensure that the user's accounts are disabled, preferably during the exit interview, in which any equipment belonging to the organization should be returned, and after which the ex-employee should be escorted out of the facility.

PHYS, PORT, STAT: Losses
Losing a device is greatly facilitated by its mobility, as an employee may take it anywhere and be robbed or mislay the portable equipment or document, possibly containing sensitive company data.In that regard, Bring Your Own Device (BYOD) has increased the potential for such occurrences.It refers to using employee-owned mobile devices to access business enterprise content or networks.Similar portability concepts are Choose Your Own Device (CYOD), Company-Owned and Personally Enabled (COPE), and Company-Owned Business Only (COBO), and they all raise security concerns.
Wani et al. [77] list some challenges these mobile devices bring to hospitals, which may also apply to companies in general.They categorize these challenges as related to technology, human factors, and policies.They also provide possible solutions to these challenges.Table 9 presents these challenges and solutions.
In addition to BYOD, teleworking and co-working spaces may pose a security threat to companies.These work models, which have emerged since the COVID-19 pandemic, also imply new security gaps similar to the BYOD-related ones.Geofencing is another suitable security control in this scenario, which refers to triggered actions in response to a device leaving a pre-defined geolocation.Such actions could be, for instance, disabling its network interface card or remotely wiping the device to prevent data leakage upon exiting an authorized area.
On this subject, Uz [78] evaluated the effectiveness of remotely wiping data, considering the deleted data may be forensically retrieved, as explained in Section 5.2.5.
Encryption is also recommended for data protection, and, for mobile devices, File-Based Encryption (FBE) is mandatory in Android since its 10th version [79].For notebooks, BitLocker and VeraCrypt are some available options.
One appropriate access control method for BYOD is Attribute-Based Access Control (ABAC).As per this paradigm, the company can deny and concede access to an identity based on attributes of the request, such as location, hour of the day, and object being accessed.

PHYS, PORT, STAT: Disposals
When disposing of sensitive data, one must be aware of the possibility of an adversary searching the dustbin, which is known as dumpster diving, a social engineering attack.
With this method, the attacker may access any object the company discards, such as equipment and documents.There may be sensitive data among this disposed-of material, such as employees' noted passwords or customers' data.In that case, the malicious actor will have more information to conduct the attack.
It is noteworthy that since the California v. Greenwood case in 1988, the legality of the warrantless search-and-seizure of garbage left in public areas has been established [80].Because of that, for one more layer of security against data breaches, companies should keep their waste bins locked in private areas.
As an additional countermeasure to this approach, a company should, at the end of the data life cycle, carry out an adequate disposal of information.To accomplish that, Data Classification and Asset Disposal Policies should be implemented and publicized to raise employees' awareness.To aid suitable editing of these and other policies, several esteemed security organizations provide policy templates, such as the SANS Institute (sans.org/information-security-policy/accessed on 7 April 2024) and CIS (cisecurity.org/accessed on 7 April 2024).
Before disposal, the media must be sanitized-that is, have its data rendered inaccessible for a given level of the attacker effort, depending on the classification of the data.Proper media-sanitization techniques are presented by NIST SP 800-88 [81] for different media types.For paper documents, for example, the standard states that they must be shredded in pieces small enough that there is reasonable assurance that the data cannot be reconstructed in proportion to the data confidentiality.To further hinder a malicious reconstruction, sensitive documents may be mixed with public paper in the shredder input.Regarding the size of the shredded pieces for each classification level, the German standard DIN 66399 [82] provides some valuable guidelines, some of which are summarized in Table 10.
Table 10.German DIN 66399 paper shredding sizes according to the data sensitivity.

Classification Level
Maximum Piece Width/Area P-1 (least sensitive) 2000 mm 2 (particle area) or 12 mm (strip width) P-2 800 mm 2 (particle area) or 6 mm (strip width) P-3 320 mm 2 (particle area) or 2 mm (strip width) P-4 160 mm 2 (cross-cut particle area) P-5 30 mm 2 (cross-cut particle area) P-6 10 mm 2 (cross-cut particle area) P-7 (most sensitive) 5 mm 2 (cross-cut particle area) Similar disposal approaches should be deployed to digital devices, such as Hard Drives (HD), Solid-State Drives (SSD), flash drives, and CDs/DVDs.Despite physical destruction and shredding still being possible for these types of media, and it indeed being recommended for more sensitive cases, the nature of these devices allows for other erasure mechanisms, especially for the least sensitive data.
It is known that simply deleting files via the operating system is not an effective way to purge data, as data carving techniques can retrieve said files [83].Other techniques, such as zero filling, in which all data are overwritten with zeroes, are effective against commonly available data retrieval mechanisms, according to NIST SP 800-88.Additional filling rounds may be performed to increase security.
Specifically for magnetic HDs, the degaussing technique may be used.It consists of applying a magnetic field to the hard drive, which changes the magnetic patterns on the device, consequently destroying the data.
The degaussing approach will not be practical for SSDs, which are not magnetic.For this type of media, a secure way of dealing with data remanence is crypto-shredding, also named crypto-erasure.In this procedure, the data stored in the device are encrypted with a secure algorithm, and then the decryption key is discarded, rendering the data unrecoverable.
In addition to disposals, these sanitization techniques should also be applied when donating or selling the devices if the sensitivity of the data allows the transfer of the property of the media.

PHYS, PORT, STAT: Thefts and Inappropriate Accesses
This section primarily discusses physical security aspects that may be implemented at a company facility to prevent a data breach.We understand that the mitigation approaches related to thefts of the company assets in possession of an employee outside of the company's premises are embraced in Section 5.2.4.
The physical-security design in a company starts in the architectural-arrangement stage of the facility construction.Crime Prevention Through Environmental Design (CPTED) strategies may be employed during this phase.Through this approach, criminals are deterred and more easily detected by the physical layout of the space.
The main CPTED principles are natural surveillance, access control, territorial reinforcement, and maintenance [84].As an example, it is stated that fences should be at least 3 ft (about 1 m) high to deter casual trespassers and at least 8 ft (approximately 2.5 m) high to deter purposeful infiltrators [85].
Other physical controls should be implemented to prevent incidents, especially in more sensitive areas, such as data centers.Examples include the use of a keypad (preventive), guards (deterrent), security cameras (detective), and alarms (corrective).
Nonetheless, all these security measures will be rendered useless if the human factor is successfully explored.An adversary may, for example, covertly sneak through a door opened by authorized personnel, a practice known as tailgating, or they may convince someone to let them enter, for example, by saying that they forgot their badge and are in a hurry.The latter is a social engineering tactic known as piggybacking.An effective countermeasure to these intrusions is the use of a mantrap.
However, intruders do not always perpetrate physical incidents.Authorized guests, for example, may perform unauthorized activities, and, in that case, additional physical countermeasures must be put in place.
One possible gap is the direct observation of devices' screens and keyboards.In such cases, an adversary may obtain sensitive data such as passwords through shoulder surfing.To hinder this activity, it may be necessary to relocate the devices.
Another security measure regarding the employee's workstation is the implementation of a Clean-Desk Policy, which enforces that all desks within the company must be clear of objects and documents.After successfully implementing this policy, an intruder cannot steal a sensitive document from a worker's desk.
NIST SP 800-12 [86] Chapter 15 reviews other physical security practices.Physical safety, which aims to protect people's physical integrity, life, and health, is another relevant topic in this discussion.However, as these incidents do not usually result in data breaches, which are the focus of this work, we do not include them in the discussion.

DISC
Unintended disclosure may be classified as the result of an unintentional insider threat, either due to negligence or recklessness.
An effective Security Education, Training, and Awareness program may be capable of reducing the incidence of these cases and promoting compliance in an organization.
Education aims to equip IT personnel with security skills through methods like cyberattack simulations, targeting a high level of expertise.Training focuses on enhancing security knowledge among all employees through classes, for example.Awareness efforts aim to capture the attention of all employees regarding security concerns through mediums like banners, addressing a basic level of security understanding [87].
It may be observed that security awareness programs are helpful in mitigating risks associated with the general utilization of technological resources by the general user, such as credentials compromise and social engineering attacks, like phishing.Conversely, security training and education focus on the prevention of cyber incidents rooted in technical vulnerabilities, such as weaknesses originating from misconfiguration, and they should be directed to IT personnel.
Alyami et al. [88] assessed the critical factors for deploying a successful SETA program, based on a survey with 65 respondents.They produced a ranked list of essential factors of success.Gamification is also seen as a reasonable way of enhancing engagement in the program.
According to PCR (Table 2), the DISC type categorization includes publicly posted information sent to the wrong party.In addition to SETA, a two-person control may also reduce the likelihood of these disclosures.With this approach, two people must authorize an action before its execution.

Containment, Recovery, and Response
The affected company must study a response strategy after an attacker and a data breach have circumvented the security controls.From the technical point of view, the company must quickly contain the data leakage to minimize the potential damage and then identify and eradicate the components of the incident.NIST SP 800-61 [89] provides a more in-depth guide for computer security incident handling.
This NIST publication divides the incident response process into five steps: Preparation, which occurs before an incident and corresponds to preventive security measures; Detection and Analysis, in which the attack vector and TTPs are identified; Containment, Eradication, and Recovery, which comprises an initial restriction of the malicious activity and a subsequent cleanse of malicious artifacts (though keeping them for forensic analysis); followed by the Restoration of the systems' operation.Finally, post-incident activities include discussing and documenting the incident to understand it better and prevent similar future intrusions.
For a better comprehension of the causes of the incident and of eventual system modifications made by the intruder, forensic tools and techniques may be helpful.NIST SP 800-86 [90] provides guidelines for integrating forensic techniques into incident response, including data collection, examination, and reporting from different sources, such as files, operating systems, networks, and applications.When performing digital forensics, it is important to maintain a chain of custody and preserve the integrity of the evidence, not removing it.
Specifically, in the data leakage domain, ref. [91] proposed a data breach response methodology based on ISO 27035 [92] and NIST 800-61 [89].Their study emphasized the importance of automating this process, especially due to the short time required to notify a breach for legislative compliance.
Hillmann et al. [93] conducted 12 interviews with customers regarding their expectations regarding a data breach response.They concluded that expectations vary according to several factors, such as breach severity, data leakage type, and company sector.Hence, a company must adapt its response strategy to the specific scenario to maximize the chance of meeting its customers' expectations.
As a more general guide, the Federal Trade Commission (FTC) presents recommendations for an adequate data breach response (www.ftc.gov/business-guidance/resources/data-breach-response-guide-business accessed on 7 April 2024), such as assembling an incident response team, fixing vulnerabilities, and removing improperly posted information on the web.
For notification, especially for complying with the legislation mentioned in Section 5.1, the FTC mentions the importance of notifying law enforcement and affected businesses and individuals, specifying what happened, what information was stolen, how the attackers used the information, what remediation measures were taken, and how customers may contact the organization regarding the breach.They also provide a model letter for data breach notification.

Conclusions and Future Work
Data breaches are a growing threat to organizations of all sizes and industries.This indepth analysis of data breaches in publicly listed companies in the United States improves our comprehension of the evolving landscape of cybersecurity threats within the corporate sector.This research contains several notable findings, which provide valuable knowledge of the patterns and implications of data breaches in these organizations.
Our study discovered that the financial sector has emerged as the primary target for malicious actors.This emphasizes the critical need for deploying strong cybersecurity measures within the industry, as it continues to be a prime focus for cyberattacks.Understanding the vulnerabilities that make the financial sector susceptible to breaches is paramount for securing the sensitive data handled within these organizations.Thus, we have also shown the most common causes for breaches in this economic sector.
Not only in the financial sector, we identified that incidents related to portable devices (PORT) and to malicious outsiders (HACK) were the most prevalent types of breaches.This highlights the importance of organizations taking proactive countermeasures to protect their data and mitigate the risk of such incidents.
Several frameworks, standards, and laws have been discussed to achieve this protection and some vulnerability mitigation controls.Adopting these practices is fundamental for preventing PII exposure.By addressing the vulnerabilities and threats revealed in this research, organizations can better protect their sensitive data and minimize the potential financial and reputational damages associated with data breaches.However, they are not exhaustive nor definite, and companies should adopt an approach of continuous due diligence and due care.
By discussing compliance, prevention, impacts, and response to data breaches, this paper enhances the understanding of data-exposure patterns and advances the discussion on applicable strategies for reducing the probability of occurrence and consequent costs.Addressing cyber incidents from the combined perspective of regulatory aspects, implementing security controls, and response planning is fundamental for appropriately mitigating the related risks.
We propose, as future work, a similar approach to studying data breaches, adopting a global data breaches dataset, along with a discussion of data protection laws at the federal level for different countries and their compliance aspects.Furthermore, expanding the research scope to encompass publicly and privately held companies may provide a more comprehensive view of the data breach scenario.

Figure 1 .
Figure 1.Taxonomy of data breaches, in which the red boxes represent the fields explored in this work.Adapted from [29].

Figure 2 .
Figure 2. Geographical distribution of count of reported breaches by occurrence State.

Figure 3 .
Figure 3. Distribution of breach type per occurrence State.

Figure 6 .
Figure 6.The 10 biggest breaches in the dataset, and their types.

Figure 7 .
Figure 7. Accumulated number of records breached per announcement type.

Figure 8 .
Figure 8. Count of breaches by type.
(a) By type (b) By human factor causes Figure 9. Count of breaches per year.

Figure 10 .
Figure 10.Distribution of breach type per company sector.

Figure 11 .
Figure 11.Count of breaches by company sector per year.
(a) Citigroup Inc.(b) JPMorgan Chase & Co. (c) LinkedIn.comFigure 12.Historic stock prices (USD) of the two companies with the most incident occurrences (a,b), and the breach with the most records exposed (c).Vertical dashed lines represent a data breach event.

Figure 13 .
Figure 13.Breach notification statuses in U.S. States.The color map indicates the period in which the notification breach law took effect, and the pie charts indicate the distribution of breaches that occurred before (black) and after (green) the law took effect.In this Figure, the pie charts denote the number of breach incidents that occurred before (black) and after (green) the commencement of the law's effect.States without a pie chart either experienced no breaches or had not enacted notification laws within the dataset's time frame.California was the inaugural U.S. State to have a breach notification Act in effect, from 1 July 2003.The work of Coie[57] offers a comprehensive breakdown of the specifics of each State Act.It details critical aspects such as their definitions of data breaches and personal information, the timing and structure of breach notifications, their applicability, and other vital provisions that form the foundation of these State-level data breach notification laws.

Figure 14 .
Figure 14.US States that have enacted a data protection law.None of these acts were in effect in the dataset time range.

Figure 15 .
Figure 15.Frequency and average cost of initial attack vectors responsible for data breaches in 2023.Adapted from [58].

Table 2 .
[6]cription of the dataset fields studied in this work.The industry_sector field was created in this work.Adapted from[6].

Table 3 .
Dictionary of most-breached-companies names from their tickers.
-Figure 5. Breach types in the most-breached companies.

Table 4 .
Dictionary of companies with the biggest breaches, relating to their names, tickers, and breach date.

Table 5 .
Data protection standards and their applicability to the breaches in the dataset by type of data.

Table 6 .
State comprehensive data protection Acts and their applicability to the breaches in the dataset by type of data.None of these Acts were in effect in the dataset time range.