Article

Predicting Cybersecurity Incidents via Self-Reported Behavioral and Psychological Indicators: A Stratified Logistic Regression Approach

Department of Information Technology, University of Dunaújváros, 2400 Dunaújváros, Hungary
J. Cybersecur. Priv. 2025, 5(3), 67; https://doi.org/10.3390/jcp5030067
Submission received: 27 July 2025 / Revised: 25 August 2025 / Accepted: 27 August 2025 / Published: 4 September 2025
(This article belongs to the Special Issue Cybersecurity Risk Prediction, Assessment and Management)

Abstract

This study presents a novel, interpretable, and deployment-ready framework for predicting cybersecurity incidents through item-level behavioral, cognitive, and dispositional indicators. Based on survey data from 453 professionals across countries and sectors, we developed 72 logistic regression models across twelve self-reported incident outcomes—from account lockouts to full device compromise—within six analytically stratified layers (Education, IT, Hungary, UK, USA, and full sample). Drawing on five theoretically grounded domains—cybersecurity behavior, digital literacy, personality traits, risk rationalization, and work–life boundary blurring—our models preserve the full granularity of individual responses rather than relying on aggregated scores, offering rare transparency and interpretability for real-world applications. This approach reveals how stratified models, despite smaller sample sizes, often outperform general ones by capturing behavioral and contextual specificity. Moderately prevalent outcomes (e.g., suspicious logins, multiple mild incidents) yielded the most robust predictions, while rare-event models, though occasionally high in area under the receiver operating characteristic curve (AUC), suffered from overfitting under cross-validation. Beyond model construction, we introduce threshold calibration and fairness-aware integration of demographic variables, enabling ethically grounded deployment in diverse organizational contexts. By unifying theoretical depth, item-level precision, multilayer stratification, and operational guidance, this study establishes a scalable blueprint for human-centric cybersecurity. It bridges the gap between behavioral science and risk analytics, offering the tools and insights needed to detect, predict, and mitigate user-level threats in increasingly blurred digital environments.

1. Introduction

Cybersecurity risk refers to the potential for harm, loss, or disruption resulting from malicious digital activity. In both workplace and personal contexts, such risk can manifest as either objective harm—such as unauthorized access, data breaches, financial loss, or device malfunction—or as behavioral vulnerability, characterized by risky user practices that increase exposure to threats. Measuring objective cybersecurity incidents remains challenging in general populations, given the relatively low frequency and frequent invisibility of such attacks. However, survey-based self-reports provide a valuable window into the lived experiences of users, capturing a broad spectrum of events—from account lockouts and password resets to identity theft, financial fraud, and ransomware-related system failures. These events form the foundation for the present study, which distinguishes between six types of self-reported cybersecurity incidents to investigate their behavioral and psychological predictors. In doing so, we emphasize not just whether individuals have been victimized, but how everyday behaviors and dispositions systematically elevate their risk, especially as personal and professional digital boundaries become increasingly entangled.
The ongoing convergence of personal and professional digital domains has profoundly reshaped the cybersecurity landscape. As hybrid work arrangements, flexible schedules, and Bring-Your-Own-Device (BYOD) policies become the norm, individuals now navigate increasingly digitally entangled environments, where traditional boundaries between work and home have eroded [1,2]. This phenomenon—commonly referred to as boundary collapse—has introduced new layers of vulnerability that challenge the effectiveness of conventional security measures [3,4]. Employees, particularly knowledge workers, are now central actors in organizational risk exposure, as their personal digital behaviors—once peripheral—are directly implicated in enterprise-level security outcomes [5,6]. These developments have prompted calls for a more human-centered understanding of cybersecurity, one that accounts for not only technical infrastructure but also individual behaviors, cognitive patterns, and contextual pressures [7,8].
Motivated by this shift, the present study initially focused on Work–Life Blurring (WLB) as a theoretically salient antecedent of cybersecurity risk. WLB has been linked to digital overload, psychological detachment failure, and the diffusion of personal accountability across work and non-work roles [9,10,11]. From a conceptual standpoint, WLB offered a compelling lens for anticipating cybersecurity incidents: as workers fluidly alternate between private and professional use of devices, the likelihood of behavioral lapses and policy violations was presumed to increase.
Building on these insights, the present study aims to predict cybersecurity incidents using a five-domain conceptual framework encompassing Work–Life Blurring, Risk Rationalization, Cybersecurity Behavior, Digital Literacy, and Personality. Each domain captures a distinct facet of individual vulnerability, ranging from contextual role diffusion to cognitive justifications, practical skills, and trait-level predispositions [11,12,13]. The study specifically distinguishes between mild incidents (e.g., account lockouts, password resets) and serious incidents (e.g., financial loss, impersonation, or device compromise), reflecting the real-world variability in outcome severity [14,15].
In designing the study, the survey items for each of the five domains were initially intended to represent a distinct underlying construct, with the expectation that this structure would be supported by Exploratory and Confirmatory Factor Analysis. However, preliminary analyses revealed that not all five domains showed sufficient unidimensionality to support latent variable modeling [8,16]. Moreover, predictive models using domain-level factor scores consistently underperformed compared to those using individual survey items, as measured by area under the curve (AUC) in logistic regression. While factor scores are valuable for construct validation, they tend to average out item-specific variance that may carry a unique predictive signal. In contrast, item-level models preserve this granular behavioral information, which is essential for accurate incident forecasting and actionable risk profiling. Based on these findings, we adopted an item-level logistic regression approach, prioritizing interpretability and maximizing predictive accuracy. This method also enables survey items to serve as direct, transparent inputs for cybersecurity training and self-assessment tools [4,17].
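To make this modeling choice concrete, the sketch below contrasts the 10-fold cross-validated AUC of an item-level logistic regression with one built on domain-level factor scores. It is illustrative only: the synthetic DataFrame and generic column names are stand-ins for our survey data, and scikit-learn's FactorAnalysis is used as a simplified proxy for the factor extraction described in Section 4.1.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import FactorAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the survey data (453 respondents, 40 Likert items);
# in practice the real item columns (WLB1..., RR1..., CB1..., etc.) are used.
rng = np.random.default_rng(0)
item_cols = [f"item{i}" for i in range(1, 41)]
df = pd.DataFrame(rng.integers(1, 6, size=(453, 40)), columns=item_cols)
df["AtLeastOneMild"] = rng.integers(0, 2, size=453)

def cv_auc(X, y, folds=10):
    """Mean 10-fold cross-validated AUC of a plain logistic regression."""
    return cross_val_score(LogisticRegression(max_iter=1000), X, y,
                           cv=folds, scoring="roc_auc").mean()

X_items, y = df[item_cols], df["AtLeastOneMild"]

# Domain-level alternative: five factor scores, one per theoretical domain
# (fitting on the full sample before CV simplifies proper within-fold work).
X_factors = FactorAnalysis(n_components=5, random_state=0).fit_transform(X_items)

print(f"Item-level   10-fold AUC: {cv_auc(X_items, y):.3f}")
print(f"Factor-score 10-fold AUC: {cv_auc(X_factors, y):.3f}")
```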
This study contributes to cybersecurity research in three key ways. First, it introduces a theoretically grounded and empirically optimized framework for predicting cybersecurity incident risk through five complementary domains—spanning contextual, behavioral, cognitive, and dispositional dimensions. Second, it advances human-centered modeling techniques by demonstrating the utility of interpretable, item-level models in hybrid digital ecosystems [17]. Third, it provides actionable insights for policy and training by modeling predictors across blurred work–life environments and differentiating incident severity. These findings support targeted interventions—from user-level self-assessment tools to behavior-specific workplace protocols—positioning predictive behavioral modeling as a cornerstone of next-generation cybersecurity in digitally integrated, boundary-dissolved environments. Beyond these contributions, the study also pursues two overarching goals: to validate the methodological approach of item-level statistical modeling itself and to apply this approach to determine which of the five domains most strongly predict cybersecurity incidents.
The remainder of the paper is structured as follows. Section 2 reviews the five behavioral, cognitive, and dispositional domains and the typology of cybersecurity incidents, situating them within existing research. Section 3 details the methodology, including data collection, survey design, and the modeling strategy. Section 4 presents the empirical results, beginning with item reliability and factor analysis, followed by full-sample and stratified models, and concluding with threshold optimization. Section 5 discusses the implications of these findings, addressing methodological contributions, domain-specific insights, and limitations, while Section 6 concludes the paper with recommendations for research and practice.

2. Literature Review

While cybersecurity behavior is shaped by a wide range of organizational, cultural, and infrastructural influences, our focus is restricted to individual-level predictors. In line with the study’s goal—to build interpretable models from user-reported data—we exclude factors such as organizational security culture, policy enforcement, and community-level deterrents, which, though important [8,16], fall outside the scope of our end-user-centered framework.
This section is organized into three parts.
Section 2.1 introduces the five thematic domains that served as predictors in our modeling framework: Work–Life Blurring, Risk Rationalization, Cybersecurity Behavior, Digital Literacy, and Personality Traits. These domains were selected for their conceptual distinctiveness, empirical grounding, and applicability in behaviorally driven cybersecurity research. Each domain reflects a specific pathway through which users may become vulnerable to digital threats, and their operationalization in the survey enabled interpretable, item-level analysis.
Section 2.2 turns to the outcome side of our model: the typology of cybersecurity incidents reported by participants. Drawing on prior work in digital behavior research, we classify incidents into mild and serious categories to capture the full spectrum of consequences. We also introduce composite outcome variables designed to improve robustness in classification.
Finally, Section 2.3 reviews prior approaches to predictive modeling in cybersecurity, focusing on how user-level vulnerability has been estimated using survey-based scoring tools, real-time behavioral analytics, and hybrid systems that combine static and dynamic inputs. Despite growing calls for human-centered cybersecurity, surprisingly little is known about the actual predictive power of behavioral and psychological variables in forecasting real-world incidents. Much of the existing literature offers conceptual frameworks or retrospective accounts but stops short of rigorously testing whether behavioral indicators meaningfully outperform—or even complement—technical or organizational predictors. As such, the empirical strength of the behavioral perspective remains an open question. This study seeks to address that gap by evaluating the predictive contributions of multiple behavioral, cognitive, and dispositional domains in explaining incident occurrence. This context provides a methodological foundation for our use of logistic regression and situates our approach within ongoing efforts to operationalize human-centered cybersecurity through scalable and adaptive predictive tools.
Together, these three sections provide the conceptual and empirical basis for the predictive models presented in the remainder of the paper.

2.1. Behavioral-Cognitive Domains of Cyber Risk

This section presents the five-domain conceptual framework that underpins our predictive modeling of cybersecurity incident risk.
Although Work–Life Blurring (WLB) emerged in this study as a weaker direct predictor of cybersecurity incidents, its theoretical relevance as a contextual amplifier of digital risk remains compelling. WLB refers to the erosion of traditional boundaries between professional and personal domains, often driven by ubiquitous connectivity, mobile technologies, and flexible work arrangements. Drawing from boundary theory and segmentation–integration models [10,18,19], WLB captures individuals’ capacity—or lack thereof—to maintain psychological detachment and role clarity in hybrid environments. Prior studies have linked WLB to increased technostress, digital fatigue, and self-regulatory depletion, all of which are hypothesized precursors to risky digital behavior [9,20,21]. Recent conceptual reviews employing the Antecedents–Decisions–Outcomes (ADO) framework further underscore the multidimensionality of WLB, highlighting how various antecedents—including organizational expectations and digital norms—interact with individual coping strategies and behavioral outcomes [22]. Other empirical evidence suggests that WLB may not directly cause security lapses but instead creates gray zones in which secure and insecure behaviors coexist without clear norms or oversight [2,11].
In operational terms, our survey captured WLB through items measuring difficulty in detachment from work-related communications, frequency of cross-domain device sharing, and perceived expectations of constant availability.
Risk Rationalization (RR) refers to cognitive mechanisms by which individuals justify their security-compromising actions. This concept extends Bandura’s (1999) theory of moral disengagement [23] and is aligned with neutralization theory from criminology, as applied in information systems security contexts [13,24]. Users may, for instance, rationalize bypassing a security step as a necessity due to work urgency or consider it harmless if no immediate consequence follows [25,26]. Recent studies have reframed such justifications as a form of “cybersecurity hygiene discounting,” wherein individuals diminish the perceived importance of routine protective behaviors—such as software updates or secure password use—based on convenience or perceived irrelevance [27]. This reframing builds on and expands traditional neutralization theory by emphasizing not only post hoc rationalizations but also proactive cognitive shortcuts that devalue protective norms. Empirical work further confirms that specific rationalization techniques, such as denial of responsibility and metaphor of the ledger, significantly predict misuse of organizational information systems, even when policy awareness and deterrence are present [28]. These developments highlight RR not merely as a residual factor but as a central cognitive mechanism actively shaping security compliance and noncompliance in workplace settings.
Our survey included RR items reflecting common justifications: blaming inconvenience, shifting responsibility, or citing lack of clarity in policy requirements.
Cybersecurity Behavior (CB) encompasses tangible user actions that either mitigate or amplify digital risk exposure. Foundational research has linked insecure practices—such as habitual password reuse, neglecting software updates, and sharing personal devices—to elevated organizational vulnerabilities [29,30,31]. Parallel findings have also emerged in higher education, where survey-based models have revealed distinct behavioral dimensions shaping students’ cybersecurity awareness and practices [32]. Conversely, secure behaviors like enabling multi-factor authentication (MFA) and regularly updating credentials have been shown to significantly reduce breach likelihood [33,34]. Later empirical work continues to support these findings: Ref. [35], in a cross-national survey, found that despite relatively high awareness, individuals often failed to adopt protective behaviors such as using strong and unique passwords or enabling MFA. Recent studies reinforce the continued importance of cybersecurity behaviors. A 2025 systematic review of MFA in digital payment systems found that, despite its clear protective benefits, MFA remains underutilized due to usability challenges and misalignment with NIST standards [36]. In parallel, usage trends show many small and medium-sized businesses still lagging—65% reported not implementing MFA, often citing cost and lack of awareness. Regarding password hygiene, Ref. [37] reported that nearly half of observed user logins are compromised due to password reuse. Complementary data from JumpCloud [38] revealed that up to 30% of organizational breaches involve password sharing, reuse, or phishing—highlighting how user behavior continues to drive risk. These findings underscore the persistent relevance of secure digital habits and the necessity of tracking them at the user level for effective cybersecurity risk assessment.
Our behavioral scale was designed to reflect these ongoing risks by including both negatively and positively framed items that capture users’ actual digital practices—such as MFA usage and password management—rather than relying solely on self-perceived awareness.
Digital Literacy (DL) incorporates both general digital skills and cybersecurity-specific awareness. Foundational scholarship by [39] first described digital literacy as the capacity to “…understand and use information from a variety of digital sources,” establishing early recognition of its multifaceted nature. Subsequent work by [40] expanded this to include both access to technology and the skills to use it effectively. Ref. [41] further refined the concept, emphasizing users’ navigational competence and evaluative judgment online. Ref. [42] showed that knowing how the internet works, how companies use data, and what privacy policies mean helps people take better control of their online privacy—highlighting why digital literacy matters for cybersecurity behavior. Later, Ref. [43] introduced a layered model of digital literacy encompassing operational, formal, and critical dimensions—areas crucial for interpreting security-related signals. Recent studies reinforce DL’s role in cybersecurity resilience. A qualitative study found that low digital literacy increases the vulnerability of both individuals and organizations to cyberattacks [44]. In Vietnam, Ref. [45] reported that higher digital literacy predicts better personal information security behaviors, indicating that stronger technical and policy awareness leads to safer online actions. Broader evidence also shows that people with higher digital literacy are significantly better at spotting phishing scams and other online threats [46]. These findings underscore that DL is not merely theoretical: it is a measurable and trainable asset essential for reducing cybersecurity risk.
Our DL scale thus captures confidence and competence in managing digital risk—evaluating email legitimacy, interpreting URLs, and identifying secure connections. Items also reflect functional digital fluency, procedural autonomy, and emotional self-regulation, including technostress.
Personality Traits (P) have long been recognized as foundational predictors of individual behavior across domains, including cybersecurity. The Big Five model [47] remains the most widely adopted framework in this context, encompassing five dimensions: Openness to Experience, Conscientiousness, Extraversion, Agreeableness, and Neuroticism. Several other traits frequently investigated in cybersecurity research—such as Impulsivity, Vigilance, or Trust Propensity—can be understood as facets or behavioral expressions rooted in these five broader domains. Impulsivity is often conceptualized as a facet of low Conscientiousness (e.g., lack of discipline, poor self-control) or high Neuroticism (e.g., emotional reactivity, stress-driven decisions). Its inclusion in cybersecurity research reflects its strong predictive value for risky behavior [12,47,48,49]. Research on personality and privacy suggests that individuals high in Agreeableness and Openness to Experience often show heightened concern for personal data and privacy-preserving behavior. For example, they tend to value ethical norms and communal responsibility, which motivates cautious sharing and adherence to privacy settings [50,51,52]. Conversely, empirical findings in online behavior indicate that higher Openness is sometimes associated with more extensive information disclosure—such as increased posting activity and less restrictive privacy settings on social platforms—suggesting a complex, context-dependent relationship [53]. Thus, while the theoretical expectation is that Agreeableness and Openness promote privacy-conscious behavior, actual outcomes may vary depending on context and the manifestation of the trait.
Our measurement of personality traits reflects a behaviorally grounded adaptation of the Big Five. Openness was assessed through items reflecting curiosity toward new technologies and willingness to explore digital tools. Conscientiousness captured attention to digital organization and task follow-through. Extraversion was reflected in social media participation and online community engagement, while Agreeableness focused on conflict aversion and valuing others’ digital privacy. Neuroticism was indexed by anxiety about unpredictability and loss of digital control.

2.2. Self-Reported Cybersecurity Incidents: Typology and Prior Research

To model cybersecurity risk effectively, it is essential not only to identify predictive factors but also to define meaningful outcome variables. In our study, we classify six self-reported incident types into two severity categories. Mild incidents (INC1–INC3) include account lockouts, forced password resets, or unauthorized login notifications, while Serious incidents (INC4–INC6) refer to financial loss, impersonation, or total device compromise. This two-tiered typology reflects both user-level consequences and broader systemic risks. It aligns with existing empirical categorizations that distinguish digital harms by severity, user awareness, and operational disruption [15,54,55]. Although self-reports are susceptible to recall bias and semantic variability, structured survey instruments grounded in concrete behavioral language can yield valid insights into cybersecurity exposure—especially when log data are unavailable [54]. These instruments are particularly effective when paired with typological frameworks that make distinctions users can easily understand and recall. Recent conceptual advances also support the importance of severity-based models. Ref. [56] proposed the Cybersecurity Incident Severity Scale (CISS), a multidimensional framework that formalizes severity by integrating technical, operational, and human-centric impact factors. Drawing on analogies from emergency response and public health (e.g., Richter and NIH stroke scales), the CISS model allows researchers and practitioners to assign meaningful weight to diverse outcomes, including disruptions to workflow, psychological distress, or reputational damage. While our study does not adopt CISS scoring directly, it shares the foundational aim of quantifying cyber harm beyond a single binary breach/no-breach outcome.
In our survey, participants responded to six binary (yes/no) items indicating whether they had experienced specific cybersecurity incidents in the past. In addition to analyzing these individual outcomes, we also constructed composite variables—for example, AtLeastOneMild and AtLeastOneSerious—to improve statistical robustness and capture the breadth of user-reported experiences. This approach allows us to map behavioral predictors to cybersecurity consequences with greater granularity and practical relevance.

2.3. Predictive Modeling in Cybersecurity Research

Historically, much cybersecurity research has focused on predicting behavioral intentions or compliance-related patterns—such as willingness to follow security policies, password reuse, or responses to phishing simulations [12,57]. While such models are valuable for understanding user psychology and identifying risk-prone behavior, they are often limited in practical application, as their response variables typically reflect proxy measures—like self-reported intentions or isolated behavioral tendencies—rather than actual, user-reported security incidents. The challenge, then, is how to bridge the gap between theoretically grounded survey constructs and the operational prediction of real-world security events that carry tangible consequences for users and organizations alike.
Our study addresses this gap by using actual self-reported incidents as outcome variables, including both mild and serious events. This approach is conceptually consistent with the call for human-centered modeling in cybersecurity [7,8] yet methodologically optimized for operational application: each predictor reflects a directly observable, interpretable, and potentially modifiable aspect of individual digital behavior.
While some may argue that survey-based models lack the immediacy or behavioral granularity of telemetry-based systems, recent work has demonstrated that well-constructed surveys can yield meaningful and predictive insights into incident susceptibility. Notably, [58] developed multilevel models using survey data from over 27,000 EU citizens and 9000 enterprise users, linking individual characteristics—such as prior incident experience, digital confidence, and national cultural context—to actual phishing simulation outcomes and configuration behaviors. Although the study focused on behavioral outcomes rather than incident labels per se, its integration of cultural, organizational, and psychological predictors showed that survey-derived data can approximate real-world risk conditions when designed thoughtfully.
Our approach builds on and extends this logic. Rather than predicting behavioral proxy outcomes (e.g., clicks in phishing simulations), we directly model self-reported security incidents—a more ecologically valid and practically relevant endpoint. The use of logistic regression on item-level survey data offers two core advantages. First, it circumvents the need for strict unidimensionality, which, as our factor analysis showed, was lacking in domains based on diverse behavioral indicators [16]. Second, it allows each survey item to serve as a transparent predictor, which can be immediately interpreted, acted upon, and potentially integrated into personalized risk assessments or training interventions.
In operational contexts, survey-based predictive models can be transformed into practical scoring tools. These scores—whether expressed numerically or categorized into levels such as low, medium, or high—enable organizations to identify users with elevated susceptibility to cybersecurity incidents based solely on self-reported inputs. When integrated into onboarding procedures, periodic training, or targeted assessments, such tools allow for scalable, privacy-preserving interventions, including just-in-time prompts, personalized awareness campaigns, or access restrictions. Importantly, this approach maintains interpretability and transparency, distinguishing it from opaque machine learning systems often embedded in security infrastructure. However, survey-based models may lack temporal sensitivity, prompting researchers and practitioners to explore hybrid approaches that combine static user characteristics with real-time telemetry. One increasingly adopted solution is User and Entity Behavior Analytics (UEBA), which enhances detection by linking baseline profiles with dynamic activity patterns such as login behavior, device usage, or authentication anomalies [59]. In such frameworks, survey-based scores can function as front-end risk indicators or baseline inputs that are subsequently refined through behavioral monitoring [7,60]. These developments reflect a growing consensus that predictive human-risk modeling is most effective when embedded within adaptive, multi-channel security architectures that integrate cognitive, dispositional, and behavioral dimensions in real time.
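As a minimal illustration of such a scoring tool, the snippet below maps model-predicted incident probabilities onto low/medium/high risk tiers. The cut points are placeholders, not values derived from this study, and the fitted model referenced in the comments is hypothetical.

```python
import numpy as np

def risk_tier(probs, low=0.25, high=0.60):
    """Map predicted incident probabilities onto coarse risk levels.
    The cut points are illustrative placeholders, not study estimates."""
    return np.where(probs >= high, "high",
                    np.where(probs < low, "low", "medium"))

# With a fitted logistic regression `model` (hypothetical names):
# probs = model.predict_proba(X_new)[:, 1]   # per-user risk scores
# tiers = risk_tier(probs)                   # drive onboarding/training triggers
print(risk_tier(np.array([0.05, 0.40, 0.80])))  # -> ['low' 'medium' 'high']
```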

3. Methodology

This section outlines the study’s methodological framework, covering data collection procedures, survey design, outcome definitions, and the modeling strategy. We also detail our approach to model evaluation, including cross-validation, overfitting checks, and threshold optimization for selected models.

3.1. Participants and Procedure

  • Data Collection Challenges in Cybersecurity Research
Collecting reliable data on cybersecurity behavior presents unique methodological and ethical challenges. Participants may underreport risky actions or overstate their awareness due to social desirability bias, forgetfulness, or lack of self-awareness. Experimental designs, such as phishing simulations, can yield more realistic insights but often raise ethical concerns, particularly when deception is involved. As a result, large-scale self-reported surveys—despite their limitations—remain one of the most practical and ethically sound approaches for investigating individual risk patterns across diverse populations. In this study, we opted for an indirect measurement strategy using self-report survey items, balancing feasibility, ethical standards, and the need for behavioral detail.
  • Two-Phase Recruitment Strategy
To ensure both depth and diversity in the responses, data collection occurred in two phases. The first phase involved convenience sampling within the researcher’s own academic and professional network, including friends, colleagues, and some university students in Hungary. This approach yielded 197 cleaned responses, providing an initial foundation of cybersecurity-related experiences and behavioral profiles. However, early analyses revealed demographic and occupational homogeneity in the initial sample, prompting a second phase of data collection via the Prolific platform. Prolific is an established online research participant recruitment platform that provides access to diverse, pre-screened participant pools [61]. Data from Prolific were collected in five sequential rounds. After each round, the sample was reviewed to assess underrepresented sectors, countries, or demographic layers. This adaptive, layer-filling approach allowed for a more balanced representation of professional domains, national contexts, age groups, and gender identities, improving the generalizability of the findings.
In the subsequent statistical analyses, the Hungarian data were retained in the full-sample models despite their relative homogeneity, as excluding them would have reduced statistical power and removed valuable behavioral variance. At the same time, we also conducted parallel analyses on the dataset without Hungarian participants to examine how the overrepresentation of this subgroup may have influenced overall results. The full-sample analysis was therefore used to test the framework’s predictive validity on the largest available dataset, while the stratified and Hungary-excluded models were employed to account for demographic skewness and assess subgroup-specific generalizability.
  • Phase 1: Pilot Sample (N = 197)
After cleaning, this sample included 197 valid responses, largely from Hungary (73%) and Romania (18%). Participants were fairly evenly split by gender (54% female, 46% male) and represented a wide age range, skewed toward the 35–54 age group. Educational attainment ranged from high school (34%) to doctoral degrees (12%), and the most common employment sectors were Education (26%), IT/Technology (23%), and Finance/Business (10%).
  • Phase 2: Prolific Sample (5 Rounds, Total N = 256 Cleaned)
To diversify the sample across countries, sectors, and demographic strata, data collection was conducted in five adaptive rounds on Prolific.
  • Round 1 (N = 6 cleaned): Trial Deployment
This round served as a technical trial to confirm instrument compatibility with the Prolific platform. Basic demographic filters were applied.
  • Round 2 (N = 130 cleaned): Main Data Collection Launch
This round aimed to expand beyond Central-Eastern Europe by targeting respondents from six Western countries (the UK, USA, Ireland, Germany, Austria, Netherlands). Filters ensured participants were actively employed (full-time or part-time) and used digital devices for at least 25% of their work tasks. Education levels were diverse, spanning from secondary to graduate degrees. However, sector information was not yet included, limiting the interpretation of occupational vulnerability.
  • Round 3 (N = 40 cleaned): Country and Education Balancing
This round refined geographic targeting to four underrepresented countries from Round 2 (Ireland, Germany, Austria, Netherlands) while maintaining the same work and education filters. The goal was to address geographic imbalances and add mid-range education categories (technical/community college).
  • Round 4 (N = 40 cleaned): Occupational Layer Expansion (STEM/IT)
With core demographic balance achieved, Round 4 focused on occupational diversification by explicitly recruiting participants from STEM and IT-related sectors (Information Technology; Science, Technology, Engineering & Mathematics). Country coverage was expanded to include Australia and Sweden, further enhancing geographic diversity. All key filters were applied: digital work engagement, education, and exclusion of earlier participants.
  • Round 5 (N = 40 cleaned): Sectoral Balancing (Education)
The final round aimed to balance occupational representation by specifically targeting professionals in Education & Training—a group underrepresented in the Prolific dataset relative to the pilot. Filters mirrored those of Round 4, ensuring consistent standards. Duplicate participants from prior rounds were excluded to prevent overlap and contamination.
The diversity of the final sample (N = 453) is summarized in Table 1.
While the resulting dataset is non-representative in a statistical sense, it reflects a diversified convenience sample with improved balance across gender, age, education, sector, and geographic region. Such a design is well-suited for the study’s primary aim: to build and internally validate binary logistic regression models that predict self-reported cybersecurity incidents. Given the behavioral focus and the predictive (rather than inferential) goals of the analysis, the emphasis was placed on sampling diversity, methodological transparency, and model performance metrics (e.g., AUC, calibration, and cross-validation), in line with best practices for exploratory modeling in applied behavioral research. The dataset’s heterogeneity also enables comparative analysis across subgroups, making it possible to evaluate whether predictive models built within specific demographic or occupational layers perform better or worse than those trained on the full sample—thereby shedding light on the contextual robustness and generalizability of model-based risk profiling.

3.2. Survey Design and Item Construction

The research was initially aimed at demonstrating that WLB is a strong and distinctive theoretical domain for predicting cybersecurity incident risk. However, exploratory analyses of responses from a preliminary survey indicated that WLB alone lacked sufficient predictive strength to account for variance in incident occurrence. Instead, other theoretical domains—such as specific cybersecurity behaviors, cognitive rationalizations, and personality traits—emerged as stronger predictors of incidents.
This empirical redirection aligns with a growing body of research emphasizing individual-level behavioral and dispositional risk factors over environmental constructs. For example, [14] demonstrated that indicators such as attention and diligence, resilience, and a trusting mindset better predicted susceptibility to phishing attempts than contextual dimensions like WLB. Similarly, prior studies have shown that traits such as impulsiveness, low conscientiousness, and high general trust are reliably associated with poor cybersecurity practices [12,48,52]. The use of cognitive rationalizations—such as justifying unsafe behavior due to time pressure or convenience—also reflects known mechanisms of moral disengagement and policy violation [25,27]. These findings suggest that while WLB may function as a situational amplifier, it is the behavioral choices, cognitive framing, and personality-linked dispositions of users that most directly shape the likelihood of cybersecurity incidents.
Building on these, the survey was designed to capture both individual-level predictors of cybersecurity incidents and the incidents themselves. Predictors were organized into five theoretical domains—Work–Life Blurring (WLB), Risk Rationalization (RR), Cybersecurity Behavior (CB), Digital Literacy (DL), and Personality Traits (P)—each represented by multiple item-level questions. The outcome variables included six self-reported incident items (INC1–INC6), along with composite indicators reflecting the occurrence of at least one, two, or all three mild or serious incidents. A complete list of all items, grouped by domain and item ID, is provided in Appendix A Table A1.

3.2.1. Item Development and Theoretical Alignment

Items were designed to reflect distinct behavioral, cognitive, and dispositional factors associated with digital risk. Most were phrased as first-person statements grounded in everyday digital experiences, increasing interpretability and user relevance. Each domain included at least one reverse-coded item to help reduce acquiescence bias and support internal consistency evaluation.
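Reverse-coded items must be recoded before reliability analysis and modeling so that all items within a domain point in the same direction. A minimal sketch, assuming a 5-point Likert format (the actual response scale may differ):

```python
def reverse_code(series, scale_min=1, scale_max=5):
    """Recode a reverse-coded Likert item (e.g., CB2_R) so that higher
    values align directionally with the rest of its domain."""
    return (scale_max + scale_min) - series

# Example: a response of 5 on a reverse-coded item becomes 1 after recoding.
assert reverse_code(5) == 1 and reverse_code(1) == 5
```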
WLB items reflect the degree of digital boundary overlap between personal and professional life (e.g., WLB1, “I use my work and personal accounts interchangeably throughout the day”).
RR items assess rationalizations for risky digital behavior under pressure or social influence (e.g., RR5, “Time pressure makes me more likely to overlook security procedures”).
CB items include both protective behaviors (e.g., CB1, “I use two-factor authentication”) and known vulnerabilities (e.g., CB2_R, “I use the same password on multiple sites”).
DL items capture digital skills and self-efficacy (e.g., DL2, “I am confident in spotting suspicious links in emails”).
P items reflect behaviorally grounded indicators of the Big Five personality traits: Openness to Experience (P1, P2), Conscientiousness (P3, P4), Extraversion (P5, P6), Agreeableness (P7, P8), and Neuroticism (P9, P10).
Theoretical domains were derived from prior empirical studies and cybersecurity behavior models, and items were crafted to span both attitudinal and behavioral aspects of digital risk. Items were originally designed to represent distinct theoretical domains; however, item-level predictors were ultimately used in modeling due to the multidimensional nature of several domains, as confirmed by factor analytic results discussed later in the paper.

3.2.2. Demographic and Work-Style Variables

The survey also included demographic and contextual items: age group (D1), gender (D2), education level (D3), field of work (D4), and country of residence (D5), along with job type (JR1), remote work possibility (JR2), and use of personal devices for work (JR3). These variables were not used to define analytic layers or stratify the data but were retained for exploratory and post hoc inclusion in the best-performing models.

3.2.3. Incident Outcome Variables

The six incident items (INC1–INC6) captured self-reported experiences with cybersecurity problems, ranging from general digital disruptions to serious events such as financial loss or device failure. Each of these six items was treated as a separate binary response variable (Yes vs. No), allowing for incident-specific modeling. In addition, the items were grouped into two severity-based composite categories:
  • Mild incidents: INC1–INC3 (e.g., account lockout, suspicious login alerts, password resets)
  • Serious incidents: INC4–INC6 (e.g., financial loss, impersonation, ransomware/device failure)
For each severity group, three binary composite variables were created to capture cumulative incident exposure:
  • AtLeastOneMild: Respondent reported at least one mild incident (INC1, INC2, or INC3),
  • AtLeastTwoMild: Respondent reported two or more mild incidents,
  • AllThreeMild: Respondent reported all three mild incidents,
  • AtLeastOneSerious: Respondent reported at least one serious incident (INC4, INC5, or INC6),
  • AtLeastTwoSerious: Respondent reported two or more serious incidents,
  • AllThreeSerious: Respondent reported all three serious incidents.
Altogether, these 12 binary outcomes served as dependent variables in the logistic regression models. This structure enabled the analysis of both specific and aggregated cybersecurity risk experiences across varying levels of severity.
Responses of “Not sure/Don’t know” were combined with the “No” category to preserve sample size, and because lack of certainty does not constitute a confirmed event. This operationalization ensures consistency across outcomes but may lead to conservative estimates of incident prevalence if some uncertain cases reflect unrecognized events.
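A minimal sketch of this outcome construction follows, with toy responses standing in for the real data; as stated above, only an explicit “Yes” counts as a confirmed incident.

```python
import pandas as pd

mild, serious = ["INC1", "INC2", "INC3"], ["INC4", "INC5", "INC6"]

# Toy responses; "Not sure/Don't know" is folded into "No" by the .eq("Yes") test.
raw = pd.DataFrame({c: ["Yes", "No", "Not sure"] for c in mild + serious})
inc = raw.eq("Yes").astype(int)

n_mild, n_serious = inc[mild].sum(axis=1), inc[serious].sum(axis=1)
outcomes = pd.DataFrame({
    "AtLeastOneMild":    (n_mild >= 1).astype(int),
    "AtLeastTwoMild":    (n_mild >= 2).astype(int),
    "AllThreeMild":      (n_mild == 3).astype(int),
    "AtLeastOneSerious": (n_serious >= 1).astype(int),
    "AtLeastTwoSerious": (n_serious >= 2).astype(int),
    "AllThreeSerious":   (n_serious == 3).astype(int),
})
```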

3.3. Predictive Modeling Strategy

To ensure both predictive utility and practical relevance, we adopted a layered modeling strategy only for subgroups with real-world operational significance—specifically, participants working in Education, IT/Technology, and those from countries with sufficient sample size (Hungary, UK, and USA). These layers correspond to contexts where organizational or institutional interventions are actionable. In contrast, individual characteristics such as gender, age, and remote work status were not used to split the data or train separate models. Instead, these variables were retained for a later analytic phase, where their inclusion or interaction effects were tested within some illustrative models to explore explanatory enhancement without fragmenting the modeling pipeline.
As described in Section 1 and Section 2.3, item-level logistic regression was selected due to insufficient unidimensionality in several domains and the superior predictive performance of item-based models compared to domain-level factor scores. Although each model was estimated using stepwise logistic regression with item-level predictors, we intentionally refrained from reporting only statistically significant predictors in the model goodness tables. In predictive modeling, significance is not the primary criterion; rather, model performance metrics such as 10-fold AUC and 10-fold deviance R2 take precedence. Due to multicollinearity and redundancy within theoretical domains (e.g., multiple items capturing the same behavioral construct), stepwise procedures may exclude some variables even if they carry predictive value in alternate combinations. Listing only the retained variables could misleadingly suggest their unique causal importance, which is not the objective in prediction-focused modeling. Instead, we summarize model composition at the level of contributing theoretical domains, highlighting whether predictors from areas such as risk rationalization or cybersecurity behavior were selected, without overinterpreting individual item inclusion.
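For transparency, the sketch below shows one plausible reading of the stepwise procedure (forward selection by AIC using statsmodels); the exact direction and selection criterion used in the study are not specified above, so treat this as an assumption rather than the study's verified code. Function and variable names are ours.

```python
import statsmodels.api as sm

def forward_stepwise_logit(df, outcome, candidates):
    """Forward selection by AIC; one plausible variant of the stepwise
    logistic regression described in the text (an assumption, not the
    study's verified procedure)."""
    selected, current_aic, improved = [], float("inf"), True
    remaining = list(candidates)
    while improved and remaining:
        improved, aics = False, {}
        for item in remaining:
            X = sm.add_constant(df[selected + [item]])
            try:
                aics[item] = sm.Logit(df[outcome], X).fit(disp=0).aic
            except Exception:        # separation/singularity in small layers
                continue
        if aics:
            best = min(aics, key=aics.get)
            if aics[best] < current_aic:
                selected.append(best)
                remaining.remove(best)
                current_aic = aics[best]
                improved = True
    return selected

# Example: forward_stepwise_logit(df, "AtLeastOneMild", item_cols)
```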
Additionally, we limited the inclusion of demographic variables—such as gender, age, remote work, and device sharing—to post hoc analyses in the all-data sample or in large-sample layers (e.g., UK, USA, Hungary). This decision reflects both methodological and practical constraints: including such categorical variables in smaller subgroups can lead to overfitting, unstable estimates, or complete separation in logistic regression. By restricting their inclusion to robust models and treating them as explanatory enrichments rather than primary segmentation tools, we aimed to balance model complexity, generalizability, and fairness.
To ensure interpretability and reliability, we applied a strict model categorization protocol based on five criteria: minimum event count, 10-fold AUC, 10-fold deviance R2, AUC drop, and R2 drop ratio.
The minimum event count rule ensures statistical stability by requiring at least 10 cases and 10 non-cases per model; models below this threshold are automatically flagged as weak due to estimation instability.
The 10-fold AUC evaluates how well a model distinguishes between cases and non-cases across repeated cross-validation folds, providing a robust measure of discriminatory performance. The 10-fold deviance R2 captures how much variance the model explains in the outcome variable under cross-validation, serving as a goodness-of-fit indicator. The AUC drop, calculated as the difference between the in-sample and 10-fold AUC, reflects potential overfitting; larger drops suggest that model performance does not generalize well to unseen data. Similarly, the R2 drop ratio, which compares in-sample and cross-validated deviance R2, quantifies overfitting in model fit. We are aware that other calibration metrics such as the Brier score [62] could be applied. In this study, however, we focused on discrimination (AUC) and explanatory power (deviance R2), as these provide robust and comparable indicators across stratified logistic regression models with varying prevalence rates.
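For reference, the conventional definitions we assume for these quantities (the formulas are standard, though not spelled out above) are:

```latex
% Assumed conventional definitions (not stated explicitly in the text):
R^{2}_{\mathrm{dev}} = 1 - \frac{D_{\mathrm{model}}}{D_{\mathrm{null}}}, \qquad
\mathrm{AUC\ drop} = \mathrm{AUC}_{\mathrm{in}} - \mathrm{AUC}_{\mathrm{10fold}}, \qquad
R^{2}\ \mathrm{drop\ ratio} = \frac{R^{2}_{\mathrm{in}} - R^{2}_{\mathrm{10fold}}}{R^{2}_{\mathrm{in}}}
```

where D_model and D_null are the residual and null (intercept-only) deviances; the ratio form of the R2 drop is one natural reading of the comparison described above.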
Models were classified as Good, Moderate, or Weak based on threshold combinations (see Table 2). In addition to absolute performance, the classification protocol incorporated overfitting diagnostics: models with extreme discrepancies between in-sample and cross-validated performance (e.g., large AUC or R2 drops) or insufficient class balance (fewer than 10 events or non-events) were automatically downgraded to “Weak”, regardless of their nominal AUC or deviance R2 values.
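A compact sketch of this categorization protocol follows; the numeric thresholds are placeholders, since the actual cut-offs are defined in Table 2 and are not reproduced here.

```python
def classify_model(n_events, n_nonevents, auc_cv, r2_cv,
                   auc_drop, r2_drop_ratio,
                   good=(0.75, 0.10), moderate=(0.70, 0.05),
                   max_auc_drop=0.10, max_r2_drop_ratio=0.50):
    """Good / Moderate / Weak classification. All thresholds here are
    illustrative placeholders for the study's Table 2 cut-offs."""
    if min(n_events, n_nonevents) < 10:          # minimum event count rule
        return "Weak"
    if auc_drop > max_auc_drop or r2_drop_ratio > max_r2_drop_ratio:
        return "Weak"                            # overfitting downgrade
    if auc_cv >= good[0] and r2_cv >= good[1]:
        return "Good"
    if auc_cv >= moderate[0] and r2_cv >= moderate[1]:
        return "Moderate"
    return "Weak"

# Example with generic numbers: classify_model(50, 400, 0.76, 0.08, 0.03, 0.2)
# -> "Moderate" under these placeholder thresholds.
```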
Once the best-performing models were identified, we proceeded to determine the optimal classification probability thresholds for some of the models as illustrative examples. While 10-fold AUC is effective for comparing models based on their ability to discriminate between outcomes across all possible thresholds, it does not specify which threshold yields the most useful binary classification for real-world decision-making. In practical applications such as cybersecurity risk detection, the threshold determines how sensitive or specific a model will be—affecting the trade-off between false positives and false negatives. For this reason, we optimized thresholds only after selecting the final models, using criteria tailored to our analytic goals (e.g., balancing sensitivity and specificity).
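As one concrete criterion, Youden's J statistic (sensitivity + specificity - 1) selects the threshold that balances the two error types; this is an illustrative choice, since the criteria in this study were tailored per model.

```python
import numpy as np
from sklearn.metrics import roc_curve

def optimal_threshold(y_true, y_scores):
    """Probability cut-off maximizing Youden's J (sensitivity + specificity - 1).
    One common balancing criterion; the study tailors criteria per model."""
    fpr, tpr, thresholds = roc_curve(y_true, y_scores)
    return thresholds[np.argmax(tpr - fpr)]

# With a fitted model (hypothetical names):
# t = optimal_threshold(y, model.predict_proba(X)[:, 1])
# flagged = model.predict_proba(X_new)[:, 1] >= t
```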

4. Results

This section presents the empirical findings of the study in four parts. First, we assess the internal consistency and dimensionality of items assigned to five theoretical domains using Cronbach’s alpha and exploratory factor analysis (EFA). These psychometric analyses inform our decision to model predictors at the item level rather than by aggregated domain scores. Second, we report the performance of logistic regression models estimated on the full dataset across twelve cybersecurity incident outcomes. Third, we evaluate layered models stratified by sector (Education, IT) and country (Hungary, UK, USA), comparing their performance against the corresponding full-sample models to assess potential gains in fit and generalizability. Finally, we conduct a post hoc evaluation by (1) incorporating selected individual-level demographic and contextual variables (e.g., gender, age, remote work status) into a subset of models chosen for their illustrative value, and (2) analyzing optimal probability thresholds for binary classification in selected models to explore the practical implications of deployment and the potential risk of overfitting, especially in rare-event outcomes.

4.1. Internal Consistency and Dimensionality of Domain Items

Before fitting predictive models, we assessed the reliability and latent structure of survey items grouped into five theoretically defined domains: Work–Life Blurring (WLB), Risk Rationalization (RR), Digital Literacy (DL), Cybersecurity Behavior (CB), and Personality (P). Internal consistency was evaluated using Cronbach’s alpha, and latent dimensionality was assessed through EFA with Varimax rotation. The number of factors retained for each domain was determined using the Kaiser criterion, whereby only factors with eigenvalues greater than 1 were extracted. Table 3 summarizes the reliability coefficients, number of extracted factors, and brief remarks for each domain.
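Both psychometric steps can be reproduced with a few lines of code. The sketch below computes Cronbach's alpha and the Kaiser retention criterion directly; for the Varimax-rotated loadings themselves, a dedicated package (e.g., factor_analyzer) would be used. Column names are placeholders for the actual item sets.

```python
import numpy as np
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha for a block of items (one column per item,
    reverse-coded items already recoded)."""
    k = items.shape[1]
    item_var_sum = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

def kaiser_n_factors(items: pd.DataFrame) -> int:
    """Number of factors to retain: eigenvalues of the item correlation
    matrix greater than 1 (Kaiser criterion)."""
    eigvals = np.linalg.eigvalsh(items.corr().to_numpy())
    return int((eigvals > 1).sum())

# Example (placeholder column names for the ten WLB items):
# wlb = df[[f"WLB{i}" for i in range(1, 11)]]
# print(cronbach_alpha(wlb), kaiser_n_factors(wlb))
```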
Although internal consistency was acceptable for WLB, RR, and DL (α > 0.70), the Cybersecurity Behavior and Personality domains showed lower alpha values, reflecting the conceptual and behavioral diversity of their items. Below, we interpret the dimensional structure found in four of the five domains.
  • Work–Life Blurring (WLB): Two Latent Dimensions
The WLB domain demonstrated high internal consistency (α = 0.8292), indicating strong overall cohesion. However, EFA revealed a two-factor structure:
  • Factor 1 reflected mental and temporal interference, including work encroaching on personal time or space, such as WLB3 (work tasks interrupt personal time), WLB6_R (difficulty maintaining separation), WLB8 (checking work emails during personal time), WLB10 (mental detachment difficulties), and WLB4 (blended digital identity).
  • Factor 2 captured platform and device overlap, including the use of shared tools and accounts across work and personal domains, as in WLB2 (checking personal accounts on work devices), WLB5 (using work platforms for personal tasks), WLB7 (shared platforms), WLB9 (shared physical space), and WLB1 (interchangeable accounts).
These findings suggest that work–life blurring spans both cognitive spillover and practical integration, which may independently contribute to digital risk.
  • Risk Rationalization (RR): Conceptual Refinement Improves Structure
With a Cronbach’s alpha of 0.7224, the RR domain initially showed acceptable reliability. However, EFA identified one item, RR7_R (“I believe it is my responsibility to recognize serious threats before relying on IT”), as loading strongly on a separate factor. This item reflects normative commitment rather than rationalization and runs counter to the self-justifying tone of the other items.
Once RR7_R was excluded, the remaining items clustered cleanly onto a single component with a Cronbach’s alpha of 0.7589, supporting their use as a unidimensional measure of pragmatic justifications for neglecting security practices.
Following its exclusion based on conceptual divergence and factor loadings, RR7_R was removed from all subsequent analyses, including item-level logistic regressions. Therefore, only the remaining RR items contributed to the predictive models presented in Table 4, Table 5, Table 6, Table 7, Table 8 and Table 9.
  • Cybersecurity Behavior (CB): Behavioral Diversity Drives Factor Split
The CB domain showed weaker internal consistency (α = 0.6300), and EFA identified two distinct components:
  • Factor 1 captured credential-related risks, including password reuse (CB2), skipping software security checks (CB3), and underestimating personal responsibility for account protection (CB6).
  • Factor 2 reflected proactive and protective behaviors, such as creating backups (CB4), using two-factor authentication (CB1), storing passwords in a password manager (CB5), and regularly checking for threats (CB7).
This factor structure illustrates the dual nature of cybersecurity behavior: users may engage in protective actions while simultaneously neglecting basic credential hygiene, or vice versa. However, even when these two factors were treated separately, internal consistency remained low. Cronbach’s alpha values for both subgroups failed to exceed conventional reliability thresholds, with values for Factor 1 items (e.g., CB2–CB6) ranging between 0.51 and 0.61, and even lower alpha estimates for Factor 2 items, particularly for CB5 (α = 0.398 when omitted), indicating weak item-total correlations.
These results suggest that cybersecurity behaviors, although conceptually grouped, may reflect multiple loosely connected habits rather than a single coherent behavioral trait.
  • Personality (P): Multi-Trait Structure Confirmed
As expected, the ten personality items fragmented into five components, aligning closely with the Big Five trait model. The internal consistency was relatively low (α = 0.6117), which is not surprising given the heterogeneity of the traits:
  • Openness (P1, P2): Interest in technology and innovation.
  • Conscientiousness (P3, P4): Attention to detail, organization.
  • Extraversion (P5, P6): Social engagement.
  • Agreeableness (P7, P8): Conflict avoidance, empathy.
  • Neuroticism (P9, P10): Anxiety and control concerns.
This confirms that these items reflect distinct personality traits rather than a unified latent construct.
The relatively low Cronbach’s alpha values observed for the Cybersecurity Behavior (CB) and Personality (P) domains reflect genuine multidimensionality rather than measurement error. CB items capture both protective and risky practices that do not form a single latent construct but together characterize the breadth of digital habits influencing incident risk. Likewise, the P items were drawn from the Big Five framework, where each trait is theoretically distinct and internal consistency across all ten items is not expected. While this limits the use of domain-level factor scores, it also supports the rationale for retaining these items individually: item-level modeling preserves unique predictive variance that would otherwise be averaged out. Future work could refine these domains by constructing subscales (e.g., credential hygiene vs. proactive defense in CB), by adopting trait-level scores for personality, or by applying exploratory clustering methods to identify empirically coherent groupings.
Based on this, we chose to retain individual items as predictors in the logistic regression models, using domain membership only as a labeling tool for interpretability. This approach respects both the empirical structure of the data and the theoretical diversity of digital risk-related behaviors.

4.2. Model Results—All Data

Table 4 presents the performance of logistic regression models estimated on the full dataset for twelve binary outcome variables. These include six incident-specific outcomes (INC1–INC6) and six composite indicators capturing cumulative exposure to mild or serious incidents.
Model performance was assessed using four key metrics: AUC and deviance R2, both in-sample and under 10-fold cross-validation. Model strength was categorized as Good, Moderate, or Weak based on the criteria outlined in Section 3.3.
Key results include the following: The AllThreeSerious model achieved the strongest performance, with high discrimination (10-fold AUC = 0.829) and cross-validated fit (10-fold Deviance R2 = 15.75%), despite the low number of positive cases. The AtLeastOneMild model also performed well (10-fold AUC = 0.759; 10-fold Deviance R2 = 7.77%), demonstrating solid generalizability across a well-populated outcome. AtLeastTwoMild and INC2 reached the moderate performance range, while all other models were categorized as Weak, either due to low discrimination, limited variance explained, or unstable event distributions.
Across the twelve full-sample models, Cybersecurity Behavior (CB) appeared most consistently, included in 11 out of 12 models. Risk Rationalization (RR) and Personality (P) were each retained in 10 models, while Digital Literacy (DL) and Work–Life Blurring (WLB) were present in 7 models each. Stronger-performing models (e.g., AtLeastOneMild, AtLeastTwoMild, AllThreeSerious) tended to incorporate predictors from multiple domains, suggesting the added value of diverse theoretical inputs.
These results establish a full-sample baseline and will serve as a comparison point for the stratified models presented in the next section.
To address the potential distortions introduced by the Hungarian subsample, we also re-estimated the full-sample models after excluding Hungarian respondents. This complementary analysis was worthwhile for two reasons: first, it allowed us to assess whether the relatively homogeneous Hungarian subgroup (dominated by academics and students) masked predictive signals in the broader dataset; second, it made explicit the trade-off between sample power and subgroup homogeneity examined below.
If we compare full-sample models with and without Hungarian respondents, the contrast is informative: while the “All Data” results produced only one Good model (AllThreeSerious), excluding Hungary revealed stronger and more generalizable patterns, with AtLeastOneMild and AtLeastTwoMild both reaching Good and AtLeastTwoSerious improving to Moderate. These shifts suggest that the Hungarian subgroup’s demographic skew toward students and academics dampened predictive variance in the combined dataset, muting the signal of otherwise strong models. At the same time, the drop of AllThreeSerious to Weak highlights the trade-off of reduced sample size for rare outcomes. Taken together, these comparisons confirm that including Hungary preserved sample power but at the cost of weaker full-sample model performance, justifying the complementary presentation of both sets of results.

4.3. Model Results—Layered Models

This section examines whether stratifying the sample by organizationally meaningful subgroups improves model performance. Logistic regression models were re-estimated within five analytically motivated layers: Education, IT and Technology, and three country-based groups (Hungary, UK, and USA). The structure of the outcome variables remained unchanged, allowing direct comparison of model fit metrics with those from the full-sample models.
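Operationally, each layer is simply a filtered subsample refit with the same pipeline. A minimal sketch, assuming a hypothetical DataFrame df, a predictors column list, and the cv_auc_and_deviance_r2 helper sketched in Section 4.2 (the country label strings are assumptions, since D5 was collected as open text):

layers = {
    "Education": df["D4"] == "Education",
    "IT":        df["D4"] == "IT/Technology",
    "Hungary":   df["D5"] == "Hungary",
    "UK":        df["D5"] == "United Kingdom",
    "USA":       df["D5"] == "United States",
}
for name, mask in layers.items():
    sub = df[mask]
    auc, dev_r2 = cv_auc_and_deviance_r2(sub[predictors].to_numpy(),
                                         sub["AtLeastTwoMild"].to_numpy())
    print(f"{name}: 10-fold AUC = {auc:.3f}, 10-fold deviance R2 = {dev_r2:.2%}")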

4.3.1. Education Layer

Table 5 summarizes model performance for participants working in the Education sector (N = 104). Overall, stratification yielded notable improvements in discrimination and generalization across several outcomes.
In the Education layer, several models demonstrated stronger predictive performance compared to their full-sample counterparts. INC1 (10-fold AUC = 0.747; 10-fold deviance R2 = 9.72%) and INC2 (10-fold AUC = 0.820; 10-fold deviance R2 = 19.43%) both reached the Good category, reflecting substantial gains in discrimination and explanatory power relative to the full data. INC6 also achieved Good performance (10-fold AUC = 0.767; 10-fold deviance R2 = 14.12%), indicating stable generalizability within this sector. A few additional models achieved moderate results, such as AtLeastTwoMild (10-fold AUC = 0.774; 10-fold deviance R2 = 10.97%) and AtLeastOneSerious (10-fold AUC = 0.771; 10-fold deviance R2 = 10.95%), suggesting sector-specific improvements for moderately prevalent outcomes.
However, other models remained weak due to limited variance explained or very low event counts. For example, AtLeastOneMild, despite excellent in-sample fit (AUC = 0.987), dropped to Weak under cross-validation (10-fold deviance R2 = 3.81%), highlighting overfitting risk. Similarly, rare outcomes such as AtLeastTwoSerious and AllThreeSerious could not be modeled reliably.
Overall, the Education layer results suggest that sector-specific modeling yields clear performance benefits—particularly for incident-specific models (e.g., INC1, INC2, INC6)—while also underscoring the persistent challenges of modeling rare incidents.

4.3.2. IT and Technology Layer

Table 6 summarizes model performance for participants working in the IT or Technology sector (N = 133).
In the IT layer, several models demonstrated strong predictive performance. AtLeastTwoMild (10-fold AUC = 0.800; 10-fold deviance R2 = 15.83%) and AllThreeMild (10-fold AUC = 0.748; 10-fold deviance R2 = 12.70%) achieved the highest combined discrimination and explanatory power. INC2 (10-fold AUC = 0.781; deviance R2 = 13.11%), INC5 (10-fold AUC = 0.760; deviance R2 = 12.65%), and AtLeastTwoSerious (10-fold AUC = 0.776; deviance R2 = 10.49%) also reached the Good category, highlighting robust sector-specific generalizability. A number of additional models fell into the moderate range, including INC3, INC4, and INC6, reflecting solid but less consistent performance.
By contrast, some models suffered from instability due to sparse positive cases or potential overfitting. For example, AtLeastOneMild and AllThreeSerious achieved very high in-sample AUCs (0.885 and 0.964, respectively) but collapsed under cross-validation (10-fold deviance R2 = 0.0), underscoring the limitations of modeling rare outcomes. Overall, the IT models suggest that incident prediction can reach good generalizability in this sector, particularly for mild and moderately frequent outcomes, though data scarcity remains a critical constraint for rarer incident types.

4.3.3. Country-Specific Layers: Hungary, UK, and USA

To evaluate geographic variation in model performance, logistic regression models were re-estimated for respondents from three countries with sufficient sample sizes: Hungary, the United Kingdom (UK), and the United States (USA). Table 7, Table 8 and Table 9 summarize model results across the 12 outcomes for each country.
Across the three national layers, distinct patterns of predictive performance emerged. The UK models demonstrated some of the strongest results, with several outcomes achieving high discrimination and cross-validated explanatory power. For example, INC1 reached a 10-fold AUC of 0.815 and deviance R2 of 18.22%, while AllThreeMild achieved a 10-fold AUC of 0.832 and R2 of 21.64%. Although certain rare-event outcomes (e.g., AllThreeSerious) were unstable or unanalyzable, the UK layer overall produced multiple models in the Moderate range, reflecting relatively strong generalizability.
The USA models also performed well, particularly for compound outcomes. AtLeastTwoMild achieved one of the highest performances across all layers (10-fold AUC = 0.951; R2 = 19.53%), while AtLeastOneSerious reached the Moderate range (10-fold AUC = 0.740; R2 = 10.25%). Several incident-specific models (e.g., INC2, INC4, INC6) likewise showed stable Moderate performance. As with the UK layer, however, rare-event outcomes such as AllThreeSerious tended to collapse under cross-validation, highlighting data scarcity issues despite strong in-sample metrics.
In contrast, the Hungarian models clustered largely in the Moderate category, with INC1–INC3 and INC6 achieving balanced discrimination (10-fold AUCs ~0.70–0.79) and explanatory power (10-fold R2 ~ 5–7%). However, no Hungarian model reached the Good category, and several outcomes (e.g., AtLeastTwoMild, INC5, AtLeastTwoSerious) were downgraded to Weak due to cross-validation collapse. These results likely reflect the structural bias of the Hungarian subsample, which was disproportionately composed of academics, students, and acquaintances from the Phase 1 recruitment. This homogeneity reduced behavioral diversity and may have introduced social desirability bias, leading to compressed variance and weaker predictive generalizability.
As in other layers, rare-event outcomes were either unstable or non-analyzable (e.g., AllThreeSerious in Hungary and the UK), underscoring the challenges of modeling sparsely distributed incidents. Domain inclusion patterns also varied: Work–Life Blurring (WLB) appeared frequently in both the UK and USA layers, while Digital Literacy (DL) was more often retained in the UK and IT-related models. These shifts suggest that contextual factors tied to national or sectoral environments shape the predictive salience of different theoretical domains.

4.3.4. Specific Items of the Domains as Illustration

To provide additional transparency, we illustrate item-level predictors for the AtLeastTwoMild outcome, which represents a moderately prevalent and methodologically stable endpoint. Table 10 lists the survey items retained in this model and in its stratified variants across Education, IT, Hungary, UK, and USA layers.
It is important to note that these item-level findings are provided as an illustration. As described earlier, correlations among survey items and the use of stepwise selection mean that individual predictors cannot be generalized as stable or unique drivers of incidents. For this reason, the more reliable basis for interpretation remains the domain-level distinctions, which capture broader and more consistent patterns across models.
A comparison of these model items highlights both recurring patterns and contextual distinctions. Across layers, items reflecting work–life interference (e.g., “It is hard to mentally disconnect from work during my free time,” WLB10), rationalizations that diminish security vigilance (e.g., “Sometimes I ignore security threats if they interrupt my work,” RR2), and lapses in everyday security practices (e.g., “I sometimes skip software security checks,” CB3; “I could probably do more to protect my online accounts,” CB6) appeared frequently, suggesting that poor mental detachment, rationalization, and weak security hygiene form a common backbone of vulnerability. At the same time, distinct predictors emerged in different contexts. In the Education layer, digital literacy items (e.g., DL1, DL4, DL5) and openness (P1) were prominent, indicating that skills and curiosity shape educators’ exposure. The IT layer emphasized rationalization under workload pressure (RR6) and domain overlap (WLB5, WLB6, WLB8), consistent with technology-intensive roles. National models showed additional variation: for Hungary, self-reported technical ability (DL6) was decisive; in the UK, work–life blurring and digital literacy items (WLB8–WLB10, DL2–DL3) clustered strongly; while in the USA, rationalizations (RR6), stress with technology (DL7), and interpersonal tendencies (P7) stood out.

4.4. Post Hoc Evaluation: Individual-Level Predictors and Threshold Performance

4.4.1. Inclusion of Categorical Variables

To evaluate and illustrate whether demographic and contextual information improves prediction beyond behavioral and psychological indicators, we extended the IT-layer model for the AtLeastOneSerious outcome by incorporating age group (D1: young ≤ 35 vs. older > 35) and gender (D2: male vs. female). These variables were added to a model that already included item-level predictors from the five theoretical domains. Table 11 compares model performance and composition before and after the inclusion of categorical and interaction terms.
These demographic terms are presented solely for methodological demonstration, as their direct use in deployment could introduce fairness and ethical concerns.
The extended model demonstrated clear performance gains:
  • AUC increased from 0.810 to 0.884.
  • 10-fold AUC improved from 0.676 to 0.789.
  • 10-fold deviance R2 rose from 0.00% to 9.20%.
These improvements suggest that carefully chosen demographic variables, especially when used in interaction terms, can enhance generalizability and sensitivity without inflating overfitting risk. However, this should not be interpreted as a general recommendation for categorical variable expansion. In many other models tested across different outcomes and layers, the inclusion of demographic predictors led to clear signs of overfitting—such as large discrepancies between in-sample and cross-validated metrics—or even complete separation, a statistical issue where the outcome can be perfectly predicted by one or more variables, leading to infinite or unstable coefficient estimates. To avoid these pitfalls, we should limit such expansions to models with sufficiently large event counts and ensure that all additional predictors are supported by theoretical relevance and statistical prudence.
  • Domain Stability and Item Variation
Notably, while both models drew predictors from the same theoretical domains (e.g., WLB, RR, CB, DL, P), they selected different items within each domain. This variation is neither surprising nor problematic. In stepwise logistic regression—particularly in predictive rather than explanatory modeling—item selection is influenced by redundancy, collinearity, and local data interactions. Several items within the same domain may carry overlapping signals, and only one or two may be retained depending on the statistical context and co-variates included.
This is precisely why, throughout our model goodness summary tables, we reported only the contributing domains rather than listing specific item-level predictors. Doing so preserves interpretability and avoids overemphasizing what may be arbitrary or unstable item selections across similar models with equivalent predictive power.
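To make the selection mechanism concrete, the sketch below implements a generic forward stepwise routine driven by AIC; it illustrates the class of procedure described above rather than the exact algorithm used in this study:

import numpy as np
import pandas as pd
import statsmodels.api as sm

def forward_stepwise_aic(X: pd.DataFrame, y, candidates):
    # Greedily add the candidate item that most reduces AIC; stop when none improves.
    selected, best_aic = [], np.inf
    while True:
        best_item = None
        for item in (c for c in candidates if c not in selected):
            design = sm.add_constant(X[selected + [item]])
            aic = sm.Logit(y, design).fit(disp=0).aic
            if aic < best_aic:
                best_aic, best_item = aic, item
        if best_item is None:
            return selected
        selected.append(best_item)

Because correlated items carry overlapping signal, small perturbations of the data can change which of two near-redundant items wins a given pass, which is exactly the instability described above.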
  • Interaction Effects and Contextual Nuance
The extended model included interaction terms that revealed nuanced effects of behavioral predictors depending on age and gender:
  • WLB4 × Gender (D2): WLB4 (“My personal and professional digital lives are intertwined”) was a strong positive predictor overall. However, the negative interaction for men (−1.69, p = 0.034) indicates that this risk factor is more predictive for women and may be weaker or non-significant among male respondents.
  • WLB9 × Age (D1): WLB9 (“My work and personal activities often take place in the same physical space”) became a significant risk factor only among younger individuals (interaction = +0.83, p = 0.050), possibly due to their more fluid work–life boundaries and domestic work setups.
  • CB6 × Age (D1): CB6 (“I could probably do more to protect my online accounts”) was not significant overall but became a negative predictor among older users (−0.90, p = 0.078), suggesting that self-perceived vulnerability predicts actual risk more clearly in this group.
  • P10 × Age (D1): P10 (“I’m worried about losing control of my data”) had a protective effect overall, but the interaction term (+0.77, p = 0.071) suggests that younger users may not translate this concern into protective action, diluting its effect.
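In formula notation, the extended specification can be expressed compactly. The sketch below uses the statsmodels formula interface (df_it denotes a hypothetical IT-layer subsample; the 0/1 codings of D1 and D2 are assumptions consistent with the contrasts reported above):

import statsmodels.formula.api as smf

# "a * b" expands to a + b + a:b (main effects plus interaction)
extended = smf.logit(
    "AtLeastOneSerious ~ WLB4 * D2 + WLB9 * D1 + CB6 * D1 + P10 * D1",
    data=df_it,
).fit()
print(extended.summary())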
These findings support the case for contextual enrichment of models through post hoc variable inclusion, particularly when supported by adequate sample size, balanced class distribution, and theoretical justification.

4.4.2. Optimal Probability Threshold Selection for Classification

In binary classification, selecting an optimal threshold—the probability cutoff for predicting a positive outcome—critically influences the trade-off between false positives and false negatives, two error types with markedly different implications in cybersecurity contexts. While Receiver Operating Characteristic (ROC) curves are constructed by varying this threshold, they provide a threshold-agnostic view of model behavior. Ultimately, however, operational deployment requires fixing a specific cutoff value to decide whether a user is labeled as “at risk.”
  • Defining a Positive Event in Cybersecurity Risk Prediction
In our study, a positive event is defined as the self-reported occurrence of a cybersecurity incident. Depending on the specific outcome variable (e.g., INC1, AtLeastOneSerious), this may refer to mild disruptions (e.g., account lockouts, unauthorized login notifications) or severe incidents (e.g., financial loss, device compromise). A predicted positive case thus represents a user that the model identifies as likely to have experienced such an incident.
  • Classification Metrics and Threshold Sensitivity
Model performance is evaluated by comparing predicted classifications to actual outcomes, based on a chosen threshold. These comparisons yield:
  • True Positives (TP): Cases correctly predicted as having experienced a cybersecurity incident.
  • True Negatives (TN): Cases correctly predicted as not having experienced an incident.
  • False Positives (FP): Cases incorrectly predicted as positive.
  • False Negatives (FN): Cases incorrectly predicted as negative.
From these quantities, we derive key metrics:
  • Precision (Positive Predictive Value): TP/(TP + FP)
  • Recall (Sensitivity or True Positive Rate): TP/(TP + FN)
  • Specificity (True Negative Rate): TN/(TN + FP)
  • F1 Score: 2 * (Precision * Recall)/(Precision + Recall)
  • Youden’s J Statistic: Sensitivity + Specificity − 1
  • Accuracy: (TP + TN)/(TP + FP + TN + FN)
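All of these metrics follow mechanically from the confusion matrix at a given cutoff, as the following sketch shows (y_true and p_hat are hypothetical arrays of observed labels and predicted probabilities):

import numpy as np

def threshold_metrics(y_true, p_hat, t):
    pred = (p_hat >= t).astype(int)
    tp = int(np.sum((pred == 1) & (y_true == 1)))
    tn = int(np.sum((pred == 0) & (y_true == 0)))
    fp = int(np.sum((pred == 1) & (y_true == 0)))
    fn = int(np.sum((pred == 0) & (y_true == 1)))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"precision": precision, "recall": recall, "specificity": specificity,
            "f1": f1, "youden_j": recall + specificity - 1,
            "accuracy": (tp + tn) / len(y_true)}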
  • ROC Curves and Threshold-Agnostic Evaluation
The ROC curve plots the model’s true positive rate (sensitivity) against the false positive rate (1 − specificity) as the classification threshold is varied from 0 to 1. This provides a complete picture of the trade-off between detection and false alarms across the entire range of possible thresholds.
Although the ROC curve is constructed by sweeping through thresholds, it provides a threshold-agnostic representation of model performance. The Area Under the Curve (AUC) condenses this information into a single number, representing the model’s average ability to discriminate between positive and negative cases regardless of the threshold. AUC values range from 0.5 (random guessing) to 1.0 (perfect discrimination).
  • Error Asymmetry in Cybersecurity Risk Prediction
In cybersecurity applications, false negatives—cases where real threats are missed—are often more damaging than false positives, which may merely trigger unnecessary warnings. A missed detection can lead to serious harm, such as financial fraud or system compromise, while an excessive warning may only result in minor inconvenience or user fatigue.
Consequently, threshold selection frequently emphasizes high recall, ensuring that at-risk individuals are detected, even if it results in more false positives. Yet overly sensitive thresholds can overwhelm response teams or erode user trust due to frequent false alarms.
  • Toward Informed Threshold Selection
Rather than relying on a single cutoff, we present multiple thresholding strategies tailored to different operational needs (a selection sketch follows the list):
- Maximum Youden’s J statistic: balances sensitivity and specificity.
- Maximum F1 score: best for rare-event detection, when precision and recall must be balanced.
- High recall: prioritizes detecting as many true incidents as possible, accepting additional false positives to avoid the most dangerous misses.
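These three strategies can be operationalized as a simple sweep over candidate cutoffs, reusing the threshold_metrics helper sketched above (the min_recall target of 0.95 is illustrative, chosen to match the high-recall example reported below):

import numpy as np

def pick_thresholds(y_true, p_hat, min_recall=0.95):
    grid = np.linspace(0.01, 0.99, 99)
    stats = [threshold_metrics(y_true, p_hat, t) for t in grid]
    t_youden = grid[int(np.argmax([s["youden_j"] for s in stats]))]
    t_f1 = grid[int(np.argmax([s["f1"] for s in stats]))]
    # Highest cutoff that still meets the recall target (fewest false positives)
    feasible = [t for t, s in zip(grid, stats) if s["recall"] >= min_recall]
    t_recall = max(feasible) if feasible else grid[0]
    return t_youden, t_f1, t_recall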
The selection of a classification threshold in predictive models is not merely a statistical exercise; it has direct operational implications in real-world cybersecurity. Choosing when to flag a user or event as “at risk” determines the scope and effectiveness of interventions, the burden on response teams, and the credibility of the alert system itself.
Using the Hungary AllThreeMild model as an example, we outline practical threshold selection strategies suited to different operational objectives. The ROC and Detection Error Trade-off (DET) curves in Figure 1 and Figure 2, together with the confusion matrices for the three optimization criteria in Table 12, enable a nuanced understanding of performance trade-offs and support context-aware decision-making.
Figure 1 shows the Receiver Operating Characteristic (ROC) curve for the Hungary AllThreeMild model. The ROC curve plots the true positive rate (recall) against the false positive rate across all possible classification thresholds, illustrating the trade-off between correctly detecting incidents and mistakenly flagging non-incidents. The area under the curve (AUC) provides a summary measure of discrimination: an AUC of 0.75 indicates that in 75% of randomly chosen pairs of one incident and one non-incident case, the model assigns a higher probability to the incident. The annotated thresholds highlight operational strategies, with both the Youden J and F1 optimization criteria converging at t = 0.47, while a high-recall threshold (t = 0.17) favors catching as many incidents as possible at the expense of more false positives. Figure 2 presents the Detection Error Trade-off (DET) curve, which shows the relationship between false negative and false positive rates, complementing the ROC curve by directly emphasizing error rates. Together, these figures illustrate not only the model’s overall discriminative power but also the consequences of threshold selection for practical deployment.
From Table 12, we observe:
  • The Youden’s J and F1 Score thresholds (both at 0.47) offer a balanced trade-off, catching ~75% of real incidents with moderate false alarms.
  • The High Recall threshold (0.17) catches 95.4% of true incidents but misclassifies many safe users as risky (68 false positives vs. only 9 true negatives).
In practice, the choice of a classification threshold should reflect a careful balance between detection effectiveness and operational feasibility. While statistical criteria such as Youden’s J and the F1 score offer valuable guidance, they must be interpreted in light of the specific priorities and constraints of the cybersecurity environment. For high-risk settings where the cost of missing a true incident is unacceptable, a more aggressive threshold favoring recall may be appropriate, even at the expense of increased false positives. Conversely, in resource-constrained scenarios where every alert carries a cost, a more conservative threshold may be warranted.
It is also important to recognize that threshold tuning—especially when based heavily on observed data—can introduce overfitting, making the model appear more effective in evaluation than it may be in deployment. Therefore, effective threshold selection is not a one-time decision but an adaptive process that benefits from cross-validation, independent testing, and continual alignment with real-world outcomes.

5. Discussion

This section interprets the empirical findings considering the study’s objectives and broader implications. First, we summarize the main patterns observed across the full-sample and stratified models. We then consider why layered modeling improved performance in specific subgroups and reflect on the practical value of different types of models for organizational or policy applications. Finally, we address key limitations—particularly those related to rare events and generalization in small samples—and outline priorities for future research, including opportunities for external validation and temporal modeling.

5.1. Summary of Findings

This study assessed the predictive performance of logistic regression models across twelve cybersecurity incident outcomes using both full-sample and stratified modeling approaches. Predictors were drawn from five theoretically grounded domains and evaluated using cross-validated AUC and deviance R2.
In the full-sample models (including Hungary), predictive strength was uneven. Only one model achieved strong performance:
AllThreeSerious, despite a very low number of positive cases (n = 13), yielded a high 10-fold AUC (0.829) and 10-fold deviance R2 (15.75%), though its reliability was limited by sparse data and the risk of overfitting.
Two further models—AtLeastOneMild (10-fold AUC = 0.759; 10-fold deviance R2 = 7.77%) and AtLeastTwoMild (10-fold AUC = 0.704; 10-fold deviance R2 = 8.20%)—were classified as moderate, showing some degree of generalizability across well-populated outcomes. Most other full-sample models, particularly incident-specific models such as INC1, INC3, and INC5, were weak, exhibiting limited cross-validated discrimination and low explanatory power.
When Hungarian participants were excluded (removing the overrepresented academic/student subsample), the pattern of results shifted, and two models reached strong performance:
  • AtLeastOneMild improved to a Good classification (10-fold AUC = 0.832; 10-fold deviance R2 = 17.98%), becoming the most robust model overall.
  • AtLeastTwoMild also achieved Good performance (10-fold AUC = 0.770; 10-fold deviance R2 = 14.75%).
In contrast, AllThreeSerious declined to Weak despite its high in-sample fit, reflecting instability due to sparse positive cases (n = 12). Several other models, such as INC4 (Moderate) and AtLeastTwoSerious (Moderate), showed improved stability and explanatory power compared to the full-sample results.
Taken together, the comparison highlights how the Hungarian subsample influenced aggregate performance: full-sample results preserved statistical power and behavioral variance, while the non-Hungarian analysis produced more stable evidence of strong predictive performance for the Mild incident outcomes.
Stratified analyses revealed that occupation-based layers (Education, IT) outperformed both national layers and the full-sample models, delivering the strongest and most stable predictive results.
Education: Several outcomes (e.g., INC1, INC2, AtLeastTwoMild) shifted upward from Weak or Moderate in the full-sample models to Good performance, reflecting the internal coherence of this occupational subgroup.
IT: This layer yielded some of the strongest results in the entire study, including Good models for AllThreeMild, AtLeastTwoMild, INC2, and INC5, with cross-validated AUC and deviance R2 values that exceeded those of the full-sample.
National layers: By contrast, the UK and USA produced a few strong outcomes but with less consistency, while the Hungary layer clustered mainly in the Moderate range, reflecting the skewed Phase 1 sample composition.
A consistent pattern across occupation layers was that moderately prevalent outcomes (e.g., AtLeastTwoMild, INC2) produced the most reliable and generalizable models. Rare-event outcomes (e.g., AllThreeSerious) remained fragile or non-analyzable even when stratified.
Domain-level patterns also highlight the value of stratification. While CB, RR, and P remained broadly important across all models, WLB became more predictive in Education and DL in IT, aligning with subgroup-specific challenges in boundary management and technical skills.
Taken together, these results demonstrate that occupation-based stratification provides clearer, stronger, and more interpretable models than both full-sample and national stratification, underscoring the importance of tailoring predictive frameworks to organizational context.
While demographic variables such as gender and age were included in the modeling process for methodological demonstration, their direct use in deployment would raise fairness and ethical concerns. Any application of such predictors should therefore be complemented by fairness-aware approaches to avoid unintended bias in risk scoring.

5.2. Interpretation and Practical Implications

5.2.1. Why Layered Models Outperform General Models

The consistent performance gains observed in Education, IT, UK, and USA layers likely result from greater behavioral and contextual homogeneity within these subgroups. When data are stratified by occupation or country, the resulting models are less affected by competing norms, structural inconsistencies, and unmeasured sources of variation. For example, digital practices, access policies, and boundary management differ substantially between an IT professional and a university teacher. Modeling them together may obscure domain-specific patterns, leading to lower generalizability. By contrast, layered models benefit from tighter signal-to-noise ratios and more consistent predictor-outcome relationships. Even with smaller sample sizes, these advantages can outweigh the risks of overfitting, especially when combined with rigorous cross-validation.
The implication is clear: when actionable insights are needed for specific organizational environments, layered models should be prioritized, provided that the subgroup presents a coherent behavioral or structural context (e.g., same profession or country), and has sufficient sample size and event counts to support stable estimation.
General models remain valuable in broad monitoring, exploratory screening, or policy design, where overfitting concerns are lower and the need for wide applicability is higher.

5.2.2. What Incident Outcomes Are Viable for Modeling

Another key finding is that not all incident outcomes are equally amenable to modeling—even within stratified layers. Models built on moderately prevalent outcomes (e.g., AtLeastTwoMild, INC2) consistently achieved better discrimination and generalization. These outcomes strike a balance: they are neither so common as to lack variation, nor so rare that they lead to sparse events.
In contrast, serious low-frequency events (e.g., AllThreeSerious) often produced unstable or overfitted models, despite showing superficially high AUC. Such models are vulnerable to complete separation and to inflated in-sample fit that collapses under cross-validation.
Based on these results, incident selection should consider:
- Event count thresholds: at least 10–20 positive cases per model, with a recommended 10:1 or greater case-to-predictor ratio for basic logistic regression.
- Outcome reliability: composites that combine multiple similar items (e.g., AtLeastTwoMild) generally perform better than single-item outcomes, due to increased signal strength.
- Actionability: outcomes used for modeling should have practical relevance for intervention or monitoring.

5.2.3. When to Use Contextual or Demographic Variables or to Optimize Classification Thresholds

Demographic and contextual variables (e.g., age, gender, education, remote work status) can improve model fit when added post hoc. However, their use should be carefully timed and justified.
We recommend incorporating contextual variables only when:
- They do not violate any ethical rule,
- Base models have already been trained and cross-validated,
- The layer has sufficient sample size and event count to prevent overfitting,
- The case-to-variable ratio exceeds 10:1, or stepwise regularization is used to manage redundancy.
Critically, these variables should not drive model segmentation unless a strong theoretical or practical justification exists. Instead, they should be treated as:
- Fairness audit variables (e.g., checking whether model performance differs by gender),
- Personalization enrichments (e.g., adapting alert thresholds based on age or device-sharing),
- Exploratory factors for hypothesis generation, not primary model drivers.

5.2.4. Threshold Selection and the Risk of Overfitting

While model discrimination (AUC) is important, effective deployment requires converting probabilistic predictions into actionable decisions—typically by applying a fixed classification threshold. This threshold determines which users are flagged as “at risk” and directly impacts operational outcomes such as false positives, false negatives, and resource allocation. As shown in Section 4.4.2, threshold choice is not trivial: small adjustments can drastically alter sensitivity and specificity, especially in imbalanced or low-frequency outcomes.
For example, a threshold optimized for high recall may flag nearly all true positives but overwhelm administrators with false alarms. Conversely, a conservative threshold may miss serious risks. These trade-offs are context-dependent: educational institutions might tolerate higher false positives for early intervention, while overburdened IT teams may require stricter thresholds to avoid alert fatigue.
To manage this, we recommend:
- Using cross-validated Youden’s J or F1 scores to identify balanced thresholds (see the sketch after this list),
- Visualizing ROC and DET curves to understand trade-offs under different operating points,
- Calibrating thresholds per outcome and layer, rather than applying a universal cutoff,
- Monitoring model performance over time to detect threshold drift as digital behaviors change.
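One concrete safeguard, anticipating the overfitting caveat below, is to select the cutoff within each training fold and evaluate it only on the corresponding held-out fold. A sketch reusing the pick_thresholds and threshold_metrics helpers from Section 4.4.2:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

def out_of_fold_threshold_check(X, y, n_splits=10, seed=1):
    # Tune the Youden-optimal cutoff on training folds; assess it out of fold.
    cuts, recalls = [], []
    cv = StratifiedKFold(n_splits, shuffle=True, random_state=seed)
    for tr, te in cv.split(X, y):
        fit = LogisticRegression(max_iter=1000).fit(X[tr], y[tr])
        t, _, _ = pick_thresholds(y[tr], fit.predict_proba(X[tr])[:, 1])
        m = threshold_metrics(y[te], fit.predict_proba(X[te])[:, 1], t)
        cuts.append(t)
        recalls.append(m["recall"])
    return float(np.mean(cuts)), float(np.mean(recalls))

A large gap between in-sample and out-of-fold recall at the chosen cutoff is a direct warning sign of threshold overfitting.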
It is also important to note that threshold tuning itself can introduce overfitting if overly tailored to the training data. Organizations should therefore validate selected thresholds on new data or through simulation before deploying high-stakes interventions.
In sum, threshold setting is a critical design decision that should be grounded not only in statistical optimization but also in practical constraints, resource availability, and ethical considerations. Predictive models without carefully tuned thresholds risk being either ineffective or damaging in real-world settings.

5.2.5. Model Deployment in Organizational Settings

First, organizations must determine which incident outcomes are appropriate for modeling. Models targeting moderately prevalent outcomes (e.g., AtLeastTwoMild, INC2) offer a good balance of statistical stability and practical relevance. Rare outcomes (e.g., AllThreeSerious) may produce inflated scores or unstable predictions and are best avoided for individual-level classification—particularly in small teams.
Second, the model scope should match the operational context. Stratified models tailored to a specific sector (e.g., Education or IT) offer better performance than generalized models. For instance, an educational institution may train its own model using Education-layer coefficients from this study as a template, refining them with internal data over time.
Third, scoring implementation must be transparent and proportionate. Employees identified as high risk should not face punitive action but rather be offered low-friction interventions—such as personalized training prompts, optional toolkits (e.g., password managers), or temporary monitoring. Risk scores may be expressed in simple tiers (e.g., low/medium/high) and updated periodically as part of onboarding, annual assessments, or organizational audits.
Organizations should adopt minimum safeguards for ethical deployment, such as:
- Clear documentation of scoring logic,
- Minimum event and sample size thresholds,
- Options for employee opt-in or feedback,
- Routine fairness checks across demographic groups.
Used thoughtfully, survey-based models can serve as an interpretable and actionable layer within broader cybersecurity strategies, especially when real-time behavioral data are not available or ethically permissible.
Those aiming for more dynamic, real-time detection can integrate survey-based scores into User and Entity Behavior Analytics (UEBA) systems. In such configurations, the survey-based risk score serves as a static baseline, helping to calibrate UEBA sensitivity for different users. For instance, an employee flagged as high risk through survey responses might trigger lower behavioral thresholds for login anomalies or privilege escalation. Over time, the system refines its model by combining static traits (e.g., digital literacy, personality) with behavioral telemetry (e.g., login times, device switches, location anomalies). This hybrid approach balances early-stage interpretability with adaptive learning and reflects a growing best practice in human-centric cybersecurity architecture.
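As a schematic illustration of this calibration idea (the tier labels and scaling factors below are hypothetical and not drawn from any specific UEBA product):

def ueba_alert_threshold(base_threshold: float, survey_risk_tier: str) -> float:
    # Higher static (survey-based) risk lowers the behavioral alert threshold,
    # so anomalies for that user trigger review earlier.
    factor = {"low": 1.00, "medium": 0.85, "high": 0.70}[survey_risk_tier]
    return base_threshold * factor

print(ueba_alert_threshold(0.90, "high"))  # approx. 0.63: high-risk users flagged sooner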
In addition to these organizational considerations, it is important to recognize the methodological and ethical boundaries of this approach. While survey items were carefully designed to minimize social desirability and recall bias through anonymity and neutral framing, such biases may still attenuate predictive validity and should be acknowledged in deployment contexts. Moreover, the specific item-level coefficients disclosed in Table 10 are intended as illustrative examples only, since correlations and stepwise selection limit their generalizability; domain-level patterns remain the more reliable interpretive basis. Future research should assess the longitudinal stability of predictors, as risk propensities may shift with evolving roles, technologies, and organizational practices. Until such evidence accumulates, survey-based models are best understood as a static but interpretable layer that complements, rather than replaces, adaptive monitoring systems.

5.3. Limitations and Future Work

This study offers new insights into the predictive modeling of cybersecurity incident experiences using self-reported behavioral, cognitive, and dispositional data. However, several limitations should be acknowledged, and these inform clear directions for future research.

5.3.1. Rare Event Limitations

A consistent limitation across multiple layers was the challenge of modeling rare serious outcomes, such as financial loss, identity theft, or ransomware attacks. These outcomes suffered from low event counts, particularly in subgroup analyses, leading to unstable estimates, overfitting risks, and in some cases, models that could not be analyzed at all. Although some rare-event models (e.g., the full-sample AllThreeSerious model with only 13 events, or its counterpart in the IT layer) showed high AUC, these metrics were often misleading due to near-complete separation or extreme class imbalance.
Despite the achieved favorable AUC values, cross-validation results showed clear signs of overfitting and instability, reflecting the scarcity of positive cases. Such results should therefore be interpreted as methodological illustrations rather than actionable tools. For practical deployment, we recommend relying only on models with sufficient event counts and robust cross-validated performance. Future research may explore alternative techniques—such as penalized regression, rare-event logistic corrections, or data augmentation methods—that are designed to address extreme class imbalance. Until such refinements are validated, rare-event models should be reported transparently but excluded from operational risk scoring or organizational deployment. Alternatively, aggregating related events into composite categories may improve model stability while retaining interpretability.
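Among the remedies noted above, penalized estimation is the most straightforward to prototype. The sketch below uses L2-regularized logistic regression with class reweighting in scikit-learn; the hyperparameter values are illustrative, and the approach is offered as a direction rather than a validated remedy:

from sklearn.linear_model import LogisticRegression

# Stronger shrinkage (smaller C) plus class weighting stabilizes coefficients
# when positive cases are scarce; Firth-type corrections are a further option.
rare_event_model = LogisticRegression(penalty="l2", C=0.1,
                                      class_weight="balanced", max_iter=1000)
rare_event_model.fit(X_train, y_train)  # X_train and y_train are hypothetical arrays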

5.3.2. Generalization Risks in Small Subgroups

While layered models showed substantial performance gains, they also carry risks of overfitting, especially when trained on small subgroups with limited variability. Some stratified models produced strong in-sample metrics but displayed large drops under cross-validation or were flagged for instability (e.g., event counts < 10). This highlights the need for cautious interpretation and replication in larger or external samples.
To improve generalizability, future studies should aim to maintain a 10:1 ratio of cases to predictors wherever feasible, apply regularization techniques or ensemble models to minimize overfitting, and report cross-validated metrics (as done here) rather than relying solely on in-sample fit.

5.3.3. Biases from Self-Report and Convenience Sampling

A key limitation of this study is its reliance on retrospective, self-reported incidents and behaviors. Such measures are prone to recall error, misclassification, and social desirability effects, all of which can reduce the accuracy of reported cybersecurity experiences. While large-scale surveys remain one of the most practical and ethically acceptable methods for capturing user-level incidents, they cannot fully substitute for behavioral or log-based evidence. Future research should therefore seek to triangulate survey-based predictions with telemetry or organizational log data when available, embed validation items to benchmark self-reports against observed behaviors, and adopt longitudinal designs that allow for prospective comparisons.
In addition, our measures of affective dispositions (e.g., the technology-related stress and worry items) relied on self-reported Likert-scale ratings rather than psychophysiological or behavioral validation.
The Hungarian case illustrates how convenience sampling can shape subgroup outcomes: even when sample size is adequate, compressed behavioral variance or biased responding may limit model performance. More broadly, such biases are difficult to eliminate entirely, reinforcing the importance of stratified analyses and transparent reporting of recruitment methods.

5.3.4. Future Research Directions

Several avenues are open to advance this line of work:
Handling of ambiguous responses: The coding of “Not sure/Don’t know” responses into the “No” category may have introduced a conservative bias, as some ambiguous cases might have represented unrecognized incidents. While this choice maximized statistical power, it may also have understated true event rates. Future studies could test the robustness of findings by treating “Not sure” as a separate analytic category.
Calibration assessment: Although our evaluation prioritized discrimination (AUC) and explanatory strength (cross-validated deviance R2), future work should also incorporate calibration-focused metrics such as the Brier score [62], following recent recommendations in predictive modeling research [63].
External validation: Models should be tested on new samples or in longitudinal follow-ups to assess real-world stability and transferability across contexts.
Temporal modeling: Longitudinal or event history approaches could capture sequences of risk behavior, rather than static snapshots, improving predictive precision. Survival analysis could extend our framework to model not only whether but also when cybersecurity incidents occur.
Causal inference: While this study focused on prediction, future designs (e.g., experiments, natural experiments, or instrumental variable approaches) could investigate causal mechanisms behind digital risk exposure.
Adaptive deployment: Dynamic models that incorporate ongoing behavioral data (e.g., from digital platforms or organizational logs) could support real-time risk profiling and context-aware interventions.

6. Conclusions

This study introduces a novel, interpretable framework for predicting cybersecurity incidents through item-level behavioral, cognitive, and dispositional indicators. By modeling twelve outcomes across stratified subgroups and a full international sample, we demonstrate that granular, domain-specific predictors—rather than aggregated scores or behavioral intentions—can effectively anticipate real-world incidents, especially those of moderate frequency such as suspicious logins and password resets.
The study fulfills two overarching objectives: it validates item-level modeling as a methodological contribution and applies this approach to determine which of the five domains most strongly predict cybersecurity incidents.
Our findings offer both methodological and applied value. The item-level modeling approach preserves behavioral specificity and enhances transparency, making the results readily actionable for cybersecurity training, user-level risk scoring, and early-warning systems. Stratified models outperform general ones in several layers, underscoring the importance of contextualized modeling in human-centric cybersecurity. We also show how threshold optimization and fairness-aware demographic inclusion can improve practical decision-making while maintaining interpretability.
Conceptually, the study integrates five theoretical domains—Work–Life Blurring, Risk Rationalization, Cybersecurity Behavior, Digital Literacy, and Personality—into a unified predictive architecture. No single domain dominates; rather, incident risk emerges from their interaction, revealing the multifaceted nature of digital vulnerability in blurred work–life environments.
This work reframes cybersecurity as a behavioral challenge—where everyday digital choices carry measurable risk. It offers a scalable and ethically grounded framework for organizations seeking to detect, predict, and mitigate human-driven cybersecurity threats. By bridging theory, analytics, and deployment relevance, the study contributes a durable foundation for the next generation of adaptive, user-aware cybersecurity systems.

Funding

No funding was received to assist with the preparation of this manuscript.

Institutional Review Board Statement

The Ethical Committee of the University of Dunaújváros, Hungary has granted approval for this study on 25 July 2025 (Ref. No. DUE-EC/2025/002).

Informed Consent Statement

Prior to participation, all individuals were informed about the study’s aims and the anonymous and voluntary nature of the survey through an initial statement in the questionnaire. This statement also made clear that by participating, they consent to the anonymized use of their data solely for statistical analysis. We have ensured that it is technically impossible to identify any participant from the data collected, maintaining the strictest levels of confidentiality and data protection.

Data Availability Statement

The data are available at: https://drive.google.com/file/d/1ie2TP1yuUQiiZq-fp55YMx3HGK8mnxXM/view?usp=sharing (accessed on 27 July 2025).

Conflicts of Interest

The author has no competing interests to declare that are relevant to the content of this article.

Appendix A

Table A1. Survey Items by Domain with Response Options.
Item ID | Question Text | Domain | Response Options
D1 | What is your age range? | Demographics | Under 18; 18–24; 25–34; 35–44; 45–54; 55–64; 65 or older
D2 | What is your gender? | Demographics | Male; Female; Prefer not to say
D3 | What is your highest level of education? | Demographics | High school; Some college; Bachelor’s degree; Master’s degree; Doctorate
D4 | Which field do you work in? | Demographics | IT/Technology; Education; Healthcare; Finance/Business; Other
D5 | In which country do you currently reside? | Demographics | Open text
JR1 | What is your job type? | Work Style | Company employee; Freelancer/Contractor; Academic/Research; Student; Other
JR2 | Is remote work an option for you? | Work Style | Yes; No
JR3 | Are you required or expected to perform any work-related tasks using your personal devices? | Work Style | Yes; No
WLB1 | I use my work and personal accounts interchangeably throughout the day. | Work–Life Blurring | 1–5 Likert scale: Not at all typical of me—Completely typical of me
WLB2 | I check personal accounts for social media or other apps using my work computer or work phone. | Work–Life Blurring | 1–5 Likert scale: Not at all typical of me—Completely typical of me
WLB3 | Work tasks often interrupt my personal time. | Work–Life Blurring | 1–5 Likert scale: Not at all typical of me—Completely typical of me
WLB4 | My personal and professional digital lives are intertwined. | Work–Life Blurring | 1–5 Likert scale: Not at all typical of me—Completely typical of me
WLB5 | I often use work-related platforms to manage personal tasks. | Work–Life Blurring | 1–5 Likert scale: Not at all typical of me—Completely typical of me
WLB6_R | I strictly separate my work and personal activities. (R) | Work–Life Blurring | 1–5 Likert scale: Not at all typical of me—Completely typical of me
WLB7 | I use the same online platforms for both work and personal purposes, such as Google, Microsoft Teams, or Zoom. | Work–Life Blurring | 1–5 Likert scale: Not at all typical of me—Completely typical of me
WLB8 | I check work emails while doing personal things. | Work–Life Blurring | 1–5 Likert scale: Not at all typical of me—Completely typical of me
WLB9 | My work and personal activities often take place in the same physical space. | Work–Life Blurring | 1–5 Likert scale: Not at all typical of me—Completely typical of me
WLB10 | It is hard to mentally disconnect from work during my free time. | Work–Life Blurring | 1–5 Likert scale: Not at all typical of me—Completely typical of me
RR1 | Sometimes sharing passwords with coworkers can save time. | Risk Rationalization | 1–5 Likert scale: Not at all typical of me—Completely typical of me
RR2 | Sometimes I ignore security threats if they interrupt my work. | Risk Rationalization | 1–5 Likert scale: Not at all typical of me—Completely typical of me
RR3 | Sometimes I feel that certain cybersecurity rules don’t really apply to me. | Risk Rationalization | 1–5 Likert scale: Not at all typical of me—Completely typical of me
RR4 | Sometimes I take security risks because others around me do the same. | Risk Rationalization | 1–5 Likert scale: Not at all typical of me—Completely typical of me
RR5 | Time pressure makes me more likely to overlook security procedures. | Risk Rationalization | 1–5 Likert scale: Not at all typical of me—Completely typical of me
RR6 | Following every security prompt sometimes feels like it slows down important work. | Risk Rationalization | 1–5 Likert scale: Not at all typical of me—Completely typical of me
RR7_R | I believe it is my responsibility to recognize serious threats before relying on IT. (R) | Risk Rationalization | 1–5 Likert scale: Not at all typical of me—Completely typical of me
RR8 | Most shortcuts I take online feel harmless and unlikely to cause real problems. | Risk Rationalization | 1–5 Likert scale: Not at all typical of me—Completely typical of me
CB1 | I use two-factor authentication when it’s available. | Cybersecurity Behavior | 1–5 Likert scale: Not at all typical of me—Completely typical of me
CB2_R | I use the same password on multiple sites. (R) | Cybersecurity Behavior | 1–5 Likert scale: Not at all typical of me—Completely typical of me
CB3_R | I sometimes skip software security checks. (R) | Cybersecurity Behavior | 1–5 Likert scale: Not at all typical of me—Completely typical of me
CB4 | I always create backups of my important files. | Cybersecurity Behavior | 1–5 Likert scale: Not at all typical of me—Completely typical of me
CB5 | I store passwords in a password manager. | Cybersecurity Behavior | 1–5 Likert scale: Not at all typical of me—Completely typical of me
CB6_R | I could probably do more to protect my online accounts. (R) | Cybersecurity Behavior | 1–5 Likert scale: Not at all typical of me—Completely typical of me
CB7 | I regularly check my accounts or devices for potential security issues. | Cybersecurity Behavior | 1–5 Likert scale: Not at all typical of me—Completely typical of me
DL1 | I am comfortable using various digital platforms. | Digital Literacy | 1–5 Likert scale: Not at all typical of me—Completely typical of me
DL2 | I am confident in spotting suspicious links in emails. | Digital Literacy | 1–5 Likert scale: Not at all typical of me—Completely typical of me
DL3 | I can recognize when a website or login page may be fake. | Digital Literacy | 1–5 Likert scale: Not at all typical of me—Completely typical of me
DL4 | I help others fix issues with digital tools. | Digital Literacy | 1–5 Likert scale: Not at all typical of me—Completely typical of me
DL5 | I adjust privacy settings on new apps easily. | Digital Literacy | 1–5 Likert scale: Not at all typical of me—Completely typical of me
DL6 | I can solve common tech problems on my own. | Digital Literacy | 1–5 Likert scale: Not at all typical of me—Completely typical of me
DL7_R | New tech stresses me out. (R) | Digital Literacy | 1–5 Likert scale: Not at all typical of me—Completely typical of me
P1 | I’m curious about how tech works. | Personality Traits | 1–5 Likert scale: Not at all typical of me—Completely typical of me
P2 | I enjoy trying new digital tools. | Personality Traits | 1–5 Likert scale: Not at all typical of me—Completely typical of me
P3 | I pay attention to details in tasks. | Personality Traits | 1–5 Likert scale: Not at all typical of me—Completely typical of me
P4 | I keep my digital life organized. | Personality Traits | 1–5 Likert scale: Not at all typical of me—Completely typical of me
P5 | I participate in online communities. | Personality Traits | 1–5 Likert scale: Not at all typical of me—Completely typical of me
P6 | I frequently post or share on social media. | Personality Traits | 1–5 Likert scale: Not at all typical of me—Completely typical of me
P7 | I avoid conflict in online discussions. | Personality Traits | 1–5 Likert scale: Not at all typical of me—Completely typical of me
P8 | I value others’ digital privacy like my own. | Personality Traits | 1–5 Likert scale: Not at all typical of me—Completely typical of me
P9 | It makes me anxious to go off-plan. | Personality Traits | 1–5 Likert scale: Not at all typical of me—Completely typical of me
P10 | I’m worried about losing control of my data. | Personality Traits | 1–5 Likert scale: Not at all typical of me—Completely typical of me
INC1 | Have you ever personally experienced a cybersecurity-related problem (e.g., virus, account breach, account lockout, unauthorized access)? | Incident Outcomes | Yes; No; Not sure/Don’t know
INC2 | Have you ever received a notification from a service (e.g., email provider, bank, company) about suspicious login activity? | Incident Outcomes | Yes; No; Not sure/Don’t know
INC3 | Have you ever had to reset your password due to a suspected security issue? | Incident Outcomes | Yes; No; Not sure/Don’t know
INC4 | Have you ever lost money or access to a paid service due to a cybersecurity issue? | Incident Outcomes | Yes; No; Not sure/Don’t know
INC5 | Has someone ever used your personal or work account without your permission due to a cybersecurity incident? | Incident Outcomes | Yes; No; Not sure/Don’t know
INC6 | Has a cybersecurity problem ever caused your computer or other device to stop working properly? | Incident Outcomes | Yes; No; Not sure/Don’t know

References

  1. Barlette, Y.; Jaouen, A.; Baillette, P. Bring Your Own Device (BYOD) as reversed IT adoption: Insights into managers’ coping strategies. Int. J. Inf. Manag. 2021, 56, 102212. [Google Scholar] [CrossRef]
  2. Borkovich, D.J.; Skovira, R.J. Working from home: Cybersecurity in the age of COVID-19. Issues Inf. Syst. 2020, 21, 234–246. [Google Scholar] [CrossRef]
  3. Hasan, S.; Ali, M.; Kurnia, S.; Thurasamy, R. Evaluating the cyber security readiness of organizations and its influence on performance. J. Inf. Secur. Appl. 2021, 58, 102726. [Google Scholar] [CrossRef]
  4. Pollini, A.; Callari, T.C.; Tedeschi, A.; Ruscio, D.; Save, L.; Chiarugi, F.; Guerri, D. Leveraging human factors in cybersecurity: An integrated methodological approach. Cogn. Technol. Work. 2022, 24, 371–390. [Google Scholar] [CrossRef] [PubMed]
  5. Dalal, R.S.; Howard, D.J.; Bennett, R.J.; Posey, C.; Zaccaro, S.J.; Brummel, B.J. Organizational science and cybersecurity: Abundant opportunities for research at the interface. J. Bus. Psychol. 2022, 37, 1–29. [Google Scholar] [CrossRef] [PubMed]
  6. Zimmermann, V.; Renaud, K. Moving from a ‘human-as-problem’ to a ‘human-as-solution’ cybersecurity mindset. Int. J. Human-Computer Stud. 2019, 131, 169–187. [Google Scholar] [CrossRef]
  7. Diesch, R.; Pfaff, M.; Krcmar, H. A comprehensive model of information security factors for decision-makers. Comput. Secur. 2020, 92, 101747. [Google Scholar] [CrossRef]
  8. Glaspie, H.W.; Karwowski, W. Human factors in information security culture: A literature review. In Advances in Human Factors in Cybersecurity; Kantola, J., Barath, T., Nazir, S., Andre, T., Eds.; Springer: Berlin/Heidelberg, Germany, 2018; pp. 269–280. [Google Scholar] [CrossRef]
  9. Derks, D.; van Mierlo, H.; Schmitz, E.B. A diary study on work-related smartphone use, psychological detachment and exhaustion: Examining the role of the perceived segmentation norm. J. Occup. Health Psychol. 2014, 19, 74–84. [Google Scholar] [CrossRef]
  10. Kossek, E.E.; Ruderman, M.N.; Braddy, P.W.; Hannum, K.M. Work–nonwork boundary management profiles: A person-centered approach. J. Vocat. Behav. 2012, 81, 112–128. [Google Scholar] [CrossRef]
  11. Thilagavathy, S.; Geetha, S.N. Work-life balance—A systematic review. Vilakshan-XIMB J. Manag. 2023, 20, 258–276. [Google Scholar] [CrossRef]
  12. McCormac, A.; Zwaans, T.; Parsons, K.; Calic, D.; Butavicius, M.; Pattinson, M. Individual differences and Information Security Awareness. Comput. Hum. Behav. 2017, 69, 151–156. [Google Scholar] [CrossRef]
  13. Siponen, M.; Soliman, W.; Topalli, V.; Vestman, T. Reconsidering neutralization techniques in behavioral cyber-security as cybersecurity hygiene discounting. SSRN 2024. [Google Scholar] [CrossRef]
  14. Baltuttis, D.; Teubner, T.; Adam, M.T. A typology of cybersecurity behavior among knowledge workers. Comput. Secur. 2024, 140, 103741. [Google Scholar] [CrossRef]
  15. Redmiles, E.M.; Kross, S.; Mazurek, M.L. How I learned to be secure: A census-representative survey of security advice sources and behavior. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, Vienna, Austria, 24–28 October 2016; pp. 666–677. [Google Scholar] [CrossRef]
  16. Lahcen, R.A.M.; Caulkins, B.; Mohapatra, R.; Kumar, M. Review and insight on the behavioral aspects of cybersecurity. Cybersecurity 2020, 3, 10. [Google Scholar] [CrossRef]
  17. Guan, T.; Chang, S.; Deng, Y.; Xue, F.; Wang, C.; Jia, X. Oriented SAR Ship Detection Based on Edge Deformable Convolution and Point Set Representation. Remote Sens. 2025, 17, 1612. [Google Scholar] [CrossRef]
  18. Ashforth, B.E.; Kreiner, G.E.; Fugate, M. All in a Day’s Work: Boundaries and Micro Role Transitions. Acad. Manag. Rev. 2000, 25, 472–491. [Google Scholar] [CrossRef]
  19. Voydanoff, P.; Nippert-Eng, C.E. Home and Work: Negotiating Boundaries through Everyday Life. Contemp. Sociol. A J. Rev. 1996, 27, 153. [Google Scholar] [CrossRef]
  20. Mazmanian, M.; Orlikowski, W.J.; Yates, J. The Autonomy Paradox: The Implications of Mobile Email Devices for Knowledge Professionals. Organ. Sci. 2013, 24, 1337–1357. [Google Scholar] [CrossRef]
  21. Wajcman, J.; Rose, E.; Brown, J.E.; Bittman, M. Enacting virtual connections between work and home. J. Sociol. 2010, 46, 257–275. [Google Scholar] [CrossRef]
  22. Singh, R.; Aggarwal, S.; Sahni, S. A Systematic Literature Review of Work-Life Balance Using ADO Model. FIIB Bus. Rev. 2022, 12, 243–258. [Google Scholar] [CrossRef]
  23. Bandura, A. Moral Disengagement in the Perpetration of Inhumanities. Pers. Soc. Psychol. Rev. 1999, 3, 193–209. [Google Scholar] [CrossRef]
  24. Willison, R.; Warkentin, M.; Johnston, A.C. Examining employee computer abuse intentions: Insights from justice, deterrence and neutralization perspectives. Inf. Syst. J. 2018, 28, 266–293. [Google Scholar] [CrossRef]
  25. Cheng, L.; Li, W.; Zhai, Q.; Smyth, R. Understanding personal use of the Internet at work: An integrated model of neutralization techniques and general deterrence theory. Comput. Hum. Behav. 2014, 38, 220–228. [Google Scholar] [CrossRef]
  26. Posey, C.; Bennett, R.J.; Roberts, T.L. Understanding the mindset of the abusive insider: An examination of insiders’ causal reasoning following internal security changes. Comput. Secur. 2011, 30, 486–497. [Google Scholar] [CrossRef]
  27. Siponen, M.; Vance, A. Neutralization: New Insights into the Problem of Employee Information Systems Security Policy Violations. MIS Q. 2010, 34, 487. [Google Scholar] [CrossRef]
  28. Mohammed, Y.; Warkentin, M.; Nehme, A.; Beshah, T. Testing a comprehensive model of employee IS misuse in a developing economy context. J. Knowl. Manag. 2025, 29, 1974–2017. [Google Scholar] [CrossRef]
  29. Hadlington, L. Employees Attitude towards Cyber Security and Risky Online Behaviours: An Empirical Assessment in the United Kingdom. Int. J. Cyber Criminol. 2018, 12, 269–281. [Google Scholar] [CrossRef]
  30. Pfleeger, S.L.; Sasse, M.A.; Furnham, A. From Weakest Link to Security Hero: Transforming Staff Security Behavior. J. Homel. Secur. Emerg. Manag. 2014, 11, 489–510. [Google Scholar] [CrossRef]
  31. Vance, A.; Siponen, M.; Pahnila, S. Motivating IS security compliance: Insights from Habit and Protection Motivation Theory. Inf. Manag. 2012, 49, 190–198. [Google Scholar] [CrossRef]
  32. Bognár, L.; Bottyán, L. Evaluating Online Security Behavior: Development and Validation of a Personal Cybersecurity Awareness Scale for University Students. Educ. Sci. 2024, 14, 588. [Google Scholar] [CrossRef]
  33. Bulgurcu, B.; Cavusoglu, H.; Benbasat, I. Information Security Policy Compliance: An Empirical Study of Rationality-Based Beliefs and Information Security Awareness. MIS Q. 2010, 34, 523. [Google Scholar] [CrossRef]
  34. Ng, B.-Y.; Kankanhalli, A.; Xu, Y. Studying users’ computer security behavior: A health belief perspective. Decis. Support Syst. 2009, 46, 815–825. [Google Scholar] [CrossRef]
  35. Zwilling, M.; Klien, G.; Lesjak, D.; Wiechetek, Ł.; Cetin, F.; Basim, H.N. Cyber Security Awareness, Knowledge and Behavior: A Comparative Study. J. Comput. Inf. Syst. 2020, 62, 82–97. [Google Scholar] [CrossRef]
  36. Tran-Truong, P.T.; Pham, M.Q.; Son, H.X.; Nguyen, D.L.; Nguyen, M.B.; Tran, K.L.; Van, L.C.; Le, K.T.; Vo, K.H.; Kim, N.N.; et al. A systematic review of multi-factor authentication in digital payment systems: NIST standards alignment and industry implementation analysis. J. Syst. Arch. 2025, 162, 103402. [Google Scholar] [CrossRef]
  37. Radwan, R.; Zejnilovic, S. Password Reuse Is Rampant: Nearly Half of Observed User Logins Are Compromised. Cloudflare Blog. Available online: https://blog.cloudflare.com/password-reuse-rampant-half-user-logins-compromised (accessed on 26 August 2025).
  38. Blanton, S. 50+ Password Statistics & Trends to Know in JumpCloud. Available online: https://jumpcloud.com/blog/password-statistics-trends (accessed on 26 August 2025).
  39. Gilster, P. Digital Literacy; Wiley Computer Pub: Hoboken, NJ, USA, 1997. [Google Scholar]
  40. DiMaggio, P.; Hargittai, E. From the ‘Digital Divide’ to ‘Digital Inequality’: Studying Internet Use as Penetration Increases; Arts and Cultural Policy Studies Working Paper Series; Princeton University Center: Princeton, NJ, USA, 2001; Volume 15, pp. 1–23. [Google Scholar]
  41. Hargittai, E. Survey Measures of Web-Oriented Digital Literacy. Soc. Sci. Comput. Rev. 2005, 23, 371–379. [Google Scholar] [CrossRef]
  42. Park, Y.J. Digital Literacy and Privacy Behavior Online. Commun. Res. 2011, 40, 215–236. [Google Scholar] [CrossRef]
  43. van Deursen, A.J.; van Dijk, J.A. The digital divide shifts to differences in usage. New Media Soc. 2014, 16, 507–526. [Google Scholar] [CrossRef]
  44. Ramadhany, A.F.; Damayanti, N.E.; Rahmania, L.A.; Inawati. Digital literacy as a cyber crime defense and prevention strategy. In Proceedings of the 9th International Seminar of Research Month 2024, Surabaya, Indonesia, 15–16 October 2024; NST Proceedings: Malang, Indonesia, 2025; pp. 778–785. [Google Scholar] [CrossRef]
  45. Phan, B.T.; Do, P.H.; Le, D.Q. The impact of digital literacy on personal information security: Evidence from Vietnam. In Proceedings of the International Conference on Emerging Challenges: Sustainable Strategies in the Data-driven Economy (ICECH 2024), Thanh Hoa, Vietnam, 1–2 November 2024; Atlantis Press: Cambridge, MA, USA, 2025; pp. 475–489. [Google Scholar] [CrossRef]
  46. Ismaeel, S. The impact of digital literacy on cybercrime awareness, victimization, and prevention measures: A study of cyberbullying in Saudi Arabia. Pak. J. Criminol. 2025, 17, 77–96. [Google Scholar]
  47. McCrae, R.R.; Costa, P.T. A five-factor theory of personality. In Handbook of Personality: Theory and Research, 2nd ed.; Pervin, L.A., John, O.P., Eds.; Guilford Press: New York, NY, USA, 1999; pp. 139–153. [Google Scholar]
  48. Gratian, M.; Bandi, S.; Cukier, M.; Dykstra, J.; Ginther, A. Correlating human traits and cyber security behavior intentions. Comput. Secur. 2018, 73, 345–358. [Google Scholar] [CrossRef]
  49. Hadlington, L. Human factors in cybersecurity; examining the link between Internet addiction, impulsivity, attitudes towards cybersecurity, and risky cybersecurity behaviours. Heliyon 2017, 3, e00346. [Google Scholar] [CrossRef]
  50. Junglas, I.A.; Johnson, N.A.; Spitzmüller, C. Personality traits and concern for privacy: An empirical study in the context of location-based services. Eur. J. Inf. Syst. 2008, 17, 387–402. [Google Scholar] [CrossRef]
  51. Li, H.; Sarathy, R.; Xu, H. The role of affect and cognition on online consumers’ decision to disclose personal information to unfamiliar online vendors. Decis. Support Syst. 2011, 51, 434–445. [Google Scholar] [CrossRef]
  52. Shappie, A.T.; Dawson, C.A.; Debb, S.M. Personality as a predictor of cybersecurity behavior. Psychol. Popul. Media 2020, 9, 475–480. [Google Scholar] [CrossRef]
  53. Halevi, T.; Lewis, J.; Memon, N. A closer look at the self-reported behaviors of users on social networks. arXiv 2013, arXiv:1301.7643. Available online: https://arxiv.org/abs/1301.7643 (accessed on 26 August 2025).
  54. Buil-Gil, D.; Kemp, S.; Kuenzel, S.; Coventry, L.; Zakhary, S.; Tilley, D.; Nicholson, J. The digital harms of smart home devices: A systematic literature review. Comput. Hum. Behav. 2023, 145, 107770. [Google Scholar] [CrossRef]
  55. Wash, R.; Cooper, M.M. Who provides phishing training? Facts, stories, and people like me. In Proceedings of the 2018 ACM CHI Conference on Human Factors in Computing Systems (CHI ’18), Montreal, QC, Canada, 31 October 2018; pp. 1–12. [Google Scholar] [CrossRef]
  56. Conard, C.F. Quantifying the Severity of a Cybersecurity Incident for Incident Reporting. Master’s Thesis, Massachusetts Institute of Technology, Cambridge, MA, USA, 2024. [Google Scholar]
  57. Hanus, B.; Wu, Y. Impact of Users’ Security Awareness on Desktop Security Behavior: A Protection Motivation Theory Perspective. Inf. Syst. Manag. 2016, 33, 2–16. [Google Scholar] [CrossRef]
  58. de Bruin, M. Individual and Contextual Variables of Cyber Security Behaviour. Master’s Thesis, University of London, London, UK, 2022. Available online: https://arxiv.org/abs/2405.16215 (accessed on 26 August 2025).
  59. Khaliq, S.; Tariq, Z.U.A.; Masood, A. Role of user and entity behavior analytics in detecting insider attacks. In Proceedings of the 2020 International Conference on Cyber Warfare and Security (ICCWS), Norfolk, VA, USA, 12–13 March 2020; IEEE: Princeton, NJ, USA, 2020; pp. 1–6. [Google Scholar] [CrossRef]
  60. Danish, M. Enhancing cyber security through predictive analytics: Real-time threat detection and response. arXiv 2024, arXiv:2407.10864. [Google Scholar]
  61. Palan, S.; Schitter, C. Prolific.ac—A subject pool for online experiments. J. Behav. Exp. Finance 2018, 17, 22–27. [Google Scholar] [CrossRef]
  62. Brier, G.W. Verification of forecasts expressed in terms of probability. Mon. Weather. Rev. 1950, 78, 1–3. [Google Scholar] [CrossRef]
  63. Rasool, A.; Aslam, S.; Hussain, N.; Imtiaz, S.; Riaz, W. nBERT: Harnessing NLP for Emotion Recognition in Psychotherapy to Transform Mental Health Care. Information 2025, 16, 301. [Google Scholar] [CrossRef]
Figure 1. ROC curve for the Hungary-layer model predicting AllThreeMild incidents.
Figure 2. DET curve for the Hungary-layer model predicting AllThreeMild incidents.
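For readers reproducing Figures 1 and 2, the sketch below shows how ROC and DET curves for a fitted logistic model can be generated with scikit-learn. The synthetic data, variable names, and estimator settings are placeholders, not the study's actual data or model specification.

```python
# Minimal sketch of ROC/DET curve generation; X, y, and the model below
# are illustrative stand-ins, not the paper's Hungary-layer data.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import RocCurveDisplay, DetCurveDisplay

rng = np.random.default_rng(0)
X = rng.normal(size=(142, 5))          # placeholder item-level predictors
y = rng.integers(0, 2, size=142)       # placeholder AllThreeMild labels

model = LogisticRegression(max_iter=1000).fit(X, y)
y_score = model.predict_proba(X)[:, 1]

fig, (ax_roc, ax_det) = plt.subplots(1, 2, figsize=(10, 4))
RocCurveDisplay.from_predictions(y, y_score, ax=ax_roc)
DetCurveDisplay.from_predictions(y, y_score, ax=ax_det)
ax_roc.set_title("ROC curve")
ax_det.set_title("DET curve")
plt.tight_layout()
plt.show()
```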
Table 1. Distribution of the Responses in the Layers of the Final Sample.

| Layer | Category | Number of Responses |
|---|---|---|
| Age Group | 18–24 | 68 |
|  | 25–34 | 100 |
|  | 35–44 | 103 |
|  | 45–54 | 121 |
|  | 55–64 | 44 |
|  | 65 or older | 17 |
| Gender | Female | 220 |
|  | Male | 233 |
| Education Level | High school | 92 |
|  | Some college | 74 |
|  | Bachelor’s degree | 160 |
|  | Master’s degree | 103 |
|  | Doctorate | 24 |
| Employment Sector | Education | 104 |
|  | Finance/Business | 48 |
|  | Healthcare | 26 |
|  | IT/Technology | 133 |
|  | Other | 142 |
| Employment Status | Academic/Research | 27 |
|  | Company employee | 284 |
|  | Freelancer/Contractor | 58 |
|  | Student | 50 |
|  | Other | 34 |
| Country of Residence | Germany | 40 |
|  | Hungary | 142 |
|  | Ireland | 10 |
|  | Romania | 36 |
|  | The Netherlands | 13 |
|  | UK | 86 |
|  | USA | 103 |
|  | Other (Austria, Australia, Brazil, Malta, Spain, UAE) | 23 |
Table 2. Model Categorization Criteria.

| Metric | Threshold for Concern | Impact on Category |
|---|---|---|
| Minimum events/non-events | <10 events or <10 non-events | Automatically classified as Weak |
| 10-fold AUC | <0.68 → Weak; ≥0.74 → Good | Directly influences category |
| 10-fold deviance R² | <3% → Weak; ≥7.5% → Good | Directly influences category |
| AUC drop (in-sample AUC − 10-fold AUC) | >0.10 | Triggers downgrade to Weak |
| R² drop ratio ((in-sample R² − 10-fold R²) / in-sample R²) | >0.65 | Downgrade one step (e.g., Strong → Moderate) |
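The rules in Table 2 can be expressed as a small classification function. The sketch below is one assumed operationalization: the table does not fully pin down the order in which the downgrade rules are applied or how the AUC and R² bands are combined, so the tie-breaking here is an assumption rather than the paper's exact procedure.

```python
# Assumed operationalization of Table 2; the ordering of checks and the
# downgrade chain are interpretive choices, not confirmed by the source.
def categorize(events, non_events, auc_in, auc_cv, r2_in, r2_cv):
    """Return 'Weak', 'Moderate', or 'Good' for one fitted model."""
    if events < 10 or non_events < 10:
        return "Weak"                      # too few events to trust the fit
    if auc_cv < 0.68 or r2_cv < 3.0:
        return "Weak"
    if auc_in - auc_cv > 0.10:             # large AUC drop under 10-fold CV
        return "Weak"
    # Base category from the cross-validated metrics.
    category = "Good" if (auc_cv >= 0.74 and r2_cv >= 7.5) else "Moderate"
    # Large relative shrinkage of deviance R2 downgrades one step.
    if (r2_in - r2_cv) / r2_in > 0.65:
        category = {"Good": "Moderate", "Moderate": "Weak"}[category]
    return category

# Example: the USA-layer AtLeastTwoMild row from Table 9.
print(categorize(92, 11, 0.986, 0.951, 69.71, 9.53))  # -> 'Moderate'
```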
Table 3. Summary of Reliability and Dimensionality of Theoretical Domains.

| Domain | Cronbach’s Alpha | No. of Factors (EFA) | Remark |
|---|---|---|---|
| Work–Life Blurring (WLB) | 0.829 | 2 | High reliability; multidimensional structure |
| Risk Rationalization (RR) | 0.758 | 1 (excluding RR7_R) | Acceptable reliability; RR7_R forms a distinct normative factor |
| Digital Literacy (DL) | 0.823 | 1 | High reliability and unidimensional structure |
| Cybersecurity Behavior (CB) | 0.630 | 2 | Low reliability; distinct behavioral subdimensions |
| Personality (P) | 0.611 | 5 | Low reliability; reflects multiple psychological traits |
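The alpha values in Table 3 follow the standard Cronbach formula, α = k/(k−1) · (1 − Σs²ᵢ/s²ₜ), where s²ᵢ are item variances and s²ₜ is the variance of the scale total. A minimal sketch, using synthetic responses in place of the study's survey data:

```python
# Minimal Cronbach's alpha sketch; `items` is a hypothetical
# respondents-by-items array for one domain (e.g., the DL items).
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: shape (n_respondents, k_items); returns alpha."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()   # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)     # variance of scale totals
    return k / (k - 1) * (1 - item_vars / total_var)

rng = np.random.default_rng(1)
latent = rng.normal(size=(453, 1))                      # shared trait
items = latent + rng.normal(scale=1.0, size=(453, 7))   # 7 noisy items
print(round(cronbach_alpha(items), 3))                  # ~0.87 here
```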
Table 4. (a) Logistic Regression Summary—All Data. (b) Logistic Regression Summary—All Data without Hungary.

(a)

| Model | Events/Non-Events | AUC | 10-Fold AUC | Deviance R² (%) | 10-Fold Deviance R² (%) | Domains | Category |
|---|---|---|---|---|---|---|---|
| INC1 | 268/185 | 0.623 | 0.573 | 3.63 | 0.74 | CB, DL, P, WLB | Weak |
| INC2 | 355/98 | 0.736 | 0.691 | 12.04 | 7.61 | CB, P, RR, WLB | Moderate |
| INC3 | 207/246 | 0.725 | 0.671 | 8.39 | 3.66 | CB, DL, P, RR, WLB | Weak |
| AtLeastOneMild | 421/32 | 0.839 | 0.759 | 19.45 | 7.77 | CB, DL, P, RR, WLB | Moderate |
| AtLeastTwoMild | 288/165 | 0.745 | 0.704 | 13.33 | 8.2 | CB, DL, P, RR, WLB | Moderate |
| AllThreeMild | 121/332 | 0.684 | 0.644 | 7.62 | 4.2 | CB, P, RR, WLB | Weak |
| INC4 | 112/341 | 0.694 | 0.633 | 7.34 | 1.53 | CB, P, RR, WLB | Weak |
| INC5 | 84/369 | 0.662 | 0.609 | 4.5 | 0.7 | CB, P, RR | Weak |
| INC6 | 156/297 | 0.721 | 0.658 | 10.46 | 3.25 | CB, DL, P, RR | Weak |
| AtLeastOneSerious | 202/251 | 0.632 | 0.586 | 3.87 | 1.12 | CB, DL, P, RR, WLB | Weak |
| AtLeastTwoSerious | 100/353 | 0.68 | 0.65 | 6.0 | 3.46 | P, RR | Weak |
| AllThreeSerious | 13/403 | 0.864 | 0.829 | 21.73 | 15.75 | CB, DL, RR | Good |

(b)

| Model | Events/Non-Events | AUC | 10-Fold AUC | Deviance R² (%) | 10-Fold Deviance R² (%) | Domains | Category |
|---|---|---|---|---|---|---|---|
| INC1 | 182/129 | 0.702 | 0.653 | 7.21 | 4.53 | CB, DL, P, WLB | Weak |
| INC2 | 254/57 | 0.810 | 0.711 | 16.06 | 4.70 | CB, P, RR, WLB | Weak |
| INC3 | 282/29 | 0.839 | 0.738 | 21.14 | 4.90 | CB, P, RR, WLB | Weak |
| AtLeastOneMild | 294/17 | 0.899 | 0.832 | 34.28 | 17.98 | CB, DL, P, WLB | Good |
| AtLeastTwoMild | 266/45 | 0.816 | 0.770 | 22.85 | 14.75 | CB, P, WLB | Good |
| AllThreeMild | 158/153 | 0.739 | 0.672 | 12.88 | 4.97 | CB, P, RR, WLB | Weak |
| INC4 | 55/251 | 0.740 | 0.704 | 11.80 | 7.15 | CB, DL, P, RR | Moderate |
| INC5 | 51/255 | 0.757 | 0.699 | 13.55 | 6.35 | CB, DL, P, RR, WLB | Weak |
| INC6 | 76/230 | 0.705 | 0.673 | 9.55 | 6.02 | CB, DL, P, RR, WLB | Weak |
| AtLeastOneSerious | 130/176 | 0.686 | 0.6404 | 7.82 | 3.03 | CB, DL, RR, WLB | Weak |
| AtLeastTwoSerious | 40/266 | 0.807 | 0.7236 | 21.28 | 7.74 | CB, DL, P, RR, WLB | Moderate |
| AllThreeSerious | 12/294 | 0.958 | 0.875 | 49.44 | 4.24 | CB, DL, P, RR, WLB | Weak |
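The per-model metrics in Tables 4–9 (in-sample and 10-fold AUC, deviance R² = 1 − residual deviance/null deviance) can be computed as sketched below. The data, the nine-item design matrix, and the estimator settings are assumptions for illustration, not the paper's exact setup.

```python
# Sketch of the in-sample vs. 10-fold metric computation; X and y are
# placeholders for retained item-level predictors and one incident outcome.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, log_loss
from sklearn.model_selection import StratifiedKFold

def deviance_r2(y, p):
    """1 - residual deviance / null deviance, in percent."""
    null_p = np.full_like(p, y.mean())
    return 100 * (1 - log_loss(y, p) / log_loss(y, null_p))

rng = np.random.default_rng(2)
X = rng.normal(size=(453, 9))                 # e.g., 9 retained items
y = (X[:, 0] + rng.normal(size=453) > 0.5).astype(int)

model = LogisticRegression(max_iter=1000).fit(X, y)
p_in = model.predict_proba(X)[:, 1]

# Out-of-fold predictions give the 10-fold metrics.
p_cv = np.empty_like(p_in)
for tr, te in StratifiedKFold(10, shuffle=True, random_state=0).split(X, y):
    fold = LogisticRegression(max_iter=1000).fit(X[tr], y[tr])
    p_cv[te] = fold.predict_proba(X[te])[:, 1]

print(f"AUC={roc_auc_score(y, p_in):.3f}  10-fold AUC={roc_auc_score(y, p_cv):.3f}")
print(f"Dev R2={deviance_r2(y, p_in):.2f}%  10-fold Dev R2={deviance_r2(y, p_cv):.2f}%")
```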
Table 5. Logistic Regression Summary—Education Layer.

| Model | Events/Non-Events | AUC | 10-Fold AUC | Deviance R² (%) | 10-Fold Deviance R² (%) | Domains | Category |
|---|---|---|---|---|---|---|---|
| INC1 | 56/48 | 0.818 | 0.747 | 24.63 | 9.72 | WLB, RR, CB, DL, P | Good |
| INC2 | 84/20 | 0.884 | 0.820 | 37.95 | 19.43 | WLB, RR, CB, DL, P | Good |
| INC3 | 87/17 | 0.904 | 0.798 | 44.64 | 12.8 | WLB, RR, CB, DL, P | Weak |
| AtLeastOneMild | 94/10 | 0.987 | 0.963 | 72.18 | 3.81 | RR, P | Weak |
| AtLeastTwoMild | 85/19 | 0.865 | 0.774 | 35.3 | 10.97 | RR, CB, DL, P | Moderate |
| AllThreeMild | 48/56 | 0.761 | 0.662 | 17.24 | 2.71 | WLB, RR, CB, P | Weak |
| INC4 | 15/77 | 0.986 | 0.851 | 76.42 | 0.0 | RR, CB, DL, P | Weak |
| INC5 | 9/83 | 0.907 | 0.736 | 44.87 | 0.0 | RR, CB, P | Weak |
| INC6 | 21/71 | 0.795 | 0.767 | 19.92 | 14.12 | RR, CB, P | Good |
| AtLeastOneSerious | 38/54 | 0.869 | 0.771 | 35.56 | 10.95 | WLB, RR, CB, DL | Moderate |
| AtLeastTwoSerious | 7/85 | 0.829 | 0.791 | 16.93 | 9.98 | RR | Weak |
| AllThreeSerious | 0/92 | nan | nan | nan | nan |  | Weak |
Table 6. Logistic Regression Summary—IT or Technology Layer.

| Model | Events/Non-Events | AUC | 10-Fold AUC | Deviance R² (%) | 10-Fold Deviance R² (%) | Domains | Category |
|---|---|---|---|---|---|---|---|
| INC1 | 91/42 | 0.801 | 0.688 | 23.74 | 0.14 | WLB, RR, CB, DL, P | Weak |
| INC2 | 104/29 | 0.865 | 0.781 | 31.95 | 13.11 | WLB, RR, CB, P | Good |
| INC3 | 120/13 | 0.778 | 0.690 | 14.98 | 4.99 | RR, P | Moderate |
| AtLeastOneMild | 128/5 | 0.885 | 0.722 | 28.49 | 0.0 | WLB, RR, CB | Weak |
| AtLeastTwoMild | 116/17 | 0.884 | 0.800 | 35.88 | 15.83 | WLB, RR, P | Good |
| AllThreeMild | 71/62 | 0.830 | 0.748 | 26.14 | 12.7 | WLB, RR, CB, DL, P | Good |
| INC4 | 20/104 | 0.826 | 0.740 | 22.82 | 4.98 | WLB, RR, CB, DL, P | Moderate |
| INC5 | 24/100 | 0.812 | 0.76 | 23.46 | 12.65 | WLB, RR, CB, P | Good |
| INC6 | 30/94 | 0.813 | 0.736 | 25.37 | 9.41 | RR, CB, DL, P | Moderate |
| AtLeastOneSerious | 50/74 | 0.810 | 0.676 | 23.76 | 0.0 | WLB, RR, CB, DL, P | Weak |
| AtLeastTwoSerious | 17/107 | 0.845 | 0.776 | 29.66 | 10.49 | RR, DL, P | Good |
| AllThreeSerious | 7/117 | 0.964 | 0.837 | 53.82 | 0.0 | WLB, RR, CB, P | Weak |
Table 7. Logistic Regression Summary—Hungary Layer.

| Model | Events/Non-Events | AUC | 10-Fold AUC | Deviance R² (%) | 10-Fold Deviance R² (%) | Domains | Category |
|---|---|---|---|---|---|---|---|
| INC1 | 86/56 | 0.759 | 0.680 | 17.97 | 6.11 | WLB, RR, CB, DL, P | Moderate |
| INC2 | 101/41 | 0.789 | 0.706 | 21.25 | 7.43 | WLB, RR, CB, DL, P | Moderate |
| INC3 | 110/32 | 0.769 | 0.710 | 16.53 | 7.23 | RR, CB, DL, P | Moderate |
| AtLeastOneMild | 127/15 | 0.744 | 0.647 | 12.59 | 1.61 | WLB, DL, P | Weak |
| AtLeastTwoMild | 105/37 | 0.718 | 0.623 | 9.16 | 0.0 | RR, CB, DL, P | Weak |
| AllThreeMild | 65/77 | 0.749 | 0.693 | 13.13 | 5.93 | WLB, RR, CB, DL, P | Moderate |
| INC4 | 17/93 | 0.823 | 0.704 | 26.79 | 3.78 | WLB, RR, CB, DL | Moderate |
| INC5 | 8/102 | 0.921 | 0.726 | 44.65 | 0.0 | WLB, CB, P | Weak |
| INC6 | 25/85 | 0.906 | 0.793 | 37.7 | 4.96 | WLB, RR, CB, DL, P | Moderate |
| AtLeastOneSerious | 39/71 | 0.763 | 0.664 | 15.94 | 1.99 | WLB, CB, DL, P | Weak |
| AtLeastTwoSerious | 10/100 | 0.976 | 0.848 | 65.91 | 0.0 | WLB, RR, CB, DL, P | Weak |
| AllThreeSerious | 1/109 | nan | nan | nan | nan |  | Weak |
Table 8. Logistic Regression Summary—UK Layer.

| Model | Events/Non-Events | AUC | 10-Fold AUC | Deviance R² (%) | 10-Fold Deviance R² (%) | Domains | Category |
|---|---|---|---|---|---|---|---|
| INC1 | 38/48 | 0.896 | 0.815 | 41.86 | 18.22 | WLB, RR, P | Moderate |
| INC2 | 64/22 | 0.817 | 0.775 | 20.54 | 7.3 | WLB, P | Moderate |
| INC3 | 76/10 | 0.939 | 0.827 | 48.34 | 10.14 | WLB, P | Weak |
| AtLeastOneMild | 78/8 | 0.963 | 0.884 | 61.21 | 0.0 | WLB, RR, P | Weak |
| AtLeastTwoMild | 69/17 | 0.981 | 0.930 | 72.22 | 12.38 | WLB, RR, CB, DL, P | Moderate |
| AllThreeMild | 31/55 | 0.920 | 0.832 | 49.61 | 21.64 | WLB, RR, CB, DL, P | Moderate |
| INC4 | 13/73 | 0.790 | 0.669 | 20.98 | 0.0 | RR, CB, DL | Weak |
| INC5 | 6/80 | 0.932 | 0.863 | 41.06 | 12.47 | DL, P | Weak |
| INC6 | 17/69 | 0.808 | 0.737 | 21.76 | 12.04 | WLB, RR, DL | Moderate |
| AtLeastOneSerious | 27/59 | 0.700 | 0.655 | 9.56 | 4.23 | WLB, RR, DL | Weak |
| AtLeastTwoSerious | 8/78 | 0.977 | 0.667 | 66.27 | 0.0 | WLB, RR, CB, DL | Weak |
| AllThreeSerious | 1/85 | nan | nan | nan | nan |  | Weak |
Table 9. Logistic Regression Summary—USA Layer.

| Model | Events/Non-Events | AUC | 10-Fold AUC | Deviance R² (%) | 10-Fold Deviance R² (%) | Domains | Category |
|---|---|---|---|---|---|---|---|
| INC1 | 69/34 | 0.743 | 0.700 | 14.79 | 8.76 | WLB, RR, DL | Moderate |
| INC2 | 91/12 | 0.845 | 0.774 | 27.71 | 11.46 | WLB, RR, DL | Moderate |
| INC3 | 96/7 | 0.877 | 0.758 | 36.35 | 0.0 | WLB, CB, DL | Weak |
| AtLeastOneMild | 99/4 | 0.940 | 0.746 | 50.45 | 0.0 | WLB, DL | Weak |
| AtLeastTwoMild | 92/11 | 0.986 | 0.951 | 69.71 | 9.53 | WLB, RR, CB, DL, P | Moderate |
| AllThreeMild | 65/38 | 0.755 | 0.690 | 16.93 | 5.39 | WLB, CB, DL | Moderate |
| INC4 | 24/78 | 0.767 | 0.711 | 15.25 | 7.52 | WLB, RR, P | Moderate |
| INC5 | 21/81 | 0.739 | 0.671 | 13.34 | 5.25 | RR | Weak |
| INC6 | 34/68 | 0.838 | 0.758 | 24.74 | 9.06 | WLB, RR, CB, DL, P | Moderate |
| AtLeastOneSerious | 53/49 | 0.799 | 0.740 | 21.23 | 10.25 | WLB, RR, P | Moderate |
| AtLeastTwoSerious | 18/84 | 0.913 | 0.812 | 38.18 | 9.94 | RR, CB, DL, P | Weak |
| AllThreeSerious | 8/94 | 0.902 | 0.807 | 37.88 | 7.54 | WLB, RR, CB, P | Weak |
Table 10. Item Codes Retained in AtLeastTwoMild Logistic Regression Models Across Full Sample and Stratified Layers.

| Model/Layer | Item Codes |
|---|---|
| All Data | CB1, CB3, DL7, P10, P4, P7, RR7, WLB4, WLB10 |
| Education Layer | CB5, DL1, DL4, DL5, DL7, P1, P4, P9, RR2, RR4 |
| Hungary Layer | CB3, DL6, P4, RR4 |
| IT Layer | CB6, DL2, P7, RR2, RR3, RR6, RR7, WLB5, WLB6, WLB8, WLB10 |
| UK Layer | CB6, CB7, DL2, DL3, P8, RR4, WLB4, WLB8, WLB9, WLB10 |
| USA Layer | CB6, DL7, P7, RR2, RR6, WLB8 |
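Table 10 lists sparse item subsets per layer. The sketch below shows one way such subsets can be obtained, via L1-penalized logistic regression; the paper's own selection procedure may differ, and the item names and toy data here are purely illustrative.

```python
# Hedged sketch of sparse item selection with an L1 penalty; this is one
# plausible selection technique, not the paper's confirmed method.
import numpy as np
from sklearn.linear_model import LogisticRegressionCV

rng = np.random.default_rng(3)
item_names = [f"CB{i}" for i in range(1, 8)] + [f"DL{i}" for i in range(1, 8)]
X = rng.normal(size=(300, len(item_names)))           # placeholder item responses
y = (X[:, 0] - X[:, 8] + rng.normal(size=300) > 0).astype(int)

sel = LogisticRegressionCV(penalty="l1", solver="liblinear", cv=10).fit(X, y)
retained = [n for n, c in zip(item_names, sel.coef_.ravel()) if abs(c) > 1e-6]
print(retained)   # items with nonzero coefficients; CB1 and DL2 should dominate
```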
Table 11. Comparison of IT-Layer Models Predicting AtLeastOneSerious Incident With and Without Categorical Variables.

| Model Type | AUC | Deviance R² (%) | 10-Fold AUC | 10-Fold Deviance R² (%) | Included Predictors | Model Classification |
|---|---|---|---|---|---|---|
| Without Categorical Variables | 0.81 | 23.76 | 0.676 | 0 | WLB6, WLB7, WLB9, RR6, RR8, CB1, CB3, DL1, DL2, DL5, DL6, P5 | Weak |
| With Categorical Variables | 0.8841 | 39.19 | 0.789 | 9.2 | WLB4, WLB9, RR2, RR6, CB1, CB6, DL5, DL6, P5, P7, P10, D1, D2, WLB4 × D2, WLB9 × D1, CB6 × D1, P10 × D1 | Moderate-to-Strong |
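The augmented model in Table 11 adds demographic dummies (D1, D2) and item-by-demographic interactions to the item-level predictors. A minimal sketch of this kind of specification, using a formula interface and toy data; the column names and coding of D1/D2 are assumptions:

```python
# Sketch of augmenting an item-level logit with demographic dummies and
# interaction terms; the data frame below is synthetic and illustrative.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
df = pd.DataFrame({
    "incident": rng.integers(0, 2, 200),
    "WLB4": rng.normal(size=200),
    "WLB9": rng.normal(size=200),
    "D1": rng.integers(0, 2, 200),    # assumed binary demographic indicator
    "D2": rng.integers(0, 2, 200),
})

base = smf.logit("incident ~ WLB4 + WLB9", data=df).fit(disp=0)
full = smf.logit("incident ~ WLB4 + WLB9 + D1 + D2 + WLB4:D2 + WLB9:D1",
                 data=df).fit(disp=0)
print(base.prsquared, full.prsquared)   # pseudo-R^2 of each specification
```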
Table 12. Confusion Matrices for the Three Optimization Criteria.

| Criterion | Threshold | TP | FP | TN | FN | Precision | Recall | Specificity |
|---|---|---|---|---|---|---|---|---|
| Youden J | 0.47 | 49 | 22 | 55 | 16 | 0.69 | 0.754 | 0.714 |
| F1 Score | 0.47 | 49 | 22 | 55 | 16 | 0.69 | 0.754 | 0.714 |
| High Recall | 0.1 | 62 | 68 | 9 | 3 | 0.477 | 0.954 | 0.117 |
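The threshold calibration behind Table 12 can be sketched as follows: choose cutoffs that maximize Youden's J and F1, plus a fixed low cutoff favoring recall, then report the resulting confusion matrices. The labels and scores below are synthetic placeholders for the actual model outputs.

```python
# Sketch of threshold selection by Youden's J, F1, and a high-recall rule;
# y and p stand in for true labels and predicted probabilities.
import numpy as np
from sklearn.metrics import roc_curve, precision_recall_curve, confusion_matrix

def report(y, p, thr):
    tn, fp, fn, tp = confusion_matrix(y, (p >= thr).astype(int)).ravel()
    return thr, tp, fp, tn, fn

rng = np.random.default_rng(5)
p = rng.uniform(size=142)
y = (rng.uniform(size=142) < p).astype(int)   # toy, roughly calibrated scores

fpr, tpr, thr_roc = roc_curve(y, p)
youden = thr_roc[np.argmax(tpr - fpr)]        # maximizes sensitivity + specificity - 1

prec, rec, thr_pr = precision_recall_curve(y, p)
f1 = 2 * prec * rec / (prec + rec + 1e-12)
f1_thr = thr_pr[np.argmax(f1[:-1])]           # last prec/rec point has no threshold

high_recall = 0.1                             # fixed low cutoff favoring recall

for t in (youden, f1_thr, high_recall):
    print(report(y, p, t))
```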