Behavioural Biometrics and Session-Level Risk Monitoring for Insider Threat Detection in Enterprise Networks

Kuldeyev, Nursultan; Mamyrbayev, Orken; Akhmediyarova, Ainur; Yerzhan, Assel

doi:10.3390/electronics15112400


¹ Institute of Automation and Information Technologies, Satbayev University, Almaty 050013, Kazakhstan

² Institute of Information and Computational Technologies, Almaty 050010, Kazakhstan

* Authors to whom correspondence should be addressed.

Electronics 2026, 15(11), 2400; https://doi.org/10.3390/electronics15112400

Submission received: 20 April 2026 / Revised: 26 May 2026 / Accepted: 28 May 2026 / Published: 1 June 2026


Abstract

Identifying insider threats in modern enterprise environments presents a unique cybersecurity challenge: malicious activity often resembles legitimate user activity, which makes the two difficult to distinguish. This study presents an approach to insider threat detection that analyzes enterprise activity logs for session-level behavioural risk monitoring with behavioural biometrics. Behavioural patterns are modelled as temporal sequences across consecutive monitoring windows to capture both short-term behavioural intensity and long-term behavioural drift. The proposed system uses a hybrid deep learning architecture that combines a Long Short-Term Memory (LSTM) network and an autoencoder to model the temporal dependence of a user’s behaviour and to identify anomalies through reconstruction error analysis. The LSTM network captures the user’s sequential activity, and the autoencoder measures deviation from the user’s typical behavioural profile. The outputs of both models are aggregated using a unified behavioural risk scoring mechanism for session-level risk monitoring and ongoing insider threat assessment. Experimental results on the Insider Threat Dataset for Corporate Environments demonstrate that the proposed approach is effective in classifying normal versus malicious user behaviour. The proposed framework achieves an accuracy of 97.65%, a precision of 96.35%, a recall of 99.05%, an F1-score of 97.68%, and a ROC-AUC of 99.20% on a near-balanced benchmark split. Under realistic class imbalance conditions, the framework achieves a PR-AUC of 0.842 and an MCC of 0.781, representing the more operationally conservative performance estimate. These findings indicate that the proposed framework is a viable solution for integrating behavioural modelling and anomaly detection within continuous enterprise authentication systems.

Keywords:

insider threat detection; behavioural biometrics; session-level risk monitoring; temporal anomaly smoothing; deep learning; LSTM; autoencoder; behavioural risk analysis

1. Introduction

Digital transformation has fundamentally changed how enterprises operate, with organizations increasingly depending on interconnected systems, cloud infrastructures, and large-scale data environments [1]. While these technological advances improve operational efficiency and productivity, they simultaneously introduce complex cybersecurity challenges that organizations must address [2]. Insider threats are among the most significant of these because they originate from someone with legitimate access to organisational systems [3]. In contrast to external attacks, insider threats tend to be difficult to detect because the actions of authorised users are often indistinguishable from normal activities [4,5]. Enterprises produce large volumes of logs that record logins, file access, email activity, and network browsing, which offer valuable insight into how users behave in the enterprise; advanced analytic solutions are therefore needed to monitor user behaviour and identify potential security threats in a timely manner [6].

Insider threats arise from a variety of organizational, behavioural, and technical factors that affect users’ actions on an enterprise system [7]. Most commonly, malicious insiders exploit their privileged access to steal sensitive information, sabotage organizational resources, or obtain financial gain [8]. Insider threats can also be unintentional, resulting from human error or a lack of cybersecurity awareness that leads to accidental misuse of an organization’s resources [9]. Contributing factors include weak access control mechanisms, excessive privilege allocation, limited monitoring of user activity, and credentials compromised through phishing and social engineering attacks [10]. The growing prevalence of remote work and distributed enterprise networks has made it more difficult for organizations to monitor employee behaviour and to detect abnormal or unauthorized access patterns through window-based behavioural analysis [11].

Existing studies have proposed various technical solutions for identifying insider threats and enhancing enterprise security [12]. Traditional security solutions fall into three groups: rule-based monitoring, signature-based intrusion detection systems [13], and static authentication methods. These solutions are poorly suited to sophisticated insider threats because malicious activity may fall outside known malicious patterns [14]. To address this issue, a number of studies have applied machine learning (ML) and deep learning (DL) algorithms to analyze large volumes of behavioural data collected over time within enterprise environments [15]. Sequential learning algorithms, such as Long Short-Term Memory (LSTM) models, have been shown to learn effectively from the temporal dependencies of user activity, which allows them to identify abnormal behaviour over time. Likewise, reconstruction-based neural networks (e.g., autoencoders) can model normal behaviour and identify deviations from this norm that indicate possible insider threats. Although Refs. [16,17] have independently explored LSTM-based sequential modelling and autoencoder-based reconstruction for insider threat detection, no existing framework simultaneously addresses three co-existing challenges in enterprise environments: (1) the instability of single-window anomaly decisions under natural behavioural drift, (2) the absence of a session-level authentication continuity mechanism that aggregates risk across evolving behavioural windows, and (3) the lack of a unified scoring architecture that dynamically balances temporal learning sensitivity against reconstruction stability based on validation-tuned fusion. Existing hybrid systems simply combine LSTM and autoencoder outputs as parallel classifiers for static, session-level threat decisions, without any principled mechanism for resolving conflicting signals or tracking risk across time. The proposed framework addresses these limitations by introducing three architectural mechanisms: a validation-tuned weighted fusion strategy (empirically justified via sensitivity analysis), a persistence-based escalation mechanism requiring anomaly exceedance across k = 3 consecutive windows (reducing false positive escalation from 18% to below 5% while maintaining 99.05% recall), and a session-persistent Session-Level Behavioural Risk Monitoring engine that dynamically updates cumulative behavioural risk across rolling time windows mid-session.

Contributions of the Study:

The contribution of this work lies primarily in the systematic integration and refinement of established behavioural modelling and anomaly detection techniques for session-level insider threat monitoring. The framework introduces several practical enhancements designed to improve detection stability and reduce false positive escalation in enterprise environments.

🡺: A session-persistent behavioural risk monitoring architecture that evaluates insider threat likelihood across consecutive time windows rather than isolated session snapshots, enabling detection of gradual behavioural drift that single-window classifiers miss.
🡺: A validation-tuned late-fusion strategy that dynamically weights temporal sequence sensitivity against reconstruction stability, with empirical justification via sensitivity analysis.
🡺: A persistence-based escalation mechanism that requires sustained anomaly exceedance across k = 3 consecutive windows before security escalation, specifically designed to suppress transient behavioural spikes and reduce false positive escalation in continuous enterprise monitoring.
🡺: A behavioural analysis of the risk score distribution, the temporal evolution of insider threat risk, and user authentication stability.
🡺: A comprehensive evaluation including an ablation study, fusion parameter sensitivity analysis, cross-validation, imbalanced data assessment, and statistical significance testing, providing reproducible evidence of framework reliability.

Research Organization

The remainder of this paper is organized as follows. Section 2 reviews related studies on insider threat detection and behavioural analysis techniques used in enterprise security environments. Section 3 presents the proposed methodology, covering the system architecture, data preprocessing, feature extraction, and development of the LSTM–Autoencoder model for behavioural anomaly detection. Section 4 details the experimental setup and dataset description, including the preprocessing steps for the Insider Threat Dataset for Corporate Environments. Section 5 presents the results and performance evaluation of the proposed LSTM–Autoencoder framework, including behavioural risk score analysis, temporal risk evolution, session authentication stability, confusion matrix evaluation, and overall performance metrics. Section 6 concludes the study and outlines research directions for future work to enhance insider threat detection systems.

2. Literature Review

M. S. Mohamed and A. Arabo [18] proposed a SIEM-integrated insider threat detection framework using behavioural analytics on enterprise logs. Their system improves real-time detection, but lacks advanced temporal modelling of user behaviour. S. S. P. Pennada et al. [19] used machine learning and deep learning for behavioural analysis of user activities. However, the method does not capture long-term temporal dependencies. S. S. Abba et al. [20] proposed a continuous authentication framework using behavioural biometrics for zero-trust remote work environments. However, their system focuses on identity re-verification through static biometric templates and does not incorporate anomaly-based risk modelling. Unlike their approach, the proposed Session-Level Behavioural Risk Monitoring Model aggregates behavioural risk across consecutive temporal windows and applies persistence-based escalation for insider threat detection, extending beyond identity verification toward dynamic risk assessment. O. O. Aramide [21] applied AI-based behavioural authentication for identity verification, but it is not focused on insider threat detection.

I. Ibraheem et al. [22] proposed a user behaviour analytics framework for anomaly detection, but it lacks deep sequential learning for long-term behaviour modelling. J. Hu et al. [23] focused on authentication traceability in enterprise systems, but did not include behavioural anomaly detection. U. Uslu et al. [24] evaluated deep learning models for continuous authentication using behavioural biometrics, concentrating on persistent user identity verification. However, their research does not address insider threat detection or behavioural anomaly scoring. In contrast, the proposed Session-Level Behavioural Risk Monitoring Model goes beyond authentication by accumulating risk scores across multiple monitoring windows and triggering escalation only upon sustained anomalous behaviour, rather than performing periodic identity re-confirmation. A. Orun et al. [25] focused on cognitive behavioural authentication, but their approach lacks anomaly-based detection capability. X. Tao et al. [26] used temporal convolutional networks for authentication, but did not include risk-based anomaly scoring. Manoharan et al. [1] proposed a BiLSTM-based insider threat model that uses sequential user behaviour data to capture both forward and backward temporal dependencies for improved detection accuracy. It effectively learns time-dependent patterns in user activities for anomaly classification. However, it is limited to single-source log data and lacks multi-dimensional feature integration. Yi and Tian [27] proposed a hybrid insider threat detection system that uses a CNN-based approach to combine spatial feature extraction from user activity logs with SMOTE and ADASYN class-imbalance handling. However, the system cannot maintain continuous monitoring because it neither tracks long-term temporal dependencies nor combines the different behavioural data streams needed for detecting evolving insider threats. The authors of [28] proposed a BRITD model that analyzes user behaviour by capturing rhythmic and sequential patterns using stacked BiLSTM networks. It effectively detects evolving insider threats by learning temporal dependencies in user activities. However, its performance depends on historical behaviour patterns, and it may be less effective in identifying completely new or abrupt attack behaviours.

Existing studies have investigated sequential behavioural modelling for insider threat detection using recurrent deep learning architectures. For example, Villarreal-Vasquez et al. [6] employed LSTM-based sequential learning to capture temporal dependencies in user behavioural activity logs for anomaly detection. Similarly, other behavioural sequence modelling approaches have demonstrated the effectiveness of temporal learning for identifying evolving insider threat patterns. Reconstruction-based anomaly detection has also been explored independently in the literature. Saminathan et al. [17] utilized autoencoder-based behavioural reconstruction to model normal user activity and identify anomalous deviations through reconstruction error analysis. In addition, multi-stage and layered insider threat detection frameworks have been proposed to improve behavioural monitoring and anomaly identification. He et al. [16] introduced a double-layer insider threat detection framework combining behavioural analysis and multi-level anomaly assessment mechanisms. While these studies demonstrate the effectiveness of individual sequential modelling or reconstruction-based anomaly detection strategies, existing approaches generally lack a unified session-persistent behavioural risk monitoring architecture that combines temporal sequence learning, reconstruction-based deviation analysis, and persistence-aware continuous authentication within a single integrated framework. Existing studies have demonstrated the effectiveness of LSTM-based temporal modelling and autoencoder-based anomaly detection for insider threat detection. However, most approaches perform detection at the individual session level and provide limited support for persistence-aware behavioural risk monitoring across consecutive time windows. 
The proposed framework addresses this limitation by integrating temporal sequence learning, reconstruction-based anomaly detection, weighted risk fusion, and session-level risk monitoring within a unified behavioural assessment framework, improving detection stability and reducing false positive escalation.

Problem Statement

The detection of insider threats within the enterprise workplace remains challenging despite advancements in cybersecurity technologies, owing to the complexity of user behaviour and the instability of malicious behaviour [29]. Existing security solutions rely primarily on traditional methods (e.g., rule-based monitoring systems, signature-based intrusion detection systems, static authentication) that depend heavily on pre-defined rules or on simply validating a user’s credentials [30]. These solutions are ineffective against sophisticated insider attacks because an authorized user can conduct malicious actions that appear similar to normal operations and therefore go undetected [31]. Additionally, the growing volume of data processed in the enterprise, the expansion of remote working environments, and the increasing complexity of enterprise networks have made it more difficult for organizations to monitor user activity continuously and detect anomalies across consecutive monitoring windows. To overcome these challenges, this study develops a Session-Level Behavioural Risk Monitoring Model framework that applies a behavioural biometrics approach to user activity logs. Deep learning models (e.g., LSTM, Autoencoder) capture the temporal behaviour of users and detect anomalies, with the aim of increasing the accuracy and reliability of insider threat detection in enterprise networks.

3. Methodology

Figure 1 presents the proposed behavioural biometrics–driven framework for session-level behavioural risk monitoring and insider threat detection in enterprise environments. The process begins with the collection of enterprise activity logs from the insider threat dataset, which include login events, file access, and network interactions. A data preprocessing stage then performs cleaning and normalization to ensure data quality and consistency. Indicators of user activity patterns are derived through behavioural feature extraction. The LSTM model analyzes behavioural sequences, capturing temporal dependencies to learn how users behave over time. The autoencoder model identifies abnormal user behaviour by reconstructing behaviour and using the reconstruction error as an anomaly indicator. The system integrates the LSTM temporal modelling outputs with the autoencoder reconstruction results to generate an insider threat score that reflects both temporal behavioural changes and reconstruction anomalies. The risk score drives the Session-Level Behavioural Risk Monitoring Model, which monitors user behaviour across multiple windows to distinguish normal activities from potential insider threats. The framework is evaluated using accuracy, precision, recall, F1-score, and confusion matrix analysis, together with behavioural risk assessment through risk score distribution, aggregated insider risk trends, temporal risk evolution, session authentication stability, and behavioural stability comparison.
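The score integration and multi-window monitoring described above, together with the k = 3 persistence rule stated in the Introduction, can be illustrated with a minimal sketch. The fusion weight `alpha = 0.6`, the threshold `0.5`, the function names, and the example scores are hypothetical placeholders chosen for illustration; the paper tunes the fusion weight on validation data and does not give its value in this section.

```python
from typing import List


def fuse_scores(lstm_score: float, ae_score: float, alpha: float = 0.6) -> float:
    """Weighted late fusion of two anomaly scores assumed to lie in [0, 1].

    alpha is a hypothetical weight, not the validation-tuned value from the paper.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * lstm_score + (1.0 - alpha) * ae_score


def escalation_flags(risk: List[float], threshold: float = 0.5, k: int = 3) -> List[bool]:
    """Flag a window only when the last k windows (including it) all exceed the threshold."""
    flags = []
    run = 0  # length of the current streak of above-threshold windows
    for r in risk:
        run = run + 1 if r > threshold else 0
        flags.append(run >= k)
    return flags


# Synthetic per-window scores: one isolated spike, then a sustained elevation.
lstm_scores = [0.2, 0.9, 0.3, 0.8, 0.9, 0.7, 0.2]
ae_scores = [0.1, 0.8, 0.2, 0.7, 0.8, 0.9, 0.1]

risk = [fuse_scores(l, a) for l, a in zip(lstm_scores, ae_scores)]
flags = escalation_flags(risk)
```

In this example the isolated spike in the second window does not trigger escalation, whereas the third consecutive elevated window does, which is the behaviour the persistence rule is intended to produce.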

3.1. Behavioural Feature Extraction

Feature extraction identifies the user behaviour patterns that are essential for detecting unusual user activities. The enterprise behavioural insider threat dataset provides pre-processed logs describing how users interact with enterprise systems through login activity, file access, network usage, and email communication.

These behavioural indicators allow user behaviour patterns to be analysed in order to separate normal activity from potential security threats. User behaviour data are stored in structured feature representations that track behaviour over different time periods. These representations serve as inputs to the sequential learning model (LSTM) and the reconstruction-based learning model (Autoencoder), which learn patterns of normal behaviour and thereby help detect insider threats.

3.1.1. User Login Pattern Features

The system tracks user access patterns to determine when and how often users access the enterprise system. Three user authentication patterns are tracked: login frequency, login time distribution, and session duration. Login frequency represents how often a user logs into the system within a specific time period. It can be expressed as shown in Equation (1).

$$LF_{u} = \sum_{i=1}^{n} L_{i} \tag{1}$$

where $LF_{u}$ denotes the login frequency of user $u$, $L_{i}$ represents an individual login event, and $n$ indicates the total number of login events observed during the monitoring period. Higher or unusual login frequencies may indicate suspicious behavior.

$LF_{u}$ is a scalar count; its diagnostic power emerges through temporal evolution across consecutive windows. The LSTM captures sequential login anomalies such as after-hours access, rapid re-authentication, or credential misuse that are invisible in any single-window snapshot.

3.1.2. File Access Behavior

The system tracks all user activities that show how they interact with the organization’s documents and files. Abnormal file access activities, such as accessing a large number of files or unusual file types, may indicate insider threats. The file access rate can be computed as shown in Equation (2).

$$FA_{u} = \frac{N_{f}}{T} \tag{2}$$

where $FA_{u}$ denotes the file access rate for user $u$, $N_{f}$ represents the number of files accessed within the observation period, and $T$ denotes the duration of that interval. Significant deviations in $FA_{u}$ may indicate data exfiltration behaviour.

$FA_{u}$ serves as a per-window marker whose significance is amplified by multi-window modelling. Sustained elevation across windows rather than any single high value characterises data exfiltration, which the autoencoder detects as a reconstruction deviation from the user’s learned behavioural profile.

3.1.3. Network Activity Features

These features represent user interactions with network resources and services. They incorporate data about visited URLs, request frequency, and browsing behavior. The network request frequency can be expressed as shown in Equation (3).

$$NR_{u} = \sum_{j=1}^{m} R_{j} \tag{3}$$

where $NR_{u}$ denotes the total number of network requests generated by user $u$, $R_{j}$ represents an individual network request, and $m$ is the total number of requests within the observation period. Unusual browsing patterns or excessive network requests may indicate suspicious activities.

$NR_{u}$ gains behavioural meaning through its temporal trajectory. Normal users exhibit predictable browsing rhythms, whereas insider threats manifest as irregular bursts of external communication or off-hours access patterns captured by the LSTM’s sequential modelling rather than the instantaneous value alone.

3.1.4. Email Communication Behavior

These features track the patterns that occur when employees send and receive emails through the enterprise network. Employees who pose an insider threat may display unusual communication patterns, such as sending an excessive number of emails or large file attachments. Email activity can be quantified as shown in Equation (4).

$$EA_{u} = E_{s} + E_{r} \tag{4}$$

where $EA_{u}$ denotes the total email activity of user $u$, $E_{s}$ represents the number of sent emails, and $E_{r}$ represents the number of received emails. Abnormally high email activity or unusual attachment sharing may indicate potential insider threats.

$EA_{u}$ reflects the user’s communication footprint. Gradual increases in external emails or anomalous attachment behaviour typical of pre-resignation data leakage appear as temporal deviations that the autoencoder detects through reconstruction error on the learned normal communication baseline.

The 16 behavioural features were selected based on three criteria: relevance to documented insider threat patterns, temporal discriminability across monitoring windows, and anomaly sensitivity confirmed during validation. The five feature categories (authentication, file access, network activity, email communication, and system usage) collectively cover the enterprise interaction channels through which insider threats are known to manifest, which makes evasion through suppression of any single indicator harder. While Equations (1)–(4) define low-level per-window aggregates, the behavioural complexity of insider threats is encoded in their sequential evolution rather than in isolated values. The LSTM transforms these aggregates into rich temporal embeddings across 30 consecutive windows, while the autoencoder detects deviations from the joint normal distribution of the full 16-feature vector. A sophisticated insider who suppresses one observable metric is likely to perturb correlated feature dimensions, raising reconstruction error across the full behavioural profile. This multi-dimensional robustness is consistent with the findings of [32], which demonstrate that principled feature selection in deep learning systems mitigates adversarial transferability by over 60% in network security contexts, underscoring that feature design directly governs model robustness against evasive behaviours.

The feature extraction in the proposed framework operates at two hierarchical levels. At the first level, user activities occurring within each 30-min monitoring window are aggregated into a fixed-dimensional behavioural feature vector of size 16. Scalar statistics such as login frequency (Equation (1)), file access rate (Equation (2)), network request count (Equation (3)), and email activity (Equation (4)) are computed independently for each window, capturing the behavioural intensity within that interval. At the second level, 30 such consecutive window-level feature vectors are organized in chronological order to form a temporal sequence of shape (30 × 16), representing approximately 7.5 h of user activity. This sequence constitutes the input to the LSTM model, enabling temporal dependency learning across windows rather than within them. The autoencoder receives the same (30 × 16) input for reconstruction-based anomaly detection. This two-level design (intra-window aggregation followed by inter-window sequential modelling) is fundamental to the proposed framework’s ability to detect both short-term behavioural deviations and gradual long-term drift. Specifically, within each 30-min window, login frequency is the number of login events, file access rate reflects the number of file operations, and email activity is the number of emails sent or received.
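The second level of this design, arranging chronologically ordered window vectors into (30 × 16) sequences, can be sketched as follows. This is an illustrative sketch only: the function name `build_sequences`, the `step` parameter, and the synthetic data are assumptions, and the text does not state how consecutive sequences overlap. Note that with back-to-back 30-min windows, 30 time steps cover 15 h, so the 7.5 h span quoted above would correspond to a different window spacing (for example, windows starting every 15 min), which this section does not specify.

```python
from typing import List, Sequence

WINDOW_MINUTES = 30   # window length stated in the text
SEQ_LEN = 30          # number of consecutive windows per sequence
N_FEATURES = 16       # features per window


def build_sequences(
    windows: Sequence[Sequence[float]],
    seq_len: int = SEQ_LEN,
    step: int = 1,  # hypothetical: offset (in windows) between successive sequences
) -> List[List[List[float]]]:
    """Slice chronologically ordered window vectors into seq_len x n_features matrices."""
    sequences = []
    for start in range(0, len(windows) - seq_len + 1, step):
        sequences.append([list(v) for v in windows[start:start + seq_len]])
    return sequences


# Synthetic example: 40 windows for one user, each a 16-dimensional vector.
windows = [[float(t)] * N_FEATURES for t in range(40)]
sequences = build_sequences(windows)

# Time covered by one sequence if the windows do not overlap.
span_hours = SEQ_LEN * WINDOW_MINUTES / 60
```

Each element of `sequences` has rows as time steps and columns as features, matching the layout described for the LSTM and autoencoder inputs.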

The extracted behavioural features provide a complete view of how users interact with the enterprise system. The resulting feature vectors are subsequently used as input to the LSTM and Autoencoder models to detect anomalous behaviours associated with insider threats. Categorical attributes present in the dataset, including accessed URL categories, file type classifications, and device identifiers, were transformed into numerical representations using one-hot encoding for low-cardinality features (fewer than 10 distinct values). For high-cardinality categorical attributes, rare categories appearing in fewer than 1% of records were consolidated into a single “Other” group before encoding. This grouping threshold was determined exclusively from the training subset, and the resulting category map was applied to the testing subset without re-fitting, ensuring no information leakage. Session duration features were derived by computing the difference between the first and last event timestamps within each 30-min window. Features exhibiting near-zero variance across the training set (variance < 0.001) were excluded from the final feature vector, as such features provide negligible discriminative information. The final 16-feature set was validated by confirming that each feature contributed meaningfully to model performance.
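The near-zero variance filter mentioned above can be illustrated with a short sketch. The function names and the sample matrix are hypothetical, and the use of population variance is an assumption, since the text does not say which variance estimator was used.

```python
from statistics import pvariance
from typing import List


def low_variance_columns(train: List[List[float]], threshold: float = 0.001) -> List[int]:
    """Return indices of feature columns whose training-set variance is below the threshold."""
    return [j for j, col in enumerate(zip(*train)) if pvariance(col) < threshold]


def drop_columns(rows: List[List[float]], drop: List[int]) -> List[List[float]]:
    """Remove the listed column indices from every row."""
    skip = set(drop)
    return [[x for j, x in enumerate(row) if j not in skip] for row in rows]


# Synthetic training matrix: column 1 is constant, column 2 barely varies.
train = [[0.1, 1.0, 0.50], [0.9, 1.0, 0.51], [0.4, 1.0, 0.50]]

drop = low_variance_columns(train)   # decided on the training data only
train_kept = drop_columns(train, drop)
```

The same `drop` list would then be applied unchanged to validation and test rows, in line with the leakage precautions described for the categorical mapping.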

3.1.5. Categorical Feature Encoding

Several behavioural attributes in the enterprise activity logs are categorical in nature and must be transformed into numerical representations before being processed by the LSTM and Autoencoder models. The categorical variables present in the dataset include accessed URL categories, file type classifications, email activity labels, and device interaction identifiers. Since deep learning models require numerical inputs, a two-stage encoding strategy was applied to handle these variables systematically. In the first stage, rare category consolidation was performed to reduce noise from infrequent values. Categorical features often contain low-frequency values that contribute minimal discriminative information and may negatively affect model training stability. Any category appearing in fewer than 1% of training records was consolidated into a unified “Other” group prior to encoding. This consolidation mapping was derived exclusively from the training subset and applied to the test subset without re-fitting, ensuring no information leakage between data partitions.

In the second stage, one-hot encoding was applied to all categorical features containing fewer than 10 distinct values after consolidation. Each categorical variable was transformed into a binary indicator vector, where a value of 1 indicates the presence of a particular category and 0 indicates its absence. This encoding preserves the nominal nature of categorical variables without imposing artificial ordinal relationships between unordered categories such as URL types, file classifications, or email activity labels. Label encoding was explicitly avoided because it assigns arbitrary integer ranks to unordered categories, which can mislead the sequential learning process by implying magnitude relationships that do not exist between categories. The resulting encoded binary vectors were concatenated with standardized numerical behavioural features, including login frequency, file access rate, network request counts, and email activity counts, to construct the final 16-dimensional feature vector used as input to both the LSTM and Autoencoder models. This unified representation ensures that both continuous and categorical behavioural signals contribute consistently to insider threat modelling and anomaly detection across sequential monitoring windows.
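The two-stage encoding can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names, the example URL categories, and the fallback of unseen test values to “Other” are hypothetical choices, not details given in the text.

```python
from collections import Counter
from typing import List


def fit_category_map(train_values: List[str], min_share: float = 0.01) -> List[str]:
    """Stage 1: keep categories covering at least min_share of training records.

    Rare categories are represented by a single 'Other' slot.
    """
    counts = Counter(train_values)
    total = len(train_values)
    kept = sorted(c for c, n in counts.items() if n / total >= min_share)
    if len(kept) < len(counts) and "Other" not in kept:
        kept.append("Other")
    return kept


def one_hot(value: str, vocab: List[str]) -> List[int]:
    """Stage 2: binary indicator vector over the fitted vocabulary.

    Values outside the vocabulary map to 'Other' when that slot exists
    (an assumption); otherwise the vector is all zeros.
    """
    if value not in vocab and "Other" in vocab:
        value = "Other"
    return [1 if value == c else 0 for c in vocab]


# Synthetic training column: 'torrent' appears in 0.5% of records, so it is rare.
train_urls = ["news"] * 120 + ["mail"] * 79 + ["torrent"] * 1
vocab = fit_category_map(train_urls)   # fitted on the training subset only

encoded_news = one_hot("news", vocab)
encoded_rare = one_hot("torrent", vocab)
encoded_unseen = one_hot("gaming", vocab)  # value never seen in training
```

Because `vocab` is fitted once on the training subset and reused as-is, the test subset cannot influence which categories are treated as rare.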

3.1.6. Feature Aggregation Procedure

Each 30-min monitoring window is defined as the set of all raw activity log records whose timestamps fall within the interval for a given user. Within each window, records are grouped by user identifier, and each of the 16 behavioural features is computed as a scalar statistic summarising the user’s activity during that window. Count-based features (login frequency, file access count, network request count, email sent count, email received count, failed login attempts, privilege escalation events) are computed as simple event totals within the window. Rate-based features (file access rate per minute, network request rate) are computed by dividing event counts by the window duration in minutes (30). Duration features (session duration) are computed as the difference in seconds between the first and last event timestamps within the window. Binary indicator features (e.g., presence of sensitive file access, external URL access) take the value 1 if at least one qualifying event occurred within the window and 0 otherwise. Categorical features (URL category, file type) were one-hot encoded after rare category consolidation (categories appearing in fewer than 1% of training records were consolidated to “Other”), then concatenated with the numerical features to form the full feature vector.
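The aggregation procedure can be sketched for a handful of the feature types. The event schema `(user_id, unix_timestamp, event_type)`, the event type labels, and the feature names below are hypothetical, since the dataset's actual log format is not given in this section; only six of the 16 features are shown.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

WINDOW_MINUTES = 30
WINDOW_SECONDS = WINDOW_MINUTES * 60

# Hypothetical log record: (user_id, unix_timestamp_seconds, event_type)
Event = Tuple[str, int, str]


def aggregate_windows(events: List[Event]) -> Dict[Tuple[str, int], Dict[str, float]]:
    """Group events by (user, window index) and compute per-window scalar features."""
    grouped = defaultdict(list)
    for user, ts, kind in events:
        grouped[(user, ts // WINDOW_SECONDS)].append((ts, kind))

    features = {}
    for key, rows in grouped.items():
        times = [ts for ts, _ in rows]
        kinds = [kind for _, kind in rows]
        features[key] = {
            "login_frequency": float(kinds.count("login")),                # count, Eq. (1)
            "file_access_rate": kinds.count("file") / WINDOW_MINUTES,      # per minute, Eq. (2)
            "network_requests": float(kinds.count("http")),                # count, Eq. (3)
            "email_activity": float(kinds.count("email_sent")
                                    + kinds.count("email_received")),      # Eq. (4)
            "session_duration_s": float(max(times) - min(times)),          # first to last event
            "sensitive_file_flag": 1.0 if "sensitive_file" in kinds else 0.0,  # binary indicator
        }
    return features


events = [
    ("u1", 0, "login"), ("u1", 60, "file"), ("u1", 120, "file"),
    ("u1", 600, "http"), ("u1", 900, "email_sent"), ("u1", 1799, "email_received"),
    ("u1", 1800, "login"), ("u1", 1900, "sensitive_file"),
]
feats = aggregate_windows(events)
w0 = feats[("u1", 0)]
w1 = feats[("u1", 1)]
```

Sorting the resulting keys by window index for each user gives the chronological order needed before the vectors are stacked into sequences.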

The resulting per-window feature vectors for each user are sorted in ascending chronological order. Thirty consecutive vectors are arranged into a matrix of shape (30 × 16), where rows correspond to time steps (windows) and columns correspond to features. This matrix constitutes one input sequence for the LSTM and Autoencoder models. All 16 features are scaled using min-max normalization bounds computed from the training subset before sequence assembly, ensuring a consistent numerical scale across all feature dimensions.
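The aggregation and sequence-assembly procedure can be sketched as follows. This is a simplified illustration under stated assumptions: events are `(user, unix_timestamp, event_type)` tuples, only a few of the 16 features are shown, the event-type strings are hypothetical, and the stride between consecutive sequences (not specified in the text) is assumed to be one window.

```python
from collections import defaultdict

WINDOW_SECONDS = 30 * 60  # 30-min monitoring window
SEQ_LEN = 30              # windows per input sequence

def window_features(events):
    """Group events by (user, window index) and compute per-window statistics."""
    grouped = defaultdict(list)
    for user, ts, kind in events:
        grouped[(user, ts // WINDOW_SECONDS)].append((ts, kind))
    features = {}
    for key, recs in grouped.items():
        stamps = [ts for ts, _ in recs]
        n_file = sum(1 for _, k in recs if k == "file_access")
        features[key] = {
            "login_count": sum(1 for _, k in recs if k == "login"),  # count-based
            "file_access_count": n_file,                             # count-based
            "file_access_rate": n_file / 30.0,                       # events per minute
            "session_duration": max(stamps) - min(stamps),           # seconds
            "sensitive_access": int(any(k == "sensitive_file" for _, k in recs)),
        }
    return features

def min_max_scale(value, lo, hi):
    """Scale with bounds (lo, hi) that were computed on the training subset only."""
    return 0.0 if hi == lo else (value - lo) / (hi - lo)

def sliding_sequences(vectors, seq_len=SEQ_LEN, stride=1):
    """Arrange chronologically sorted window vectors into fixed-length sequences."""
    return [vectors[i:i + seq_len]
            for i in range(0, len(vectors) - seq_len + 1, stride)]

events = [("u1", 0, "login"), ("u1", 600, "file_access"),
          ("u1", 1200, "file_access"), ("u1", 1900, "sensitive_file")]
feats = window_features(events)
sequences = sliding_sequences(list(range(32)))  # 32 windows -> 3 sequences
```

Keeping the scaling bounds as explicit arguments makes it harder to recompute them on validation or test data by accident, which is the leakage the text warns against.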

3.2. Deep Learning-Based Behavioural Sequence Models

Detecting insider threats within corporate networks requires analysing how user behaviour evolves over time. User activities such as login attempts, file access operations, network browsing, and email communications occur sequentially and form behavioural patterns that can reveal abnormal activity. Static analysis is insufficient for detecting sophisticated behavioural changes, because attackers shape their actions to resemble normal user behaviour. The proposed system therefore uses sequential learning techniques to track user activity patterns over time and to detect subtle changes in behaviour. Deep learning models are applied to the time-series behavioural data to detect suspicious user activities.

3.2.1. LSTM-Based Behavioural Sequence Model

The LSTM network is well suited to modelling user activity patterns that evolve over time. It is a specialized type of recurrent neural network that captures long-range dependencies in sequential data while mitigating the vanishing gradient problem found in standard recurrent networks. Its internal memory cell stores relevant behavioural information across multiple time steps, which allows the model to learn complex user activity patterns and improves its ability to identify abnormal user activities.

Figure 2 shows the LSTM architecture used in the proposed framework. The LSTM network analyses user behaviour by processing sequential activity data and learning its temporal dependencies. The memory cell uses three gates to govern the flow of information. The forget gate determines which information from the previous cell state should be retained or discarded, while the input gate integrates new behavioural information obtained from the current user activity features. The output gate generates the hidden state that represents the learned behavioural pattern at each time step. In the proposed insider threat detection framework, the LSTM model analyses sequential behavioural features, including login activities, file access patterns, and session interactions, from the Insider Threat Dataset for Corporate Environments. The model learns temporal patterns of normal user behaviour, which allows it to detect suspicious behaviour consistent with insider threats and thereby supports session-level behavioural risk monitoring and dynamic behavioural risk assessment in enterprise environments.

In its basic recurrent form, the hidden state of the network at time step $t$ is computed as shown in Equation (5).

h_{t} = f (W_{h} x_{t} + U_{h} h_{t - 1} + b_{h})

(5)

where $h_{t}$ represents the hidden state at time step $t$, $x_{t}$ denotes the input behavioural feature vector at time $t$, $h_{t - 1}$ represents the hidden state from the previous time step, $W_{h}$ and $U_{h}$ are the weight matrices associated with the input and recurrent connections, $b_{h}$ denotes the bias vector, and $f (\cdot)$ represents the activation function.

To regulate the flow of information through the network, LSTM employs gating mechanisms. The input gate controls how much new information should be stored in the memory cell and is defined as shown in Equation (6).

i_{t} = σ (W_{i} x_{t} + U_{i} h_{t - 1} + b_{i})

(6)

where $i_{t}$ represents the input gate activation, $W_{i}$ and $U_{i}$ denote the weight matrices, $b_{i}$ is the bias term, and $σ$ represents the sigmoid activation function.

The forget gate determines how much of the information in the previous memory state is retained and how much is discarded, as shown in Equation (7).

f_{t} = σ (W_{f} x_{t} + U_{f} h_{t - 1} + b_{f})

(7)

where $f_{t}$ denotes the forget gate activation, and $W_{f}$, $U_{f}$, and $b_{f}$ represent the corresponding parameters.

The memory cell state is updated by merging the retained and newly learned information, as defined in Equation (8).

C_{t} = f_{t} \cdot C_{t - 1} + i_{t} \cdot {\tilde{C}}_{t}

(8)

where ${\tilde{C}}_{t} = \tanh (W_{c} x_{t} + U_{c} h_{t - 1} + b_{c})$ denotes the candidate memory content generated by the network, $C_{t}$ represents the updated memory cell state, and $C_{t - 1}$ denotes the previous memory state.
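To make the gate equations concrete, the following dependency-free Python sketch performs one LSTM time step following Equations (6)–(8). The output gate and the final hidden-state computation use the standard LSTM formulation, $o_{t} = σ (W_{o} x_{t} + U_{o} h_{t - 1} + b_{o})$ and $h_{t} = o_{t} \cdot \tanh (C_{t})$, which is an assumption here because the text describes the output gate only in prose. The function names and the parameter layout are hypothetical.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def affine(W, x, U, h, b):
    """Compute W x + U h + b for list-of-lists matrices and list vectors."""
    return [
        sum(w * xi for w, xi in zip(W[r], x))
        + sum(u * hi for u, hi in zip(U[r], h))
        + b[r]
        for r in range(len(b))
    ]

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step; params maps a gate name to its (W, U, b) triple."""
    def gate(name, act):
        W, U, b = params[name]
        return [act(v) for v in affine(W, x, U, h_prev, b)]

    i = gate("i", sigmoid)            # input gate, Eq. (6)
    f = gate("f", sigmoid)            # forget gate, Eq. (7)
    c_tilde = gate("c", math.tanh)    # candidate memory content
    c = [ft * cp + it * ct            # cell state update, Eq. (8)
         for ft, cp, it, ct in zip(f, c_prev, i, c_tilde)]
    o = gate("o", sigmoid)            # output gate (standard form, assumed)
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    return h, c

# Toy check with all-zero parameters: every gate equals 0.5 and the candidate is 0,
# so the new cell state is half of the previous one.
zero = {name: ([[0.0, 0.0]], [[0.0]], [0.0]) for name in "ifco"}
h, c = lstm_step([0.3, 0.7], [0.0], [1.0], zero)
```

In practice a deep learning library would supply this cell; the sketch only shows how the forget and input gates jointly decide what the memory cell keeps between windows.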

3.2.2. Autoencoder-Based Anomaly Detection

The system employs an anomaly detection mechanism based on an autoencoder model to detect unusual user behaviour in enterprise environments. Autoencoders are unsupervised deep learning models that learn compact representations of their input by compressing it and then reconstructing it. In this study, the autoencoder was trained on normal user behaviour data taken from the enterprise activity logs of the insider threat dataset, so that it learns the characteristics of legitimate user activity. When the trained model receives new behavioural patterns that differ from the learned normal patterns, the resulting reconstruction error is used to flag potential insider threats.

The autoencoder architecture used in the insider threat detection system is shown in Figure 3. It consists of two primary components, an encoder and a decoder, connected through a latent representation layer. The encoder transforms high-dimensional behavioural input data into a compact latent feature space that captures the essential characteristics of user activities. The decoder reconstructs the original behavioural feature representation from the latent representation, and both components are trained by minimizing the reconstruction loss. Training uses normal user behaviour data from the Insider Threat Dataset for Corporate Environments. The model therefore produces higher reconstruction errors when users display abnormal behaviour or engage in malicious activities, because these actions differ from the learned behaviour patterns. These reconstruction deviations serve as indicators of unusual behaviour and contribute to the behavioural risk score within the complete detection system. In this way, the autoencoder supports insider threat detection and session-level behavioural risk monitoring in enterprise environments by learning compact behavioural models and detecting reconstruction anomalies.

1.: Encoder Layer

The encoder network converts the high-dimensional behavioural feature vector into a compact latent representation that preserves the essential attributes of user behaviour. Reducing the dimensionality forces the model to concentrate on the most informative behavioural elements and to discard redundant information. The encoding process is represented mathematically as shown in Equation (9).

z = f (W_{e} x + b_{e})

(9)

where $z$ denotes the latent representation, $x$ represents the input behavioural feature vector, $W_{e}$ denotes the encoder weight matrix, $b_{e}$ represents the bias vector, and $f (\cdot)$ is the nonlinear activation function applied in the encoder layer.

2.: Decoder Layer

The decoder network reconstructs the original behavioural feature vector from the latent representation generated by the encoder. Ideally, the decoder output closely matches the input. When the input behavioural pattern differs from the normal patterns learned during training, the reconstruction is inaccurate. The decoding process is defined as shown in Equation (10).

\hat{x} = g (W_{d} z + b_{d})

(10)

where $\hat{x}$ denotes the reconstructed feature vector, $z$ represents the latent representation obtained from the encoder, $W_{d}$ denotes the decoder weight matrix, $b_{d}$ represents the bias vector, and $g (\cdot)$ denotes the activation function used in the decoder layer.

3.: Reconstruction Error

The reconstruction error between the original input and the reconstructed output is used to determine whether a behavioural sequence is normal or anomalous. This error serves as the anomaly score that signals potentially dangerous user behaviour. The reconstruction error is calculated as shown in Equation (11).

E = ‖ x - \hat{x} ‖^{2}

(11)

where $E$ represents the reconstruction error, $x$ denotes the original behavioural feature vector, and $\hat{x}$ represents the reconstructed feature vector produced by the autoencoder model. A larger reconstruction error implies a greater deviation from normal behavioural patterns and may indicate insider threat activity in the enterprise system.
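Equations (9)–(11) can be traced with a small single-layer example. This is a minimal sketch, not the trained model: the choice of `tanh` for $f$ and the identity for $g$, the toy weights, and the function names are all assumptions made for illustration.

```python
import math

def dense(W, v, b, act):
    """Apply act(W v + b) row by row."""
    return [act(sum(w * x for w, x in zip(row, v)) + bias)
            for row, bias in zip(W, b)]

def reconstruction_error(x, W_e, b_e, W_d, b_d):
    """Return (E, x_hat) for a one-layer encoder and one-layer decoder."""
    z = dense(W_e, x, b_e, math.tanh)         # Eq. (9), f = tanh (assumed)
    x_hat = dense(W_d, z, b_d, lambda v: v)   # Eq. (10), g = identity (assumed)
    err = sum((a - r) ** 2 for a, r in zip(x, x_hat))  # Eq. (11), squared L2 norm
    return err, x_hat

# Toy weights: a 2-dimensional input is compressed to a 1-dimensional latent code,
# so the second input dimension cannot be recovered and contributes to the error.
x = [0.2, 0.4]
E, x_hat = reconstruction_error(
    x,
    W_e=[[1.0, 0.0]], b_e=[0.0],
    W_d=[[1.0], [0.0]], b_d=[0.0, 0.0],
)
```

The example shows the mechanism the detector relies on: whatever the bottleneck cannot represent shows up as reconstruction error, and inputs unlike the training data are the ones the bottleneck represents worst.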

3.2.3. Unified Insider Threat Score

The unified insider threat score combines the temporal behavioural deviation score and the reconstruction-based anomaly score to produce a robust and dynamic risk indicator. Because insider threats often manifest as subtle changes in behaviour over time rather than as a sudden abnormal event, relying on a single anomaly measure can lead to unstable detection. The proposed architecture therefore combines the sequential dependency learning of the LSTM with the reconstruction error of the autoencoder.

The temporal anomaly component is derived from the LSTM hidden representation, as given by Equation (12):

S_{LSTM} (t) = g (h_{t})

(12)

where $h_{t}$ represents the temporal behavioural embedding at time window $t$, and $g (\cdot)$ is a transformation function (e.g., a fully connected layer with sigmoid activation) that converts the hidden state into a probability-based anomaly score. This score captures deviations in evolving user behaviour patterns across time windows.

The reconstruction-based anomaly component is defined in Equation (13):

S_{AE} (t) = L_{rec} (t)

(13)

where $L_{rec} (t)$ denotes the mean squared reconstruction error between the actual and reconstructed behavioural features. High reconstruction error values indicate that the observed behaviour deviates from normal behavioural observations.

The final insider threat score is computed as a weighted combination denoted by Equation (14):

S_{final} (t) = α S_{LSTM} (t) + (1 - α) S_{AE} (t)

(14)

where:

$α \in [0, 1]$ controls the relative importance of temporal learning versus reconstruction deviation.
When $α$ is higher, the system emphasizes sequential behavioral evolution.
When $α$ is lower, reconstruction deviation plays a dominant role.

The weighted fusion in Equation (14) is theoretically grounded in the Naive Bayes classifier combination framework [33], which demonstrates that when two classifiers produce conditionally independent error distributions, the optimal score-level fusion reduces to a linear combination of the individual scores. As detailed in Algorithm 1, the pipeline generates two complementary scores, $S_{LSTM} (t)$ and $S_{AE} (t)$. Since $S_{LSTM} (t)$ originates from discriminative sequential learning and $S_{AE} (t)$ from unsupervised reconstruction, their errors are structurally independent: the LSTM captures known temporal threat patterns, while the autoencoder detects unknown deviations from normality. The convexity constraint (α + (1 − α) = 1) ensures that $S_{final} (t)$ remains within a probability-interpretable range [0, 1], provided that both component scores are themselves scaled to [0, 1], which preserves the semantic validity of the threshold comparison. The fusion weight α = 0.7 is not an arbitrary selection but an empirically calibrated estimate of the relative posterior reliability of each model’s signal, determined through validation-based sensitivity analysis.

The fusion mechanism differs from prior hybrid approaches in two critical ways. First, existing LSTM-Autoencoder systems evaluate temporal and reconstruction scores independently against separate thresholds, meaning a user could be flagged by one model and cleared by the other with no principled resolution mechanism. The proposed weighted fusion resolves this by producing a single unified risk value whose relative model contributions are empirically calibrated via validation-set sensitivity analysis, rather than assuming equal contribution or selecting weights arbitrarily. Second, the fusion output feeds directly into a session-persistent risk accumulator rather than serving as a standalone classification decision.

The proposed fusion strategy improves detection performance by balancing temporal behavioural modelling and reconstruction-based anomaly detection. It reduces false positives caused by temporary behavioural changes while maintaining detection sensitivity to gradually developing insider threats.

To enable session-level behavioural risk monitoring, the risk score is evaluated dynamically across consecutive time windows against the threshold τ, as defined in Equation (15). The value τ = 0.65 was determined empirically through validation-based grid search.

Threat (t) = \begin{cases} 1, & \text{if } S_{final} (t) > τ \\ 0, & \text{otherwise} \end{cases}

(15)

The system identifies a user as high-risk when the score exceeds the established threshold over multiple consecutive assessment windows, which allows early intervention.
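The fusion and thresholding steps in Equations (14) and (15) can be sketched in a few lines of Python. The values α = 0.7 and τ = 0.65 are those reported in the text. The min-max scaling of the reconstruction error to [0, 1], the clipping of out-of-range values, and all function names and example numbers are assumptions introduced for illustration, since the text does not state how the reconstruction error is mapped onto the same range as the LSTM score.

```python
ALPHA = 0.7  # fusion weight reported in the text
TAU = 0.65   # anomaly score threshold reported in the text

def scale_error(err, err_min, err_max):
    """Min-max scale a reconstruction error to [0, 1] (assumed normalisation)."""
    if err_max == err_min:
        return 0.0
    return min(1.0, max(0.0, (err - err_min) / (err_max - err_min)))

def fused_score(s_lstm, s_ae, alpha=ALPHA):
    """Eq. (14): convex combination of the two component scores."""
    return alpha * s_lstm + (1.0 - alpha) * s_ae

def threat_flag(s_final, tau=TAU):
    """Eq. (15): 1 if the fused score exceeds tau, otherwise 0."""
    return 1 if s_final > tau else 0

# Hypothetical window: strong LSTM signal and moderate reconstruction error.
s_ae = scale_error(0.03, err_min=0.0, err_max=0.05)  # approx. 0.6
s_final = fused_score(0.9, s_ae)                     # approx. 0.7*0.9 + 0.3*0.6
flag = threat_flag(s_final)

# Hypothetical benign window: both component scores are low.
benign_flag = threat_flag(fused_score(0.5, 0.2))
```

With both inputs in [0, 1], the fused score is guaranteed to stay in [0, 1], so a single threshold τ remains meaningful for every window.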

Algorithm 1: Behavioural Risk Score Generation Using LSTM–Autoencoder

Input: User activity dataset D
Output: Behavioral risk score R for each user session
1: Load dataset D
2: Preprocess dataset (remove missing values, normalize features)
3: Divide dataset into user sessions S
4: Initialize LSTM–Autoencoder model M
5: Train model M using normal behavioral data
6: For each session s in S do
7: Extract behavioral feature vector F from session s
8: Reconstruct F using trained model M
9: Compute reconstruction deviation between original and reconstructed features
10: Calculate risk score R(s) based on deviation level
11: Store R(s)
12: End For
13: Return behavioral risk scores for all sessions

3.3. Session-Level Behavioural Risk Monitoring with Temporal Smoothing

The session-level risk monitoring architecture described in this section represents the primary architectural departure of the proposed framework from existing hybrid anomaly detection systems. In standard LSTM-Autoencoder combinations, anomaly detection operates at the instance or session level: a feature vector is passed through both models, scores are combined or compared, and a binary threat label is assigned. This point-in-time design is insufficient for insider threat detection in enterprise environments for a fundamental reason: insider threat behaviour frequently manifests as gradual behavioural drift across multiple sessions rather than as a single anomalous event [7,29]. A system that classifies each session independently cannot distinguish a legitimate user exhibiting a one-time unusual pattern (a transient spike) from a malicious insider systematically deviating from normal behaviour over time (a sustained drift). The proposed framework addresses this by introducing three inter-dependent mechanisms: cumulative risk aggregation over $T$ windows, a persistence counter $E_{t}$ that tracks consecutive threshold exceedances, and a final access decision based on the sustained cumulative risk $R_{T}$. The interaction between these three mechanisms, not any single component, constitutes the technical novelty of the session-level behavioural risk monitoring layer.

The proposed framework performs session-level behavioural risk monitoring with temporal smoothing, in which user identity is assessed continuously across time windows based on behavioural anomaly scores rather than through static one-time authentication. The system evaluates user activity through time-based analysis of behavioural biometric data drawn from the enterprise logs of the insider threat dataset, which supports both ongoing monitoring and the adjustment of risk assessments. The persistence-based escalation mechanism addresses a fundamental limitation of point-in-time anomaly detection: in enterprise environments, legitimate users naturally exhibit transient behavioural deviations due to context switches, task urgency, or irregular working schedules. A single-window anomaly decision that treats such deviations as insider threats produces unacceptably high false positive rates in practice. The proposed multi-window persistence model, which requires k = 3 consecutive exceedances of the threshold τ before security escalation, formalises behavioural consistency as a prerequisite for threat confirmation. This is conceptually distinct from simple threshold smoothing: it enforces temporal causality in the detection decision, ensuring that only sustained and reproducible anomalies, rather than isolated behavioural spikes, trigger authentication challenges. This design principle is consistent with the clinical and network anomaly detection literature, where persistence of a signal across multiple independent observations is required to distinguish true pathological states from measurement noise.

3.3.1. Session-Level Risk Accumulation Model

Let $S_{final} (t)$ denote the unified insider threat score at time window $t$. Instead of making a single decision per session, the system computes a dynamic confidence score over multiple consecutive windows.

The cumulative behavioural risk over $T$ windows is defined in Equation (16):

R_{T} = \frac{1}{T} \sum_{t = 1}^{T} S_{final} (t)

(16)

where:

$R_{T}$ = aggregated user risk score
$T$ = number of monitored time windows
$S_{final} (t)$ = insider threat score at the window $t$

Temporal aggregation over $T$ monitoring windows smooths transient anomaly fluctuations in $S_{final} (t)$, reducing false escalations caused by isolated behavioural deviations. This stabilization effect is distinct from simple threshold averaging: it enforces a temporal consistency requirement whereby only sustained behavioural anomalies, rather than single-window spikes, contribute meaningfully to the cumulative risk score $R_{T}$.
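The smoothing effect of Equation (16) can be seen on two short score traces. The traces below are hypothetical values chosen for illustration, not results from the study, and the function names are assumptions.

```python
def cumulative_risk(scores):
    """Eq. (16): R_T is the mean of S_final(t) over the T observed windows."""
    if not scores:
        raise ValueError("at least one window score is required")
    return sum(scores) / len(scores)

def running_risk(scores):
    """Return R_1, R_2, ..., R_T so the evolution of the risk can be inspected."""
    out, total = [], 0.0
    for t, s in enumerate(scores, start=1):
        total += s
        out.append(total / t)
    return out

# Hypothetical traces: one isolated spike versus a sustained upward drift.
spike = [0.2, 0.2, 0.9, 0.2, 0.2]
drift = [0.6, 0.7, 0.8, 0.8, 0.9]

r_spike = cumulative_risk(spike)  # approx. 0.34
r_drift = cumulative_risk(drift)  # approx. 0.76
```

The single high window in the first trace is diluted by its neighbours, while the sustained drift in the second trace keeps the cumulative risk high.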

3.3.2. Dynamic Risk Escalation

A persistence-based escalation mechanism improves detection reliability. The escalation counter defined in Equation (17) counts the consecutive time windows in which the anomaly score exceeds the threshold τ; elevated risk is flagged once the score has exceeded τ in three consecutive windows.

E_{t} = \begin{cases} E_{t - 1} + 1, & \text{if } S_{final} (t) > τ \\ 0, & \text{otherwise} \end{cases}

(17)

where:

$E_{t}$ = escalation counter
$τ$ = anomaly threshold

If $E_{t} \geq k$, where $k$ is the predefined persistence threshold, the system triggers a security alert. This prevents single-window spikes from generating unnecessary alarms.
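The counter in Equation (17) and the alert condition can be traced window by window. The sketch below uses τ = 0.65 and k = 3 as reported in the text; the function name and the example scores are hypothetical.

```python
def escalation_trace(scores, tau=0.65, k=3):
    """Return a list of (E_t, alert) pairs, one per monitoring window."""
    counter = 0
    trace = []
    for s in scores:
        # Eq. (17): increment on an exceedance, reset to zero otherwise.
        counter = counter + 1 if s > tau else 0
        trace.append((counter, counter >= k))
    return trace

# Hypothetical scores: an isolated exceedance, then three in a row, then recovery.
trace = escalation_trace([0.7, 0.5, 0.7, 0.8, 0.9, 0.4])
counters = [c for c, _ in trace]
alerts = [a for _, a in trace]
```

The first exceedance is forgotten as soon as the score drops below τ, so only the run of three consecutive exceedances raises an alert.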

3.3.3. Session-Level Authentication Decision

The final authentication decision mechanism, formalized in Algorithm 2, is based on the cumulative risk shown by Equation (18):

\text{Access Status} = \begin{cases} \text{Trusted}, & \text{if } R_{T} \leq γ \\ \text{High Risk}, & \text{if } R_{T} > γ \end{cases}

(18)

where $γ$ is the system risk tolerance threshold. This decision mechanism enables continuous behavioural monitoring and risk-based access control using aggregated window-level analysis. The threshold parameters used in the proposed framework are set as follows: τ (anomaly score threshold) is fixed at 0.65, k (persistence window length) is set to 3 consecutive windows, and γ (risk escalation threshold) is set to 0.70. These values were determined through empirical tuning on a held-out validation partition of the training data to balance detection performance against the false positive rate.
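The three mechanisms can be combined in one stateful monitor that is updated once per window. This is a minimal sketch under stated assumptions: the class name `SessionMonitor` and its return format are hypothetical, the parameter defaults are the values reported in the text, and the alert and the access status are returned separately because the text does not specify how they are combined into one action.

```python
from dataclasses import dataclass

@dataclass
class SessionMonitor:
    tau: float = 0.65    # window-level anomaly threshold
    k: int = 3           # persistence threshold
    gamma: float = 0.70  # session-level risk tolerance
    total: float = 0.0
    windows: int = 0
    counter: int = 0

    def update(self, s_final):
        """Consume one window score and return the current session state."""
        self.windows += 1
        self.total += s_final
        # Eq. (17): consecutive-exceedance counter.
        self.counter = self.counter + 1 if s_final > self.tau else 0
        r_t = self.total / self.windows  # Eq. (16)
        return {
            "R_T": r_t,
            "alert": self.counter >= self.k,
            "status": "High Risk" if r_t > self.gamma else "Trusted",  # Eq. (18)
        }

# Hypothetical sustained drift: the last three windows exceed tau.
monitor = SessionMonitor()
for score in [0.6, 0.8, 0.8, 0.9]:
    state = monitor.update(score)

# Hypothetical transient spike: one high window among low ones.
benign = SessionMonitor()
for score in [0.2, 0.9, 0.2]:
    benign_state = benign.update(score)
```

Keeping only a running total, a window count, and a counter means the monitor needs constant memory per user, which suits continuous monitoring over rolling windows.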

Algorithm 2: Insider Threat Detection and Session-Level Behavioural Risk Monitoring Model

Input: Behavioral risk scores R
Output: Threat classification result C

1: Define risk threshold T
2: Initialize classification result set C
3: For each user session i do
4:          Retrieve risk score R(i)
5:          If R(i) > T then
6:                  Label session as malicious
7:                  Update C(i) = Insider Threat
8:          Else
9:                Label session as normal
10:                 Update C(i) = Legitimate User
11:         End If
12:         Monitor next session activity for Session-Level Behavioural Risk Monitoring Model
13: End For
14: Return classification results C

The selection of LSTM and Autoencoder as the core architectural components is motivated by their fundamentally complementary learning objectives. LSTM networks are optimised for supervised sequential pattern learning, making them well-suited to modelling the temporal evolution of user behavioural sequences where the order of activities carries diagnostic information. However, LSTM-based classification requires labelled training data and is optimised for matching known threat patterns; it does not inherently model what constitutes normal behaviour in a generative reconstruction sense. Autoencoders, trained exclusively on normal behavioural data, detect anomalies through reconstruction deviation and are sensitive to any behaviour that diverges from the learned normal distribution, regardless of whether it matches a known threat signature. This makes the autoencoder complementary to LSTM: where LSTM detects known sequential threat patterns, the autoencoder detects unknown deviations from normality, a critical property for insider threats which may not follow previously observed malicious patterns.

Alternative sequential architectures were considered during framework development. GRU-based models were evaluated but showed marginal underperformance relative to LSTM, consistent with prior findings that LSTM’s distinct input and forget gates provide more effective modelling of long-range behavioural dependencies in enterprise log sequences. Transformer-based attention mechanisms were not selected due to substantially higher computational overhead in the continuous monitoring setting, where efficient inference across rolling 30-min windows is desirable for window-triggered enterprise deployment, subject to infrastructure-level validation. The LSTM–Autoencoder combination therefore represents a principled architectural choice that balances detection coverage, computational efficiency, and practical suitability for continuous enterprise monitoring environments.

3.3.4. Threshold Selection and Optimization

The three threshold parameters governing the session-level risk monitoring and escalation decisions, namely the anomaly score threshold τ, the persistence window k, and the risk escalation threshold γ, were determined through a structured empirical optimization procedure applied exclusively to a held-out validation partition (10% of the training data). The test set remained completely unseen during calibration to prevent information leakage.

Anomaly Score Threshold (τ): The threshold τ was optimized via a grid search in which candidate values were swept across [0.50, 0.80] in increments of 0.05 and the F1-score was computed on the validation subset for each value. The value τ = 0.65 was selected because it maximized the validation F1-score and produced clear separation between the normal and threat score distributions, as confirmed by the bimodal distribution pattern observed. Across all evaluated thresholds, τ = 0.65 yields the optimal F1-score of 0.9768.
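The grid search over τ can be sketched as follows. The candidate grid matches the sweep described in the text, while the validation scores and labels are small synthetic values invented purely to exercise the code; they are not the study's data, and the function names are assumptions.

```python
def f1_score(labels, preds):
    """F1 for the positive class (label 1 = insider threat)."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom

def select_tau(scores, labels, candidates):
    """Return (tau, f1) for the candidate with the highest validation F1."""
    best_tau, best_f1 = None, -1.0
    for tau in candidates:
        preds = [1 if s > tau else 0 for s in scores]
        f1 = f1_score(labels, preds)
        if f1 > best_f1:  # ties keep the smallest tau
            best_tau, best_f1 = tau, f1
    return best_tau, best_f1

# Candidate grid: 0.50 to 0.80 in increments of 0.05.
candidates = [round(0.50 + 0.05 * i, 2) for i in range(7)]

# Synthetic validation scores and labels (illustrative only).
val_scores = [0.30, 0.45, 0.58, 0.62, 0.68, 0.72, 0.85, 0.90]
val_labels = [0, 0, 0, 0, 1, 1, 1, 1]
best_tau, best_f1 = select_tau(val_scores, val_labels, candidates)
```

Because the selection uses only validation scores, the chosen threshold can then be applied unchanged to the test partition.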

Persistence Parameter (k): The persistence parameter k was optimized by evaluating false positive escalation rates across k ∈ {1, 2, 3, 4, 5} on the validation subset. The parameter k defines the minimum number of consecutive windows exceeding τ before a security escalation is triggered, as formalized in Equation (17). At k = 1, the false positive escalation rate was 18%, which is operationally unacceptable for enterprise deployment, whereas k = 3 reduced it below 5% while maintaining recall above 99%. Therefore, k = 3 was selected as the optimal balance between escalation reliability and detection responsiveness.

Risk Escalation Threshold (γ): The threshold γ was empirically optimized at 0.70 based on the cumulative risk score distribution observed on the validation subset. It governs the session-level authentication decision applied to the cumulative risk score $R_{T}$ in Equation (18). Unlike τ, which operates at the individual window level, γ confirms sustained anomalous behaviour across multiple consecutive windows before classifying a user as High Risk. Setting γ intentionally above τ = 0.65 enforces the requirement that session-level risk reflects persistent rather than incidental behavioural deviations before escalation is triggered.

4. Experimental Setup

4.1. Dataset Description

4.1.1. Dataset Overview

The Insider Threat Dataset for Corporate Environments provides pre-structured enterprise behavioural activity records, including user authentication data, file access records, network activity data, and communication patterns, suitable for insider threat detection research. The dataset contains approximately 45,000 behavioural activity instances describing user activity in an enterprise setting. Each record contains multiple behavioural attributes that describe user interactions with enterprise resources and a binary class label that indicates normal behaviour or insider threat activity [34]. This dataset was selected over alternative datasets for several reasons. Unlike the raw CERT r4.2 and r5.2 corpora, which require extensive preprocessing across multiple disconnected log files, this dataset provides pre-structured and labelled behavioural feature vectors that directly support deep learning-based sequence modelling. Additionally, widely used alternatives such as CERT r4.2 suffer from severe class imbalance where insider threat instances represent less than 1% of total records, whereas the selected dataset maintains a near-balanced distribution of 52% normal and 48% insider threat instances, enabling more reliable evaluation of precision and recall. The dataset also consolidates all required behavioural interaction categories, including authentication, file access, network usage, and communication behaviour, within a unified record structure, making it well-suited for the proposed session-level behavioural risk monitoring framework. Furthermore, the dataset is openly accessible without institutional registration, supporting reproducibility of the experimental results.

The CERT r4.2 corpus was pre-processed by merging its five raw activity log files covering logon events, file operations, device usage, HTTP browsing, and email communication, unified on user identifier and timestamp. The same 16 behavioural features were reconstructed from these raw fields: login frequency and session duration from logon timestamps; file count and sensitive access indicator from file operation records; network request frequency and external URL indicator from HTTP logs; email counts from email records; device interaction from device logs; and failed login attempts and privilege escalation events from logon failure codes. Users with no activity in a particular log domain were assigned zero values for the corresponding feature dimensions. The dataset exhibits severe class imbalance with malicious instances below 1%, so class weighting was applied during training and PR-AUC and MCC were used as primary evaluation metrics. A chronological per-user 80/20 train-test split was applied, with all normalization parameters and thresholds computed exclusively from the training partition to prevent leakage. The anomaly score threshold, persistence window, and risk escalation threshold were transferred at their primary-experiment values without recalibration, providing a conservative generalization estimate. All baseline models were independently retrained on the CERT r4.2 training partition under identical conditions and evaluated on the same held-out test partition.

The dataset exhibits a near-balanced class distribution comprising 52% normal and 48% insider threat instances. In severely imbalanced datasets accuracy is misleading: for example, a classifier predicting only “normal” on a 99:1 dataset achieves 99% accuracy while detecting zero threats. The near-equal distribution here ensures that accuracy genuinely reflects discriminative performance rather than majority-class bias. Nevertheless, Precision, Recall, F1-Score, and ROC-AUC are reported alongside accuracy since these metrics remain valid and comparable regardless of class proportions. Recall is specifically prioritized over accuracy as the primary detection metric, as missing a real insider threat carries far greater operational risk than a false alarm. To further confirm that the results are not an artefact of the balanced distribution, the framework was also evaluated under simulated severe imbalance, yielding a PR-AUC of 0.842 and an MCC of 0.781, two metrics that are robust to class skew. These results indicate that detection capability is preserved under realistic enterprise conditions where insider threats are rare.

Table 1 illustrates the class imbalance handling strategy adopted in this study. As the dataset is nearly balanced, class weighting is applied during training instead of synthetic sampling methods, which helps improve recall stability and ensures balanced model learning without introducing artificial data. The dataset was divided into training and test subsets in an 80:20 ratio using stratified sampling to maintain the original class proportions in both subsets. Features were extracted from the behavioural log data, including user login patterns, file access behaviour, and activity aggregated across time periods. User activities were divided into fixed time intervals, and statistical measures of event frequency and user behaviour within those intervals were computed to identify potential insider threats.

Table 2 summarizes the key characteristics of the dataset used in the study, including sample size, class distribution, number of features, and temporal window configuration.

4.1.2. Imbalanced Data Evaluation Protocol

To assess the proposed framework under realistic enterprise operating conditions, an additional evaluation was conducted using a synthetically imbalanced test partition that reflects the class distributions commonly observed in real-world insider threat environments. In operational enterprise settings, malicious insider events are typically highly sparse, constituting less than 5% of total user activity records, in contrast to the near-balanced 52%/48% distribution present in the primary Insider Threat Dataset for Corporate Environments.

The imbalanced evaluation scenario was constructed exclusively from the held-out test subset without any modification to the training data or the trained model. Specifically, random undersampling was applied to the insider threat (malicious) class within the test partition to simulate a severe class imbalance ratio of approximately 10:1 (normal versus insider threat). This procedure reduced the proportion of malicious instances in the test set from 48% to approximately 9%, producing a test distribution more representative of real enterprise activity logs. Importantly, no resampling, augmentation, or redistribution was applied to the training data; the model evaluated under these imbalanced conditions is identical to the model trained and validated during primary experiments. This design ensures that the imbalanced evaluation reflects genuine model generalization performance rather than the effect of retraining on a different class distribution.
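The construction of the imbalanced test partition can be sketched as follows. This is an illustrative sketch only: the record layout `(features, label)`, the function name, the fixed random seed, and the example counts (520 normal and 480 malicious records, mirroring the 52%/48% proportions) are assumptions, not details taken from the study.

```python
import random

def undersample_positives(records, ratio=10, seed=0):
    """Keep all normal records and a random subset of malicious ones.

    records: list of (features, label) pairs, label 1 = insider threat.
    ratio:   target number of normal records per malicious record.
    """
    rng = random.Random(seed)  # fixed seed so the partition is reproducible
    normal = [r for r in records if r[1] == 0]
    threat = [r for r in records if r[1] == 1]
    keep = min(len(threat), max(1, round(len(normal) / ratio)))
    return normal + rng.sample(threat, keep)

# Hypothetical test partition with the near-balanced 52%/48% class proportions.
test_records = [((i,), 0) for i in range(520)] + [((i,), 1) for i in range(480)]
imbalanced = undersample_positives(test_records, ratio=10)

n_threat = sum(label for _, label in imbalanced)
threat_share = n_threat / len(imbalanced)  # 52 / 572, roughly 9%
```

A 10:1 ratio corresponds to one malicious record in eleven, which is where the figure of approximately 9% comes from. Only the test records are touched, so the trained model and its thresholds are evaluated exactly as they were calibrated.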

Applying undersampling exclusively to the test partition, rather than retraining a separate model, enables a controlled assessment of how well the proposed framework’s learned behavioural representations and threshold calibrations transfer to imbalanced deployment conditions. Since threshold parameters τ = 0.65, k = 3, and γ = 0.70 were calibrated on the validation subset derived from the original near-balanced training data, this evaluation also reveals the sensitivity of the session-level escalation mechanism to distribution shift without the confound of retraining.

Standard metrics such as Accuracy and ROC-AUC can be misleading under severe class imbalance, as they remain artificially high even when a model fails to detect the rare positive class. Therefore, the imbalanced evaluation reports Precision–Recall Area Under Curve (PR-AUC) and Matthews Correlation Coefficient (MCC) as the primary performance indicators, since both metrics are robust to class skew and provide meaningful assessment of detection capability when malicious instances are rare.
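
To make the argument about class skew concrete, the sketch below computes a step-wise average precision (a common estimator of PR-AUC) and MCC in plain Python. It is an illustrative sketch rather than the evaluation code used in the study; the helper names and the toy counts are assumptions, and tied scores are not treated specially.

```python
import math

def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve (ties not handled)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_pos = sum(y_true)
    tp, ap = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            tp += 1
            ap += tp / rank          # precision at each recalled positive
    return ap / n_pos if n_pos else 0.0

def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical 10:1 test set scored by a detector that never raises an alert
tp, tn, fp, fn = 0, 100, 0, 10
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(f"accuracy={accuracy:.3f}, mcc={mcc(tp, tn, fp, fn):.3f}")
```

In this toy case accuracy stays above 0.9 although no malicious instance is detected, while MCC is zero, which is the behaviour that motivates reporting MCC and PR-AUC under imbalance.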

4.1.3. Data Structure and Features

The data is a user-centric behavioural record of daily enterprise operations. Every instance corresponds to a user session or a user activity window and consists of numerical and categorical attributes describing behavioural patterns. The major feature categories are:

▪ Authentication Features: login frequency, session length, and abnormal login times.
▪ File Access Behaviour: accessed files, sensitive file operations, suspicious access operations.
▪ System Usage Metrics: resource usage, interaction with devices.
▪ Network and Communication Indicators: browsing frequency, external access behaviour.
▪ Threat Label: binary classification label (Normal or Insider Threat).

These structured attributes support behavioural modelling and temporal sequence analysis.

4.1.4. Data Splitting Strategy

To ensure reliable performance evaluation and avoid overfitting, the dataset was separated into two subsets:

➲ Training Set (80%): used to train the LSTM model and the autoencoder. The LSTM network learns temporal behaviour patterns from sequences of user activities, while the autoencoder is trained predominantly on normal user examples so that it learns baseline behavioural traits and reconstructs normal system activity.
➲ Testing Set (20%): used only to evaluate the trained models. This subset contains unseen cases, which enable assessment of classification quality, anomaly detection, and the overall effectiveness of the proposed insider threat detection system.

Explicit data leakage prevention controls were enforced at every stage of the preprocessing pipeline. The dataset was partitioned at the raw activity record level before any feature computation, normalization, or sequence generation was performed. Min-max normalization parameters were computed exclusively from the training subset and subsequently applied to both subsets; the testing subset contributed to no normalization statistics. Categorical encoding mappings were similarly determined from the training subset alone. Temporal windowing and sequence generation were performed independently for each subset after partitioning, ensuring no sliding window sequence spanned records from both subsets. Additionally, the anomaly detection threshold for the Autoencoder reconstruction error was determined using a held-out validation partition drawn from the training set only (10% of training records), and all threshold parameters, including τ, k, and γ, were finalized on this validation partition without any adjustment after evaluation on the testing subset.

The dataset was split using a user-wise chronological partitioning strategy rather than random record-level sampling. For each individual user, activity records were ordered chronologically, and the earliest 80% of that user’s records were assigned to the training set while the most recent 20% were assigned to the test set. This approach ensures two critical properties: first, no future behavioural information of any user is used to train the model; second, the model is evaluated on its ability to generalize to the later behavioural period of each user, which more faithfully reflects the real-world deployment scenario where the system must authenticate users based on historical behaviour. Random splitting was explicitly avoided, as it would allow future activity records to appear in the training set while earlier records appear in the test set, artificially inflating temporal generalization performance. Stratification by class label was applied at the user level to maintain consistent class proportions across both partitions.
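
The user-wise chronological partitioning described above can be sketched as follows. This is a simplified illustration under stated assumptions: records are hypothetical `(user_id, timestamp, payload)` tuples, the function name is invented for the example, and the user-level stratification by class label mentioned in the text is not reproduced.

```python
from collections import defaultdict

def chronological_user_split(records, train_frac=0.8):
    """Split each user's records so the earliest share goes to training."""
    by_user = defaultdict(list)
    for rec in records:
        by_user[rec[0]].append(rec)
    train, test = [], []
    for user, recs in by_user.items():
        recs.sort(key=lambda r: r[1])        # oldest record first
        cut = int(len(recs) * train_frac)
        train.extend(recs[:cut])             # earliest 80% of this user
        test.extend(recs[cut:])              # most recent 20% of this user
    return train, test

# Two hypothetical users; the second has unordered timestamps
records = [("u1", t, None) for t in range(10)]
records += [("u2", t, None) for t in (5, 3, 1, 4, 2)]
train, test = chronological_user_split(records)
```

For every user, the largest training timestamp is smaller than the smallest test timestamp, so no future behaviour of that user reaches the training set.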

4.1.5. Data Leakage Prevention

To prevent data leakage arising from overlapping temporal sequences, the dataset partitioning was performed at the raw activity record level, prior to any temporal windowing, sequence construction, or feature computation. Specifically, the chronologically ordered activity records of each user were split such that the earliest 80% of records formed the training subset and the remaining 20% formed the test subset, preserving temporal ordering within each user’s activity history. This record-level split ensures a hard temporal boundary: no raw activity record contributed to both a training sequence and a test sequence simultaneously. Temporal windows and sliding sequences were constructed independently and separately within each subset after partitioning. For the training subset, 30-min behavioural windows were computed and organized into sequential inputs using a sliding step of 15 min. For the test subset, the same windowing procedure was applied independently, with the starting index of the first test window set to begin strictly after the last training record of each user. This design guarantees that no sliding window sequence in the test set overlaps with any window used during training. To further confirm the absence of leakage, the unique activity record indices of all windows appearing in test sequences were verified to have zero intersection with those in training sequences. Additionally, all normalization parameters (min-max scaling bounds), categorical encoding maps, and the autoencoder reconstruction error threshold were computed exclusively from the training subset and applied to the test subset without re-fitting, ensuring no statistical information from test data influenced model training or threshold calibration.

Temporal leakage was prevented at every stage of the pipeline through the following controls: (1) Dataset partitioning was performed at the raw activity record level, before any feature computation or sequence generation, using a chronological per-user split (earliest 80% of each user’s records to training, latest 20% to test). (2) All normalization parameters (min-max scaling bounds) were computed exclusively from the training subset and applied without re-fitting to the test subset. (3) Categorical encoding maps and the rare-category consolidation threshold were derived from training data only. (4) The autoencoder reconstruction error threshold (95th percentile) was determined from normal training samples only. (5) Sliding window sequences were generated independently within each partition; no sequence in the test set includes any record belonging to a training window. The first test sequence begins strictly after the last training record of each user. (6) All threshold parameters τ, k, and γ were calibrated on a held-out validation portion of the training data (10% of training records) and were never adjusted after test-set evaluation.

4.2. Data Preprocessing

The preprocessing stage prepares enterprise activity logs for insider threat detection. Data cleaning removes incomplete and inconsistent records from the raw log data originating from the Insider Threat Dataset. Feature normalization brings all features into a uniform value range. User activities are then divided into distinct time intervals to expose behavioural patterns, and the essential features are transformed into structured vectors that serve as input to the LSTM and Autoencoder models.

4.2.1. Data Cleaning

Data preprocessing begins with the cleaning of enterprise activity logs to remove noise, inconsistencies, and incomplete records. The behavioural logs obtained from the Insider Threat Dataset may contain missing values or corrupted entries due to logging errors or system failures. Let the raw dataset be represented as Equation (19),

W = \{x_{1}, x_{2}, x_{3}, \dots, x_{n}\}

(19)

where each

x_{i}

represents an individual activity record generated by a user in an enterprise network. Records containing missing or invalid attributes are removed to obtain a cleaned dataset. The cleaned dataset

W_{c}

can be expressed as shown in Equation (20).

W_{c} = W \setminus \{x_{i} \mid x_{i} \in W,\ \mathrm{Missing}(x_{i}) = 1\}

(20)

where

W_{c}

denotes the cleaned dataset,

W

represents the original dataset,

x_{i}

is an activity record, and

\mathrm{Missing}(x_{i})

is a function that indicates whether the record contains missing or corrupted values.
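
As a small illustration of Equation (20), the sketch below filters out records flagged by a missing-value indicator. The source does not specify how corrupted values are encoded, so treating `None` and NaN as missing is an assumption, and the field names are hypothetical.

```python
import math

def is_missing(record):
    """Indicator Missing(x_i): True if any attribute is None or NaN (assumed encoding)."""
    return any(
        v is None or (isinstance(v, float) and math.isnan(v))
        for v in record.values()
    )

def clean(records):
    """Return W_c: all records of W whose missing indicator is not set."""
    return [r for r in records if not is_missing(r)]

# Hypothetical raw activity records
raw = [
    {"user": "u1", "logins": 3, "session_min": 42.0},
    {"user": "u2", "logins": None, "session_min": 17.5},
    {"user": "u3", "logins": 5, "session_min": float("nan")},
]
cleaned = clean(raw)
```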

4.2.2. Data Normalization

Enterprise behavioural features often have different numerical ranges, which may affect training stability and performance of machine learning models. To address this issue, feature normalization is applied using the min-max scaling method. The normalized feature value

x^{'}

is computed as denoted in Equation (21).

x^{'} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}

(21)

where

x

represents the original feature value,

x^{'}

denotes the normalized feature value,

x_{\min}

is the minimum value of the feature in the training subset, and

x_{\max}

is the maximum value of the feature in the training subset.
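
A minimal sketch of Equation (21) with the scaling bounds fitted on training rows only is given below. The helper names and the toy values are assumptions for illustration; mapping a constant feature to 0 is a design choice of this sketch, not something stated in the source.

```python
def fit_min_max(train_rows):
    """Per-feature (min, max) bounds computed from training rows only."""
    columns = list(zip(*train_rows))
    return [(min(c), max(c)) for c in columns]

def apply_min_max(rows, bounds):
    """Scale rows with previously fitted bounds, without re-fitting."""
    scaled = []
    for row in rows:
        scaled.append([
            (x - lo) / (hi - lo) if hi > lo else 0.0   # constant feature -> 0
            for x, (lo, hi) in zip(row, bounds)
        ])
    return scaled

# Hypothetical two-feature data
train = [[2.0, 10.0], [4.0, 30.0], [6.0, 20.0]]
test = [[5.0, 40.0]]
bounds = fit_min_max(train)
train_scaled = apply_min_max(train, bounds)
test_scaled = apply_min_max(test, bounds)
```

Since the bounds come from the training subset alone, a test value can fall outside [0, 1] (the second test feature here does); this is the expected consequence of not re-fitting on test data.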

4.2.3. Temporal Segmentation of User Activities

User activities in enterprise systems occur sequentially over time, making it necessary to analyse temporal behavioural patterns. Let the sequence of user activities be represented as shown in Equation (22).

U = \{u_{1}, u_{2}, u_{3}, \dots, u_{T}\}

(22)

where

U

denotes the chronological sequence of user activities and

u_{t}

represents the user activity at time step

t

. To capture temporal patterns, the activity logs are segmented into fixed-size windows. The segmented behavioral sequence

S_{k}

can be expressed as depicted in Equation (23).

S_{k} = \{u_{k}, u_{k + 1}, u_{k + 2}, \dots, u_{k + w - 1}\}

(23)

where

S_{k}

represents the segmented sequence,

k

denotes the starting index of the window, and

w

represents the predefined window size. The window size

w

used in the experiments is set to 30 min, which defines the duration over which user activities are aggregated for feature computation. Behavioural features are first aggregated within each 30-min window into scalar statistics (intra-window aggregation), and these window-level vectors are subsequently arranged into chronological sequences across 30 consecutive windows (inter-window temporal modelling). The framework therefore treats features neither as purely aggregated session statistics nor as raw time-series samples within a window; it employs a two-level representation in which scalar aggregation operates at the window level and temporal sequence modelling operates across windows. This design enables the LSTM to learn behavioural evolution patterns across time while maintaining computationally efficient feature representations.

Temporal sequences were generated on a per-user basis using a sliding window approach, following the two-level temporal architecture defined in Section 3. Raw activity records were first segmented into non-overlapping 30-min intra-windows, each producing a 16-dimensional feature vector. These window-level vectors were then arranged into fixed-length sequences of 30 consecutive windows, producing input tensors of shape (30 × 16), representing approximately 7.5 h of continuous user activity. A sliding step of 1 window (30 min) was applied between consecutive sequences, meaning adjacent sequences overlap by 29 windows. Each sequence was assigned the class label of its final (most recent) window, consistent with a window-based detection scenario. Sequences were constructed strictly per user; no sequence spanned activity records of more than one user. Users with fewer than 30 activity windows in the training partition were excluded to avoid sequence padding.

Because a sliding step of 1 window is applied, adjacent sequences share 29 of their 30 constituent windows. This degree of overlap is intentional: it maximises the number of training sequences available per user and ensures the model is exposed to sequences beginning at every possible behavioural starting point, improving temporal generalisation. The statistical dependency introduced by overlapping sequences is a known and accepted property of sliding-window temporal modelling; it is addressed through per-sequence label assignment based solely on the final window of each sequence, consistent with a window-based monitoring scenario. Critically, this overlap does not introduce leakage between the training and test partitions. Because sequences were generated independently within each partition after the chronological per-user record split, no sequence in the test partition shares any constituent window with any sequence in the training partition. The first test sequence for each user begins strictly at the first window constructed from that user’s test records, which start immediately after their last training record. This hard partition boundary guarantees that overlapping sequences remain entirely contained within their respective data splits.
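
The sequence construction described in this subsection can be sketched in a few lines of Python. The sketch assumes that one user's window-level feature vectors are already computed and ordered chronologically; the function name and the toy data are hypothetical.

```python
def build_sequences(window_vectors, window_labels, seq_len=30, step=1):
    """Slide over one user's chronologically ordered window-level vectors.

    Returns (sequences, labels); each sequence holds seq_len consecutive
    windows and takes the label of its final (most recent) window.
    """
    sequences, labels = [], []
    for start in range(0, len(window_vectors) - seq_len + 1, step):
        end = start + seq_len
        sequences.append(window_vectors[start:end])
        labels.append(window_labels[end - 1])
    return sequences, labels

# One hypothetical user with 35 windows of 16 features each
vectors = [[float(w)] * 16 for w in range(35)]
window_labels = [0] * 34 + [1]
seqs, seq_labels = build_sequences(vectors, window_labels)
print(len(seqs), len(seqs[0]), len(seqs[0][0]))
```

A user with fewer than `seq_len` windows yields no sequences, which corresponds to the exclusion rule stated above. Calling the function separately on the training and test partitions keeps the overlapping sequences inside their own split.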

4.2.4. Feature Vector Construction

The final behavioural representation comprises 16 features derived from enterprise logs, such as login frequency, session duration, failed logins, file access patterns, device usage, network activity, email behaviour, and anomaly indicators such as privilege escalation and irregular access patterns. An anomaly threshold of τ = 0.65 was set based on validation analysis, and sessions were flagged as high risk only if the threat score exceeded this threshold for k = 3 consecutive windows and the cumulative risk exceeded γ = 0.70, which helps to reduce false positives and ensure stable detection. It is important to distinguish between intra-window feature aggregation and inter-window sequential modelling. Each of the 16 features is computed as a scalar statistic within a single 30-min window (e.g., total login events, total file operations). These window-level vectors are not raw time-series observations within a window but aggregated behavioural summaries. Sequential temporal modelling is then applied across 30 such consecutive windows, forming the (30 × 16) input tensor. The feature representation for each behavioural sequence is defined as shown in Equation (24).

F_{i} = [f_{1}, f_{2}, f_{3}, \dots, f_{m}]

(24)

where

F_{i}

denotes the feature vector corresponding to the

i^{th}

behavioural sequence,

f_{j}

represents the

j^{th}

extracted feature, and

m

indicates the total number of behavioral features used in the model. The feature vectors serve as inputs for deep learning models which include LSTM and Autoencoder to detect abnormal behavioral patterns that are linked to insider threats.

Table 3 presents the categorized feature set used for modelling, capturing user behaviour across authentication, file access, network activity, and system usage. These features are designed to represent both normal and anomalous patterns relevant to insider threat detection.

4.3. Computational Environment

The insider threat detection framework was built and evaluated on a high-performance workstation. The framework was implemented in Python 3.10.19 using scientific computing, machine learning, and data visualization libraries (NumPy 1.24.3; Pandas 2.3.3; Scikit-learn 1.3.0; TensorFlow 2.9.1; Imbalanced-learn 0.12.3; Matplotlib 3.10.8; and Seaborn 0.13.2). The experiments were performed on a system with an Intel Core i9-14900K processor (3.20 GHz), 32 GB of RAM (31.7 GB usable), and a 64-bit x64-based OS running Windows 11 Home (Version 25H2, OS Build 26200.7840) with Windows Feature Experience Pack 1000.26100.291.0. This combination of hardware and software provided adequate computational resources to train and evaluate the deep learning and machine learning models that comprise the framework.

4.4. Hyperparameter Configuration

This subsection reports the configuration parameters of the LSTM-Autoencoder hybrid framework for insider threat detection. The LSTM model uses a sequential architecture with an input dimension of (1, features), where each input vector contains behavioural features obtained from user activity logs during a monitoring period. The LSTM layer contains 64 units, which enables the model to learn nonlinear relationships among user activity patterns. It is followed by a dense layer of 32 neurons with ReLU activation, which captures higher-level feature relationships, and a sigmoid output layer that classifies behaviour as normal or malicious. A dropout rate of 0.3 is applied to limit overfitting. The model is trained with the Adam optimizer at a learning rate of 0.001 using binary cross-entropy loss for 30 epochs with a batch size of 64. The autoencoder uses a symmetric encoder-decoder design following the pattern 32 → 16 → 32 to create compressed representations of standard behavioural patterns. It is trained for 30 epochs with a batch size of 32, a linear output layer, mean squared error (MSE) as the reconstruction objective, and the Adam optimizer. The 95th percentile of reconstruction errors is used as the anomaly detection threshold for identifying abnormal user behaviour. The hybrid detection system combines both model outputs through weighted fusion, in which the LSTM component contributes 70% of the final score and the autoencoder contributes 30%. This hybrid approach pairs supervised behaviour classification with unsupervised anomaly detection to enhance the detection of insider threat activity. Temporal segmentation follows the two-level feature extraction architecture defined earlier.
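
The percentile-based autoencoder threshold and the 70/30 weighted fusion can be illustrated with the following plain-Python sketch. How the autoencoder output enters the fused score is not specified in this section, so representing it as a binary anomaly flag is an assumption of the sketch; the function names and the toy error values are likewise hypothetical.

```python
def percentile(values, q):
    """Linear-interpolation percentile, with q in [0, 100]."""
    data = sorted(values)
    pos = (len(data) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)

def fuse(p_lstm, recon_error, ae_threshold, alpha=0.7):
    """Weighted fusion of the LSTM probability and an autoencoder anomaly flag."""
    # Assumed mapping: the autoencoder contributes 1.0 when its
    # reconstruction error exceeds the threshold and 0.0 otherwise.
    ae_flag = 1.0 if recon_error > ae_threshold else 0.0
    return alpha * p_lstm + (1 - alpha) * ae_flag

# Hypothetical reconstruction errors (MSE) of normal training samples
normal_train_errors = [0.01 * i for i in range(101)]
ae_threshold = percentile(normal_train_errors, 95)

risk_high = fuse(p_lstm=0.9, recon_error=1.2, ae_threshold=ae_threshold)
risk_low = fuse(p_lstm=0.2, recon_error=0.3, ae_threshold=ae_threshold)
```

With `alpha = 0.7`, the LSTM probability dominates the fused score, and the autoencoder term can raise it by at most 0.3 under the binary-flag assumption.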

All threshold parameters were determined through a structured validation procedure applied exclusively to a held-out validation partition of the training data. The anomaly score threshold τ = 0.65 was selected by sweeping candidate values from 0.50 to 0.80 and identifying the value that maximized the F1-score on the validation subset, a choice supported by the observed bimodal score distribution, in which normal and threat scores are clearly separated. The persistence window k = 3 was chosen after observing that k = 1 produced an 18% false positive escalation rate on the validation subset, whereas k = 3 reduced this below 5% while maintaining recall above 99%. The risk escalation threshold γ = 0.70 was set by inspecting the cumulative risk score distribution to ensure only users with sustained anomalous behaviour across multiple windows are escalated. The Autoencoder reconstruction error threshold was set at the 95th percentile of reconstruction errors from normal training samples, enabling automatic adaptation to the specific behavioural scale of the dataset. The fusion weight α was evaluated through validation-based sensitivity analysis across multiple candidate values. Although α = 0.6 produced the highest F1-score and precision, α = 0.7 achieved higher recall, which was considered more important for insider threat detection scenarios. In enterprise security environments, false negatives correspond to undetected malicious insider activities and therefore present substantially higher operational risk than moderate increases in false positives. Consequently, α = 0.7 was selected as the final configuration to prioritize detection sensitivity and minimize the probability of missed insider threats, despite a marginal reduction in F1-score compared with α = 0.6.
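
The validation sweep used to select τ can be sketched as below. The grid spacing of 0.05 is an assumption, since the text gives only the sweep range, and the validation labels and scores are invented toy values; the function names are hypothetical.

```python
def f1_at_threshold(y_true, scores, tau):
    """F1-score when a score strictly above tau is predicted as a threat."""
    tp = sum(1 for y, s in zip(y_true, scores) if s > tau and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s > tau and y == 0)
    fn = sum(1 for y, s in zip(y_true, scores) if s <= tau and y == 1)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def sweep_threshold(y_true, scores, start=0.50, stop=0.80, step=0.05):
    """Return the candidate threshold with the highest validation F1-score."""
    n_steps = round((stop - start) / step)
    candidates = [round(start + i * step, 2) for i in range(n_steps + 1)]
    return max(candidates, key=lambda t: f1_at_threshold(y_true, scores, t))

# Hypothetical validation labels and fused risk scores
y_val = [0, 0, 0, 1, 1, 1]
s_val = [0.05, 0.55, 0.62, 0.66, 0.70, 0.90]
best_tau = sweep_threshold(y_val, s_val)
```

The sweep is run on the validation partition only, so the selected value is fixed before any test-set evaluation, in line with the procedure described above.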

4.5. Performance Evaluation Metrics

The proposed LSTM-Autoencoder Session-Level Behavioural Risk Monitoring system is evaluated using standard classification performance metrics. Because insider threat detection is treated as a binary classification problem that distinguishes Normal from Insider Threat instances, multiple complementary metrics are used to obtain a complete evaluation.

Accuracy measures the proportion of correctly classified instances among all instances, as given by Equation (25),

Accuracy = \frac{T P + T N}{T P + T N + F P + F N}

(25)

Accuracy provides a general performance indicator but may be misleading in imbalanced datasets.

Precision measures how many detected insider threats are actually malicious, as defined in Equation (26):

Precision = \frac{T P}{T P + F P}

(26)

High precision minimizes false positive detections, which protects organizations from spending resources on unnecessary security alerts.

Recall evaluates how many actual insider threats are correctly detected as shown in Equation (27):

Recall = \frac{T P}{T P + F N}

(27)

The F1-score balances precision and recall using their harmonic mean shown by Equation (28),

F1\text{-}Score = \frac{2 \times Precision \times Recall}{Precision + Recall}

(28)

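The ROC curve assesses the model’s discriminative ability across multiple decision thresholds by plotting the True Positive Rate (TPR) against the False Positive Rate (FPR).

The True Positive Rate (TPR) is given by Equation (29),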

T P R = \frac{T P}{T P + F N}

(29)

The False Positive Rate (FPR) is given by Equation (30),

F P R = \frac{F P}{F P + T N}

(30)

The Area Under the Curve (AUC) measures how well the two behaviour classes can be separated from each other. A higher AUC value indicates better classification performance.

To ensure experimental reproducibility and reduce stochastic variation associated with neural network initialization and mini-batch optimization, all deep learning experiments were independently repeated using five different random seeds. The reported mean and standard deviation values in Table 4, Table 5, and Table 9 were computed across these five independent runs. For each seed, the complete training and evaluation pipeline, including model initialization, data shuffling, and optimization, was executed independently under identical experimental conditions.

Precision, Recall, F1-Score, ROC-AUC, and MCC were reported alongside Accuracy to provide robust evaluation under both near-balanced and realistically imbalanced insider-threat conditions. While the primary dataset exhibits a near-balanced 52%/48% distribution, real enterprise environments are typically highly imbalanced, where Accuracy alone can be misleading. In particular, Recall, F1-Score, ROC-AUC, and MCC remain reliable and informative under class imbalance, enabling more meaningful assessment of insider-threat detection capability across varying data distributions.
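
Equations (25) to (30) can be evaluated directly from confusion-matrix counts, as in the short sketch below. The counts are hypothetical and chosen only to make the arithmetic easy to follow; they are not results from the study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the metrics of Equations (25)-(30) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)           # Eq. (25)
    precision = tp / (tp + fp)                           # Eq. (26)
    recall = tp / (tp + fn)                              # Eq. (27), same as TPR in Eq. (29)
    f1 = 2 * precision * recall / (precision + recall)   # Eq. (28)
    fpr = fp / (fp + tn)                                 # Eq. (30)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "tpr": recall,
        "fpr": fpr,
    }

# Hypothetical counts: 100 threat instances and 100 normal instances
metrics = classification_metrics(tp=90, tn=95, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```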

5. Results and Discussion

5.1. Behavioural Risk Score and Session Monitoring Analysis

The behavioural risk assessment examines how well the proposed system characterizes normal user behaviour and detects possible insider threats. The system produces immediate risk evaluations by tracking user activities across successive time intervals, which helps identify patterns that diverge from established normal behaviour. The Session-Level Behavioural Risk Monitoring system re-verifies users throughout their session rather than relying only on the initial authentication check. This dynamic monitoring identifies insider threats more efficiently while maintaining consistent authentication during operational activities.

The unified anomaly scoring system separates normal user behaviour from insider threat activities according to their risk score distributions. Figure 4 shows that normal user activities are concentrated near low-risk values between 0.00 and 0.05, indicating that these users follow standard procedures without noticeable deviation. Insider threat activities, by contrast, are concentrated in the 0.60 to 0.70 range, with 0.65 being the most frequent value. The two distributions are distinctly separated, which indicates that the anomaly detection system distinguishes normal user behaviour from unauthorized activities. The detection framework therefore identifies suspicious users through its decision thresholds while maintaining low false alarm rates in enterprise environments.

The hybrid LSTM–Autoencoder system produces stable threat detection results because it continuously monitors insider threat risk throughout the assessment period. The raw risk scores in Figure 5 range from 0.60 to 0.70, reflecting variation in activity across time windows. After aggregation, the risk score stays between 0.66 and 0.70, with the sudden changes and noise present in the raw predictions damped. Aggregation thus improves risk assessment by removing temporary irregularities that would otherwise obscure essential behavioural patterns. The results demonstrate that the proposed model delivers consistent insider threat risk assessments, which can help organizations improve ongoing authentication in their operational environments.

Figure 6, Figure 7 and Figure 8 collectively confirm the behavioural stability characteristics of the proposed framework. Figure 6 shows a clear separation between user classes across time: normal users consistently maintain risk scores between 0.00 and 0.15, while insider threats sustain elevated scores between 0.85 and 0.98. Figure 7 shows that session authentication confidence fluctuates naturally between 0.30 and 0.75 due to legitimate contextual variation, yet remains bounded and predictable, confirming system stability under normal behavioural drift. Figure 8 reinforces these findings at the population level, demonstrating that this separation is consistent across extended monitoring periods rather than isolated windows. Together, these results confirm that the persistence-based escalation mechanism (k = 3) successfully distinguishes transient behavioural spikes from sustained insider threat activity.
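
The persistence-based escalation rule (τ = 0.65, k = 3, γ = 0.70) can be sketched as follows. The exact definition of the cumulative risk score is not given in this section, so the sketch assumes it is the mean fused score over the k most recent windows; that definition, the function name, and the toy score traces are all assumptions.

```python
def escalate(scores, tau=0.65, k=3, gamma=0.70):
    """Return the index of the window at which a session is escalated, or None.

    A session is escalated when the score exceeds tau for k consecutive
    windows and the cumulative risk over those windows exceeds gamma.
    """
    streak = 0
    for t, s in enumerate(scores):
        streak = streak + 1 if s > tau else 0
        if streak >= k:
            # Assumed definition: mean score over the last k windows
            cumulative = sum(scores[t - k + 1:t + 1]) / k
            if cumulative > gamma:
                return t
    return None

transient_spike = [0.10, 0.90, 0.10, 0.20]          # one isolated high window
sustained_threat = [0.20, 0.80, 0.85, 0.90, 0.90]   # persistent high scores
borderline = [0.66, 0.67, 0.68, 0.66]               # above tau, below gamma

print(escalate(transient_spike), escalate(sustained_threat), escalate(borderline))
```

Under these assumptions, an isolated spike never reaches the required streak, and a run of scores only slightly above τ is held back by the γ condition, so only the sustained trace is escalated.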

The false positive trend evaluates the reliability of the proposed insider threat detection framework across multiple monitoring windows. Figure 9 shows that the false positive rate remains low throughout the monitoring period: most measurements stay between 0.02 and 0.08 and reach 0.20 to 0.23 only in rare instances. The overall average false positive rate remains around 0.05, which shows that the model limits the number of benign activities incorrectly flagged as threats. Enterprise security systems must maintain low false positive rates, because excessive alerts reduce user trust in the detection system and create additional work for security personnel. The observed trend confirms that the proposed hybrid model achieves stable anomaly detection performance with minimal false alarms.

5.2. Insider Threat Detection Performance Evaluation

This subsection evaluates how well the proposed framework detects insider threats in enterprise systems. The experimental results show that the system identifies threats reliably, achieving strong accuracy, precision, recall, and F1-score throughout the evaluation.

Figure 10 summarizes the performance metrics of the insider threat detection framework. The system achieved an accuracy of 97.65%, which shows that most user behaviour instances were classified correctly. A precision of 96.35% shows that most predicted insider threats correspond to actual malicious activities, which reduces unnecessary security alerts. A recall of 99.05% demonstrates a strong capability to detect insider threats. The F1-score of 97.68% shows that the system balances precision and recall, while the ROC–AUC of 99.20% shows that it can effectively distinguish between normal and malicious activities. These results indicate that the proposed framework enables enterprises to deploy the Session-Level Behavioural Risk Monitoring Model to protect their systems.

Figure 11 shows the classification outcomes of the proposed insider threat detection system. The model classified 21,658 normal user activities correctly and misclassified 788 as insider threats. It detected 22,218 insider threat activities, whereas 228 malicious samples were wrongly identified as normal behaviour. The results demonstrate a strong capability to differentiate between typical and unusual behaviour patterns. The LSTM–Autoencoder architecture achieves high performance because it can learn the complex sequential and behavioural features present in enterprise environments. These results were obtained on a nearly balanced dataset and therefore do not reflect the class distribution found in real-world scenarios. Overall, the system yields dependable results because it learns and recognizes complex user behaviour patterns, supporting Session-Level Behavioural Risk Monitoring.

Figure 12 reports the false positive and false negative rates, which indicate the operational reliability of the detection system in business environments. The model achieved a false positive rate of approximately 3.75%, meaning that only a small share of legitimate user activities was falsely identified as suspicious. The false negative rate of 0.95% confirms that the system detected nearly all actual insider threat instances, leaving very few malicious sessions undetected. Minimizing the false negative rate is critical in insider threat detection, as undetected malicious activity can lead to significant security breaches and organizational harm. The results show that the proposed framework maintains security protection while preserving operational stability.

Figure 13 illustrates learning behaviour during the training process. Accuracy improves early in training as the model begins to identify the basic usage patterns of enterprise users. Training and validation accuracy then stabilize, showing that the model has reached its final performance level. The close alignment between the training and validation accuracy curves confirms strong generalization capability and the absence of significant overfitting. The final validation accuracy of approximately 97.6% confirms the robustness of the proposed framework for the Session-Level Behavioural Risk Monitoring Model and insider threat detection tasks.

Figure 14 shows how the loss of the proposed deep learning model evolves during training. Training begins at epoch 0 with a training loss of 0.35 and a validation loss of 0.29, reflecting high error before the model has learned the behavioural structure of the dataset. Both loss values then decrease continuously, indicating effective parameter optimization. At epoch 10 the training loss reaches 0.14 and the validation loss 0.11, showing improved prediction accuracy. By the final epoch (epoch 29), the training loss has decreased to 0.10 and the validation loss to 0.08. The small gap between training and validation loss confirms that the model generalizes well without significant overfitting and that the proposed LSTM–Autoencoder framework reaches stable convergence.

Figure 15 presents the Precision–Recall curve comparison of different models for insider threat detection. The proposed hybrid LSTM + Autoencoder framework achieves the highest Average Precision (AP) score of 0.9898, demonstrating superior overall detection capability. The standalone LSTM model also performs strongly with an AP of 0.9678, indicating the effectiveness of temporal behavioural modelling. Random Forest and Logistic Regression achieve moderate performance, while the Autoencoder alone shows lower detection capability. The proposed hybrid framework maintains higher precision across most recall levels, indicating improved behavioural stability and more reliable anomaly detection. These results confirm that integrating temporal sequence learning with reconstruction-based anomaly analysis enhances continuous session-level risk monitoring performance.

Figure 16 compares the discrimination capability of the classification models, including Logistic Regression, Random Forest, LSTM, Autoencoder, and recent methods. Logistic Regression achieves an AUC of 0.8898, indicating moderate classification performance. Random Forest performs better, with an AUC of 0.9422. The Autoencoder records the lowest AUC, 0.8076, indicating that it cannot capture complex behavioural patterns on its own. The LSTM achieves a high AUC of 0.9922, reflecting its ability to track temporal patterns in user behaviour sequences. The proposed LSTM–Autoencoder hybrid framework achieves the highest AUC of 0.9923, marginally above the standalone LSTM, showing that it separates normal users from insider threats effectively and supporting the Session-Level Behavioural Risk Monitoring design.

Table 4 compares the proposed framework against baseline models using mean and standard deviation values across multiple runs. Traditional models such as Logistic Regression and Random Forest achieve moderate performance, with accuracies of 91.0% and 94.0% respectively, while the standalone Autoencoder records the weakest results, with an accuracy of 89.0% and a ROC-AUC of 81.2%, confirming that reconstruction-based detection alone is insufficient for capturing complex behavioural patterns. The standalone LSTM performs strongly, with 97.5% accuracy and 99.0% ROC-AUC, demonstrating the value of temporal sequence modelling. The proposed LSTM–Autoencoder framework outperforms all baselines across every metric, though the improvement over the standalone LSTM is modest; the primary contribution resides in detection stability, false positive suppression, and session-persistent risk monitoring rather than in marginal accuracy gain.

All baseline and state-of-the-art methods, including Logistic Regression, Random Forest, GRU, standalone LSTM, and standalone Autoencoder, were independently reimplemented and evaluated under identical experimental conditions rather than adopting results directly from their original publications. All models were trained and tested exclusively on the same Insider Threat Dataset for Corporate Environments, processed through the same pipeline of data cleaning, min-max normalization, 30-minute temporal segmentation, and 16-feature vector construction. A uniform stratified 80:20 train-test split was applied across all methods, with the test set remaining completely unseen during training, and hyperparameters for each baseline were tuned using the same validation subset employed for the proposed framework. All methods were assessed using the same evaluation metrics (accuracy, precision, recall, F1-score, and ROC-AUC), computed on the identical held-out test set.
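The 30-minute temporal segmentation step of this shared pipeline can be sketched as follows. This is only an illustration under stated assumptions: windows are taken to be aligned to clock boundaries (:00 and :30), the record layout `(user, timestamp, event_type)` is hypothetical, and the construction of the 16 features from each window is not shown because it is not specified in this section.

```python
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW_MINUTES = 30  # window length stated in the text

def window_start(ts: datetime) -> datetime:
    """Floor a timestamp to the start of its clock-aligned 30-minute window."""
    return ts - timedelta(
        minutes=ts.minute % WINDOW_MINUTES,
        seconds=ts.second,
        microseconds=ts.microsecond,
    )

def segment(events):
    """Group (user, timestamp, event_type) tuples by user and window start."""
    windows = defaultdict(list)
    for user, ts, event_type in events:
        windows[(user, window_start(ts))].append(event_type)
    return dict(windows)

# Hypothetical log records; user IDs and event names are illustrative only.
events = [
    ("u01", datetime(2026, 1, 5, 9, 2, 10), "logon"),
    ("u01", datetime(2026, 1, 5, 9, 28, 0), "file_access"),
    ("u01", datetime(2026, 1, 5, 9, 47, 12), "file_access"),
    ("u02", datetime(2026, 1, 5, 9, 31, 0), "logon"),
]
windows = segment(events)
for (user, start), types in sorted(windows.items()):
    print(user, start.isoformat(), types)
```

Each resulting `(user, window)` group would then be summarized into one feature vector, and consecutive windows of the same user would form the input sequence for the temporal model.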

Among the baselines, the deep-learning LSTM model performs best: by learning temporal behavioural characteristics, it reaches an accuracy of 0.975 and a ROC–AUC of 0.99. The standalone autoencoder shows limited performance because it relies exclusively on reconstruction-based anomaly detection and lacks the temporal modelling capability of the hybrid model, which limits its effectiveness against insider threats. The hybrid LSTM–Autoencoder model combines temporal sequence modelling with behavioural reconstruction and achieves the highest accuracy (0.9765), precision (0.9635), recall (0.9903), F1-score (0.9758) and ROC–AUC (0.992). This illustrates how combining temporal sequence modelling with behavioural reconstruction improves the detection of insider threats in enterprise environments.

Figure 17 compares the baseline models (Logistic Regression, Random Forest, LSTM, Autoencoder, and recent state-of-the-art methods) with the proposed hybrid LSTM–Autoencoder framework. The traditional models, Logistic Regression and Random Forest, produce moderate results, with accuracy and F1-scores between 0.90 and 0.95. The LSTM and Autoencoder models show better recall and ROC-AUC because they can track time-based patterns and identify unusual activities. The hybrid framework outperforms all baseline models, achieving the highest accuracy, precision, recall, F1-score, and ROC-AUC. Combining sequential behavioural modelling with reconstruction-based anomaly detection therefore strengthens insider threat detection. The integrated LSTM–Autoencoder system tracks both temporal user behaviour patterns and minor behavioural changes, which makes it appropriate for Session-Level Behavioural Risk Monitoring and risk management in business settings.

Table 5 presents the ablation study results of different model configurations. The results show that combining LSTM and autoencoder improves performance, while the inclusion of Session-Level Behavioural Risk Monitoring scoring further stabilizes detection. The proposed full framework achieves the best overall performance, and the low standard deviation values indicate consistent results across multiple runs.

The LSTM + Autoencoder configuration without risk fusion shows a minor performance drop relative to the LSTM-only configuration because it lacks a mechanism for merging the temporal and reconstruction signals. Without risk fusion, the LSTM and autoencoder outputs are not aligned, and noise and conflicting anomaly decisions between the two components degrade performance. This pattern is typical when models are combined without an appropriate method for merging their outputs. When Session-Level Behavioural Risk Monitoring scoring is introduced, the framework incorporates temporal aggregation and persistence-based smoothing, which stabilizes anomaly predictions across multiple time windows.

To select the fusion parameter α in Equation (14), a sensitivity analysis was conducted by varying α from 0.1 to 0.9 in increments of 0.1. The parameter α controls the relative contribution of the LSTM-based temporal score and the autoencoder-based reconstruction error to the final risk score.

The results indicate that performance improves as the contribution of the LSTM component increases, with peak performance observed at α = 0.7. Beyond this value, the performance gain saturates, while lower α values reduce the influence of temporal behavioural modelling, leading to a slight decrease in detection performance. This confirms that α = 0.7 provides an optimal balance between sequential pattern learning and anomaly reconstruction.
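The sensitivity analysis can be sketched as a simple sweep over α. Equation (14) is not reproduced in this section, so the snippet assumes a convex combination in which α weights the LSTM score and (1 − α) weights the reconstruction error, with both inputs already scaled to [0, 1]; the validation triples, the helper names, and the use of 0.65 as the decision threshold inside the sweep are all illustrative assumptions. The toy data are constructed by hand and say nothing about the values reported in the paper.

```python
def fused_risk(alpha: float, lstm_score: float, ae_error: float) -> float:
    """Assumed fusion rule: alpha * LSTM score + (1 - alpha) * AE error."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * lstm_score + (1.0 - alpha) * ae_error

def f1_at(alpha: float, samples, threshold: float = 0.65) -> float:
    """F1-score of the thresholded fused risk on (lstm, ae, label) triples."""
    tp = fp = fn = 0
    for lstm_score, ae_error, label in samples:
        predicted = fused_risk(alpha, lstm_score, ae_error) >= threshold
        if predicted and label == 1:
            tp += 1
        elif predicted and label == 0:
            fp += 1
        elif not predicted and label == 1:
            fn += 1
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)

# Hypothetical validation triples: (LSTM score, scaled AE error, true label).
validation = [
    (0.92, 0.80, 1), (0.85, 0.30, 1), (0.78, 0.50, 1),
    (0.20, 0.60, 0), (0.35, 0.90, 0), (0.10, 0.15, 0),
]

alphas = [round(0.1 * i, 1) for i in range(1, 10)]  # 0.1, 0.2, ..., 0.9
sweep = {alpha: f1_at(alpha, validation) for alpha in alphas}
best_alpha = max(sweep, key=sweep.get)  # first alpha reaching the best F1
for alpha, f1 in sweep.items():
    print(f"alpha={alpha:.1f}  F1={f1:.3f}")
```

In practice the sweep would be run on the validation subset, and the criterion for choosing α (F1-score, recall, or another metric) is a design decision rather than something fixed by the sweep itself.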

Table 6 compares the performance of the proposed model with classical machine learning and deep learning baseline methods. The results show that the proposed model achieves superior or competitive performance across all evaluation metrics, demonstrating its effectiveness and robustness.

Table 7 shows that α = 0.6 achieved the highest F1-score and precision during validation analysis. However, the proposed framework ultimately adopted α = 0.7 because it provided higher recall performance. Since insider threat detection is a security-critical application where undetected malicious behaviour may lead to severe organizational damage, the framework prioritizes recall and threat detection sensitivity over marginal improvements in balanced performance metrics.

Table 8 shows the model performance across different insider threat categories. The results indicate consistently high performance, with slightly better detection for data exfiltration, while sabotage and privilege misuse show marginally lower performance due to subtler behavioural patterns.

Figure 18 illustrates the impact of varying the fusion weight (α) on the F1-score of the proposed model. Performance improves as α increases, peaking at α = 0.6 (red dot), after which a slight decline is observed. This indicates that the model achieves the best balance between its two components at α = 0.6 while maintaining stable performance across nearby values.

Table 9 presents the robustness analysis of the proposed model and LSTM baseline using mean and standard deviation across multiple runs. The results indicate that the proposed model achieves competitive performance with lower variance, demonstrating improved stability and reliability.

Table 10 shows the impact of varying decision thresholds on precision, recall, and F1-score. The results indicate that the model maintains stable performance across different thresholds, with optimal performance observed at a threshold of 0.65. The proposed model demonstrates lower variance than the LSTM baseline, indicating improved stability. Although the performance improvement is marginal, the consistent gains in ROC-AUC and the reduced standard deviation indicate that the model performance is reliable and not due to random chance.
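A threshold analysis of this kind can be sketched as below. The risk scores and labels are hypothetical and hand-picked for illustration, the function name `prf_at_threshold` is not from the paper, and the rule "flag when score ≥ threshold" is an assumption about how the decision threshold is applied.

```python
def prf_at_threshold(scores, labels, threshold):
    """Precision, recall and F1 when scores >= threshold are flagged."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Hypothetical risk scores and ground-truth labels (1 = malicious).
scores = [0.95, 0.88, 0.72, 0.66, 0.61, 0.40, 0.30, 0.12]
labels = [1, 1, 1, 0, 1, 0, 0, 0]

for threshold in (0.50, 0.65, 0.80):
    p, r, f1 = prf_at_threshold(scores, labels, threshold)
    print(f"threshold={threshold:.2f}  P={p:.3f}  R={r:.3f}  F1={f1:.3f}")
```

Raising the threshold trades recall for precision, which is the trade-off such a sweep is meant to expose.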

Table 11 presents the performance of the proposed model across 5-fold cross-validation. The results demonstrate consistent performance across all folds with very low standard deviation, indicating strong generalization capability and absence of overfitting.

To evaluate whether the performance improvements achieved by the proposed framework over baseline models were statistically significant, McNemar’s test was performed on the held-out test set predictions. McNemar’s test is appropriate for paired binary classification problems where multiple classifiers are evaluated on the same test samples. The test analyzes disagreement patterns between the proposed framework and competing baseline methods using identical prediction instances. The resulting p-values reported in Table 12 indicate whether the observed differences between classifiers are statistically significant. A significance threshold of p < 0.05 was adopted throughout the study.
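For illustration, a minimal sketch of McNemar’s test on paired predictions is given below. The text does not state which variant was used, so the exact two-sided binomial form is assumed here (a chi-square approximation with continuity correction is a common alternative); the function name and the toy predictions are hypothetical.

```python
from math import comb

def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact two-sided McNemar test on the discordant pairs.

    b counts samples that only classifier A gets right,
    c counts samples that only classifier B gets right.
    """
    b = c = 0
    for y, a, other in zip(y_true, pred_a, pred_b):
        if a == y and other != y:
            b += 1
        elif a != y and other == y:
            c += 1
    n = b + c
    if n == 0:
        return b, c, 1.0  # the classifiers never disagree on correctness
    # Under H0 each discordant pair favours A or B with probability 0.5.
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)

# Hypothetical paired predictions on the same 12 test samples.
y_true = [1, 0] * 6
pred_a = list(y_true)                                # A is always correct
pred_b = [1 - y for y in y_true[:6]] + y_true[6:]    # B errs on 6 samples
b, c, p_value = mcnemar_exact(y_true, pred_a, pred_b)
print(f"b={b}, c={c}, p={p_value:.5f}")
```

Only the discordant pairs enter the statistic, so samples that both classifiers classify correctly (or both misclassify) do not affect the p-value.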

To verify that the performance improvements achieved by the proposed framework are statistically reliable and not caused by random variation, statistical significance analysis was performed using Precision–Recall Area Under Curve (PR-AUC), Matthews Correlation Coefficient (MCC), p-value analysis, and McNemar’s test. PR-AUC and MCC were selected because they provide reliable evaluation under moderately imbalanced insider threat detection conditions. McNemar’s test was further applied to evaluate whether the classification differences between the proposed framework and baseline models were statistically significant at the prediction level.
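The sensitivity of MCC to class skew can be seen from its standard definition. The sketch below holds the per-class error rates roughly fixed and changes only the class ratio; all counts are hypothetical and are not taken from the paper's experiments.

```python
from math import sqrt

def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # conventional value when a row or column is empty
    return (tp * tn - fp * fn) / denom

# Hypothetical counts with similar per-class error rates in both cases.
balanced = mcc(tp=9905, fp=375, tn=9625, fn=95)    # 1:1 class ratio
imbalanced = mcc(tp=990, fp=375, tn=9625, fn=10)   # 10:1 normal-to-threat
print(f"balanced MCC:   {balanced:.3f}")
print(f"imbalanced MCC: {imbalanced:.3f}")
```

With fewer malicious samples, the same number of false positives makes up a larger share of all alerts, so MCC drops even though the per-class error rates are essentially unchanged. This is the behaviour that makes it a more conservative metric than accuracy under imbalance.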

Table 12 presents the mean and standard deviation values obtained across multiple experimental runs, together with p-values computed relative to the proposed framework. The results demonstrate that the proposed framework consistently outperforms all baseline models in both PR-AUC and MCC. The obtained p-values remain below the significance threshold of 0.05 for all comparisons, indicating that the observed performance improvements are statistically significant rather than the result of random variation. Furthermore, McNemar’s test results indicate statistically significant differences in classification behaviour between the proposed framework and competing methods, supporting the robustness and reliability of the proposed insider threat detection framework.

The proposed framework was additionally evaluated under a simulated severe class imbalance to assess its operational robustness under realistic enterprise insider threat conditions. This evaluation was conducted using the same trained model without retraining, by applying random undersampling exclusively to the held-out test partition to reduce malicious class representation from 48% to approximately 9%, approximating a 10:1 normal-to-threat ratio. No resampling or synthetic augmentation was applied to the training data, ensuring that the evaluation reflects genuine model behaviour under distribution shift rather than an optimized retraining outcome. The threshold parameters τ, k, and γ were retained at their validation-calibrated values without recalibration on the imbalanced test set, providing a conservative and realistic performance estimate. Table 13 reports the results of this evaluation using PR-AUC and MCC as primary metrics, both of which are better suited to class skew than accuracy and provide reliable performance assessment when insider threat instances constitute a small minority of total activity records.
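The test-set undersampling step can be sketched as follows. The function name, the fixed seed, and the toy label vector (48% malicious out of 1,000 samples) are assumptions made for illustration; only the malicious class is subsampled, and the training data are not touched.

```python
import random

def undersample_malicious(labels, target_fraction, seed=0):
    """Indices that keep every normal sample and a random subset of
    malicious samples so that malicious ~= target_fraction of the result."""
    if not 0.0 < target_fraction < 1.0:
        raise ValueError("target_fraction must lie strictly between 0 and 1")
    normal = [i for i, y in enumerate(labels) if y == 0]
    malicious = [i for i, y in enumerate(labels) if y == 1]
    # Solve m / (m + len(normal)) = target_fraction for m.
    keep = round(target_fraction * len(normal) / (1.0 - target_fraction))
    keep = min(keep, len(malicious))
    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    return sorted(normal + rng.sample(malicious, keep))

# Hypothetical held-out test labels: 480 malicious (48%), 520 normal.
test_labels = [1] * 480 + [0] * 520
kept = undersample_malicious(test_labels, target_fraction=1 / 11)  # 10:1
kept_malicious = sum(test_labels[i] for i in kept)
print(len(kept), kept_malicious, round(kept_malicious / len(kept), 3))
```

The returned indices would be used to select the corresponding model predictions, so that the already trained model is scored on the skewed subset without any retraining or threshold recalibration.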

Table 13 shows how the proposed model performs under imbalanced conditions in which insider threat cases are far fewer than normal activities. The evaluation uses PR-AUC and MCC, metrics suited to imbalanced class distributions. The results show that the model retains meaningful detection capability under this skewed class distribution, which is closer to real-world insider threat conditions.

Figure 19 illustrates the effect of increasing class imbalance on the proposed insider threat detection framework. The evaluation was conducted across multiple normal-to-malicious class ratios ranging from balanced (50:50) to highly imbalanced (95:5) conditions. The results show that both PR-AUC and MCC progressively decrease as the imbalance ratio increases, indicating that insider threat detection becomes more challenging when malicious activities are rare. While the framework achieves near-perfect performance under balanced conditions, the evaluation under realistic enterprise-like imbalance scenarios still maintains meaningful PR-AUC and MCC values, demonstrating partial robustness under severe class imbalance.

Table 14 compares malicious-class detection performance between the near-balanced experimental dataset and the more realistic CERT-like evaluation condition. The results indicate that precision, recall, and F1-score decrease noticeably under realistic imbalance conditions, confirming that insider-threat detection becomes substantially more challenging when malicious behaviour is rare. Nevertheless, the proposed framework maintains meaningful malicious-class detection capability under enterprise-like settings.

5.3. Cross-Dataset Generalization Evaluation

To assess the generalization capability of the proposed framework, experiments were conducted using an alternative insider threat benchmark dataset. The objective of this evaluation was to examine whether the proposed LSTM–Autoencoder framework maintains stable detection performance across datasets exhibiting different behavioural distributions, user activity patterns, and insider threat characteristics. The same preprocessing pipeline, temporal segmentation strategy, feature extraction procedure, and hyperparameter configuration used in the primary experiments were applied to the additional dataset to ensure fair evaluation. The dataset was partitioned using the same stratified 80:20 train–test strategy while preserving chronological ordering during sequence generation. Experimental results demonstrate that the proposed framework maintains strong detection capability across both datasets, supporting its robustness under varying behavioural conditions. Although slight performance variations are observed due to differences in behavioural distributions and class imbalance characteristics, the framework consistently achieves high recall and ROC-AUC values, indicating stable insider threat detection performance.

Table 15 presents the cross-dataset evaluation results of the proposed LSTM–Autoencoder framework on the Insider Threat Dataset for Corporate Environments and the CERT r4.2 dataset [35]. The framework achieves strong performance across both datasets, demonstrating stable insider threat detection capability under different behavioural conditions. Although performance on CERT r4.2 is slightly lower due to higher behavioural variability and dataset complexity, the proposed model maintains high ROC-AUC performance, confirming improved robustness and generalization capability.

5.4. Discussion

According to the experimental results, the proposed behavioural biometrics–based insider threat detection framework effectively detects abnormal user activities in enterprise environments. The proposed framework should be viewed as an integrated enhancement of existing behavioural analytics methodologies rather than a fundamentally new insider threat detection paradigm. Its primary contribution lies in combining complementary temporal learning and anomaly detection mechanisms with persistence-aware risk assessment to improve behavioural monitoring stability and operational applicability in enterprise environments. The system uses LSTM-based sequential behaviour modelling together with autoencoder-based anomaly detection to detect both temporal behaviour patterns and abnormal user activity. The results consistently demonstrate that deep learning approaches outperform traditional machine learning models in capturing the complexity of insider threat behaviour patterns. Session-Level Behavioural Risk Monitoring combined with dynamic risk scoring enhances detection accuracy by assessing behavioural variations across different time periods. The ablation results demonstrate that all framework components improve system performance, with the complete model achieving the best accuracy and ROC-AUC. Although the framework was evaluated in an offline setting, its window-based behavioural analysis architecture supports incremental processing of user activity logs, making it amenable to window-based deployment. The model can scale to enterprise conditions in which very large volumes of user activity data are generated continuously. Because the inference phase demands minimal computational resources, trained deep learning models can be integrated into enterprise monitoring systems such as log management and SIEM platforms. These findings indicate that the proposed framework is an effective security solution that scales to continuous insider threat monitoring in enterprise systems.

A precise architectural distinction must be drawn between the proposed framework and prior LSTM-Autoencoder hybrid systems to clarify the nature of the technical contribution. Existing hybrid approaches combine sequential and reconstruction-based models at the output layer, producing threat scores that are evaluated independently or through fixed ensemble logic. These systems function as enhanced binary classifiers: they improve detection accuracy over single-model approaches but do not alter the fundamental architecture of threat assessment, which remains session-level and static. The proposed framework redefines the detection architecture itself: rather than classifying sessions in isolation, it implements a Session-Level Behavioural Risk Monitoring engine in which the risk state of a user is a temporally evolving quantity updated at every monitoring window. The escalation mechanism introduces temporal causality as a first-class design principle: a threat decision is not valid unless it is reproducible across k consecutive observations, directly analogous to clinical diagnostic criteria that require symptom persistence across multiple assessments before diagnosis. This design choice is validated empirically: the persistence constraint reduces false positive escalation by 78% relative to single-window detection while maintaining recall above 99%, a trade-off that no existing hybrid model reports because no existing hybrid model implements session-persistent escalation. The contribution is therefore not a new model architecture but a new authentication framework design that repurposes well-understood component models for a fundamentally different operational objective.
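The persistence constraint can be sketched as a streak counter over per-window risk scores. The snippet assumes that escalation fires once the risk score has stayed at or above τ for k consecutive windows and that any window below τ resets the streak; the exact escalation rule, including the role of γ, is not restated in this section and is therefore omitted. The default values τ = 0.65 and k = 3 are the ones reported in the study, while the session scores are hypothetical.

```python
def escalation_flags(risk_scores, tau=0.65, k=3):
    """Flag a window only after k consecutive windows with risk >= tau."""
    flags, streak = [], 0
    for risk in risk_scores:
        streak = streak + 1 if risk >= tau else 0  # a low window resets it
        flags.append(streak >= k)
    return flags

# Hypothetical per-window risk scores for one user session.
session = [0.30, 0.80, 0.40, 0.70, 0.75, 0.90, 0.50]

single_window = [risk >= 0.65 for risk in session]  # no persistence
persistent = escalation_flags(session)

print("single-window alerts:", sum(single_window))
print("persistent alerts:   ", sum(persistent))
```

In this toy session the isolated spike in the second window raises a single-window alert but never escalates, whereas the sustained run of high-risk windows escalates on its third consecutive observation.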

A key architectural distinction of the proposed framework from existing hybrid approaches lies in the integration strategy rather than the individual model components. Prior hybrid approaches that combine sequential and reconstruction-based models typically produce independent anomaly scores that are evaluated separately or concatenated as features for a downstream classifier. The proposed framework instead introduces a unified risk fusion mechanism with validation-tuned weighting (α = 0.7), combined with a persistence-based escalation layer operating at the session level rather than the instance level. This design enables the framework to function as a Session-Level Behavioural Risk Monitoring engine, not merely a batch threat classifier, in which authentication confidence is dynamically updated across consecutive behavioural windows. This session-persistent monitoring property distinguishes the proposed work from prior LSTM-only, autoencoder-only, and loosely combined hybrid models, and is directly responsible for the observed suppression of false positive escalation while maintaining 99.05% recall for sustained insider threat activity. The threshold values reported in this study (τ = 0.65, k = 3, and γ = 0.70) were empirically calibrated on the Insider Threat Dataset for Corporate Environments and should not be treated as universal constants across all enterprise deployments. Organizations with denser activity logs, higher baseline behavioural variability, or significantly different class distributions would require recalibration of these parameters using organization-specific validation data. However, several architectural properties of the framework inherently support cross-environment adaptation: the min-max normalization is re-fit on each organization’s local training data, the Autoencoder threshold adapts automatically via percentile-based calibration on local normal behaviour, and the LSTM is retrained on organization-specific sequential logs rather than relying on transferred weights. Formal cross-organizational generalizability evaluation is constrained by the limited availability of multiple labelled enterprise insider threat datasets, and future work will investigate federated learning and domain adaptation strategies to improve transferability across heterogeneous enterprise environments.
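The two local calibration steps mentioned above, re-fitting the min-max normalization and setting a percentile-based autoencoder threshold, can be sketched as follows. The percentile actually used is not stated in this section, so the 95th percentile is an assumption, as are the helper names and the toy values.

```python
def fit_min_max(rows):
    """Per-feature (min, max) bounds learned from local training rows."""
    return [(min(col), max(col)) for col in zip(*rows)]

def apply_min_max(row, bounds):
    """Scale one row with previously fitted bounds; constant features -> 0."""
    return [
        0.0 if hi == lo else (x - lo) / (hi - lo)
        for x, (lo, hi) in zip(row, bounds)
    ]

def percentile_threshold(errors, q=95.0):
    """q-th percentile (linear interpolation) of reconstruction errors
    measured on local normal behaviour; q = 95 is an assumed value."""
    if not errors:
        raise ValueError("at least one reconstruction error is required")
    data = sorted(errors)
    pos = (len(data) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)

# Hypothetical local training rows with two features each.
train_rows = [[0, 10], [5, 20], [10, 30]]
bounds = fit_min_max(train_rows)
print(apply_min_max([5, 20], bounds))

# Hypothetical reconstruction errors on normal local sessions.
normal_errors = [0.02, 0.03, 0.03, 0.04, 0.05, 0.05, 0.06, 0.08, 0.10, 0.30]
print(round(percentile_threshold(normal_errors), 4))
```

Because both steps depend only on local data, they can be repeated per organization without changing the model architecture, which is the adaptation property described in the paragraph above.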

6. Conclusions and Future Work

The primary objective of this research was to develop a framework that uses behavioural characteristics to detect insider threats, combining Session-Level Behavioural Risk Monitoring with anomaly detection in an LSTM–Autoencoder (LAE) system. The LAE system analyses user behaviours such as authentication activities, file access, and other session events to produce ongoing assessments of insider threat risk, helping to distinguish a security breach from normal user behaviour. Evaluated on the Insider Threat Dataset for Corporate Environments, the framework demonstrated strong detection performance, achieving an accuracy of 97.65%, a precision of 96.35%, a recall of 99.05%, and an F1-score of 97.68%. The ROC-AUC of 99.20% indicates a high degree of discriminative capability between normal and malicious user behaviour, supported by a consistently low false positive rate throughout evaluation. Furthermore, the LAE model tracked user behaviour over time and, through behavioural risk score analysis, provided a measure of the ongoing stability of session authentication as evidence of potential insider threats. These results demonstrate that the proposed LSTM–Autoencoder framework offers a secure, scalable solution for ongoing window-based insider threat monitoring. Its primary contribution resides not in marginal accuracy gains over the standalone LSTM baseline but in the 78% reduction in false positive escalation, improved cross-run stability, and the session-persistent risk monitoring architecture, which collectively provide enterprises with the analytical capability needed to mitigate the operational and financial impact of insider threats.

The framework requires ongoing monitoring of user conduct in enterprise systems, which raises significant ethical and confidentiality issues. The system tracks employee activities by analysing login patterns, file access, network activity, and communication logs, which raises concerns about employee surveillance and the protection of personal information. The framework should therefore operate within organisational governance frameworks that control access to data, process anonymised information, and follow enterprise security regulations. The system supports decision-making by validating user access requests, leaving administrators in control of access permissions. Future research will investigate privacy-preserving methods, including federated learning and differential privacy, to safeguard sensitive information while maintaining effective detection capabilities. It will also identify methods to reduce false positives, which will help protect users from wrongful identification and maintain business operations.

The proposed insider threat detection framework has strong detection capabilities. However, it requires additional enhancements in order to perform effectively in real-world contexts. Future research will incorporate more contextual behaviour data into the risk score estimates and will integrate more advanced deep learning models in order to improve the tracking of extended behavioural patterns of individuals. The proposed system will also be tested in actual business settings, where it will support ongoing monitoring of users and dynamic assessment of risk. Future research will also explore federated learning methods, which allow multiple organisations to collaborate on insider threat detection while maintaining the confidentiality of their data. With these improvements, the insider threat detection system is expected to become scalable, reliable, and usable in operational environments.

Author Contributions

Conceptualization, N.K. and O.M.; methodology, A.A. and O.M.; software, N.K.; validation, A.A., A.Y. and N.K.; formal analysis, A.Y.; investigation, N.K.; resources, A.A.; data curation, A.Y.; writing – original draft preparation, N.K.; writing – review and editing, O.M.; visualization, A.Y.; supervision, O.M.; project administration, A.A.; funding acquisition, A.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The dataset used in this study is publicly available through Kaggle as the Insider Threat Dataset for Corporate Environments. The dataset can be accessed at: https://www.kaggle.com/datasets/ahmeduzaki/insider-threat-dataset-for-corporate-environments (accessed on 20 March 2026). The supporting materials used in this study are available through the following GitHub repository: https://github.com/NursultanKuldeyev/Level-Risk-Monitoring-for-Insider-Threat-Detection-in-Enterprise-Networks (accessed on 20 March 2026). The repository includes trained model checkpoints, dataset split indices, and random seeds used across all five experimental runs.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Manoharan, P.; Hong, W.; Yin, J.; Wang, H.; Zhang, Y.; Ye, W. Optimising Insider Threat Prediction: Exploring BiLSTM Networks and Sequential Features. Data Sci. Eng. 2024, 9, 393–408.
Hong, W.; Yin, J.; You, M.; Wang, H.; Cao, J.; Li, J.; Liu, M.; Man, C. A graph empowered insider threat detection framework based on daily activities. ISA Trans. 2023, 141, 84–92.
Yuan, L.; Chang, D.; Hu, H.; Jiang, Y.; Chang, H.; Fang, L.; Liu, Y. FusionITD: Enhanced cross-modal insider threat perception framework via behavior-semantic fusion. Cybersecurity 2026, 9, 119.
Bamashmoos, F. Adaptive Privacy-Preserving Insider Threat Detection Using Generative Sequence Models. Future Internet 2026, 18, 11.
D’amelio, A.; Patania, S.; Bursic, S.; Cuculo, V.; Boccignone, G. Using Gaze for Behavioural Biometrics. Sensors 2023, 23, 1262.
Villarreal-Vasquez, M.; Modelo-Howard, G.; Dube, S.; Bhargava, B. Hunting for Insider Threats Using LSTM-Based Anomaly Detection. IEEE Trans. Dependable Secur. Comput. 2023, 20, 451–462.
Khan, N.; Houghton, R.J.; Sharples, S. Understanding factors that influence unintentional insider threat: A framework to counteract unintentional risks. Cogn. Technol. Work 2022, 24, 393–421.
Georgiadou, A.; Mouzakitis, S.; Askounis, D. Detecting Insider Threat via a Cyber-Security Culture Framework. J. Comput. Inf. Syst. 2022, 62, 706–716.
Zangana, H.M.; Sallow, Z.B.; Omar, M. The Human Factor in Cybersecurity: Addressing the Risks of Insider Threats. J. Ilm. Comput. Sci. 2025, 3, 76–85.
Bansal, P.; Ouda, A. Continuous Authentication in the Digital Age: An Analysis of Reinforcement Learning and Behavioral Biometrics. Computers 2024, 13, 103.
Sawicki, A.; Saeed, K.; Walendziuk, W. Behavioral Biometrics in VR: Changing Sensor Signal Modalities. Sensors 2025, 25, 5899.
Tian, T.; Zhang, C.; Jiang, B.; Feng, H.; Lu, Z. Insider threat detection for specific threat scenarios. Cybersecurity 2025, 8, 17.
Randive, K.; Mohan, R.; Sivakrishna, A.M. An efficient pattern-based approach for insider threat classification using the image-based feature representation. J. Inf. Secur. Appl. 2023, 73, 103434.
Muzaffar, J.; Mazher, N. AI-Powered Behavioral Analysis for Insider Threat Detection in Enterprise Networks. Balt. J. Multidiscip. Res. 2024, 1, 1–11.
Mahfouz, A.; Hamdy, A.; Eldin, M.A.; Mahmoud, T.M. B2auth: A contextual fine-grained behavioral biometric authentication framework for real-world deployment. Pervasive Mob. Comput. 2024, 99, 101888.
He, D.; Lv, X.; Xu, X.; Chan, S.; Choo, K.-K.R. Double-Layer Detection of Internal Threat in Enterprise Systems Based on Deep Learning. IEEE Trans. Inf. Forensics Secur. 2024, 19, 4741–4751.
Saminathan, K.; Mulka, S.T.R.; Damodharan, S.; Maheswar, R.; Lorincz, J. An Artificial Neural Network Autoencoder for Insider Cyber Security Threat Detection. Future Internet 2023, 15, 373.
Mohamed, M.S.; Arabo, A. A SIEM-Integrated Cybersecurity Prototype for Insider Threat Anomaly Detection Using Enterprise Logs and Behavioural Biometrics. Electronics 2026, 15, 248.
Pennada, S.S.P.; Nayak, S.K. Insider Threat Detection Using Behavioural Analysis through Machine Learning and Deep Learning Techniques. Int. Res. J. Multidiscip. Technovation 2025, 7, 74–86.
Abba, S.S.; Obioha-Val, O.A.; Ejiofor, V.O.; Olaniyi, O.M.; Mayeke, N.R. Behavioral Biometrics-Powered Continuous Authentication for Zero-trust Remote Work Environments: A Multi-factor Identity Verification Framework. Asian J. Res. Comput. Sci. 2025, 18, 20–41.
Aramide, O.O. AI-Driven Identity Verification and Authentication in Networks: Enhancing Accuracy, Speed, and Security through Biometrics and Behavioral Analytics. Adhyayan J. Manag. Sci. 2023, 13, 60–69.
Ibraheem, I.; Morufat, A.T.; Segbefia, S.K.; Abdulrasaq, A.A. Detecting Malicious Insider Threats through Anomaly-Based User Behaviour Analytics in Enterprise Networks: Machine Learning Approach. S. Afr. J. Secur. 2025, 3, 18099.
Hu, J.; Wu, W.; Chuan, T.; Peng, Q. Enterprise Internal Threat Authentication Traceability Technology Based on Key Authentication System. J. Cyber Secur. Mobil. 2025, 14, 623–652.
Uslu, U.; İncel, Ö.D.; Alptekin, G.I. Evaluation of Deep Learning Models for Continuous Authentication Using Behavioral Biometrics. Procedia Comput. Sci. 2023, 225, 1272–1281.
Orun, A.; Orun, E.; Kurugollu, F. Cognitive behavioural characteristics identification for remote user authentication for cybersecurity. J. Parallel Distrib. Comput. 2025, 202, 105102.
Tao, X.; Yu, Y.; Fu, L.; Liu, J.; Zhang, Y. An insider user authentication method based on improved temporal convolutional network. High-Confid. Comput. 2023, 3, 100169.
Yi, J.; Tian, Y. Insider Threat Detection Model Enhancement Using Hybrid Algorithms between Unsupervised and Supervised Learning. Electronics 2024, 13, 973.
Song, S.; Gao, N.; Zhang, Y.; Ma, C. BRITD: Behavior rhythm insider threat detection with time awareness and user adaptation. Cybersecurity 2024, 7, 2. [Google Scholar] [CrossRef]
Ayanbode, N.; Abieba, O.A.; Chukwurah, N.; Ajayi, O.O.; Daraojimba, A.I. Human Factors in Fintech Cybersecurity: Addressing Insider Threats and Behavioral Risks. Int. J. Multidiscip. Res. Growth Eval. 2024, 5, 1350–1356. [Google Scholar] [CrossRef]
Baig, A.F.; Eskeland, S.; Yang, B. Privacy-preserving continuous authentication using behavioral biometrics. Int. J. Inf. Secur. 2023, 22, 1833–1847. [Google Scholar] [CrossRef]
Al-Shehari, T.; Al-Razgan, M.; Alfakih, T.; Alsowail, R.A.; Pandiaraj, S. Insider Threat Detection Model Using Anomaly-Based Isolation Forest Algorithm. IEEE Access 2023, 11, 118170–118185. [Google Scholar] [CrossRef]
Nowroozi, E.; Mohammadi, M.; Rahdari, A.; Taheri, R.; Conti, M. A Random Deep Feature Selection Approach to Mitigate Transferable Adversarial Attacks. IEEE Trans. Netw. Serv. Manag. 2025, 22, 5301–5310. [Google Scholar] [CrossRef]
Kittler, J.; Hatef, M.; Duin, R.; Matas, J. On combining classifiers. IEEE Trans. Pattern Anal. Mach. Intell. 1998, 20, 226–239. [Google Scholar] [CrossRef]
Zaki, A.M. Insider Threat Dataset. Available online: https://www.kaggle.com/datasets/ahmeduzaki/insider-threat-dataset-for-corporate-environments (accessed on 9 March 2026).
Lindauer, B. Insider Threat Test Dataset; Carnegie Mellon University: Pittsburgh, PA, USA, 2020. [Google Scholar] [CrossRef]

Figure 1. Proposed Behavioural Biometrics-Based Insider Threat Detection Framework.

Figure 2. LSTM-Based Behavioral Pattern Modeling for Insider Threat Detection.

Figure 3. Autoencoder-Based Behavioral Anomaly Detection for Insider Threat Identification.

Figure 4. Behavioural Risk Score Distribution for Normal and Malicious Users.

Figure 5. Aggregated Insider Risk Score Analysis.

Figure 6. Temporal Evolution of Insider Threat Risk.

Figure 7. Session Authentication Stability Evaluation.

Figure 8. Behavioural Stability Comparison Between Normal and Malicious Users.

Figure 9. False Positive Trend Analysis Over Time.

Figure 10. Overall Performance Metrics.

Figure 11. Confusion Matrix of the Proposed LSTM–Autoencoder Framework.

Figure 12. False Positive Rate (FPR) and False Negative Rate (FNR).

Figure 13. Training and Validation Accuracy Across Epochs.

Figure 14. Model Training and Validation Loss Convergence.

Figure 15. Precision–Recall Curve Analysis of Detection Models.

Figure 16. ROC Curve Comparison and AUC Performance.

Figure 17. Comparison of Baseline Models and the Proposed Hybrid Framework.

Figure 18. Effect of Fusion Weight (α) on Model Performance.

Figure 19. Impact of Class Imbalance on Insider Threat Detection Performance.

Table 1. Class Imbalance Handling Strategy.

Method	Applied	Reason
SMOTE	No	Dataset nearly balanced
Class Weights	Yes	Improve recall stability

Table 2. Dataset Summary.

Property	Value
Total Samples	45,000
Number of Features	16
Class Distribution	Normal: 52%, Insider Threat: 48%
Window Size	30 min
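
Table 1 states that class weights were applied, but the weighting rule is not given in this section. The following is a minimal sketch, assuming the common inverse-frequency ("balanced") rule w_c = N / (K · n_c) and the class distribution reported in Table 2; the function name and label strings are hypothetical.

```python
# Inverse-frequency ("balanced") class weights.
# ASSUMPTION: the paper only says class weights were applied; this rule,
# the function name, and the label strings are illustrative.

def balanced_class_weights(counts):
    """Return w_c = N / (K * n_c) for each class c."""
    total = sum(counts.values())
    num_classes = len(counts)
    return {label: total / (num_classes * n) for label, n in counts.items()}


# Counts derived from Table 2: 45,000 samples, 52% normal, 48% insider threat.
total_samples = 45_000
counts = {
    "normal": round(total_samples * 0.52),
    "insider_threat": round(total_samples * 0.48),
}

weights = balanced_class_weights(counts)
for label, weight in weights.items():
    print(f"{label}: n={counts[label]}, weight={weight:.4f}")
```

Under this rule the two weights stay close to 1 because the split is nearly balanced, which is consistent with the decision in Table 1 not to apply SMOTE.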

Table 3. Feature Categories and Extracted Features.

Feature Category	Features
Authentication	Login frequency, session duration
File Access	File count, sensitive access
Network	Request frequency, URL access
Email	Sent/received emails
System Usage	Device interaction
Anomaly Indicators	Privilege escalation

Table 4. Robust Performance Comparison with Baseline Models.

Model	Accuracy	Precision	Recall	F1-Score	ROC-AUC
Logistic Regression (Baseline)	0.910 ± 0.004	0.902 ± 0.005	0.918 ± 0.006	0.909 ± 0.004	0.930 ± 0.003
Random Forest (Baseline)	0.940 ± 0.003	0.948 ± 0.004	0.942 ± 0.005	0.945 ± 0.003	0.960 ± 0.002
LSTM-based (Baseline)	0.974 ± 0.002	0.968 ± 0.003	0.983 ± 0.002	0.975 ± 0.002	0.989 ± 0.001
Autoencoder-based (Baseline)	0.892 ± 0.006	0.855 ± 0.007	0.883 ± 0.006	0.868 ± 0.005	0.815 ± 0.004
Proposed Model	0.9765 ± 0.0015	0.9635 ± 0.002	0.9905 ± 0.001	0.9768 ± 0.0018	0.9920 ± 0.001

Table 5. Ablation Study of the Proposed Insider Threat Detection Framework.

Model Configuration	Accuracy	Precision	Recall	F1-Score	ROC-AUC
Behavioral Features + ML Classifier	0.912 ± 0.004	0.901 ± 0.005	0.918 ± 0.006	0.909 ± 0.004	0.926 ± 0.003
LSTM Only (Sequential Behavior)	0.975 ± 0.002	0.970 ± 0.003	0.985 ± 0.002	0.977 ± 0.002	0.990 ± 0.001
Autoencoder Only (Anomaly Detection)	0.890 ± 0.006	0.852 ± 0.007	0.880 ± 0.006	0.865 ± 0.005	0.812 ± 0.004
LSTM + Autoencoder (No Risk Fusion)	0.972 ± 0.002	0.961 ± 0.003	0.982 ± 0.002	0.971 ± 0.002	0.988 ± 0.001
+ Session-Level Behavioural Risk Monitoring Scoring	0.975 ± 0.002	0.963 ± 0.003	0.987 ± 0.002	0.975 ± 0.002	0.991 ± 0.001
Proposed Full Framework	0.9765 ± 0.0015	0.9635 ± 0.002	0.9905 ± 0.001	0.9768 ± 0.0018	0.992 ± 0.001

Table 6. Performance Comparison with Baseline Models.

Model	Accuracy	Precision	Recall	F1-Score	ROC-AUC
Logistic Regression	0.91	0.9	0.92	0.91	0.93
Random Forest	0.94	0.95	0.94	0.945	0.96
GRU (Advanced Seq Model)	0.973	0.968	0.984	0.976	0.989
LSTM	0.975	0.97	0.985	0.977	0.99
Autoencoder	0.89	0.85	0.88	0.865	0.81
Proposed Model	0.9765	0.9635	0.9905	0.9768	0.992

Table 7. Sensitivity Analysis of Fusion Parameter (α) on Model Performance.

α (LSTM Weight)	Accuracy	Precision	Recall	F1-Score	ROC-AUC
0.1	0.942	0.935	0.9502	0.9425	0.9605
0.3	0.9615	0.9552	0.9708	0.9629	0.9802
0.5	0.9735	0.9678	0.9832	0.9734	0.989
0.6	0.9772	0.971	0.9865	0.9779	0.9925
0.7	0.9765	0.9635	0.9905	0.9768	0.992
0.8	0.9752	0.962	0.9892	0.9754	0.9915
0.9	0.9725	0.959	0.986	0.9723	0.9898
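
To make the role of α in Table 7 concrete, the sketch below fuses an LSTM output with an autoencoder reconstruction error as a convex combination. The fusion form, the min-max normalisation of the reconstruction error, and all input values are assumptions made for illustration; only the interpretation of α as the LSTM weight comes from the table. The values α = 0.7 and a decision threshold of 0.65 are used because their rows in Tables 7 and 10 match the metrics reported for the proposed model.

```python
# Weighted fusion of two per-session scores.
# ASSUMPTIONS: convex-combination form, min-max normalisation, function
# names, and the example inputs are all hypothetical.

def min_max_normalize(values):
    """Rescale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def fuse_scores(lstm_probs, recon_errors, alpha=0.7):
    """Return alpha * LSTM probability + (1 - alpha) * normalised error."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    anomaly = min_max_normalize(recon_errors)
    return [alpha * p + (1.0 - alpha) * a for p, a in zip(lstm_probs, anomaly)]


# Hypothetical outputs for four sessions.
lstm_probs = [0.05, 0.20, 0.90, 0.60]
recon_errors = [0.01, 0.03, 0.09, 0.05]

scores = fuse_scores(lstm_probs, recon_errors, alpha=0.7)
flags = [s >= 0.65 for s in scores]  # threshold value taken from Table 10

for s, flagged in zip(scores, flags):
    print(f"risk={s:.3f} flagged={flagged}")
```

With α close to 1 the fused score follows the LSTM output almost exclusively, and with α close to 0 it follows the reconstruction error, which is the trade-off the rows of Table 7 sweep over.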

Table 8. Performance Breakdown by Insider Threat Category.

Category	Accuracy	Precision	Recall	F1-Score
Data Exfiltration	0.978	0.970	0.992	0.981
Sabotage	0.974	0.965	0.988	0.976
Privilege Misuse	0.972	0.960	0.986	0.973

Table 9. Robustness Analysis of Model Performance.

Model	Accuracy (Mean ± Std)	Precision (Mean ± Std)	Recall (Mean ± Std)	F1-Score (Mean ± Std)	ROC-AUC (Mean ± Std)
LSTM	0.975 ± 0.002	0.970 ± 0.003	0.985 ± 0.002	0.977 ± 0.002	0.990 ± 0.001
Proposed	0.9765 ± 0.0015	0.9635 ± 0.002	0.9905 ± 0.001	0.9768 ± 0.0018	0.9920 ± 0.001

Table 10. Threshold Sensitivity Analysis.

Threshold	Precision	Recall	F1-Score
0.5	0.94	0.995	0.967
0.6	0.955	0.992	0.973
0.65	0.9635	0.9905	0.9768
0.7	0.97	0.975	0.972
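
The F1 column of Table 10 can be cross-checked against the precision and recall columns using F1 = 2PR / (P + R). The short sketch below recomputes it for each threshold; the helper name is hypothetical and the numbers are copied from the table.

```python
# Recompute F1 from the precision and recall values listed in Table 10.

def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (threshold, precision, recall, reported F1) as listed in Table 10.
rows = [
    (0.50, 0.9400, 0.9950, 0.9670),
    (0.60, 0.9550, 0.9920, 0.9730),
    (0.65, 0.9635, 0.9905, 0.9768),
    (0.70, 0.9700, 0.9750, 0.9720),
]

computed = [f1_score(p, r) for _, p, r, _ in rows]
for (threshold, _, _, reported), value in zip(rows, computed):
    print(f"threshold={threshold:.2f} computed={value:.4f} reported={reported:.4f}")

best_threshold = max(zip(computed, rows))[1][0]
```

The recomputed values agree with the reported column to within rounding, and the maximum falls at the threshold of 0.65.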

Table 11. Cross-Validation Performance Across Folds.

Fold	Accuracy	Precision	Recall	F1-Score	ROC-AUC
Fold 1	0.9758	0.9625	0.989	0.9756	0.9918
Fold 2	0.9762	0.963	0.9902	0.9764	0.9921
Fold 3	0.977	0.9642	0.991	0.9774	0.9923
Fold 4	0.9768	0.9638	0.9908	0.9771	0.9922
Fold 5	0.9765	0.9635	0.9905	0.9768	0.992
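
The per-fold values in Table 11 can be summarised as a mean and a sample standard deviation per metric. The sketch below does this with the Python standard library; the dictionary layout is an illustrative choice and the numbers are copied from the table.

```python
# Mean and sample standard deviation (n - 1 denominator) across the five folds.
from statistics import mean, stdev

folds = {
    "accuracy":  [0.9758, 0.9762, 0.9770, 0.9768, 0.9765],
    "precision": [0.9625, 0.9630, 0.9642, 0.9638, 0.9635],
    "recall":    [0.9890, 0.9902, 0.9910, 0.9908, 0.9905],
    "f1":        [0.9756, 0.9764, 0.9774, 0.9771, 0.9768],
    "roc_auc":   [0.9918, 0.9921, 0.9923, 0.9922, 0.9920],
}

summary = {name: (mean(values), stdev(values)) for name, values in folds.items()}

for name, (mu, sigma) in summary.items():
    print(f"{name}: mean={mu:.4f} std={sigma:.4f}")
```

The fold-averaged accuracy rounds to 0.9765, which matches the accuracy reported for the proposed model.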

Table 12. McNemar Statistical Significance Comparison Between Proposed Framework and Baseline Models.

Model	PR-AUC (Mean ± Std)	MCC (Mean ± Std)	p-Value (vs. Proposed)	McNemar Test Result
Proposed Method	0.842 ± 0.012	0.781 ± 0.015	-	-
Logistic Regression	0.654 ± 0.018	0.580 ± 0.021	<0.001	Significant
Random Forest	0.765 ± 0.014	0.710 ± 0.018	0.002	Significant
LSTM	0.790 ± 0.011	0.725 ± 0.016	0.008	Significant
Autoencoder	0.815 ± 0.013	0.750 ± 0.014	0.021	Significant
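
For readers unfamiliar with the test named in Table 12, the following sketch shows how a McNemar comparison between two classifiers evaluated on the same test samples can be computed. It assumes the continuity-corrected chi-square form with one degree of freedom; the discordant-pair counts are hypothetical and are not taken from the paper, which does not report them in this section.

```python
# McNemar test on paired predictions from two classifiers.
# ASSUMPTIONS: continuity-corrected chi-square variant; the counts b and c
# below are invented for illustration only.
import math


def mcnemar_test(b, c):
    """b: samples only model A classifies correctly; c: only model B.

    Returns (statistic, p_value) for the chi-square test with 1 dof.
    """
    if b + c == 0:
        return 0.0, 1.0
    statistic = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # Survival function of chi-square with 1 dof: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical discordant counts: 48 samples where only the proposed model is
# correct, 22 where only the baseline is correct.
statistic, p_value = mcnemar_test(b=48, c=22)
significant = p_value < 0.05
print(f"chi2={statistic:.3f} p={p_value:.4f} significant={significant}")
```

Only the discordant pairs enter the statistic, so two models with similar overall accuracy can still differ significantly if their errors fall on different samples.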

Table 13. Performance under Imbalanced Data Distribution.

Metric	Value
PR-AUC	0.842
MCC	0.781
Precision	0.812
Recall	0.768

Table 14. Malicious-Class Performance Comparison Under Balanced and Realistic Insider-Threat Conditions.

Dataset	Precision (Malicious)	Recall (Malicious)	F1 (Malicious)
Insider Threat	0.989	0.991	0.99
CERT	0.854	0.812	0.8324

Table 15. Cross-Dataset Performance Evaluation.

Metric	Insider Threat Dataset	CERT r4.2
Accuracy	0.9765	0.965
Precision	0.9635	0.854
Recall	0.9905	0.812
F1-Score	0.9768	0.832
ROC-AUC	0.9920	0.965

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Kuldeyev, N.; Mamyrbayev, O.; Akhmediyarova, A.; Yerzhan, A. Behavioural Biometrics and Session-Level Risk Monitoring for Insider Threat Detection in Enterprise Networks. Electronics 2026, 15, 2400. https://doi.org/10.3390/electronics15112400
