1. Introduction
Ransomware attacks continue to cause significant disruptions for organizations, leading to costly impacts and loss of critical information. For example, the number of ransomware attacks on healthcare organizations more than doubled annually from 2016 to 2021 [1]. Furthermore, the severity of ransomware attacks varies across sectors, with private organizations more likely to experience severe consequences compared with public ones [2]. Cybercriminals are continually evolving their tactics to evade detection and intensify pressure on victims [3].
As ransomware threats show no signs of slowing down, investing in cybersecurity has become a strategic imperative. A survey of 569 cybersecurity leaders found that 91% expect budget increases for technologies such as centralized monitoring, machine learning, artificial intelligence, and next-generation firewalls to enable faster threat detection and response [4]. However, attackers can bypass detection using techniques like packers, crypters, polymorphism, and metamorphism [5], highlighting the need for proactive strategies beyond detection alone.
This study focuses on predicting emerging ransomware threats by analyzing historical attack patterns. Using time-series analysis, we monitor the number of ransomware incidents over time, identify trends, seasonality, and noise, and forecast future threats. This approach enables organizations to allocate security resources effectively and mitigate potential breaches before they occur.
While most prior research has centered on detecting or preventing ransomware, few studies have explored forecasting future incidents using historical time-series data [6]. Existing prediction techniques are often limited to specific ransomware families, narrow attack scopes, and small datasets [7,8,9], reducing the generalizability of results. This study addresses these gaps by applying time-series analysis to a broader range of ransomware strains and technologies, including industry-specific trends.
We use publicly reported ransomware incidents from the Critical Infrastructure Ransomware Attacks (CIRA) dataset covering 2013–2024 to build and test our prediction models. Our contributions include: (1) employing time-series analysis that emphasizes recent data while accounting for trend, seasonality, and noise; (2) providing generalizable results by analyzing a diverse set of ransomware families; and (3) conducting the first industry-specific time-series analysis of ransomware attacks. These findings have practical implications for security teams and decision-makers, offering actionable insights to anticipate threats and optimize resource allocation.
The rest of this paper is organized as follows: Section 2 reviews relevant literature; Section 3 introduces the time-series methodology; Section 4 describes data collection, model development, and performance evaluation; and Section 5 presents the findings, implications, limitations, and future research directions.
2. Research Background and Literature Review
Prior research on ransomware countermeasures can be broadly categorized into detection, prevention, and prediction. Detection approaches focus on identifying malicious behaviors such as abnormal file operations and registry changes [10]. While effective in recognizing known patterns, these methods are challenged by ransomware’s morphing nature and the continual emergence of new variants [6,11]. Prevention techniques, in contrast, emphasize proactive or reactive defenses during an attack [12]. However, because they often rely on manually crafted procedures, they tend to be error-prone and struggle to keep pace with the rapid evolution of ransomware. Finally, prediction involves anticipating ransomware attacks by monitoring for early warning signs and estimating the likelihood of an attack [13].
Prediction methods have gained increasing attention, aiming to anticipate ransomware activity before or during its early stages. Prior works in this space vary in focus. Some target deployment methods (e.g., Hull et al. [14], who used classifiers to identify early warning signs of 18 ransomware families). Others concentrate on financial indicators, such as blockchain analytics, where Akcora et al. [15] and Xu [16] analyzed Bitcoin transactions to forecast ransomware families. A third stream emphasizes behavioral monitoring, as in Jeon et al. [17], who applied deep learning to API call sequences in healthcare IoT, and Rhode et al. [18], who used recurrent neural networks to detect ransomware within seconds of execution. Finally, several studies have applied time-series forecasting to ransomware activity. For example, Quinkert et al. [7] leveraged malicious domain signals but restricted their scope to a single ransomware strain, whereas Albulayhi and Al-Haija [19] adopted neural network regression on annual attack counts without decomposing time-series components such as trend or seasonality. More recent efforts, including Gazzan and Sheldon [9], Mathane and Lakshmi [8], Li [20], and Gogineni et al. [21], have explored context-aware frameworks, supply chain datasets, and deep neural models. Yet, these approaches often remain narrow in scope or lack empirical validation across diverse ransomware families.
In summary, while existing predictive studies have advanced ransomware forecasting, they share two key limitations: (1) reliance on restricted or non-representative datasets, limiting generalizability, and (2) insufficient attention to decomposing ransomware activity into fundamental components (e.g., trend, seasonality, noise), which are essential for understanding long-term patterns. Our study addresses these gaps by leveraging the CIRA global ransomware database and applying time-series analysis to uncover the structural components of ransomware attacks.
3. Time-Series Analysis
Time-series analysis refers to a set of statistical techniques and procedures used to extract information from time-ordered data. In the information security field, time-series analysis has been applied to predict various threats, including software vulnerabilities [22], stages of cyber-intrusions [23], and distributed denial-of-service attacks [24]. While such analysis helps in understanding historical trends and patterns, forecasting future values, and making informed decisions [25], it is not intended to quantify or explain the causal factors behind the behavior of the data. Unlike other statistical models that often treat observations as independent of each other, time-series models account for the temporal dependency of data points and recognize that current values are frequently influenced by past values [26]. In the context of this study, time-series analysis can help to identify key components of ransomware attacks such as trend, seasonality, and noise. Specifically, it can identify upward or downward movements in ransomware attacks, fluctuations over fixed periods, fluctuations that exhibit cyclical behavior, and random variations not accounted for by the other components [27]. In this study, we fit the following time-series models to our dataset: seasonal and trend decomposition using LOESS, the autoregressive integrated moving average, and exponential smoothing. It is worth noting that we also tested two machine learning models, Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU), but both suffered from overfitting, probably due to having only eight years of data.
Seasonal and Trend Decomposition Using LOESS (STL)
STL is a non-parametric smoothing technique which decomposes time-series data into three components: trend (T), seasonal (S), and residual (noise) (R) [28]. By using locally estimated scatterplot smoothing (LOESS), a regression technique that produces smooth estimates of the three components, STL can extract seasonality and trend, as well as calculate residuals as unexplained variance or random noise in the ransomware data [29]. Specifically, STL iteratively applies LOESS smoothing to refine and improve the estimates of the trend, seasonal, and residual components of the original time series Y at time t. The flexibility of LOESS smoothing enables it to effectively handle complex time-series data. The decomposition equation can be expressed as follows:

$$Y_t = T_t + S_t + R_t$$

where $Y_t$ is the observed value of the series at time $t$, and $T_t$, $S_t$, and $R_t$ are the trend, seasonal, and residual components, respectively.
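As a minimal illustration, this decomposition can be reproduced with the STL implementation in Python's statsmodels library. The monthly counts below are hypothetical placeholders, not values from the CIRA dataset:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import STL

# Hypothetical monthly ransomware incident counts (placeholder data,
# not the CIRA dataset); eight years of monthly observations.
rng = np.random.default_rng(42)
counts = pd.Series(
    rng.poisson(lam=20, size=96),
    index=pd.date_range("2016-01-01", periods=96, freq="MS"),
)

# STL iteratively applies LOESS smoothing; period=12 for monthly data.
result = STL(counts, period=12, robust=True).fit()

# The additive identity holds: counts = trend + seasonal + residual.
print(np.allclose(counts, result.trend + result.seasonal + result.resid))
```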
STL can be combined with other methods, such as autoregressive integrated moving average (ARIMA) and exponential smoothing, to leverage the strengths of each method in forecasting [30]. ARIMA is one of the most popular time-series forecasting models and has been widely applied in research domains such as economics, finance, and healthcare. An ARIMA model uses lagged values and lagged forecast errors to model the data and is represented as (p, d, q), where p indicates the number of autoregressive (AR) terms, d represents the number of differences applied to achieve stationarity, and q denotes the number of moving average (MA) terms. For instance, an ARIMA(1,1,1) model suggests that the number of ransomware attacks at time t depends solely on the number of attacks at time t − 1, that the data have been differenced once to achieve stationarity, and that the predicted number of ransomware attacks at time t is based solely on the prediction error at time t − 1. Exponential smoothing, on the other hand, is a forecasting technique that assumes future values are shaped by recent observations; it therefore uses exponentially weighted averages of past observations, placing greater emphasis on more recent data when predicting future values [31]. In the context of this study, exponential smoothing would place more weight on recent ransomware attacks than on older data points. An exponential smoothing (ETS) model requires three parameters: error (E), trend (T), and seasonality (S), each of which can take one of three values: additive (A), multiplicative (M), or none (N). An ETS model determines whether trend and seasonality are independent of the series’ level or whether they scale with it, leading to variations that grow or shrink proportionally to the overall magnitude [32]. For example, the ETS(A,N,N) model indicates that forecasting ransomware attacks involves an additive error term applied to the previous ransomware observation, with no consideration for trend or seasonal patterns in the data.
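The following is a minimal sketch of fitting these two model classes with statsmodels, reusing the hypothetical `counts` series from the previous snippet:

```python
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.exponential_smoothing.ets import ETSModel

# ARIMA(1,1,1): one AR lag, one difference, one MA term.
arima_fit = ARIMA(counts, order=(1, 1, 1)).fit()

# ETS(A,N,N): additive errors, no trend, no seasonality
# (i.e., simple exponential smoothing).
ets_fit = ETSModel(counts, error="add", trend=None, seasonal=None).fit(disp=False)

# Forecast the next 12 months with each model.
print(arima_fit.forecast(steps=12))
print(ets_fit.forecast(steps=12))
```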
By combining STL with ARIMA or exponential smoothing, STL handles non-constant seasonal patterns, while ARIMA or exponential smoothing captures any remaining correlations in the non-seasonal part of the data by independently fitting a model and forecasting the residual component of the decomposed time series [30]. The general form of the STL-ARIMA or STL-ETS forecasting equation is as follows:

$$\hat{Y}_{t+h} = \hat{T}_{t+h} + \hat{S}_{t+h} + \hat{R}_{t+h}$$

where $\hat{Y}_{t+h}$ is the forecasted value of the time series at time $t + h$, $h$ represents the number of steps into the future being predicted, and $\hat{T}_{t+h}$, $\hat{S}_{t+h}$, and $\hat{R}_{t+h}$ are the forecasted trend, seasonal, and residual components.
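statsmodels offers this STL-then-model pipeline directly through its STLForecast class; a minimal sketch, again assuming the hypothetical `counts` series from above:

```python
from statsmodels.tsa.forecasting.stl import STLForecast
from statsmodels.tsa.arima.model import ARIMA

# Decompose with STL, fit ARIMA(1,1,1) to the seasonally adjusted
# (trend + residual) series, and re-add the seasonal component
# when forecasting.
stlf = STLForecast(counts, ARIMA, model_kwargs={"order": (1, 1, 1)}, period=12)
stlf_fit = stlf.fit()

# h = 12: forecast one year of monthly ransomware counts.
print(stlf_fit.forecast(steps=12))
```

Swapping ARIMA for ETSModel (with its corresponding model_kwargs) would yield the STL-ETS variant.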
5. Discussion and Implications
Based on the analysis, STL decomposition and time-series modeling proved effective for understanding patterns and trends and predicting future ransomware threats. Our models, built solely on counts of ransomware attacks, successfully captured the overall intensity of cyberattacks, making them a compelling approach for forecasting. However, given the relatively high residual noise in several sectoral models, this result should be viewed as suggestive of long-term structural dynamics rather than a definitive explanation of ransomware activity.
For the all-ransomware model, the decomposed components showed a flat trend. This may indicate that countermeasures by organizations, along with growing public awareness, are slowing the growth of threats. Alternatively, attackers might be shifting to other methods or targets, such as more sophisticated, harder-to-detect techniques [38]. In the healthcare and public health sector, however, the trend is rising, consistent with previous studies [1]. The value of sensitive medical data makes this sector especially attractive to cybercriminals. Thus, prevention should focus on addressing unpatched vulnerabilities through risk-based patch management [39], and on training users to detect phishing and malicious emails [40,41]. Foundational security measures, such as honeypots [42], incident response plans, and regular backup restoration drills, can further strengthen resilience and recovery. In contrast, governmental facilities, educational facilities, information technology, and critical infrastructure showed only minor trend contributions to time-series variability.
No seasonal component was detected, suggesting that ransomware attacks lack a regular monthly pattern. While other reports note more attacks on weekends and holidays [43], the seasonality in our dataset may have been too small relative to noise or nonlinear components. Future research could examine whether daily, weekly, or quarterly patterns reveal seasonality.
Forecasts were effectively fitted for government facilities, healthcare and public health, educational facilities, and critical manufacturing, though accuracy varied. Models for government facilities and healthcare achieved reasonably accurate predictions, enabling strategic resource allocation and improved resilience. In contrast, predictions for educational facilities, information technology, and critical manufacturing were less accurate. One possible explanation for the varying accuracy is structural and operational factors. Heavy dependence on third-party vendors can introduce uncontrolled vulnerabilities that vary widely in security posture, whereas the breadth and complexity of a sector’s attack surface, shaped by legacy systems, interconnected devices, and distributed networks, can create irregular and harder-to-predict attack patterns [44]. Furthermore, variations in organizational security maturity, incident-reporting practices, and regulatory oversight further contribute to differences in predictability. These factors manifest differently across sectors. For instance, healthcare and public health are relatively predictable because they maintain standardized IT practices, enforce regulatory compliance, and have established incident-reporting mechanisms [45]. Government facilities, on the other hand, exhibit moderate predictability due to structured networks, though third-party vendor dependencies and varying local security practices can introduce variability [46]. Educational facilities, however, are less predictable because of diverse IT systems, frequent onboarding and offboarding of users, and inconsistent security enforcement across campuses [47]. Similarly, the IT and critical manufacturing sectors are highly unpredictable due to complex, heterogeneous networks, broad attack surfaces, and reliance on legacy or industrial control systems (ICS) that are frequently targeted [48]. These sector-specific characteristics may help to explain why some models produce higher forecast errors than others. Another possible explanation is the absence of external variables: economic conditions, for example, may influence a sector’s ability to maintain robust security infrastructure [49]. Underfunded sectors often prioritize operational costs over security, making them more vulnerable and more willing to pay ransoms [50]. Future work could incorporate additional external and structural variables into the modeling process to improve predictive accuracy and provide actionable, sector-tailored threat forecasts.
To maximize the utility of the proposed time-series forecasting models, we outline a four-step framework for embedding predictions into operational cybersecurity strategy:
Forecast Integration and Monitoring: Incorporate updated forecasts (e.g., monthly ransomware incident projections) into security dashboards, where automated alerts can flag when predicted attack volumes exceed historical baselines, enabling early situational awareness (a brief sketch follows this list).
Risk Prioritization: Compare forecasted trends against the organization’s threat landscape, considering both projected attack volumes and potential business impact. Prioritize sectors, assets, or systems that combine high forecasted activity with high criticality, ensuring that limited resources are allocated to the most consequential risks.
Resource Allocation and Mitigation: Align staffing, budget, and technical defenses with predicted demand. For instance, scale up incident response capacity ahead of forecasted peaks, intensify patching cycles for at-risk systems, or pre-position backup and recovery resources where threat levels are expected to rise.
Evaluation and Feedback Loop: Measure actual attack occurrences against forecasted values to assess predictive accuracy. These results can be fed back into the time-series models to continuously improve accuracy and responsiveness. This ensures the forecasting system evolves with the threat landscape rather than remaining static.
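As a minimal illustration of steps 1 and 4, the sketch below reuses the fitted `stlf_fit` model and hypothetical `counts` series from Section 3; it flags forecasted months that exceed a simple historical baseline and scores the forecast once actuals arrive. The alert threshold is an illustrative assumption, not a prescribed policy:

```python
import numpy as np
import pandas as pd

# Step 1: flag forecasted months exceeding a historical baseline
# (mean + 1 standard deviation of past counts; an illustrative rule).
forecast = stlf_fit.forecast(steps=12)
baseline = counts.mean() + counts.std()
alerts = forecast[forecast > baseline]
print("Months exceeding baseline:", list(alerts.index))

# Step 4: once the forecast horizon has elapsed, compare observed
# counts against predictions and feed the error back into refitting.
observed = pd.Series(  # placeholder for newly observed incident data
    np.random.default_rng(1).poisson(lam=20, size=12),
    index=forecast.index,
)
mae = float(np.mean(np.abs(observed - forecast)))
print(f"Mean absolute error: {mae:.2f}")
```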
Finally, the proposed forecasting model aligns with NIST’s Cybersecurity Framework (CSF) by strengthening the Identify and Protect functions through the early detection of emerging ransomware trends and high-risk sectors [51]. By integrating these predictive capabilities into CSF implementation, organizations can shift from reactive to proactive cybersecurity management.
This study contributes to both theory and practice. Theoretically, it extends the cybersecurity literature by focusing on proactive methods, as most prior research emphasizes reactive measures [49]. We identify critical time-series components of ransomware incidents and introduce models capable of forecasting future threats. By demonstrating that ransomware can be anticipated through trends, we offer a systematic approach to signaling impending attacks. This addresses the call by Razaulla et al. [6] for improved ransomware defenses. To our knowledge, this is the first study to apply STL decomposition and time-series analysis to publicly disclosed ransomware incidents.
Our results suggest four key implications. First, organizations should monitor ransomware trends to guide proactive risk management. For example, anticipating months with higher attack likelihood can improve preparedness. Healthcare and public health organizations, in particular, should strengthen security protocols, tools, and training to counter the upward trend. Second, our models can help to prioritize security investments and allocate resources efficiently. Third, because we used publicly available data, practitioners can replicate the models and monitor attack patterns over time. Fourth, the findings can inform policymakers in developing sector-specific regulations and guidelines to protect critical infrastructure.
Limitations and Future Research
We acknowledge several limitations in this study, which provide ample opportunities for further research. Firstly, the data collected were restricted to the CIRA database, which contains only publicly reported ransomware attacks. Therefore, the generalizability of the findings could be limited given the voluntary nature of reporting ransomware threats. Future studies may benefit from the upcoming changes to the Cybersecurity and Infrastructure Security Agency’s (CISA’s) reporting requirements, which mandate certain sectors and organizations to report ransomware incidents [52]. Additionally, future work may benefit from obtaining incident data from private cybersecurity companies and threat intelligence firms that offer subscription-based services. Secondly, our analysis was limited to a single variable, namely the monthly number of ransomware attacks, without considering other factors that may affect the predictive accuracy of the models. This is a limitation of the ARIMA and exponential smoothing methods. Future research could benefit from using causal models or Autoregressive Integrated Moving Average with Exogenous Inputs (ARIMAX), which extends the traditional ARIMA model by incorporating additional independent variables. Input variables that measure economic factors, such as the unemployment rate, or variables that capture geopolitical factors, such as sanctions or political tensions, might help in predicting the number of ransomware attacks more accurately (a brief ARIMAX sketch follows this paragraph). Thirdly, the subanalyses were limited to the top five sectors with the highest number of ransomware attacks. While the dataset included other important sectors, they were excluded due to the low number of observations. As the CIRA database is continuously being updated, we anticipate that future research could examine additional sectors as more data become available.
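As a brief illustration of the ARIMAX direction, statsmodels’ SARIMAX class accepts exogenous regressors alongside the usual ARIMA terms; the economic indicator below is a hypothetical placeholder, not a variable examined in this study:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Hypothetical exogenous regressor (e.g., a monthly unemployment rate),
# aligned with the `counts` series from Section 3.
rng = np.random.default_rng(7)
unemployment = pd.Series(rng.normal(5.0, 0.5, size=len(counts)),
                         index=counts.index)

# ARIMAX: ARIMA(1,1,1) plus an exogenous input variable.
fit = SARIMAX(counts, exog=unemployment, order=(1, 1, 1)).fit(disp=False)

# Forecasting requires future values (or projections) of the regressor.
future_exog = np.full((12, 1), 5.1)
print(fit.forecast(steps=12, exog=future_exog))
```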
Building on the findings of this study, subsequent research could explore whether external factors such as geopolitical events, economic conditions, or industry-specific threats could increase the predictive accuracy of the forecasting models. Furthermore, prospective research could focus on building predictive models for specific ransomware types (e.g., Ransomware-as-a-Service (RaaS)), ransomware families, or ransomware attack techniques (e.g., double extortion). Finally, subsequent studies could explore the use of hybrid models that combine traditional time-series methods with artificial intelligence techniques, such as deep learning, to improve the accuracy of ransomware attack predictions.