Article

Forecasting Cancer Incidence in Canada by Age, Sex, and Region Until 2026 Using Machine Learning Techniques

School of Engineering and Computer Science, Laurentian University, Sudbury, ON P3E 2C6, Canada
*
Author to whom correspondence should be addressed.
Algorithms 2025, 18(5), 265; https://doi.org/10.3390/a18050265 (registering DOI)
Submission received: 11 February 2025 / Revised: 5 April 2025 / Accepted: 30 April 2025 / Published: 4 May 2025

Abstract

This study analyzes cancer trends in Canada using machine learning techniques to extract insights from extensive cancer data sourced from the Canadian Cancer Society and Statistics Canada. It aims to enhance the understanding of cancer epidemiology and inform better prevention, diagnosis, and treatment strategies. Data preprocessing addressed issues like missing values and normalization, ensuring reliability. The findings indicate a steady increase in new cancer cases, with estimates reaching 248,700 in 2026, up from 244,000 in 2022. Male incidence rates are projected to rise slightly to 602.3 per 100,000, while female rates may decline to 530.6. Regions such as Alberta, British Columbia, Ontario, and Quebec show rising incidence rates, contrasted by declines in Newfoundland and Labrador, Nunavut, and Yukon. Notably, this research reveals significant increases in cancer cases among individuals aged 60 and older, particularly those 70+. The hybrid ARIMA-LSTM model demonstrated superior forecasting accuracy compared with the other selected models. These findings offer valuable insights for health policymakers and highlight the potential of machine learning in public health forecasting, providing a framework for future research in other disease areas.

1. Introduction

Cancer is a multifaceted and devastating disease with a significant impact on individuals, communities, and economies. Its complex nature is due to genetic, environmental, and behavioral factors [1]. Despite substantial advances in research, cancer remains a global challenge, hampered by economic, political, and legislative barriers [2]. Advances in technology, such as cancer informatics and molecular genetics, have paved the way for new tools and methods for prevention, diagnosis, and treatment. However, the ethical and social implications of these developments, and the need for comprehensive global strategies to address the growing cancer burden, require careful consideration [3].
Machine learning (ML) techniques have revolutionized computing by enabling the extraction of patterns and knowledge from large and complex datasets [4]. These techniques, which include supervised, unsupervised, semi-supervised, and reinforcement learning, are particularly effective in solving big data problems [5]. This field has made significant progress, especially in deep learning, which can analyze and learn from vast amounts of real data [6]. Therefore, ML is widely used in various industries to obtain relevant information for analysis [7].
Advanced technologies such as mitochondrial, epigenomic, and metabolic profiling offer significant potential in cancer epidemiology, especially in identifying at-risk populations and treatment responses [8]. However, using electronic health record data in oncology presents challenges, including missing or unstructured data elements [9]. AI techniques, including machine learning and deep learning, have shown promise in predicting and detecting cancer, sometimes better than doctors [10]. The use of big data in cancer treatment is promising. Still, it is hindered by incomplete and fragmented data, which can be solved by integrating health systems and using AI [11].
Recent studies have highlighted the potential of ML in cancer prediction and survival research [12,13]. These techniques transform healthcare by providing insights into patient care, operational efficiency, and cost reduction [14]. The application of ML in cancer research and care is particularly promising, with the potential to construct real-world data cohorts and improve predictive modeling [15]. However, concerns over patient data privacy and security, algorithmic bias, and the need for trained individuals to interpret results remain important considerations [14].
In Canada, a country that is struggling with the ever-changing landscape of cancer diagnoses, it is critical to use state-of-the-art techniques to analyze incidence patterns. Canada’s diverse population, regional differences, and changing medical practices provide an exceptional environment for a thorough cancer data analysis. To deliver insightful information and to support proactive healthcare management and policy creation, this research uses ML techniques to identify current trends and predict future ones.
This study’s backdrop essentially stems from the realization that cancer is a complex, multifaceted phenomenon with implications for society and the economy in addition to being a medical problem. Despite significant advancements in oncology, the dynamic nature of cancer trends and disparities across populations demands more adaptive and predictive analytical tools. Given these challenges, leveraging machine learning—a powerful tool for analyzing large-scale, complex datasets—offers promising new avenues to more accurately understand and forecast cancer trends. By integrating machine learning’s computational capabilities with the intricacies of cancer data, this research aims to uncover new insights that may lead to tailored treatment plans, more efficient interventions, and, ultimately, a significant decrease in cancer incidence in Canada and other countries. Additionally, the research emphasizes the need for collaborative efforts across disciplines, bringing together oncology, data science, and public health experts to address the intricate challenges cancer poses on a global scale.

2. Related Works

The Canadian population and healthcare systems are significantly affected by cancer. It is the leading cause of death in Canada, as stated by multiple sources [16,17,18,19]. Studies have estimated that 43% of all Canadians are expected to receive a cancer diagnosis in their lifetime [16,20]. As the population increases and ages, the number of new cancer cases and deaths in Canada is also growing [16,21]. Additionally, cancer is a costly disease, with the annual economic burden of cancer care in Canada rising from CAD 2.9 billion in 2005 to CAD 7.5 billion in 2012, as reported by various sources [16,22].
Because cancer significantly impacts Canadian health and the economy, accurate and comprehensive surveillance data are critical for determining progress and allocating resources accordingly. The Canadian Advisory Committee on Cancer Statistics collaborates with the Canadian Cancer Society, Statistics Canada, and the Public Health Agency of Canada to produce the latest statistics on cancer surveillance in Canada [16].
Cancer data can take several years to catch up to the present due to the lengthy process of collecting, verifying, and analyzing the information. However, statistical models can project short-term incidence and mortality rates by extrapolating past trends. This provides a more current understanding of the cancer landscape in Canada, which is crucial for resource planning, research, and informing cancer control programs. Canadian Cancer Statistics 2021 offers detailed estimates of cancer incidence, mortality, and survival in Canada for 22 cancer types, broken down by age, sex, geographic region, and over time. Brenner et al. [16,17,18] also provided updated estimates for 2020, 2022, and 2024 for new cancer cases and deaths expected for all ages, broken down by sex, province, and territory.
To produce the latest cancer incidence and mortality estimates, these studies used the CANPROJ projection R package to project counts and rates until 2024. CANPROJ uses trends in the actual data to determine the most suitable model for the coming years through a decision algorithm that chooses among a range of age-based, period-based, and cohort-based models [16,17,18,23].
According to Brenner [16], cancer case data from Quebec, starting from 2011, were unavailable as they had not yet been submitted to the Canadian Cancer Registry. Because only data up to 2010 were accessible for Quebec, estimates for cancer cases and incidence rates from 2011 to 2022 were generated by initially applying the cancer rates of Canada (excluding Quebec) to Quebec’s population. Based on the average rate for the rest of the country, adjusted Quebec rates were then modified using the ratio of sex- and age-specific cancer estimates for Quebec relative to the rest of Canada from 2006 to 2010. Additional correction factors were incorporated, considering provisional 2011 counts for cancers that are typically underreported, such as prostate and melanoma. These projections provided estimates for 22 cancer types, categorized by sex assigned at birth and geographic region (provinces and territories). The national estimates for Canada were derived by summing the individual projections for each province and territory. All incidence and mortality rates were age-standardized to the 2011 Canadian standard population using the direct method.
Brenner’s study [16] utilized historical cancer data from the National Cancer Reporting System (1984–1991) and the Canadian Cancer Registry (1992–2018), as well as mortality data from the Vital Statistics Canada Death Database (1984–2019). It applied the CANPROJ projection package, which uses trends in historical data to select the most appropriate model to predict future cancer incidence and mortality. This approach involved a decision algorithm that selects from a series of six age-period-cohort (APC) models, with the aim of providing the most accurate projections up to 2022. For the province of Quebec, particular adjustments were made for incidence estimates due to missing data after 2010.
While Brenner’s studies do not specify numerical values for the accuracy of their projections, the methodology implies a reliance on historical trends and demographic data to make informed predictions. The projections put the number of new cancer cases at 225,800 and the number of cancer deaths at 83,300 in 2020. These figures increased to 233,900 new cases and 85,100 deaths in 2022. Projections for 2024 indicate a further rise, with 247,100 new cancer cases and 88,100 cancer deaths expected [16,17,18].
While Brenner’s studies offered valuable cancer projections using the CANPROJ statistical framework, their approach primarily relies on historical trend extrapolation using age-period-cohort models. These models assume a consistent pattern in past data and select the best-fit trend-based model for projection. However, they do not incorporate modern performance evaluation metrics (e.g., MAE, RMSE, R2) and are limited in handling incomplete, noisy, or complex datasets.
In contrast, our study applies machine learning models—such as LSTM, ARIMA, Prophet, and a hybrid ARIMA-LSTM—which not only provide greater flexibility in modeling both linear and nonlinear relationships but also allow for rigorous evaluation through quantitative performance metrics. ML models are also better suited for large-scale healthcare data where variability and missing values are common. This methodological distinction enables our research to offer more accurate and adaptable forecasts for cancer incidence in Canada.

3. Materials and Methods

3.1. Research Design

This research employs a combination of quantitative methodologies to analyze cancer incidence trends in Canada, utilizing advanced machine learning models. By applying LSTM, Prophet, ARIMA, and hybrid ARIMA-LSTM models, the research aims to predict future cancer incidence rates and the number of new cases. This comprehensive approach allows for a deep understanding of cancer trends, contributing to better healthcare planning and policymaking. The study employs a longitudinal design, analyzing time series data of cancer incidence rates in Canada over multiple years. This approach facilitates the identification of trends and patterns in cancer rates, allowing for accurate predictions. Using multiple predictive models ensures the robustness and reliability of the forecasts, catering to the nonlinear and complex nature of the data. The study’s predictive nature addresses a significant gap in current knowledge, offering insights into potential future cancer rates and enabling proactive measures in public health strategies. Figure 1 shows the research process.
The dataset was divided into training (80%) and testing (20%) subsets to ensure robust model evaluation. This partitioning strategy ensures that the models are trained on historical cancer incidence data while reserving a separate portion for performance evaluation.
The models were trained using historical data (1992–2016) and validated with a test set (2017–2021). Once validated and finalized, each model (ARIMA, LSTM, Prophet, hybrid ARIMA-LSTM) was then used to forecast the number of new cancer cases and incidence rates sequentially for future years (2022 to 2026).
The study implemented a validation procedure for the forecasting models based on time series cross-validation. Because the data are sequential, traditional random cross-validation is not appropriate, so a 10-fold time series cross-validation was used instead. The procedure is repeated by moving the cutoff point forward through the time series so that each model’s performance is tested over different periods and conditions.
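The moving cutoff can be sketched as an expanding-window split. This is a minimal illustration and not the authors' code: the function name `expanding_window_splits`, the minimum training length of 20 years, and the one-year test horizon are assumptions, chosen only so that the 30 annual observations (1992–2021) yield 10 folds.

```python
def expanding_window_splits(n_obs, n_folds=10, min_train=20):
    """Return (train_indices, test_indices) pairs with a cutoff that moves forward.

    Every training window starts at index 0 and ends just before the cutoff;
    the test window is the single observation at the cutoff (assumed horizon).
    """
    if min_train < 1 or min_train + n_folds > n_obs:
        raise ValueError("not enough observations for the requested folds")
    splits = []
    for k in range(n_folds):
        cutoff = min_train + k
        train_idx = list(range(cutoff))  # all observations before the cutoff
        test_idx = [cutoff]              # one step ahead of the training window
        splits.append((train_idx, test_idx))
    return splits


years = list(range(1992, 2022))  # 30 annual observations, as in the data source
splits = expanding_window_splits(len(years))
first_train, first_test = splits[0]
last_train, last_test = splits[-1]
```

Unlike random k-fold splitting, no fold ever trains on observations that come after its test point, which is the property the sequential data require.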
Evaluation metrics such as mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE), and R-squared (R2) were calculated on the test set to assess prediction accuracy.

3.2. Data Source

This study obtained data on cancer incidence from the Canadian Cancer Registry Tabulation Master File, which Statistics Canada released on 31 January 2024. The table includes cancer cases diagnosed from 1992 to 2021 and is available by cancer type, region, age group, and sex.
The Canadian Cancer Registry (CCR) is a population-based registry with data collected and reported to Statistics Canada by each provincial/territorial cancer registry. This person-based system aims to collect information about each new primary cancer diagnosed among Canadian residents since 1992 [24]. Cancer incidence refers to the number of new cancers in a population during a specific period (usually a year). It is generally expressed as the number of new cancer cases per 100,000 population. Data presented in [25] were age-adjusted using the 2011 Canadian standard population to ensure accuracy and consistency [24].

3.3. Data Preprocessing

Prior to model training and evaluation, several preprocessing steps were applied to ensure data quality and model compatibility:

3.3.1. Handling Missing Data

Quebec’s cancer incidence data were unavailable after 2017 due to incomplete submissions to the Canadian Cancer Registry. To address this gap, Quebec’s missing incidence data from 2018 to 2022 were imputed using a multiple imputation strategy based on available historical trends and region-specific cancer patterns. Minor missing values in other provinces, amounting to less than 3% of the data, were also handled using multiple imputation.

3.3.2. Normalization

To ensure consistency across features and models, all incidence rate and case count values were normalized using min-max scaling to a range between 0 and 1. This was especially important for neural network models (e.g., LSTM), which are sensitive to the scale of input features. Normalization helped stabilize the training process and improve model convergence.
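A minimal sketch of min-max scaling follows. Fitting the minimum and maximum on the training portion only, and reusing them for later values, is an assumption made here to avoid leaking test information; the source does not say how the scaler was fitted. The function names and numbers are illustrative.

```python
def fit_min_max(values):
    """Return the (min, max) pair that defines the scaling."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("constant series cannot be min-max scaled")
    return lo, hi


def scale(values, lo, hi):
    """Map values linearly so that lo -> 0 and hi -> 1."""
    return [(v - lo) / (hi - lo) for v in values]


def unscale(scaled, lo, hi):
    """Invert the scaling to return forecasts to the original units."""
    return [s * (hi - lo) + lo for s in scaled]


train_rates = [520.0, 540.0, 560.0]  # hypothetical incidence rates
lo, hi = fit_min_max(train_rates)
train_scaled = scale(train_rates, lo, hi)
later_scaled = scale([570.0], lo, hi)  # a later value can fall outside [0, 1]
restored = unscale(train_scaled, lo, hi)
```

The inverse transform matters in practice because error metrics such as MAE are only interpretable once predictions are back in the original units.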

3.3.3. Temporal Indexing

Dates were converted into a standardized time index to ensure uniform time series input formats across all models, especially for ARIMA and Prophet, which require a consistent datetime structure.

3.3.4. Stationarity Checks

For models like ARIMA and ARIMA-LSTM, stationarity was assessed using the Augmented Dickey–Fuller test. Non-stationary series were differenced as required to satisfy model assumptions.
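The differencing step can be sketched in a few lines. The Augmented Dickey–Fuller test itself is not implemented here (statistical libraries such as statsmodels provide one); the sketch only shows the transformation applied when a series is judged non-stationary, and how forecasts made on the differenced scale are mapped back to levels. Function names and values are illustrative.

```python
def difference(series, d=1):
    """Apply d rounds of first differencing: y_t - y_(t-1)."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def undifference(first_value, diffs):
    """Rebuild levels from a starting value and first differences."""
    out = [first_value]
    for step in diffs:
        out.append(out[-1] + step)
    return out


levels = [100, 103, 107, 112]      # hypothetical trending series
once = difference(levels)          # first differences
twice = difference(levels, d=2)    # second differences
rebuilt = undifference(levels[0], once)
```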
These preprocessing steps were essential for maintaining data integrity and enabling fair comparison across diverse model architectures.

3.4. Model Selection

To investigate the trends in cancer incidence in this study, four popular machine learning algorithms and statistical techniques for time series prediction were selected based on their proven track record in time series forecasting: Prophet, long short-term memory (LSTM) networks, autoregressive integrated moving average (ARIMA), and the hybrid ARIMA-LSTM model.

3.4.1. Long Short-Term Memory (LSTM)

Introduced by Hochreiter and Schmidhuber in 1997 [26], LSTM networks are a variant of recurrent neural networks (RNNs) that overcome the shortcomings of standard RNNs. These shortcomings include poor performance in handling long-term dependencies and the vanishing or exploding gradient problem. In 1999, a forget gate was added to the LSTM so that the cell memory could be reset, which improved the original structure and became the standard structure for LSTM networks. Unlike deep feedforward neural networks (DFNN), LSTMs contain feedback connections and can process data sequences, not just individual data points such as vectors or arrays [27].
In LSTM networks, the fundamental building block is called a memory block or LSTM unit. Composed of a cell that acts as the memory component and three gates (input, output, forget/keep), these units can retain information over arbitrary periods. The gates of the LSTM unit are responsible for regulating the flow of data through the cell. One of the most prominent features of the LSTM cell is the “constant error carousel” (CEC). An LSTM network is structured similarly to an RNN, except the hidden layers comprise memory blocks instead of neurons [27].
Input gate: The unit features a sigmoidal function that regulates the inflow of data into the cell. It obtains activation from the previous output h(t−1) and the current input x(t). By means of the sigmoid function, an input gate produces values ranging from zero to one. A value of zero acts as a complete blockage of information, while a value of one permits the passage of all information. Equation (1) shows this process [27].
$$i_t = \sigma\left(W_{ix}\, x_t + U_{ih}\, h_{t-1} + b_i\right) \tag{1}$$
Cell input layer: The cell input is computed in the same way as the input gate. It takes in the previous hidden state h(t−1) and the current input x(t). However, a “tanh” activation function is used to squash the values to a range between −1 and 1. The result is denoted by lt in Equation (2) [27].
$$l_t = \tanh\left(W_{lx}\, x_t + U_{lh}\, h_{t-1} + b_l\right) \tag{2}$$
Forget gate: A unit using a sigmoidal function decides what information from previous steps of the cell should be remembered or discarded. The forget gate takes input from h(t−1) and x(t) and outputs values between zero and one. This output is then combined through a Hadamard product with the old cell state ct−1 to update to the new cell state ct, as in Equation (4). If the forget gate has a value of zero, it is closed and completely forgets the information of the old cell state ct−1. A value of one, on the other hand, retains all of the information. Thus, the forget gate can reset the cell state if the old data are deemed irrelevant. Equation (3) summarizes the forget gate mechanism [27].
$$f_t = \sigma\left(W_{fx}\, x_t + U_{fh}\, h_{t-1} + b_f\right) \tag{3}$$
Cell state: The cell state is responsible for storing a cell’s memory over an extended period. Each cell contains a self-connected linear unit known as a constant error carousel (CEC), which operates recurrently to prevent the vanishing or exploding gradient problem in an LSTM network. The CEC is regulated by the forget gate, which resets the cell state as needed. At time t, the present cell state ct is obtained from the previous cell state ct−1, scaled by the forget gate, plus the product of the input gate and the cell input (it ∘ lt). Equation (4) summarizes the overall update of a cell state [27].
$$c_t = f_t \circ c_{t-1} + i_t \circ l_t \tag{4}$$
Output gate: A unit with a sigmoidal function regulates the passage of information out of a cell. The LSTM network uses the output gate values at a particular time (denoted by ot) to scale the present cell state ct after it has been passed through a “tanh” function, producing the final output vector h(t). Equations (5) and (6) show the mechanism of the output gate [27].
$$o_t = \sigma\left(W_{ox}\, x_t + U_{oh}\, h_{t-1} + b_o\right) \tag{5}$$
$$h_t = o_t \circ \tanh\left(c_t\right) \tag{6}$$
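The six gate equations can be traced in code with a single LSTM unit whose input, hidden state, and weights are all scalars, so that the Hadamard products reduce to ordinary multiplication. This is a didactic sketch and not the network trained in the study; the function names, the parameter dictionary, and its key names are assumptions that mirror the symbols in Equations (1) to (6).

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, p):
    """One forward step of a scalar LSTM unit."""
    i_t = sigmoid(p["W_ix"] * x_t + p["U_ih"] * h_prev + p["b_i"])    # Eq. (1)
    l_t = math.tanh(p["W_lx"] * x_t + p["U_lh"] * h_prev + p["b_l"])  # Eq. (2)
    f_t = sigmoid(p["W_fx"] * x_t + p["U_fh"] * h_prev + p["b_f"])    # Eq. (3)
    c_t = f_t * c_prev + i_t * l_t                                    # Eq. (4)
    o_t = sigmoid(p["W_ox"] * x_t + p["U_oh"] * h_prev + p["b_o"])    # Eq. (5)
    h_t = o_t * math.tanh(c_t)                                        # Eq. (6)
    return h_t, c_t


PARAM_NAMES = ["W_ix", "U_ih", "b_i", "W_lx", "U_lh", "b_l",
               "W_fx", "U_fh", "b_f", "W_ox", "U_oh", "b_o"]

# With all parameters at zero, every gate outputs 0.5 and the cell input is 0,
# so half of the old cell state is kept and nothing new is written.
zero_params = {name: 0.0 for name in PARAM_NAMES}
h_half, c_half = lstm_step(1.0, 0.0, 2.0, zero_params)

# A strongly negative forget-gate bias closes the gate and resets the cell.
reset_params = dict(zero_params, b_f=-50.0)
h_reset, c_reset = lstm_step(1.0, 0.0, 2.0, reset_params)
```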

3.4.2. Facebook Prophet Forecasting Model

Facebook’s Prophet is a forecasting tool for time series data that captures patterns at different time scales. Forecasts are derived from an additive model in which nonlinear trends are fitted together with yearly, weekly, and daily seasonality, plus holiday effects. The tool works best on series with strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and it typically handles outliers well. It allows users to produce forecasts quickly, fitting within seconds and requiring very little computing time, with accuracy on par with that of other forecasting models. It can produce reasonable forecasts from incomplete or messy data without manual effort. In addition, Prophet accommodates seasonalities on a human scale, such as day of the week and time of year [28,29].
As a powerful tool for Python and R released in 2017 by Facebook, Prophet models time series datasets with trends, seasonality, and holidays. Prophet takes a few seconds to fit the model with tunable parameters, and it is represented by the Formula (7) [30]:
$$y(t) = g(t) + s(t) + h(t) + \epsilon_t \tag{7}$$
Equation (7) is the prediction formula of the Prophet forecasting model. The predicted value y(t) is the sum of the trend g(t), which may be linear or logistic; the seasonality s(t) for the chosen period, such as yearly, monthly, or daily; the holiday-related anomalies h(t); and the unforeseen errors ϵt. The model has multiple parameters that can be tuned to improve forecasting accuracy. Depending on the intended use, the trend can be specified as linear or logistic. Linear models normalize outliers and do not impose a maximum or minimum threshold. Conversely, logistic models are suitable for saturating forecasts, which require the highest and lowest values to be defined [30].
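The additive structure of Equation (7) can be written out directly. The sketch below does not call the Prophet library and omits the error term; it uses a single Fourier pair for seasonality and a lookup table for holiday effects, both simplifying assumptions, and all function names and parameter values are illustrative.

```python
import math


def linear_trend(t, k, m):
    """g(t) for the linear case: growth rate k and offset m."""
    return k * t + m


def logistic_trend(t, cap, k, m):
    """g(t) for the saturating case: approaches the capacity `cap`."""
    return cap / (1.0 + math.exp(-k * (t - m)))


def fourier_seasonality(t, period, a, b):
    """s(t): one sine/cosine pair with the given period."""
    angle = 2.0 * math.pi * t / period
    return a * math.sin(angle) + b * math.cos(angle)


def additive_prediction(t, trend, seasonal, holiday_effects):
    """y(t) = g(t) + s(t) + h(t), with h(t) = 0 outside listed holidays."""
    return trend(t) + seasonal(t) + holiday_effects.get(t, 0.0)


y0 = additive_prediction(
    0,
    trend=lambda t: linear_trend(t, k=2.0, m=10.0),
    seasonal=lambda t: fourier_seasonality(t, period=12, a=0.0, b=1.0),
    holiday_effects={0: 5.0},  # hypothetical effect at t = 0
)
midpoint = logistic_trend(5.0, cap=100.0, k=0.3, m=5.0)
```

For annual series such as yearly incidence counts, the seasonal and holiday terms contribute little, so the trend component carries most of the forecast.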

3.4.3. Autoregressive Integrated Moving Average (ARIMA)

ARIMA, a classic statistical model, uses patterns such as trends and seasonality to predict future values in a series. It is a generalized version of the ARMA model designed to handle non-stationary time series. The ARMA model assumes that the analyzed time series is stationary, so a non-stationary series must first be transformed by finite differencing to remove seasonality and trends. A stationary time series is a combination of signal and noise. The ARIMA model separates the signal from the noise and produces a forecast for a later time point. As the acronym indicates, its structural components are the following [31,32]:
AR for autoregression: a regression model that uses the dependence relationship between an observation and several lagged observations (model parameter p).
I for integration: calculating the differences between observations at different time points (model parameter d), aiming to make the time series stationary.
MA for moving average: this approach considers the dependence that may exist between observations and the error terms created when a moving average model is used on observations that have a time lag (model parameter q) [33].
One way to represent an AR model of order p, or AR(p), is as a linear process, as shown in Equation (8). The stationary variable is denoted by x and the constant by c. The autocorrelation coefficients at lags 1, 2, …, p are represented by φ, while the residuals ε form a Gaussian white noise series with a mean of zero and a variance of σ² [34].
$$x_t = c + \sum_{i=1}^{p} \phi_i\, x_{t-i} + \varepsilon_t \tag{8}$$
Equation (9) represents an MA model of order q, or MA(q), wherein the θ terms denote the weights given to the current and previous values of a stochastic term in the time series. Here, μ is the expectation of x and is generally assumed to be zero, while θ0 equals one. We consider ε to be a Gaussian white noise series with a mean of zero and a variance of σ² [34].
$$x_t = \mu + \sum_{i=0}^{q} \theta_i\, \varepsilon_{t-i} \tag{9}$$
Equation (10) shows how these two models are combined to create an ARMA model of order (p, q), where φ ≠ 0, θ ≠ 0, and σ² > 0. The parameters p and q are the AR and MA orders, respectively. ARIMA forecasting, also known as Box and Jenkins forecasting, can handle non-stationary time series data due to its “integration” step. This step involves differencing the time series, converting a non-stationary time series into a stationary one [34].
$$x_t = c + \sum_{i=1}^{p} \phi_i\, x_{t-i} + \varepsilon_t + \sum_{i=0}^{q} \theta_i\, \varepsilon_{t-i} \tag{10}$$
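A one-step-ahead point forecast from an ARMA model, and its extension to ARIMA with one round of differencing, can be sketched as follows. The coefficients are assumed to be already estimated; the sketch uses the common convention in which the unknown future shock is replaced by its mean of zero and the MA sum runs over past residuals only. Function names and numbers are illustrative, not fitted values from the study.

```python
def arma_one_step(history, residuals, c, phi, theta):
    """Expected next value of an ARMA(p, q) process given its past.

    history   -- past observations, most recent last (needs at least p values)
    residuals -- past one-step errors, most recent last (needs at least q values)
    """
    if len(history) < len(phi) or len(residuals) < len(theta):
        raise ValueError("not enough past values for the model orders")
    ar_part = sum(phi_i * history[-i] for i, phi_i in enumerate(phi, start=1))
    ma_part = sum(th_i * residuals[-i] for i, th_i in enumerate(theta, start=1))
    return c + ar_part + ma_part


def arima_d1_one_step(levels, residuals, c, phi, theta):
    """ARIMA(p, 1, q): forecast the next difference, then add the last level."""
    diffs = [b - a for a, b in zip(levels, levels[1:])]
    return levels[-1] + arma_one_step(diffs, residuals, c, phi, theta)


# Hypothetical coefficients and data.
arma_next = arma_one_step([1.0, 2.0], [0.5], c=0.1, phi=[0.5, 0.25], theta=[0.4])
arima_next = arima_d1_one_step([100.0, 103.0, 107.0], [], c=1.0, phi=[0.5], theta=[])
```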

3.4.4. ARIMA-LSTM Hybrid Model

The hybrid ARIMA-LSTM model is designed to capture the linear and nonlinear aspects of time series data. The time series must be stationary to apply the ARIMA model and predict future values. Stationarity is checked with the Dickey–Fuller test, and the series is differenced if it is not already stationary. Optimal parameters are then found to build the model, and finally, predictions are made using the fitted model. LSTM works well for the non-stationary parts of the data and has a relatively long memory. The residuals obtained from ARIMA are fed into an LSTM model, which is trained to capture their pattern and predict the residuals for the next period [35].
These two models have been chosen for their ability to break down a time series into linear and nonlinear trends, as expressed by Equation (11). In this equation, Lt illustrates the linear component of the time series at time step t. Also, Nt defines the nonlinear component, and ɛt represents the error term in the xt time series [36].
$$x_t = L_t + N_t + \varepsilon_t \tag{11}$$
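The residual-stacking structure behind Equation (11) can be sketched independently of the component models. In the sketch below, a straight-line fit stands in for ARIMA and a "last residual repeats" rule stands in for the LSTM; both stand-ins are assumptions made so that the example runs without external libraries, and they are much simpler than the models used in the study.

```python
def ols_trend(series):
    """Stand-in linear model: fit a line, return (fitted values, next value)."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    fitted = [intercept + slope * t for t in range(n)]
    return fitted, intercept + slope * n


def persistence(residuals):
    """Stand-in nonlinear model: predict that the last residual repeats."""
    return residuals[-1]


def hybrid_one_step(series, linear_model=ols_trend, residual_model=persistence):
    """Forecast = linear forecast (L) + forecast of the linear model's residuals (N)."""
    fitted, linear_next = linear_model(series)
    residuals = [y - f for y, f in zip(series, fitted)]
    nonlinear_next = residual_model(residuals)
    return linear_next + nonlinear_next


on_a_line = hybrid_one_step([1.0, 2.0, 3.0, 4.0, 5.0])
custom = hybrid_one_step(
    [7.0, 8.0],
    linear_model=lambda s: ([0.0] * len(s), 10.0),
    residual_model=lambda r: 2.5,
)
```

The design point is that the second model never sees the raw series, only what the first model failed to explain, so any component pair with this interface can be swapped in.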
Different types of algorithms were chosen to cover the wide range of characteristics found in the data on new cancer cases and incidence rates. LSTM networks provide flexibility in capturing long-term dependencies and the complex patterns common in such data. As a time series model, Prophet can deal with outliers, missing data, and shifting trends, which is beneficial given the unpredictable nature of healthcare data such as cancer-related datasets. As a statistical model, ARIMA is strong at examining and predicting linear sequences and provides a baseline against which the more complex models can be compared.

3.5. Evaluation Metrics

This study selected a suite of evaluation metrics that best represent the models’ forecasting accuracy, reliability, and applicability in a real-world context to comprehensively assess machine learning models, including LSTM, Prophet, and ARIMA, deployed in forecasting cancer incidence trends in Canada.
Forecasting quality can be assessed using different criteria, such as forecast error measures, computation speed, and interpretability. Of these, forecast error (accuracy) measures are the most important for practical problems. All of them compare the measured value at time t with its forecast. They are typically used to estimate the quality of forecasting methods and to choose the best forecasting mechanism across multiple series. Each domain applies a set of “traditional” error measures by default, despite their known drawbacks. This section reviews common forecast error measures, grouped according to how they are calculated and how the error at a specific time t is valued. For each measure, its name and formula are given [37].

3.5.1. Mean Absolute Error (MAE)

MAE measures the average magnitude of absolute errors between actual and predicted values. Lower MAE values indicate better accuracy. It is widely used in forecasting tasks because it provides an intuitive measure of prediction error in the same unit as the data.
Mean Absolute Error is given by [37]:
$$MAE = \frac{1}{n} \sum_{i=1}^{n} \left| F_i - A_i \right|$$
where n is the number of observations, Fi is the forecasted value for observation i, and Ai is the actual value for observation i.

3.5.2. Mean Squared Error (MSE)

MSE calculates the average squared difference between actual and predicted values. MSE penalizes significant errors more heavily, making it useful for detecting significant deviations in predictions.
Mean Squared Error is given by [37]:
$$MSE = \frac{1}{n} \sum_{i=1}^{n} \left( F_i - A_i \right)^2$$
where n is the number of observations, Fi is the forecasted value for observation i, and Ai is the actual value for observation i.

3.5.3. Root Mean Squared Error (RMSE)

RMSE is the square root of MSE, which provides an interpretable error value in the same unit as the target variable. RMSE is sensitive to large errors and is commonly used in forecasting applications to measure model performance.
Root Mean Squared Error is given by [37]:
$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( F_i - A_i \right)^2}$$
where n is the number of observations, Fi is the forecasted value for observation i, and Ai is the actual value for observation i.

3.5.4. Mean Absolute Percentage Error (MAPE)

MAPE expresses the prediction error as a percentage of actual values, making it easier to interpret in relative terms. As it provides a percentage-based error measure, MAPE is particularly useful for comparing errors across datasets with different scales.
Mean Absolute Percentage Error is given by [37]:
$$MAPE = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{A_i - F_i}{A_i} \right|$$
where n is the number of observations, Fi is the forecasted value for observation i, and Ai is the actual value for observation i.
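The four error measures defined so far translate directly into code. The sketch below follows the formulas as written, so MAPE is returned as a fraction (multiply by 100 for a percentage); the function names and the three-point example are illustrative.

```python
import math


def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(f - a) for a, f in zip(actual, forecast)) / len(actual)


def mse(actual, forecast):
    """Mean squared error."""
    return sum((f - a) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean squared error, in the units of the data."""
    return math.sqrt(mse(actual, forecast))


def mape(actual, forecast):
    """Mean absolute percentage error as a fraction; undefined if any actual is 0."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


actual = [100.0, 200.0, 400.0]    # hypothetical observed values
forecast = [110.0, 190.0, 400.0]  # hypothetical predictions

scores = {
    "MAE": mae(actual, forecast),
    "MSE": mse(actual, forecast),
    "RMSE": rmse(actual, forecast),
    "MAPE": mape(actual, forecast),
}
```

In this example the two errors of 10 count equally in MAE, while MAPE weights the error on the smaller actual value (100) twice as heavily as the one on the larger (200).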

3.5.5. Coefficient of Determination (R2)

R2 represents the proportion of variance in the dependent variable that is predictable from the independent variables.
The Coefficient of Determination is given by:
$$R^2 = 1 - \frac{RSS}{TSS}$$
where RSS is the residual sum of squares and TSS is the total sum of squares.
R² is widely used in regression and forecasting models to indicate how well the model explains the variability in the data. A value closer to 1 indicates a better fit. However, it has limitations, such as being a biased statistic and providing invalid results in the presence of measurement errors. Therefore, while R² is a valuable metric, it should be interpreted cautiously and in conjunction with other criteria [38].
R2 is not always suitable for time series forecasting due to the complex nature of time series data and the potential for model uncertainty. Goel [39] highlights the need for hybrid models that can capture both linear and nonlinear components in time series data, suggesting that a single metric like R2 may not adequately capture the predictive performance. Chatfield [40] further emphasizes the importance of considering model uncertainty in time series analysis, which can affect the accuracy of forecasts. Hewamalage [41] underscores the need for robust and efficient forecasting methods, indicating that R2 may not be the most suitable metric in all cases. Lastly, Kim [42] discusses the impact of estimation error on forecast mean squared errors, which can also affect the reliability of R2 in time series forecasting.
The limitations mentioned regarding the R² metric, such as potential bias and sensitivity to measurement errors, are intended to highlight practical issues related to real-world model interpretation and data quality considerations. Specifically, R² can be influenced by factors such as model specification, omitted variables, and data inaccuracies, rather than being inherently flawed mathematically. Therefore, R² values should always be interpreted with careful attention to these practical and contextual limitations, especially in complex forecasting scenarios.

4. Results

4.1. Performance of the Models

After all models were run for every category, evaluation metrics were extracted to assess their performance and to select the best model for each category. This section reports the error rates of each model in forecasting new cancer cases and incidence rates for 2022 to 2026 across the different categories. The models were trained on data from 1992 to 2021.
It is important to note that negative R² values, as observed for the Prophet model in our analysis, indicate that the model does not improve predictive accuracy compared with a baseline mean-only (intercept-only) model. Such results do not inherently suggest that the Prophet model is incorrectly specified or invalid. Rather, these negative values highlight that, within the specific context of our forecasting problem, the Prophet model’s predictive strength was comparatively weaker than the other evaluated models (ARIMA, LSTM, and Hybrid ARIMA-LSTM). Therefore, R² values, particularly negative ones, should be interpreted cautiously as indicators of relative model performance rather than absolute measures of model validity.
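The comparison with a mean-only baseline can be made concrete with a small Python sketch. It is illustrative only and is not taken from the study's code: the function name `r_squared` and all values are hypothetical. It computes R² as one minus the ratio of the residual sum of squares to the total sum of squares, and shows that a forecast worse than the mean yields a negative value.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - RSS / TSS."""
    mean_actual = sum(actual) / len(actual)
    rss = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    tss = sum((a - mean_actual) ** 2 for a in actual)
    if tss == 0:
        raise ValueError("R^2 is undefined when the actual values are constant")
    return 1 - rss / tss


# Hypothetical values, for illustration only.
actual = [1.0, 2.0, 3.0, 4.0]
mean_only = [2.5, 2.5, 2.5, 2.5]       # baseline: always predict the mean
reversed_trend = [4.0, 3.0, 2.0, 1.0]  # forecast with the wrong direction

print(r_squared(actual, mean_only))       # 0.0
print(r_squared(actual, reversed_trend))  # -3.0
```

The mean-only forecast scores exactly zero, so any forecast whose squared errors exceed those of the mean falls below zero, with no lower bound.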

4.1.1. Error Rates for Geography Categories

Table 1 shows the error rates for new cancer cases and cancer incidence rates in geographical regions. The error rates have been interpreted to indicate the best models, which have been highlighted in bold and italic. The error rates show that the hybrid model performs the best in forecasting new cancer cases and cancer incidence rates for geographic regions.

4.1.2. Error Rates for Age Categories

Table 2 shows the error rates for new cancer cases and cancer incidence rates in age group categories. The error rates have been interpreted to indicate the best models, indicated by bold and italic highlights. The error rates show that the hybrid model performs the best in forecasting new cancer cases and cancer incidence rates for age group categories.

4.1.3. Error Rates for Sex Categories

Table 3 shows the error rates for new cancer cases and cancer incidence rate in sex categories. The error rates have been interpreted to indicate the best models, which have been highlighted in bold. The error rates show that the LSTM model performs the best in forecasting new cancer cases for sex categories.

4.2. Forecasting New Cancer Cases and Incidence Rate

Compared with the other models, the hybrid model is the best model for forecasting the values for geography categories from 2022 to 2026. Table 4 shows the results obtained from this model.
The hybrid model is also the best model for forecasting the values for age group categories from 2022 to 2026. Table 5 shows the results obtained from this model.
For sex-based categories, the LSTM model is the best model for forecasting the values from 2022 to 2026. Table 6 shows the results obtained from this model.

4.3. Visualization Insights

Figure 2, Figure 3 and Figure 4 show the forecasted new cancer cases from 2022 to 2026 for regions, age groups, and sexes, respectively.
Figure 5, Figure 6 and Figure 7 show the forecasted cancer incidence rate per 100,000 people from 2022 to 2026 for regions, age groups, and sexes, respectively.

5. Discussion

The findings of this study align with previous research on cancer incidence forecasting in Canada but extend existing methodologies by incorporating machine learning (ML) models to enhance predictive accuracy. Past studies, such as those by Brenner et al. [16,17,18], have relied heavily on statistical models like CANPROJ, which use age-period-cohort (APC) frameworks to extrapolate trends. While effective in capturing historical linear patterns, such models assume that past trends will continue predictably, limiting their ability to reflect complex, nonlinear dynamics in cancer data. In contrast, the machine learning models used in this study, particularly the hybrid ARIMA-LSTM model, are adaptive and data-driven, capable of learning from fluctuations and underlying patterns that may not follow a strict statistical progression.
For instance, Brenner et al. projected new cancer cases up to 2024 using historical data and demographic adjustments. Our approach not only extends these projections to 2026, offering a more updated outlook, but also improves accuracy by integrating deep learning with statistical time series models. The hybrid ARIMA-LSTM model consistently yielded lower error rates than other models, demonstrating that this combined approach effectively captures both linear and nonlinear trends in cancer incidence forecasting.
The analysis reveals distinct patterns and trends across sex, geography, and age groups. Males exhibit a steady increase in both new cancer cases and incidence rates, while females experience a rise in absolute cases but a decline in incidence rates.
The observed trend in female cancer incidence—where absolute cancer cases are increasing while incidence rates are declining—suggests demographic shifts, notably population growth and aging, rather than improvements in cancer treatments alone. Additionally, improvements in screening programs and early detection efforts (such as breast and cervical cancer screening) documented in the existing literature likely play a role in stabilizing or reducing incidence rates for certain cancer types.
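The arithmetic behind this pattern can be sketched in Python. The figures below are hypothetical and are not taken from the study's data, and the sketch uses a crude rate for simplicity; age-standardized rates involve additional weighting that is not shown here.

```python
def crude_rate_per_100k(cases, population):
    """Crude incidence rate: new cases per 100,000 people."""
    return cases * 100_000 / population


# Hypothetical figures, not taken from the study's data.
cases_start, population_start = 1_000, 200_000
cases_end, population_end = 1_100, 250_000

rate_start = crude_rate_per_100k(cases_start, population_start)
rate_end = crude_rate_per_100k(cases_end, population_end)

print(rate_start, rate_end)  # 500.0 440.0
```

In this example, cases rise by 10% while the population rises by 25%, so the rate per 100,000 falls even though the absolute number of cases grows.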
Brenner et al. confirmed that despite an overall decline in cancer incidence and mortality rates, Canada is expected to see an increase in new cancer cases and deaths in 2024, primarily due to its growing and aging population. While advancements in prevention, screening, and treatment have mitigated the effects of certain cancers, these near-term projections emphasize the ongoing burden cancer may pose to individuals and the Canadian healthcare system [18].
Geographically, regions like Alberta, British Columbia, Ontario, and Quebec show increasing cancer cases and incidence rates, indicating a growing cancer burden. In contrast, provinces such as Newfoundland and Labrador demonstrate a declining trend, possibly reflecting effective local cancer control measures or public health strategies. Age-wise, cancer incidence remains stable among children but increases significantly among middle-aged and older adults, especially in individuals aged 75 and above.
These findings underscore the need for targeted public health interventions explicitly tailored to demographic groups and geographic regions showing an increased cancer burden. The steady rise in cancer cases among older adults and specific provinces highlights the importance of developing age-specific and region-specific cancer prevention and control strategies. Meanwhile, stability in pediatric cancer incidence suggests that existing pediatric cancer strategies may be effective but require sustained support.
Given the complexity of cancer incidence trends, selecting an accurate forecasting model is critical for public health planning. An evaluation of model performance reveals that the hybrid ARIMA-LSTM model consistently achieves the lowest error rates across most geographic and age group categories, making it the most suitable choice for cancer incidence forecasting. This model outperforms the ARIMA, LSTM, and Prophet models in key accuracy metrics, including MAE, MSE, RMSE, MAPE, and R².

5.1. Model Performance Across Demographics and Regions

Geographic performance: The hybrid ARIMA-LSTM model yields substantially lower RMSE values across multiple provinces, with R² values exceeding 0.80 in most cases, indicating strong predictive accuracy. In contrast, the Prophet model exhibits the highest error rates, often producing negative R² values, suggesting poor model fit.
Age group performance: The hybrid model provides the most accurate predictions for middle-aged and older adults (50+ years), achieving the lowest MAPE and RMSE values. In contrast, the Prophet model underperforms in most age categories.
Sex-based performance: While the hybrid model excels in most categories, LSTM performs best in sex-based forecasting, particularly for male and female cancer incidence rates, as indicated by its superior MAE and RMSE values.
The hybrid ARIMA-LSTM model’s superior performance is attributed to its ability to capture both linear and nonlinear trends in cancer incidence data. ARIMA alone is well suited for handling linear trends but struggles with complex patterns, whereas LSTM excels in capturing long-term dependencies. By integrating these approaches, the hybrid model effectively mitigates their individual limitations, resulting in more accurate and reliable cancer incidence forecasts.
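One way to make this combination concrete is the additive decomposition proposed by Zhang [36] for hybrid ARIMA and neural network models. The formulation below is a sketch under that assumption, with an LSTM in the role of the nonlinear model. The symbols $L_t$, $N_t$, $e_t$, $f$, and $p$ are introduced here for illustration and do not necessarily match the exact scheme implemented in this study.

```latex
\[
\begin{aligned}
y_t &= L_t + N_t, \\
e_t &= y_t - \hat{L}_t, \\
\hat{N}_t &= f(e_{t-1}, e_{t-2}, \ldots, e_{t-p}), \\
\hat{y}_t &= \hat{L}_t + \hat{N}_t
\end{aligned}
\]
```

Here $y_t$ is the observed series, $\hat{L}_t$ is the ARIMA forecast of the linear component $L_t$, $e_t$ is the ARIMA residual, $f$ is the nonlinear model fitted to the $p$ most recent residuals, and $\hat{y}_t$ is the combined forecast.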

5.2. Implications for Public Health and Cancer Prevention

The findings highlight the critical role of advanced forecasting models in improving cancer prevention and resource allocation strategies. Rising incidence rates in middle-aged and older adults necessitate robust screening programs and preventive measures tailored to these demographics. Similarly, region-specific public health initiatives that address unique risk factors are essential to mitigating the cancer burden. By leveraging accurate forecasting models, policymakers and healthcare professionals can develop more effective intervention strategies, optimize resource allocation, and ultimately improve cancer outcomes across diverse populations.
Our study provides a foundation for further integrating AI-driven forecasting models in epidemiological research. Future research could explore ensemble methods, combining traditional epidemiological models (e.g., APC) with deep learning architectures to refine cancer incidence predictions further. Additionally, incorporating real-time clinical and genetic data could enhance prediction accuracy and improve personalized cancer risk assessments.

5.3. Limitations

One of the key limitations of this study is the imputation of Quebec’s cancer incidence data after 2017 due to the absence of official records. While we employed a multiple imputation approach to enhance accuracy and capture uncertainty, projections remain dependent on historical trends and national averages rather than direct registry data. The variability in imputed estimates reflects the inherent uncertainty in forecasting missing data. Future studies should integrate officially updated Quebec data when available to refine predictive models further and validate imputed estimates.
Also, this study analyzes cancer incidence as a whole, whereas prevention programs typically target specific cancers. Future research should extend this analysis to individual cancer types, allowing for a more detailed examination of how prevention efforts impact site-specific cancer trends.
Moreover, our study assumes that biological and environmental factors primarily drive cancer incidence trends, but access to healthcare services plays a crucial role in early detection and diagnosis. Screening disparities, specialist availability, and healthcare system delays may introduce biases in observed cancer incidence rates across age groups and regions. Future studies should incorporate healthcare accessibility indices to better quantify the role of healthcare infrastructure in shaping cancer trends.

Author Contributions

Conceptualization, E.K. and K.P.; methodology, E.K. and K.P.; software, E.K.; validation, E.K. and K.P.; formal analysis, E.K.; investigation, E.K.; resources, E.K. and K.P.; data curation, E.K.; writing—original draft preparation, E.K.; writing—review and editing, E.K. and K.P.; visualization, E.K.; supervision, K.P.; project administration, K.P.; funding acquisition, K.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original data presented in the study are openly available in the Statistics Canada portal, published on 31 January 2024 at this link https://www150.statcan.gc.ca/n1/daily-quotidien/240131/dq240131d-eng.htm.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Kibbe, W.A.; Klemm, J.D.; Quackenbush, J. Cancer Informatics: New Tools for a Data-Driven Age in Cancer Research. Cancer Res. 2017, 77, e1–e2.
  2. Biemar, F.; Foti, M. Global progress against cancer—Challenges and opportunities. Cancer Biol. Med. 2013, 10, 183–186.
  3. Sikora, K. Developing a global strategy for cancer. Eur. J. Cancer 1999, 35, 24–31.
  4. Ivanović, M.; Radovanović, M. Modern machine learning techniques and their applications. In Electronics, Communications and Networks IV; CRC Press: Boca Raton, FL, USA, 2015.
  5. Rathor, A.; Gyanchandani, M. A review at Machine Learning algorithms targeting big data challenges. In Proceedings of the 2017 International Conference on Electrical, Electronics, Communication, Computer, and Optimization Techniques (ICEECCOT), Mysuru, India, 15–16 December 2017; pp. 1–7.
  6. Nguyen, G.T.; Dlugolinsky, S.; Bobák, M.; Tran, V.D.; López García, Á.; Heredia, I.; Malík, P.; Hluchý, L. Machine Learning and Deep Learning frameworks and libraries for large-scale data mining: A survey. Artif. Intell. Rev. 2019, 52, 77–124.
  7. Udousoro, I.C. Machine Learning: A Review. Semicond. Sci. Inf. Devices 2020, 2, 5–14.
  8. Verma, M.; Khoury, M.J.; Ioannidis, J.P. Opportunities and Challenges for Selected Emerging Technologies in Cancer Epidemiology: Mitochondrial, Epigenomic, Metabolomic, and Telomerase Profiling. Cancer Epidemiol. Biomark. Prev. 2012, 22, 189–200.
  9. Berger, M.L.; Curtis, M.D.; Smith, G.; Harnett, J.; Abernethy, A.P. Opportunities and challenges in leveraging electronic health record data in oncology. Future Oncol. 2016, 12, 1261–1274.
  10. Gupta, S.; Gupta, A.; Kumar, Y. Artificial intelligence techniques in Cancer research: Opportunities and challenges. In Proceedings of the 2021 International Conference on Technological Advancements and Innovations (ICTAI), Tashkent, Uzbekistan, 10–12 November 2021; pp. 411–416.
  11. Schlick, C.J.; Castle, J.P.; Bentrem, D.J. Utilizing Big Data in Cancer Care. Surg. Oncol. Clin. N. Am. 2018, 27, 641–652.
  12. Shweta; Riya; Kumar, A. Cancer Prediction Using Machine Learning Algorithm. Int. J. Sci. Res. (IJSR) 2022, 11, 873–875.
  13. Kaur, I.; Doja, M.N.; Ahmad, T. Data mining and machine learning in cancer survival research: An overview and future recommendations. J. Biomed. Inform. 2022, 128, 104026.
  14. Shruti Trivedi, N.K. Predictive Analytics in Healthcare using Machine Learning. In Proceedings of the 2023 14th International Conference on Computing Communication and Networking Technologies (ICCCNT), Delhi, India, 6–8 July 2023; pp. 1–5.
  15. Meropol, N.J.; Donegan, J.; Rich, A.S. Progress in the Application of Machine Learning Algorithms to Cancer Research and Care. JAMA Netw. Open 2021, 4, e2116063.
  16. Brenner, D.R.; Poirier, A.; Woods, R.R.; Ellison, L.F.; Billette, J.M.; Demers, A.A.; Zhang, S.X.; Yao, C.; Finley, C.; Fitzgerald, N.; et al. Canadian Cancer Statistics Advisory Committee. Projected estimates of cancer in Canada in 2022. CMAJ 2022, 194, E601–E607.
  17. Brenner, D.R.; Weir, H.K.; Demers, A.A.; Ellison, L.F.; Louzado, C.; Shaw, A.; Turner, D.; Woods, R.R.; Smith, L.M. Projected estimates of cancer in Canada in 2020. Can. Med. Assoc. J. 2020, 192, E199–E205.
  18. Brenner, D.R.; Gillis, J.; Demers, A.A.; Ellison, L.F.; Billette, J.-M.; Zhang, S.X.; Liu, J.L.; Woods, R.R.; Finley, C.; Fitzgerald, N.; et al. Projected estimates of cancer in Canada in 2024. Can. Med. Assoc. J. 2024, 196, E615–E623.
  19. Table 13-10-0394-01: Leading Causes of Death, Total Population, by Age Group. Statistics Canada: Ottawa, ON, Canada, 2025; Available online: https://www150.statcan.gc.ca/t1/tbl1/en/tv.action?pid=1310039401 (accessed on 4 August 2018).
  20. Canadian Cancer Statistics Advisory Committee in Collaboration with the Canadian Cancer Society Statistics Canada and the Public Health Agency of Canada. Canadian Cancer Statistics; Canadian Cancer Society: Toronto, ON, Canada, 2021; Available online: www.cancer.ca/Canadian-Cancer-Statistics-2021-EN (accessed on 28 March 2022).
  21. Xie, L.; Semenciw, R.; Mery, L. Cancer incidence in Canada: Trends and projections (1983–2032). Health Promot. Chronic Dis. Prev. Can. 2015, 35 (Suppl. S1), 2–186.
  22. de Oliveira, C.; Weir, S.; Rangrej, J.; Krahn, M.D.; Mittmann, N.; Hoch, J.S.; Chan, K.K.W.; Peacock, S. The economic burden of cancer care in Canada: A population-based cost study. CMAJ Open 2018, 6, E1–E10.
  23. Qiu, Z.; Hatcher, J. Cancer Projection Analytical Network Working Team CANPROJ: The R package of Cancer Projection Methods Based on Generalized Linear Models for Age Period/or Cohort Version, I; Alberta Health Services: Edmonton, AB, Canada, 2013.
  24. Government of Canada, S.C. Cancer Incidence in Canada, 2021. The Daily. Available online: https://www150.statcan.gc.ca/n1/daily-quotidien/240131/dq240131d-eng.htm (accessed on 31 January 2024).
  25. Government of Canada, S.C. Canadian Cancer Registry—Age-Standardization: Incidence; Government of Canada, Statistics Canada: Ottawa, ON, Canada, 2025; Available online: https://www.statcan.gc.ca/en/statistical-programs/document/3207_D12_V4 (accessed on 17 November 2021).
  26. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
  27. Emmert-Streib, F.; Yang, Z.; Feng, H.; Tripathi, S.; Dehmer, M. An Introductory Review of Deep Learning for Prediction Models with Big Data. Front. Artif. Intell. 2020, 3, 4.
  28. Kaninde, S.; Mahajan, M.; Janghale, A.; Joshi, B. Stock Price Prediction using Facebook Prophet. ITM Web Conf. 2022, 44, 3060.
  29. Korstanje, J. The Prophet Model. In Advanced Forecasting with Python; Apress: New York, NY, USA, 2021.
  30. Shen, J.; Valagolam, D.; McCalla, S. Prophet forecasting model: A machine learning approach to predict the concentration of air pollutants (PM2.5, PM10, O3, NO2, SO2, CO) in Seoul, South Korea. PeerJ 2020, 8, e9961.
  31. Rundo, F.; Trenta, F.; di Stallo, A.L.; Battiato, S. Machine learning for quantitative finance applications: A survey. Appl. Sci. 2019, 9, 5574.
  32. Siami-Namini, S.; Tavakoli, N.; Namin, A.S. A comparison of ARIMA and LSTM in forecasting time series. In Proceedings of the 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA), Orlando, FL, USA, 17–20 December 2018; pp. 1394–1401.
  33. Kontopoulou, V.I.; Panagopoulos, A.D.; Kakkos, I.; Matsopoulos, G.K. A Review of ARIMA vs. Machine Learning Approaches for Time Series Forecasting in Data Driven Networks. Future Internet 2023, 15, 255.
  34. Sima, S.N.; Akbar, S.N. Forecasting Economics and Financial Time Series: ARIMA vs. LSTM. arXiv 2018, arXiv:1803.06386.
  35. Kulshreshtha, S.; Vijayalakshmi, A. An ARIMA-LSTM hybrid model for stock market prediction using live data. J. Eng. Sci. Technol. Rev. 2020, 13, 117–123.
  36. Zhang, G.P. Time series forecasting using a hybrid ARIMA and neural network model. Neurocomputing 2003, 50, 159–175.
  37. Maxim, S.; Adriaan, B.; Shcherbakova, N.L.; Anton, T.; Janovsky, T.A.; Kamaev, V.A. A survey of forecast error measures. World Appl. Sci. J. 2013, 24, 171–176.
  38. Cheng, C.; Shalabh Garg, G. Coefficient of determination for multiple measurement error models. J. Multivar. Anal. 2014, 126, 137–152.
  39. Goel, H.; Melnyk, I.; Banerjee, A. R2N2: Residual Recurrent Neural Networks for Multivariate Time Series Forecasting. arXiv 2017, arXiv:1709.03159.
  40. Chatfield, C. Model uncertainty and forecast accuracy. J. Forecast. 1996, 15, 495–508.
  41. Hewamalage, H.; Bergmeir, C.; Bandara, K. Recurrent Neural Networks for Time Series Forecasting: Current Status and Future Directions. arXiv 2019, arXiv:1909.00590.
  42. Kim, T.; Leybourne, S.J.; Newbold, P. Asymptotic mean-squared forecast error when an autoregression with linear trend is fitted to data generated by an I(0) or I(1) process. J. Time Ser. Anal. 2004, 25, 583–602.
Figure 1. Research process.
Figure 2. Forecasted new cancer cases across various regions in Canada (the yellow-shaded area shows the predicted values).
Figure 3. Forecasted new cancer cases across various age groups in Canada (the yellow-shaded area shows the predicted values).
Figure 4. Forecasted new cancer cases in Canada by sexes (the yellow-shaded area shows the predicted values).
Figure 5. Forecasted cancer incidence rate per 100,000 people across various regions in Canada (the yellow-shaded area shows the predicted values).
Figure 6. Forecasted cancer incidence rate per 100,000 people across various age groups in Canada (the yellow-shaded area shows the predicted values).
Figure 7. Forecasted cancer incidence rate per 100,000 people in Canada by sex (the yellow-shaded area shows the predicted values).
Table 1. Error rates of geography regions for new cancer cases and cancer incidence rate.
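| Geography | Category | Model | MSE | MAE | RMSE | MAPE | R² |
|---|---|---|---|---|---|---|---|
| Alberta | New Cancer Cases | ARIMA | 0.0028 | 0.0377 | 0.0528 | 0.0444 | 0.6795 |
| | | LSTM | 0.0028 | 0.0379 | 0.0527 | 0.0452 | 0.6905 |
| | | Hybrid | 0.0015 | 0.0344 | 0.0385 | 0.0402 | 0.8296 |
| | | Prophet | 0.0032 | 0.0509 | 0.0564 | 0.0596 | 0.3438 |
| | Cancer Incidence Rate | ARIMA | 0.0102 | 0.0629 | 0.1012 | 0.0786 | 0.0832 |
| | | LSTM | 0.0081 | 0.0671 | 0.0907 | 0.0820 | 0.0898 |
| | | Hybrid | 0.0013 | 0.0293 | 0.0357 | 0.0315 | 0.8651 |
| | | Prophet | 0.0051 | 0.0568 | 0.0714 | 0.0656 | 0.3887 |
| British Columbia | New Cancer Cases | ARIMA | 0.0032 | 0.0383 | 0.0567 | 0.0468 | 0.6612 |
| | | LSTM | 0.0028 | 0.0416 | 0.0525 | 0.0491 | 0.7100 |
| | | Hybrid | 0.0018 | 0.0317 | 0.0419 | 0.0382 | 0.8151 |
| | | Prophet | 0.0038 | 0.0544 | 0.0614 | 0.0665 | 0.3208 |
| | Cancer Incidence Rate | ARIMA | 0.0112 | 0.0816 | 0.0961 | 0.0935 | 0.0659 |
| | | LSTM | 0.0091 | 0.0769 | 0.0954 | 0.0934 | 0.1547 |
| | | Hybrid | 0.0064 | 0.0628 | 0.0799 | 0.0766 | 0.3516 |
| | | Prophet | 0.0071 | 0.0595 | 0.0845 | 0.0765 | 0.0301 |
| Manitoba | New Cancer Cases | ARIMA | 0.0092 | 0.0762 | 0.0960 | 0.0844 | 0.4569 |
| | | LSTM | 0.0047 | 0.0547 | 0.0687 | 0.0677 | 0.6866 |
| | | Hybrid | 0.0004 | 0.0169 | 0.0205 | 0.0199 | 0.9752 |
| | | Prophet | 0.0061 | 0.0583 | 0.0779 | 0.0661 | 0.5133 |
| | Cancer Incidence Rate | ARIMA | 0.0148 | 0.0910 | 0.1218 | 0.1202 | 0.1289 |
| | | LSTM | 0.0054 | 0.0564 | 0.0732 | 0.0598 | 0.2090 |
| | | Hybrid | 0.0015 | 0.0240 | 0.0390 | 0.0279 | 0.8844 |
| | | Prophet | 0.0089 | 0.0877 | 0.1174 | 0.0922 | −1.7234 |
| New Brunswick | New Cancer Cases | ARIMA | 0.0059 | 0.0705 | 0.0768 | 0.0835 | 0.3275 |
| | | LSTM | 0.0038 | 0.0493 | 0.0619 | 0.0613 | 0.5627 |
| | | Hybrid | 0.0018 | 0.0258 | 0.0427 | 0.0343 | 0.7917 |
| | | Prophet | 0.0048 | 0.0594 | 0.0694 | 0.0726 | 0.4504 |
| | Cancer Incidence Rate | ARIMA | 0.0060 | 0.0605 | 0.0776 | 0.0657 | 0.4920 |
| | | LSTM | 0.0023 | 0.0350 | 0.0480 | 0.0433 | 0.5112 |
| | | Hybrid | 0.0011 | 0.0252 | 0.0325 | 0.0281 | 0.7727 |
| | | Prophet | 0.0068 | 0.0741 | 0.0823 | 0.0851 | −1.1331 |
| Newfoundland and Labrador | New Cancer Cases | ARIMA | 0.0062 | 0.0634 | 0.0790 | 0.0748 | 0.2867 |
| | | LSTM | 0.0155 | 0.0778 | 0.1244 | 0.0997 | 0.6262 |
| | | Hybrid | 0.0003 | 0.0141 | 0.0169 | 0.0154 | 0.9763 |
| | | Prophet | 0.0122 | 0.0816 | 0.1105 | 0.0895 | −2.8241 |
| | Cancer Incidence Rate | ARIMA | 0.0060 | 0.0658 | 0.0776 | 0.0718 | 0.3345 |
| | | LSTM | 0.0119 | 0.0692 | 0.1093 | 0.0844 | 0.7782 |
| | | Hybrid | 0.0007 | 0.0211 | 0.0274 | 0.0222 | 0.9157 |
| | | Prophet | 0.0109 | 0.0766 | 0.1046 | 0.0819 | −0.3781 |
| Northwest Territories | New Cancer Cases | ARIMA | 0.0194 | 0.1343 | 0.1394 | 0.1676 | 0.1355 |
| | | LSTM | 0.0128 | 0.0908 | 0.1132 | 0.1170 | 0.6193 |
| | | Hybrid | 0.0052 | 0.0601 | 0.0721 | 0.0842 | 0.8061 |
| | | Prophet | 0.0252 | 0.1397 | 0.1589 | 0.1923 | 0.1366 |
| | Cancer Incidence Rate | ARIMA | 0.0200 | 0.0995 | 0.1414 | 0.1293 | 0.07207 |
| | | LSTM | 0.0164 | 0.0809 | 0.1279 | 0.1113 | 0.2239 |
| | | Hybrid | 0.0076 | 0.0672 | 0.0869 | 0.0922 | 0.5949 |
| | | Prophet | 0.0155 | 0.1101 | 0.1245 | 0.1375 | 0.3650 |
| Nova Scotia | New Cancer Cases | ARIMA | 0.0062 | 0.0634 | 0.0790 | 0.0748 | 0.2867 |
| | | LSTM | 0.0004 | 0.0163 | 0.0206 | 0.0119 | 0.7926 |
| | | Hybrid | 0.0002 | 0.0118 | 0.0139 | 0.0125 | 0.8699 |
| | | Prophet | 0.0091 | 0.0694 | 0.0951 | 0.0767 | 0.4746 |
| | Cancer Incidence Rate | ARIMA | 0.0054 | 0.0661 | 0.0737 | 0.0780 | 0.6996 |
| | | LSTM | 0.0087 | 0.0866 | 0.0939 | 0.0960 | 0.3972 |
| | | Hybrid | 0.0003 | 0.0145 | 0.0165 | 0.0164 | 0.9188 |
| | | Prophet | 0.0058 | 0.0641 | 0.0990 | 0.0735 | −1.7068 |
| Nunavut | New Cancer Cases | ARIMA | 0.0038 | 0.0379 | 0.0617 | 0.0826 | 0.3726 |
| | | LSTM | 0.0030 | 0.0431 | 0.0549 | 0.0744 | 0.9006 |
| | | Hybrid | 0.0022 | 0.0289 | 0.0469 | 0.0557 | 0.9356 |
| | | Prophet | 0.0309 | 0.1348 | 0.1757 | 0.2021 | 0.2435 |
| | Cancer Incidence Rate | ARIMA | 0.0010 | 0.0255 | 0.0260 | 0.0566 | 0.4159 |
| | | LSTM | 0.0041 | 0.0373 | 0.0640 | 0.0837 | 0.7505 |
| | | Hybrid | 0.0009 | 0.0233 | 0.0298 | 0.0422 | 0.9471 |
| | | Prophet | 0.0231 | 0.1117 | 0.1521 | 0.1067 | 0.1507 |
| Ontario | New Cancer Cases | ARIMA | 0.0081 | 0.0614 | 0.0901 | 0.0693 | 0.3200 |
| | | LSTM | 0.0043 | 0.0496 | 0.0658 | 0.0563 | 0.2001 |
| | | Hybrid | 0.0019 | 0.0283 | 0.0436 | 0.0329 | 0.6904 |
| | | Prophet | 0.0035 | 0.0517 | 0.0589 | 0.0570 | 0.1763 |
| | Cancer Incidence Rate | ARIMA | 0.0191 | 0.0910 | 0.1083 | 0.0936 | 0.55701 |
| | | LSTM | 0.0107 | 0.0631 | 0.1035 | 0.0840 | 0.4479 |
| | | Hybrid | 0.0021 | 0.0344 | 0.0459 | 0.0388 | 0.6580 |
| | | Prophet | 0.0128 | 0.1011 | 0.1212 | 0.1162 | −0.5615 |
| Prince Edward Island | New Cancer Cases | ARIMA | 0.0110 | 0.0776 | 0.1047 | 0.0879 | 0.3742 |
| | | LSTM | 0.0187 | 0.0693 | 0.1368 | 0.0784 | 0.1700 |
| | | Hybrid | 0.0036 | 0.0451 | 0.0604 | 0.0632 | 0.8212 |
| | | Prophet | 0.0227 | 0.1229 | 0.1507 | 0.1065 | 0.2397 |
| | Cancer Incidence Rate | ARIMA | 0.0059 | 0.0588 | 0.0767 | 0.0613 | 0.1394 |
| | | LSTM | 0.0039 | 0.0623 | 0.0626 | 0.0654 | 0.1945 |
| | | Hybrid | 0.0035 | 0.0469 | 0.0590 | 0.0532 | 0.2424 |
| | | Prophet | 0.0181 | 0.1059 | 0.1347 | 0.1042 | −2.7544 |
| Quebec | New Cancer Cases | ARIMA | 0.0084 | 0.0511 | 0.0919 | 0.0558 | 0.1893 |
| | | LSTM | 0.0020 | 0.0398 | 0.0444 | 0.0440 | 0.5558 |
| | | Hybrid | 0.0006 | 0.0242 | 0.0250 | 0.0260 | 0.5084 |
| | | Prophet | 0.0052 | 0.0488 | 0.0719 | 0.0522 | −0.6552 |
| | Cancer Incidence Rate | ARIMA | 0.0007 | 0.0250 | 0.0267 | 0.0271 | 0.2899 |
| | | LSTM | 0.0012 | 0.0353 | 0.0408 | 0.0996 | 0.6675 |
| | | Hybrid | 0.0005 | 0.0161 | 0.0231 | 0.0207 | 0.8603 |
| | | Prophet | 0.0112 | 0.0918 | 0.1061 | 0.1077 | −3.3572 |
| Saskatchewan | New Cancer Cases | ARIMA | 0.0115 | 0.0741 | 0.1075 | 0.0958 | 0.4375 |
| | | LSTM | 0.0111 | 0.0865 | 0.1052 | 0.1005 | 0.1892 |
| | | Hybrid | 0.0048 | 0.0486 | 0.0691 | 0.0607 | 0.3505 |
| | | Prophet | 0.0086 | 0.0688 | 0.0929 | 0.0845 | 0.2633 |
| | Cancer Incidence Rate | ARIMA | 0.0179 | 0.1095 | 0.1340 | 0.1652 | 0.1713 |
| | | LSTM | 0.0208 | 0.1249 | 0.1442 | 0.1869 | 0.3781 |
| | | Hybrid | 0.0122 | 0.0716 | 0.1103 | 0.0965 | 0.2076 |
| | | Prophet | 0.0143 | 0.0848 | 0.1194 | 0.1119 | −0.5023 |
| Yukon | New Cancer Cases | ARIMA | 0.0106 | 0.0862 | 0.1030 | 0.1074 | 0.5719 |
| | | LSTM | 0.0160 | 0.1135 | 0.1145 | 0.0911 | 0.8823 |
| | | Hybrid | 0.0026 | 0.0344 | 0.0513 | 0.0449 | 0.8793 |
| | | Prophet | 0.0137 | 0.1021 | 0.1169 | 0.1293 | 0.3063 |
| | Cancer Incidence Rate | ARIMA | 0.0205 | 0.1055 | 0.1432 | 0.1283 | 0.3311 |
| | | LSTM | 0.0044 | 0.0361 | 0.0664 | 0.0483 | 0.8559 |
| | | Hybrid | 0.0016 | 0.0311 | 0.0395 | 0.0451 | 0.9427 |
| | | Prophet | 0.0263 | 0.1244 | 0.1621 | 0.1606 | −1.1203 |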
Table 2. Error rates of age group categories for new cancer cases and cancer incidence rate.
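| Age Group | Category | Model | MSE | MAE | RMSE | MAPE | R² |
|---|---|---|---|---|---|---|---|
| 0 to 04 years | New Cancer Cases | ARIMA | 0.0186 | 0.1079 | 0.1362 | 0.1765 | 0.1522 |
| | | LSTM | 0.0198 | 0.1249 | 0.1407 | 0.2338 | 0.4500 |
| | | Hybrid | 0.0178 | 0.1267 | 0.1333 | 0.2349 | 0.1454 |
| | | Prophet | 0.1187 | 0.2956 | 0.3445 | 0.6277 | −6.2204 |
| | Cancer Incidence Rate | ARIMA | 0.0019 | 0.0343 | 0.0436 | 0.0956 | 0.2485 |
| | | LSTM | 0.0080 | 0.0689 | 0.0896 | 0.1696 | 0.1359 |
| | | Hybrid | 0.0003 | 0.0124 | 0.0169 | 0.0270 | 0.1520 |
| | | Prophet | 0.0961 | 0.2822 | 0.3100 | 0.2458 | −1.9227 |
| 05 to 09 years | New Cancer Cases | ARIMA | 0.0445 | 0.1901 | 0.2108 | 0.4117 | 0.1155 |
| | | LSTM | 0.0554 | 0.2000 | 0.2353 | 0.4075 | 0.0097 |
| | | Hybrid | 0.0237 | 0.1442 | 0.1541 | 0.3136 | 0.3751 |
| | | Prophet | 0.1057 | 0.2656 | 0.3251 | 0.4799 | −2.0613 |
| | Cancer Incidence Rate | ARIMA | 0.0020 | 0.0417 | 0.0453 | 0.0712 | 0.2856 |
| | | LSTM | 0.0018 | 0.0310 | 0.0420 | 0.0497 | 0.3977 |
| | | Hybrid | 0.0010 | 0.0280 | 0.0319 | 0.0344 | 0.1493 |
| | | Prophet | 0.0148 | 0.0859 | 0.1215 | 0.1345 | −1.4655 |
| 10 to 14 years | New Cancer Cases | ARIMA | 0.0112 | 0.0929 | 0.1056 | 0.1679 | 0.0206 |
| | | LSTM | 0.0140 | 0.1061 | 0.1183 | 0.1846 | 0.0688 |
| | | Hybrid | 0.0101 | 0.0845 | 0.1004 | 0.1504 | 0.0770 |
| | | Prophet | 0.0273 | 0.143 | 0.1652 | 0.1652 | −0.9431 |
| | Cancer Incidence Rate | ARIMA | 0.0058 | 0.0572 | 0.0761 | 0.1240 | 0.3283 |
| | | LSTM | 0.0079 | 0.0639 | 0.0890 | 0.1169 | 0.2946 |
| | | Hybrid | 0.0029 | 0.0456 | 0.0537 | 0.0632 | 0.6257 |
| | | Prophet | 0.0317 | 0.1619 | 0.1780 | 0.2154 | −1.0946 |
| 15 to 19 years | New Cancer Cases | ARIMA | 0.0257 | 0.1459 | 0.1571 | 0.1969 | 0.0586 |
| | | LSTM | 0.0270 | 0.1365 | 0.1642 | 0.1846 | 0.1824 |
| | | Hybrid | 0.0256 | 0.1341 | 0.1601 | 0.1792 | 0.2258 |
| | | Prophet | 0.0769 | 0.2243 | 0.2774 | 0.3919 | −0.6949 |
| | Cancer Incidence Rate | ARIMA | 0.0019 | 0.0428 | 0.0443 | 0.0536 | 0.2529 |
| | | LSTM | 0.0215 | 0.1133 | 0.1467 | 0.1521 | 0.2127 |
| | | Hybrid | 0.0012 | 0.0264 | 0.0348 | 0.0279 | 0.3815 |
| | | Prophet | 0.0293 | 0.1462 | 0.1711 | 0.1793 | −0.5151 |
| 20 to 24 years | New Cancer Cases | ARIMA | 0.0082 | 0.0584 | 0.0909 | 0.0682 | 0.2217 |
| | | LSTM | 0.0135 | 0.1071 | 0.1160 | 0.1310 | 0.5725 |
| | | Hybrid | 0.0018 | 0.0361 | 0.0420 | 0.0436 | 0.8290 |
| | | Prophet | 0.0256 | 0.1309 | 0.1601 | 0.1498 | −2.2898 |
| | Cancer Incidence Rate | ARIMA | 0.0045 | 0.0517 | 0.0675 | 0.0704 | 0.2005 |
| | | LSTM | 0.0147 | 0.1144 | 0.1214 | 0.1427 | 0.4443 |
| | | Hybrid | 0.0021 | 0.0352 | 0.0453 | 0.0392 | 0.4128 |
| | | Prophet | 0.0319 | 0.1595 | 0.1784 | 0.2025 | −1.7085 |
| 25 to 29 years | New Cancer Cases | ARIMA | 0.0091 | 0.0931 | 0.0999 | 0.1045 | 1.5091 |
| | | LSTM | 0.0045 | 0.0501 | 0.0672 | 0.0660 | 0.5181 |
| | | Hybrid | 0.0039 | 0.0465 | 0.0625 | 0.0620 | 0.6534 |
| | | Prophet | 0.0207 | 0.1293 | 0.1440 | 0.1597 | −1.8633 |
| | Cancer Incidence Rate | ARIMA | 0.0017 | 0.0407 | 0.0413 | 0.0447 | 0.6333 |
| | | LSTM | 0.0104 | 0.0715 | 0.1020 | 0.0866 | 0.3196 |
| | | Hybrid | 0.0002 | 0.0130 | 0.0143 | 0.0154 | 0.5169 |
| | | Prophet | 0.0103 | 0.0776 | 0.1016 | 0.0942 | −0.162 |
| 30 to 34 years | New Cancer Cases | ARIMA | 0.0212 | 0.1211 | 0.1457 | 0.1536 | 0.2017 |
| | | LSTM | 0.0134 | 0.0966 | 0.1159 | 0.1141 | 0.1338 |
| | | Hybrid | 0.0079 | 0.0721 | 0.0889 | 0.0849 | 0.1797 |
| | | Prophet | 0.3214 | 0.2738 | 0.3214 | 0.3220 | −1.5855 |
| | Cancer Incidence Rate | ARIMA | 0.0015 | 0.0367 | 0.0396 | 0.0447 | 0.2534 |
| | | LSTM | 0.0164 | 0.1075 | 0.1282 | 0.1244 | 0.4915 |
| | | Hybrid | 0.0004 | 0.0170 | 0.0205 | 0.0198 | 0.4139 |
| | | Prophet | 0.0185 | 0.1128 | 0.1360 | 0.1310 | −0.6678 |
| 35 to 39 years | New Cancer Cases | ARIMA | 0.0040 | 0.0504 | 0.0635 | 0.0698 | 0.8737 |
| | | LSTM | 0.0115 | 0.0984 | 0.1070 | 0.1378 | 0.5311 |
| | | Hybrid | 0.0039 | 0.0484 | 0.0627 | 0.0679 | 0.8771 |
| | | Prophet | 0.0052 | 0.0602 | 0.0724 | 0.0865 | 0.7172 |
| | Cancer Incidence Rate | ARIMA | 0.0029 | 0.0438 | 0.0543 | 0.0494 | 0.5215 |
| | | LSTM | 0.0114 | 0.1009 | 0.1065 | 0.1131 | 0.0491 |
| | | Hybrid | 0.0014 | 0.0282 | 0.0368 | 0.0310 | 0.2678 |
| | | Prophet | 0.0254 | 0.1307 | 0.1594 | 0.1395 | −0.3627 |
| 40 to 44 years | New Cancer Cases | ARIMA | 0.0077 | 0.1023 | 0.1329 | 0.1390 | 0.6226 |
| | | LSTM | 0.0036 | 0.0574 | 0.0598 | 0.0685 | 0.4477 |
| | | Hybrid | 0.0015 | 0.0331 | 0.0391 | 0.0371 | 0.5223 |
| | | Prophet | 0.0200 | 0.1269 | 0.1416 | 0.1641 | 0.1681 |
| | Cancer Incidence Rate | ARIMA | 0.0024 | 0.0466 | 0.0499 | 0.0540 | 0.1890 |
| | | LSTM | 0.0098 | 0.0893 | 0.0990 | 0.0981 | 0.1507 |
| | | Hybrid | 0.0001 | 0.0072 | 0.0105 | 0.0074 | 0.2190 |
| | | Prophet | 0.0112 | 0.0855 | 0.1058 | 0.1006 | −1.7236 |
| 45 to 49 years | New Cancer Cases | ARIMA | 0.0070 | 0.0592 | 0.0835 | 0.0989 | 0.8396 |
| | | LSTM | 0.0035 | 0.0545 | 0.0590 | 0.0900 | 0.2453 |
| | | Hybrid | 0.0020 | 0.0370 | 0.0450 | 0.0613 | 0.7002 |
| | | Prophet | 0.0554 | 0.1919 | 0.2355 | 0.1978 | −0.3463 |
| | Cancer Incidence Rate | ARIMA | 0.0067 | 0.0566 | 0.0823 | 0.1302 | 0.2436 |
| | | LSTM | 0.0122 | 0.0805 | 0.1106 | 0.1737 | 0.0910 |
| | | Hybrid | 0.0001 | 0.0084 | 0.0100 | 0.0174 | 0.9899 |
| | | Prophet | 0.0135 | 0.0998 | 0.1286 | 0.2087 | −1.6021 |
| 50 to 54 years | New Cancer Cases | ARIMA | 0.0053 | 0.0660 | 0.0727 | 0.0903 | 0.3711 |
| | | LSTM | 0.0055 | 0.0491 | 0.0742 | 0.0736 | 0.2727 |
| | | Hybrid | 0.0020 | 0.0328 | 0.0448 | 0.0444 | 0.8250 |
| | | Prophet | 0.1113 | 0.3037 | 0.3336 | 0.3778 | −0.7868 |
| | Cancer Incidence Rate | ARIMA | 0.0002 | 0.0137 | 0.0144 | 0.0355 | 0.4636 |
| | | LSTM | 0.0074 | 0.0578 | 0.0859 | 0.1351 | 0.2222 |
| | | Hybrid | 0.0001 | 0.0034 | 0.0039 | 0.0098 | 0.9734 |
| | | Prophet | 0.0320 | 0.1613 | 0.1790 | 0.1922 | −3.2580 |
| 55 to 59 years | New Cancer Cases | ARIMA | 0.0053 | 0.0542 | 0.0726 | 0.0574 | 0.6032 |
| | | LSTM | 0.0056 | 0.0504 | 0.0750 | 0.0578 | 1.2721 |
| | | Hybrid | 0.0023 | 0.0368 | 0.0475 | 0.0401 | 0.1101 |
| | | Prophet | 0.0230 | 0.1342 | 0.1518 | 0.1448 | −1.794 |
| | Cancer Incidence Rate | ARIMA | 0.0068 | 0.0542 | 0.0829 | 0.0341 | 0.4939 |
| | | LSTM | 0.0043 | 0.0483 | 0.0655 | 0.0897 | 0.6194 |
| | | Hybrid | 0.0002 | 0.0112 | 0.0150 | 0.0292 | 0.7621 |
| | | Prophet | 0.0122 | 0.0926 | 0.1106 | 0.1004 | −0.8536 |
| 60 to 64 years | New Cancer Cases | ARIMA | 0.0044 | 0.0595 | 0.0665 | 0.0642 | 0.7781 |
| | | LSTM | 0.0034 | 0.0463 | 0.0586 | 0.0510 | 0.2373 |
| | | Hybrid | 0.0023 | 0.0231 | 0.0481 | 0.0262 | 0.5316 |
| | | Prophet | 0.0026 | 0.0406 | 0.0513 | 0.0462 | 0.6347 |
| | Cancer Incidence Rate | ARIMA | 0.0048 | 0.0639 | 0.0739 | 0.0402 | 0.4163 |
| | | LSTM | 0.0093 | 0.0861 | 0.0963 | 0.0426 | 0.5478 |
| | | Hybrid | 0.0002 | 0.0142 | 0.0150 | 0.0182 | 0.5654 |
| | | Prophet | 0.0119 | 0.0987 | 0.1103 | 0.0743 | 0.2763 |
| 65 to 69 years | New Cancer Cases | ARIMA | 0.0083 | 0.0662 | 0.0914 | 0.0772 | 0.3804 |
| | | LSTM | 0.0071 | 0.0643 | 0.0840 | 0.0786 | 0.2028 |
| | | Hybrid | 0.0030 | 0.0473 | 0.0546 | 0.0546 | 0.6826 |
| | | Prophet | 0.0434 | 0.1977 | 0.2083 | 0.1373 | −7.8478 |
| | Cancer Incidence Rate | ARIMA | 0.0049 | 0.0701 | 0.0867 | 0.0423 | 0.7709 |
| | | LSTM | 0.0008 | 0.0226 | 0.0278 | 0.0313 | 0.3561 |
| | | Hybrid | 0.0001 | 0.0070 | 0.0083 | 0.0235 | 0.6739 |
| | | Prophet | 0.0069 | 0.0612 | 0.0845 | 0.0679 | 0.5764 |
| 70 to 74 years | New Cancer Cases | ARIMA | 0.0080 | 0.0806 | 0.0892 | 0.0931 | 0.2429 |
| | | LSTM | 0.0039 | 0.0520 | 0.0627 | 0.0634 | 0.8156 |
| | | Hybrid | 0.0031 | 0.0459 | 0.0559 | 0.0620 | 0.8845 |
| | | Prophet | 0.0579 | 0.2302 | 0.2409 | 0.1138 | −2.4798 |
| | Cancer Incidence Rate | ARIMA | 0.0001 | 0.0101 | 0.0116 | 0.0327 | 0.4240 |
| | | LSTM | 0.0045 | 0.0498 | 0.0672 | 0.1582 | −0.4997 |
| | | Hybrid | 0.0001 | 0.0086 | 0.0117 | 0.0275 | 0.2875 |
| | | Prophet | 0.0056 | 0.0725 | 0.0935 | 0.2287 | −1.1505 |
| 75 to 79 years | New Cancer Cases | ARIMA | 0.0035 | 0.0524 | 0.0592 | 0.0669 | 0.7772 |
| | | LSTM | 0.0061 | 0.0757 | 0.0778 | 0.0964 | 0.5426 |
| | | Hybrid | 0.0033 | 0.0512 | 0.0572 | 0.0584 | 0.5831 |
| | | Prophet | 0.0054 | 0.0633 | 0.0732 | 0.0903 | 0.4110 |
| | Cancer Incidence Rate | ARIMA | 0.0018 | 0.0218 | 0.0422 | 0.0563 | 0.0188 |
| | | LSTM | 0.0023 | 0.0326 | 0.0479 | 0.0835 | 0.3755 |
| | | Hybrid | 0.0010 | 0.0205 | 0.0316 | 0.0465 | 0.4501 |
| | | Prophet | 0.0125 | 0.0835 | 0.11201 | 0.0843 | 0.5386 |
| 80 to 84 years | New Cancer Cases | ARIMA | 0.0023 | 0.0442 | 0.0476 | 0.0462 | 0.9672 |
| | | LSTM | 0.0024 | 0.0407 | 0.0485 | 0.0428 | 0.7745 |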
Hybrid0.00070.01960.02610.02100.8385
Prophet0.02670.05000.06340.05750.1867
Cancer Incidence RateARIMA0.00130.03580.03630.06230.2378
LSTM0.00420.06200.06500.10960.5701
Hybrid0.00020.01260.01540.01300.6267
Prophet0.00760.06940.08690.11630.5425
85 to 89 yearsNew Cancer CasesARIMA0.00400.05240.06290.05380.2693
LSTM0.00230.03980.04800.04250.1433
Hybrid0.00190.03550.04320.03760.1592
Prophet0.00560.05680.07450.0601−1.6369
Cancer Incidence RateARIMA0.00930.09500.09630.20530.0619
LSTM0.01050.08840.10270.21350.2379
Hybrid0.00140.02860.03750.06500.2018
Prophet0.01920.11390.13860.12368.1164
90 years and overNew Cancer CasesARIMA0.00990.06710.09950.07374.7627
LSTM0.00940.07990.09690.08941.5906
Hybrid0.00320.04790.05640.05360.4497
Prophet0.01820.12570.13500.1387−2.9161
Cancer Incidence RateARIMA0.00110.02580.03380.05600.5235
LSTM0.00370.04500.06070.09560.4617
Hybrid0.00030.01550.01840.03270.6609
Prophet0.00760.07140.08740.0755−0.6405
Table 3. Error rates of sex-based categories for new cancer cases and cancer incidence rate.
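| Sex | Category | Model | MSE | MAE | RMSE | MAPE | R² |
|---|---|---|---|---|---|---|---|
| Males | New Cancer Cases | ARIMA | 0.0061 | 0.0506 | 0.0781 | 0.0552 | 0.0908 |
| | | LSTM | 0.0023 | 0.0219 | 0.0481 | 0.0244 | 0.5148 |
| | | Hybrid | 0.0029 | 0.0364 | 0.0537 | 0.0399 | 0.3304 |
| | | Prophet | 0.0031 | 0.0433 | 0.0557 | 0.0478 | 0.4621 |
| | Cancer Incidence Rate | ARIMA | 0.0069 | 0.0728 | 0.0833 | 0.0772 | 0.1768 |
| | | LSTM | 0.0022 | 0.0415 | 0.0465 | 0.0454 | 0.0470 |
| | | Hybrid | 0.0036 | 0.0491 | 0.0600 | 0.0530 | 0.0662 |
| | | Prophet | 0.0093 | 0.0827 | 0.0967 | 0.08276 | −1.8708 |
| Females | New Cancer Cases | ARIMA | 0.0055 | 0.0484 | 0.0740 | 0.0529 | 0.4091 |
| | | LSTM | 0.0019 | 0.0248 | 0.0436 | 0.0288 | 0.3378 |
| | | Hybrid | 0.0019 | 0.0259 | 0.0437 | 0.0292 | 0.3402 |
| | | Prophet | 0.0021 | 0.0338 | 0.0463 | 0.0377 | 0.5566 |
| | Cancer Incidence Rate | ARIMA | 0.0058 | 0.0714 | 0.0761 | 0.0714 | 0.1504 |
| | | LSTM | 0.0017 | 0.0324 | 0.0413 | 0.0329 | 0.2201 |
| | | Hybrid | 0.0045 | 0.0606 | 0.0672 | 0.0626 | 0.8291 |
| | | Prophet | 0.0065 | 0.0582 | 0.0805 | 0.0587 | −0.1023 |
| Both Sexes | New Cancer Cases | ARIMA | 0.0054 | 0.0445 | 0.0741 | 0.0489 | 0.5487 |
| | | LSTM | 0.0021 | 0.0226 | 0.0454 | 0.0258 | 0.4082 |
| | | Hybrid | 0.0024 | 0.0297 | 0.0493 | 0.0330 | 0.3410 |
| | | Prophet | 0.0032 | 0.0478 | 0.0562 | 0.0532 | 0.3917 |
| | Cancer Incidence Rate | ARIMA | 0.0036 | 0.0520 | 0.0603 | 0.0532 | 0.0001 |
| | | LSTM | 0.0013 | 0.0292 | 0.0366 | 0.0301 | 0.0165 |
| | | Hybrid | 0.0038 | 0.0562 | 0.0618 | 0.0592 | 0.3813 |
| | | Prophet | 0.0075 | 0.0679 | 0.0867 | 0.0703 | −0.6117 |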
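
The five error columns reported in the tables above (MSE, MAE, RMSE, MAPE, R²) have standard definitions. The following is a minimal sketch of how they can be computed for one forecast, not the authors' evaluation code. The function name `forecast_metrics` and the input values in the usage note are hypothetical, and MAPE is returned as a fraction rather than a percentage on the assumption that this matches the scale of the tabulated values.

```python
import math


def forecast_metrics(actual, predicted):
    """Return MSE, MAE, RMSE, MAPE (as a fraction) and R^2 for one forecast.

    Hypothetical helper for illustration. Assumes both sequences have the
    same non-zero length, no actual value is zero (needed for MAPE), and
    the actual values are not all identical (needed for R^2).
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and equal in length")

    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]

    mse = sum(e * e for e in errors) / n          # mean squared error
    mae = sum(abs(e) for e in errors) / n         # mean absolute error
    rmse = math.sqrt(mse)                         # root mean squared error
    mape = sum(abs(e / a) for e, a in zip(errors, actual)) / n

    # R^2 compares the forecast with a baseline that always predicts the mean.
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot

    return {"MSE": mse, "MAE": mae, "RMSE": rmse, "MAPE": mape, "R2": r2}
```

Calling `forecast_metrics([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])` gives a negative `R2`, because that forecast is worse than simply predicting the mean of the actual values. Under this definition a negative R² signals a forecast that underperforms the mean baseline, and R² is at most 1.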
Table 4. Predicted number of new cancer cases and cancer incidence rates by regions in Canada.
| Geography | Category | 2022 | 2023 | 2024 | 2025 | 2026 |
|---|---|---|---|---|---|---|
| Alberta | New Cancer Cases | 23,907 | 24,387 | 24,702 | 24,975 | 25,927 |
| | Cancer Incidence Rate | 443.8 | 462.2 | 461.7 | 433.7 | 458.9 |
| British Columbia | New Cancer Cases | 31,801 | 32,438 | 32,729 | 32,728 | 33,456 |
| | Cancer Incidence Rate | 531.3 | 539.5 | 540.6 | 537.9 | 545.7 |
| Manitoba | New Cancer Cases | 8181 | 8300 | 8374 | 8418 | 8530 |
| | Cancer Incidence Rate | 505.3 | 516.0 | 517.2 | 513.1 | 509.3 |
| New Brunswick | New Cancer Cases | 5959 | 5999 | 6033 | 6076 | 6187 |
| | Cancer Incidence Rate | 656.8 | 657.3 | 648.5 | 647.9 | 662.2 |
| Newfoundland and Labrador | New Cancer Cases | 4171 | 4116 | 4062 | 3976 | 4012 |
| | Cancer Incidence Rate | 693.4 | 690.7 | 678.8 | 650.3 | 596.1 |
| Northwest Territories | New Cancer Cases | 241 | 237 | 244 | 240 | 263 |
| | Cancer Incidence Rate | 389.0 | 400.4 | 396.4 | 375.7 | 341.8 |
| Nova Scotia | New Cancer Cases | 7467 | 7434 | 7445 | 7511 | 7562 |
| | Cancer Incidence Rate | 693.5 | 674.2 | 652.9 | 652.6 | 653.8 |
| Nunavut | New Cancer Cases | 84 | 79 | 77 | 79 | 81 |
| | Cancer Incidence Rate | 200.1 | 196.7 | 188.3 | 179.4 | 173.0 |
| Ontario | New Cancer Cases | 94,677 | 94,763 | 94,975 | 95,398 | 97,951 |
| | Cancer Incidence Rate | 582.2 | 573.6 | 555.0 | 545.0 | 543.7 |
| Prince Edward Island | New Cancer Cases | 1102 | 1111 | 1157 | 1161 | 1181 |
| | Cancer Incidence Rate | 620.0 | 620.8 | 626.4 | 632.8 | 636.9 |
| Quebec | New Cancer Cases | 63,803 | 63,841 | 63,932 | 64,073 | 64,264 |
| | Cancer Incidence Rate | 675.7 | 668.7 | 669.5 | 671.8 | 670.4 |
| Saskatchewan | New Cancer Cases | 6585 | 6636 | 6666 | 6624 | 6716 |
| | Cancer Incidence Rate | 491.8 | 495.5 | 496.5 | 494.9 | 493.7 |
| Yukon | New Cancer Cases | 199 | 198 | 188 | 183 | 188 |
| | Cancer Incidence Rate | 445.4 | 464.5 | 445.8 | 393.0 | 333.6 |
Table 5. Predicted number of new cancer cases and cancer incidence rate by age groups in Canada.
| Age Group | Category | 2022 | 2023 | 2024 | 2025 | 2026 |
|---|---|---|---|---|---|---|
| 0 to 4 years | New Cancer Cases | 453 | 454 | 452 | 450 | 450 |
| | Cancer Incidence Rate | 23.2 | 23.4 | 23.4 | 23.4 | 23.4 |
| 5 to 9 years | New Cancer Cases | 280 | 270 | 265 | 264 | 266 |
| | Cancer Incidence Rate | 13.8 | 13.9 | 13.8 | 13.8 | 13.8 |
| 10 to 14 years | New Cancer Cases | 286 | 290 | 291 | 286 | 290 |
| | Cancer Incidence Rate | 13.2 | 13.5 | 13.3 | 13.4 | 13.5 |
| 15 to 19 years | New Cancer Cases | 534 | 531 | 535 | 537 | 530 |
| | Cancer Incidence Rate | 24.4 | 24.1 | 24.1 | 24.1 | 24.2 |
| 20 to 24 years | New Cancer Cases | 937 | 929 | 936 | 930 | 935 |
| | Cancer Incidence Rate | 36.4 | 35.7 | 35.8 | 35.9 | 35.9 |
| 25 to 29 years | New Cancer Cases | 1605 | 1625 | 1633 | 1665 | 1715 |
| | Cancer Incidence Rate | 59.6 | 60.1 | 60.4 | 60.5 | 60.3 |
| 30 to 34 years | New Cancer Cases | 2514 | 2526 | 2536 | 2545 | 2555 |
| | Cancer Incidence Rate | 91.4 | 90.9 | 90.7 | 91.1 | 91.5 |
| 35 to 39 years | New Cancer Cases | 3960 | 4040 | 4073 | 4095 | 4179 |
| | Cancer Incidence Rate | 137.6 | 136.9 | 135.8 | 135.5 | 136.0 |
| 40 to 44 years | New Cancer Cases | 5660 | 5678 | 5705 | 5732 | 5808 |
| | Cancer Incidence Rate | 210.2 | 210.4 | 210.1 | 209.5 | 209.3 |
| 45 to 49 years | New Cancer Cases | 8595 | 8489 | 8379 | 8287 | 8234 |
| | Cancer Incidence Rate | 315.5 | 314.2 | 315.8 | 316.5 | 315.9 |
| 50 to 54 years | New Cancer Cases | 15,325 | 14,930 | 14,509 | 14,018 | 13,488 |
| | Cancer Incidence Rate | 509.6 | 510.0 | 510.6 | 510.9 | 510.9 |
| 55 to 59 years | New Cancer Cases | 22,931 | 22,845 | 22,740 | 22,576 | 22,228 |
| | Cancer Incidence Rate | 775.0 | 775.4 | 774.7 | 774.8 | 774.6 |
| 60 to 64 years | New Cancer Cases | 32,186 | 32,419 | 32,481 | 32,641 | 33,517 |
| | Cancer Incidence Rate | 1140.0 | 1144.1 | 1142.8 | 1139.6 | 1140.5 |
| 65 to 69 years | New Cancer Cases | 38,617 | 39,032 | 39,397 | 39,965 | 41,398 |
| | Cancer Incidence Rate | 1604.8 | 1608.2 | 1604.8 | 1603.0 | 1603.0 |
| 70 to 74 years | New Cancer Cases | 39,764 | 40,584 | 41,431 | 42,145 | 43,771 |
| | Cancer Incidence Rate | 2040.0 | 2037.9 | 2037.4 | 2033.9 | 2031.8 |
| 75 to 79 years | New Cancer Cases | 31,024 | 31,431 | 31,890 | 32,303 | 33,313 |
| | Cancer Incidence Rate | 2362.2 | 2369.2 | 2372.2 | 2372.2 | 2368.4 |
| 80 to 84 years | New Cancer Cases | 22,494 | 22,578 | 22,682 | 22,787 | 22,883 |
| | Cancer Incidence Rate | 2590.6 | 2591.1 | 2590.8 | 2588.9 | 2586.9 |
| 85 to 89 years | New Cancer Cases | 14,756 | 15,094 | 15,181 | 15,342 | 15,602 |
| | Cancer Incidence Rate | 2667.2 | 2671.3 | 2671.5 | 2672.3 | 2674.6 |
| 90 years and above | New Cancer Cases | 8185 | 8311 | 8432 | 8550 | 8685 |
| | Cancer Incidence Rate | 2446.3 | 2449.3 | 2444.8 | 2437.5 | 2440.8 |
Table 6. Predicted number of new cancer cases and cancer incidence rate by sex in Canada.
| Sex | Category | 2022 | 2023 | 2024 | 2025 | 2026 |
|---|---|---|---|---|---|---|
| Males | New Cancer Cases | 129,012 | 129,515 | 130,009 | 131,690 | 133,088 |
| | Cancer Incidence Rate | 600.6 | 598.9 | 599.7 | 600.8 | 602.3 |
| Females | New Cancer Cases | 119,514 | 120,428 | 121,119 | 121,614 | 123,904 |
| | Cancer Incidence Rate | 540.1 | 539.2 | 537.3 | 534.4 | 530.6 |
| Both Sexes | New Cancer Cases | 244,007 | 244,941 | 245,242 | 245,051 | 248,705 |
| | Cancer Incidence Rate | 567.6 | 566.3 | 565.3 | 563.8 | 561.8 |
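
The 2022 and 2026 columns of Table 6 can be turned into relative changes with simple arithmetic. The sketch below is illustrative only: the helper `percent_change` and the dictionary layout are assumptions introduced here, while the numbers are copied from Table 6.

```python
def percent_change(start, end):
    """Relative change from start to end, in percent (hypothetical helper)."""
    return (end - start) / start * 100.0


# (2022 value, 2026 value) pairs copied from Table 6.
table6 = {
    "Both sexes, new cancer cases": (244_007, 248_705),
    "Males, incidence rate": (600.6, 602.3),
    "Females, incidence rate": (540.1, 530.6),
}

for label, (value_2022, value_2026) in table6.items():
    change = percent_change(value_2022, value_2026)
    print(f"{label}: {change:+.2f}% from 2022 to 2026")
```

With these inputs the change in new cases for both sexes is a little under 2%, the male incidence rate rises by well under 1%, and the female incidence rate falls by a little under 2%.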
Kaviani, E.; Passi, K. Forecasting Cancer Incidence in Canada by Age, Sex, and Region Until 2026 Using Machine Learning Techniques. Algorithms 2025, 18, 265. https://doi.org/10.3390/a18050265
