Forecasting Cancer Incidence in Canada by Age, Sex, and Region Until 2026 Using Machine Learning Techniques

Kaviani, Ehsan; Passi, Kalpdrum

doi:10.3390/a18050265

School of Engineering and Computer Science, Laurentian University, Sudbury, ON P3E 2C6, Canada

Author to whom correspondence should be addressed: Kalpdrum Passi.

Algorithms 2025, 18(5), 265; https://doi.org/10.3390/a18050265

Submission received: 11 February 2025 / Revised: 5 April 2025 / Accepted: 30 April 2025 / Published: 4 May 2025

(This article belongs to the Special Issue Algorithms and Applications of Machine Learning Techniques for Healthcare)


Abstract

This study analyzes cancer trends in Canada using machine learning techniques to extract insights from extensive cancer data sourced from the Canadian Cancer Society and Statistics Canada. It aims to enhance the understanding of cancer epidemiology and inform better prevention, diagnosis, and treatment strategies. Data preprocessing addressed issues like missing values and normalization, ensuring reliability. The findings indicate a steady increase in new cancer cases, with estimates reaching 248,700 in 2026, up from 244,000 in 2022. Male incidence rates are projected to rise slightly to 602.3 per 100,000, while female rates may decline to 530.6. Regions such as Alberta, British Columbia, Ontario, and Quebec show rising incidence rates, contrasted by declines in Newfoundland and Labrador, Nunavut, and Yukon. Notably, this research reveals significant increases in cancer cases among individuals aged 60 and older, particularly those 70+. The hybrid ARIMA-LSTM model demonstrated superior forecasting accuracy compared with the other selected models. These findings offer valuable insights for health policymakers and highlight the potential of machine learning in public health forecasting, providing a framework for future research in other disease areas.

Keywords:

machine learning; cancer incidence; data analysis; forecasting; Canada

1. Introduction

Cancer is a multifaceted and devastating disease with a significant impact on individuals, communities, and economies. Its complex nature is due to genetic, environmental, and behavioral factors [1]. Despite substantial advances in research, cancer remains a global challenge, hampered by economic, political, and legislative barriers [2]. Advances in technology, such as cancer informatics and molecular genetics, have paved the way for new tools and methods for prevention, diagnosis, and treatment. However, the ethical and social implications of these developments, and the need for comprehensive global strategies to address the growing cancer burden, require careful consideration [3].

Machine learning (ML) techniques have revolutionized computing by enabling the extraction of patterns and knowledge from large and complex datasets [4]. These techniques, which include supervised, unsupervised, semi-supervised, and reinforcement learning, are particularly effective in solving big data problems [5]. The field has made significant progress, especially in deep learning, which can analyze and learn from vast amounts of real data [6]. As a result, ML is widely used across industries to obtain relevant information for analysis [7].

Advanced technologies such as mitochondrial, epigenomic, and metabolic profiling offer significant potential in cancer epidemiology, especially in identifying at-risk populations and treatment responses [8]. However, using electronic health record data in oncology presents challenges, including missing or unstructured data elements [9]. AI techniques, including machine learning and deep learning, have shown promise in predicting and detecting cancer, in some cases outperforming doctors [10]. The use of big data in cancer treatment is promising, but it is hindered by incomplete and fragmented data, a problem that integrating health systems and using AI can help address [11].

Recent studies have highlighted the potential of ML in cancer prediction and survival research [12,13]. These techniques transform healthcare by providing insights into patient care, operational efficiency, and cost reduction [14]. The application of ML in cancer research and care is particularly promising, with the potential to construct real-world data cohorts and improve predictive modeling [15]. However, concerns over patient data privacy and security, algorithmic bias, and the need for trained individuals to interpret results remain important considerations [14].

In Canada, where the landscape of cancer diagnoses is continually changing, it is critical to use state-of-the-art techniques to analyze incidence patterns. Canada’s diverse population, regional differences, and changing medical practices provide an exceptional environment for a thorough cancer data analysis. This research aims to use ML to identify current trends and predict future ones, delivering information that supports proactive healthcare management and policy creation.

This study’s backdrop essentially stems from the realization that cancer is a complex, multifaceted phenomenon with implications for society and the economy in addition to being a medical problem. Despite significant advancements in oncology, the dynamic nature of cancer trends and disparities across populations demands more adaptive and predictive analytical tools. Given these challenges, leveraging machine learning—a powerful tool for analyzing large-scale, complex datasets—offers promising new avenues to more accurately understand and forecast cancer trends. By integrating machine learning’s computational capabilities with the intricacies of cancer data, this research aims to uncover new insights that may lead to tailored treatment plans, more efficient interventions, and, ultimately, a significant decrease in cancer incidence in Canada and other countries. Additionally, the research emphasizes the need for collaborative efforts across disciplines, bringing together oncology, data science, and public health experts to address the intricate challenges cancer poses on a global scale.

2. Related Works

Cancer significantly affects the Canadian population and healthcare system. It is the leading cause of death in Canada [16,17,18,19]. An estimated 43% of Canadians are expected to receive a cancer diagnosis in their lifetime [16,20]. As the population grows and ages, the number of new cancer cases and deaths in Canada is also increasing [16,21]. Cancer is also a costly disease: the annual economic burden of cancer care in Canada rose from CAD 2.9 billion in 2005 to CAD 7.5 billion in 2012 [16,22].

Because cancer significantly impacts Canadian health and the economy, accurate and comprehensive surveillance data are critical for determining progress and allocating resources accordingly. The Canadian Advisory Committee on Cancer Statistics collaborates with the Canadian Cancer Society, Statistics Canada, and the Public Health Agency of Canada to produce the latest statistics on cancer surveillance in Canada [16].

Cancer data can take several years to catch up to the present due to the lengthy process of collecting, verifying, and analyzing the information. However, statistical models can project short-term incidence and mortality rates by extrapolating past trends. This provides a more current understanding of the cancer landscape in Canada, which is crucial for resource planning, research, and informing cancer control programs. Canadian Cancer Statistics 2021 offers detailed estimates of cancer incidence, mortality, and survival in Canada for 22 cancer types, broken down by age, sex, geographic region, and over time. Brenner et al. [16,17,18] also provided updated estimates for 2020, 2022, and 2024 for new cancer cases and deaths expected for all ages, broken down by sex, province, and territory.

To obtain the latest cancer incidence and mortality estimates, these studies used the CANPROJ projection R package to project counts and rates until 2024. CANPROJ uses trends in observed data to determine the most suitable model for forthcoming years through a decision algorithm that selects among a range of age-based, period-based, and cohort-based models [16,17,18,23].

According to Brenner [16], cancer case data from Quebec, starting from 2011, were unavailable as they had not yet been submitted to the Canadian Cancer Registry. Because only data up to 2010 were accessible for Quebec, estimates for cancer cases and incidence rates from 2011 to 2022 were generated by initially applying the cancer rates of Canada (excluding Quebec) to Quebec’s population. Based on the average rate for the rest of the country, adjusted Quebec rates were then modified using the ratio of sex- and age-specific cancer estimates for Quebec relative to the rest of Canada from 2006 to 2010. Additional correction factors were incorporated, considering provisional 2011 counts for cancers that are typically underreported, such as prostate and melanoma. These projections provided estimates for 22 cancer types, categorized by sex assigned at birth and geographic region (provinces and territories). The national estimates for Canada were derived by summing the individual projections for each province and territory. All incidence and mortality rates were age-standardized to the 2011 Canadian standard population using the direct method.

Brenner’s study [16] utilized historical cancer data from the National Cancer Reporting System (1984–1991) and the Canadian Cancer Registry (1992–2018), as well as mortality data from the Vital Statistics Canada Death Database (1984–2019). It applied the CANPROJ projection package, which uses trends in historical data to select the most appropriate model to predict future cancer incidence and mortality. This approach used a decision algorithm that selects from a series of six age-period-cohort (APC) models, with the aim of providing the most accurate projections up to 2022. For the province of Quebec, particular adjustments were made to the incidence estimates due to missing data after 2010.

While Brenner’s studies do not report numerical values for the accuracy of their projections, the methodology relies on historical trends and demographic data to make informed predictions. The projections estimated 225,800 new cancer cases and 83,300 cancer deaths in 2020, rising to 233,900 new cases and 85,100 deaths in 2022. Projections for 2024 indicate a further rise, with 247,100 new cancer cases and 88,100 cancer deaths expected [16,17,18].

While Brenner’s studies offered valuable cancer projections using the CANPROJ statistical framework, their approach primarily relies on historical trend extrapolation using age-period-cohort models. These models assume a consistent pattern in past data and select the best-fit trend-based model for projection. However, they do not incorporate modern performance evaluation metrics (e.g., MAE, RMSE, R²) and are limited in handling incomplete, noisy, or complex datasets.

In contrast, our study applies machine learning models—such as LSTM, ARIMA, Prophet, and a hybrid ARIMA-LSTM—which not only provide greater flexibility in modeling both linear and nonlinear relationships but also allow for rigorous evaluation through quantitative performance metrics. ML models are also better suited for large-scale healthcare data where variability and missing values are common. This methodological distinction enables our research to offer more accurate and adaptable forecasts for cancer incidence in Canada.

3. Materials and Methods

3.1. Research Design

This research employs a combination of quantitative methodologies to analyze cancer incidence trends in Canada, utilizing advanced machine learning models. By applying LSTM, Prophet, ARIMA, and hybrid ARIMA-LSTM models, the research aims to predict future cancer incidence rates and the number of new cases. This comprehensive approach allows for a deep understanding of cancer trends, contributing to better healthcare planning and policymaking. The study employs a longitudinal design, analyzing time series data of cancer incidence rates in Canada over multiple years. This approach facilitates the identification of trends and patterns in cancer rates, allowing for accurate predictions. Using multiple predictive models ensures the robustness and reliability of the forecasts, catering to the nonlinear and complex nature of the data. The study’s predictive nature addresses a significant gap in current knowledge, offering insights into potential future cancer rates and enabling proactive measures in public health strategies. Figure 1 shows the research process.

The dataset was divided into training (80%) and testing (20%) subsets to ensure robust model evaluation. This partitioning strategy ensures that the models are trained on historical cancer incidence data while reserving a separate portion for performance evaluation.

The models were trained using historical data (1992–2016) and validated with a test set (2017–2021). Once validated and finalized, each model (ARIMA, LSTM, Prophet, hybrid ARIMA-LSTM) was then used to forecast the number of new cancer cases and incidence rates sequentially for future years (2022 to 2026).
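As a minimal sketch of the chronological split described above, the following Python snippet separates a yearly series at 2016 without shuffling. The function name and the placeholder values are illustrative assumptions and are not taken from the study's code or data.

```python
def chronological_split(years, values, last_train_year):
    """Split a yearly series into train/test parts without shuffling."""
    pairs = list(zip(years, values))
    train = [(y, v) for y, v in pairs if last_train_year >= y]
    test = [(y, v) for y, v in pairs if y > last_train_year]
    return train, test


years = list(range(1992, 2022))  # 1992..2021, 30 annual observations
# Placeholder values only; these are NOT real incidence figures.
values = [100.0 + 2.0 * i for i in range(len(years))]

train, test = chronological_split(years, values, last_train_year=2016)
print(len(train), len(test))
```

With 30 annual observations, this yields 25 training years and 5 test years, which is roughly the 80%/20% partition stated above.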

The study implemented a validation procedure for the forecasting models based on time series cross-validation. Because the data are sequential, traditional random cross-validation is not appropriate; instead, a 10-fold time series cross-validation was used. The cutoff point between training and validation data is moved through the time series and the procedure repeated, so that each model’s performance is tested across different periods and conditions.
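The moving-cutoff idea can be sketched as follows. The paper does not specify how the folds were constructed, so the expanding training window, the one-step validation horizon, and the function name `rolling_origin_folds` are all assumptions made for illustration.

```python
def rolling_origin_folds(n_obs, n_folds, min_train):
    """Yield (train_indices, test_indices) with an expanding training window.

    Each fold trains on observations [0, cutoff) and validates on the single
    observation at index `cutoff`; the cutoff then moves forward by one step.
    """
    first_cutoff = n_obs - n_folds
    if min_train > first_cutoff:
        raise ValueError("not enough observations for the requested folds")
    for cutoff in range(first_cutoff, n_obs):
        yield list(range(cutoff)), [cutoff]


# 25 training years (1992-2016), 10 folds, at least 10 years per training window.
folds = list(rolling_origin_folds(n_obs=25, n_folds=10, min_train=10))
for train_idx, test_idx in folds[:2]:
    print(len(train_idx), test_idx)
```

Every validation point lies strictly after its training window, which is the property that random K-fold splitting would violate.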

Evaluation metrics such as mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE), and R-squared (R²) were calculated on the test set to assess prediction accuracy.

3.2. Data Source

This study obtained data on cancer incidence from the Canadian Cancer Registry Tabulation Master File, which Statistics Canada released on 31 January 2024. The table includes cancer cases diagnosed from 1992 to 2021 and is available by cancer type, region, age group, and sex.

The Canadian Cancer Registry (CCR) is a population-based registry with data collected and reported to Statistics Canada by each provincial/territorial cancer registry. This person-based system aims to collect information about each new primary cancer diagnosed among Canadian residents since 1992 [24]. Cancer incidence refers to the number of new cancers in a population during a specific period (usually a year). It is generally expressed as the number of new cancer cases per 100,000 population. Data presented in [25] were age-adjusted using the 2011 Canadian standard population to ensure accuracy and consistency [24].
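To illustrate what age adjustment to a standard population means, the sketch below computes a directly standardized rate as a weighted sum of age-specific rates. The three age bands, counts, and weights are hypothetical values chosen for easy arithmetic; they are not the 2011 Canadian standard population or registry data.

```python
def age_standardized_rate(cases, population, standard_weights, per=100_000):
    """Directly standardized rate: weighted sum of age-specific rates."""
    total_weight = sum(standard_weights)
    rate = 0.0
    for c, p, w in zip(cases, population, standard_weights):
        rate += (w / total_weight) * (c / p)  # weight x age-specific rate
    return rate * per


# Hypothetical three-band example (illustrative numbers only).
cases = [50, 400, 1500]
population = [500_000, 400_000, 100_000]
standard_weights = [0.50, 0.35, 0.15]

crude = sum(cases) / sum(population) * 100_000
asr = age_standardized_rate(cases, population, standard_weights)
print(round(crude, 1), round(asr, 1))
```

In this toy example the crude rate is 195 per 100,000, while the standardized rate is 265 per 100,000, because the hypothetical standard population gives more weight to the oldest band than the observed population does.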

3.3. Data Preprocessing

Prior to model training and evaluation, several preprocessing steps were applied to ensure data quality and model compatibility:

3.3.1. Handling Missing Data

Quebec’s cancer incidence data were unavailable after 2017 due to incomplete submissions to the Canadian Cancer Registry. To address this gap, Quebec’s missing incidence data from 2018 to 2022 were imputed using a multiple imputation strategy based on available historical trends and region-specific cancer patterns. Minor missing values in other provinces, amounting to less than 3% of the data, were also handled using multiple imputation.

3.3.2. Normalization

To ensure consistency across features and models, all incidence rate and case count values were normalized using min-max scaling to a range between 0 and 1. This was especially important for neural network models (e.g., LSTM), which are sensitive to the scale of input features. Normalization helped stabilize the training process and improve model convergence.
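A minimal sketch of min-max scaling and its inverse is given below. Fitting the minimum and maximum on the training portion only is an assumption made here to avoid leaking test information; the paper does not state which portion was used. All function names and values are illustrative.

```python
def fit_min_max(values):
    """Return the (min, max) pair used to scale a series to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("a constant series cannot be min-max scaled")
    return lo, hi


def scale(values, lo, hi):
    return [(v - lo) / (hi - lo) for v in values]


def unscale(scaled, lo, hi):
    """Map model outputs back to the original units."""
    return [s * (hi - lo) + lo for s in scaled]


train = [520.0, 535.0, 548.0, 560.0]  # placeholder rates, not real data
test = [566.0]

lo, hi = fit_min_max(train)
train_scaled = scale(train, lo, hi)
test_scaled = scale(test, lo, hi)  # may fall outside [0, 1]
restored = unscale(train_scaled, lo, hi)
```

Because the series is trending upward, later values can scale to slightly above 1 when the bounds come from the training years alone; forecasts are mapped back to the original units with the inverse transform.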

3.3.3. Temporal Indexing

Dates were converted into a standardized time index to ensure uniform time series input formats across all models, especially for ARIMA and Prophet, which require a consistent datetime structure.

3.3.4. Stationarity Checks

For models like ARIMA and ARIMA-LSTM, stationarity was assessed using the Augmented Dickey–Fuller test. Non-stationary series were differenced as required to satisfy model assumptions.
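The differencing step can be sketched in a few lines. The stationarity test itself is not implemented here; the snippet only shows, on synthetic series, how first and second differences remove a linear and a quadratic trend and how a differenced series is integrated back. The helper names are illustrative.

```python
def difference(series, d=1):
    """Apply d rounds of first differencing."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def undifference_once(first_value, diffs):
    """Invert one round of differencing given the first original value."""
    out = [first_value]
    for delta in diffs:
        out.append(out[-1] + delta)
    return out


linear = [100 + 5 * t for t in range(6)]  # synthetic linear trend
quadratic = [t * t for t in range(6)]     # synthetic quadratic trend

d1 = difference(linear)           # constant series: trend removed
d2 = difference(quadratic, d=2)   # constant after two differences
rebuilt = undifference_once(linear[0], d1)
```

The number of differencing rounds corresponds to the ARIMA parameter d.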

These preprocessing steps were essential for maintaining data integrity and enabling fair comparison across diverse model architectures.

3.4. Model Selection

To investigate the trends in cancer incidence in this study, four popular machine learning algorithms and statistical techniques for time series prediction were selected based on their proven track record in time series forecasting: Prophet, long short-term memory (LSTM) networks, autoregressive integrated moving average (ARIMA), and the hybrid ARIMA-LSTM model.

3.4.1. Long Short-Term Memory (LSTM)

Introduced by Hochreiter and Schmidhuber in 1997 [26], LSTM networks are a variant of RNNs that overcome the shortcomings of standard RNNs. These disadvantages include poor performance in handling long-term dependencies and the vanishing or exploding gradient problem. In 1999, a forget gate was added to the LSTM so that the cell memory could be reset; this improved the original structure and became the standard structure for LSTM networks. Unlike deep feedforward neural networks (DFNN), LSTMs contain feedback connections and can process data sequences, not just individual data points such as vectors or arrays [27].

In LSTM networks, the fundamental building block is called a memory block or LSTM unit. Composed of a cell that acts as the memory component and three gates (input, output, forget/keep), these units can retain information over arbitrary periods. The gates of the LSTM unit are responsible for regulating the flow of data through the cell. One of the most prominent features of the LSTM cell is the “constant error carousel” (CEC). An LSTM network is structured similarly to an RNN, except the hidden layers comprise memory blocks instead of neurons [27].

Input gate: The unit features a sigmoidal function that regulates the inflow of data into the cell. It obtains activation from the previous output h^(t−1) and the current input x^(t). By means of the sigmoid function, an input gate produces values ranging from zero to one. A value of zero acts as a complete blockage of information, while a value of one permits the passage of all information. Equation (1) shows this process [27].

i^{t} = \sigma (W^{(ix)} x^{(t)} + U^{(ih)} h^{(t - 1)} + b^{i})

(1)

Cell input layer: The cell input is computed in the same way as the input gate, from the previous hidden state h^(t−1) and the current input x^(t). However, a “tanh” activation function is used to squash the values to a range between −1 and 1. The result is denoted by l^t in Equation (2) [27].

l^{t} = \tanh (W^{(lx)} x^{(t)} + U^{(lh)} h^{(t - 1)} + b^{l})

(2)

Forget gate: A unit using a sigmoidal function decides what information from previous steps of the cell should be remembered or discarded. The forget gate takes input from h^(t−1) and x^(t) and takes values between zero and one. Its output is combined through a Hadamard product with the old cell state c^(t−1) when the cell state is updated to c^t (Equation (4)). If the forget gate has a value of zero, it is closed and the information in the old cell state c^(t−1) is completely forgotten; a value of one retains all of it. Thus, the forget gate can reset the cell state if the old data are deemed irrelevant. Equation (3) summarizes the forget gate mechanism [27].

f^{t} = \sigma (W^{(fx)} x^{(t)} + U^{(fh)} h^{(t - 1)} + b^{f})

(3)

Cell state: The cell state stores the cell’s memory over an extended period. Each cell contains a self-connected linear unit known as a constant error carousel (CEC), whose recurrent operation prevents the vanishing or exploding gradient problem in an LSTM network. The forget gate regulates the CEC and resets it as needed. At time t, the current cell state c^t is formed from the previous cell state c^(t−1), scaled by the forget gate, plus the product of the input gate and the cell input (i^t ∘ l^t). Equation (4) summarizes the overall update of the cell state [27].

c^{t} = f^{t} \circ c^{t - 1} + i^{t} \circ l^{t}

(4)

Output gate: A unit with a sigmoidal function regulates the flow of information out of the cell. The output gate value at a given time, o^t, scales the current cell state c^t after it has been passed through a “tanh” function, producing the output vector h^(t). Equations (5) and (6) show the mechanism of the output gate [27].

o^{t} = \sigma (W^{(ox)} x^{(t)} + U^{(oh)} h^{(t - 1)} + b^{o})

(5)

h^{t} = o^{t} \circ \tanh (c^{t})

(6)
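Equations (1) to (6) can be traced in a short sketch of a single LSTM step. For readability it uses scalar inputs and states (a hidden size of one) and the standard library only; a practical implementation would use matrices and a deep learning framework. The function name, the parameter layout, and the numeric values are illustrative assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One scalar LSTM step following Equations (1)-(6).

    `params` maps each of 'i', 'l', 'f', 'o' to a (W, U, b) tuple.
    """
    def pre_activation(name):
        w, u, b = params[name]
        return w * x_t + u * h_prev + b

    i_t = sigmoid(pre_activation("i"))    # Eq. (1): input gate
    l_t = math.tanh(pre_activation("l"))  # Eq. (2): cell input
    f_t = sigmoid(pre_activation("f"))    # Eq. (3): forget gate
    c_t = f_t * c_prev + i_t * l_t        # Eq. (4): cell state update
    o_t = sigmoid(pre_activation("o"))    # Eq. (5): output gate
    h_t = o_t * math.tanh(c_t)            # Eq. (6): hidden output
    return h_t, c_t


# Forget gate pushed towards 1 and input gate towards 0 via large biases:
# the old cell state is carried over almost unchanged.
keep = {"i": (0.0, 0.0, -50.0), "l": (1.0, 0.0, 0.0),
        "f": (0.0, 0.0, 50.0), "o": (0.0, 0.0, 50.0)}
h_keep, c_keep = lstm_step(x_t=0.3, h_prev=0.0, c_prev=0.8, params=keep)

# Forget gate pushed towards 0: the old cell state is discarded.
reset = dict(keep, f=(0.0, 0.0, -50.0))
h_reset, c_reset = lstm_step(x_t=0.3, h_prev=0.0, c_prev=0.8, params=reset)
```

The two parameter sets show the gating behavior described above: with the forget gate near one the cell state stays at about 0.8, and with it near zero the state is reset to about zero.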

3.4.2. Facebook Prophet Forecasting Model

Facebook’s Prophet is a forecasting tool for time series data that captures patterns at different time scales. Forecasts are derived from an additive model in which nonlinear trends are combined with yearly, weekly, and daily seasonality as well as holiday effects. The tool works best on series with stable seasonal effects and several seasons of historical data. Prophet handles missing data and trend changes and usually copes well with outliers. It requires very little computation time, typically generating predictions in seconds, with accuracy comparable to that of other time series forecasting approaches. It can produce reasonable forecasts even from incomplete or messy data without manual effort. In addition, Prophet supports “human-scale” seasonality, such as day-of-week and time-of-year effects [28,29].

Released by Facebook in 2017 for Python and R, Prophet models time series with trends, seasonality, and holidays. Fitting the model with tunable parameters takes a few seconds, and the model is represented by Equation (7) [30]:

y(t) = g(t) + s(t) + h(t) + \epsilon_{t}

(7)

Equation (7) gives the prediction formula for the Prophet forecasting model: the predicted outcome y(t) is the sum of the trend g(t), which is either linear or logistic; the seasonality s(t) for the chosen period, such as yearly, monthly, or daily; the holiday-related anomalies h(t); and the unforeseen error ϵ_t. The model has multiple parameters that can be tuned to improve forecasting accuracy. Depending on the intended use, the trend can be linear or logistic. Linear models normalize outliers and do not impose a maximum or minimum threshold. Conversely, logistic models are suitable for saturating forecasts that require defining the highest and lowest values [30].
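The difference between the linear and logistic trend options can be illustrated with a toy version of the additive decomposition. This is not Prophet's implementation: the basic logistic curve, the single sine term standing in for the seasonal component, and all parameter values are simplifying assumptions.

```python
import math


def linear_trend(t, k, m):
    """Unbounded trend with growth rate k and offset m."""
    return k * t + m


def logistic_trend(t, capacity, k, m):
    """Saturating trend that approaches `capacity` as t grows."""
    return capacity / (1.0 + math.exp(-k * (t - m)))


def seasonal_term(t, amplitude, period):
    """Single-harmonic stand-in for the seasonal component s(t)."""
    return amplitude * math.sin(2.0 * math.pi * t / period)


def additive_forecast(trend, seasonal=0.0, holiday=0.0):
    """Point forecast g(t) + s(t) + h(t); the error term has mean zero."""
    return trend + seasonal + holiday


midpoint = logistic_trend(5.0, capacity=100.0, k=1.0, m=5.0)
far_future = logistic_trend(50.0, capacity=100.0, k=1.0, m=5.0)
linear_far = linear_trend(50.0, k=3.0, m=10.0)
y_hat = additive_forecast(midpoint, seasonal_term(0.25, amplitude=2.0, period=1.0))
```

The logistic trend is half the capacity at its midpoint and levels off near the capacity, whereas the linear trend keeps growing without bound.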

3.4.3. Autoregressive Integrated Moving Average (ARIMA)

ARIMA, a classic statistical model, uses patterns such as trends and seasonality to predict future values in a series. It generalizes the ARMA model to handle non-stationary time series. The ARMA model assumes that the analyzed time series is stationary, so a non-stationary series must first be transformed to remove trends and seasonality through finite differencing. A stationary time series is a combination of signal and noise. The ARIMA model separates the signal from the noise and produces a forecast for a later time point. As indicated by the method’s acronym, its structural components are the following [31,32]:

AR for autoregression: a regression model that uses the dependence relationship between an observation and several lagged observations (model parameter p).

I for integration: calculating the differences between observations at different time points (model parameter d), aiming to make the time series stationary.

MA for moving average: this approach considers the dependence that may exist between observations and the error terms created when a moving average model is used on observations that have a time lag (model parameter q) [33].

One way to represent an AR model of order p, or AR(p), is as a linear process, as shown in Equation (8). The stationary variable is denoted by x and the constant by c. The autoregressive coefficients at lags 1, 2, …, p are denoted by φ, while the residuals ε form a Gaussian white noise series with a mean of zero and a variance of σ² [34].

x_{t} = c + \sum_{i = 1}^{p} \phi_{i} x_{t - i} + \varepsilon_{t}

(8)

Equation (9) represents an MA model of order q, or MA(q), wherein the θ terms denote the weights given to the current and previous values of a stochastic term in the time series. Here, μ is the expectation of x and is generally assumed to be zero, while θ_0 equals one. We consider ε to be a Gaussian white noise series with a mean of zero and a variance of σ² [34].

x_{t} = \mu + \sum_{i = 0}^{q} \theta_{i} \varepsilon_{t - i}

(9)

Equation (10) shows how these two models are combined to create an ARMA model of order (p, q), where φ_p ≠ 0, θ_q ≠ 0, and σ² > 0. The parameters p and q are the AR and MA orders, respectively. ARIMA forecasting, also known as Box–Jenkins forecasting, can handle non-stationary time series data due to its “integration” step. This step differences the time series, converting a non-stationary series into a stationary one [34].

x_{t} = c + \sum_{i = 1}^{p} \phi_{i} x_{t - i} + \varepsilon_{t} + \sum_{i = 1}^{q} \theta_{i} \varepsilon_{t - i}

(10)
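As a worked illustration of the ARMA form, the sketch below computes a one-step-ahead point forecast from given coefficients. Parameter estimation is not shown, and the coefficients and series values are made up for the example; in practice they would be estimated by a statistics library.

```python
def arma_one_step(history, residuals, c, phi, theta):
    """One-step-ahead ARMA(p, q) point forecast.

    history   : past observations, most recent last (length >= p)
    residuals : past one-step forecast errors, most recent last (length >= q)
    phi       : [phi_1, ..., phi_p], weights on lagged observations
    theta     : [theta_1, ..., theta_q], weights on lagged errors
    The shock at the forecast time has mean zero, so it drops out.
    """
    ar_part = sum(p_i * history[-i] for i, p_i in enumerate(phi, start=1))
    ma_part = sum(t_i * residuals[-i] for i, t_i in enumerate(theta, start=1))
    return c + ar_part + ma_part


# Illustrative ARMA(2, 1) with made-up coefficients.
forecast = arma_one_step(
    history=[10.0, 12.0],
    residuals=[0.5],
    c=1.0,
    phi=[0.5, 0.2],
    theta=[0.4],
)
print(forecast)
```

Here the forecast is 1 + (0.5 × 12 + 0.2 × 10) + 0.4 × 0.5 = 9.2. For an ARIMA model with d = 1, the same calculation would be applied to the differenced series and the result added to the last observed level.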

3.4.4. ARIMA-LSTM Hybrid Model

The hybrid ARIMA-LSTM model is designed to capture both the linear and nonlinear aspects of time series data. To apply the ARIMA model, the time series must be stationary. Stationarity is checked with the Dickey–Fuller test, and the series is differenced if it is not already stationary. The optimal parameters are then found to build the model, and predictions are made using the fitted model. LSTM works well for the non-stationary parts of the data and has a relatively large memory. The residuals obtained from ARIMA are fed into an LSTM model, which is trained to capture their pattern and predict the residuals for the next period [35].

These two models have been chosen for their ability to break down a time series into linear and nonlinear trends, as expressed by Equation (11). In this equation, L_t illustrates the linear component of the time series at time step t. Also, N_t defines the nonlinear component, and ɛ_t represents the error term in the x_t time series [36].

x_t = L_t + N_t + ɛ_t

(11)
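The residual-based structure of the hybrid can be sketched with simple stand-ins. In the snippet below, a least-squares line replaces ARIMA as the linear component and a user-supplied function replaces the LSTM as the residual model; both substitutions are assumptions made only to keep the example dependency-free, and the names are hypothetical.

```python
def fit_line(y):
    """Ordinary least squares line through (0, y[0]), (1, y[1]), ..."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sxy / sxx
    return slope, y_mean - slope * t_mean


def hybrid_forecast(y, residual_model):
    """Next-step forecast as linear part plus predicted residual."""
    slope, intercept = fit_line(y)                  # linear stand-in (L_t)
    fitted = [intercept + slope * t for t in range(len(y))]
    residuals = [v - f for v, f in zip(y, fitted)]  # what the linear part missed
    linear_next = intercept + slope * len(y)
    nonlinear_next = residual_model(residuals)      # stand-in for the LSTM (N_t)
    return linear_next + nonlinear_next, residuals


def last_residual(residuals):
    """Naive residual model: repeat the most recent residual."""
    return residuals[-1]


series = [3.0, 5.0, 7.0, 9.0, 11.0]  # synthetic, perfectly linear
forecast, residuals = hybrid_forecast(series, last_residual)
print(forecast)
```

On a perfectly linear series the residuals vanish and the hybrid forecast reduces to the linear forecast; any structure left in the residuals is what the second-stage model is meant to learn.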

Different types of algorithms were chosen to cover the wide range of behavior found in the data on new case counts and incidence rates. LSTM networks provide flexibility in capturing long-term dependencies and complex patterns in the data. As a time series analysis model, Prophet can deal with outliers, missing data, and shifting trends, which is beneficial given the unpredictable nature of healthcare information such as cancer-related datasets. As a statistical model, ARIMA is strong at examining and predicting linear sequences and provides a baseline against which the more complex models can be compared.

3.5. Evaluation Metrics

To comprehensively assess the models deployed to forecast cancer incidence trends in Canada, including LSTM, Prophet, and ARIMA, this study selected a suite of evaluation metrics that best represent the models’ forecasting accuracy, reliability, and applicability in a real-world context.

Different criteria, such as forecast error measures, speed of calculation, and interpretability, have been used to assess forecasting quality. Forecast error measures, or forecast accuracy, are the most important in solving practical problems. They are typically applied to estimate the quality of forecasting methods and to choose the best forecasting mechanism for multiple objects. Despite their drawbacks, a set of “traditional” error measures is applied by default in every domain. This section analyzes common forecast error measures used in forecasting. Measures are grouped according to how they are calculated and the value of the error at a specific time t. For each error measure, the name and the formula for calculating it are given [37].

3.5.1. Mean Absolute Error (MAE)

MAE measures the average magnitude of absolute errors between actual and predicted values. Lower MAE values indicate better accuracy. It is widely used in forecasting tasks because it provides an intuitive measure of prediction error in the same unit as the data.

The mean absolute error is given by [37]:

\mathrm{MAE} = \frac{1}{n} \sum_{i = 1}^{n} |F_{i} - A_{i}|

where n is the number of observations, F_i is the forecasted value for observation i, and A_i is the actual value for observation i.

3.5.2. Mean Squared Error (MSE)

MSE calculates the average squared difference between actual and predicted values. Because the errors are squared, MSE penalizes large errors more heavily, making it useful for detecting large deviations in predictions.

The mean squared error is given by [37]:

\mathrm{MSE} = \frac{1}{n} \sum_{i = 1}^{n} {(F_{i} - A_{i})}^{2}

where n is the number of observations, F_i is the forecasted value for observation i, and A_i is the actual value for observation i.

3.5.3. Root Mean Squared Error (RMSE)

RMSE is the square root of MSE, which provides an interpretable error value in the same unit as the target variable. RMSE is sensitive to large errors and is commonly used in forecasting applications to measure model performance.

The root mean squared error is given by [37]:

\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(F_{i} - A_{i})}^{2}}

where n is the number of observations, F_i is the forecasted value for observation i, and A_i is the actual value for observation i.

3.5.4. Mean Absolute Percentage Error (MAPE)

MAPE expresses the prediction error as a percentage of actual values, making it easier to interpret in relative terms. As it provides a percentage-based error measure, MAPE is particularly useful for comparing errors across datasets with different scales.

The mean absolute percentage error is given by [37]:

\mathrm{MAPE} = \frac{1}{n} \sum_{i = 1}^{n} \left| \frac{A_{i} - F_{i}}{A_{i}} \right|

where n is the number of observations, F_i is the forecasted value for observation i, and A_i is the actual value for observation i.
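The four error measures defined above translate directly into code. The following sketch implements them with the standard library; the sample values are arbitrary and serve only to show the arithmetic.

```python
import math


def mae(actual, forecast):
    return sum(abs(f - a) for a, f in zip(actual, forecast)) / len(actual)


def mse(actual, forecast):
    return sum((f - a) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    return math.sqrt(mse(actual, forecast))


def mape(actual, forecast):
    """Returns a fraction; multiply by 100 to express it as a percentage.

    Undefined when any actual value is zero.
    """
    return sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


actual = [100.0, 200.0, 400.0]    # arbitrary illustrative values
forecast = [110.0, 190.0, 380.0]

print(mae(actual, forecast), mse(actual, forecast),
      rmse(actual, forecast), mape(actual, forecast))
```

For these values the absolute errors are 10, 10, and 20, so MAE is 40/3, MSE is 200, RMSE is the square root of 200, and MAPE is (0.10 + 0.05 + 0.05)/3. The single larger error raises RMSE above MAE, which reflects the heavier penalty on large deviations.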

3.5.5. Coefficient of Determination (R²)

R² represents the proportion of variance in the dependent variable that is predictable from the independent variables.

The coefficient of determination is given by:

R^{2} = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}}

where RSS is the residual sum of squares and TSS is the total sum of squares. R² is widely used in regression and forecasting models to indicate how well the model explains the variability in the data; a value closer to 1 indicates a better fit. However, it has limitations, such as being a biased statistic and giving invalid results in the presence of measurement errors. It should therefore be interpreted cautiously and in conjunction with other criteria [38].

R² is not always suitable for time series forecasting due to the complex nature of time series data and the potential for model uncertainty. Goel [39] highlights the need for hybrid models that can capture both linear and nonlinear components in time series data, suggesting that a single metric like R² may not adequately capture the predictive performance. Chatfield [40] further emphasizes the importance of considering model uncertainty in time series analysis, which can affect the accuracy of forecasts. Hewamalage [41] underscores the need for robust and efficient forecasting methods, indicating that R² may not be the most suitable metric in all cases. Lastly, Kim [42] discusses the impact of estimation error on forecast mean squared errors, which can also affect the reliability of R² in time series forecasting.

The limitations mentioned regarding the R² metric, such as potential bias and sensitivity to measurement errors, are intended to highlight practical issues related to real-world model interpretation and data quality considerations. Specifically, R² can be influenced by factors such as model specification, omitted variables, and data inaccuracies, rather than being inherently flawed mathematically. Therefore, R² values should always be interpreted with careful attention to these practical and contextual limitations, especially in complex forecasting scenarios.

4. Results

4.1. Performance of the Models

After deploying all models for every category, evaluation metrics were extracted to assess the models’ performance. The results guide the choice of the best model for each category. This section reports the error rates of each model in forecasting new cancer cases and incidence rates from 2022 to 2026 for the different categories. The models were trained on data from 1992 to 2021.

It is important to note that negative R² values, as observed for the Prophet model in our analysis, indicate that the model does not improve predictive accuracy compared with a baseline mean-only (intercept-only) model. Such results do not inherently suggest that the Prophet model is incorrectly specified or invalid. Rather, these negative values highlight that, within the specific context of our forecasting problem, the Prophet model’s predictive strength was comparatively weaker than the other evaluated models (ARIMA, LSTM, and Hybrid ARIMA-LSTM). Therefore, R² values, particularly negative ones, should be interpreted cautiously as indicators of relative model performance rather than absolute measures of model validity.
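To see why a negative value is possible, note that the definition in Section 3.5.5 gives the following, with \bar{A} denoting the mean of the actual values (a symbol introduced here only for illustration):

R^{2} < 0 \iff RSS > TSS \iff \sum_{i = 1}^{n} (A_{i} - F_{i})^{2} > \sum_{i = 1}^{n} (A_{i} - \bar{A})^{2}

As an invented example, take actual values A = (2, 4, 6, 8) and forecasts F = (8, 6, 4, 2). Then \bar{A} = 5, TSS = 9 + 1 + 1 + 9 = 20, and RSS = 36 + 4 + 4 + 36 = 80, so

R^{2} = 1 - \frac{80}{20} = -3

A forecast that always returned \bar{A} would instead give RSS = TSS and therefore R² = 0.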

4.1.1. Error Rates for Geography Categories

Table 1 shows the error rates for new cancer cases and cancer incidence rates in geographical regions. The error rates have been interpreted to indicate the best models, which have been highlighted in bold and italic. The error rates show that the hybrid model performs the best in forecasting new cancer cases and cancer incidence rates for geographic regions.
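One way to read Table 1 programmatically is to pick, for each region and category, the model that wins on a chosen metric. The sketch below does this for the Alberta "New Cancer Cases" rows of Table 1. The selection rule (a single metric decides) and the names `rows` and `best_model` are assumptions made for illustration, since the text does not state how disagreements between metrics were resolved.

```python
# RMSE and R^2 values copied from the Alberta "New Cancer Cases" rows of Table 1.
rows = [
    {"model": "ARIMA", "rmse": 0.0528, "r2": 0.6795},
    {"model": "LSTM", "rmse": 0.0527, "r2": 0.6905},
    {"model": "Hybrid", "rmse": 0.0385, "r2": 0.8296},
    {"model": "Prophet", "rmse": 0.0564, "r2": 0.3438},
]


def best_model(rows, metric="rmse", lower_is_better=True):
    """Return the row that wins on `metric` (assumed rule: one metric decides)."""
    pick = min if lower_is_better else max
    return pick(rows, key=lambda row: row[metric])


by_rmse = best_model(rows)["model"]
by_r2 = best_model(rows, metric="r2", lower_is_better=False)["model"]
```

For these four rows both criteria select the hybrid model, which matches the conclusion stated above for this region and category.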

4.1.2. Error Rates for Age Categories

Table 2 shows the error rates for new cancer cases and cancer incidence rates in age group categories. The error rates have been interpreted to indicate the best models, indicated by bold and italic highlights. The error rates show that the hybrid model performs the best in forecasting new cancer cases and cancer incidence rates for age group categories.

4.1.3. Error Rates for Sex Categories

Table 3 shows the error rates for new cancer cases and cancer incidence rate in sex categories. The error rates have been interpreted to indicate the best models, which have been highlighted in bold. The error rates show that the LSTM model performs the best in forecasting new cancer cases for sex categories.

4.2. Forecasting New Cancer Cases and Incidence Rate

Compared with the other models, the hybrid model performs best for predicting values in the geography categories from 2022 to 2026. Table 4 shows the results obtained from this model.

Compared with the other models, the hybrid model also performs best for predicting values in the age group categories from 2022 to 2026. Table 5 shows the results obtained from this model.

Compared with the other models, the LSTM model performs best for forecasting values in the sex-based categories from 2022 to 2026. Table 6 shows the results obtained from this model.

4.3. Visualization Insights

Figure 2, Figure 3 and Figure 4 visually show the results of forecasted new cancer cases from 2022 to 2026 for regions, age groups, and sex, respectively.

Also, Figure 5, Figure 6 and Figure 7 visually show the results of the forecasted cancer incidence rate per 100,000 people from 2022 to 2026 for regions, age groups, and sexes, respectively.

5. Discussion

The findings of this study align with previous research on cancer incidence forecasting in Canada but extend existing methodologies by incorporating machine learning (ML) models to enhance predictive accuracy. Past studies, such as those by Brenner et al. [16,17,18], have relied heavily on statistical models like CANPROJ, which use age-period-cohort (APC) frameworks to extrapolate trends. While effective in capturing historical linear patterns, such models assume that past trends will continue predictably, limiting their ability to reflect complex, nonlinear dynamics in cancer data. In contrast, the machine learning models used in this study, particularly the hybrid ARIMA-LSTM model, are adaptive and data-driven, capable of learning from fluctuations and underlying patterns that may not follow a strict statistical progression.

For instance, Brenner et al. projected new cancer cases up to 2024 using historical data and demographic adjustments. Our approach not only extends these projections to 2026, offering a more updated outlook, but also improves accuracy by integrating deep learning with statistical time series models. The hybrid ARIMA-LSTM model consistently yielded lower error rates than other models, demonstrating that this combined approach effectively captures both linear and nonlinear trends in cancer incidence forecasting.

The analysis reveals distinct patterns and trends across sex, geography, and age groups. Males exhibit a steady increase in both new cancer cases and incidence rates, while females experience a rise in absolute cases but a decline in incidence rates.

The observed trend in female cancer incidence—where absolute cancer cases are increasing while incidence rates are declining—suggests demographic shifts, notably population growth and aging, rather than improvements in cancer treatments alone. Additionally, improvements in screening programs and early detection efforts (such as breast and cervical cancer screening) documented in the existing literature likely play a role in stabilizing or reducing incidence rates for certain cancer types.
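A small arithmetic sketch shows how case counts can rise while a rate falls. It uses a crude rate (cases per 100,000 population) and invented round numbers; the rates reported in this study may be age-standardized, which this sketch does not attempt to reproduce.

```python
def crude_rate_per_100k(cases: float, population: float) -> float:
    """Crude incidence rate: new cases per 100,000 people."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases / population * 100_000


# Hypothetical figures, chosen only to illustrate the arithmetic.
year_a = {"cases": 100_000, "population": 18_000_000}
year_b = {"cases": 104_000, "population": 19_500_000}

rate_a = crude_rate_per_100k(**year_a)  # about 555.6 per 100,000
rate_b = crude_rate_per_100k(**year_b)  # about 533.3 per 100,000
```

In this example cases grow by 4% while the population grows by about 8.3%, so the rate declines even though more people are diagnosed.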

Brenner et al. confirmed that despite an overall decline in cancer incidence and mortality rates, Canada is expected to see an increase in new cancer cases and deaths in 2024, primarily due to its growing and aging population. While advancements in prevention, screening, and treatment have mitigated the effects of certain cancers, these near-term projections emphasize the ongoing burden cancer may pose to individuals and the Canadian healthcare system [18].

Geographically, regions like Alberta, British Columbia, Ontario, and Quebec show increasing cancer cases and incidence rates, indicating a growing cancer burden. In contrast, provinces such as Newfoundland and Labrador demonstrate a declining trend, possibly reflecting effective local cancer control measures or public health strategies. Age-wise, cancer incidence remains stable among children but increases significantly among middle-aged and older adults, especially in individuals aged 75 and above.

These findings underscore the need for targeted public health interventions explicitly tailored to demographic groups and geographic regions showing an increased cancer burden. The steady rise in cancer cases among older adults and specific provinces highlights the importance of developing age-specific and region-specific cancer prevention and control strategies. Meanwhile, stability in pediatric cancer incidence suggests that existing pediatric cancer strategies may be effective but require sustained support.

Given the complexity of cancer incidence trends, selecting an accurate forecasting model is critical for public health planning. An evaluation of model performance reveals that the hybrid ARIMA-LSTM model consistently achieves the lowest error rates across most geographic and age group categories, making it the most suitable choice for cancer incidence forecasting. This model outperforms the ARIMA, LSTM, and Prophet models in key accuracy metrics, including MAE, MSE, RMSE, MAPE, and R².

5.1. Model Performance Across Demographics and Regions

Geographic performance: The hybrid ARIMA-LSTM model yields significantly lower RMSE values across multiple provinces, with R² values exceeding 0.80 in most cases, indicating strong predictive accuracy. In contrast, the Prophet model exhibits the highest error rates, often producing negative R² values, suggesting poor model fit.

Age group performance: The hybrid model provides the most accurate predictions for middle-aged and older adults (50+ years), achieving the lowest MAPE and RMSE values. In contrast, the Prophet model underperforms in most age categories.

Sex-based performance: While the hybrid model excels in most categories, LSTM performs best in sex-based forecasting, particularly for male and female cancer incidence rates, as indicated by its superior MAE and RMSE values.

The hybrid ARIMA-LSTM model’s superior performance is attributed to its ability to capture both linear and nonlinear trends in cancer incidence data. ARIMA alone is well suited for handling linear trends but struggles with complex patterns, whereas LSTM excels in capturing long-term dependencies. By integrating these approaches, the hybrid model effectively mitigates their individual limitations, resulting in more accurate and reliable cancer incidence forecasts.
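The division of labour described above is often implemented as a residual decomposition, in the spirit of Zhang's hybrid approach [36]: a linear model is fitted first, and a second model is fitted to what the linear model leaves unexplained. The sketch below shows only that structure. It is not the authors' implementation, and whether their hybrid uses this exact decomposition is an assumption. A least-squares trend line stands in for ARIMA, and a moving average of recent residuals stands in for the LSTM, so that the example runs with the Python standard library alone.

```python
from typing import List, Sequence, Tuple


def fit_linear_trend(y: Sequence[float]) -> Tuple[float, float]:
    """Least-squares line through (0, y[0]), (1, y[1]), ...; returns (intercept, slope)."""
    n = len(y)
    if n < 2:
        raise ValueError("need two or more observations")
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    slope = sxy / sxx
    return y_mean - slope * t_mean, slope


def hybrid_forecast(y: Sequence[float], horizon: int, window: int = 3) -> List[float]:
    """Linear forecast plus a forecast of the linear model's residuals."""
    intercept, slope = fit_linear_trend(y)
    # Step 1: linear component (stand-in for ARIMA).
    fitted = [intercept + slope * t for t in range(len(y))]
    # Step 2: residuals are what the linear component fails to explain.
    residuals = [obs - fit for obs, fit in zip(y, fitted)]
    # Step 3: residual model (stand-in for the LSTM): mean of the last
    # `window` residuals, held constant over the forecast horizon.
    recent = residuals[-window:]
    residual_forecast = sum(recent) / len(recent)
    # Step 4: the hybrid forecast is the sum of the two components.
    n = len(y)
    return [intercept + slope * (n + h) + residual_forecast for h in range(horizon)]


# Illustrative series: a straight line, so the residual component is close to zero
# and the forecast continues the line (approximately 21 and 23).
series = [2.0 * t + 1.0 for t in range(10)]
forecast = hybrid_forecast(series, horizon=2)
```

In a fuller implementation, steps 1 and 3 would be replaced by a fitted ARIMA model and a trained LSTM respectively, while steps 2 and 4 would stay the same.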

5.2. Implications for Public Health and Cancer Prevention

The findings highlight the critical role of advanced forecasting models in improving cancer prevention and resource allocation strategies. Rising incidence rates in middle-aged and older adults necessitate robust screening programs and preventive measures tailored to these demographics. Similarly, region-specific public health initiatives that address unique risk factors are essential to mitigating the cancer burden. By leveraging accurate forecasting models, policymakers and healthcare professionals can develop more effective intervention strategies, optimize resource allocation, and ultimately improve cancer outcomes across diverse populations.

Our study provides a foundation for further integrating AI-driven forecasting models in epidemiological research. Future research could explore ensemble methods, combining traditional epidemiological models (e.g., APC) with deep learning architectures to refine cancer incidence predictions further. Additionally, incorporating real-time clinical and genetic data could enhance prediction accuracy and improve personalized cancer risk assessments.

5.3. Limitations

One of the key limitations of this study is the imputation of Quebec’s cancer incidence data after 2017 due to the absence of official records. While we employed a multiple imputation approach to enhance accuracy and capture uncertainty, projections remain dependent on historical trends and national averages rather than direct registry data. The variability in imputed estimates reflects the inherent uncertainty in forecasting missing data. Future studies should integrate officially updated Quebec data when available to refine predictive models further and validate imputed estimates.
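Multiple imputation yields several plausible values for each missing entry rather than a single one. A common way to summarize them is Rubin's rules, sketched below for one missing quantity. The text does not say which pooling rule was used, so this is an assumption, and the five imputed values and their variances are invented.

```python
from statistics import mean, variance
from typing import Sequence, Tuple


def pool_rubin(
    estimates: Sequence[float], within_variances: Sequence[float]
) -> Tuple[float, float]:
    """Pool m imputed estimates with Rubin's rules; returns (pooled estimate, total variance)."""
    m = len(estimates)
    if m < 2 or m != len(within_variances):
        raise ValueError("need two or more estimates, each with a within-imputation variance")
    pooled = mean(estimates)
    within = mean(within_variances)   # average within-imputation variance
    between = variance(estimates)     # sample variance across imputations (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical imputed values for one missing year (not the study's figures).
estimates = [100.0, 102.0, 98.0, 101.0, 99.0]
within_variances = [4.0, 4.0, 4.0, 4.0, 4.0]
pooled, total_variance = pool_rubin(estimates, within_variances)
```

The between-imputation term is what carries the extra uncertainty from not having observed the value, which is why the total variance exceeds the average within-imputation variance whenever the imputed values disagree.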

Also, this study analyzes cancer incidence as a whole, whereas prevention programs typically target specific cancers. Future research should extend this analysis to individual cancer types, allowing for a more detailed examination of how prevention efforts impact site-specific cancer trends.

Moreover, our study assumes that biological and environmental factors primarily drive cancer incidence trends, but access to healthcare services plays a crucial role in early detection and diagnosis. Screening disparities, specialist availability, and healthcare system delays may introduce biases in observed cancer incidence rates across age groups and regions. Future studies should incorporate healthcare accessibility indices to better quantify the role of healthcare infrastructure in shaping cancer trends.

Author Contributions

Conceptualization, E.K. and K.P.; methodology, E.K. and K.P.; software, E.K.; validation, E.K. and K.P.; formal analysis, E.K.; investigation, E.K.; resources, E.K. and K.P.; data curation, E.K.; writing—original draft preparation, E.K.; writing—review and editing, E.K. and K.P.; visualization, E.K.; supervision, K.P.; project administration, K.P.; funding acquisition, K.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original data presented in the study are openly available in the Statistics Canada portal, published on 31 January 2024 at this link https://www150.statcan.gc.ca/n1/daily-quotidien/240131/dq240131d-eng.htm.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Kibbe, W.A.; Klemm, J.D.; Quackenbush, J. Cancer Informatics: New Tools for a Data-Driven Age in Cancer Research. Cancer Res. 2017, 77, e1–e2.
2. Biemar, F.; Foti, M. Global progress against cancer—Challenges and opportunities. Cancer Biol. Med. 2013, 10, 183–186.
3. Sikora, K. Developing a global strategy for cancer. Eur. J. Cancer 1999, 35, 24–31.
4. Ivanović, M.; Radovanović, M. Modern machine learning techniques and their applications. In Electronics, Communications and Networks IV; CRC Press: Boca Raton, FL, USA, 2015.
5. Rathor, A.; Gyanchandani, M. A review at Machine Learning algorithms targeting big data challenges. In Proceedings of the 2017 International Conference on Electrical, Electronics, Communication, Computer, and Optimization Techniques (ICEECCOT), Mysuru, India, 15–16 December 2017; pp. 1–7.
6. Nguyen, G.T.; Dlugolinsky, S.; Bobák, M.; Tran, V.D.; López García, Á.; Heredia, I.; Malík, P.; Hluchý, L. Machine Learning and Deep Learning frameworks and libraries for large-scale data mining: A survey. Artif. Intell. Rev. 2019, 52, 77–124.
7. Udousoro, I.C. Machine Learning: A Review. Semicond. Sci. Inf. Devices 2020, 2, 5–14.
8. Verma, M.; Khoury, M.J.; Ioannidis, J.P. Opportunities and Challenges for Selected Emerging Technologies in Cancer Epidemiology: Mitochondrial, Epigenomic, Metabolomic, and Telomerase Profiling. Cancer Epidemiol. Biomark. Prev. 2012, 22, 189–200.
9. Berger, M.L.; Curtis, M.D.; Smith, G.; Harnett, J.; Abernethy, A.P. Opportunities and challenges in leveraging electronic health record data in oncology. Future Oncol. 2016, 12, 1261–1274.
10. Gupta, S.; Gupta, A.; Kumar, Y. Artificial intelligence techniques in Cancer research: Opportunities and challenges. In Proceedings of the 2021 International Conference on Technological Advancements and Innovations (ICTAI), Tashkent, Uzbekistan, 10–12 November 2021; pp. 411–416.
11. Schlick, C.J.; Castle, J.P.; Bentrem, D.J. Utilizing Big Data in Cancer Care. Surg. Oncol. Clin. N. Am. 2018, 27, 641–652.
12. Shweta; Riya; Kumar, A. Cancer Prediction Using Machine Learning Algorithm. Int. J. Sci. Res. (IJSR) 2022, 11, 873–875.
13. Kaur, I.; Doja, M.N.; Ahmad, T. Data mining and machine learning in cancer survival research: An overview and future recommendations. J. Biomed. Inform. 2022, 128, 104026.
14. Shruti Trivedi, N.K. Predictive Analytics in Healthcare using Machine Learning. In Proceedings of the 2023 14th International Conference on Computing Communication and Networking Technologies (ICCCNT), Delhi, India, 6–8 July 2023; pp. 1–5.
15. Meropol, N.J.; Donegan, J.; Rich, A.S. Progress in the Application of Machine Learning Algorithms to Cancer Research and Care. JAMA Netw. Open 2021, 4, e2116063.
16. Brenner, D.R.; Poirier, A.; Woods, R.R.; Ellison, L.F.; Billette, J.M.; Demers, A.A.; Zhang, S.X.; Yao, C.; Finley, C.; Fitzgerald, N.; et al. Canadian Cancer Statistics Advisory Committee. Projected estimates of cancer in Canada in 2022. CMAJ 2022, 194, E601–E607.
17. Brenner, D.R.; Weir, H.K.; Demers, A.A.; Ellison, L.F.; Louzado, C.; Shaw, A.; Turner, D.; Woods, R.R.; Smith, L.M. Projected estimates of cancer in Canada in 2020. Can. Med. Assoc. J. 2020, 192, E199–E205.
18. Brenner, D.R.; Gillis, J.; Demers, A.A.; Ellison, L.F.; Billette, J.-M.; Zhang, S.X.; Liu, J.L.; Woods, R.R.; Finley, C.; Fitzgerald, N.; et al. Projected estimates of cancer in Canada in 2024. Can. Med. Assoc. J. 2024, 196, E615–E623.
19. Table 13-10-0394-01: Leading Causes of Death, Total Population, by Age Group. Statistics Canada: Ottawa, ON, Canada, 2025; Available online: https://www150.statcan.gc.ca/t1/tbl1/en/tv.action?pid=1310039401 (accessed on 4 August 2018).
20. Canadian Cancer Statistics Advisory Committee in Collaboration with the Canadian Cancer Society Statistics Canada and the Public Health Agency of Canada. Canadian Cancer Statistics; Canadian Cancer Society: Toronto, ON, Canada, 2021; Available online: www.cancer.ca/Canadian-Cancer-Statistics-2021-EN (accessed on 28 March 2022).
21. Xie, L.; Semenciw, R.; Mery, L. Cancer incidence in Canada: Trends and projections (1983–2032). Health Promot. Chronic Dis. Prev. Can. 2015, 35 (Suppl. S1), 2–186.
22. de Oliveira, C.; Weir, S.; Rangrej, J.; Krahn, M.D.; Mittmann, N.; Hoch, J.S.; Chan, K.K.W.; Peacock, S. The economic burden of cancer care in Canada: A population-based cost study. CMAJ Open 2018, 6, E1–E10.
23. Qiu, Z.; Hatcher, J. Cancer Projection Analytical Network Working Team CANPROJ: The R package of Cancer Projection Methods Based on Generalized Linear Models for Age Period/or Cohort Version, I; Alberta Health Services: Edmonton, AB, Canada, 2013.
24. Government of Canada, S.C. Cancer Incidence in Canada, 2021. The Daily. Available online: https://www150.statcan.gc.ca/n1/daily-quotidien/240131/dq240131d-eng.htm (accessed on 31 January 2024).
25. Government of Canada, S.C. Canadian Cancer Registry—Age-Standardization: Incidence; Government of Canada, Statistics Canada: Ottawa, ON, Canada, 2025; Available online: https://www.statcan.gc.ca/en/statistical-programs/document/3207_D12_V4 (accessed on 17 November 2021).
26. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
27. Emmert-Streib, F.; Yang, Z.; Feng, H.; Tripathi, S.; Dehmer, M. An Introductory Review of Deep Learning for Prediction Models with Big Data. Front. Artif. Intell. 2020, 3, 4.
28. Kaninde, S.; Mahajan, M.; Janghale, A.; Joshi, B. Stock Price Prediction using Facebook Prophet. ITM Web Conf. 2022, 44, 3060.
29. Korstanje, J. The Prophet Model. In Advanced Forecasting with Python; Apress: New York, NY, USA, 2021.
30. Shen, J.; Valagolam, D.; McCalla, S. Prophet forecasting model: A machine learning approach to predict the concentration of air pollutants (PM₂.₅, PM₁₀, O₃, NO₂, SO₂, CO) in Seoul, South Korea. PeerJ 2020, 8, e9961.
31. Rundo, F.; Trenta, F.; di Stallo, A.L.; Battiato, S. Machine learning for quantitative finance applications: A survey. Appl. Sci. 2019, 9, 5574.
32. Siami-Namini, S.; Tavakoli, N.; Namin, A.S. A comparison of ARIMA and LSTM in forecasting time series. In Proceedings of the 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA), Orlando, FL, USA, 17–20 December 2018; pp. 1394–1401.
33. Kontopoulou, V.I.; Panagopoulos, A.D.; Kakkos, I.; Matsopoulos, G.K. A Review of ARIMA vs. Machine Learning Approaches for Time Series Forecasting in Data Driven Networks. Future Internet 2023, 15, 255.
34. Sima, S.N.; Akbar, S.N. Forecasting Economics and Financial Time Series: ARIMA vs. LSTM. arXiv 2018, arXiv:1803.06386.
35. Kulshreshtha, S.; Vijayalakshmi, A. An ARIMA-LSTM hybrid model for stock market prediction using live data. J. Eng. Sci. Technol. Rev. 2020, 13, 117–123.
36. Zhang, G.P. Time series forecasting using a hybrid ARIMA and neural network model. Neurocomputing 2003, 50, 159–175.
37. Maxim, S.; Adriaan, B.; Shcherbakova, N.L.; Anton, T.; Janovsky, T.A.; Kamaev, V.A. A survey of forecast error measures. World Appl. Sci. J. 2013, 24, 171–176.
38. Cheng, C.; Shalabh; Garg, G. Coefficient of determination for multiple measurement error models. J. Multivar. Anal. 2014, 126, 137–152.
39. Goel, H.; Melnyk, I.; Banerjee, A. R2N2: Residual Recurrent Neural Networks for Multivariate Time Series Forecasting. arXiv 2017, arXiv:1709.03159.
40. Chatfield, C. Model uncertainty and forecast accuracy. J. Forecast. 1996, 15, 495–508.
41. Hewamalage, H.; Bergmeir, C.; Bandara, K. Recurrent Neural Networks for Time Series Forecasting: Current Status and Future Directions. arXiv 2019, arXiv:1909.00590.
42. Kim, T.; Leybourne, S.J.; Newbold, P. Asymptotic mean-squared forecast error when an autoregression with linear trend is fitted to data generated by an I(0) or I(1) process. J. Time Ser. Anal. 2004, 25, 583–602.

Figure 1. Research process.

Figure 2. Forecasted new cancer cases across various regions in Canada (the yellow-shaded area shows the predicted values).

Figure 3. Forecasted new cancer cases across various age groups in Canada (the yellow-shaded area shows the predicted values).

Figure 4. Forecasted new cancer cases in Canada by sexes (the yellow-shaded area shows the predicted values).

Figure 5. Forecasted cancer incidence rate per 100,000 people across various regions in Canada (the yellow-shaded area shows the predicted values).

Figure 6. Forecasted cancer incidence rate per 100,000 people across various age groups in Canada (the yellow-shaded area shows the predicted values).

Figure 7. Forecasted cancer incidence rate per 100,000 people in Canada by sex (the yellow-shaded area shows the predicted values).

Table 1. Error rates of geography regions for new cancer cases and cancer incidence rate.

Geography	Category	Model	MSE	MAE	RMSE	MAPE	R²
Alberta	New Cancer Cases	ARIMA	0.0028	0.0377	0.0528	0.0444	0.6795
		LSTM	0.0028	0.0379	0.0527	0.0452	0.6905
		Hybrid	0.0015	0.0344	0.0385	0.0402	0.8296
		Prophet	0.0032	0.0509	0.0564	0.0596	0.3438
	Cancer Incidence Rate	ARIMA	0.0102	0.0629	0.1012	0.0786	0.0832
		LSTM	0.0081	0.0671	0.0907	0.0820	0.0898
		Hybrid	0.0013	0.0293	0.0357	0.0315	0.8651
		Prophet	0.0051	0.0568	0.0714	0.0656	0.3887
British Columbia	New Cancer Cases	ARIMA	0.0032	0.0383	0.0567	0.0468	0.6612
		LSTM	0.0028	0.0416	0.0525	0.0491	0.7100
		Hybrid	0.0018	0.0317	0.0419	0.0382	0.8151
		Prophet	0.0038	0.0544	0.0614	0.0665	0.3208
	Cancer Incidence Rate	ARIMA	0.0112	0.0816	0.0961	0.0935	0.0659
		LSTM	0.0091	0.0769	0.0954	0.0934	0.1547
		Hybrid	0.0064	0.0628	0.0799	0.0766	0.3516
		Prophet	0.0071	0.0595	0.0845	0.0765	0.0301
Manitoba	New Cancer Cases	ARIMA	0.0092	0.0762	0.0960	0.0844	0.4569
		LSTM	0.0047	0.0547	0.0687	0.0677	0.6866
		Hybrid	0.0004	0.0169	0.0205	0.0199	0.9752
		Prophet	0.0061	0.0583	0.0779	0.0661	0.5133
	Cancer Incidence Rate	ARIMA	0.0148	0.0910	0.1218	0.1202	0.1289
		LSTM	0.0054	0.0564	0.0732	0.0598	0.2090
		Hybrid	0.0015	0.0240	0.0390	0.0279	0.8844
		Prophet	0.0089	0.0877	0.1174	0.0922	−1.7234
New Brunswick	New Cancer Cases	ARIMA	0.0059	0.0705	0.0768	0.0835	0.3275
		LSTM	0.0038	0.0493	0.0619	0.0613	0.5627
		Hybrid	0.0018	0.0258	0.0427	0.0343	0.7917
		Prophet	0.0048	0.0594	0.0694	0.0726	0.4504
	Cancer Incidence Rate	ARIMA	0.0060	0.0605	0.0776	0.0657	0.4920
		LSTM	0.0023	0.0350	0.0480	0.0433	0.5112
		Hybrid	0.0011	0.0252	0.0325	0.0281	0.7727
		Prophet	0.0068	0.0741	0.0823	0.0851	−1.1331
Newfoundland and Labrador	New Cancer Cases	ARIMA	0.0062	0.0634	0.0790	0.0748	0.2867
		LSTM	0.0155	0.0778	0.1244	0.0997	0.6262
		Hybrid	0.0003	0.0141	0.0169	0.0154	0.9763
		Prophet	0.0122	0.0816	0.1105	0.0895	−2.8241
	Cancer Incidence Rate	ARIMA	0.0060	0.0658	0.0776	0.0718	0.3345
		LSTM	0.0119	0.0692	0.1093	0.0844	0.7782
		Hybrid	0.0007	0.0211	0.0274	0.0222	0.9157
		Prophet	0.0109	0.0766	0.1046	0.0819	−0.3781
Northwest Territories	New Cancer Cases	ARIMA	0.0194	0.1343	0.1394	0.1676	0.1355
		LSTM	0.0128	0.0908	0.1132	0.1170	0.6193
		Hybrid	0.0052	0.0601	0.0721	0.0842	0.8061
		Prophet	0.0252	0.1397	0.1589	0.1923	0.1366
	Cancer Incidence Rate	ARIMA	0.0200	0.0995	0.1414	0.1293	0.07207
		LSTM	0.0164	0.0809	0.1279	0.1113	0.2239
		Hybrid	0.0076	0.0672	0.0869	0.0922	0.5949
		Prophet	0.0155	0.1101	0.1245	0.1375	0.3650
Nova Scotia	New Cancer Cases	ARIMA	0.0062	0.0634	0.0790	0.0748	0.2867
		LSTM	0.0004	0.0163	0.0206	0.0119	0.7926
		Hybrid	0.0002	0.0118	0.0139	0.0125	0.8699
		Prophet	0.0091	0.0694	0.0951	0.0767	0.4746
	Cancer Incidence Rate	ARIMA	0.0054	0.0661	0.0737	0.0780	0.6996
		LSTM	0.0087	0.0866	0.0939	0.0960	0.3972
		Hybrid	0.0003	0.0145	0.0165	0.0164	0.9188
		Prophet	0.0058	0.0641	0.0990	0.0735	−1.7068
Nunavut	New Cancer Cases	ARIMA	0.0038	0.0379	0.0617	0.0826	0.3726
		LSTM	0.0030	0.0431	0.0549	0.0744	0.9006
		Hybrid	0.0022	0.0289	0.0469	0.0557	0.9356
		Prophet	0.0309	0.1348	0.1757	0.2021	0.2435
	Cancer Incidence Rate	ARIMA	0.0010	0.0255	0.0260	0.0566	0.4159
		LSTM	0.0041	0.0373	0.0640	0.0837	0.7505
		Hybrid	0.0009	0.0233	0.0298	0.0422	0.9471
		Prophet	0.0231	0.1117	0.1521	0.1067	0.1507
Ontario	New Cancer Cases	ARIMA	0.0081	0.0614	0.0901	0.0693	0.3200
		LSTM	0.0043	0.0496	0.0658	0.0563	0.2001
		Hybrid	0.0019	0.0283	0.0436	0.0329	0.6904
		Prophet	0.0035	0.0517	0.0589	0.0570	0.1763
	Cancer Incidence Rate	ARIMA	0.0191	0.0910	0.1083	0.0936	0.55701
		LSTM	0.0107	0.0631	0.1035	0.0840	0.4479
		Hybrid	0.0021	0.0344	0.0459	0.0388	0.6580
		Prophet	0.0128	0.1011	0.1212	0.1162	−0.5615
Prince Edward Island	New Cancer Cases	ARIMA	0.0110	0.0776	0.1047	0.0879	0.3742
		LSTM	0.0187	0.0693	0.1368	0.0784	0.1700
		Hybrid	0.0036	0.0451	0.0604	0.0632	0.8212
		Prophet	0.0227	0.1229	0.1507	0.1065	0.2397
	Cancer Incidence Rate	ARIMA	0.0059	0.0588	0.0767	0.0613	0.1394
		LSTM	0.0039	0.0623	0.0626	0.0654	0.1945
		Hybrid	0.0035	0.0469	0.0590	0.0532	0.2424
		Prophet	0.0181	0.1059	0.1347	0.1042	−2.7544
Quebec	New Cancer Cases	ARIMA	0.0084	0.0511	0.0919	0.0558	0.1893
		LSTM	0.0020	0.0398	0.0444	0.0440	0.5558
		Hybrid	0.0006	0.0242	0.0250	0.0260	0.5084
		Prophet	0.0052	0.0488	0.0719	0.0522	−0.6552
	Cancer Incidence Rate	ARIMA	0.0007	0.0250	0.0267	0.0271	0.2899
		LSTM	0.0012	0.0353	0.0408	0.0996	0.6675
		Hybrid	0.0005	0.0161	0.0231	0.0207	0.8603
		Prophet	0.0112	0.0918	0.1061	0.1077	−3.3572
Saskatchewan	New Cancer Cases	ARIMA	0.0115	0.0741	0.1075	0.0958	0.4375
		LSTM	0.0111	0.0865	0.1052	0.1005	0.1892
		Hybrid	0.0048	0.0486	0.0691	0.0607	0.3505
		Prophet	0.0086	0.0688	0.0929	0.0845	0.2633
	Cancer Incidence Rate	ARIMA	0.0179	0.1095	0.1340	0.1652	0.1713
		LSTM	0.0208	0.1249	0.1442	0.1869	0.3781
		Hybrid	0.0122	0.0716	0.1103	0.0965	0.2076
		Prophet	0.0143	0.0848	0.1194	0.1119	−0.5023
Yukon	New Cancer Cases	ARIMA	0.0106	0.0862	0.1030	0.1074	0.5719
		LSTM	0.0160	0.1135	0.1145	0.0911	0.8823
		Hybrid	0.0026	0.0344	0.0513	0.0449	0.8793
		Prophet	0.0137	0.1021	0.1169	0.1293	0.3063
	Cancer Incidence Rate	ARIMA	0.0205	0.1055	0.1432	0.1283	0.3311
		LSTM	0.0044	0.0361	0.0664	0.0483	0.8559
		Hybrid	0.0016	0.0311	0.0395	0.0451	0.9427
		Prophet	0.0263	0.1244	0.1621	0.1606	−1.1203

Table 2. Error rates of age group categories for new cancer cases and cancer incidence rate.

Age Group	Category	Model	MSE	MAE	RMSE	MAPE	R²
0 to 04 years	New Cancer Cases	ARIMA	0.0186	0.1079	0.1362	0.1765	0.1522
		LSTM	0.0198	0.1249	0.1407	0.2338	0.4500
		Hybrid	0.0178	0.1267	0.1333	0.2349	0.1454
		Prophet	0.1187	0.2956	0.3445	0.6277	−6.2204
	Cancer Incidence Rate	ARIMA	0.0019	0.0343	0.0436	0.0956	0.2485
		LSTM	0.0080	0.0689	0.0896	0.1696	0.1359
		Hybrid	0.0003	0.0124	0.0169	0.0270	0.1520
		Prophet	0.0961	0.2822	0.3100	0.2458	−1.9227
05 to 09 years	New Cancer Cases	ARIMA	0.0445	0.1901	0.2108	0.4117	0.1155
		LSTM	0.0554	0.2000	0.2353	0.4075	0.0097
		Hybrid	0.0237	0.1442	0.1541	0.3136	0.3751
		Prophet	0.1057	0.2656	0.3251	0.4799	−2.0613
	Cancer Incidence Rate	ARIMA	0.0020	0.0417	0.0453	0.0712	0.2856
		LSTM	0.0018	0.0310	0.0420	0.0497	0.3977
		Hybrid	0.0010	0.0280	0.0319	0.0344	0.1493
		Prophet	0.0148	0.0859	0.1215	0.1345	−1.4655
10 to 14 years	New Cancer Cases	ARIMA	0.0112	0.0929	0.1056	0.1679	0.0206
		LSTM	0.0140	0.1061	0.1183	0.1846	0.0688
		Hybrid	0.0101	0.0845	0.1004	0.1504	0.0770
		Prophet	0.0273	0.143	0.1652	0.1652	−0.9431
	Cancer Incidence Rate	ARIMA	0.0058	0.0572	0.0761	0.1240	0.3283
		LSTM	0.0079	0.0639	0.0890	0.1169	0.2946
		Hybrid	0.0029	0.0456	0.0537	0.0632	0.6257
		Prophet	0.0317	0.1619	0.1780	0.2154	−1.0946
15 to 19 years	New Cancer Cases	ARIMA	0.0257	0.1459	0.1571	0.1969	0.0586
		LSTM	0.0270	0.1365	0.1642	0.1846	0.1824
		Hybrid	0.0256	0.1341	0.1601	0.1792	0.2258
		Prophet	0.0769	0.2243	0.2774	0.3919	−0.6949
	Cancer Incidence Rate	ARIMA	0.0019	0.0428	0.0443	0.0536	0.2529
		LSTM	0.0215	0.1133	0.1467	0.1521	0.2127
		Hybrid	0.0012	0.0264	0.0348	0.0279	0.3815
		Prophet	0.0293	0.1462	0.1711	0.1793	−0.5151
20 to 24 years	New Cancer Cases	ARIMA	0.0082	0.0584	0.0909	0.0682	0.2217
		LSTM	0.0135	0.1071	0.1160	0.1310	0.5725
		Hybrid	0.0018	0.0361	0.0420	0.0436	0.8290
		Prophet	0.0256	0.1309	0.1601	0.1498	−2.2898
	Cancer Incidence Rate	ARIMA	0.0045	0.0517	0.0675	0.0704	0.2005
		LSTM	0.0147	0.1144	0.1214	0.1427	0.4443
		Hybrid	0.0021	0.0352	0.0453	0.0392	0.4128
		Prophet	0.0319	0.1595	0.1784	0.2025	−1.7085
25 to 29 years	New Cancer Cases	ARIMA	0.0091	0.0931	0.0999	0.1045	1.5091
		LSTM	0.0045	0.0501	0.0672	0.0660	0.5181
		Hybrid	0.0039	0.0465	0.0625	0.0620	0.6534
		Prophet	0.0207	0.1293	0.1440	0.1597	−1.8633
	Cancer Incidence Rate	ARIMA	0.0017	0.0407	0.0413	0.0447	0.6333
		LSTM	0.0104	0.0715	0.1020	0.0866	0.3196
		Hybrid	0.0002	0.0130	0.0143	0.0154	0.5169
		Prophet	0.0103	0.0776	0.1016	0.0942	−0.162
30 to 34 years	New Cancer Cases	ARIMA	0.0212	0.1211	0.1457	0.1536	0.2017
		LSTM	0.0134	0.0966	0.1159	0.1141	0.1338
		Hybrid	0.0079	0.0721	0.0889	0.0849	0.1797
		Prophet	0.3214	0.2738	0.3214	0.3220	−1.5855
	Cancer Incidence Rate	ARIMA	0.0015	0.0367	0.0396	0.0447	0.2534
		LSTM	0.0164	0.1075	0.1282	0.1244	0.4915
		Hybrid	0.0004	0.0170	0.0205	0.0198	0.4139
		Prophet	0.0185	0.1128	0.1360	0.1310	−0.6678
35 to 39 years	New Cancer Cases	ARIMA	0.0040	0.0504	0.0635	0.0698	0.8737
		LSTM	0.0115	0.0984	0.1070	0.1378	0.5311
		Hybrid	0.0039	0.0484	0.0627	0.0679	0.8771
		Prophet	0.0052	0.0602	0.0724	0.0865	0.7172
	Cancer Incidence Rate	ARIMA	0.0029	0.0438	0.0543	0.0494	0.5215
		LSTM	0.0114	0.1009	0.1065	0.1131	0.0491
		Hybrid	0.0014	0.0282	0.0368	0.0310	0.2678
		Prophet	0.0254	0.1307	0.1594	0.1395	−0.3627
40 to 44 years	New Cancer Cases	ARIMA	0.0077	0.1023	0.1329	0.1390	0.6226
		LSTM	0.0036	0.0574	0.0598	0.0685	0.4477
		Hybrid	0.0015	0.0331	0.0391	0.0371	0.5223
		Prophet	0.0200	0.1269	0.1416	0.1641	0.1681
	Cancer Incidence Rate	ARIMA	0.0024	0.0466	0.0499	0.0540	0.1890
		LSTM	0.0098	0.0893	0.0990	0.0981	0.1507
		Hybrid	0.0001	0.0072	0.0105	0.0074	0.2190
		Prophet	0.0112	0.0855	0.1058	0.1006	−1.7236
45 to 49 years	New Cancer Cases	ARIMA	0.0070	0.0592	0.0835	0.0989	0.8396
		LSTM	0.0035	0.0545	0.0590	0.0900	0.2453
		Hybrid	0.0020	0.0370	0.0450	0.0613	0.7002
		Prophet	0.0554	0.1919	0.2355	0.1978	−0.3463
	Cancer Incidence Rate	ARIMA	0.0067	0.0566	0.0823	0.1302	0.2436
		LSTM	0.0122	0.0805	0.1106	0.1737	0.0910
		Hybrid	0.0001	0.0084	0.0100	0.0174	0.9899
		Prophet	0.0135	0.0998	0.1286	0.2087	−1.6021
50 to 54 years	New Cancer Cases	ARIMA	0.0053	0.0660	0.0727	0.0903	0.3711
		LSTM	0.0055	0.0491	0.0742	0.0736	0.2727
		Hybrid	0.0020	0.0328	0.0448	0.0444	0.8250
		Prophet	0.1113	0.3037	0.3336	0.3778	−0.7868
	Cancer Incidence Rate	ARIMA	0.0002	0.0137	0.0144	0.0355	0.4636
		LSTM	0.0074	0.0578	0.0859	0.1351	0.2222
		Hybrid	0.0001	0.0034	0.0039	0.0098	0.9734
		Prophet	0.0320	0.1613	0.1790	0.1922	−3.2580
55 to 59 years	New Cancer Cases	ARIMA	0.0053	0.0542	0.0726	0.0574	0.6032
		LSTM	0.0056	0.0504	0.0750	0.0578	1.2721
		Hybrid	0.0023	0.0368	0.0475	0.0401	0.1101
		Prophet	0.0230	0.1342	0.1518	0.1448	−1.794
	Cancer Incidence Rate	ARIMA	0.0068	0.0542	0.0829	0.0341	0.4939
		LSTM	0.0043	0.0483	0.0655	0.0897	0.6194
		Hybrid	0.0002	0.0112	0.0150	0.0292	0.7621
		Prophet	0.0122	0.0926	0.1106	0.1004	−0.8536
60 to 64 years	New Cancer Cases	ARIMA	0.0044	0.0595	0.0665	0.0642	0.7781
		LSTM	0.0034	0.0463	0.0586	0.0510	0.2373
		Hybrid	0.0023	0.0231	0.0481	0.0262	0.5316
		Prophet	0.0026	0.0406	0.0513	0.0462	0.6347
	Cancer Incidence Rate	ARIMA	0.0048	0.0639	0.0739	0.0402	0.4163
		LSTM	0.0093	0.0861	0.0963	0.0426	0.5478
		Hybrid	0.0002	0.0142	0.0150	0.0182	0.5654
		Prophet	0.0119	0.0987	0.1103	0.0743	0.2763
65 to 69 years	New Cancer Cases	ARIMA	0.0083	0.0662	0.0914	0.0772	0.3804
		LSTM	0.0071	0.0643	0.0840	0.0786	0.2028
		Hybrid	0.0030	0.0473	0.0546	0.0546	0.6826
		Prophet	0.0434	0.1977	0.2083	0.1373	−7.8478
	Cancer Incidence Rate	ARIMA	0.0049	0.0701	0.0867	0.0423	0.7709
		LSTM	0.0008	0.0226	0.0278	0.0313	0.3561
		Hybrid	0.0001	0.0070	0.0083	0.0235	0.6739
		Prophet	0.0069	0.0612	0.0845	0.0679	0.5764
70 to 74 years	New Cancer Cases	ARIMA	0.0080	0.0806	0.0892	0.0931	0.2429
		LSTM	0.0039	0.0520	0.0627	0.0634	0.8156
		Hybrid	0.0031	0.0459	0.0559	0.0620	0.8845
		Prophet	0.0579	0.2302	0.2409	0.1138	−2.4798
	Cancer Incidence Rate	ARIMA	0.0001	0.0101	0.0116	0.0327	0.4240
		LSTM	0.0045	0.0498	0.0672	0.1582	−0.4997
		Hybrid	0.0001	0.0086	0.0117	0.0275	0.2875
		Prophet	0.0056	0.0725	0.0935	0.2287	−1.1505
75 to 79 years	New Cancer Cases	ARIMA	0.0035	0.0524	0.0592	0.0669	0.7772
		LSTM	0.0061	0.0757	0.0778	0.0964	0.5426
		Hybrid	0.0033	0.0512	0.0572	0.0584	0.5831
		Prophet	0.0054	0.0633	0.0732	0.0903	0.4110
	Cancer Incidence Rate	ARIMA	0.0018	0.0218	0.0422	0.0563	0.0188
		LSTM	0.0023	0.0326	0.0479	0.0835	0.3755
		Hybrid	0.0010	0.0205	0.0316	0.0465	0.4501
		Prophet	0.0125	0.0835	0.1120	0.0843	0.5386
80 to 84 years	New Cancer Cases	ARIMA	0.0023	0.0442	0.0476	0.0462	0.9672
		LSTM	0.0024	0.0407	0.0485	0.0428	0.7745
		Hybrid	0.0007	0.0196	0.0261	0.0210	0.8385
		Prophet	0.0267	0.0500	0.0634	0.0575	0.1867
	Cancer Incidence Rate	ARIMA	0.0013	0.0358	0.0363	0.0623	0.2378
		LSTM	0.0042	0.0620	0.0650	0.1096	0.5701
		Hybrid	0.0002	0.0126	0.0154	0.0130	0.6267
		Prophet	0.0076	0.0694	0.0869	0.1163	0.5425
85 to 89 years	New Cancer Cases	ARIMA	0.0040	0.0524	0.0629	0.0538	0.2693
		LSTM	0.0023	0.0398	0.0480	0.0425	0.1433
		Hybrid	0.0019	0.0355	0.0432	0.0376	0.1592
		Prophet	0.0056	0.0568	0.0745	0.0601	−1.6369
	Cancer Incidence Rate	ARIMA	0.0093	0.0950	0.0963	0.2053	0.0619
		LSTM	0.0105	0.0884	0.1027	0.2135	0.2379
		Hybrid	0.0014	0.0286	0.0375	0.0650	0.2018
		Prophet	0.0192	0.1139	0.1386	0.1236	8.1164
90 years and over	New Cancer Cases	ARIMA	0.0099	0.0671	0.0995	0.0737	4.7627
		LSTM	0.0094	0.0799	0.0969	0.0894	1.5906
		Hybrid	0.0032	0.0479	0.0564	0.0536	0.4497
		Prophet	0.0182	0.1257	0.1350	0.1387	−2.9161
	Cancer Incidence Rate	ARIMA	0.0011	0.0258	0.0338	0.0560	0.5235
		LSTM	0.0037	0.0450	0.0607	0.0956	0.4617
		Hybrid	0.0003	0.0155	0.0184	0.0327	0.6609
		Prophet	0.0076	0.0714	0.0874	0.0755	−0.6405
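
As a reading aid for the error table above, the following minimal Python sketch ranks the four models within a single block of rows. The values are copied from the 50 to 54 years, Cancer Incidence Rate rows. The helper name rank_models and the choice of RMSE as the ranking criterion are illustrative assumptions, not the authors' selection procedure.

```python
# Error metrics copied from the "50 to 54 years / Cancer Incidence Rate" rows.
rows = {
    "ARIMA":   {"MSE": 0.0002, "MAE": 0.0137, "RMSE": 0.0144, "MAPE": 0.0355, "R2": 0.4636},
    "LSTM":    {"MSE": 0.0074, "MAE": 0.0578, "RMSE": 0.0859, "MAPE": 0.1351, "R2": 0.2222},
    "Hybrid":  {"MSE": 0.0001, "MAE": 0.0034, "RMSE": 0.0039, "MAPE": 0.0098, "R2": 0.9734},
    "Prophet": {"MSE": 0.0320, "MAE": 0.1613, "RMSE": 0.1790, "MAPE": 0.1922, "R2": -3.2580},
}


def rank_models(rows, metric="RMSE"):
    """Return model names ordered from lowest to highest error (hypothetical helper)."""
    return sorted(rows, key=lambda model: rows[model][metric])


ranking = rank_models(rows)

# Models whose R2 is negative, i.e. below the constant-mean baseline.
below_baseline = [model for model, m in rows.items() if m["R2"] < 0]

print(ranking)
print(below_baseline)
```

Under the usual definition R² = 1 − SS_res/SS_tot, a negative value means the forecast fits the evaluation data worse than a constant prediction at the mean, which is one way to read the negative Prophet entries.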

Table 3. Error rates of sex-based categories for new cancer cases and cancer incidence rate.

Sex	Category	Model	MSE	MAE	RMSE	MAPE	R²
Males	New Cancer Cases	ARIMA	0.0061	0.0506	0.0781	0.0552	0.0908
		LSTM	0.0023	0.0219	0.0481	0.0244	0.5148
		Hybrid	0.0029	0.0364	0.0537	0.0399	0.3304
		Prophet	0.0031	0.0433	0.0557	0.0478	0.4621
	Cancer Incidence Rate	ARIMA	0.0069	0.0728	0.0833	0.0772	0.1768
		LSTM	0.0022	0.0415	0.0465	0.0454	0.0470
		Hybrid	0.0036	0.0491	0.0600	0.0530	0.0662
		Prophet	0.0093	0.0827	0.0967	0.08276	−1.8708
Females	New Cancer Cases	ARIMA	0.0055	0.0484	0.0740	0.0529	0.4091
		LSTM	0.0019	0.0248	0.0436	0.0288	0.3378
		Hybrid	0.0019	0.0259	0.0437	0.0292	0.3402
		Prophet	0.0021	0.0338	0.0463	0.0377	0.5566
	Cancer Incidence Rate	ARIMA	0.0058	0.0714	0.0761	0.0714	0.1504
		LSTM	0.0017	0.0324	0.0413	0.0329	0.2201
		Hybrid	0.0045	0.0606	0.0672	0.0626	0.8291
		Prophet	0.0065	0.0582	0.0805	0.0587	−0.1023
Both Sexes	New Cancer Cases	ARIMA	0.0054	0.0445	0.0741	0.0489	0.5487
		LSTM	0.0021	0.0226	0.0454	0.0258	0.4082
		Hybrid	0.0024	0.0297	0.0493	0.0330	0.3410
		Prophet	0.0032	0.0478	0.0562	0.0532	0.3917
	Cancer Incidence Rate	ARIMA	0.0036	0.0520	0.0603	0.0532	0.0001
		LSTM	0.0013	0.0292	0.0366	0.0301	0.0165
		Hybrid	0.0038	0.0562	0.0618	0.0592	0.3813
		Prophet	0.0075	0.0679	0.0867	0.0703	−0.6117
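
The five error columns reported in these tables can be computed from a pair of observed and forecast series. The sketch below uses the standard textbook definitions and assumes that MAPE is expressed as a fraction rather than a percentage, as the magnitudes in the tables suggest. The function name forecast_errors and the sample series are hypothetical and are not taken from the study's data or code.

```python
import math


def forecast_errors(y_true, y_pred):
    """Return MSE, MAE, RMSE, MAPE (as a fraction) and R2 for two equal-length series."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("series must be non-empty and of equal length")
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]

    mse = sum(r * r for r in residuals) / n
    mae = sum(abs(r) for r in residuals) / n
    rmse = math.sqrt(mse)
    # MAPE is undefined when an observed value is zero; this sketch assumes none are.
    mape = sum(abs(r / t) for r, t in zip(residuals, y_true)) / n

    # R2 = 1 - SS_res / SS_tot; assumes the observed series is not constant.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    ss_res = sum(r * r for r in residuals)
    r2 = 1.0 - ss_res / ss_tot

    return {"MSE": mse, "MAE": mae, "RMSE": rmse, "MAPE": mape, "R2": r2}


# Hypothetical series for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
errors = forecast_errors(observed, predicted)
print(errors)
```

Because RMSE is the square root of MSE by definition, the two columns of any row should agree up to rounding, which gives a quick consistency check when reading the tables.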

Table 4. Predicted number of new cancer cases and cancer incidence rate by region in Canada.

Geography	Category	2022	2023	2024	2025	2026
Alberta	New Cancer Cases	23,907	24,387	24,702	24,975	25,927
Alberta	Cancer Incidence Rate	443.8	462.2	461.7	433.7	458.9
British Columbia	New Cancer Cases	31,801	32,438	32,729	32,728	33,456
British Columbia	Cancer Incidence Rate	531.3	539.5	540.6	537.9	545.7
Manitoba	New Cancer Cases	8181	8300	8374	8418	8530
Manitoba	Cancer Incidence Rate	505.3	516.0	517.2	513.1	509.3
New Brunswick	New Cancer Cases	5959	5999	6033	6076	6187
New Brunswick	Cancer Incidence Rate	656.8	657.3	648.5	647.9	662.2
Newfoundland and Labrador	New Cancer Cases	4171	4116	4062	3976	4012
Newfoundland and Labrador	Cancer Incidence Rate	693.4	690.7	678.8	650.3	596.1
Northwest Territories	New Cancer Cases	241	237	244	240	263
Northwest Territories	Cancer Incidence Rate	389.0	400.4	396.4	375.7	341.8
Nova Scotia	New Cancer Cases	7467	7434	7445	7511	7562
Nova Scotia	Cancer Incidence Rate	693.5	674.2	652.9	652.6	653.8
Nunavut	New Cancer Cases	84	79	77	79	81
Nunavut	Cancer Incidence Rate	200.1	196.7	188.3	179.4	173.0
Ontario	New Cancer Cases	94,677	94,763	94,975	95,398	97,951
Ontario	Cancer Incidence Rate	582.2	573.6	555.0	545.0	543.7
Prince Edward Island	New Cancer Cases	1102	1111	1157	1161	1181
Prince Edward Island	Cancer Incidence Rate	620.0	620.8	626.4	632.8	636.9
Quebec	New Cancer Cases	63,803	63,841	63,932	64,073	64,264
Quebec	Cancer Incidence Rate	675.7	668.7	669.5	671.8	670.4
Saskatchewan	New Cancer Cases	6585	6636	6666	6624	6716
Saskatchewan	Cancer Incidence Rate	491.8	495.5	496.5	494.9	493.7
Yukon	New Cancer Cases	199	198	188	183	188
Yukon	Cancer Incidence Rate	445.4	464.5	445.8	393.0	333.6

Table 5. Predicted number of new cancer cases and cancer incidence rate by age groups in Canada.

Age Group	Category	2022	2023	2024	2025	2026
0 to 4 years	New Cancer Cases	453	454	452	450	450
0 to 4 years	Cancer Incidence Rate	23.2	23.4	23.4	23.4	23.4
5 to 9 years	New Cancer Cases	280	270	265	264	266
5 to 9 years	Cancer Incidence Rate	13.8	13.9	13.8	13.8	13.8
10 to 14 years	New Cancer Cases	286	290	291	286	290
10 to 14 years	Cancer Incidence Rate	13.2	13.5	13.3	13.4	13.5
15 to 19 years	New Cancer Cases	534	531	535	537	530
15 to 19 years	Cancer Incidence Rate	24.4	24.1	24.1	24.1	24.2
20 to 24 years	New Cancer Cases	937	929	936	930	935
20 to 24 years	Cancer Incidence Rate	36.4	35.7	35.8	35.9	35.9
25 to 29 years	New Cancer Cases	1605	1625	1633	1665	1715
25 to 29 years	Cancer Incidence Rate	59.6	60.1	60.4	60.5	60.3
30 to 34 years	New Cancer Cases	2514	2526	2536	2545	2555
30 to 34 years	Cancer Incidence Rate	91.4	90.9	90.7	91.1	91.5
35 to 39 years	New Cancer Cases	3960	4040	4073	4095	4179
35 to 39 years	Cancer Incidence Rate	137.6	136.9	135.8	135.5	136.0
40 to 44 years	New Cancer Cases	5660	5678	5705	5732	5808
40 to 44 years	Cancer Incidence Rate	210.2	210.4	210.1	209.5	209.3
45 to 49 years	New Cancer Cases	8595	8489	8379	8287	8234
45 to 49 years	Cancer Incidence Rate	315.5	314.2	315.8	316.5	315.9
50 to 54 years	New Cancer Cases	15,325	14,930	14,509	14,018	13,488
50 to 54 years	Cancer Incidence Rate	509.6	510.0	510.6	510.9	510.9
55 to 59 years	New Cancer Cases	22,931	22,845	22,740	22,576	22,228
55 to 59 years	Cancer Incidence Rate	775.0	775.4	774.7	774.8	774.6
60 to 64 years	New Cancer Cases	32,186	32,419	32,481	32,641	33,517
60 to 64 years	Cancer Incidence Rate	1140.0	1144.1	1142.8	1139.6	1140.5
65 to 69 years	New Cancer Cases	38,617	39,032	39,397	39,965	41,398
65 to 69 years	Cancer Incidence Rate	1604.8	1608.2	1604.8	1603.0	1603.0
70 to 74 years	New Cancer Cases	39,764	40,584	41,431	42,145	43,771
70 to 74 years	Cancer Incidence Rate	2040.0	2037.9	2037.4	2033.9	2031.8
75 to 79 years	New Cancer Cases	31,024	31,431	31,890	32,303	33,313
75 to 79 years	Cancer Incidence Rate	2362.2	2369.2	2372.2	2372.2	2368.4
80 to 84 years	New Cancer Cases	22,494	22,578	22,682	22,787	22,883
80 to 84 years	Cancer Incidence Rate	2590.6	2591.1	2590.8	2588.9	2586.9
85 to 89 years	New Cancer Cases	14,756	15,094	15,181	15,342	15,602
85 to 89 years	Cancer Incidence Rate	2667.2	2671.3	2671.5	2672.3	2674.6
90 years and over	New Cancer Cases	8185	8311	8432	8550	8685
90 years and over	Cancer Incidence Rate	2446.3	2449.3	2444.8	2437.5	2440.8
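
To relate the age-group forecasts to the reported increase among individuals aged 60 and older, the 2022 and 2026 new-case counts for those groups can be summed directly. The sketch below is a worked arithmetic example using only figures from the table above; the dictionary layout and shortened group labels are illustrative choices.

```python
# (2022, 2026) forecast new cancer cases for the age groups from 60 upward,
# copied from the age-group forecast table.
cases_60_plus = {
    "60-64": (32_186, 33_517),
    "65-69": (38_617, 41_398),
    "70-74": (39_764, 43_771),
    "75-79": (31_024, 33_313),
    "80-84": (22_494, 22_883),
    "85-89": (14_756, 15_602),
    "90+":   (8_185, 8_685),
}

total_2022 = sum(start for start, _ in cases_60_plus.values())
total_2026 = sum(end for _, end in cases_60_plus.values())

# Relative change over the forecast horizon, in percent.
pct_increase = 100.0 * (total_2026 - total_2022) / total_2022

print(total_2022, total_2026, round(pct_increase, 2))  # 187026 199169 6.49
```

By contrast, the forecast counts for the 45 to 59 age groups decline over the same period, so the overall growth in cases is concentrated in the older groups.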

Table 6. Predicted number of new cancer cases and cancer incidence rate (per 100,000) by sex in Canada.

Sex	Category	2022	2023	2024	2025	2026
Males	New Cancer Cases	129,012	129,515	130,009	131,690	133,088
Males	Cancer Incidence Rate	600.6	598.9	599.7	600.8	602.3
Females	New Cancer Cases	119,514	120,428	121,119	121,614	123,904
Females	Cancer Incidence Rate	540.1	539.2	537.3	534.4	530.6
Both Sexes	New Cancer Cases	244,007	244,941	245,242	245,051	248,705
Both Sexes	Cancer Incidence Rate	567.6	566.3	565.3	563.8	561.8
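
The change in forecast new cases between 2022 and 2026 can be read off the sex-based table with simple arithmetic. The following sketch computes the percentage change for each series and also compares the sum of the male and female forecasts with the Both Sexes forecast. The helper name pct_change is hypothetical, and the remark about separate forecasting is an assumption about why the totals differ, not a statement from the source.

```python
# (2022, 2026) forecast new cancer cases copied from the sex-based forecast table.
forecast = {
    "Males": (129_012, 133_088),
    "Females": (119_514, 123_904),
    "Both Sexes": (244_007, 248_705),
}


def pct_change(start, end):
    """Percentage change from start to end (hypothetical helper)."""
    return 100.0 * (end - start) / start


changes = {name: pct_change(start, end) for name, (start, end) in forecast.items()}

# Difference between the summed male and female forecasts and the Both Sexes forecast.
# Assumption: the series may have been forecast separately, so they need not add up.
gap_2022 = forecast["Males"][0] + forecast["Females"][0] - forecast["Both Sexes"][0]
gap_2026 = forecast["Males"][1] + forecast["Females"][1] - forecast["Both Sexes"][1]

for name, value in changes.items():
    print(f"{name}: {value:.2f}%")
print(gap_2022, gap_2026)
```

The Both Sexes series rises by roughly 1.9% over the four years, consistent with the increase from about 244,000 to 248,700 cases stated in the abstract.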

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

3.5.5. Coefficient of Determination (R²)