3.2.1. Partial Least Squares (PLS) Regression
For the predictive modelling portion of the study, elemental, mineralogical and ore composition data, coupled with depositional type, were retained for a total of 46 independent (explanatory) variables, including 5 composite variables (e.g., total clays). The depositional setting variable was one-hot encoded, i.e., each category was mapped to a binary indicator vector [74]. Total bitumen recovery was reserved as the lone dependent (response) variable for the sake of this conceptual study. A PLS regression algorithm was written in Python and follows well-established theory [43,75,76]. The methodology begins by calculating the SVD of the correlation matrix $X^{T}Y$, as described for Equations (1) and (2), and iteratively computing sets of orthogonal latent variables with the corresponding regression weights.
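The one-hot encoding of the depositional setting described above can be illustrated with a minimal sketch. The category labels below are hypothetical placeholders, and the helper `one_hot` is introduced here for illustration only; it is not the study's code.

```python
def one_hot(labels):
    """Return (categories, rows); each row is a 0/1 indicator vector."""
    categories = sorted(set(labels))
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for label in labels:
        row = [0] * len(categories)
        row[index[label]] = 1      # exactly one active position per sample
        rows.append(row)
    return categories, rows

# Hypothetical depositional-setting labels for four samples
settings = ["marine", "estuarine", "marine", "estuarine"]
categories, encoded = one_hot(settings)
```

Each sample ends up with exactly one non-zero entry, so no artificial ordering is imposed on the categories.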
During each successive iteration, the first left and right singular vectors ($w$ and $c$) are used as weight vectors to calculate sets of scores ($t$ and $u$) for $X$ and $Y$, respectively; loadings are then obtained by regressing $X$ and $Y$ against the same vector $t$ ($p$ and $q$) [75]. The last step of the iteration “deflates” the current data matrices (i.e., removes information related to the $\ell$th latent variable) by subtracting the outer products $tp^{T}$ and $tq^{T}$ from $X$ and $Y$, respectively [75]. The next component (or latent variable) can then be calculated starting from the SVD of the cross-product of the newly deflated matrices ($X_{\ell+1}$ and $Y_{\ell+1}$). The process continues until $X$ is completely decomposed into $L$ components and a null matrix is obtained. After each iteration, vectors $w$, $t$, $p$ and $q$ are stored as columns in their respective matrices $W$, $T$, $P$ and $Q$. The matrix of regression coefficients ($B$) can then be calculated as:
$$B = W\left(P^{T}W\right)^{+}Q^{T} \quad (3)$$
where $\left(P^{T}W\right)^{+}$ is in fact the Moore-Penrose pseudoinverse for the generalized case of a non-symmetric matrix [76]. Finally, the matrix of regression coefficients ($B$) is multiplied by the original set of independent variables prior to any deflations (matrix $X$) to obtain the predictions of the dependent variables (matrix $\hat{Y}$) [43]. Several criteria can be calculated to select the appropriate number of components to keep while limiting loss of significance, to evaluate the quality of prediction and to validate the model (e.g., through cross-validation).
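The iterative procedure described above can be sketched in NumPy as follows. This is an illustrative reading of the text rather than the authors' implementation: the function name `pls_svd`, the variable names and the synthetic data are assumptions, and both data matrices are assumed to be column-centred beforehand.

```python
import numpy as np

def pls_svd(X, Y, n_components):
    """SVD-based PLS regression sketch; X and Y are assumed column-centred."""
    Xk, Yk = X.astype(float).copy(), Y.astype(float).copy()
    W, T, P, Q = [], [], [], []
    for _ in range(n_components):
        # First left/right singular vectors of the current cross-product matrix
        U, s, Vt = np.linalg.svd(Xk.T @ Yk, full_matrices=False)
        w, c = U[:, 0], Vt[0, :]
        t = Xk @ w                     # X-scores (Y-scores would be Yk @ c)
        tt = t @ t
        p = Xk.T @ t / tt              # X-loadings: regress Xk on t
        q = Yk.T @ t / tt              # Y-loadings: regress Yk on the same t
        Xk = Xk - np.outer(t, p)       # deflation of X
        Yk = Yk - np.outer(t, q)       # deflation of Y
        W.append(w); T.append(t); P.append(p); Q.append(q)
    W, T, P, Q = (np.column_stack(m) for m in (W, T, P, Q))
    B = W @ np.linalg.pinv(P.T @ W) @ Q.T   # regression coefficients
    return B, W, T, P, Q

# Synthetic demonstration data (not from the study)
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3)); X -= X.mean(axis=0)
Y = X @ np.array([[1.0], [-2.0], [0.5]]) + 0.1 * rng.normal(size=(20, 1))
Y -= Y.mean(axis=0)
B, W, T, P, Q = pls_svd(X, Y, n_components=3)
Y_hat = X @ B                           # predictions from the undeflated X
```

When all components of a full-rank $X$ are retained, the coefficients coincide with ordinary least squares; retaining fewer components is what gives PLS its regularizing effect.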
Validation is critical to the development of robust predictive models; the quality of prediction must be assessed, and model significance also determined. A common measure of prediction quality is called the residual estimated sum of squares (RESS) and is calculated as follows:
$$\mathrm{RESS} = \left\| Y - \hat{Y} \right\|^{2} \quad (4)$$
where $\left\| \cdot \right\|^{2}$ is the squared matrix norm; RESS decreases as prediction quality improves [43]. However, RESS alone is not the most useful metric, as it will continue to decrease until all components are added, i.e., it does not detect overfitting. An improved measure for quality of prediction is the predicted residual estimated sum of squares (PRESS), computed as follows:
$$\mathrm{PRESS} = \left\| Y - \tilde{Y} \right\|^{2} \quad (5)$$
where $\tilde{Y}$ represents a predicted set of values generated from cross-validation; PRESS also decreases with increasing prediction quality [43]. The selection of the optimal number of components to extract is crucial to avoid overfitting the data. Since prediction quality typically first increases then decreases upon successive component addition, a possible approach is to begin with the first component and stop as soon as the PRESS reverses direction [31]. A more intricate method is to compute the $Q^{2}$ criterion for the $\ell$th component, as follows:
$$Q^{2}_{\ell} = 1 - \frac{\mathrm{PRESS}_{\ell}}{\mathrm{RESS}_{\ell-1}} \quad (6)$$
and compare it against an arbitrary critical value; only components with a $Q^{2}_{\ell}$ value greater than or equal to this threshold are generally kept in the model [31,33].
Because the available dataset is limited to only 60 samples, it was decided not to split the data into separate training and validation sets; instead, leave-one-out cross-validation (LOOCV), also called the “jackknife” method, was utilized. In this technique, each observation is iteratively dropped from the set, and the remaining observations then comprise a training set used to estimate the left-out observation. All estimated observations are stored in a final matrix denoted $\tilde{Y}$, which then serves as the validation set for subsequent prediction quality metrics (e.g., PRESS and $Q^{2}$ criteria) [31].
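The leave-one-out procedure and the resulting RESS, PRESS and $Q^{2}$ statistics can be sketched as below. Several choices are assumptions made for illustration: the data are synthetic, the helper names are hypothetical, $\mathrm{RESS}_{0}$ is taken as the total sum of squares of the response about its mean, and a compact single-response PLS is used (with one response column, the first left singular vector of $X^{T}y$ reduces to $X^{T}y$ scaled to unit length).

```python
import numpy as np

def pls1_coefficients(X, y, n_components):
    """Compact PLS for a single centred response y and centred X."""
    Xk, yk = X.copy(), y.copy()
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)         # unit-length weight vector
        t = Xk @ w
        tt = t @ t
        p = Xk.T @ t / tt
        q.append(yk @ t / tt)
        Xk = Xk - np.outer(t, p)       # deflate X
        yk = yk - t * q[-1]            # deflate y
        W.append(w); P.append(p)
    W, P = np.column_stack(W), np.column_stack(P)
    return W @ np.linalg.pinv(P.T @ W) @ np.array(q)

def ress_press(X, y, n_components):
    """Return (RESS, PRESS); PRESS uses leave-one-out predictions."""
    n = len(y)
    xm, ym = X.mean(axis=0), y.mean()
    b = pls1_coefficients(X - xm, y - ym, n_components)
    ress = np.sum((y - (ym + (X - xm) @ b)) ** 2)
    y_loo = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i       # drop observation i
        xm, ym = X[keep].mean(axis=0), y[keep].mean()
        b = pls1_coefficients(X[keep] - xm, y[keep] - ym, n_components)
        y_loo[i] = ym + (X[i] - xm) @ b
    return ress, np.sum((y - y_loo) ** 2)

def q2_criterion(X, y, max_components):
    """Q2 for each component: 1 - PRESS_l / RESS_(l-1)."""
    ress_prev = np.sum((y - y.mean()) ** 2)   # RESS_0 (assumption)
    values = []
    for a in range(1, max_components + 1):
        ress, press = ress_press(X, y, a)
        values.append(1.0 - press / ress_prev)
        ress_prev = ress
    return values

# Synthetic demonstration data (not from the study)
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 4))
y = X @ np.array([1.0, 2.0, 0.0, -1.0]) + 0.1 * rng.normal(size=40)
q2_values = q2_criterion(X, y, max_components=3)
```

Centring is recomputed inside every fold so that the left-out observation never influences the model used to predict it.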
The PLS regression model was run sequentially, and a series of quality of prediction statistics were tabulated for each of the 1, 2, 3, 4, 5 and 10 component scenarios (Figure 6). The $Q^{2}$ criterion indicates that only the first component should be kept in the prediction model, with a value of 0.28, as all ensuing trials resulted in values less than zero. However, not only was the coefficient of determination ($R^{2}$) quite low for the 1-component scenario (0.34), but root mean squared error (RMSE) and mean absolute residual values were also relatively high. Furthermore, the first component alone accounts for only ~88% of the total model variance, as determined by the sum of squares of the singular values. As a result, the behaviour of the PRESS statistic was tracked upon successive trials in order to identify an improved fit; ultimately, a total of 5 components was deemed appropriate for building the regression model in relation to the available dataset. This was based upon the fact that the PRESS value trended upwards over the first 4 components but dropped significantly upon addition of the fifth; this reversal also coincided with a much higher $R^{2}$ score of 0.72, improved (decreased) RMSE and residual values and an explained variance of 99.65%. Further addition of successive components (e.g., 10 components) did not greatly improve prediction accuracy or error metrics, resulted in poorer PRESS and $Q^{2}$ statistics and would likely lead to severe overfitting to the present dataset. It is also noteworthy that residuals were consistently greater for marine samples, which indicates greater variability in the predicted set for this depositional type (as expected).
Predictions from the final 5-component model are shown in Figure 7, and comparative descriptive statistics for the observed and predicted datasets are shown in Table 3. Estimated bitumen recoveries were capped at 100%, and negative values were set to zero, as crossing these thresholds is impossible in practice. The predicted values are generally quite reasonable, within ~11% for the (fluvio-)estuarine samples and ~13% for marine samples on average. This level of error (RMSE of 16.33) is not surprising on account of the assumptions made to finalize the original dataset, in addition to the significant geological variability inherent to oil sands deposits. As expected, the variability in marine sample residuals (standard deviation of 9.89%) is nearly double that of estuarine samples (standard deviation of 5.27%) and can likely be attributed to heterogeneities in clay contents and especially clay types. Overall, the PLS regression model has performed as intended, despite relying on only 60 samples drawn from unknown and/or different mining projects. This highlights the importance of rigorous sampling campaigns and characterization of appropriate geometallurgical profiles towards the development of robust predictive models, particularly for complex operations dealing with multiple and/or heterogeneous ore feeds. It is postulated that the predictive power of the present model would be greatly increased with these controls in place.
Once the regression model has been finalized with the appropriate number of components, confidence intervals for the predicted values can be calculated using the “bootstrap” cross-validation method. This technique involves the random re-sampling of the original observations with replacement, i.e., each observation can be selected zero or multiple times [43]. This is repeated many times (e.g., 1000 or 10,000), and regression coefficients and corresponding predictions are computed for each bootstrapped sample set. The distribution of predicted values from all of these iterations is then used to estimate confidence limits for each variable; intervals that do not span zero (positive or negative) are considered significant [43]. Similarly, bootstrap ratios can be calculated by dividing the mean of each distribution by its standard deviation; akin to a Student's t-test, if the ratio is greater than a critical value (e.g., >2, corresponding to an alpha value of approximately 0.05), the variable is also considered significant [43].
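The bootstrap procedure described above can be sketched as follows. To keep the example short, ordinary least squares (`np.linalg.lstsq`) stands in for the PLS fit; any routine returning a coefficient vector could be passed as `fit`. The function names, the synthetic data and the percentile-based 95% limits are assumptions for illustration.

```python
import numpy as np

def bootstrap_coefficients(X, y, fit, n_boot=1000, seed=0):
    """Bootstrap ratios and percentile confidence limits for coefficients."""
    rng = np.random.default_rng(seed)
    n = len(y)
    coefs = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)      # resample rows with replacement
        coefs[b] = fit(X[idx], y[idx])
    mean, std = coefs.mean(axis=0), coefs.std(axis=0, ddof=1)
    ratio = mean / std                        # bootstrap ratio
    lower, upper = np.percentile(coefs, [2.5, 97.5], axis=0)
    # Significant if |ratio| > 2 and the 95% interval does not span zero
    significant = (np.abs(ratio) > 2) & ((lower > 0) | (upper < 0))
    return ratio, lower, upper, significant

def ols_fit(X, y):
    """Stand-in fitting routine (ordinary least squares, not PLS)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Synthetic demonstration: only the first variable truly drives y
rng = np.random.default_rng(2)
X = rng.normal(size=(60, 3))
y = 2.0 * X[:, 0] + 0.5 * rng.normal(size=60)
ratio, lower, upper, significant = bootstrap_coefficients(X, y, ols_fit)
```

Requiring both criteria, as in the text, is a conservative choice: a variable is flagged only when the ratio and the interval agree.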
Table 4 provides statistics computed from the distribution of 10,000 bootstrap sample sets generated from the 5-component regression model; variable significance was determined based on both bootstrap ratios and 95% confidence intervals. Of the elemental composition variables, only Na, Ca and Mg were deemed insignificant. Corresponding insignificant minerals include albite for Na; gypsum, bassanite and anorthite for Ca; chlorite for Mg; the carbonates (calcite, dolomite and ankerite) for both Ca and Mg. Interestingly, both pyrite and amorphous Fe-oxides/hydroxides were also considered insignificant (oil sands tend to contain significant heavy metals). All remaining elements and mineral phases, in addition to depositional sample type and bitumen recovery, were determined as statistically significant.
The relationships between the independent variables can be observed visually by plotting the stored $X$-loadings (matrix $P$) for the first two components (Figure 8). Bitumen content is clearly most strongly linked to elemental carbon and organic carbon (as expected); it also appears in association with silicon (quartz–silica–cristobalite), sulphur (organic sulphur), titanium minerals (rutile and ilmenite) and lepidolite (Li-rich mica). The first dimension also opposes the bitumen group to the clay minerals, water content and carbonates (siderite). Notably, anatase (metastable form of TiO₂) plots opposite the other Ti-bearing phases. In the second dimension, the organic-related groups (bitumen, carbon and sulphur) clearly oppose the related silicate and oxide minerals; there is also a broad separation between silicates and carbonate + iron-bearing phases.
Overall, the PLS regression model has proven to be a powerful prediction tool, capable of providing additional useful information regarding process variables that can help drive the characterization of geometallurgical profiles, sampling methodologies and other planning processes.
3.2.2. Discrete Event Simulations
For the DES portion of the study, two ore types were classified according to documented depositional setting and predicted bitumen recoveries from the 5-component PLS regression model. Ore type 1 consists entirely of marine samples (generally <75% recovery), and ore type 2 includes (fluvio-)estuarine samples (>75% recovery) as well as a few of marine type with recoveries also greater than 75%. Due to the limited nature of the dataset (only 3 samples per mining parcel), natural background noise was added to the relative proportions of ore types 1 and 2 via random number generation with a standard deviation of 5%. Two modes of operation (A and B) are considered here to balance stockpile levels against bitumen extraction rates and incoming ore feed from mining. While the conceptual mine has been operating in areas predominantly containing ore type 2 (favourable due to higher grades and recoveries) for some time, a large expansion of reserves comprising mainly ore type 1 has recently been completed. With the expansion, longer term forecasts suggest an overall deposit composition of 55–45% for ore types 1 and 2, respectively, with increased variability caused by geological heterogeneities; these values correspond to the average proportions determined from ore classification based on the predictive modelling.
In order to sustain the availability of ore type 2 and improve the economics of certain portions of the newly expanded area, the operation is evaluating possible adjustments to current blending strategies and intends to implement a secondary alternate mode. Based on the geological attributes of ore types 1 (high fines) and 2 (high bitumen, low fines), the new strategy will also serve to control the distribution of solids in ore feeds to the froth treatment plant, thereby improving amenability to transport via pipelines to the upgrader. As a result, operational Mode A will consist of an approximate 40–60 blend of ores 1 and 2. Because ore type 2 will generally be in shorter supply, a second operational Mode B, consisting of an 80–20 blend of ores 1 and 2, is needed in order to avoid stockouts (i.e., an eventual shortage). This will ultimately stabilize feed balances, optimize equipment/infrastructure selection and utilization and allow for improved production scheduling; collectively, these factors can lead to significant reductions in operating and capital costs.
Both modes are expected to perform similarly in terms of downstream bitumen recovery processes, except that Mode B requires a pre-treatment stage to control excess chloride ions related to the marine origin and high fines content of ore type 1. Consequently, bitumen extraction rates for Mode B are set 10% lower than those for Mode A; modal parameters for each configuration are summarized in Table 5. Despite the fact that Mode A is both more productive and economical, ore stockouts would be inevitable over extended periods of usage because the weight fraction of ore type 2 in the Mode A blend ($w_{2A}$, 60%) is 15 percentage points higher than that of the deposit ($w_{2D}$, 45%). To account for the possibility of stockouts prior to a planned shutdown, contingency modes with adjusted configuration rates have been incorporated for each of Modes A and B.
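Appropriate weight fractions ($w_{1A}$, $w_{2A}$, $w_{1B}$, $w_{2B}$) and throughput rates ($r_{A}$, $r_{B}$) are assessed with respect to geological estimations ($w_{1D}$, $w_{2D}$) using deterministic mass balancing, as follows [15,17]:
$$t_{A}r_{A}w_{iA} + t_{B}r_{B}w_{iB} = \left(t_{A}r_{A} + t_{B}r_{B}\right)w_{iD}, \quad i = 1, 2 \quad (7)$$
where $t_{A}$ and $t_{B}$ denote the time elapsed under Modes A and B, respectively. Average throughput between the two modes, or similarly between each mode and its corresponding contingency configuration, can then be computed as follows [15,17]:
$$\bar{r} = \frac{t_{A}r_{A} + t_{B}r_{B}}{t_{A} + t_{B}} \quad (8)$$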
Equations (7) and (8), which ignore the risk of stockout, indicate that Mode A should be applied 1.5 times as often as Mode B, with an average throughput of 28,800 t/h. The framework aims to simultaneously maximize throughput and minimize target stockpile levels, thereby increasing production efficiency and reducing overall costs; larger stockpiles necessitate larger storage areas and equipment, as well as increased handling costs.
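The deterministic figures quoted above can be reproduced with a short calculation. The sketch assumes the mass balance takes the form of a time-weighted blend of the two modes matching the deposit composition; the rates (30 kt/h for Mode A, 27 kt/h for Mode B) and the weight fractions are taken from the surrounding text, whereas the function names are hypothetical.

```python
def mode_time_ratio(w2A, w2B, w2D, rA, rB):
    """t_A / t_B for which ore type 2 consumption matches the deposit."""
    # From t_A*r_A*(w2A - w2D) = t_B*r_B*(w2D - w2B)
    return rB * (w2D - w2B) / (rA * (w2A - w2D))

def average_throughput(ratio, rA, rB):
    """Time-weighted mean throughput for a given t_A / t_B ratio."""
    return (ratio * rA + rB) / (ratio + 1.0)

# Mode A: 40-60 blend at 30,000 t/h; Mode B: 80-20 blend at 27,000 t/h;
# deposit: 55-45 split between ore types 1 and 2
ratio = mode_time_ratio(w2A=0.60, w2B=0.20, w2D=0.45, rA=30_000, rB=27_000)
r_avg = average_throughput(ratio, rA=30_000, rB=27_000)
print(round(ratio, 3), round(r_avg))   # 1.5 28800
```

Because the two weight fractions of each blend sum to one, balancing ore type 2 automatically balances ore type 1.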
The current framework is designed such that mining rates exceed plant capacity; hence, the plant acts as a bottleneck. To ensure stockpiles are adequately supplied to maintain consistent ore feed to the plant, ore will be mined at minimum rates of 30 kt/h under Mode A and 27 kt/h under Mode B. The target total stockpile level is a control variable that remains constant (except during extended stockout periods); however, the relative proportions of ore types 1 and 2 fluctuate depending on the active operational mode. Mode A (productive phase) causes a relative decrease in the proportion of ore type 2, whereas Mode B (replenishment phase) has the opposite effect. The selection of operational mode is based on the stockpile level of the limiting ore type (in this case, ore 2) at the end of each production campaign, during the planned shutdowns that occur every 4 weeks.
Under the present framework (Table 5), a naïve analysis indicates a critical threshold of 2.916 Mt for ore type 2; this level is the product of the campaign length (27 days) and the net rate of depletion of ore type 2 under Mode A (108,000 t/d, i.e., the plant capacity of 720,000 t/d multiplied by the difference between the 60% ore 2 fraction of the 40–60 blend and the deposit fraction $w_{2D}$ of 45%). Similarly, the analysis indicates a minimum total target stockpile level (sum of ores 1 and 2) of 4.374 Mt, determined from the maximum rate of change between ore stockpiles 1 and 2 (under either mode) over the campaign duration. However, the digital twin is subject to the geological uncertainty of the ore, which is not taken into account by Equations (7) and (8). Unexpected fluctuations in ore feed attributes can indeed cause either overages or shortfalls for a given ore type, potentially leading to stockout towards the end of a production campaign [15]. To mitigate this risk, an operational buffer can be introduced by raising the threshold for the critical (limiting) ore type; a similar control measure would be to raise the target total stockpile level.
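The two deterministic stockpile levels can be checked with simple arithmetic. The sketch below assumes that mining supplies each ore type at the plant rate multiplied by the deposit fraction, so that the net draw-down of a stockpile is the plant rate times the difference between blend and deposit fractions; this reading reproduces both quoted values but is an interpretation, and the names are hypothetical.

```python
HOURS_PER_DAY = 24
CAMPAIGN_DAYS = 27

def net_depletion(rate_t_per_h, w_blend, w_deposit):
    """Daily net draw-down (t/d) of one ore type's stockpile."""
    return rate_t_per_h * HOURS_PER_DAY * (w_blend - w_deposit)

# Mode A (30,000 t/h) over-consumes ore type 2: 60% in blend vs 45% in deposit
ore2_under_A = net_depletion(30_000, 0.60, 0.45)    # about 108,000 t/d
# Mode B (27,000 t/h) over-consumes ore type 1: 80% in blend vs 55% in deposit
ore1_under_B = net_depletion(27_000, 0.80, 0.55)    # about 162,000 t/d

critical_ore2 = CAMPAIGN_DAYS * ore2_under_A                    # about 2.916 Mt
total_target = CAMPAIGN_DAYS * max(ore2_under_A, ore1_under_B)  # about 4.374 Mt
```

Under this reading, the total target is governed by Mode B, whose draw-down of ore type 1 is the faster of the two rates.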
Because stockouts are nonetheless a real possibility, recourse actions are built into the digital twin to maintain ore feed consistency. These recourse actions depend on the timing of stockout; if an ore type is depleted during a production campaign, a contingency mode is enacted that allows the exhausted stockpile to build back up. As indicated in Table 5, Contingency Mode A only consumes ore type 1, and Contingency Mode B only consumes ore type 2. These contingency modes are much less productive than the regular configuration rates (65% for Mode A and 50% for Mode B); as a result, the duration of these segments has been limited to 1 day, which causes alternations until the next planned shutdown. If the critical ore level remains below the selected threshold at the end of a campaign, the plant will employ the alternate mode of operation to re-equilibrate stockpile levels. Time segment parameters for production campaigns, shutdowns and contingency mode duration are summarized in Table 6.
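The decision rules described above can be summarized in a small sketch. It is a simplified reading, not the logic of the Arena model: the function names are hypothetical, the comparison at the threshold (greater than or equal) is an assumption, and only the stockout expected under each mode is handled.

```python
def select_campaign_mode(ore2_level, critical_ore2):
    """At a planned shutdown: Mode A if the limiting ore (type 2) is at or
    above its critical threshold, otherwise the alternate Mode B."""
    return "A" if ore2_level >= critical_ore2 else "B"

def select_daily_mode(campaign_mode, ore1_level, ore2_level):
    """During a campaign: fall back to a 1-day contingency segment when the
    ore type that the active mode depletes has run out."""
    if campaign_mode == "A" and ore2_level <= 0:
        return "Contingency A"    # consumes only ore type 1
    if campaign_mode == "B" and ore1_level <= 0:
        return "Contingency B"    # consumes only ore type 2
    return campaign_mode

# Example: ore 2 stockpile below the deterministic 2.916 Mt threshold
next_mode = select_campaign_mode(ore2_level=2.5e6, critical_ore2=2.916e6)
```

Re-evaluating `select_daily_mode` each day is what would produce the alternation between regular and contingency segments until the next shutdown.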
The current framework was implemented, and subsequent computational results (Table 7 and Table 8, Figure 9 and Figure 10) generated, using commercial DES software (Rockwell Arena©) with Visual Basic for Applications (VBA). Extended operating periods can be simulated to assess system performance in response to geological uncertainty, with adjustments made to the critical ore and target stockpile levels as control variables. In its present configuration, the simulation model assumes that ore is mined to completion from a single parcel at a time. The framework has the flexibility to incorporate geological uncertainty by reading data from external source files. For the purposes of this study, uncertainty was introduced through Monte Carlo simulation; the proportions of ore types 1 and 2, determined from the classification of mining parcels based on depositional setting and predicted recoveries, were used to generate 100 statistical replicas through random number generation with a standard deviation of 5%. The model is configured such that 792 Mt of ore are processed within each replica, corresponding to approximately 1200 days of operation.
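The generation of statistical replicas can be illustrated with a short sketch. The text specifies only random number generation with a standard deviation of 5%; the Gaussian distribution, the clipping to the interval [0, 1], the parcel fractions and the function names below are all assumptions made for illustration.

```python
import random

def perturb_parcel(w1, sd, rng):
    """Add Gaussian noise to the ore type 1 fraction and clip to [0, 1];
    the ore type 2 fraction is the complement."""
    w1_noisy = min(1.0, max(0.0, rng.gauss(w1, sd)))
    return w1_noisy, 1.0 - w1_noisy

def make_replicas(parcel_w1, n_replicas=100, sd=0.05, seed=42):
    """Return n_replicas lists of (w1, w2) pairs, one pair per parcel."""
    rng = random.Random(seed)          # seeded for reproducibility
    return [[perturb_parcel(w1, sd, rng) for w1 in parcel_w1]
            for _ in range(n_replicas)]

# Hypothetical ore type 1 fractions for a sequence of mining parcels
parcel_w1 = [0.70, 0.55, 0.40, 0.55]
replicas = make_replicas(parcel_w1)
```

Each replica would then be fed to the simulation as an alternative realization of the parcel sequence.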
A series of simulations were run to observe the effects of the selected control variable levels on throughput and potential stockout risk, in response to geological uncertainty. The first set of trials varied the total stockpile target levels, while holding the critical ore 2 level constant at 2.916 Mt (deterministic value). A total of 5 scenarios were considered with total stockpile levels set at 1× (“one times”), 1.5×, 2×, 3× and 5× the deterministic value (4.374 Mt); simulated results for each are summarized in Table 7.
Consistent with Navarra et al. [15] and Wilson et al. [17], the results show that naïve selection of the total stockpile target level does not perform well over extended operating periods, with Mode A being applied only 1.1× as often as Mode B for an average throughput of 26.5 kt/h. This is clearly less productive than the deterministic result of 28.8 kt/h (Mode A applied 1.5× as often as Mode B), and the simulated operation also suffered from frequent sustained shortages of both ore types (Figure 9a). Increasing the total stockpile level to just 1.5× the deterministic value (Scenario 2) already improves overall system response; however, with Mode A applied 1.3× as often as Mode B for an average throughput of 28.1 kt/h, this is still worse than expected from Equations (7) and (8). Scenario 3, which doubled the deterministic total stockpile level to 8.748 Mt, produced the best overall results with Mode A applied 1.75× as frequently as Mode B for an average throughput of 28.7 kt/h; there was also a drastic reduction in the proportion of time spent in contingency modes (Figure 9b). Successive increases to the stockpile targets (Scenarios 4 and 5) did not show any marked changes, and system performance was actually slightly worse for both. These results suggest that in order to maximize throughput and mitigate stockout risk, the target total stockpile level is best maintained in the range of 2–3 times the selected critical ore threshold.
Using the parameter values established from Scenario 3, the framework was subsequently configured to simulate 100 replications, corresponding to approximately 120,000 days of operation. Average results from this test mirrored those of the single replication (Table 7) but highlighted repeated ore shortages as a significant operating risk under this scheme, with 82% of the replications confronted by stockouts. While not apparent from the single replication simulation, this outcome is directly related to the high variability of the dataset and is entirely possible in the context of oil sands mining, particularly when dealing with multiple and/or heterogeneous ore feed sources. Frequent and/or sustained stockout periods (especially early in a campaign) require additional consideration; as a recourse action, the possibility for mining surges has been incorporated into the framework in order to supply ore feed directly to the plant to maintain production (Figure 10).
To attenuate the significant stockout risk observed under Scenario 3, a second set of simulations was executed in which adjustments were made to the critical ore limit while keeping the total stockpile target at 2× this level. Four scenarios were tested with critical ore levels designated at 1.5×, 2×, 2.5× and 3× the deterministic value (2.916 Mt); results for each simulation trial are summarized in Table 8. While variations in the critical ore threshold had no meaningful effect on throughput rates or modal proportions, important reductions in the number and frequency of stockout periods were observed with the framework configured for 100 replications (~120,000 operating days). At twice the deterministic value (Scenario 6), the number of replications affected by ore shortages was reduced by 20% (cf. Scenario 3); at 2.5× (Scenario 7), this number decreased to just 5%. Tripling the critical value actually eliminated simulated stockout periods altogether; however, increased capital and operating costs associated with exceedingly large stockpile levels must be weighed against the risk of stockout in the decision-making process.
The time-averaged distribution of operational modes in response to geological uncertainty can be useful to evaluate the effects of varied control parameters. Figure 11a represents the naïve approach of Scenario 1, in which the deterministic values for the critical ore level (2.916 Mt) and total stockpile target level (4.374 Mt) were applied; Figure 11b depicts the enhanced framework configuration established under Scenario 7 (described above). The latter scheme is a significant improvement over the naïve setup, with an 8–9% increase in the proportion of time spent under Mode A, a much lower reliance on contingency modes (~15%) and the virtual elimination of ore stockouts. All of these factors contribute to improved production efficiencies; moreover, the enhanced configuration is also more economical based on higher consumption rates for ore type 2, which boasts higher overall grades and bitumen recoveries. Both framework applications benefit from the ability to switch between modes relatively freely in response to data variability, but the enhanced configuration is much less susceptible to operational risk caused by geological heterogeneities.
Overall, these simulation results support the flexibility of DES digital twins to integrate predictive modelling data generated through PLS regression (or other advanced methods) in order to assess the system-wide response to geological uncertainty. This quantitative framework is an extension of recent conceptual work by Navarra et al. (2019) and Wilson et al. (2021) and demonstrates its adaptation to evaluate operational risk factors associated with potential processing applications for Canada’s oil sands. Simulations indicate that ore stockouts are a very real possibility due to extreme geological heterogeneities inherent to oil sands; however, the current digital twin allows for the analysis of potential adjustments to control strategies at an earlier stage, which can help drive decision making and mitigate identified risk factors. The blending control strategies described in this study would necessitate significant stockpiling infrastructure and equipment, but these implied costs could easily be offset by higher throughputs, minimized downtime and extended operational life achieved through the implementation of alternate modes of operation.