2.1. The Python Implementations
The two Python implementations are written by Peyman Sadrimajd et al. [
12] and were validated against the MATLAB implementation of BSM2. One of the implementations is older and assumes a fixed inflow rate; it will be called the “SteadPy” implementation, since it models a steady inflow. The second Python implementation is newer and allows a variable inflow rate; it will be referred to as “DynPy”, since it models dynamic inflow. Both Python implementations are in DAE form and hard code all of the equations directly: that is, they do not code them in matrix form. The only real difference between
DynPy and
SteadPy is how they solve the DAE system.
To solve the ODEs, both versions of the Python code use
DOP853, which is an “explicit Runge–Kutta method of order 8” [
13].
DOP853 is a Python implementation of the
DOP853 algorithm originally written in Fortran. It is an adaptive step-size method that is error controlled. To solve the algebraic equations, the time span is broken into time steps, and at each time step, the Newton–Raphson method is used to solve the algebraic equations, as specified by the BSM2.
SteadPy recomputes the algebraic equations every 15 min of simulation time. In contrast,
DynPy takes a vector of time steps and an array of inflow rates, where each row of the array corresponds to a time step, and it recomputes the algebraic equations at each of those time steps.
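As a rough Python sketch of this solve-then-correct loop (not taken from either implementation), the following advances a placeholder right-hand side with SciPy's DOP853 and re-solves a placeholder algebraic state between steps; adm1_odes, solve_algebraic, and the state dimensions are illustrative assumptions only:

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import fsolve

def adm1_odes(t, u, z):
    """Stand-in for the ADM1 differential equations; z holds the algebraic states."""
    return -0.1 * u + 0.01 * z.sum()  # illustrative dynamics, not the real model

def solve_algebraic(u, z_guess):
    """Stand-in Newton-Raphson solve for the algebraic states (e.g. pH, S_H2)."""
    residual = lambda z: z - 0.5 * u[: z_guess.size]  # illustrative residual
    return fsolve(residual, z_guess)

u = np.ones(35)          # differential states (ADM1 has 35 state variables)
z = np.ones(3)           # algebraic states (size chosen for illustration)
dt = 15.0 / (24 * 60)    # 15 min expressed in days, the BSM2 time unit
t = 0.0
while t < 1.0:           # simulate one day
    sol = solve_ivp(adm1_odes, (t, t + dt), u, args=(z,),
                    method="DOP853", rtol=1e-8, atol=1e-10)
    u = sol.y[:, -1]                 # integrate the ODEs over one interval
    z = solve_algebraic(u, z)        # then re-solve the algebraic equations
    t += dt
```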
To make
DynPy model a system with constant inflow values (such as
SteadPy and the Julia implementation), it is only necessary to keep the row vector of inflow values constant for every time step. To keep the comparison consistent, the vector of time steps for
DynPy was kept the same as it was originally coded. This vector is given in
Table 1. Keeping the inflow vector constant means
DynPy and
SteadPy solve the same system with the same numerical methods; the only difference between them is then how often the algebraic equations are recomputed.
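For example, with NumPy, a constant-inflow run only needs the single default inflow row repeated for every time step; the file name and time-step vector below are purely illustrative:

```python
import numpy as np

t_steps = np.linspace(0.0, 200.0, 401)   # illustrative stand-in for the time-step vector in Table 1
influent_row = np.loadtxt("default_influent.csv", delimiter=",")  # hypothetical file of default inflow values
influent = np.tile(influent_row, (t_steps.size, 1))               # one identical row per time step
```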
2.2. The Java Implementation
The Java implementation was written by Liam Pettigrew et al. [
14]. It is based on the MATLAB code for the BSM2 and implements the DAE version of ADM1. Like the Python codes, it hard codes the equations instead of coding them in matrix form. This team has also produced a modified version of the code [
15] that considers changes to the process rates. That version is not considered in this paper.
It uses the
AdamsBashforthIntegrator class included in the
org.apache.commons. math3.ode.nonstiff package to solve the ODEs. This class implements an explicit linear multistep solver known as an Adams method [
16].
AdamsBashforthIntegrator also implements error control using an adaptive step size. Similarly to
SteadPy, the Java implementation uses the Newton–Raphson method suggested by the BSM2 to recompute the algebraic equations every 15 min of simulation time.
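As a reminder of what this family of solvers does (ignoring the adaptive step size and error control that the Java class adds), a fixed-step two-step Adams–Bashforth update can be written in a few lines of Python; this sketch is generic and is not the library's implementation:

```python
import numpy as np

def adams_bashforth2(f, u0, t0, t_end, h):
    """Fixed-step two-step Adams-Bashforth: u_{n+1} = u_n + h*(3/2*f_n - 1/2*f_{n-1})."""
    t, u = t0, np.asarray(u0, dtype=float)
    f_prev = f(t, u)
    u = u + h * f_prev        # bootstrap the first step with explicit Euler
    t += h
    while t < t_end:
        f_curr = f(t, u)
        u = u + h * (1.5 * f_curr - 0.5 * f_prev)  # use the two most recent derivative evaluations
        f_prev = f_curr
        t += h
    return u
```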
2.3. The Julia Implementation
The Julia code was created to meet two main demands: the first, to be easy to adapt when the model is altered; and the second, to offer greater flexibility in the output. The Java and Python implementations both hard code the equations, whereas the Julia code implements the matrix form given by Equation (
1). The matrix form is easier to make changes to, since it only requires editing discrete entries of the Petersen matrix and rate equations instead of editing each equation individually (see the sketch after this paragraph). Additionally, the Java implementation and
SteadPy only output the system solution at a specified final time.
DynPy outputs the solution at discrete times, but these times must be specified by the input, which can lead to a greater build-up of error if the step sizes are chosen to be too large, as can be seen in
Section 3. Therefore, simulating a chain of reactors, where the outflow of one reactor becomes the inflow of another, requires additional changes to these programs, and it also introduces the possibility of a build-up of errors.
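The sketch below shows, in schematic Python, what the matrix form buys: the reaction part of every state derivative is a single matrix-vector product between the transposed Petersen matrix and the vector of process rates, so a model change only touches matrix entries or rate expressions. The dimensions and the placeholder rates function are assumptions for illustration, and transport and gas-transfer terms are omitted:

```python
import numpy as np

N_PROCESSES, N_STATES = 19, 35               # indicative sizes: 19 ADM1 biochemical processes, 35 state variables
petersen = np.zeros((N_PROCESSES, N_STATES)) # stoichiometric (Petersen) matrix; entries filled from the model

def rates(u):
    """Placeholder for the vector of process rates rho_j(u)."""
    return np.ones(N_PROCESSES)

def reaction_terms(u):
    # derivative contribution of the reactions: sum_j nu_{j,i} * rho_j(u) for each state i
    return petersen.T @ rates(u)
```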
Since computational time is an important factor in the utility of an ADM1 implementation, the Julia implementation was developed to exploit the purported speed of Julia’s DE solvers [
17]. In contrast to the other implementations, the Julia implementation is not in DAE form but is instead in ODE form. This choice was made because Julia’s ODE solvers are more versatile than its DAE solvers, offering more flexibility in the coding.
The Julia implementation also returns the solution for a range of times
t within the specified time range, as opposed to only returning the final solution. Additionally, the Julia implementation exclusively uses adaptive time-stepping methods without having to solve any algebraic equations, so the final solution does not suffer from the same inaccuracies as the dynamic Python implementation, which will be seen in
Section 3. After testing several solver methods,
Rodas4P() was chosen. It is a “4th order A-stable stiffly stable Rosenbrock method with a stiff-aware 3rd order interpolant” [
18]. How the sole use of an adaptive step-size method will impact the accuracy of the solutions of systems with variable inflow, and therefore how it will affect the solutions of multi-reactor systems, will be the subject of another paper. Finally, some optimisations were made, such as using the
Memoize package’s
@memoize macro on functions that repeatedly take the same inputs.
@memoize stores the solutions of a function for given inputs in memory, so that the function does not have to be recomputed each time that those inputs are used. The
@profile macro also revealed that performing the linear algebra calculations with sparse matrices was causing a bottleneck, so the matrices were all written in full matrix form.
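The behaviour of @memoize is analogous to caching a pure function in Python with functools.lru_cache, sketched below for a hypothetical helper that is repeatedly called with the same argument:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def expensive_coefficient(temperature):
    """Hypothetical helper: the first call for a given temperature does the work,
    subsequent calls with the same argument return the cached result."""
    total = 0.0
    for k in range(1, 100_000):   # stand-in for an expensive calculation
        total += temperature / k**2
    return total

expensive_coefficient(308.15)  # computed
expensive_coefficient(308.15)  # returned from the cache
```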
2.4. Null-Hypothesis Significance Testing
To compare the solutions given by each of the implementations, we will use null-hypothesis significance testing. This type of statistical analysis returns a p-value that indicates whether a so-called “alternate hypothesis” can be accepted or rejected. To conduct null hypothesis significance testing, both an alternate hypothesis and a null hypothesis are required. In this case, our alternate hypothesis is that the mean values of our quantities of interest will differ depending on the implementation of ADM1 used to compute them. Our null hypothesis is therefore that the mean values will not differ depending on implementation. If the p-value is close to zero, then we reject the null hypothesis in favour of the alternate hypothesis: that quantities of interest differ depending on the implementation used to compute them.
Commonly used null-hypothesis tests are the Student’s
t-test and one-way analysis of variance (ANOVA). They compare the mean values of a quantity of interest with respect to their variances. The two tests differ based on the number of groups that are being considered; Student’s
t-test only considers two groups of data, whereas ANOVA considers more than two groups of data [
19].
To use these two tests with large data sets, there needs to be an equality of variance [
19]. That is, there must be an equal amount of variation around each of the mean values for both the Student’s
t-test and ANOVA to return accurate results. To determine if this condition is met, Levene’s test [
19] can be applied to the data. Levene’s test returns a
p-value that measures how different the variances of the data sets are from each other. If the
p-value returned by Levene’s test is not significant, i.e., if the
p-value is greater than 0.05, the variances are not significantly different, and one can conclude there is an equality of variance.
If Levene’s test finds an equality of variance, then one can proceed with Student’s
t-test/ANOVA. However, if that is not the case, a different null-hypothesis test must be used to compare the data. One such test is the Kruskal–Wallis test [
20], which is used on non-parametric data, i.e., data that lack an equality of variance.
With all statistical tests, it is important to determine where resulting conclusions stem from. For this reason, post hoc tests are applied to examine the data more thoroughly. In this analysis, two post hoc tests were used to assess results: Student’s
t-test [
21] and Dunn’s test [
22]. These tests were used to make pairwise comparisons between the means to determine which implementations were significantly different from the others. Student’s
t-test was performed on the data sets that passed Levene’s test, and Dunn’s test, a non-parametric test that functions similarly to Student’s
t-test, was performed on the remaining data sets.
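The overall decision procedure can be summarised with the following Python/SciPy sketch (the analysis itself was done in R with the functions listed in Section 2.7; the three groups here are synthetic placeholders for a quantity of interest from three implementations):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group_a = rng.normal(1.00, 0.05, size=50)   # placeholder samples, one group per implementation
group_b = rng.normal(1.01, 0.05, size=50)
group_c = rng.normal(0.99, 0.20, size=50)

# Levene's test: a p-value above 0.05 means the variances are not significantly different
_, p_levene = stats.levene(group_a, group_b, group_c)

if p_levene > 0.05:
    # parametric route: one-way ANOVA, then pairwise Student's t-tests as the post hoc test
    _, p_value = stats.f_oneway(group_a, group_b, group_c)
else:
    # non-parametric route: Kruskal-Wallis, then Dunn's test as the post hoc test
    _, p_value = stats.kruskal(group_a, group_b, group_c)

reject_null = p_value < 0.05   # significance level alpha = 0.05
```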
2.5. Validating the Julia Code with the Python DAE Implementation
To first validate the code, the Julia implementation was compared against the latest implementation of the Python code,
DynPy. The same initial conditions, inflow vectors, and model parameters were used in both cases. These parameters are given in the BSM2 and will be referred to as the “default” parameters. The solution was found on the time interval $[0, t_{\mathrm{end}}]$, since at $t_{\mathrm{end}}$, the solution for these parameters will have had time to reach steady state. The maximum relative difference between the two solutions is defined as
$$\max_{i} \left| \frac{u_i^{\mathrm{Julia}} - u_i^{\mathrm{Python}}}{u_i^{\mathrm{Python}}} \right| \times 100\%,$$
where $u^{\mathrm{Julia}}$ is defined to be the final solution using the Julia code and $u^{\mathrm{Python}}$ is defined to be the final solution using the Python code. The index $i$ refers to the component of the vector of state variables. Originally, the maximum relative difference was greater than 1000%. The difference decreased as typos in the Petersen matrix were discovered and corrected. When the Julia code returned a solution where the maximum relative difference between the solutions was less than 5%, the code was considered validated, and the following more rigorous tests were performed.
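With the two final-state vectors exported to files, this check reduces to a few lines of NumPy; the file names are hypothetical:

```python
import numpy as np

u_julia = np.loadtxt("julia_final_state.csv", delimiter=",")    # hypothetical export of the Julia final solution
u_python = np.loadtxt("python_final_state.csv", delimiter=",")  # hypothetical export of the DynPy final solution

max_rel_diff = np.max(np.abs(u_julia - u_python) / np.abs(u_python)) * 100  # percent
validated = max_rel_diff < 5.0
```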
2.7. Statistical Analysis
All statistical analysis was performed using the statistical software included in the R programming language. To conduct these tests in R, supplementary packages had to be installed and loaded when the analysis was run.
For each of the four data sets described in
Section 2.6.1, three quantities of interest were compared: the weighted average of the solution, the concentration of carbon dioxide gas, and the concentration of methane gas. The weighted average $\bar{u}(t)$ of a solution at time $t$ is given by
$$\bar{u}(t) = \frac{1}{35} \sum_{i=1}^{35} \frac{u_i(t)}{u_i(0)},$$
where $u_i(t)$ is the solution for state variable $i$ at time $t$, so the value of the state variable at time $t$ is weighted by its initial value $u_i(0)$, making the weighted average unitless. In this case, the value of $t$ is the one specified in Section 2.6.1. The sum is from 1 to 35, since there are 35 state variables in ADM1. It was decided to take a weighted average to attempt to ensure that the value of each state variable affects the mean equally. Since the initial conditions given in the BSM2 were chosen to be close to the steady-state solutions, the initial conditions were also chosen to be the weights.
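Assuming the form of the weighted average given above, it can be computed from a length-35 solution snapshot u_t and the initial condition u0 as follows:

```python
import numpy as np

def weighted_average(u_t, u0):
    """Unitless weighted average: each state variable is scaled by its initial value before averaging."""
    return np.mean(np.asarray(u_t) / np.asarray(u0))
```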
For each of the three quantities of interest, Levene’s test was used to determine if there was an equality of variance across the values given by each implementation, and then,
ggbetweenstats was used to perform the statistical analysis and plot the data. The
tibble package [
24] was used to convert the .csv files containing the data sets into
tibble type data frames that could be interpreted by R. To determine whether the data were parametric or not, the function
leveneTest from the
car package [
25] was used to perform Levene’s test. The
ggbetweenstats function in the
ggstatsplot package [
26] was then used to plot the data and perform the statistical tests. The
ggstatsplot package uses functions from various packages to perform the statistical tests and functions from the
ggplot2 package [
27] to plot.
To specify whether the
ggbetweenstats function performs a parametric or non-parametric test, the optional argument
type is set equal to either
parametric or
non-parametric. If
parametric is specified, an ANOVA test is performed using the function
oneway.test with the optional argument
var.equal = TRUE. The Student’s
t-test is performed using the
pairwise.t.test function from the
stats package [
28]. If the
non-parametric argument is specified, the
kruskal.test function from the
stats package performs the Kruskal–Wallis test, and Dunn’s test is performed using the
kwAllPairsDunnTest function from the
PMCMRplus package [
29].
All results were then analysed using the generated p-values to determine whether they were statistically significant, and hence whether the null hypothesis (that the solutions do not differ based on implementation) could be accepted. The greater the p-value, the less significant the differences between the implementations are. A level of significance, $\alpha$, is generally chosen, below which the p-value is said to be significant. In this case, the level of significance was $\alpha = 0.05$, meaning that if the p-value was less than 0.05, then the null hypothesis was rejected.