Artificial Neural Network (ANN) Modelling for Biogas Production in Pre-Commercialized Integrated Anaerobic-Aerobic Bioreactors (IAAB)

The use of an integrated anaerobic-aerobic bioreactor (IAAB) to treat palm oil mill effluent (POME) has shown promising results and overcomes the large space requirement of the conventional method. An understanding of the synergism between anaerobic digestion and the aerobic process is required to achieve maximum biogas production and COD removal. Hence, this work presents the use of artificial neural networks (ANN) to predict the COD removal (%), purity of methane (%), and methane yield (L CH4/g COD removed) of anaerobic digestion, and the COD removal (%), biochemical oxygen demand (BOD) removal (%), and total suspended solids (TSS) removal (%) of the aerobic process, in a pre-commercialized IAAB located in Negeri Sembilan, Malaysia. MATLAB R2019b was used to develop the two ANN models. Bayesian regularization backpropagation (BR) showed the best performance among the 12 training algorithms. The trained ANN models showed high accuracy (R² > 0.997) and good agreement with the industrial data obtained from the pre-commercialized IAAB over a 6-month period. The developed ANN models were subsequently used to identify the operating conditions that maximize the output parameters. The COD removal (%) was improved by 33.9% (from 68.7% to 92%), while the methane yield was improved by 13.4% (from 0.23 L CH4/g COD removed to 0.26 L CH4/g COD removed). Sensitivity analysis shows that COD inlet is the most influential input parameter affecting the methane yield and the anaerobic COD, BOD and TSS removals, while for the aerobic process, COD removal is most affected by mixed liquor suspended solids (MLSS). The trained ANN model can be used as a decision support system (DSS) for operators to predict the behavior of the IAAB system and to address the instability and inconsistent biogas production of the anaerobic digestion process. This is of utmost importance for the successful commercialization of the IAAB technology.
Additional input parameters such as the mixing time, reaction time, nutrients (ammonium nitrogen and total phosphorus) and concentration of microorganisms could be considered for the improvement of the ANN model.


Introduction
The palm oil industry in Malaysia is substantial, contributing nearly 3.6% to Malaysia's gross domestic product (GDP) in 2020 [1]. Moreover, Malaysia's global market share in palm oil production and export is 25.8% and 34.3%, respectively [2]. Hence, it is important that the waste produced by the palm oil industry, namely palm oil mill effluent (POME), is regulated and treated safely before it is discharged into the environment. POME carries several pollutants, including phosphorus, sulphate, volatile solids, and total organic carbon [3]. It is estimated that 2.5 to 3.5 tons of POME are produced for every ton of crude palm oil produced locally [4]. Therefore, the wastewater treatment plant (WWTP) must be operated properly [5].
There are many methods of treating POME, the most common being anaerobic digestion and facultative digestion [6]. Some recent studies suggest more innovative ways of treating POME. A study by Chong et al. [7] highlights the benefit of combining aerobic and anaerobic processes in a single bioreactor, namely the integrated anaerobic-aerobic bioreactor (IAAB), which performs much better than conventional POME treatment methods. The simulation of the IAAB shows that removal efficiencies of chemical oxygen demand (COD) and biochemical oxygen demand (BOD) of up to 99% can be achieved, while the net expenditure can be reduced by 5.8%. However, two main concerns have to be addressed in order to convince palm oil millers to adopt this newly invented IAAB technology to generate biogas and produce high-quality treated effluent simultaneously: the stability of the anaerobic digestion process and the consistency of biogas generation. POME treatment is prone to stability issues because the characteristics of POME vary with the crop season and the loading rate of the milling process. Besides, a lack of skilled personnel for monitoring and controlling the anaerobic digester can lead to slow responses to potential instability [8]. Therefore, the development of a prediction model for the anaerobic digestion of POME in the IAAB is crucial to address these issues.
Several mechanistic models have been developed, such as the International Water Association (IWA) Anaerobic Digestion Model No. 1 (ADM1) [9]. However, such models need a large number of parameters and estimations [10], including the organic composition of the effluent feed stream, which is difficult to quantify yet essential for using the model. In recent years, there has been increased use of machine learning, such as artificial neural networks (ANNs), to model and optimize the process as an alternative to mechanistic models [11]. ANN modelling can be used as a monitoring framework for WWTP operation that improves cost optimization and helps identify the quality or stability of the wastewater effluent. The practical benefit of using ANN in a WWTP is its ability to identify the complex non-linear relationships between input and output parameters, which is useful for predicting and estimating WWTP effluent properties without prior knowledge of the theoretical and physical laws that govern the biological process. However, these benefits do not apply if the simulation is done solely with engineering software such as HYSYS or SuperPro. ANN modelling can predict the performance of a WWTP with a high degree of accuracy, provided that the quality of the historical data is fairly good [12]. Numerous studies that use ANN to optimize anaerobic digestion have been reported, with R² of up to 0.998. For instance, Güçlü, Yılmaz and Ozkan-Yucel [13] achieved R² = 0.89 using an ANN model. The parameters used to train the model were pH value, gas flow rate, VFAs, temperature, organic matter, dry matter, and alkalinity. Similarly, Sathish and Vivekanandan [14] used ANN to optimize the anaerobic digestion process using 4 input variables: temperature, agitation time, pH level, and concentration of substrate. They are able

Methodology
The ANN will be trained using 12 different training algorithms in MATLAB R2019b software. The best training algorithm will be determined by considering several performance criteria. Two ANN models will be developed for anaerobic digestion and aerobic process respectively. The training data set for the ANN model will be obtained from the pre-commercialized IAAB located at Negeri Sembilan, Malaysia. The operation of the IAAB can be found elsewhere [7]. The ANN model with the highest prediction accuracy and acceptable correlation coefficient (R), mean squared error (MSE), mean absolute error (MAE) and mean absolute percent error (MAPE) will be used. Sensitivity analysis is conducted to evaluate the relative importance of input parameters on the output parameters. The first ANN model developed is used to determine the optimum operating condition for the removal rate of COD, purity of methane and methane yield for the anaerobic digestion. The second ANN model developed will be used to determine the optimum operating condition for the COD removal, biochemical oxygen demand (BOD) removal and total suspended solids (TSS) removal for the aerobic process. The industry data (experimental data) will be compared with the predicted data obtained from the ANN model.

Artificial Neural Network Model (ANN)
Artificial neural networks (ANNs), or neural networks, are a computational modelling technique used in machine learning to predict the output of a process. The structure of an ANN resembles the biological neural networks in the human brain in the way that signals are transferred from one node to another [15]. Numerous studies have applied ANN to improve the anaerobic reaction. For instance, a feed-forward backpropagation ANN was used to investigate the influence of substrates such as food waste and vegetable waste, and of the organic loading rate, on methane generation to achieve a methane purity of around 60 to 70% [16]. Moreover, an ANN model with an R value of 0.997 was developed by Mougari et al. [17] to predict the production of methane and biogas from the anaerobic digestion of organic wastes. A typical ANN architecture is made up of a single input layer, one or more hidden layers, and a single output layer. Each layer contains several nodes, or neurons in biological terms, that connect to the next layer. Each connection is given a weight that can be adjusted during the training process. The weights are increased or decreased to strengthen or weaken the signal between the interconnected neurons.
In a single layer architecture, the connection between the nodes can be shown in Equation (1) [18].

$$Y_j = f\left(\sum_{i} W_{ij} X_i + b_j\right) \quad (1)$$
where Y is the output value, X is the input value, and W and b represent the weights and bias, respectively. The subscripts i and j denote the previous layer and the current layer, respectively. The function f is the activation function, or transfer function, of the single layer architecture. Multiple types of activation functions can be used to model a single neuron; the three most commonly used are the log-sigmoid, tan-sigmoid and linear transfer functions. For neural networks used in pattern recognition problems, sigmoid output neurons are the most suitable, whereas linear output neurons are more suitable for function fitting problems [19]. A non-linear activation function is needed for the ANN to learn non-linear relationships between the input and output variables. Between the tan-sigmoid transfer function (tansig) and the log-sigmoid transfer function (logsig), tansig is preferred because its output range is [−1, 1], compared with [0, 1] for logsig. Hence, tansig is able to provide a stronger gradient. To build an ANN model, three phases are needed: training, testing and validation. Hence, the dataset is typically divided into three subsets, where 70% to 80% is used for training and the remaining data is used for testing and validation. Training a feedforward neural network involves two main processes: (a) the feed-forward process and (b) the backpropagation process. In the feed-forward process, the input is propagated through the hidden layer and then to the output layer; with the weights initially assigned at random, the biases and activation functions are applied to produce a predicted value [20].
Next, the backpropagation phase is initiated, in which the weights are adjusted to minimize the error between the value predicted by the network and the experimental value supplied by the training data. This is done by a specific backpropagation training algorithm.
The data are normalized before training so that each input contributes equally and is scaled to a standard range, which improves the efficiency of the learning process [21]. The input and output data can be normalized using Equation (2).

$$y = \frac{(y_{max} - y_{min})(x - x_{min})}{x_{max} - x_{min}} + y_{min} \quad (2)$$
where x_min and x_max represent the minimum and maximum values in the dataset, while y_min and y_max represent the range for normalization. Finally, y is the normalized value of x. The range of normalization is −1 to 1. The hidden layers comprise an adjustable number of hidden neurons. The learning performance of the ANN usually improves as the number of hidden neurons is increased. However, too many hidden neurons can degrade performance, as the weights are given too much freedom to adjust and overfitting may arise. On the other hand, too few hidden neurons limit the performance of the ANN.
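The min-max normalization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the COD inlet readings are made-up values rather than plant data, and the scaling follows the linear mapping onto [y_min, y_max] = [−1, 1] that the text describes.

```python
def minmax_scale(values, y_min=-1.0, y_max=1.0):
    """Map each x linearly from [x_min, x_max] onto [y_min, y_max]."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        raise ValueError("a constant column cannot be scaled")
    span = x_max - x_min
    return [(y_max - y_min) * (x - x_min) / span + y_min for x in values]


def minmax_unscale(scaled, x_min, x_max, y_min=-1.0, y_max=1.0):
    """Invert the mapping so network outputs return to physical units."""
    return [(y - y_min) * (x_max - x_min) / (y_max - y_min) + x_min
            for y in scaled]


# Hypothetical COD inlet readings in mg/L (illustrative only, not plant data)
cod_in = [40000.0, 55000.0, 70000.0]
scaled = minmax_scale(cod_in)
restored = minmax_unscale(scaled, min(cod_in), max(cod_in))
```

The inverse mapping matters in practice because predictions made on normalized targets have to be converted back to percentages or yields before they are compared with the industrial data.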
Anaerobic digestion remains something of a black box: the process is difficult to control because its mechanism is complex and its process parameters are numerous and variable [22]. Mechanistic modelling of the anaerobic digestion process therefore tends to be too simplistic and unreliable, because there is limited information on the gas-liquid mass transfer coefficients relevant to biogas formation [23]. The complexity of the anaerobic digestion mechanism makes black-box modelling, such as an ANN, a more attractive option than mechanistic modelling. The advantage of an ANN is that it can provide accurate predictions without any prior information about the relationship between the process parameters and the output [24]. Moreover, the learning ability of ANN models allows them to adapt to the complex non-linear behavior of the many processes in a wastewater treatment plant [25].
ANN models can be classified according to their purpose and relevant features. For example, the Hopfield network is a type of recurrent ANN model that is best used for image recognition and detection, enhancement of X-ray images, etc. [26]. The Kohonen network is a two-layer (input and output layer) neural network that forms data clusters, and is also referred to as a self-organizing map. It is typically used to compress higher dimensional data into lower dimensional data while maintaining the content [27]. Both the Kohonen and Hopfield networks use unsupervised learning algorithms. An unsupervised learning algorithm does not need labelled or classified input data, as the algorithm groups the data based on patterns. Supervised learning algorithms, on the other hand, require labelled and structured input data that correspond to the output data. The most common neural network that uses a supervised learning algorithm is the feedforward backpropagation network (BP), also referred to as the multi-layer perceptron (MLP). The BP network is made up of 3 types of layers: the input layer (independent variables), hidden layer, and output layer (dependent variables). The number of hidden layers can be varied, and their function is to capture the non-linear relationship between the input and output data through a supervised learning algorithm. A BP network is versatile and flexible and can be used for classification, data modelling, pattern recognition and forecasting [28]. Hence, a feedforward BP network is used here, as the parameters (input data) in a wastewater treatment plant are structured and labelled and have an effect on the output data. However, it should be acknowledged that the accuracy of the ANN model depends on the quality of the training data. Hence, increasing the amount of good quality training data will improve the accuracy of the ANN model.

Activation Function
In a neural network, the activation function determines how the weighted sum of the input signals to a neuron is transformed into the output of that node. Activation functions can be divided into two categories: linear and non-linear. The linear function (purelin) is calculated as the product of the input value and a constant, as given in Equation (3).
The linear activation function is used at the output layer to find a linear approximation to a non-linear function. A non-linear function is required to introduce non-linearity to the network [29]. As such, a non-linear function is used in the hidden layer. A wider variety of non-linear activation functions is available, such as the radial basis function (Radbas), tangent sigmoid (tansig) activation function, logistic sigmoid (logsig) activation function, rectified linear unit (ReLU) activation function and symmetric saturating linear (Hardlims) activation function. Non-linearity is essential to allow the model to generate a complex mapping between the input and output layers, so that the neural network can predict a target variable that varies non-linearly with its input variables. A literature study [30] comparing different combinations of activation functions in the hidden and output layers concluded that a tansig activation function in the hidden layer combined with a purelin activation function in the output layer gives the highest accuracy in predicting wastewater effluent values. The tansig activation function also showed the highest accuracy in ANN modelling of the susceptibility of shallow landslides, with an accuracy of 90% [20]. The tan-sigmoid function shown in Equation (4) has an output range of −1 to 1.
A hyperbolic tangent function is preferred as it can give a stronger gradient and positive and negative output values.
The linear activation function was chosen for the output layer as it is suitable for a continuous valued target such as the chemical oxygen demand (COD) and biogas production.
Hence, the final form of the equation in the feedforward neural network model proposed can be written as shown in Equation (5).
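To make the tansig-purelin structure concrete, the following sketch evaluates a single forward pass through one hidden layer. It assumes the usual definitions tansig(n) = 2/(1 + e^(−2n)) − 1 and purelin(n) = n; the function names, weights, and inputs are hypothetical and serve only to illustrate the composition of the two layers, not the trained model.

```python
import math


def tansig(n):
    # Tan-sigmoid transfer function; mathematically identical to tanh(n)
    return 2.0 / (1.0 + math.exp(-2.0 * n)) - 1.0


def purelin(n):
    # Linear transfer function with unit slope (assumed constant of 1)
    return n


def forward(x, w1, b1, w2, b2):
    """One forward pass: inputs -> tansig hidden layer -> purelin outputs.

    w1[j][i] weights input i into hidden neuron j; w2[k][j] weights
    hidden neuron j into output k; b1 and b2 are the layer biases.
    """
    hidden = [tansig(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    return [purelin(sum(w * h for w, h in zip(row, hidden)) + b)
            for row, b in zip(w2, b2)]


# Hypothetical 2-2-1 network, purely for illustration
w1 = [[0.5, -0.5], [1.0, 1.0]]
b1 = [0.0, 0.0]
w2 = [[1.0, -1.0]]
b2 = [0.1]
y = forward([0.2, -0.4], w1, b1, w2, b2)
```

The same function would cover a 6-12-3 or 7-6-3 layout by passing weight matrices of the corresponding shapes.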

Training Algorithms
The weights and biases are among the most important values in an ANN model. A training algorithm is used to determine the weights and biases that best predict the output values. The most commonly used training algorithm is the Levenberg-Marquardt (LM) method, due to its shorter computation time and better performance [31,32].
The LM training algorithm is a modification of the Gauss-Newton method that blends it with steepest descent to handle complex non-linear patterns. Generally, training algorithms based on quasi-Newton methods require less computation time. However, the drawback of the LM second-order optimization technique is that it requires storing the approximated Hessian matrix and its inverse, which needs extra storage in the computer's memory. Combining the Hessian approximation in Equation (6) with the Gauss-Newton algorithm in Equation (7), the LM algorithm can be represented by Equation (8).
where µ represents the combination coefficient and I is the identity matrix. By combining the Gauss-Newton and steepest descent algorithms, LM trains the ANN model with high efficiency and short computation time. It is typically used for training small networks, where its computation time is the shortest; for large networks, such as those used in image recognition, it results in longer computation time.
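For orientation, the textbook form of the LM update can be written as follows. This is an assumption about what Equations (6)-(8) express rather than a transcription of them: J (the Jacobian of the network errors with respect to the weights), e_k (the error vector) and w_k (the weight vector at iteration k) are notation introduced here, while µ and I are as defined in the text.

```latex
% Hessian approximated from the Jacobian
H \approx J_k^{T} J_k

% Gauss-Newton update
w_{k+1} = w_k - \left( J_k^{T} J_k \right)^{-1} J_k^{T} e_k

% Levenberg-Marquardt update
w_{k+1} = w_k - \left( J_k^{T} J_k + \mu I \right)^{-1} J_k^{T} e_k
```

When µ is small the update approaches the Gauss-Newton step, and when µ is large it approaches a short steepest descent step, which is the sense in which LM combines the two algorithms.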
The BR training algorithm helps overcome the problem of overfitting during training of the neural network, which enhances prediction accuracy. The weights and biases are adjusted according to LM optimization [33], minimizing a combination of squared errors and weights to produce a network that generalizes well. A BR-trained neural network is almost identical to a standard backpropagation network, the difference being an additional ridge parameter in the objective function. BR has several advantages as a training algorithm. For instance, the network has a low probability of being over-trained or over-fitted, it is not affected by the size of the network, and it requires only a small dataset, as a Bayesian neural network generates consistent results with the data [34]. Hence, the BR training algorithm was chosen to develop the ANN models for anaerobic digestion and the aerobic process. The performance of the ANN model is measured using the mean squared error (MSE), coefficient of correlation (R), mean absolute error (MAE) and mean absolute percentage error (MAPE), shown in Equations (9)-(12).
where E_i is the experimental value, P_i is the predicted value, P̄ is the mean of the predicted data, Ē is the mean of the experimental data, and N is the number of datasets. The MSE indicates the difference between the predicted values and the experimental data; it reflects the accuracy of the ANN model and is therefore a crucial factor in deciding whether the model can be regarded as usable. The lower the MSE, the higher the accuracy of the fit. The value of R indicates how close the data lie to the regression line. R ranges from −1 to 1, and a value closer to 1 indicates a smaller difference between the observed data and the fitted values. An R² value of more than 0.8 is considered satisfactory.
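The four performance criteria can be computed as in the sketch below. It uses the textbook definitions, which are assumed (not confirmed by the extracted text) to correspond to Equations (9)-(12); MAPE is returned as a fraction, and the removal percentages are hypothetical values used only to exercise the functions.

```python
import math


def mse(e, p):
    # Mean squared error between experimental (e) and predicted (p) values
    return sum((ei - pi) ** 2 for ei, pi in zip(e, p)) / len(e)


def mae(e, p):
    # Mean absolute error
    return sum(abs(ei - pi) for ei, pi in zip(e, p)) / len(e)


def mape(e, p):
    # Mean absolute percentage error as a fraction (multiply by 100 for %);
    # assumes no experimental value is zero
    return sum(abs((ei - pi) / ei) for ei, pi in zip(e, p)) / len(e)


def pearson_r(e, p):
    # Coefficient of correlation between experimental and predicted values
    e_bar, p_bar = sum(e) / len(e), sum(p) / len(p)
    num = sum((ei - e_bar) * (pi - p_bar) for ei, pi in zip(e, p))
    den = math.sqrt(sum((ei - e_bar) ** 2 for ei in e)
                    * sum((pi - p_bar) ** 2 for pi in p))
    return num / den


# Hypothetical COD removal percentages (illustrative only)
experimental = [80.0, 85.0, 90.0]
predicted = [81.0, 84.0, 92.0]
scores = {
    "MSE": mse(experimental, predicted),
    "MAE": mae(experimental, predicted),
    "MAPE": mape(experimental, predicted),
    "R": pearson_r(experimental, predicted),
}
```

Reporting all four together is useful because MSE penalizes large deviations more heavily, while MAE and MAPE are easier to interpret in the units or percentages of the output.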

Selection of Input and Output Parameters
There are several factors that are important and greatly affect the performance of the anaerobic digester. These factors are useful in determining the production of biogas and corresponding efficiency which ultimately will help to further understand and optimize our anaerobic digestion process. In addition to that, with the use of artificial neural network, process parameters can be adjusted to maximize methane production yield while minimizing cost.
The most critical process parameters affecting the efficiency of anaerobic digestion include the organic loading rate (OLR), pH, presence of toxic compounds, temperature and hydraulic retention time (HRT), among others [35]. The input parameters used to train the ANN for anaerobic digestion were the inlet flowrate (Q_in), OLR, COD_in, BOD_in, TSS_in, and inlet pH, which are recorded periodically by the plant operators. These parameters represent the real situation in the IAAB.
On the other hand, the input parameters used for the aerobic process were OLR, COD_in, BOD_in, TSS_in, MLSS, DO, and F/M ratio. A number of parameters could affect the performance of the aerobic reaction. One of them is the operating temperature of the aerobic process, as it affects the ability of the aerobic bacteria to digest the organic waste. Different species of aerobic bacteria have different optimum temperatures. For instance, the optimum temperature is between 12 °C and 18 °C for psychrophilic microorganisms, between 25 °C and 40 °C for mesophilic microorganisms, and between 55 °C and 65 °C for thermophilic microorganisms [36].
The second factor is the ratio of food to microorganisms (F/M). The ratio is the amount of substrate available to the microorganisms in relation to the number of microorganisms present in the aeration tank. It can be expressed in kg COD per kg of MLSS per day. A high F/M ratio indicates more food relative to the number of microorganisms, which leads to increased growth of aerobic bacteria in the bioreactor. A high level of aerobic bacterial activity is not always the best case, as excessive food degrades the flocculation process, so the sludge does not settle easily [37]. On the other hand, a low F/M ratio is not preferable either, because the rapid settling rate of the sludge leads to an increased waste rate [37]. To ensure that the bacteria function optimally, the operating pH must be adjusted to favour microbial growth. Typical bacteria have optimum microbial growth at a pH between 6.5 and 7.5 [36]. As mentioned, aerobic bacteria require oxygen to grow. Hence, dissolved oxygen (DO) is one of the critical parameters to tune for optimum microbial growth. The optimum concentration of DO is around 2 mg/L and above [7].
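As a worked illustration of the F/M ratio, the sketch below assumes the common activated-sludge definition F/M = (Q × COD_in) / (V × MLSS), which the text does not spell out. The function name, the flowrate Q, the aeration tank volume V, and all numerical values are hypothetical.

```python
def fm_ratio(flow_m3_per_day, cod_in_mg_per_l, tank_volume_m3, mlss_mg_per_l):
    """Food-to-microorganism ratio in kg COD per kg MLSS per day.

    mg/L equals g/m3, so the food term is in g COD/day and the biomass
    term is in g MLSS; the ratio is therefore unit-consistent as kg/kg/day.
    """
    food = flow_m3_per_day * cod_in_mg_per_l    # g COD per day
    biomass = tank_volume_m3 * mlss_mg_per_l    # g MLSS in the tank
    return food / biomass


# Hypothetical operating point (illustrative only, not plant data)
fm = fm_ratio(flow_m3_per_day=100.0, cod_in_mg_per_l=2000.0,
              tank_volume_m3=500.0, mlss_mg_per_l=4000.0)
```

Under this definition, raising MLSS at a fixed load lowers F/M, which is one way to see why MLSS and F/M are closely linked as inputs for the aerobic model.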
For an anaerobic digestion reactor, the most important output parameters are the production of biogas and the removal of COD, which reflect the profitability and sustainability of the IAAB plant. Hence, the outputs of the ANN model are the COD removal (%), percentage of methane (CH4) in the biogas, and methane yield (L CH4/g COD removed). The aerobic process is typically the secondary treatment that removes the remaining biodegradable organic matter. Hence, the output parameters for the aerobic process are the percentages of COD, BOD, and TSS removal. These parameters are critical to ensure that the final treated effluent meets the discharge limit.

Optimization of Artificial Neural Network
Four characteristics of the ANN architecture are important in optimizing the ANN model. The first is the number of hidden layers. The second is the number of neurons in each of the input, hidden, and output layers. The third is the type of activation function in each layer. Finally, the training algorithm used to train the ANN model is crucial as well. It is also worth noting that the quality and quantity of the training data affect the performance of the ANN model. The typical method of determining the best ANN architecture is a trial-and-error process in which different architectures are compared with one another. The general rule of thumb is that the number of neurons in the hidden layer is increased when training takes longer and a performance parameter such as the mean squared error is large. There is no definitive number of neurons or hidden layers needed to produce an ANN model with high predictive value. The numbers of neurons in the input and output layers are determined by the numbers of input and output parameters, respectively.
The activation functions are selected based on the type of data present and the layer concerned. For example, the identity function is usually used in the input layer, while a non-linear activation function such as the hyperbolic tangent sigmoid function is used in the hidden layer. The performance of the ANN is affected by the training algorithm, as it adjusts the weights and biases to generate the ANN model. The training speed is also affected by the training algorithm. In general, the most influential factors of an ANN model are the number of neurons in the hidden layer and the type of training algorithm. The ANN uses 70% of the data for training, 15% for testing, and the remaining 15% for validation. There are a total of 96 datasets for anaerobic digestion and the aerobic process, where each dataset contains 6 input parameters and 3 output parameters.
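The 70/15/15 division of the 96 datasets can be sketched as a random index split. How the toolbox rounds the subset sizes is not stated in the text, so the rounding rule here (truncate the training and testing sizes, give the remainder to validation) is an assumption, as are the function name and the fixed seed.

```python
import random


def split_indices(n_samples, train_frac=0.70, test_frac=0.15, seed=0):
    """Shuffle sample indices and divide them into train/test/validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)        # reproducible shuffle
    n_train = int(train_frac * n_samples)       # truncation is an assumption
    n_test = int(test_frac * n_samples)
    train = indices[:n_train]
    test = indices[n_train:n_train + n_test]
    validation = indices[n_train + n_test:]     # remainder
    return train, test, validation


train_idx, test_idx, val_idx = split_indices(96)
```

With 96 datasets this rule yields 67 training, 14 testing and 15 validation samples; a different rounding convention would shift one sample between the two smaller subsets.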

Determination of the Number of Hidden Neurons
One of the key network parameters is the number of hidden neurons used in the development of the artificial neural network (ANN) model. A high number of hidden neurons can cause an overfitting problem, in which the model overestimates the target value due to the complexity of the input parameters [38]. The number of hidden neurons is determined through trial and error. Several authors have proposed equations intended to calculate the optimum number of hidden neurons. These are summarized in Table 1.

Table 1. Equations to determine the number of hidden neurons required (columns: Equation; Reference).

Nevertheless, these equations may not be accurate in determining the optimum number of hidden neurons required. Another helpful method is to calculate the mean squared error (MSE) of the training and validation data at varying numbers of hidden neurons.
As shown in Figure 1, the training and validation sets both showed the lowest MSE when the number of hidden neurons is 10 or 12. Nonetheless, all 12 training algorithms were run by trial and error with varying numbers of hidden neurons to further verify the optimum number of hidden neurons.

Sensitivity Analysis
Two methods are employed in this work to evaluate the relative importance of the input parameters, including connection weight approach and Garson algorithm [46,47].

Connection Weight Approach
The equation used for the connection weight approach is given in Equation (13), where W represents the value of the connection weight and the superscripts i, j, and o represent the input, hidden, and output neurons, respectively.

Garson Algorithm Method
The equation used for the Garson algorithm is given in Equation (14), where I_j represents the relative importance of the jth input parameter on the output parameter, and Ni and Nh are the numbers of input and hidden neurons, respectively. The connection weight is represented by W. The subscripts o, h, and i represent the output, hidden and input neurons, respectively.
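Both sensitivity measures can be illustrated with a small sketch. It assumes the commonly cited formulations, which may differ in notation from Equations (13) and (14): the connection weight approach sums the products of input-hidden and hidden-output weights, and the Garson algorithm partitions the absolute hidden-output weights among the inputs. The function names and weights are hypothetical, and a single output neuron is considered.

```python
def connection_weight_importance(w_ih, w_ho):
    """Signed importance of each input: sum over hidden neurons of
    (input->hidden weight) x (hidden->output weight).

    w_ih[i][j] links input i to hidden neuron j; w_ho[j] links hidden
    neuron j to the single output under study.
    """
    return [sum(w_ih[i][j] * w_ho[j] for j in range(len(w_ho)))
            for i in range(len(w_ih))]


def garson_importance(w_ih, w_ho):
    """Relative importance of each input (fractions summing to 1)."""
    n_in, n_hid = len(w_ih), len(w_ho)
    contrib = [0.0] * n_in
    for j in range(n_hid):
        # Share of hidden neuron j attributable to each input;
        # assumes at least one non-zero weight into every hidden neuron
        col_sum = sum(abs(w_ih[k][j]) for k in range(n_in))
        for i in range(n_in):
            contrib[i] += abs(w_ih[i][j]) / col_sum * abs(w_ho[j])
    total = sum(contrib)
    return [c / total for c in contrib]


# Hypothetical weights for 2 inputs, 2 hidden neurons, 1 output
w_ih = [[0.5, -1.0], [1.5, 1.0]]
w_ho = [2.0, -1.0]
cw = connection_weight_importance(w_ih, w_ho)
garson = garson_importance(w_ih, w_ho)
```

In this toy case the two inputs tie under the connection weight approach but not under Garson, because Garson discards the signs of the weights; this is one reason for reporting both measures side by side.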

Performance of ANN Model
The performance of all 12 training algorithms is analyzed for the anaerobic and aerobic processes. The results are discussed in the following sections.

Anaerobic Digestion
Tables A7-A18 (Appendix A) summarize the values of R, MAE, MAPE and MSE for anaerobic digestion. Table A7 shows the highest R value achieved by a trained ANN model for the COD removal response, while Tables A8-A10 show the lowest MAE, MAPE, and MSE values for the COD removal response for the different training algorithms at different numbers of hidden neurons. The training algorithm that achieves the highest R value (0.998) for COD removal is the Bayesian regularization backpropagation (BR) algorithm. The lowest MAE value of 0.211 was obtained from BR using 12 hidden neurons, the lowest MAPE value of 0.009 from BR using 10 hidden neurons, and the lowest MSE value of 0.43 using 12 hidden neurons. In general, LM, BR, RP, and BFG performed much better than the 8 other training algorithms, and LM and BR consistently ranked highest in producing the best-performing ANN models. Moreover, with the BR training algorithm, the MAPE never exceeded 10% from one to twenty hidden neurons. On the other hand, GD has the worst performance among the training algorithms, with the highest MAE value of 95.015, MAPE value of 0.77, and MSE of 9769.
The values of R, MAE, MAPE, and MSE for the methane purity response are summarized in Tables A11-A14 (Appendix A), respectively. The highest R value achieved after training the neural network is 0.997, using the BR training algorithm with 13 hidden neurons. Interestingly, the highest and lowest MAE values were both obtained from the BR training algorithm, at 13 hidden neurons and 1 hidden neuron respectively. For the MAPE criterion, the BR training algorithm clearly performs far better than the others, as the error at each number of neurons was less than 1%. Again, BR and LM trained the best ANN models compared with the others, with the lowest MAE, MAPE, and MSE values. The MSE values for BR were less than 1 from 3 to 20 hidden neurons, indicating good performance.
The highest and lowest values of R, MAE, MAPE, and MSE for the methane yield response can be found in Tables A15-A18 (Appendix A). The highest R value across all training algorithms is again obtained from BR, with a value of 0.996 at 15 hidden neurons, while GD performed the worst, with an R value of 0.267 using 11 hidden neurons. Table A16 shows that the lowest MAE is from the BR training algorithm, which achieves a value of less than 0.001 in 3 different cases, while GDX has the highest MAE value at 1.389. In general, all the training algorithms predicted the methane yield well, with MSE ranging from 0.0001 to 0.04000. For the results shown below, 6 input parameters and 3 output parameters are used to train the ANN model. The input and output weights and biases obtained using the Bayesian regularization (BR) backpropagation training algorithm are shown in Tables A8 and A9 (Appendix A).

Aerobic Process
Because BR and LM consistently outperformed the 10 other training algorithms in predicting the outputs of anaerobic digestion, only the BR and LM training algorithms were used to train on the data for the aerobic process. The performance criteria for the aerobic process are summarized in Tables A19-A21 (Appendix A). The use of 6 hidden neurons gives an overall lower MSE for the 3 output parameters, with MSE values of 1.0288, 0.6047, and 0.5560 for the prediction of COD removal, BOD removal, and TSS removal, respectively.

Optimization of ANN Model Parameters
To determine the final ANN model parameters, the MSE and R values between the experimental and predicted values were considered [48]. For anaerobic digestion, the MSE values obtained using 12 neurons are 0.48, 0.43, and 0.0000206 for the prediction of COD removal, purity of methane, and methane yield, respectively. The corresponding R values are 0.998, 0.983, and 0.990, which indicates a good fit between the predicted values and experimental data. As shown in Figure 2a, an ANN architecture of 6-12-3 was developed, where the biases of the hidden layer and output layer are represented by b1 and b2, respectively. For the aerobic process, the MSE values obtained using 6 neurons are 1.0288, 0.6047, and 0.5560 for the prediction of COD removal, BOD removal, and TSS removal, respectively. The corresponding R values are 0.997, 0.981, and 0.995. The high R values and low MSE values obtained using 6 hidden neurons indicate a good fit and minimal deviation between the predicted values and experimental data. As shown in Figure 2b, an ANN architecture of 7-6-3 was developed, where the biases of the hidden layer and output layer are again represented by b1 and b2, respectively.
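To make the 6-12-3 notation concrete, the following Python sketch evaluates one forward pass through a network of that shape. It is illustrative only: the weights are random placeholders rather than the trained values in Appendix A, the names `W1`, `W2`, and `forward` are hypothetical, and a linear output layer is assumed (the text specifies only the tan-sig hidden activation).

```python
import numpy as np

def tansig(x):
    # Tan-sigmoid activation; mathematically identical to tanh
    return np.tanh(x)

def forward(x, W1, b1, W2, b2):
    """Forward pass of a single-hidden-layer network."""
    hidden = tansig(W1 @ x + b1)   # hidden layer with bias b1
    return W2 @ hidden + b2        # output layer with bias b2 (linear, assumed)

n_in, n_hidden, n_out = 6, 12, 3   # 6-12-3 architecture for anaerobic digestion
rng = np.random.default_rng(0)

# Placeholder parameters; a trained model would use fitted weights and biases
W1 = rng.normal(size=(n_hidden, n_in))
b1 = rng.normal(size=n_hidden)
W2 = rng.normal(size=(n_out, n_hidden))
b2 = rng.normal(size=n_out)

x = rng.uniform(-1.0, 1.0, size=n_in)   # one normalized input vector (assumed scaling)
y = forward(x, W1, b1, W2, b2)          # three outputs

# Number of trainable parameters in the network
n_params = W1.size + b1.size + W2.size + b2.size
print(y.shape, n_params)
```

The same function covers the 7-6-3 aerobic network by setting `n_in, n_hidden, n_out = 7, 6, 3`.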

Validation of the ANN Model
Figures 3-8 depict the correlation and visual agreement between the experimental data and the BR-ANN output. The proposed BR-ANN model demonstrated very satisfactory performance in predicting the COD removal, methane yield, and methane purity for the anaerobic process and the COD, BOD, and TSS removals for the aerobic process.
The R value for predicting the COD removal for each number of hidden neurons can be found in Table A7. An R value close to 1 indicates good accuracy in predicting the COD removal. Using 12 hidden neurons gives the highest R value of 0.998 among the combinations tested. An R² value of 0.997 was obtained for predicting the COD removal using 12 hidden neurons and the BR training algorithm (Figure 3a). This result is comparable with the neural networks of Dibaba et al. [49], who obtained R² = 0.906, and Antwi et al. [50], who obtained R² = 0.87, for the removal of COD during upflow anaerobic sludge blanket (UASB) bioreactor operation. Figure 3b shows that the predicted and experimental values of COD removal deviate only slightly, supporting the capability of the ANN model to predict the outcome of anaerobic digestion.
The R value for predicting the methane purity for each number of hidden neurons can be found in Table A11. The R² value obtained from the BR-trained ANN model with 12 hidden neurons is 0.972 (Figure 4a), which is slightly lower than the R² = 0.985 achieved by Yu, Jaroenpoj and Griffith [51] using 5 input neurons with 8 hidden neurons. As with COD removal, the predicted and experimental values of methane purity deviate only slightly (Figure 4b).
The R value for predicting the methane yield for each number of hidden neurons can be found in Table A15 (Appendix A). The BR algorithm produced the best-performing ANN model for predicting the methane yield, with an R value of 0.990 and the lowest MAE, MAPE, and MSE values of 0.001, 0.002, and 2.06 × 10⁻⁵ respectively (Figure 5). These results are better than those of the ANN developed by Xu, Wang and Li [52], which achieved an R value of 0.937 using 8 input neurons and 6 hidden neurons to predict methane yield from anaerobic digestion of plant biomass.
For the ANN model that predicts the COD removal, BOD removal, and TSS removal of the aerobic process, the two best training algorithms from the anaerobic simulations, BR and LM, were chosen to train the dataset. The values of the performance criteria (R, MAE, MAPE, and MSE) can be found in Tables A19-A21.
BR performs better overall than LM in terms of R, MAE, MAPE, and MSE across the 3 output parameters. The BR training algorithm with 6 hidden neurons was chosen because it gives overall lower MAE, MAPE, and MSE values of 0.495, 0.005, and 1.0288 respectively for the prediction of COD removal. As shown in Figure 6, the R² value of 0.995 indicates a good fit between the experimental and predicted COD removal values.
The high coefficient of determination, R² (Figure 7a), indicates that the ANN model is able to learn the non-linear relationship between the input and output parameters well. In Figure 7b, minimal deviation can be observed between the experimental and predicted BOD removal values.
A high R² value and minimal deviation between the predicted and experimental TSS removal values were likewise observed in Figure 8. All three R² values for the COD, BOD, and TSS removals exceed 0.9, which shows that the ANN model can predict the output parameters accurately.

Sensitivity Analysis
The connection weight approach and the Garson method are used to perform the sensitivity analysis, and the results are summarized in Tables 2 and 3 respectively. Table 2 shows that the inlet biochemical oxygen demand (BODin) and organic loading rate (OLR) are the least influential parameters for COD removal in anaerobic digestion, while the inlet chemical oxygen demand (CODin), inlet flowrate (Qin), pH inlet, and inlet total suspended solids (TSSin) have more influence. CODin is the most influential factor in determining both the COD removal and the methane yield according to Table 2. The methane yield in anaerobic digestion is known to increase with increasing COD strength [53]. The strength of influence of the input parameters on the purity of methane is as follows: Qin > CODin > pH inlet > BODin > TSSin > OLR. As for methane yield, CODin ranked highest in terms of relative importance. Organic waste, which has a high COD level, is the source of methane production [54]; hence CODin is one of the more important chemical properties that could affect the methane yield.
For the aerobic process, the efficiency of COD removal is most affected by the mixed liquor suspended solids (MLSS) value and least affected by the F/M ratio. Basim [55] showed that increasing the quantity of MLSS would lead to a higher concentration of COD in the sludge, increasing the COD removal efficiency. For BOD removal (%), the concentration of DO is the most influential parameter, while the F/M ratio is the least influential. The amount of DO required by the aerobic microorganisms is the value of the biochemical oxygen demand (BOD) [56]; therefore, there is a strong relationship between the two parameters. Finally, MLSS is the most influential parameter for the removal efficiency of TSS, and BODin is the least influential parameter in the aerobic process.
However, it is worth noting that TSS and pH inlet have the highest importance in anaerobic digestion when the Garson method is used (Table 3). The deviation between the two sensitivity analysis methods can be explained by the poor estimation of the Garson algorithm. According to Olden, Joy and Death [57], the connection weight approach is superior to the Garson algorithm: the mean similarity between the estimated and true variable ranks was 92%, compared with less than 50% for Garson's algorithm.
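The difference between the two sensitivity measures can be illustrated with a short Python sketch for a single-hidden-layer network and one output. It follows the standard formulations of the connection weight approach (signed sum of input-hidden and hidden-output weight products) and Garson's algorithm (normalized absolute weight products); the function names and the small weight matrices are hypothetical and are not the trained IAAB weights.

```python
import numpy as np

def connection_weight_importance(W_ih, w_ho):
    """Connection weight approach: signed sum over hidden neurons of
    (input-to-hidden weight) x (hidden-to-output weight)."""
    return W_ih @ w_ho

def garson_importance(W_ih, w_ho):
    """Garson's algorithm: relative importance (%) from absolute weights."""
    contrib = np.abs(W_ih) * np.abs(w_ho)     # shape (n_inputs, n_hidden)
    share = contrib / contrib.sum(axis=0)     # each input's share per hidden neuron
    total = share.sum(axis=1)                 # summed over hidden neurons
    return 100.0 * total / total.sum()        # percentages that total 100

# Hypothetical weights: 2 inputs, 2 hidden neurons, 1 output
W_ih = np.array([[1.0, -2.0],
                 [3.0,  2.0]])   # rows: inputs, columns: hidden neurons
w_ho = np.array([0.5, -1.0])     # hidden-to-output weights

cw = connection_weight_importance(W_ih, w_ho)
garson = garson_importance(W_ih, w_ho)
print(cw, garson)
```

In this toy case the connection weight approach ranks the first input higher by magnitude, while Garson's algorithm ranks the second input higher. Garson's use of absolute values discards the sign of each weight, so opposing contributions cannot cancel, which is one way the two methods can produce different rankings.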

Optimum Operating Conditions
The developed ANN model for anaerobic digestion is used to identify the optimal operating conditions that maximize the output parameters. Under the optimal values, the COD removal (%) increases from 68.7% to 92%, as shown in Table 4. The methane yield increases from 0.23 L CH4/g COD removed to a maximum of 0.26 L CH4/g COD removed, which represents a 13.4% increase, as shown in Table 5. The optimum operating values for the aerobic process are shown in Table 6, where the aerobic COD removal (%) increases from 85% to 99% using the developed ANN model. The recommended operating conditions for the IAAB based on the literature are summarized in Table 7. The operating conditions based on the ANN model were within the range of recommended operating conditions provided by Chan et al. [58]. The purity of methane obtained by Chan et al. [58] was 63%, while the ANN-based operating conditions achieved a purity of 66.4%. This indicates that a higher purity of methane is possible for the IAAB configuration. Finally, the most important parameter that determines the quality of the final discharge is the COD level. Hence, it is important to achieve maximum COD removal (%) in the aerobic process in this IAAB plant. The developed ANN model predicted a COD removal (%) of 99% based on the operating conditions in Table 6. Similar results (85-99.6% COD removal) have been reported in the literature [58].

Table 7. Recommended operating condition for anaerobic digestion and aerobic process [58].

Operating conditions    Anaerobic    Aerobic
OLR (g COD/L day)    0-20    0-9.5
HRT (day)    4.59-27.

In summary, the CODin of raw POME (the most influential parameter in the anaerobic process) needs to be closely monitored and maintained in the range of 50,000-97,000 mg/L to achieve high COD removal and methane yield, based on the results presented in the sensitivity analysis (Section 3.4). For the aerobic process, it is essential to maintain the most influential parameters, i.e., MLSS and DO, in the ranges of 37,000-40,500 mg/L and 2.0-7.3 mg/L respectively to ensure compliance with the discharge limit. These results show that the synergism created between the anaerobic and aerobic processes is the key to achieving maximum biogas production and overall COD removal in the IAAB plant.
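The 33.9% improvement in anaerobic COD removal reported in this section is a relative change with respect to the original removal, not a difference in percentage points. As a worked check using the values from Table 4:

```latex
\[
\text{relative improvement}
  = \frac{92 - 68.7}{68.7} \times 100\%
  = \frac{23.3}{68.7} \times 100\%
  \approx 33.9\%
\]
```

The corresponding absolute gain is 92 - 68.7 = 23.3 percentage points.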

Conclusions
Two ANN models were developed to predict the effluent parameters of the anaerobic digestion and aerobic processes in a pre-commercialized IAAB. A total of 6 input parameters (Qin, OLR, CODin, BODin, TSSin, and pH inlet) were used to predict the 3 output parameters for anaerobic digestion. For the aerobic process, 7 input parameters (OLR, CODin, BODin, TSSin, MLSS, DO, and F/M ratio) were used to predict 3 output parameters. The R values obtained for COD removal (%), purity of methane (%), and methane yield were 0.998, 0.983, and 0.990 respectively for anaerobic digestion. For the aerobic process, accurate predictions of COD removal, BOD removal, and TSS removal were obtained, with high R values of 0.997, 0.981, and 0.995 respectively. The ANN architecture was 6-12-3 for anaerobic digestion and 7-6-3 for the aerobic process, both using the BR training algorithm and the tan-sig activation function. The developed ANN models predicted the effluent properties of the IAAB plant accurately, with minimal errors. The developed ANN model was then used to optimize the output parameters. Under optimum operating conditions, the anaerobic COD removal (%) and methane yield were improved by 33.9% and 13.4% respectively. Sensitivity analysis shows that COD inlet is the most influential input parameter affecting the anaerobic COD removal (%) and methane yield. The trained ANN model can be utilized as a decision support system (DSS) for operators to predict the behavior of the IAAB system. It will therefore enable users to perform cost analysis with the ANN model to achieve optimum performance of the IAAB system, bringing this IAAB technology one step closer to successful commercialization. For further studies, input parameters such as the mixing time, reaction time, nutrients (C:N:P ratio), and concentration of microorganisms can be used to further develop the ANN model.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A.
Appendix A.1. Tables
Table A1. Input weights and biases of the ANN model for anaerobic digestion.