Definitive Screening Design and Artificial Neural Network for Modeling a Rapid Biodegradation of Date Palm Fronds by a New Trichoderma sp. PWN6 into Citric Acid

Generally, the bioconversion of lignocellulosic materials into a new biomolecule is carried out in two or more steps. The current study used one-step bioprocessing of date palm fronds (DPF) into citric acid as a natural product, using a pioneer strain of Trichoderma harzianum (PWN6) that was selected from six tested isolates based on the highest organic acid (OA) productivity (195.41 µmol/g) together with the lowest amount of released glucose. Trichoderma sp. PWN6 was identified morphologically and molecularly, and its GenBank accession number is MW789612.1. Both definitive screening design (DSD) and artificial neural network (ANN) were applied, for the first time, to model the bioconversion of DPF. Although both models are capable of making accurate predictions, the ANN model outperforms the DSD model in predicting OA production, as it is characterized by higher values of R2 (0.963) and validation R2 (0.967), and lower values of the root mean squared error (RMSE, 13.44), mean absolute deviation (MAD, 11.06), and sum of squares due to error (SSE, 9749.5). Citric acid was the only identified OA, as confirmed by GC-MS and UPLC, with a total of 1.5%. In conclusion, DPF together with T. harzianum PWN6 is considered an excellent new combination for citric acid biosynthesis after modeling with an artificial intelligence procedure.


Introduction
The full utilization of natural resources is of great interest in maintaining sustainable social development. For this reason, there is an increasing trend toward more efficient utilization of plant residues, vast amounts of which are released into the ecosystem each year [1]. The most promising approach is the bioconversion of plant residues into useful biomolecules.
After coconut and oil palms, the date palm (Phoenix dactylifera), family Palmae (Arecaceae), is one of the most widespread palm species in the global agricultural industry [2]. There are more than 100 million date palm trees worldwide [3], grown in tropical and subtropical regions and concentrated mainly in Egypt, Saudi Arabia, Iran, the UAE, and Algeria [4]. Each palm forms an average of 12-15 new date palm fronds (DPF) per year, and about the same number is removed as part of the maintenance of the palm [5], generating tons of DPF [6]. Although they can be converted to compost and/or used for traditional art, in most cases they are burned, polluting the environment. Alternatively, this large amount of biomass residue can be used in various biotechnological applications, such [...] intelligent backpropagation, the ANN is trained to generate precisely the desired output model that achieves the target. Therefore, this deep learning process is hypothetically more precise and can efficiently replace the other modeling approaches [19][20][21].
Owing to the lack of knowledge and previous studies on modeling the biodegradation process, the current work is an attempt to offer a new perspective in this field. The modern DSD is rarely used in the biological sciences in general and, to the best of our knowledge, this is the first study in which DSD was utilized to optimize a biomolecule manufacturing process. Furthermore, despite the various recent successful biotechnological applications of ANNs, nothing is known about modeling the production of biomolecules from DPF using ANN. The current investigation was undertaken to contribute some knowledge about such processes.

Results
The starting point of the current study was to select a suitable microorganism that can efficiently convert residual plant biomass, i.e., DPF, into OA in a single step.

The Biodegradability of DPF by Trichoderma spp.
Assuming equal DPF biodegradation capabilities among the tested Trichoderma spp. (null hypothesis, H0), a comparison study was initiated to explore their ability to ferment DPF-based medium into organic acids. For this purpose, the cellulose-degrading profile of the investigated fungi was first characterized (Table 1). This preliminary step was required to test whether each fungus can initiate and establish good growth on the complex residue. Date palm fronds (4 mm size) did not receive any pretreatment, in order to simulate the natural growth conditions of the fungi.
Table 1 notes: NS, non-significant; **, significant at p ≤ 0.05.
Although each Trichoderma sp. has a unique enzymatic pattern, the fungal isolates share some features, namely the ability to secrete substantial amounts of various enzymes that degrade both cellulose (filter-paperase (FPase), carboxymethyl cellulase (CMCase), and β-glucosidase) and hemicellulose (xylanase). Trichoderma sp. PWN2 and PWN4 are exceptions, as they did not produce CMCase. Another shared feature was the notable release of glucose monomers into the hydrolysate of the fermented DPF. On the other hand, OA production varied significantly among the fungal isolates, from none to 195.41 µmol/g DPF; soluble phosphorus (P) followed the same trend. Therefore, H0 was rejected in favor of the alternative hypothesis (H1).
These results prompted a study of the relationships among all tested parameters using the simple correlation coefficient (r). The correlation between every pair of tested criteria was assessed at probability (p) ≤ 0.05. The amounts of released glucose and xylanase did not show a significant correlation with the other tested parameters. The most apparent relationships were between OA production and filtrate pH, which were negatively and significantly correlated (r = −0.834 at p ≤ 0.05), and between OA production and soluble P, which were positively and significantly correlated (r = 0.935 at p ≤ 0.05). Despite having a sufficiently active enzymatic profile, Trichoderma sp. PWN6 showed the lowest amount of released glucose. This isolate was selected as the most potent OA producer to model the bioconversion of DPF into OA.
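The correlation analysis described above can be illustrated with a short sketch. It computes a Pearson correlation coefficient and the matching t statistic using only the Python standard library. The pH and OA values below are hypothetical placeholders (the data of Table 1 are not reproduced here), and the sample size of six (one value per isolate) is an assumption made for illustration only.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length >= 3")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for testing r = 0 with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

# Two-sided 5% critical value of Student's t for 4 degrees of freedom
# (n = 6 is an assumption: one observation per isolate).
T_CRIT_DF4 = 2.776

# Hypothetical filtrate pH and OA (µmol/g) values for six isolates.
ph = [6.1, 5.8, 5.2, 4.9, 4.4, 3.9]
oa = [0.0, 35.0, 80.0, 110.0, 150.0, 195.41]

r = pearson_r(ph, oa)
significant = abs(t_statistic(r, len(ph))) > T_CRIT_DF4
```

Under the same n = 6 assumption, the reported coefficients r = −0.834 and r = 0.935 both give |t| above 2.776, which agrees with the significance stated in the text.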

Screening and Optimizing the Fermentation Criteria Using DSD
Seven independent factors, i.e., steam-exploded biomass (SEB), DPF size, incubation time, incubation temperature, inoculum size, tricalcium phosphate (TCP), and (NH4)2SO4 (AS), were screened for their possible effect on OA biosynthesis by the selected Trichoderma sp. PWN6. The experiments followed the DSD matrix (Table 2) under the null hypothesis (H0) that the seven independent variables have equal effects, i.e., that there is no association between the variables and OA production. The experimental results obtained from the DSD runs were analyzed statistically to determine which factor(s) exert a meaningful effect on OA formation by the selected fungus. The OA response varied markedly among the runs, yet the predicted OA values are relatively close to the experimental ones.
A Pareto chart of the standardized effects (Figure 1) was generated to determine which independent variables contribute to the variability in OA production. The chart ranks the absolute values of the standardized effects of the tested factors in descending order. All tested factors exceeded the reference line (2.01), which is the minimum standardized effect that is significant at p ≤ 0.05, and therefore had a significant effect on OA production.
For further assessment of the null hypothesis, Table 3 reports the significance of the tested independent variables at p ≤ 0.05. The regression coefficients of the tested factors ranged from positive to negative values. The ANOVA showed that the model term is statistically significant (p < 0.001); therefore, the association between OA and each factor in the design was assessed from the p-value of the corresponding term. A variable associated with OA at p ≤ 0.05 is considered statistically significant. Again, all tested parameters have a statistically significant association with OA, and the model accounts for 95.13% of the variability. As indicated by the regression coefficients, SEB and incubation temperature had a positive impact, whereas the other factors had a negative one. The fit statistics measured the aptness of the model, for which the standard deviation was estimated to be 16.7051. The coefficients of determination were reasonably high, being 0.951, 0.944, and 0.933 for R2, adjusted R2, and predicted R2, respectively. Accordingly, H0 was rejected and H1 was accepted since all parameters showed a significant effect; hence, they were subjected to further evaluation.
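To make the structure of a definitive screening design concrete, the sketch below builds a generic three-level DSD for seven factors from a conference matrix of order 8 (Paley-type construction), its fold-over, and one center run. This is a textbook construction and only an illustration: the actual run order, number of runs, replication, and column assignment of Table 2 are not shown in this text, and the function names are hypothetical.

```python
def legendre(a, q):
    """Quadratic character of a modulo an odd prime q: 0, +1, or -1."""
    a %= q
    if a == 0:
        return 0
    squares = {(k * k) % q for k in range(1, q)}
    return 1 if a in squares else -1

def conference_matrix_8():
    """Order-8 conference matrix C (zero diagonal, entries +/-1, C C^T = 7 I)."""
    q = 7
    jacobsthal = [[legendre(i - j, q) for j in range(q)] for i in range(q)]
    rows = [[0] + [1] * q]
    for i in range(q):
        rows.append([-1] + jacobsthal[i])
    return rows

def dsd_seven_factors():
    """Coded DSD: C, its fold-over -C, and one center run; last column dropped."""
    c = conference_matrix_8()
    fold_over = [[-v for v in row] for row in c]
    center = [[0] * 8]
    return [row[:7] for row in c + fold_over + center]

design = dsd_seven_factors()  # 17 runs x 7 factors, levels coded -1, 0, +1
```

In this construction every pair of factor columns is orthogonal, which is what lets a DSD estimate the seven main effects independently of one another.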
The adequacy of the assumptions underlying the analysis of the DSD data was checked by residual analysis. The normal probability plot of the residuals (Figure 2a) shows that the residuals of OA production are normally distributed and follow approximately a straight line. The residuals were also plotted against the predicted values. This plot (Figure 2b) indicates that the residuals (and hence the error terms) are distributed randomly but evenly on both sides of the 0-axis, with just one extreme outlier (greater than 40). Accordingly, the regression equation for total OA in coded units was generated. Based on the DSD experimental design and data analysis, it can be concluded that the operating conditions, including the seven tested independent variables, had a significant effect on OA production. The DSD matrix and its data were therefore further modeled using the ANN.

Modeling OA Biosynthesis Using ANN
The predictive ANN model of OA production by Trichoderma sp. PWN6 was constructed on a fully connected neural network platform with a multilayer feed-forward architecture. Hidden layers of 3 to 10 neurons and various combinations of ANN-specific parameters, such as the learning rate (0.1), were examined to identify the optimum architecture and the optimal number of neurons in the hidden layer. All nodes in the hidden layer share the hyperbolic tangent sigmoid activation function (NTanH).

The ANN was trained using the holdback procedure at a proportion of 0.3333, which randomly partitioned the DSD data into 36 runs for training and 18 runs for validation. After several learning trials of 100 tours each, one hidden layer with three neurons, NTanH(3), with a learning rate of 0.1 and the squared penalty method, was found to perform best in modeling OA production. The ANN topology (Figure 3) was constructed in three layers and designated 7-3-1. The input layer comprises seven neurons (SEB, DPF size (mm), incubation time (day), incubation temperature (°C), inoculation (spore/g), TCP (mg/g), and AS (mg/g)), a number determined by the number of independent factors examined. The output layer, with the hyperbolic tangent sigmoid activation function, has one neuron (OA, µmol/g) representing the response. The single hidden layer in between performed best with three hidden neurons.
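The holdback split and the 7-3-1 topology described above can be sketched as follows. This is a minimal illustration and not the software used in the study: the function names, the random seed, and the weights passed to `forward` are hypothetical, and the output neuron is kept linear here as a simplifying assumption (the text reports a tanh output, which would require the response to be rescaled to the interval (−1, 1)).

```python
import math
import random

def holdback_split(n_runs, validation_fraction, seed=0):
    """Randomly partition run indices into training and validation subsets."""
    rng = random.Random(seed)  # the seed is an arbitrary illustrative choice
    indices = list(range(n_runs))
    rng.shuffle(indices)
    n_validation = round(n_runs * validation_fraction)
    return sorted(indices[n_validation:]), sorted(indices[:n_validation])

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One forward pass through a 7-3-1 network with tanh hidden nodes.

    x        : 7 coded input values
    w_hidden : 3 lists of 7 weights (one list per hidden neuron)
    b_hidden : 3 hidden biases
    w_out    : 3 output weights
    b_out    : output bias (linear output: an assumption of this sketch)
    """
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(weights, x)) + bias)
        for weights, bias in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out

train_idx, valid_idx = holdback_split(54, 0.3333)  # 36 training, 18 validation

# Number of free parameters in a 7-3-1 network: weights plus biases.
n_parameters = 7 * 3 + 3 + 3 * 1 + 1
```

With 28 free parameters fitted to 36 training runs, the penalty term and the validation subset are what guard such a network against over-fitting.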
The generality of the ANN model was tested by reducing the errors during training and validation. The network was trained until R2 was maximized. Training and validation were performed on the constructed ANN by a trial-and-error procedure. Once the validation subset was selected, 100 tours of training were performed. The performance of the trained network was evaluated by its ability to produce outputs equal or very close to the target response values, giving R2 values of 0.9604 and 0.9671 for the 36 training runs and the 18 validation runs, respectively. The ANN-predicted value of each point of the DSD data was computed and is shown alongside the DSD-predicted and experimental values (Table 2). The ANN-predicted values agreed reasonably well with the experimental ones and showed lower residuals than those obtained with the DSD model.
To assess the adequacy and fitness of the ANN training and validation for predicting OA production, R2 was computed for both subsets and found to be 0.960 and 0.967, respectively. Furthermore, the residuals were plotted against the values predicted by the model. The plots (Figure 4a,b) depict an even spread of the residuals above and below the 0-axis. These patterns support the adequacy of the ANN model.

Fitness Comparison of DSD and ANN Models
The overall performance of the DSD and ANN models was tested and compared based on how well each model fits the OA production data. The statistical parameters used to assess the accuracy of both models were calculated (Table 4). R2, root mean squared error (RMSE), and mean absolute deviation (MAD) were calculated for the training, validation, and testing sets of DSD and ANN. Higher R2 values were observed for the training, validation, and testing of the ANN model compared to the DSD model, whereas RMSE and MAD were lower. The overall performance of both models shows the same trend for all tested statistics, with a higher R2 and lower RMSE, MAD, and sum of squares due to error (SSE) for ANN compared to DSD, indicating that the ANN model is slightly better in the overall fit statistics. Likewise, the values predicted by both models were plotted against the corresponding actual (experimental) values to compare the fitness of both models (Figure 5a,b). Again, the points predicted by the ANN model lie substantially closer to the line of perfect prediction than those of the DSD model. As a result, the ANN model outperforms the DSD model in terms of generalization capability.
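The four comparison statistics named above can be computed from paired actual and predicted values as in the following sketch. The function name is hypothetical, and RMSE is taken as sqrt(SSE/n), an assumption that matches the reported ANN figures when n = 54 (36 training plus 18 validation runs) is used.

```python
import math

def fit_statistics(actual, predicted):
    """Return R2, RMSE, MAD, and SSE for paired actual/predicted values."""
    n = len(actual)
    mean_actual = sum(actual) / n
    residuals = [a - p for a, p in zip(actual, predicted)]
    sse = sum(r * r for r in residuals)                  # sum of squared errors
    sst = sum((a - mean_actual) ** 2 for a in actual)    # total sum of squares
    return {
        "R2": 1.0 - sse / sst,
        "RMSE": math.sqrt(sse / n),
        "MAD": sum(abs(r) for r in residuals) / n,
        "SSE": sse,
    }

# Consistency check on the reported ANN values, assuming n = 54 observations:
# an SSE of 9749.5 implies an RMSE of about 13.44.
rmse_from_reported_sse = math.sqrt(9749.5 / 54)
```

A model with a higher R2 and lower RMSE, MAD, and SSE on the same data is the better-fitting one, which is the criterion applied to DSD and ANN here.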

Experimental Validation of Both Models
OA production was maximized to determine the optimal combination of the tested variables. The DSD and the well-trained ANN models were evaluated for their capacity to forecast OA production by Trichoderma sp. PWN6 under laboratory conditions. First, each prediction model was used to calculate the theoretical levels of the seven tested variables. Figure 6 shows the pattern of every single factor while the other six factors are kept constant; as shown, the theoretical levels of the seven variables yield predicted values of 421.25 and 387.09 µmol OA/g based on the DSD and ANN models, respectively. The difference between the actual OA value and the DSD and ANN predictions is simply due to the different models used to calculate the predicted OA. These variable levels and their response were validated in the laboratory to check the applicability of the models. The experimental value was 391.37 ± 1.38 µmol OA/g, which is closer to the value estimated by the ANN model.
Figure 6. The theoretical values of the tested parameters and the corresponding predicted OA yields, estimated from the DSD and ANN models.
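The gap between each model's prediction and the laboratory result can be quantified as a relative error. The short sketch below uses only the three values reported above; the function name is hypothetical.

```python
def relative_error_percent(predicted, observed):
    """Absolute prediction error as a percentage of the observed value."""
    return abs(predicted - observed) / observed * 100.0

observed_oa = 391.37                        # µmol OA/g, experimental mean
predicted_oa = {"DSD": 421.25, "ANN": 387.09}

errors = {
    model: relative_error_percent(value, observed_oa)
    for model, value in predicted_oa.items()
}
# errors["DSD"] is about 7.63 %, errors["ANN"] is about 1.09 %
```

On this measure the ANN prediction deviates from the experimental value by roughly 1%, against roughly 8% for the DSD prediction.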

Specification of Components Using Gas Chromatography-Mass Spectrometry (GC-MS)
Based on the ANN modeling process, OA production was scaled up using the validated fermentation conditions reported above. The hydrolysate recovered after solid-state fermentation (SSF) of DPF was analyzed for its possible components using GC-MS. Twelve compounds were identified; the active principle, molecular formula (MF), concentration (peak area %), and retention time (RT) are presented in Table 5 and Figures 7 and 8.

Ultra Performance Liquid Chromatography (UPLC) Analysis
UPLC analysis was used to detect, identify, and quantify citric acid in the fermented DPF hydrolysate. The UPLC analysis (Figure 9) revealed a clear accumulation of citric acid as the main product, at 15 mg/g DPF, represented by a single peak.

Identification of the Fungal Strain
The selected Trichoderma sp. PWN6 was identified morphologically and molecularly. The morphological and microscopic investigation classified the fungus as Eukaryota; Fungi; Dikarya; Ascomycota; Pezizomycotina; Sordariomycetes; Hypocreomycetidae; Hypocreales; Hypocreaceae; T. harzianum. The selected fungal strain was further characterized by molecular analysis of the internal transcribed spacer (ITS) region. The PCR product of the ITS fragment amplified from Trichoderma sp. PWN6 corresponds in size to the 600 bp marker (Figure 10).
The BLAST analysis of the candidate fungus displayed 99% similarity with previously identified Trichoderma spp. in GenBank. Figure 11 represents the constructed phylogenetic tree of Trichoderma sp. PWN6, which agrees with the preceding morphological identification as T. harzianum. The GenBank accession number received for the present fungal strain is MW789612.1.

Figure 11. The evolutionary tree of the partial ITS sequence of Trichoderma harzianum strain PWN6 (located in the red rectangle) in relation to the closely related sequences in GenBank.

Discussion
Unfortunately, DPF is one of the natural resources that, unlike most plant residues, has not yet been used to best advantage, which represents a waste of natural wealth. Microbial bioprocessing of DPF becomes important when the target is a valuable molecule with a wide array of applications in several fields, such as organic acids (OA).
DPF is a hard-to-degrade tissue since it is composed mainly of cellulose and hemicellulose; therefore, the study began by screening the fungal isolates for their capacity to degrade DPF rather than merely grow on it. The biodegradation process requires the cooperation of various enzymes. All fungal isolates were positive for most of the hydrolytic enzymes (Table 1), comprising three cellulases (FPase, CMCase, and β-glucosidase) and xylanase, which hydrolyze the cellulose and hemicellulose of plant materials, respectively. In turn, this enables microorganisms to penetrate and degrade the main components of plant tissues [11].
The main targeted component is cellulose, which is hydrolyzed by cellulase enzymes into single units of glucose. Several kinds of enzymes and steps are involved in the biodegradation of cellulose, in which cellulases synergistically cleave the β-1,4-glucosidic bonds in the cellulose backbone, liberating glucose monomers [22]. This monomer is a fermentable sugar for most microorganisms [23]. By the same token, xylan is hydrolyzed by the combined action of endo-1,4-β-xylanase and β-D-xylosidases, releasing pentoses, mainly xylose [24]. These fermentable monomers are the starting point for the microbial biosynthesis of various molecules, including OA [22][23][24].
The SSF approach was used to optimize the fermentation conditions because of its high volumetric productivity, simplicity, reduced energy needs, ease of aeration, and simulation of the natural habitat of most fungi [25].
Analysis of the hydrolysate after SSF revealed that the liberated glucose was negatively correlated with the amount of secreted OA. This is expected since OA biosynthesis consumes the released glucose, reducing its amount in the hydrolysate as OA production increases [11].
Concerning P-solubilization (Table 1), the DPF-based fermentation medium was supplemented with a low level of the complex inorganic phosphate TCP, and the fungal isolates showed a variable capacity to solubilize it. The main reason for supplementing the fermentation medium with TCP was to induce the fungi to produce OA, which mainly occurs when phosphate is supplied at a low level and in a complex form. The capacity to solubilize complex phosphates is mainly attributed to the synthesis of OA, which is produced through cellulose decomposition [9]. The microorganism solubilizes phosphate to liberate phosphorus, which is reused in various metabolic processes, including cell growth and the generation of cell energy; this process is accompanied by a significant gradual reduction in the pH of the resulting filtrate and is positively correlated with the release of OA [26]. The same pattern was found in the current study. As a result, the suitable microorganism (Trichoderma sp. PWN6) was selected for additional investigation, i.e., optimization of the fermentation conditions to maximize OA biosynthesis.
The next step was to determine the operational conditions that boost the bioconversion process. The recently proposed DSD (Table 2) was applied since it is a new and improved class of three-level designs that can efficiently estimate main effects as well as two-factor interactions [16,18]. To the best of our knowledge, this is the first time that DSD has been utilized in the bioprocessing of plant biomass.
The Pareto chart analysis ( Figure 1) and ANOVA of DSD (Table 3) show that the seven screened factors displayed a significant effect on OA biosynthesis by the selected Trichoderma sp. PWN6. This reflects the importance of all tested parameters in the biomass conversion process.
The overall model and all tested variables showed p < 0.05. According to the coefficients and p-values, it can be concluded that higher levels of SEB and temperature, together with lower levels of the other factors, supported a high yield of OA, which was significantly associated with changes in all seven factors.
Other goodness-of-fit statistics were also examined. R2 measures the proportion of the variance in the experimental OA response that is explained by the factor(s). Adding factors to a model always increases R2, even if the added factors are not significant. Adjusted R2 was therefore employed, since it accounts for the number of terms in the model and rises only when a term genuinely improves the fit. The higher the adjusted R2, the stronger the link between the factors and the response (OA), and hence the better the model fits the data. Predicted R2 shows how well the model predicts responses in new tests and guards against over-fitting; greater predicted R2 values indicate better predictive ability. With a predicted R2 of 93.30%, the current model provides a good fit to the data.
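The relation between R2 and adjusted R2 discussed above can be checked numerically. The sketch below applies the standard formulas; the function names are hypothetical, and the sample size n = 54 is an assumption taken from the 36 training plus 18 validation runs mentioned in the ANN section.

```python
def adjusted_r2(r2, n_obs, n_terms):
    """Adjusted R2 for a model with n_terms predictors (intercept excluded)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_terms - 1)

def predicted_r2(press, sst):
    """Predicted R2 from the PRESS statistic and the total sum of squares."""
    return 1.0 - press / sst

# Reported R2 = 0.9513 with seven main-effect terms; n = 54 is assumed.
adj = adjusted_r2(0.9513, n_obs=54, n_terms=7)
# adj is about 0.944, in line with the reported adjusted R2
```

Under these assumptions the computed value reproduces the reported adjusted R2 of 0.944, which suggests that the fitted DSD model contains the seven main effects only.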
To check whether the DSD model meets the assumptions of the analysis, residual analysis was performed (Figure 2). Residuals are the disparities between the experimental (observed) and the corresponding predicted value at each data point of OA. The smaller the value of the residuals, the better the fitness of the model, and hence the accuracy of the parameter selection. Depicting the normal probability plot of the residuals shows a straight line. Patterns, other than a straight line, indicate that the model does not meet the model assumptions [20]. Unusual patterns that show a non-straight line, a point that is far away from the line, or changing slope indicate non-normality, an outlier, or an unidentified variable, respectively. Furthermore, depicting the predicted vis residual showed their normal distribution along both 0-axis sides, supporting the fitness of the model. Hence, the DSD was found to be effective in optimizing the OA production, representing a new approach in the biofermentation process. Again, the alternative hypothesis, H 1 , was accepted, and data confirmed a significant effect of the tested factors on the OA biosynthesis.
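The coordinates of a normal probability plot, as used in the residual analysis above, can be generated with the standard library alone. In this sketch the residual values are hypothetical (Figure 2 is not reproduced here), the plotting positions follow Blom's formula (one common convention, assumed here), and the function names are illustrative.

```python
import math
from statistics import NormalDist

def normal_plot_points(residuals):
    """Pair each ordered residual with its theoretical normal quantile."""
    n = len(residuals)
    ordered = sorted(residuals)
    standard_normal = NormalDist()
    quantiles = [
        standard_normal.inv_cdf((i - 0.375) / (n + 0.25))  # Blom positions
        for i in range(1, n + 1)
    ]
    return list(zip(quantiles, ordered))

def straightness(points):
    """Correlation between quantiles and residuals; near 1 means a straight line."""
    n = len(points)
    mean_q = sum(q for q, _ in points) / n
    mean_r = sum(r for _, r in points) / n
    sqr = sum((q - mean_q) * (r - mean_r) for q, r in points)
    sqq = sum((q - mean_q) ** 2 for q, _ in points)
    srr = sum((r - mean_r) ** 2 for _, r in points)
    return sqr / math.sqrt(sqq * srr)

# Hypothetical residuals (µmol/g), including one large positive value.
example_residuals = [-21.0, -12.5, -8.0, -3.2, 0.4, 2.9, 7.7, 11.3, 18.6, 41.0]
points = normal_plot_points(example_residuals)
outliers = [r for r in example_residuals if abs(r) > 40]
```

A straightness value close to 1 corresponds to points lying along a straight line, while a single distant point, such as the value above 40 here, shows up as an isolated departure at the end of the line.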
As a result of the significance of all tested factors, it was suggested that artificial intelligence could be used to model the experimental DSD data. The neural network platform employs a fully connected multilayer perceptron (a machine learning algorithm) (Figure 3). The ANN predicts the response variable(s) using a flexible function of the input variable(s). The main advantages of an ANN model are its flexibility and its tendency to predict the fitted data very well. Furthermore, given enough hidden nodes and layers, it can model different response surfaces efficiently and accurately. An ANN is capable of learning any nonlinear function, since it learns weights that map the inputs to the outputs, and the activation function helps the network capture the complex nonlinear relationship between them. The independent variables are generally connected to the response variable through intermediate layers rather than by a direct path; therefore, ANNs can be excellent predictors when it is not necessary to describe the functional form of the response surface or the relationship between the inputs and the response(s). The function applied at the nodes of the hidden layers is called the activation function, which transforms a linear combination of the input variables [20,21,27–29].
However, the suitable range of a factor is not fixed and changes according to the experimental situation. For example, for the inoculum level in the current case, the inoculum of 4 × 10⁷ spores represents half the concentration of 8 × 10⁷; this range was found to be reasonably wide and acceptable. The data analysis showed a reasonably high regression coefficient (−35.44) for this factor, whose level was experimentally validated to be 3.5 × 10⁷ spores.
In addition, the fermentation was carried out under a restricted nutritional medium composition, which created unusual conditions for fungal growth. Therefore, the inoculum level was varied. Moreover, under the restricted nutritional composition of DPF, a high inoculum level may direct the fungus toward growth only and may disturb the balance between growth and degradation, which may negatively affect or retard the fungal degradation process. A lower inoculum, on the other hand, may encourage both growth and fungal degradation to start once inoculation takes place. That is why ammonium sulfate was incorporated and tested, i.e., to balance the growth and degradation processes, especially at the beginning of the fermentation.
In contrast to the prediction model, Figure 6 shows a tendency in which the OA level decreases as the inoculum level increases. Every model has its own mechanism of mathematical calculation; however, both DSD and ANN approached the target at the same level of inoculation and at the same optimum time (9 days), so the two models confirm each other. Another point concerns the tendency of an individual factor (time in the present study): when the prediction equation is used, the tendency of the factor is calculated at a constant level (center point) of the other six factors. That is why some variation occurs in the tendency of incubation time. Nevertheless, the final aim was achieved and validated.
Typically, the fitted model must be checked to verify that it gives an adequate approximation to the real system. Proceeding with the examination and optimization of the response surface will likely yield inadequate or misleading findings unless the model exhibits a sufficient fit. In general, residuals are used to evaluate the adequacy of the model by establishing the residual values and identifying their trend. For each dataset, the residuals are shown (Figure 4) as the difference between the experimental value of OA production and its corresponding predicted value. Plotting the residuals of the ANN model predictions against the experimentally measured values of OA production shows an equal scatter of the residuals, indicating that the variance was independent of OA production and thus reinforcing the adequacy of the ANN model. Furthermore, the residuals were quite small at all tested points. This means that the ANN can precisely fit the real experimental data. Although ANN modeling has made its way into various industries, there is no previous work on modeling the bioconversion of DPF into OA, and this is the first work to apply this kind of modeling.
Regarding model comparison (Table 4), the ANN model performed considerably better overall in the prediction of OA production. The modeling capacity of a given model depends on a high R² value and low RMSE, MAD, and SSE values. R² evaluates the correlation between the response and the predicted values; therefore, a larger value (up to 1) indicates a strong connection between the two datasets. RMSE is commonly employed in regression analysis to validate experimental results, since a lower value indicates that the data are concentrated around the line of best fit (smaller prediction errors). MAD is another statistic that determines the average dispersion of data around the mean. A lower MAD value implies a reduced spread of data around the mean. Finally, SSE, another measure of goodness-of-fit, calculates the total deviation of the response values from their fitted values; a smaller number indicates that the model is better suited.
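The error statistics named above can be sketched in a few lines. This is an illustrative sketch only: the function names and the toy data are hypothetical, and MAD is assumed here to be the mean absolute deviation of the prediction errors, which is how model-comparison reports commonly compute it. The last lines check that the ANN values quoted in the abstract (SSE 9749.5, RMSE 13.44) are mutually consistent if RMSE was computed as the square root of SSE divided by the 54 runs of the design; that formula is an assumption, not something the text states.

```python
import math


def sse(observed, predicted):
    """Sum of squared errors between observed and predicted responses."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


def rmse(observed, predicted):
    """Root mean squared error: sqrt(SSE / n)."""
    return math.sqrt(sse(observed, predicted) / len(observed))


def mad(observed, predicted):
    """Mean absolute deviation of the prediction errors (assumed definition)."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical toy data, NOT values from the study.
observed = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 33.0]
print(sse(observed, predicted), rmse(observed, predicted), mad(observed, predicted))

# Consistency check on the reported ANN statistics (assumes RMSE = sqrt(SSE / 54)).
rmse_from_reported_sse = math.sqrt(9749.5 / 54)
print(round(rmse_from_reported_sse, 2))
```

Under that assumption the recomputed RMSE rounds to the reported 13.44, which suggests the statistics were calculated over all 54 runs.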
Given the facts mentioned earlier, both models demonstrated a high level of predictive ability. However, when the two models are compared, it becomes clear that the DSD model is lower in R² and higher in the other goodness-of-fit statistics than the ANN model. As a result, the ANN model outperforms the DSD model in terms of OA production prediction. This conclusion is in agreement with a recent study [20], which reported that the ANN model was superior to RSM, recording lower RMSE, MAD, and SSE values and higher R² values.
A further overall comparison was carried out (Figure 5), in which the linear regression between the experimental values and the predicted ones was plotted. The prediction points of the ANN model are substantially closer to the line of perfect prediction than those of the DSD model. Thus, the ANN model outperforms the DSD model in terms of generalization capability.
However, modeling with DSD has some merits. First, ANN modeling consumes extended computational time through many iterative calculations. Furthermore, the structured nature of the DSD can demonstrate the contribution of each factor in the regression model, thus identifying the insignificant factor(s) that can be eliminated from the model [20,28]. This did not apply in our case, because all tested parameters had a significant effect (Figure 6). On the other hand, ANN had high predictive precision due to its universal ability to approximate the system's nonlinearity, compared with other models such as a response surface model, which requires only a single calculation step [20,21].
The response maximization of OA production predicted by the two models was validated, and the models were compared regarding their predictive capability. Applying the calculated theoretical values of the seven tested variables under laboratory conditions yielded 391.37 ± 1.38 µmol OA/g. This value is closer to the value predicted by ANN (387.09) than to that predicted by DSD (421.25). This result confirms the higher accuracy and predictive ability of the ANN model compared with the DSD one. In fairness, however, the DSD model still has reasonable predictive ability.
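The validation comparison above can be expressed as a relative prediction error. The sketch below uses only the three values reported in the text; the function and variable names are hypothetical.

```python
# Experimentally validated OA yield and model predictions (µmol OA/g), from the text.
validated = 391.37
predictions = {"ANN": 387.09, "DSD": 421.25}


def relative_error_percent(predicted, observed):
    """Absolute prediction error as a percentage of the observed value."""
    return abs(predicted - observed) / observed * 100.0


errors = {name: relative_error_percent(value, validated)
          for name, value in predictions.items()}

for name, err in errors.items():
    print(f"{name}: {err:.2f}% relative error")
```

With these figures the ANN prediction deviates by roughly 1.1% from the validated yield and the DSD prediction by roughly 7.6%.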
Concerning the fermentation conditions, and given the complex structure of DPF, the current study could be considered a milestone for the bioconversion of residual biomass such as DPF into citric acid in a relatively short time compared with several previous studies [8,11]. The SEB of DPF was found to facilitate the penetration of fungal hyphae into the treated DPF and to increase the release of the nutrient contents of the biomass. The same explanation could also be applied to the particle size of DPF [30]. Both pretreatments could be used to economize and shorten the fermentation time of OA production, which was maximized after only nine days (Figure 6) in the current work, compared with several weeks in other studies [8,11].
Ammonium salts, such as AS, are simple nitrogen sources and hence are required during nearly all growth stages of microorganisms; a small amount encourages the growth of the fungus at the initial growth stage. NH4Cl, as an inorganic nitrogen source, was found to have a significant effect because of its simple structure; its assimilation does not need complicated biological metabolism [30]. Moreover, TCP was used as an inducer for organic acid production, which was confirmed by its significant effect in the DSD experiment. Complex phosphate solubilization by fungi is characterized by their relative ability to dissolve complex phosphates; this activity is generally related to the generation of organic acids, which are also described as end products of fungal cellulose hydrolysis [11,12].
After validation and confirmation of the aptness of the tested model, the hydrolysate of the fermented DPF was explored to specify its major components other than CA, using GC-MS (Table 5 and Figures 7 and 8) and UPLC (Figure 9). Both analyses revealed the presence of various main components in addition to citric acid as one of the major OAs. This confirms the hypothesized fermentation pathway: a complex phosphorus source in the fermentation medium induces OA production, the process is modeled by DSD and ANN, and citric acid is finally confirmed by UPLC, whose profile indicates a citric acid content of 1.5%. Accordingly, DPF can be considered a good, new alternative fermentation substrate for citric acid biosynthesis using Trichoderma sp. PWN6.
The resulting citric acid is an important commercial product; about 70% of its global production is consumed in the food industry, followed by 12% in the pharmaceutical industry and 18% in other applications [11,31].
Trichoderma sp. PWN6 was morphologically and molecularly identified as T. harzianum. The molecular identification (Figures 10 and 11) agreed with the morphological one. Molecular identification techniques exhibit high accuracy for the rapid and precise identification of filamentous fungi at various taxonomic levels, which is why they were applied here. These methods are based on PCR amplification followed by a comparison of the gene sequence coding for 18S rRNA, during which two ITS-specific PCR primers are used. Since the fragment size of the PCR primers is consistent across many groups of fungi, nucleotide sequencing of ITS fractions is required for revealing interspecific and, in some cases, intraspecific variation [32]. Based on the well-identified sequence, the constructed phylogeny is completely annotated and shows a tight correlation with those of similar strains of T. harzianum. The ITS region could be used in barcode identification for different fungi, especially Basidiomycota [33]. Moreover, the ITS region is usually used and could be sufficient for fungal identification at the species level [34]. The ITS region is also considered to be among the markers with the fastest and highest probability of correct identification for a comprehensive group of fungi [35].
Since the majority of commercially produced citric acid is globally restricted to Aspergillus and Penicillium spp. [8,11,31], the new Trichoderma harzianum PWN6 reported in the current study is considered a new candidate citric acid accumulator. This is attributed to the capacity of the fungus to ferment the complex substrate, DPF, owing to its well-developed hydrolytic enzymatic system. Moreover, a new fungal candidate that can ferment a biomass waste (DPF) on a relatively simple fermentation medium is economical enough to put it on the citric acid production map.

SEB
Date palm fronds (Phoenix dactylifera, type Zaghoul) were collected during September 2019 from the Agricultural Research Center, Cairo, Egypt (30°01′13.8″ N and 31°12′15.8″ E). The DPF was washed several times to remove dirt, with the last wash in deionized water, and dried in a shaded area; then, it was collected, cut, and milled before chemical analysis. The main chemical components were carbohydrates (19.95%), fats (2.70%), protein (10.00%), hemicellulose (7.52%), cellulose (34.55%), lignin (9.28%), and ash (6.00%). The total organic matter was 84.00%. The levels of the macroelements N, P, and K were 1.60, 0.12, and 3.48 mg/kg, respectively, and those of the microelements Fe, Cu, and Mn were 240.20, 1.03, and 76.73 mg/kg, respectively. The ground particles were graded with gradient sieves into various sizes, i.e., 2, 4, and 6 mm long, which were confirmed again with a vernier caliper. Leaflets of DPF served as both solid support and substrate for OA production.
The residues of DPF were pretreated with a steam explosion procedure using an autoclave. For this process, 300 g of the biomass was autoclaved under a pressure of 15 pounds per square inch (psi), with the temperature maintained at 121 °C for 15 min. Then, the pressurized steam was allowed to discharge suddenly from the autoclave; the resulting material was the SEB of DPF.

Fermentation Medium
The ability of fungal isolates to ferment palm fronds was evaluated using SSF. The fermentation medium contained one gram of DPF in flasks (100 mL). The fermentation medium was supplemented with 15 mg each of TCP and (NH4)2SO4. Sterilization was carried out at 121 °C for 20 min.

Fermentation Procedure
Unless otherwise specified, before the fermentation trial, the fungal inoculum was freshly prepared from a 5-day-old culture by scraping against distilled tap water to obtain an inoculum of 10⁷ spores mL⁻¹, counted using a hemocytometer. A known concentration of the spore suspension of each fungus was inoculated into the previous medium, which was followed by incubation at 28 °C for 7 days. The moisture content was adjusted to about 65%. After the fermentation period, the fermented matter was eluted using 10 mL of distilled water. The hydrolysate was used for biochemical analysis. The purpose of the DSD was to investigate the relative importance and significance of each tested variable of the fermentation conditions. To accommodate the DSD structure, seven independent variables (Table 1) were tested at two corner points (low (−1) and high (+1) levels) and one center (0) level for each factor. The range of the tested parameters was selected based on a preliminary experiment to cover the proposed range. The center points indicate experimental runs in which all factor values were set midway between the low and high settings. One of the selected factors was a two-level categorical factor, i.e., SEB-DPF (L1) and no-SEB (L2). The other six factors were all three-level, numeric, and continuous. They were DPF size (2, 4, and 6 mm long), incubation time (5, 7, and 9 days), incubation temperature (25, 30, and 35 °C), inoculation size (4, 6, and 8 × 10⁷ spores/g DPF), TCP (10, 15, and 20 mg/g DPF), and AS (10, 15, and 20 mg/g DPF). Accordingly, a base matrix of 18 experimental runs was generated with three replicates each, yielding a total of 54 runs.
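The coding of the six continuous factors into the −1, 0, and +1 levels of the DSD can be sketched as follows. The low and high settings are those listed above; the dictionary keys and function names are hypothetical, and the sketch does not reproduce the actual 18-run design matrix, which was generated by the statistical software.

```python
# Low and high settings of the six continuous factors, taken from the text.
FACTOR_RANGES = {
    "DPF size (mm)": (2, 6),
    "incubation time (day)": (5, 9),
    "incubation temperature (C)": (25, 35),
    "inoculum (x10^7 spore/g)": (4, 8),
    "TCP (mg/g)": (10, 20),
    "AS (mg/g)": (10, 20),
}


def to_coded(value, low, high):
    """Map an actual factor setting to the coded scale (-1 = low, +1 = high)."""
    center = (low + high) / 2
    half_range = (high - low) / 2
    return (value - center) / half_range


def to_actual(coded, low, high):
    """Map a coded level back to the actual factor setting."""
    return (low + high) / 2 + coded * (high - low) / 2


# Center level of every factor (coded 0), e.g. 7 days and 30 degrees C.
centers = {name: to_actual(0, low, high) for name, (low, high) in FACTOR_RANGES.items()}

# 18 base runs, each performed in triplicate.
total_runs = 18 * 3
print(centers, total_runs)
```

Working on the coded scale puts all factors on a common range, so the regression coefficients of different factors can be compared directly.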
The SSF technique was applied for screening and optimizing the fermentation factors affecting the biosynthesis of OA. The SSF was implemented by fermenting one gram of ground DPF in a 100-mL Erlenmeyer flask, following the various combinations of fermentation conditions reported in the DSD matrix. The moisture of the fermented matter was kept at about 65% by moistening with sterilized tap water when needed. After the fermentation period, the fermented DPF was agitated for 30 min at 200 rev./min on a rotary shaker at room temperature with 10 volumes of distilled water containing 0.25% of Brij 35 as a surfactant. The fermented biomass was then separated by filtration followed by centrifugation at 5000 rev./min for 15 min, and the filtrate was assayed for OA.

ANN for Modeling OA Biosynthesis
The previous DSD matrix, along with the corresponding data, was used to feed the ANN. A fully connected neural network platform was constructed with one hidden layer; all nodes within the layer have the same hyperbolic tangent sigmoid activation function (NTanH). Experimental data obtained from the DSD matrix were employed to train the artificial neural network and create the prediction model using a fully connected multilayer perceptron algorithm. The data were partitioned randomly into three datasets: the first for training (using 36 runs to minimize the prediction error and establish the neural weights), the second for validation (using 18 runs to stop ANN training and select the best model, with a holdback proportion of 0.3333), and the third an external dataset used for testing ANN robustness, i.e., the final assessment of prediction capabilities. The latter dataset was not used in model selection; it was excluded during model development and used only for the final assessment. The neural network had three layers, and the ANN topology was designated as 7-h-1. The input layer was composed of seven neurons (SEB, DPF size (mm), incubation time (day), incubation temperature (°C), inoculation (spore/g), TCP (mg/g), and AS (mg/g)), and the output layer had one neuron (OA production, µmol/g). Between the two layers, a hidden layer was constructed and tested using a number of neurons ranging from 3 to 10. The penalty method at a learning rate of 0.1 was used for fitting the model with 100 tours, using a trial-and-error procedure to train the ANN. Training was considered successful once the minimum values of RMSE, MAD, and SSE were reached, accompanied by the highest value of R², at which point the predicted outputs were extremely close to the real target values of OA production.
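The 7-h-1 topology and the 36/18 split described above can be illustrated with a minimal sketch. It shows only the random partition and a forward pass through one tanh hidden layer with a linear output; the weights are random placeholders, the function names are hypothetical, and the coding of the categorical SEB input as ±1 is an assumption. It does not reproduce the penalty-based fitting performed by the software used in the study.

```python
import math
import random


def split_runs(n_runs, holdback, seed=0):
    """Randomly partition run indices into training and validation sets."""
    indices = list(range(n_runs))
    random.Random(seed).shuffle(indices)
    n_valid = round(n_runs * holdback)
    return indices[n_valid:], indices[:n_valid]


def init_network(n_inputs, n_hidden, seed=0):
    """Random placeholder weights; index 0 of each weight list is the bias."""
    rng = random.Random(seed)
    hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
              for _ in range(n_hidden)]
    output = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    return hidden, output


def hidden_activations(x, hidden):
    """Apply tanh to a linear combination of the inputs at each hidden node."""
    return [math.tanh(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))
            for w in hidden]


def forward(x, hidden, output):
    """Linear output node on top of the tanh hidden layer."""
    acts = hidden_activations(x, hidden)
    return output[0] + sum(wo * a for wo, a in zip(output[1:], acts))


train_idx, valid_idx = split_runs(n_runs=54, holdback=0.3333)
hidden, output = init_network(n_inputs=7, n_hidden=5)  # 7-5-1, one of the tested sizes

# One hypothetical run in coded units: SEB (+1), then the six continuous factors.
x = [1.0, 0.0, 1.0, -1.0, -1.0, 0.0, 1.0]
acts = hidden_activations(x, hidden)
y = forward(x, hidden, output)
print(len(train_idx), len(valid_idx), y)
```

In practice the hidden-layer size would be chosen by repeating the fit for 3 to 10 nodes and comparing the validation statistics, as the text describes.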

Colorimetrical Determinations
The assay of cellulases in the post-culture filtrate was carried out following the procedures of [11,36], with slight modification, in which the activities of FPase, CMCase, β-glucosidase, and xylanase on 1% of microcrystalline cellulose, carboxymethyl cellulose, cellobiose, and oat spelt xylan (Sigma-Aldrich) were assayed, respectively. All substrates were individually dissolved in citrate buffer (0.05 M, pH 4.8); the reaction mixture (1 mL of the filtrate and 1 mL of the appropriate substrate buffer solution) was incubated at 50 °C for 60, 30, 15, and 30 min, respectively. The reducing sugars released by the enzymatic action were determined by the dinitrosalicylic acid method [37]. One enzyme unit (U) is defined as the amount of enzyme required to release one µmol min⁻¹ of glucose (FPase, CMCase, or β-glucosidase) or xylose (xylanase) under the test conditions.
Glucose determination was carried out in the post-culture filtrate. The released glucose monomers due to the fermentation were determined using a glucose oxidase kit (Spainreact Co., Spain). The free soluble phosphorus released as a result of the biodegradation was measured in the post-culture filtrate [38].
The colorimetrical determination of the total OA in the filtrate was performed based on the method described by [39]. The pH of the post-culture filtrate was measured using the glass electrode pH meter (CP-501, Elmetron).

GC-MS Analysis
The samples were extracted, resuspended in 50 µL of BSTFA, and incubated in a dry block heater at 70 °C for 30 min. GC-MS analysis of the hydrolysate of DPF was performed using a GC-MS system (Agilent Technologies) equipped with a gas chromatograph (7890B) and a mass spectrometer detector (5977A). The GC was equipped with an HP-5MS column (30 m × 0.25 mm internal diameter and 0.25 µm film thickness). Analysis was carried out using hydrogen as the carrier gas at a flow rate of 1.0 mL/min in splitless mode with an injection volume of 2 µL. The temperature program was 50 °C for 1 min, rising at 10 °C/min up to 300 °C, and held for 20 min. The injector and detector were held at 250 °C. Mass spectra were obtained by electron ionization at 70 eV, using a spectral range of 30-700 m/z and a solvent delay of 9 min. The mass temperature was 230 °C, and the quad temperature was 150 °C. The different components were identified by matching the spectrum fragmentation pattern to those contained in the Wiley and NIST Mass Spectral Library data.

UPLC-PDA Analysis and Quantification of CA
The identification and quantification of organic acids are commonly performed by liquid chromatography. The organic acid chromatographic profile of the fermented DPF extract was obtained using a Waters UPLC Acquity H Class (Waters, Milford, MA, USA) equipped with a quaternary pump (UPQSM), an autosampler injector, and a photodiode array detector (PDA). Empower 3 software (Waters, 2010, Milford, MA, USA) was used for data acquisition and processing. Chromatographic separation of the citric acid was carried out using a Waters Acquity UPLC Spherisorb at room temperature with a linear flow rate of 0.8 mL/min and on a Waters Atlantis T3 C18 column (4.6 × 250 mm × 5 µm) with 0.01 mol/L sulfuric acid in water as the mobile phase (isocratic elution); the PDA was set at a wavelength of 220 nm.

Identification of Fungal Strain
The selected Trichoderma sp. PWN6 was morphologically and molecularly identified. The complete morphological characterization was done on the selected fungus by observing the growth character of the fungus on agar plates. The measurements and examinations of the morphological structures and vegetative mycelia were investigated under a light microscope by mounting portions of fungal growth in a lactophenol cotton blue stain on clean slides [40][41][42].
The molecular identification protocol was performed as follows. The polymerase chain reaction (PCR) amplification was performed in a total volume of 50 µL, containing 1× reaction buffer, 1.5 mM MgCl2, 1 U Taq DNA polymerase (Promega), 2.5 mM dNTPs, 30 picomoles of each primer (ITS-1 F (5′-TCCGTAGGTGAACCTGCGG-3′) and ITS4 R (5′-TCCTCCGCTTATTGATATGC-3′)), and 30 ng genomic DNA. The PCR amplification was performed in a Perkin-Elmer/GeneAmp® PCR System 9700 (PE Applied Biosystems) programmed to fulfill 40 cycles after an initial denaturation cycle of 5 min at 94 °C. Each cycle consisted of a denaturation step at 94 °C for 30 s, an annealing step at 45 °C for 30 s, and an elongation step at 72 °C for 1 min. The primer extension segment was extended to 7 min at 72 °C in the final cycle. The amplified PCR product was resolved by electrophoresis in a 1.5% agarose gel containing ethidium bromide (0.5 µg/mL) in 1× TBE buffer at 95 volts. A 100 bp DNA ladder was used as a molecular size standard. PCR products were visualized under UV light and photographed using a Gel Documentation System (BIO-RAD 2000). The amplified product was purified using EZ-10 spin column PCR product purification: the PCR reaction mixture was transferred to a 1.5 mL microfuge tube, and three volumes of binding buffer 1 were added. After that, the mixture was transferred to the EZ-10 column and left to stand at room temperature for 2 min. After centrifugation, 750 µL of wash solution was added to the column and centrifuged at 10,000 rpm for two minutes, after which the wash was repeated at 10,000 rpm for an additional minute to remove any residual wash solution.
The column was transferred into a clean 1.5 mL microfuge tube, and 50 µL of elution buffer was added and incubated at room temperature for 2 min; the purified DNA was stored at −20 °C.
The ITS sequencing analysis of the PCR product was carried out in an automatic sequencer (ABI PRISM 3730XL Analyzer) using BigDye™ Terminator Cycle Sequencing Kits, following the protocols supplied by the manufacturer. Single-pass sequencing was performed on each template using Rbcl Forward primer. The fluorescent-labeled fragments were purified from the unincorporated terminators with an ethanol precipitation protocol. The samples were resuspended in distilled water and subjected to electrophoresis in an ABI 3730xl sequencer (Microgen Company). The ITS sequence (851 bp) was computationally analyzed using the BLASTn program (http://www.ncbi.nlm.nih.gov/BLAST). Sequences were aligned using Align Sequences Nucleotide BLAST. The obtained sequence was compared with the sequences of closely related fungi and deposited in GenBank, after which the accession number of the fungal strain was received. The evolutionary relationship was deduced using the Neighbor-Joining method. The bootstrap consensus tree was inferred from 2000 replicates. The evolutionary distances were computed using the Jukes-Cantor method and are in units of the number of base substitutions per site. This analysis involved 17 nucleotide sequences. All ambiguous positions were removed for each sequence pair (pairwise deletion option). MEGA 10 software was used to conduct the evolutionary analyses.
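The Jukes-Cantor distance mentioned above has a closed form, d = −(3/4) ln(1 − (4/3)p), where p is the proportion of differing sites between two aligned sequences. The sketch below illustrates it with pairwise deletion of ambiguous positions; the function names and the two short sequences are hypothetical and are not the ITS data of this study, and the sketch is not the MEGA implementation.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing sites, skipping gaps/ambiguous bases (pairwise deletion)."""
    valid = "ACGT"
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a in valid and b in valid]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor distance in substitutions per site."""
    if p >= 0.75:
        raise ValueError("distance undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Hypothetical aligned toy sequences, NOT the ITS data from the study.
seq_1 = "ACGTACGTAC-N"
seq_2 = "ACGTACGTATAA"

p = p_distance(seq_1, seq_2)   # last two columns are skipped
d = jukes_cantor(p)
print(f"p = {p:.3f}, JC distance = {d:.4f}")
```

The correction makes d slightly larger than p because it accounts for multiple substitutions at the same site; a matrix of such pairwise distances is the input to Neighbor-Joining tree construction.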

Trial Design and Statistical Examination
The results of the measured biodegradability of DPF by Trichoderma spp. are expressed as mean ± SD of three biological replicates, with the aid of CoStat 6.4 software (IBM Corporation, Armonk, New York, USA). The design and statistical analysis of DSD were accomplished using the Minitab statistical analysis software package (version 19.

Conclusions
In summary, several merits can be drawn from the current study. Firstly, the majority of previous studies dealt with the remediation of plant biomass through two sequential steps: the saccharification of plant biomass into single monomers, followed by fermentation of the monomers into the target molecule. This study, on the other hand, suggests a single-step bioconversion of plant biomass into citric acid in a relatively short time (9 days), considering the complex structure of the DPF. Secondly, this is the first study that uses DSD together with ANN for modeling the bioconversion of residual biomass into a valuable biomolecule, i.e., citric acid, using a new Trichoderma candidate. Therefore, the current study could be considered a milestone for applying artificial intelligence to biomass remediation and a base for its application in future research in such unexplored areas.