Application of Machine Learning Models (ANN vs. RF) in Optimizing the Fermentation of Sweet-Potato Waste in the Japanese Shochu Industry for Nutritional Enhancement

Zhang, Yukun; Ishikawa, Manabu; Koshio, Shunsuke; Yokoyama, Saichiro; Jiang, Na; Chen, Jiayi; Tong, Yiwen; Zhang, Xiaoxiao

doi:10.3390/fermentation12040191


by Yukun Zhang ¹,²,*, Manabu Ishikawa ¹, Shunsuke Koshio ¹, Saichiro Yokoyama ¹, Na Jiang ¹,³, Jiayi Chen ², Yiwen Tong ⁴ and Xiaoxiao Zhang ²,⁵,*

¹ Laboratory of Aquatic Animal Nutrition, Faculty of Fisheries, Kagoshima University, Kagoshima 890-0056, Japan
² Hunan Provincial Key Laboratory of the Traditional Chinese Medicine Agricultural Biogenomics, Changsha Medical University, Changsha 410219, China
³ Fisheries Science Institute, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100068, China
⁴ College of Fisheries, Huazhong Agricultural University, Wuhan 430070, China
⁵ Faculty of Agriculture, Kagoshima University, Kagoshima 890-8580, Japan
* Authors to whom correspondence should be addressed.

Fermentation 2026, 12(4), 191; https://doi.org/10.3390/fermentation12040191

Submission received: 26 February 2026 / Revised: 27 March 2026 / Accepted: 7 April 2026 / Published: 9 April 2026

(This article belongs to the Special Issue Fermentation Strategies to Enhance Feed Nutritional Value and Optimize Industry Resources)


Abstract

To address the challenge of depleting traditional feed resources, this study aimed to biovalorize sweet potato waste (SPW), a major byproduct of the Japanese shochu industry, into a high-value functional animal feed. An innovative two-stage solid-state fermentation (SSF) was employed, featuring an initial aerobic stage with Aspergillus oryzae for substrate degradation, followed by an anaerobic stage with Lactobacillus plantarum for nutritional enhancement. To optimize this complex, multi-variable process, the predictive performance of Artificial Neural Network (ANN) and Random Forest (RF) machine learning models was compared based on an augmented experimental dataset (N = 80). To ensure statistical robustness and prevent data leakage, a repeated k-fold cross-validation strategy was implemented. The RF model demonstrated significantly higher accuracy and reliability than the ANN model, particularly in predicting the primary metric, crude protein (R² = 0.61 ± 0.04 vs. R² = 0.12 ± 0.15). Subsequently, the validated RF model was integrated with a Constrained Differential Evolution (CDE) algorithm for global parameter optimization. The optimized process was predicted to yield a final product with a crude protein content of 25.0%, alongside significant increases of 114.1% in total amino acids and 123.9% in essential amino acids. These projections were experimentally validated in vitro, confirming the model’s accuracy with a relative error of less than 5%. Furthermore, comprehensive biochemical assays demonstrated extensive degradation of anti-nutritional factors and significant enhancements in total phenolic content and antioxidant activity. This study provides a scientifically validated, data-driven framework for the valorization of SPW. It confirms the superior efficacy of ensemble learning methods for optimizing complex bioprocesses with limited data, offering a contribution to the development of a circular bioeconomy and sustainable feed resources.

Keywords:

sweet potato waste; solid-state fermentation; Aspergillus oryzae; Lactobacillus plantarum; machine learning; Artificial Neural Network; random forest; process optimization; nutritional enhancement

1. Introduction

The projected growth of the global population and the simultaneous increase in demand for animal protein place significant pressure on traditional animal feed resources [1]. The aquaculture industry, an essential part of global food security, still relies heavily on fishmeal as a primary protein source. This situation is highly unsustainable due to declining marine pelagic fish stocks, extreme price fluctuations, and serious environmental impacts [2,3]. This issue has led to a worldwide search for alternative, sustainable, and affordable protein ingredients for animal feeds by utilizing agro-industrial by-products [4,5].

Japan produces large amounts of organic waste with significant potential for bioconversion [6,7]. In Kagoshima Prefecture, the heart of Japan’s shochu industry, about 300 kilotons of sweet potatoes are processed each year. This industrial activity generates over 4 kilotons of sweet potato waste (SPW) annually, mainly consisting of peels, rejected residues, and spilled tuber pieces (Figure 1). SPW is rich in carbohydrates and bioactive compounds such as polyphenols, making it an excellent substrate for biotechnological applications [8] and well suited for upgrading into high-value feed ingredients [9]. While some of this waste is used as unprocessed feed by local farms, most of it is disposed of as industrial waste, posing environmental challenges and missing economic opportunities.

However, using sweet potato peel residue directly in high-performance animal diets, especially for monogastric species, is heavily limited by its natural biochemical qualities. Nutritional details are summarized in Table 1. As shown, the raw material is typical lignocellulosic biomass, characterized by a low crude protein content (about 6.0% on a dry matter basis) and a high amount of complex carbohydrates, including starch and dietary fiber [4,5]. Importantly, it contains high levels of anti-nutritional factors (ANFs), mainly trypsin inhibitors. Sporamin, the primary storage protein in sweet potatoes, functions as a strong trypsin inhibitor, greatly hindering proteolytic digestion in monogastric animals [10,11,12]. Therefore, an effective valorization strategy must address three main challenges: increasing protein content, breaking down the tough lignocellulosic matrix, and removing ANFs.

Solid-state fermentation (SSF) is a highly effective and cost-efficient method for upgrading agro-industrial waste [13,14]. However, the complex and resistant nature of sweet potato peel requires an advanced fermentation approach that goes beyond using a single microorganism. A two-stage sequential fermentation process employing Aspergillus oryzae (A. oryzae) and Lactobacillus plantarum (L. plantarum) offers a biochemically synergistic pathway, allowing for sequential lignocellulosic breakdown followed by rapid microbial biosynthesis [15,16,17,18]. Aspergillus oryzae, a GRAS (Generally Recognized as Safe) fungus essential to the shochu industry, is used in the first aerobic stage because of its extensive secretion of hydrolytic enzymes (such as amylases, cellulases, and proteases) that decompose the lignocellulosic matrix and convert complex carbohydrates into sugars [15,19]. This initial hydrolysis produces a nutrient-rich pool of simple sugars and free amino acids for the second anaerobic stage, where Lactobacillus plantarum, a potent lactic acid bacterium, quickly ferments the released sugars. This second step is vital for further breaking down anti-nutritional factors (ANFs), lowering the substrate pH to prevent spoilage, and introducing beneficial probiotic microorganisms to the final product [20,21].

Table 1. Projected comprehensive nutritional composition of sweet potato peel residue (dry matter (DM) basis).

Category	Component	Content Range (DM Basis)	Key Notes & Challenges	References
Proximate Analysis	Crude Protein	5.6–8.8%	Low content, primarily composed of anti-nutritional protein (sporamin).	[12,22,23]
	Crude Fat/Ether Extract	1.7–4.5%	Low content, but can be transformed via fungal fermentation.	[22,23]
	Crude Fiber	2.0–5.0%	Traditional measure; actual dietary fiber is much higher, limiting digestion.	[22,23]
	Ash	4.0–7.0%	Indicates a rich source of minerals.	[22,23,24]
Carbohydrate Profile	Starch	19.4–52.0%	Key energy source, but physically encapsulated by the cell wall.	[4,22,25]
	Total Dietary Fiber	>21% (up to 60%)	Includes cellulose, hemicellulose, and lignin; a key target for degradation.	[26]
	Cellulose	~20.2–31.2%	Structural polysaccharide requiring cellulase for degradation.	[22,26]
	Hemicellulose	~11.4–15.2%	Structural polysaccharide requiring hemicellulase for degradation.	[4,22,25]
	of which: Lignin	~10.0–16.9%	Recalcitrant phenolic polymer, a major barrier to enzymatic hydrolysis.	[27,28]
Anti-Nutritional Factors	Trypsin Inhibitors	High	Core limiting factor; severely inhibits protein digestion.	[12,25,26]
	Phytate	~755 mg/100 g	Chelates minerals, reducing their bioavailability.	[22]
	Tannins	~533 mg/100 g	Reduces protein digestibility and affects palatability.	[22]

DM: dry matter.

Optimizing a biological system governed by highly non-linear, multi-variable interactions remains a significant challenge [20,29,30]. Traditional statistical techniques, such as the Response Surface Methodology (RSM) utilizing a Central Composite Design, depend on second-order polynomial equations that often fail to capture the extreme non-linearities and local optima inherent in dynamic microbial ecosystems [31,32]. To address this, advanced machine learning architectures like Artificial Neural Networks (ANNs) and Random Forest (RF) algorithms paired with heuristic optimization methods are increasingly preferred [33,34,35]. ANNs excel at modeling complex, non-linear relationships in high-dimensional datasets, while Differential Evolution (DE) algorithms efficiently explore extensive parameter spaces to identify global optima [18,19,20]. However, bioprocessing datasets often face the “small N, large P” problem, where the number of experimental samples (N) is too small compared to the high-dimensional complexity of the parameters (P). While gradient-based deep learning methods within ANNs are highly effective at modeling complex relationships, they need large amounts of data to avoid catastrophic overfitting [36]. On the other hand, ensemble learning techniques like RF use bootstrap aggregating (bagging) across decision trees, enabling them to handle small sample sizes effectively, reduce individual biases, and lower overall model variance [37].

This optimized fermentation is expected to create an ideal thermodynamic and biochemical environment for maximum nutritional enrichment. We anticipate that crude protein levels will increase significantly, fiber will decrease, and the levels of key anti-nutritional factors will be almost eliminated through enzymatic degradation [10,12,38,39]. The final product is expected to be a synbiotic matrix consisting of probiotics (living L. plantarum), prebiotics (partially hydrolyzed dietary fibers that support gut health), and postbiotics (bioactive metabolites produced during fermentation, such as organic acids, enzymes, and peptides). This combination is expected to improve nutrient utilization and biochemical quality [40,41,42,43,44].

Therefore, this study is based on the scientific hypothesis that a two-stage, co-culture solid-state fermentation of sweet potato peel waste can be modeled and optimized using an ANN or RF algorithm to produce a high-quality, nutritionally enhanced ingredient. The main objectives are: (1) to systematically gather experimental data on the fermentation process using a multi-variable design; (2) to develop and validate a predictive ANN or RF model for fermentation outcomes with robust cross-validation techniques; and (3) to utilize a Constrained Differential Evolution (CDE) algorithm to find the optimal process parameters that maximize nutritional value and functionality, followed by laboratory validation. The innovative aspect of this study lies in combining an Augmented Experimental Design (AED) with a sequential bioprocess, enabling strong ensemble learning on a highly resistant agricultural waste stream.

2. Materials and Methods

2.1. Substrate and Chemicals

Sweet potato waste (SPW), mainly consisting of peels and unused pieces of tuberous roots (Ipomoea batatas L.), was sourced from a local shochu distillery in Kagoshima, Japan. The raw material had a moisture content of 77% and a crude protein content of 5.95 ± 0.12% on a dry matter (DM) basis, as measured by the Nutritional Biochemistry and Feed Chemistry Laboratory at the Faculty of Agriculture, Kagoshima University (Kagoshima, Japan).

Ammonium sulfate ((NH₄)₂SO₄) of analytical grade was obtained from Wako Pure Chemical Industries, Ltd. (Osaka, Japan) and used as the supplementary nitrogen source. Potato Dextrose Agar was obtained from Nissui Pharmaceutical Co., Ltd. (Tokyo, Japan), Polysorbate 80 (Tween 80), o-phthaldialdehyde (OPA), BAPNA Substrate, Folin-Ciocalteu Reagent, DPPH Radical, Gallic Acid/Tannic Acid were obtained from Nacalai Tesque, Inc. (Kyoto, Japan), and all other chemicals and reagents employed in this study were of analytical or HPLC grade.

2.2. Substrate Pre-Treatment

The SPW was dried in a forced-air oven at 60 °C for 8 h to reduce moisture content to about 10–12% for stable storage. The dried SPW was then ground into a fine powder using a pulverizer (DD-2-3.7, Makino MFG Co., Tokyo, Japan) and sifted through a 40-mesh sieve. For each fermentation run, the ground SPW powder underwent high-pressure steam pre-treatment to break down the lignocellulosic structure and improve enzymatic accessibility. The powder was autoclaved at 103.4 kPa (around 121 °C) for 20 min and then cooled to room temperature [45].

2.3. Microorganisms and Inoculum Preparation

2.3.1. Microbial Strains

Aspergillus oryzae (strain RIB 40) was obtained from the NITE Biological Resource Center (Chiba, Japan). Lactobacillus plantarum (strain JCM 1149) was procured from the Japan Collection of Microorganisms, RIKEN BioResource Research Center (Tsukuba, Japan).

2.3.2. Starter Culture Preparation

The A. oryzae starter culture was prepared by inoculating the strain onto Potato Dextrose Agar (PDA) slants (Nissui Pharmaceutical Co., Ltd., Tokyo, Japan) and incubating at 30 °C for 5–7 days until abundant sporulation was observed. Spores were harvested by washing the slant surface with 10 mL of sterile 0.1% (v/v) Tween 80 solution. The resulting spore suspension was filtered through sterile glass wool, and the spore concentration was determined using a hemocytometer (Erma Inc., Tokyo, Japan) and adjusted to 1 × 10⁸ spores/mL with sterile distilled water [20].

The L. plantarum inoculum was prepared by transferring a loopful of the stock culture into 50 mL of De Man, Rogosa, and Sharpe (MRS) broth (Becton, Dickinson and Company, Franklin Lakes, NJ, USA). The culture was incubated anaerobically at 37 °C for 24 h. We then placed this activated culture in fresh MRS broth and incubated it for 18 h under the same conditions to ensure the cells were in the late logarithmic growth phase. We used centrifugation (5000× g, 10 min, 4 °C) to collect the bacterial cells, washed them twice with sterile physiological saline, and then resuspended them in saline to a final concentration of 1 × 10¹⁰ Colony Forming Units (CFU)/mL. The standard plate count method on MRS agar was used to check the number of living cells.

2.4. Two-Stage Solid-State Fermentation (SSF) Process

The fermentation was carried out in two steps using 500 mL Erlenmeyer flasks (AGC Techno Glass Co., Ltd., Shizuoka, Japan) with 50 g (dry matter basis) of the pre-treated SPW powder.

Stage 1: Aerobic Fungal Fermentation: The initial moisture level of the substrate (ranging continuously between 55.0% and 80.0%) was strictly controlled. The exact volume of aqueous A. oryzae inoculum, the volumetric solution of the nitrogen supplement (ammonium sulfate), and additional sterile distilled water were mathematically balanced using mass-balance equations against the known dry matter weight to achieve the precise target moisture percentage. The flasks were sealed with breathable cotton plugs and incubated aerobically under the conditions described in Table 2.

Stage 2: Anaerobic Bacterial Fermentation: A true sequential co-culture approach was used. No intermediate sterilization was done before bacterial inoculation in Stage 2, allowing the primary fungal strain to stay biologically active until it was naturally inhibited by the rapidly decreasing pH and oxygen depletion [46]. The substrate was inoculated with the specific volumetric L. plantarum suspension needed to reach the target CFU/g, sealed to create anaerobic conditions, and incubated (Table 2).
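The Stage 1 moisture adjustment can be illustrated with a simple wet-basis mass balance. The sketch below is not the authors' calculation: it assumes that moisture is expressed as water mass over total wet mass, that the inoculum and nitrogen solution can be treated as pure water (1 g/mL), and that the residual moisture of the dried powder is known. The function name and the example volumes are hypothetical.

```python
def extra_water_g(dry_matter_g, target_moisture, powder_moisture,
                  inoculum_ml, nitrogen_solution_ml):
    """Sterile water (g) still needed to reach the target wet-basis moisture.

    Assumption: all added liquids count as water with a density of 1 g/mL.
    Moisture values are fractions (0.60 means 60%).
    """
    if not 0 <= powder_moisture < target_moisture < 1:
        raise ValueError("need 0 <= powder_moisture < target_moisture < 1")
    # Wet-basis moisture M = water / (water + dry matter)  =>  water = DM * M / (1 - M)
    total_water = dry_matter_g * target_moisture / (1 - target_moisture)
    water_in_powder = dry_matter_g * powder_moisture / (1 - powder_moisture)
    extra = total_water - water_in_powder - inoculum_ml - nitrogen_solution_ml
    if extra < 0:
        raise ValueError("inoculum and nitrogen solution already exceed the target")
    return extra


# Hypothetical run: 50 g dry matter, 60% target moisture, powder at 10% moisture,
# 5 mL inoculum and 10 mL ammonium sulfate solution.
water = extra_water_g(50.0, 0.60, 0.10, 5.0, 10.0)
print(f"add {water:.1f} g sterile water")
```

With these example numbers the total water required is 75 g, of which about 5.6 g is already in the powder and 15 g arrives with the two liquid additions.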

2.5. Modeling and Optimization Strategy

A comparative study using two main machine learning (ML) algorithms, the Artificial Neural Network (ANN) and the Random Forest (RF), was carried out to model and optimize the complex, non-linear fermentation process. The aim was to determine which model could most accurately predict the five main fermentation outcomes (Table 3) using the eight input process variables listed in Table 2.

Both models were developed using Python (v3.10) with the scikit-learn library [47]. The model with the highest predictive accuracy on unseen test data was then selected as the objective function for a global optimization algorithm.

2.5.1. Data Preprocessing and Leakage Prevention

The experimental dataset (N = 80) was loaded into a pandas (version 2.3.3) DataFrame [48]. To strictly prevent data leakage, all preprocessing steps were fitted only on the training subset within each cross-validation fold and then applied to the corresponding test subset.

Log-Transformation: To bring the distributions of the input variables (A. oryzae Inoculum, L. plantarum Inoculum) and the output variable (Viable LAB Count) closer to normal, we log10-transformed them.

Data Partitioning: The complete dataset was evaluated using a repeated k-fold cross-validation strategy (5-fold, repeated 3 times with different random seeds) to ensure statistical robustness and stability in model comparisons.

Feature Scaling: StandardScaler was used to standardize the input and output features for the ANN model only (mean = 0, std = 1). This step is essential for the ANN model to perform well, but is not necessary for the RF model [47,49,50].
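The preprocessing steps above can be sketched as a scikit-learn pipeline, in which the scaler is refitted on the training part of every fold so that no test-fold statistics leak into training. This is a minimal illustration on synthetic data, not the authors' code: the data, the column layout, the `max_iter` value, and the use of `TransformedTargetRegressor` to scale the output are assumptions.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.model_selection import RepeatedKFold, cross_val_score
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the N = 80 dataset (8 inputs, 1 response).
inoculum = 10 ** rng.uniform(5, 8, size=80)           # hypothetical spores/g
other_inputs = rng.uniform(size=(80, 7))              # hypothetical remaining inputs
X = np.column_stack([np.log10(inoculum), other_inputs])  # fixed log10 transform
y = 20 + 2 * X[:, 0] - 3 * X[:, 1] ** 2 + rng.normal(scale=0.5, size=80)

# 5-fold cross-validation repeated 3 times with different shuffles.
cv = RepeatedKFold(n_splits=5, n_repeats=3, random_state=0)

# Input scaling lives inside the pipeline and output scaling inside the
# target transformer, so both are fitted on training folds only.
model = TransformedTargetRegressor(
    regressor=make_pipeline(
        StandardScaler(),
        MLPRegressor(max_iter=1000, random_state=0),
    ),
    transformer=StandardScaler(),
)

scores = cross_val_score(model, X, y, cv=cv, scoring="r2")
print(f"R2 = {scores.mean():.2f} +/- {scores.std():.2f} over {len(scores)} folds")
```

The log10 transform is applied before splitting because it is a fixed function with no fitted parameters, so it cannot leak information between folds.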

2.5.2. Model 1: Artificial Neural Network (ANN)

Scikit-learn’s MLPRegressor (Multi-layer Perceptron) was used to develop the ANN model.

Architecture: Five separate 8-input-1-output (multiple-input, single-output; MISO) models were trained, one for each of the five responses, rather than a single 8-input-5-output (MIMO) model. This prevents gradient vanishing or exploding caused by output variables with different scales. Each network consists of an input layer (8 neurons), two hidden layers (with 12 and 6 neurons), and an output layer (1 neuron) (an 8-12-6-1 structure).

Training: The ‘adam’ solver was used with a high L2 regularization (alpha = 0.1) and a learning rate of 0.001 [51]. This strong regularization was a deliberate choice to force the model to find only the most generalizable patterns and combat the severe overfitting typically seen when training ANNs on small datasets. A limited hyperparameter grid search confirmed that the fundamental lack of data density (N = 80) for an 8-dimensional space caused the network to fail regardless of tuning.
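To give a sense of scale, the 8-12-6-1 network described above can be instantiated with the stated solver, regularization strength, and learning rate, and its trainable parameters counted. The sketch below uses random placeholder data. The activation function (scikit-learn's default ReLU), `max_iter`, and the random seed are assumptions, since the text does not specify them.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
# 64 rows = the training portion of one 5-fold split of N = 80 (placeholder data).
X_train = rng.normal(size=(64, 8))
y_train = rng.normal(size=64)

ann = MLPRegressor(
    hidden_layer_sizes=(12, 6),   # two hidden layers -> 8-12-6-1 structure
    solver="adam",
    alpha=0.1,                    # strong L2 regularization
    learning_rate_init=0.001,
    max_iter=500,                 # assumed; not stated in the text
    random_state=0,
)
ann.fit(X_train, y_train)

# Weights: 8*12 + 12*6 + 6*1 = 174; biases: 12 + 6 + 1 = 19.
n_params = sum(w.size for w in ann.coefs_) + sum(b.size for b in ann.intercepts_)
print(n_params)  # 193
```

Each single-output network therefore has 193 trainable parameters, roughly three times the number of training rows available in any one fold, which helps explain why strong regularization was needed.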

2.5.3. Model 2: Random Forest (RF)

We used the RandomForestRegressor from scikit-learn to make the comparative RF model [47,52].

Architecture & Theory: RF is an ensemble learning method that builds a multitude of decision trees (n_estimators = 100) during training [53]. Each of the 100 “weak” trees is trained on a different random bootstrap sample of the data (“bagging”).

Prediction and Uncertainty: The final prediction is the average of all 100 trees’ predictions. This averaging mitigates “noise” and errors from individual trees, enabling the model to capture complex, non-linear relationships without overfitting. Furthermore, predictive uncertainty at any given coordinate in the parameter space was estimated by calculating the standard deviation (SD) of predictions across all 100 individual trees, thereby generating visual confidence intervals [54].

Training: For a direct performance comparison, the model was trained on the same k-fold cross-validation splits as the ANN. To get the best predictive accuracy, key hyperparameters were tuned (max_depth = 5, min_samples_leaf = 2).
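The RF configuration and the per-tree uncertainty estimate described above can be sketched as follows. The data are synthetic placeholders and the random seed is an assumption. The spread across trees is a heuristic indicator of model disagreement rather than a calibrated confidence interval.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(80, 8))                       # placeholder inputs
y = 20 + 5 * X[:, 0] - 3 * X[:, 1] ** 2 + rng.normal(scale=0.5, size=80)

rf = RandomForestRegressor(
    n_estimators=100,
    max_depth=5,
    min_samples_leaf=2,
    random_state=0,
).fit(X, y)

# Query three hypothetical points in the parameter space.
x_new = rng.uniform(size=(3, 8))

# Collect every tree's prediction: shape (100 trees, 3 points).
per_tree = np.stack([tree.predict(x_new) for tree in rf.estimators_])
mean_pred = per_tree.mean(axis=0)   # identical to rf.predict(x_new)
sd_pred = per_tree.std(axis=0)      # spread across trees, used as uncertainty

for m, s in zip(mean_pred, sd_pred):
    print(f"prediction = {m:.2f} +/- {s:.2f}")
```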

2.5.4. Optimization by Differential Evolution

Based on the comparative results (detailed in Section 3), the trained RF model was confirmed as the only reliable predictor for the non-linear responses. Therefore, the validated RF model was used as the objective function for a Constrained Differential Evolution (CDE) algorithm, a robust and dependable global optimizer implemented in the scipy.optimize library [55]. To ensure practical biological viability, the DE algorithm was programmed to maximize the expected Crude Protein (%) output, subject to strict boundary constraints: Trypsin Inhibitor Activity (TIA) ≤ 15% and Lactic Acid ≤ 30 g/kg.
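One way to set up such a constrained search with SciPy is sketched below. It is an illustration only: the text does not say how the constraints were enforced, so the use of `NonlinearConstraint` is an assumption, and the synthetic data, normalized bounds, and small search budget (`maxiter`, `popsize`) are hypothetical. SciPy minimizes, so the predicted protein is negated.

```python
import numpy as np
from scipy.optimize import NonlinearConstraint, differential_evolution
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
lower, upper = np.zeros(8), np.ones(8)        # hypothetical normalized bounds
X = rng.uniform(lower, upper, size=(80, 8))

# Synthetic responses standing in for the measured outcomes.
protein = 18 + 6 * X[:, 0] - 4 * (X[:, 1] - 0.5) ** 2 + rng.normal(scale=0.3, size=80)
tia = 10 + 10 * X[:, 2] + rng.normal(scale=0.3, size=80)      # percent
lactic = 15 + 25 * X[:, 3] + rng.normal(scale=0.5, size=80)   # g/kg


def fit_rf(y):
    return RandomForestRegressor(
        n_estimators=100, max_depth=5, min_samples_leaf=2, random_state=0
    ).fit(X, y)


rf_protein, rf_tia, rf_lactic = fit_rf(protein), fit_rf(tia), fit_rf(lactic)


def negative_protein(x):
    # Objective: maximize predicted crude protein by minimizing its negative.
    return -rf_protein.predict(x.reshape(1, -1))[0]


def constrained_outputs(x):
    row = x.reshape(1, -1)
    return np.array([rf_tia.predict(row)[0], rf_lactic.predict(row)[0]])


# TIA <= 15 % and lactic acid <= 30 g/kg.
limits = NonlinearConstraint(constrained_outputs, lb=[-np.inf, -np.inf], ub=[15.0, 30.0])

result = differential_evolution(
    negative_protein,
    bounds=list(zip(lower, upper)),
    constraints=limits,
    maxiter=10,      # deliberately small budget for a quick demonstration
    popsize=4,
    seed=0,
    polish=False,    # RF surfaces are piecewise constant, so gradient polishing adds little
)
print("best inputs:", np.round(result.x, 3))
print("predicted crude protein:", round(-result.fun, 2))
```

Because a Random Forest prediction surface is piecewise constant, a population-based optimizer such as differential evolution is a natural fit, whereas gradient-based optimizers would see zero gradients almost everywhere.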

2.6. Experimental Design: A Mixed Method for Machine Learning

Training and testing robust machine learning models require a comprehensive experimental dataset. Traditional Response Surface Methodology (RSM) designs, such as the Central Composite Design (CCD), work well for optimizing simple quadratic models with few experimental runs. However, their sparsity (N < 50) severely limits the effectiveness of data-intensive models like ANNs. To create a dataset that allows a fair comparison between ANN and RF, we used an Augmented Experimental Design (AED) with 80 runs. The original 50 experimental runs from the CCD provided a strong foundation, and 30 additional runs were strategically added to increase dataset density. These new runs were based on the original ranges but rounded to practical testing values (e.g., 0.5 °C increments, 1-h increments). This approach enabled a more thorough exploration of the parameter space.

2.7. Sample Collection and Preparation

At the end of each fermentation run, the fermented sweet potato waste (FSPW) was collected. A portion was immediately used for pH measurement and microbial analysis. The rest was lyophilized (freeze-dried), ground in a rotary mill to pass through a 0.2 mm mesh sieve, and stored at −20 °C for further chemical analysis. Unfermented SPW was processed in the same way to serve as the control.

2.8. Analytical Methods

2.8.1. pH and Organic Acid Analysis

The pH of the samples was measured using a digital pH meter after homogenizing 10 g of the sample with 90 mL of deionized water. The concentrations of lactic acid and acetic acid were determined by High-Performance Liquid Chromatography (HPLC) (Shimadzu Corp., Kyoto, Japan), equipped with a UV detector at 210 nm and an Aminex HPX-87H column (Bio-Rad, Hercules, CA, USA), using 5 mM H₂SO₄ as the mobile phase.

2.8.2. Nutritional Analysis

The proximate composition (dry matter, crude protein, crude fat, crude fiber, and ash) of both SPW and FSPW samples was determined using the standard procedures of AOAC International [56]. Crude protein was measured via the Kjeldahl method (N × 6.25). Total amino acid profiles were analyzed with an automated amino acid analyzer (Hitachi L-8900, Hitachi, Tokyo, Japan) after acid hydrolysis with 6 M HCl at 110 °C for 24 h.

2.8.3. Degree of Protein Hydrolysis (DH)

The DH, defined as the percentage of cleaved peptide bonds, was measured using the o-phthaldialdehyde (OPA) spectrophotometric method as described in our previous research [57]. Absorbance was taken at 340 nm.

2.8.4. Anti-Nutritional Factors (ANFs) Assays

TIA was measured according to the AOCS Official Method, which is based on the inhibition of trypsin hydrolysis of Nα-Benzoyl-L-arginine 4-nitroanilide hydrochloride (BAPNA) [56]. Phytic acid content was determined spectrophotometrically based on the pink-colored complex formed between ferric iron and 2,2′-bipyridine. Total tannins were quantified using the Folin–Ciocalteu method with tannic acid as a standard.

2.8.5. Total Phenolic Content (TPC) and Antioxidant Activity

TPC was measured using the Folin–Ciocalteu reagent method, with results expressed as gallic acid equivalents (GAE). Antioxidant activity was evaluated through the DPPH (2,2-diphenyl-1-picrylhydrazyl) radical scavenging assay.

2.9. Statistical Analysis

The experimental design and RSM analysis utilized the “rsm” package in R (version 4.2.1). Implementations of ANN and RF were performed using Python (v3.10), with code available in the Supplementary Files. All experiments were conducted in triplicate, except for in vitro validation runs, which had n = 6, and data were presented as mean ± standard deviation (SD). Machine learning performance metrics were averaged across multiple k-fold validation splits, expressed as mean ± SD. Differences between treatment means were tested using a one-way ANOVA and Tukey’s test, with a significance level of p < 0.05. The performance of the ANN model was assessed through the coefficient of determination (R²) and the F-test.

3. Results and Discussion

3.1. Response Values for the Augmented Experimental Design

The N = 80 Augmented Experimental Design (AED) response values are summarized in Table S2.

3.2. The Comparative Performance of Artificial Neural Network (ANN) and Random Forest (RF) Models

This section explores the performance difference between gradient-based ANN and ensemble-based RF algorithms when modeling small-sample, high-dimensional bioprocess data. The assessment uses rigorous k-fold cross-validation metrics instead of a single static data split.

3.2.1. Inherent Limitations of ANN and the Robustness of Ensemble Methods

The initial step in this study’s modeling phase involved using a sparse dataset (N = 50) created through a Central Composite Design (CCD) (the first 50 of the 80 runs in Tables S1 and S2). This was done to evaluate how effectively the two machine learning algorithms could predict outcomes with limited data. The results from this phase clearly showed a significant performance gap between ANN and RF in the analysis of small-sample, high-dimensional bioprocess data.

The evaluation metrics in Figure 2 and Figure 3 reveal that the two models performed very differently. The ANN model failed to generalize, predicting none of the five key fermentation outcomes. Its coefficient of determination (R²) on the test set was either negative or near zero for all variables: Crude Protein (R² = −0.2803), Crude Fiber (R² = 0.1165), Trypsin Inhibitor Activity (TIA) (R² = −0.6977), Lactic Acid (R² = −0.5571), and log10 Viable LAB Count (R² = −0.6487). A negative R² value indicates that the model’s predictions are worse than simply using the mean of the training data, clearly showing that the model failed to learn meaningful patterns and its predictions are invalid.
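The meaning of a negative R² can be made concrete with a toy calculation. The coefficient of determination is R² = 1 − SS_res/SS_tot, so it falls below zero whenever the squared prediction error exceeds the squared deviation of the observations from their mean. The numbers below are invented for illustration. Note also that scikit-learn's `r2_score` uses the mean of the evaluated targets as its baseline.

```python
from sklearn.metrics import r2_score

y_true = [1.0, 2.0, 3.0]

# Baseline: always predict the mean of the observed values.
mean_baseline = [2.0, 2.0, 2.0]
# A model that gets the trend backwards.
reversed_pred = [3.0, 2.0, 1.0]

r2_mean = r2_score(y_true, mean_baseline)   # SS_res = 2, SS_tot = 2 -> 0.0
r2_bad = r2_score(y_true, reversed_pred)    # SS_res = 8, SS_tot = 2 -> -3.0

print(r2_mean, r2_bad)
```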

This critical failure is also clear in the model’s training behavior, as shown by the loss curves (Figure 4). The learning paths showed a typical case of severe overfitting. As the model memorized the noise in the 50 sparse data points, the training loss dropped sharply, while the validation loss did not stabilize. This exemplifies the “curse of dimensionality,” which occurs when a model is too complex for the dataset to support.

Conversely, the RF model showed strong predictive ability despite the limited dataset (Figure 3). It achieved very high accuracy across most response variables: Crude Fiber (R² = 0.9320), TIA (R² = 0.9960), and Lactic Acid (R² = 0.8971). For the more complex Crude Protein metric, the RF model also delivered moderate yet meaningful predictive performance (R² = 0.6413). This result not only confirmed the RF model’s robustness on small datasets but also revealed the inherent vulnerability of standard ANN architectures in such applications.

The failure of the ANN model is a common sign of a mismatch between its high complexity and the low information content of the data. An ANN, especially the Multi-Layer Perceptron (MLP) used here, is a universal function approximator with many trainable parameters (weights and biases) [58]. In a complex, multi-dimensional space defined by 8 input variables, having only 50 data points is far from enough to give proper constraints for training. This is a typical “small N, large P” problem, where the sample size is much smaller than the number of features or model parameters—a common issue in bioinformatics and bioprocess research [59]. During training, the backpropagation algorithm tries to minimize error on the training set. Still, with too few data points, the model tends to capture random noise instead of the true underlying signal. This often results in severe overfitting or failure to find a generalizable solution. Since the idea of “small data” depends on model complexity, a sample size of N = 50 is clearly below the necessary threshold for an ANN to learn effectively [60].

The RF model, however, is effective because it uses an ensemble learning architecture that reduces variance to combat overfitting. RF builds many decision trees, trains each on a random bootstrap sample, and, in its standard formulation, considers only a random subset of features at each split [61]. This two-step randomization process creates a diverse set of “weak learners.” The final prediction is made by averaging all the tree predictions, which cancels out much of the random error of the individual trees, lowers the model’s overall variance, and results in a strong learner that performs well on unseen data [59]. This approach allows RF to often outperform more complex single models, like ANNs or Support Vector Machines (SVMs), on small-sample, high-dimensional datasets, showing its superior applicability [62].

3.2.2. Impact of Augmented Experimental Design (N = 80) on Predictive Accuracy

To address the limitations of the small sample size, this study used an Augmented Experimental Design (AED), expanding the dataset from 50 to 80 samples. Table 4 shows the detailed 5-fold cross-validation performance metrics for both algorithms.

As shown in Table 4, increasing the dataset to 80 samples allowed the ANN to surpass a key learning threshold for highly deterministic, kinetically driven variables like Crude Fiber and Lactic Acid, with mean R² values of 0.85 and 0.91, respectively. However, the standard deviations of the ANN’s metrics remained notably higher than those of the RF model, indicating ongoing instability across different data folds.

More critically, for the highly stochastic and systemic optimization target—Crude Protein—the ANN remained fundamentally unable to generalize (R² = 0.12 ± 0.15). The RF model maintained strong predictive accuracy (R² = 0.61 ± 0.04) and excellent stability. Based on this detailed cross-validation analysis, the RF model was identified as the only dependable tool for global process optimization.

The corresponding loss curves (Figure 5) for the N = 80 dataset visually confirm this performance split. In contrast to the responses the ANN learned well, the loss curves for Crude Protein and Viable LAB Count resembled the failed N = 50 models: a significant and persistent gap between a low training loss and a high, non-converging validation loss. This confirms that, even with 80 data points, these specific biological responses were too complex for the ANN to model, resulting in continued overfitting (Table 4).

At the same time, the RF model maintained its high performance and stability on the N = 80 dataset. In cross-validation, its prediction accuracy for Crude Fiber (R² = 0.93 ± 0.02) and TIA (R² = 0.99 ± 0.01) remained exceptionally high, while its accuracy for Lactic Acid (R² = 0.89 ± 0.04) and Crude Protein (R² = 0.61 ± 0.04) showed consistent generalization across all folds. Relative to the N = 50 results, the R² values for Crude Fiber (0.9463) and TIA (0.9964) improved slightly, while those for Lactic Acid (0.9029) and Crude Protein (0.6319) stayed largely the same.

This indicates that the RF model effectively captured the underlying variance even at N = 50, highlighting its usefulness in data-limited situations. The increased data density restored the ANN model’s ability to predict deterministic responses. The main factors causing the breakdown of crude fiber, inactivation of TIA, and production of lactic acid are enzymatic reactions and microbial primary metabolism. These involve fairly direct and deterministic kinetics. The additional 30 data points provided the ANN with enough information to narrow its large parameter space, allowing it to learn important nonlinear relationships between these variables. This demonstrates the existence of a critical data ‘threshold’ effect when using ANNs. Below this threshold, the model fails to generalize; once exceeded, learning performance improves rapidly [36].

In biotechnology research, initial experimental runs are often limited by time constraints. The first predictions from the model are very important for designing subsequent experiments and saving research and development (R&D) time and resources [62]. Consequently, a model that is already reliable at small sample sizes, such as RF, lets researchers map process dynamics earlier and accelerate optimization compared to relying on unstable ANN models.

3.3. In-Depth Analysis of Fermentation Response Prediction: Connecting Statistics and Biology

The RF model’s excellent performance across various metrics makes it a valuable tool for studying the underlying mechanisms of the fermentation process. This analysis clarifies the model’s mechanistic validity and emphasizes the main difficulties in predicting complex biological systems.

3.3.1. High-Precision Modeling of Digestibility and Safety

A central finding of this study is the exceptional accuracy of the RF model in predicting crude fiber content and trypsin inhibitor activity (TIA) (R² ≥ 0.93). This success is not coincidental: the variations in these two response variables are governed mainly by relatively deterministic, first-order kinetic processes.

In sweet potato waste, the primary source of TIA is its storage protein, sporamin, which is a strong trypsin inhibitor [12,25]. Thermal denaturation and enzymatic degradation are the main mechanisms by which TIA is broken down during fermentation. The high-pressure, high-temperature pretreatment and the fermentation temperature conditions in this study both cause denaturation. A. oryzae, the main organism in the first stage of aerobic fermentation, is known for its strong ability to secrete proteases [12]. These proteases can directly degrade sporamin, eliminating its anti-nutritional activity [12].

The breakdown of crude fiber works in a similar way. Sweet potato waste is rich in lignocellulose, a complex polymer made up of cellulose, hemicellulose, and lignin [63]. A. oryzae secretes large amounts of cellulases and hemicellulases. These hydrolytic enzymes effectively break down the fiber structure, turning it into simple sugars that microorganisms can use [4,63].

Therefore, there is a direct and strong connection between the model’s input variables (such as fermentation temperature, duration, A. oryzae inoculum size, and moisture content) and the degradation rates of TIA and crude fiber. Because these rates are driven mostly by physicochemical and enzymatic kinetics, the model’s predictions are very accurate. This accuracy matters because the final feed product must be safe (low TIA) and easy to digest (low crude fiber).
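To make the "first-order kinetics" argument concrete, the sketch below evaluates the textbook first-order decay law C(t) = C0 · exp(−k·t). The rate constant is a hypothetical placeholder, not a value fitted to the study's data. The point is only that, under such kinetics, the residual level is a smooth, deterministic function of time and rate, which is the kind of relationship a regression model can learn from few samples.

```python
import math


def first_order_residual(c0, k_per_h, t_h):
    """Amount remaining after t_h hours under first-order decay: C(t) = C0 * exp(-k * t)."""
    return c0 * math.exp(-k_per_h * t_h)


def half_life(k_per_h):
    """Time for the level to halve under first-order decay: ln(2) / k."""
    return math.log(2.0) / k_per_h


# Hypothetical rate constant (1/h); an assumption for illustration only.
k = 0.05

for t in (0, 24, 48, 72):
    frac = first_order_residual(1.0, k, t)
    print(f"t = {t:2d} h -> {frac:.3f} of the initial level remains")
```

In practice the effective rate constant would itself depend on temperature, moisture, and inoculum size, which is where the model's input variables enter.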

3.3.2. Estimating Fermentation Efficiency and Biopreservation

In the second, anaerobic stage of fermentation, L. plantarum’s main metabolic product is lactic acid. Its concentration not only indicates how well the second stage went but also significantly affects the quality and shelf life of the final product [64]. A high amount of lactic acid quickly lowers the pH of the substrate, which stops the growth of spoilage-causing microorganisms and acts as a natural biopreservative [65].

The RF model achieved high prediction accuracy (R² = 0.89 ± 0.04) for lactic acid concentration, indicating it effectively captured the main dynamics of the two-stage fermentation process. The final amount of lactic acid results from both stages working together. The effectiveness of A. oryzae in the first stage determines the amount of fermentable sugar (substrate) available for the second stage. If hydrolysis in the first stage is insufficient, L. plantarum will not produce enough lactic acid in the second stage because it lacks “raw material.” In turn, the amount of sugars generated in the first stage and the conditions of the second stage (temperature, duration, inoculum size, etc.) jointly determine how quickly and how much lactic acid L. plantarum produces. The model’s ability to accurately predict lactic acid levels suggests it has learned the complex relationships between these two stages and linked lactic acid production to all eight input variables.

3.3.3. The Challenge of Modeling Complex Biological Outcomes

The RF model was effective at predicting degradation and primary metabolite products, but it was less accurate for more complex biological outcomes. Its prediction for crude protein content was only moderately accurate (R² = 0.61 ± 0.04), and its prediction for the Viable LAB Count was poor (R² = 0.34 ± 0.08, with a large log-scale RMSE of 0.86 ± 0.12). These two results clearly highlight the limitations and challenges of using machine learning to model biological systems [66,67].
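To put the log-scale RMSE in perspective, the sketch below computes an RMSE on log-transformed counts and converts an error of 0.86 log units into a multiplicative factor. It assumes the log scale is base 10 (log10 CFU), which this section does not state explicitly, and the counts are hypothetical.

```python
import math


def log10_rmse(observed_cfu, predicted_cfu):
    """RMSE computed on log10-transformed viable counts."""
    squared = [
        (math.log10(o) - math.log10(p)) ** 2
        for o, p in zip(observed_cfu, predicted_cfu)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical counts (CFU/g), chosen only to illustrate the calculation.
observed = [1e8, 5e8, 2e9]
predicted = [1e9, 5e7, 2e9]
rmse = log10_rmse(observed, predicted)

# Assumption: base-10 logs. An error of 0.86 log units then corresponds
# to a typical misprediction factor of 10 ** 0.86.
fold_error = 10 ** 0.86

print(f"log10 RMSE on the toy data: {rmse:.2f}")
print(f"0.86 log10 units is roughly a {fold_error:.1f}-fold error")
```

Under that base-10 assumption, the reported RMSE means a typical prediction is off by a factor of about seven, which is why the text treats the viable-count model as unreliable.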

The difficulty in predicting crude protein content arises because it is not the direct product of a single process but the net result of simultaneous, competing dynamics: substrate breakdown, fungal and bacterial biomass synthesis, and carbon loss via respiration. Given the stochasticity of these overlapping population dynamics, an R² exceeding 0.60 still indicates a usable predictive signal. While the model captured overarching systemic trends, predicting the highly stochastic variations inherent in microbial growth proved challenging [68]. The failure to accurately predict the viable LAB count, in turn, highlights a fundamental limitation of applying static machine learning models to highly dynamic biological systems: static macroscopic variables (like initial temperature) are insufficient to map highly volatile, continuous microbial population ecologies without real-time time-series data. To accurately predict microbial viability, the model would likely require not only the current eight macroscopic process parameters but also dynamic, high-frequency data describing the microbial microenvironment, such as time-series measurements of pH and concentration profiles of specific sugars and organic acids [69]. This indicates that for highly stochastic, ecological biological indicators, modeling based solely on macroscopic process control parameters is far from sufficient.

3.4. Global Optimum Strategy and Theoretical Validation

The validated RF model was integrated with a Constrained Differential Evolution (CDE) algorithm to perform a comprehensive search of the eight-dimensional parameter space. The goal was to maximize crude protein while keeping TIA below 15% and lactic acid above 30 g/kg. The optimized parameter values do not work independently; instead, they collaborate to create an ideal thermodynamic and biochemical environment for maximum nutritional enhancement [70]. This data-driven approach identified two different sets of fermentation conditions that performed very well, revealing subtle biological strategies to optimize crude protein extraction from fermentation.

3.4.1. Global Optimum Strategy: Maximizing Substrate Preparation

The RF model trained on the N = 80 dataset was clearly the most reliable and accurate predictive tool in this study, based on the thorough performance evaluation described above. As a result, this validated RF model was used as the objective function for a CDE algorithm to search the eight-dimensional process parameter space for the best combination of fermentation conditions that maximizes the key nutritional indicator: crude protein content.
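The general shape of such a constrained search can be sketched as follows. This is not the authors' implementation: the three surrogate functions are toy formulas standing in for the trained RF model, only two of the eight inputs are searched, the bounds are hypothetical, and constraint handling uses Deb's feasibility rules, which is one common choice that the paper does not confirm.

```python
import random

# Hypothetical search bounds for Stage 1 and Stage 2 temperature (deg C).
BOUNDS = [(25.0, 35.0), (30.0, 42.0)]


def protein(x):
    """Toy surrogate for predicted crude protein (%); NOT the trained RF model."""
    return 25.0 - 0.05 * (x[0] - 27.5) ** 2 - 0.03 * (x[1] - 37.0) ** 2


def tia(x):
    """Toy surrogate for residual TIA (%)."""
    return 43.0 - x[0]


def lactic(x):
    """Toy surrogate for lactic acid (g/kg)."""
    return 2.0 * x[1] - 40.0


def violation(x):
    """Total constraint violation for TIA <= 15 and lactic acid >= 30."""
    return max(0.0, tia(x) - 15.0) + max(0.0, 30.0 - lactic(x))


def better(a, b):
    """Deb's rules: feasible beats infeasible; otherwise compare like with like."""
    va, vb = violation(a), violation(b)
    if va == 0.0 and vb == 0.0:
        return protein(a) >= protein(b)
    if va == 0.0 or vb == 0.0:
        return va == 0.0
    return va <= vb


def constrained_de(pop_size=30, generations=200, f=0.7, cr=0.9, seed=1):
    """DE/rand/1/bin with clipping to the bounds and feasibility-rule selection."""
    rng = random.Random(seed)
    dim = len(BOUNDS)
    pop = [[rng.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(pop_size)]
    for _ in range(generations):
        for i in range(pop_size):
            others = [p for j, p in enumerate(pop) if j != i]
            a, b, c = rng.sample(others, 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for d, (lo, hi) in enumerate(BOUNDS):
                if rng.random() < cr or d == j_rand:
                    value = a[d] + f * (b[d] - c[d])
                else:
                    value = pop[i][d]
                trial.append(min(max(value, lo), hi))
            if better(trial, pop[i]):
                pop[i] = trial
    best = pop[0]
    for candidate in pop[1:]:
        if better(candidate, best):
            best = candidate
    return best


best = constrained_de()
print(f"T1 = {best[0]:.2f}, T2 = {best[1]:.2f}, protein = {protein(best):.2f}%")
```

In this toy problem the TIA constraint is active, so the search settles on the feasible boundary rather than on the unconstrained protein peak. That behavior is what distinguishes a constrained optimizer from a plain maximizer.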

Table 5 details the optimal parameters identified by the RF-CDE algorithm, along with the theoretical predictions and actual in vitro wet-lab validation results. To validate the framework, six independent fermentation replicates were conducted using the exact CDE-optimized parameters. The physical in vitro measurements closely matched the algorithm’s theoretical projections, with relative errors remaining well below 5% for all critical deterministic metrics. The measured crude protein content of 24.6%, against a predicted 25.0%, supports the model’s reliability. The optimized parameters also have a clear biological rationale: they work together to create an ideal thermodynamic and biochemical environment for maximum nutritional enrichment. Although the Stage 1 temperature of 27.5 °C is slightly below the optimal growth temperature for A. oryzae, it represents the ideal thermodynamic balance point for the sustained activity and stability of its secreted hydrolytic enzymes. The extended Stage 1 phase maximizes the production of readily fermentable sugars and short peptides. This approach emphasizes hydrolytic efficiency over rapid fungal biomass growth.
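As a quick check of the validation claim for crude protein, the relative error can be computed from the measured value of 24.6% and the model's predicted optimum of 25.0% (the figure given in the abstract). The sketch assumes the error is expressed relative to the predicted value. Using the measured value as the denominator changes the result only in the second decimal place.

```python
def relative_error_pct(predicted, measured):
    """Absolute deviation of the measurement from the prediction, in percent of the prediction."""
    return abs(measured - predicted) / predicted * 100.0


cp_error = relative_error_pct(predicted=25.0, measured=24.6)
print(f"Crude protein relative error: {cp_error:.1f}%")
```

The result, about 1.6%, is consistent with the stated bound of less than 5%.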

Furthermore, the Stage 2 parameters (37.0 °C for 58 h) align perfectly with the optimal metabolic kinetics of L. plantarum. This extended anaerobic phase ensures the complete conversion of freed sugars into lactic acid and microbial biomass. An initial moisture content of 62.0% supplies enough water activity for metabolic processes while maintaining the substrate’s porous structure. This prevents slurry formation and ensures proper oxygen transfer during the aerobic stage [71]. The model showed that too much inoculum or nitrogen supplementation did not result in a higher net increase in crude protein. The maximum net protein yield is achieved when combining a moderate nutritional environment with the optimal physicochemical conditions.

3.4.2. Temperature-Focused Optimum and Process Robustness

To demonstrate the significant effect of temperature, a response surface analysis was conducted, showing the expected protein content by changing Stage 1 (T1) and Stage 2 (T2) temperatures while keeping the other six parameters at their best levels.

The resulting contour plot (Figure 6) identified a slightly different, temperature-based optimum at T1 = 30.5 °C and T2 = 37.6 °C, with a predicted protein content of 23.22%. This approach is more targeted. The Stage 1 temperature of 30.5 °C is very close to the ideal temperature for A. oryzae to produce protease, which directly breaks down existing proteins in the substrate [72]. The metabolic rate of L. plantarum peaks at 37.6 °C in Stage 2, accelerating its growth [73].

The contour plot (Figure 6) shows a broad, stable optimal plateau, indicating that the bioprocess is highly tolerant to slight temperature changes. This operational robustness is crucial for cost-effective industrial scaling.
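A response surface of this kind is built by evaluating the model on a grid of (T1, T2) values while the remaining inputs are held fixed. The sketch below illustrates the procedure with a hypothetical quadratic surrogate whose peak is placed at the reported optimum. The curvature coefficients, the grid ranges, and the 0.25 percentage-point plateau tolerance are all assumptions, not values from the study.

```python
def surrogate_protein(t1, t2):
    """Hypothetical smooth stand-in for the RF prediction with six inputs fixed."""
    return 23.22 - 0.04 * (t1 - 30.5) ** 2 - 0.03 * (t2 - 37.6) ** 2


# Assumed grid ranges: 25.0 to 35.0 deg C for T1, 32.0 to 42.0 deg C for T2.
t1_grid = [25.0 + 0.5 * i for i in range(21)]
t2_grid = [32.0 + 0.4 * j for j in range(26)]

# Evaluate the surrogate at every grid node; this table is what a contour plot draws.
surface = {(t1, t2): surrogate_protein(t1, t2) for t1 in t1_grid for t2 in t2_grid}

best_point = max(surface, key=surface.get)
best_value = surface[best_point]

# "Plateau": grid nodes whose prediction is within 0.25 percentage points of the best.
plateau = [pt for pt, value in surface.items() if best_value - value <= 0.25]
plateau_share = len(plateau) / len(surface)

print(f"Grid optimum at T1 = {best_point[0]:.1f}, T2 = {best_point[1]:.1f}")
print(f"{plateau_share:.0%} of the grid lies on the near-optimal plateau")
```

The wider the plateau around the optimum, the less a small temperature drift costs in predicted protein, which is the sense in which the text calls the process robust.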

In conclusion, the hybrid RF-CDE optimizer navigated the high-dimensional parameter space to identify a biologically viable fermentation strategy. The visual analysis of the temperature response surface is consistent with the predicted optimum and also indicates that the process is inherently robust, which is important for its potential industrial use. The data-driven approach also improved our understanding of the system: it pinpointed a specific range of conditions that work together to maximize nutrient extraction from the substrate, rather than relying on general rules.

3.5. Fermentation-Driven Enhancement of the Amino Acid Profile

An increase in crude protein is a key indicator of optimal fermentation. However, the specific amount and balance of amino acids, especially essential amino acids (EAAs) such as isoleucine, leucine, and lysine, which fish cannot synthesize, are crucial for the health of marine finfish. For aquaculture species, it is vital to ensure not only that total EAA levels are sufficient but also that the EAA profile closely matches the amino acid pattern that maximizes growth and nitrogen retention. For example, arginine should be about 5.0% of crude protein (CP), lysine around 5.2% of CP, and methionine plus cystine approximately 3.5% of CP, based on comprehensive fish meta-analyses [74,75,76,77]. In this section, we examine how the amino acid profile of sweet-potato waste (SPW) changes after an optimized two-stage fermentation process, and discuss how well the resulting EAA pattern meets the needs of various marine fish species [78].

Fermentation significantly enhanced the amino acid content of sweet potato waste, as shown in Table 6. Total Amino Acids (TAA) increased from 84.9 mg/g to 181.8 mg/g, a substantial rise of 114.1%. Total Essential Amino Acids (EAA) increased even more, by 123.9%, so the gain was weighted toward essential amino acids. As a result, the EAA/TAA ratio rose from 0.46 to 0.48, improving the nutritional value of the protein. These data provide strong evidence that fermentation increased not only the amount of protein but also its quality. The improvement in nutritive value stems from the combined proteolytic and biosynthetic effects of the two microbes under the optimized conditions.
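The figures in this paragraph can be cross-checked against one another. The short sketch below recomputes the TAA increase from the two reported concentrations. It then derives the EAA/TAA ratio implied by the reported 123.9% EAA increase and the initial ratio of 0.46. Because the inputs are rounded values taken from the text, the outputs are approximate.

```python
# Values as reported in the text (Table 6 summary).
taa_before, taa_after = 84.9, 181.8  # mg/g
eaa_increase_pct = 123.9             # reported EAA increase (%)
ratio_before = 0.46                  # reported initial EAA/TAA ratio (rounded)

# Percentage increase in total amino acids.
taa_increase_pct = (taa_after - taa_before) / taa_before * 100.0

# If EAA grows by a factor g_eaa and TAA by g_taa, the EAA/TAA ratio scales by g_eaa / g_taa.
g_eaa = 1.0 + eaa_increase_pct / 100.0
g_taa = taa_after / taa_before
implied_ratio_after = ratio_before * g_eaa / g_taa

print(f"TAA increase: {taa_increase_pct:.1f}%")
print(f"Implied EAA/TAA after fermentation: {implied_ratio_after:.2f}")
```

Both results agree with the reported 114.1% increase and the final ratio of 0.48. The check also shows that the shift in the ratio is modest, because TAA and EAA grew by similar factors.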

Mechanistic Discussion: Synergistic Proteolysis in a Co-Culture System

Stage 1: Extensive Hydrolysis by A. oryzae

The primary reason for the observed amino acid enrichment is A. oryzae’s potent proteolytic system, known for secreting a wide range of extracellular proteases, including alkaline, neutral, and acid proteases. Protein hydrolysis can therefore proceed rapidly across the range of pH levels that occur during solid-state fermentation [79,80]. These enzymes degrade the complex, insoluble native proteins in sweet potato substrate into a large pool of small peptides and free amino acids that can dissolve in the surrounding medium [74]. Setting the Stage 1 temperature (T1) to 30.5 °C aligns with research indicating this as the optimal condition for A. oryzae to produce protease in SSF, making this “pre-digestion” phase as efficient as possible [80].

Stage 2: Nutrient Utilization and Biosynthesis by L. plantarum

L. plantarum is nutritionally fastidious: it usually requires pre-formed amino acids and short peptides to grow because it lacks the enzymes needed to break down large, complex proteins [81]. A. oryzae releases a significant amount of amino acids and peptides in Stage 1, creating an ideal environment for L. plantarum to access the nutrients it needs for growth. Additionally, certain strains of L. plantarum possess their own proteolytic systems that can further break down small peptides, leading to a substantial increase in free amino acids, particularly branched-chain amino acids (BCAAs: leucine, isoleucine, and valine) [82]. The optimal temperature for L. plantarum to grow and carry out its metabolic functions in Stage 2 is 37.6 °C, enabling it to utilize the available nutrients to produce its own biomass [83].

The entire process transforms low-value plant protein into high-value microbial protein. The final fermented product (FSPW) is a mixture of partially broken-down substrate, free amino acids, and, most importantly, the biomass of A. oryzae and L. plantarum, which are also good protein sources. Not only do the microorganisms break down the proteins in the raw material during fermentation, but they also utilize the carbon and nitrogen from the substrate to grow and multiply, producing new microbial proteins with a better amino acid profile. Microbial proteins often contain higher levels of specific essential amino acids, such as lysine, which are usually lacking in plant proteins [84]. Therefore, the significant increase in EAA content reflects both the release of amino acids already present and the formation of improved microbial proteins. This “upgrades” the overall amino acid profile of the substrate.

3.6. Comprehensive Biochemical Quality Improvement

FSPW supports nutrition and gut health and helps protect feed quality in several ways. Beyond its fermentation-enhanced amino acid profile, it offers further beneficial attributes. First, solid-state fermentation effectively breaks down anti-nutritional factors in plant materials, such as trypsin inhibitors and phytic acid, which makes the feed much easier to digest and safer overall [79,80]. Second, the final product is rich in live L. plantarum and offers both probiotic and prebiotic benefits. As a well-known probiotic, L. plantarum helps balance the gut microbiota, boosts immunity, and can improve feed conversion efficiency [81,82]. Some of the broken-down fibers in sweet potatoes also serve as prebiotic fibers, supporting beneficial gut bacteria [83]. Third, the product has better taste and shelf life: L. plantarum lowers the pH of the product, which prevents spoilage organisms from growing and acts as a natural preservative. This acidification can also improve palatability, encouraging animals to eat more [84].

Besides macronutrient enrichment, the sequential fermentation caused significant changes in the substrate’s physicochemical and antioxidant properties, confirming its role as a functionally enhanced ingredient.

As detailed in Table 7, the vigorous enzymatic activity of A. oryzae produced a substantial 28.4% Degree of Hydrolysis, effectively breaking down the resilient protein matrices into easily absorbable peptides. At the same time, the breakdown of the lignocellulosic structure helped release bound phenolic compounds, leading to nearly a threefold increase in total phenolic content and DPPH antioxidant scavenging capacity. The significant reduction in anti-nutritional phytic acid and tannins, along with a pH drop to 3.9, clearly confirms the thorough biochemical quality improvements achieved by this optimized process.

In addition to these improvements, the proximate composition analysis showed significant changes in the crude fat and ash contents of the optimized FSPW. The crude fat content rose from 2.1% to 3.8% (Table 7). This increase is mainly due to the intense fungal growth during the initial aerobic stage. As A. oryzae actively breaks down the carbohydrate-rich sweet potato matrix, it converts these carbon sources into microbial cellular biomass, synthesizing and accumulating essential membrane lipids and intracellular fatty acids in the process [85]. Furthermore, the relative ash content grew from 5.2% to 6.5%. Since microorganisms cannot produce inorganic minerals from scratch, this is explained by the law of mass conservation, specifically as a relative concentration effect. The high metabolic activity and cellular respiration of both fungal and bacterial cultures lead to the extensive consumption of organic carbon, which is then released into the atmosphere as volatile carbon dioxide and water vapor. As a result, as the total dry matter of the fermenting substrate decreases, the proportion of non-volatile inorganic minerals (ash) passively and proportionally increases within the remaining matrix [86].
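The relative concentration effect can be turned into a rough estimate of how much dry matter was lost. If the ash mass is conserved while the ash fraction rises from 5.2% to 6.5%, the remaining dry matter must be 5.2/6.5 of its initial value. The sketch below performs that calculation. It assumes both percentages are on a dry-matter basis and that no minerals were added or removed during fermentation, so the result is a back-of-the-envelope figure rather than a value reported by the study.

```python
def implied_dry_matter_loss_pct(ash_before_pct, ash_after_pct):
    """Dry-matter loss (%) implied by a rise in ash fraction, assuming ash mass is conserved.

    ash_mass = ash_before * DM_before = ash_after * DM_after
    => DM_after / DM_before = ash_before / ash_after
    """
    return (1.0 - ash_before_pct / ash_after_pct) * 100.0


# Ash fractions as reported in the text (assumed to be on a dry-matter basis).
dm_loss = implied_dry_matter_loss_pct(5.2, 6.5)
print(f"Implied dry-matter loss: {dm_loss:.0f}%")
```

Under these assumptions, the reported ash increase corresponds to a dry-matter loss of roughly one fifth, presumably mostly organic carbon respired as carbon dioxide.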

In conclusion, integrating an RF model with a CDE algorithm went beyond simple ‘black box’ optimization. Instead, it precisely mapped and optimized a very complex microbial synergy.

3.7. Broader Implications and Future Directions

This study offers strong empirical evidence for data-driven biotechnology research and development by directly comparing ANNs and RFs in modeling a complex bioprocess. The results clearly demonstrate that ensemble learning methods like Random Forest perform significantly better than traditional ANN models when only a small amount of experimental data is available, as is often the case because of cost and time constraints. This enables the early creation of reliable predictive models, making experimental design and process optimization more efficient [36].

Future modeling efforts should focus on predicting dynamic biological indicators like microbial viability. Incorporating time-series data for pH, organic acids, and sugar profiles can give models the temporal detail needed to monitor population changes. Additionally, exploring advanced architectures such as Gradient Boosting Machines (GBM) or Physics-Informed Neural Networks might further improve predictive accuracy and interpretability. This approach would help people better understand and manage the fermentation process.

3.8. Study Limitations and Theoretical Validity

While the in vitro empirical validation confirms the biochemical and compositional improvements suggested by the AI framework, a major limitation of the current study is the absence of in vivo biological validation. Claims regarding physiological effects, gut health improvement, and animal growth performance cannot be conclusively verified without comprehensive in vivo feeding trials and digestibility studies on target aquaculture species. Future research needs to include these functional assessments to connect in vitro biochemical quality with actual in vivo effectiveness.

A further limitation is that the optimal crude protein yield of 25.0%, derived from the Constrained Differential Evolution algorithm, is a theoretical estimate from the surrogate model. Because biological systems are highly nonlinear, machine learning predictions in bioprocessing inherently risk errors when exploring parameter space outside well-trained regions [36]. The six-replicate wet-lab validation at the CDE-optimized conditions (24.6% crude protein) addresses this for the reported optimum, but physical validation remains essential for any other operating point proposed by the optimizer, both to determine precise error margins and to confirm real-world industrial applicability.

Furthermore, the failure of both machine learning architectures to accurately predict the viable counts of L. plantarum using solely static initial variables (R² = 0.3431 for RF) highlights a fundamental limitation in static bioprocess modeling. To accurately forecast microbial viability, a highly dynamic metric that depends on rapid growth, plateau, and death phases, future research will need to integrate continuous, real-time time-series data (such as pH fluctuations and off-gas measurements) through recurrent neural networks (RNNs) [35].

4. Conclusions

This study demonstrated a robust, data-driven framework using Random Forest algorithms and a two-stage solid-state fermentation process to improve the nutritional profile of sweet potato waste significantly. Through repeated k-fold cross-validation, we confirmed that ensemble learning greatly exceeds Artificial Neural Networks in handling the complex, sparse datasets typical of bioprocess engineering. The application of a Constrained Differential Evolution optimizer successfully identified an optimal biological strategy, which was strictly validated through wet-lab experiments. The optimized fermentation resulted in a highly digestible bio-matrix with 24.6% crude protein, a 123.9% increase in essential amino acids, and substantial reductions in anti-nutritional factors. Additionally, the fermentation process led to a comprehensive biochemical enhancement, marked by a significant increase in the degree of protein hydrolysis, total phenolic content, and antioxidant activity, along with the effective breakdown of phytic acid and tannins. Importantly, the bioprocess produced a safe and stable product by quickly lowering pH through the synergistic accumulation of lactic and acetic acids, while also increasing crude fat and ash levels. Although future in vivo trials are needed to verify physiological effects on animals, this research offers the shochu industry a scientifically solid, scalable blueprint to convert environmental waste into a functionally improved, sustainable protein resource.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/fermentation12040191/s1, Table S1: The N = 80 Augmented Experimental Design (AED) Parameters; Table S2: Results of the response values for the N = 80 Augmented Experimental Design (AED).

Author Contributions

Conceptualization, Y.Z., X.Z. and M.I.; methodology, Y.Z., S.Y., J.C. and X.Z.; software, Y.Z. and X.Z.; validation, Y.Z. and X.Z.; formal analysis, Y.Z., J.C. and X.Z.; investigation, Y.Z., Y.T. and X.Z.; resources, Y.Z., X.Z. and M.I.; data curation, J.C., Y.T. and N.J.; writing—original draft preparation, Y.Z.; writing—review and editing, N.J., Y.T., X.Z. and M.I.; visualization, Y.Z. and X.Z.; supervision, M.I., S.K. and S.Y.; project administration, Y.Z. and M.I.; funding acquisition, S.K. and M.I. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Ethical review and approval were waived for this study due to the absence of animal- or human-related studies.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article/Supplementary Materials; further inquiries can be directed to the corresponding authors.

Acknowledgments

We acknowledge the contribution of the Laboratory of Nutritional Biochemistry and Feed Chemistry, Faculty of Agriculture, Kagoshima University, Japan, for technical assistance in this study.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AED	Augmented Experimental Design
ANFs	Anti-nutritional factors
ANN	Artificial Neural Network
AOAC	Association of Official Analytical Chemists
BAPNA	Nα-Benzoyl-L-arginine 4-nitroanilide hydrochloride
CDE	Constrained Differential Evolution
CFU	Colony-forming units
CP	Crude protein
DH	Degree of protein hydrolysis
DM	Dry matter
EAA	Essential amino acids
FAO	Food and Agriculture Organization
FSPW	Fermented sweet potato waste
GA	Genetic Algorithms
GRAS	Generally Recognized as Safe
HPLC	High-Performance Liquid Chromatography
LAB	Lactic acid bacteria
MISO	Multiple-Input Single-Output
ML	Machine learning
MSE	Mean squared error
NEAA	Non-Essential Amino Acid
OPA	o-phthaldialdehyde
RF	Random Forest
RMSE	Root mean squared error
RNNs	Recurrent neural networks
RSM	Response Surface Methodology
SPW	Sweet potato waste
SSF	Solid-state fermentation
TAA	Total Amino Acids
TEAA	Total Essential Amino Acids
TIA	Trypsin inhibitor activity

References

FAO. The State of World Fisheries and Aquaculture 2024. Blue Transformation in Action; The State of World Fisheries and Aquaculture (SOFIA); FAO: Rome, Italy, 2024.
Cashion, T.; Le Manach, F.; Zeller, D.; Pauly, D. Most Fish Destined for Fishmeal Production Are Food-Grade Fish. Fish Fish. 2017, 18, 837–844.
Inoue, O. Leading Sustainability-Oriented Company Recycling Sweet Potatoes to Decarbonize. Sustainable Japan by The Japan Times, 24 March 2023.
Vannini, M.; Marchese, P.; Sisti, L.; Saccani, A.; Mu, T.; Sun, H.; Celli, A. Integrated Efforts for the Valorization of Sweet Potato By-Products within a Circular Economy Concept: Biocomposites for Packaging Applications Close the Loop. Polymers 2021, 13, 1048.
Taoka, Y.; Nagano, N.; Kai, H.; Hayashi, M. Degradation of Distillery Lees (Shochu kasu) by Cellulase-Producing Thraustochytrids. J. Oleo Sci. 2017, 66, 31–40.
Zhang, X.; Zhang, Y.; Ijiri, D.; Ohtsuka, A. Evaluation of Effects of the Dry-Heat-Processed Sweet Potato Waste as Broiler Feed. Anim. Sci. J. 2019, 90, 1468–1474.
Akoetey, W.; Britain, M.M.; Morawicki, R.O. Potential Use of Byproducts from Cultivation and Processing of Sweet Potatoes. Cienc. Rural 2017, 47, e20160610.
Zhang, X.; Zhang, Y.; Ijiri, D.; Kanda, R.; Ohtsuka, A. Effect of Feeding a Dry-Heat-Processed Sweet Potato Waste on the Growth Performance and Meat Quality of Broilers. Nihon Danc. Chikusan Gakkaihou 2019, 62, 107–114.
Wang, S.; Nie, S.; Zhu, F. Chemical Constituents and Health Effects of Sweet Potato. Food Res. Int. 2016, 89, 90–116.
Abdullahi, A.; Mohammed, T.; Ndirmbita, W.; Hassan, M.; Ibrahim, Y. Proximate Composition, Mineral Content and Anti-Nutrient of Raw and Sun-Dried Sweet Potato Peel. ASUU J. Sci. 2021, 8, 34–47.
Agubosi, O.; Dumkenechukwu, I.; John, A. Evaluation of the Nutritive Value of Air-Dried and Sun-Dried Sweet Potato (Ipomoea batatas) Peels. Eur. J. Life Saf. Stab. 2022, 14, 43–51.
Maloney, K.P.; Truong, V.-D.; Allen, J.C. Susceptibility of Sweet Potato (Ipomoea batatas) Peel Proteins to Digestive Enzymes. Food Sci. Nutr. 2014, 2, 351–360.
Chilakamarry, C.R.; Mimi Sakinah, A.M.; Zularisam, A.W.; Sirohi, R.; Khilji, I.A.; Ahmad, N.; Pandey, A. Advances in Solid-State Fermentation for Bioconversion of Agricultural Wastes to Value-Added Products: Opportunities and Challenges. Bioresour. Technol. 2022, 343, 126065.
López, R.A. Fermentation Processes: Modeling, Optimization and Control: 2nd Edition. Fermentation 2025, 11, 408.
Duncker, K.E.; Holmes, Z.A.; You, L. Engineered Microbial Consortia: Strategies and Applications. Microb. Cell Factories 2021, 20, 211.
Kurt, F.; Leventhal, G.E.; Spalinger, M.R.; Anthamatten, L.; Rogalla von Bieberstein, P.; Menzi, C.; Reichlin, M.; Meola, M.; Rosenthal, F.; Rogler, G.; et al. Co-Cultivation Is a Powerful Approach to Produce a Robust Functionally Designed Synthetic Consortium as a Live Biotherapeutic Product (LBP). Gut Microbes 2023, 15, 2177486.
Pinto, T.; Flores-Alsina, X.; Gernaey, K.V.; Junicke, H. Alone or Together? A Review on Pure and Mixed Microbial Cultures for Butanol Production. Renew. Sustain. Energy Rev. 2021, 147, 111244.
Yan, N.; Luan, T.; Yin, M.; Niu, Y.; Wu, L.; Yang, S.; Li, Z.; Li, H.; Zhao, J.; Bao, X. Co-Fermentation of Glucose–Xylose–Cellobiose–XOS Mixtures Using a Synthetic Consortium of Recombinant Saccharomyces Cerevisiae Strains. Fermentation 2023, 9, 775.
Jeong, J.; Kim, S. Application of Machine Learning for Quantitative Analysis of Industrial Fermentation Using Image Processing. Food Sci. Biotechnol. 2024, 34, 373–381.
Chiranjeevi, P.; Pandian, M. Integration of Artificial Neural Network Modeling and Genetic Algorithm Approach for Enrichment of Laccase Production in Solid State Fermentation by Pleurotus Ostreatus. BioResources 2014, 9, 2459–2470.
Du, Y.-H.; Wang, M.-Y.; Yang, L.-H.; Tong, L.-L.; Guo, D.-S.; Ji, X.-J. Optimization and Scale-Up of Fermentation Processes Driven by Models. Bioengineering 2022, 9, 473.
Etim, O.; Ubiom, I.C.; Patrick, P.; Opara, C.; Akpan, M. Comparative Study on the Nutrients and Anti-Nutrients Composition of the Peels and Flesh of Sweet Potato (Ipomoea batatas L.). World J. Appl. Sci. Technol. 2024, 15, 309–315.
Yisa, A.G.; Daniel, C. Proximate and phytochemical compositions of differently processed sweet potato (Ipomoea batatas) peel meal. Niger. J. Anim. Prod. 2022, 2048–2052.
Daniel, I.; Yadav, S.; Karani, G.; De Zoysa, A.S.A. Application, chemical component and the biorefinery of Ipomoea batatas wastes to sugar. Zenodo 2022, 4, 2582–5208.
Yeh, K.W.; Chen, J.C.; Lin, M.I.; Chen, Y.M.; Lin, C.Y. Functional Activity of Sporamin from Sweet Potato (Ipomoea batatas Lam.): A Tuber Storage Protein with Trypsin Inhibitory Activity. Plant Mol. Biol. 1997, 33, 565–570.
Cao, Y.; Tian, B.; Zhang, Z.; Yang, K.; Cai, M.; Hu, W.; Guo, Y.; Xia, Q.; Wu, W. Positive Effects of Dietary Fiber from Sweet Potato [Ipomoea batatas (L.) Lam.] Peels by Different Extraction Methods on Human Fecal Microbiota In Vitro Fermentation. Front. Nutr. 2022, 9, 986667.
Diagboya, P.N.; Odagwe, A.; Oyem, H.H.; Omoruyi, C.; Osabohien, E. Adsorptive Decolorization of Dyes in Aqueous Solution Using Magnetic Sweet Potato (Ipomoea batatas L.) Peel Waste. RSC Sustain. 2024, 2, 686–694.
Lucas-González, R.; Pateiro, M.; Domínguez-Valencia, R.; Carrillo, C.; Lorenzo, J.M. Optimization of Anthocyanin Extraction from Purple Sweet Potato Peel (Ipomea batata) Using Sonotrode Ultrasound-Assisted Extraction. Foods 2025, 14, 2686.
Huang, J.-C.; Guo, Q.; Li, X.-H.; Shi, T.-Q. A Comprehensive Review on the Application of Neural Network Model in Microbial Fermentation. Bioresour. Technol. 2025, 416, 131801.
Ekpenyong, M.; Asitok, A.; Antai, S.; Ekpo, B.; Antigha, R.; Ogarekpe, N. Statistical and Artificial Neural Network Approaches to Modeling and Optimization of Fermentation Conditions for Production of a Surface/Bioactive Glyco-Lipo-Peptide. Int. J. Pept. Res. Ther. 2020, 27, 475–495.
Okpalaeke, K.E.; Ibrahim, T.H.; Latinwo, L.M.; Betiku, E. Mathematical Modeling and Optimization Studies by Artificial Neural Network, Genetic Algorithm and Response Surface Methodology: A Case of Ferric Sulfate–Catalyzed Esterification of Neem (Azadirachta indica) Seed Oil. Front. Energy Res. 2020, 8, 614621.
Hu, K.; Ding, C.; Zhou, M.; Wang, C.; Hu, B.; Chen, Y.; Wu, Q.; Feng, N. Artificial Neural Network–Genetic Algorithm to Optimize Yin Rice Inoculation Fermentation Conditions for Improving Physico-Chemical Characteristics. Food Sci. Technol. Res. 2018, 24, 729–737.
Melcher, M.; Scharl, T.; Spangl, B.; Luchner, M.; Cserjan, M.; Bayer, K.; Leisch, F.; Striedner, G. The Potential of Random Forest and Neural Networks for Biomass and Recombinant Protein Modeling in Escherichia coli Fed-Batch Fermentations. Biotechnol. J. 2015, 10, 1770–1782. [Google Scholar] [CrossRef]
Malashin, I.; Martysyuk, D.; Tynchenko, V.; Gantimurov, A.; Semikolenov, A.; Nelyub, V.; Borodulin, A. Machine Learning-Based Process Optimization in Biopolymer Manufacturing: A Review. Polymers 2024, 16, 3368. [Google Scholar] [CrossRef]
Yee, C.S.; Zahia-Azizan, N.A.; Abd Rahim, M.H.; Mohd Zaini, N.A.; Raja-Razali, R.B.; Ushidee-Radzi, M.A.; Ilham, Z.; Wan-Mohtar, W.A.A.Q.I. Smart Fermentation Technologies: Microbial Process Control in Traditional Fermented Foods. Fermentation 2025, 11, 323. [Google Scholar] [CrossRef]
Helleckes, L.M.; Hemmerich, J.; Wiechert, W.; von Lieres, E.; Grünberger, A. Machine Learning in Bioprocess Development: From Promise to Practice. Trends Biotechnol. 2022, 41, 817–835. [Google Scholar] [CrossRef]
Negulescu, C.; Borangiu, T.; Răileanu, S.; Anghel, V.V. AgriMicro—A Microservices-Based Platform for Optimization of Farm Decisions. AgriEngineering 2025, 7, 299. [Google Scholar] [CrossRef]
Zhang, X.; Zhang, Q.; Li, Y.; Zhang, H. Modeling and Optimization of Photo-Fermentation Biohydrogen Production from Co-Substrates Basing on Response Surface Methodology and Artificial Neural Network Integrated Genetic Algorithm. Bioresour. Technol. 2023, 374, 128789. [Google Scholar] [CrossRef]
Ojokoh, A. Changes in Nutrient and Antinutrient Contents of Sweet Potato (Ipomea batatas) Peels Subjected to Solid Substrate Fermentation. Biosci. Biotechnol. Res. Asia 2005, 3, 17–20. [Google Scholar]
Sun, H.; Kang, X.; Tan, H.; Cai, H.; Chen, D. Progress in Fermented Unconventional Feed Application in Monogastric Animal Production in China. Fermentation 2023, 9, 947. [Google Scholar] [CrossRef]
Yue, T.; Lu, Y.; Ding, W.; Xu, B.; Zhang, C.; Li, L.; Jian, F.; Huang, S. The Role of Probiotics, Prebiotics, Synbiotics, and Postbiotics in Livestock and Poultry Gut Health: A Review. Metabolites 2025, 15, 478. [Google Scholar] [CrossRef] [PubMed]
Guo, L.; Yang, L.; Wang, K.; Liu, W.; Wang, S.; Zhang, Y.; Li, R.; Wu, Z.; Chen, C. Synbiotic Microencapsulation of Lactiplantibacillus Plantarum-Lentinan for Enhanced Growth in Broilers. AMB Expr. 2025, 15, 128. [Google Scholar] [CrossRef]
Izuddin, W.I.; Humam, A.M.; Loh, T.C.; Foo, H.L.; Samsudin, A.A. Dietary Postbiotic Lactobacillus Plantarum Improves Serum and Ruminal Antioxidant Activity and Upregulates Hepatic Antioxidant Enzymes and Ruminal Barrier Function in Post-Weaning Lambs. Antioxidants 2020, 9, 250. [Google Scholar] [CrossRef]
Kim, M.-J.; Jeon, D.-G.; Lim, Y.; Jang, I. Effects of Prebiotics in Combination with Probiotics on Intestinal Hydrolase Activity, Microbial Population and Immunological Biomarkers in SD Rats Fed an AIN-93G Diet. Lab. Anim. Res. 2022, 38, 20. [Google Scholar] [CrossRef]
Nielsen, P.M.; Petersen, D.; Dambmann, C. Improved Method for Determining Food Protein Degree of Hydrolysis. J. Food Sci. 2001, 66, 642–646. [Google Scholar] [CrossRef]
Yegin, S. Solid-State Fermentation as a Strategy for Improvement of Bioactive Properties of the Plant-Based Food Resources. Bioresour. Bioprocess. 2025, 12, 140. [Google Scholar] [CrossRef]
Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-Learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830. [Google Scholar]
McKinney, W. Data Structures for Statistical Computing in Python. In Proceedings of the 9th Python in Science Conference, Austin, TX, USA, 28 June–3 July 2010; pp. 56–61. [Google Scholar]
Importance of Feature Scaling. Available online: https://scikit-learn.org/stable/auto_examples/preprocessing/plot_scaling_importance.html (accessed on 1 November 2025).
1.17. Neural Network Models (Supervised). Available online: https://scikit-learn.org/stable/modules/neural_networks_supervised.html (accessed on 1 November 2025).
Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2017, arXiv:1412.6980. [Google Scholar] [CrossRef]
RandomForestRegressor. Available online: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html (accessed on 1 November 2025).
Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
Kuhn, M.; Johnson, K. Applied Predictive Modeling; Springer: New York, NY, USA, 2013. [Google Scholar]
Virtanen, P.; Gommers, R.; Oliphant, T.E.; Haberland, M.; Reddy, T.; Cournapeau, D.; Burovski, E.; Peterson, P.; Weckesser, W.; Bright, J.; et al. SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python. Nat. Methods 2020, 17, 261–272. [Google Scholar] [CrossRef]
AOAC International. Official Methods of Analysis, 22nd ed.; Latimer, G.W., Jr., Ed.; Oxford University Press: Oxford, UK, 2023; ISBN 978-0-19-761013-8. [Google Scholar]
Zhang, Y.; Ishikawa, M.; Koshio, S.; Yokoyama, S.; Dossou, S.; Wang, W.; Zhang, X.; Shadrack, R.S.; Mzengereza, K.; Zhu, K.; et al. Optimization of Soybean Meal Fermentation for Aqua-Feed with Bacillus subtilis Natto Using the Response Surface Methodology. Fermentation 2021, 7, 306. [Google Scholar] [CrossRef]
Adeyemo, J.; Enitan, A. Optimization of Fermentation Processes Using Evolutionary Algorithms—A Review. Sci. Res. Essays 2011, 6, 1464–1472. [Google Scholar]
Yang, P.; Yang, Y.H.; Zhou, B.B.; Zomaya, A.Y. A Review of Ensemble Methods in Bioinformatics. Curr. Bioinform. 2010, 5, 296–308. [Google Scholar] [CrossRef]
Peng, J.; Khuat, T.T.; Musial, K.; Gabrys, B. Machine Learning Methods for Small Data and Upstream Bioprocessing Applications: A Comprehensive Review. Biotechnol. Adv. 2025, 87, 108749. [Google Scholar] [CrossRef] [PubMed]
Lan, T.; Hu, H.; Jiang, C.; Yang, G.; Zhao, Z. A Comparative Study of Decision Tree, Random Forest, and Convolutional Neural Network for Spread-F Identification. Adv. Space Res. 2020, 65, 2052–2061. [Google Scholar] [CrossRef]
Pacheco, V.L.; Bragagnolo, L.; Rosa, F.D.; Thomé, A. Optimization of Biocementation Responses by Artificial Neural Network and Random Forest in Comparison to Response Surface Methodology. Environ. Sci. Pollut. Res. 2023, 30, 61863–61887. [Google Scholar] [CrossRef]
Mei, X.; Mu, T.-H.; Han, J.-J. Composition and Physicochemical Properties of Dietary Fiber Extracted from Residues of 10 Varieties of Sweet Potato by a Sieving Method. J. Agric. Food Chem. 2010, 58, 7305–7310. [Google Scholar] [CrossRef] [PubMed]
Dong, J.; Li, S.; Chen, X.; Sun, Z.; Sun, Y.; Zhen, Y.; Qin, G.; Wang, T.; Demelash, N.; Zhang, X. Effects of Lactiplantibacillus Plantarum Inoculation on the Quality and Bacterial Community of Whole-Crop Corn Silage at Different Harvest Stages. Chem. Biol. Technol. Agric. 2022, 9, 57. [Google Scholar] [CrossRef]
Wang, Y.; Yang, Y.; Yang, X.; Huang, L.; Wang, P.; Zhao, L. Effects of Lactiplantibacillus Plantarum and Fermentation Time on the Quality, Bacterial Community, and Functional Prediction of Silage from Lotus corniculatus L. in Karst Regions. Agriculture 2025, 15, 16. [Google Scholar] [CrossRef]
Noordijk, B.; Garcia Gomez, M.L.; ten Tusscher, K.H.W.J.; de Ridder, D.; van Dijk, A.D.J.; Smith, R.W. The Rise of Scientific Machine Learning: A Perspective on Combining Mechanistic Modelling with Machine Learning for Systems Biology. Front. Syst. Biol. 2024, 4, 1407994. [Google Scholar] [CrossRef]
Whalen, S.; Schreiber, J.; Noble, W.S.; Pollard, K.S. Navigating the Pitfalls of Applying Machine Learning in Genomics. Nat. Rev. Genet. 2022, 23, 169–181. [Google Scholar] [CrossRef] [PubMed]
Dan, T.; Chen, H.; Li, T.; Tian, J.; Ren, W.; Zhang, H.; Sun, T. Influence of Lactobacillus Plantarum P-8 on Fermented Milk Flavor and Storage Stability. Front. Microbiol. 2019, 9, 3133. [Google Scholar] [CrossRef] [PubMed]
Papoutsoglou, G.; Tarazona, S.; Lopes, M.; Klammsteiner, T.; Ibrahimi, E.; Eckenberger, J.; Novielli, P.; Tonda, A.; Simeon, A.; Shigdel, R.; et al. Machine Learning Approaches in Microbiome Research: Challenges and Best Practices. Front. Microbiol. 2023, 14, 1261889. [Google Scholar] [CrossRef]
Huang, X.; Liu, H.; Zhou, Q.; Su, Q. A Surrogate-Assisted Gray Prediction Evolution Algorithm for High-Dimensional Expensive Optimization Problems. Mathematics 2025, 13, 1007. [Google Scholar] [CrossRef]
Rodríguez de Olmos, A.; Correa Deza, M.A.; Garro, M.S. Selected Lactobacilli and Bifidobacteria Development in Solid State Fermentation Using Soybean Paste. Rev. Argent. De Microbiol. Argent. J. Microbiol. 2016, 49, 62–69. [Google Scholar] [CrossRef]
Murai, T.; Annor, G.A. Improving the Nutritional Profile of Intermediate Wheatgrass by Solid-State Fermentation with Aspergillus oryzae Strains. Foods 2025, 14, 395. [Google Scholar] [CrossRef]
Chen, Q.; Liu, B.; Liu, G.; Shi, H.; Wang, J. Effect of Bacillus subtilis and Lactobacillus plantarum on Solid-state Fermentation of Soybean Meal. J. Sci. Food Agric. 2023, 103, 6070–6079. [Google Scholar] [CrossRef]
Salamanca, N.; Herrera, M.; de la Roca, E. Amino Acids as Dietary Additives for Enhancing Fish Welfare in Aquaculture. Animals 2025, 15, 1293. [Google Scholar] [CrossRef] [PubMed]
Akiyama, T.; Oohara, I.; Yamamoto, T. Comparison of Essential Amino Acid Requirements with A/E Ratio among Fish Species (Review Paper). Fish. Sci. 1997, 63, 963–970. [Google Scholar] [CrossRef]
Furuya, W.M.; da Cruz, T.P.; Gatlin, D.M. Amino Acid Requirements for Nile Tilapia: An Update. Animals 2023, 13, 900. [Google Scholar] [CrossRef]
Xing, S.; Liang, X.; Zhang, X.; Oliva-Teles, A.; Peres, H.; Li, M.; Wang, H.; Mai, K.; Kaushik, S.J.; Xue, M. Essential Amino Acid Requirements of Fish and Crustaceans, a Meta-analysis. Rev. Aquac. 2024, 16, 1069–1086. [Google Scholar] [CrossRef]
Pasaribu, T.; Soeka, Y.S.; Nurhidayat, N.; Suciatmih, S.; Yulinery, T.; Triana, E.; Sulistiyani, T.R.; Setyowati, N.; Sulistyowati, D.D.; Susilowati, D.N. Enhancing the Nutritional Profile of Purple Sweet Potato Flour through Fermentation with Lactiplantibacillus plantarum InaCC B157: A Functional Food Perspective. Vet. World 2025, 18, 1870–1880. [Google Scholar] [CrossRef]
de Souza, P.M.; Bittencourt, M.L.D.A.; Caprara, C.C.; de Freitas, M.; de Almeida, R.P.C.; Silveira, D.; Fonseca, Y.M.; Ferreira Filho, E.X.; Pessoa Junior, A.; Magalhães, P.O. A Biotechnology Perspective of Fungal Proteases. Braz. J. Microbiol. 2015, 46, 337–346. [Google Scholar] [CrossRef]
Chutmanop, J.; Chuichulcherm, S.; Chisti, Y.; Srinophakun, P. Protease Production by Aspergillus oryzae in Solid-State Fermentation Using Agroindustrial Substrates. J. Chem. Technol. Biotechnol. 2008, 83, 1012–1018. [Google Scholar] [CrossRef]
Ma, C.; Cheng, G.; Liu, Z.; Guang-Yu, G.; Chen, Z. Determination of the Essential Nutrients Required for Milk Fermentation by Lactobacillus Plantarum. LWT Food Sci. Technol. 2015, 65, 884–889. [Google Scholar] [CrossRef]
Lee, Y.; So, Y.J.; Jung, W.-H.; Kim, T.-R.; Sohn, M.; Jeong, Y.-J.; Imm, J.-Y. Lactiplantibacillus Plantarum LM1001 Improves Digestibility of Branched-Chain Amino Acids in Whey Proteins and Promotes Myogenesis in C2C12 Myotubes. Food Sci. Anim. Resour. 2024, 44, 951–965. [Google Scholar] [CrossRef] [PubMed]
Yang, X.; Hong, J.; Wang, L.; Cai, C.; Mo, H.; Wang, J.; Fang, X.; Liao, Z. Effect of Lactic Acid Bacteria Fermentation on Plant-Based Products. Fermentation 2024, 10, 48. [Google Scholar] [CrossRef]
Meng, H.; Xu, Y.; Wang, L.; Wang, J.; Wang, B.; Wu, H.; Hou, D.; Wang, S.; Tong, X.; Jiang, Y.; et al. Impact of Lactiplantibacillus Plantarum on the Fermentation Quality, Nutritional Enhancement, and Microbial Dynamics of Whole Plant Soybean Silage. Front. Microbiol. 2025, 16, 1565951. [Google Scholar] [CrossRef]
Arthur, G.D.; Stirk, W.A.; Van Staden, J. Effects of Autoclaving and Charcoal on Root-Promoting Substances Present in Water Extracts Made from Gelling Agents. Bioresour. Technol. 2006, 97, 1942–1950. [Google Scholar] [CrossRef] [PubMed]
Raimbault, M. General and Microbiological Aspects of Solid Substrate Fermentation. Electron. J. Biotechnol. 1998, 1, 174–188. [Google Scholar] [CrossRef]

Figure 1. Schematic representation of sweet potato waste (SPW) generation and its biovalorization potential in the Japanese shochu industry. Sweet potato tubers are peeled before shochu production, generating significant quantities of SPW as a byproduct. In Kagoshima Prefecture (2023), sweet potato production reached 320 kilotons, with 4–6 kilotons of SPW generated. This waste stream is a promising substrate for biovalorization into functional animal feed via solid-state fermentation, yielding fermented sweet potato waste (FSPW).

Figure 2. The ANN model generalized well in predicting all five key fermentation outcomes in Preliminary Modeling on a Sparse Dataset (N = 50).

Figure 3. The RF model generalized well in predicting all five key fermentation outcomes in Preliminary Modeling on a Sparse Dataset (N = 50).

Figure 4. Loss curves of the ANN model for the five key fermentation outcomes in the preliminary modeling on a sparse dataset (N = 50).

Figure 5. Loss curves of the ANN model for the five key fermentation outcomes under the augmented experimental design (N = 80).

Figure 6. Contour plot of the predicted optimal plateau from the random forest (RF) model, showing the region of highest predicted crude protein content. Visual confidence intervals display the statistical variation around the predicted peaks and give a straightforward view of the model’s certainty.

Table 2. Input parameters and their ranges for the ML model.

Factor (Independent Variable)	Unit	Type	Experimental Range
Temperature (Stage 1)	°C	Continuous	20.0~40.0
Temperature (Stage 2)	°C	Continuous	25.0~45.0
Initial Moisture Content	%	Continuous	55.0~80.0
Aspergillus oryzae Inoculum	spores/g	Continuous	1.0 × 10⁴~1.0 × 10⁷
Lactobacillus plantarum Inoculum	CFU/g	Continuous	1.0 × 10⁶~1.0 × 10⁹
Nitrogen Supplement	%	Continuous	0.0~3.0
Duration Stage 1	h	Continuous	24.0~60.0
Duration Stage 2	h	Continuous	24.0~60.0
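
The ranges in Table 2 are the natural search bounds for a constrained optimizer. The following is a minimal sketch of how they could be encoded and checked, not the study's implementation. The dictionary keys and helper names are hypothetical, and searching the two inoculum levels on a log10 scale is an assumption made here because they span three orders of magnitude; the source does not state which scale was used.

```python
import math

# Experimental ranges (lower, upper) transcribed from Table 2.
# Key names are illustrative, not identifiers from the study.
BOUNDS = {
    "temp_stage1_C": (20.0, 40.0),
    "temp_stage2_C": (25.0, 45.0),
    "moisture_pct": (55.0, 80.0),
    "a_oryzae_spores_per_g": (1.0e4, 1.0e7),
    "l_plantarum_cfu_per_g": (1.0e6, 1.0e9),
    "nitrogen_pct": (0.0, 3.0),
    "duration_stage1_h": (24.0, 60.0),
    "duration_stage2_h": (24.0, 60.0),
}

# Assumption: inoculum levels are searched on a log10 scale.
LOG_SCALED = {"a_oryzae_spores_per_g", "l_plantarum_cfu_per_g"}


def to_search_space(name, value):
    """Map a physical value to the scale the optimizer would search on."""
    return math.log10(value) if name in LOG_SCALED else value


def search_bounds():
    """Return Table 2 bounds expressed in search-space units."""
    return {
        name: (to_search_space(name, lo), to_search_space(name, hi))
        for name, (lo, hi) in BOUNDS.items()
    }


def out_of_range(candidate):
    """List the factors of `candidate` that fall outside Table 2."""
    return [
        name
        for name, value in candidate.items()
        if not BOUNDS[name][0] <= value <= BOUNDS[name][1]
    ]


# Usage: the optimum reported in Table 5, checked against the bounds.
TABLE5_OPTIMUM = {
    "temp_stage1_C": 27.5,
    "temp_stage2_C": 37.0,
    "moisture_pct": 62.0,
    "a_oryzae_spores_per_g": 8.60e4,
    "l_plantarum_cfu_per_g": 2.80e6,
    "nitrogen_pct": 0.9,
    "duration_stage1_h": 41.0,
    "duration_stage2_h": 58.0,
}
print(out_of_range(TABLE5_OPTIMUM))
```

A check of this kind confirms whether a proposed optimum is an interpolation inside the sampled design space rather than an extrapolation beyond it.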

Table 3. Output parameters (responses) for the ML model.

Response (Dependent Variable)	Unit
Crude Protein	% (DM)
Crude Fiber	% (DM)
Trypsin Inhibitor Activity (TIA)	% (Remaining)
Lactic Acid	g/kg (DM)
Viable LAB Count	CFU/g

Table 4. Repeated 5-fold cross-validation metrics for ANN and RF models (N = 80).

Response Variable	Model	R²	RMSE
Crude Protein (%)	ANN	0.12 ± 0.15	4.10 ± 0.65
Crude Protein (%)	RF	0.61 ± 0.04	2.65 ± 0.18
Crude Fiber (%)	ANN	0.85 ± 0.08	0.38 ± 0.06
Crude Fiber (%)	RF	0.93 ± 0.02	0.26 ± 0.03
TIA (Remaining %)	ANN	0.89 ± 0.07	1.70 ± 0.50
TIA (Remaining %)	RF	0.99 ± 0.01	0.32 ± 0.05
Lactic Acid (g/kg DM)	ANN	0.91 ± 0.05	2.15 ± 0.45
Lactic Acid (g/kg DM)	RF	0.89 ± 0.04	2.60 ± 0.30
log10 Viable LAB Count	ANN	−0.85 ± 0.25	2.15 ± 0.42
log10 Viable LAB Count	RF	0.34 ± 0.08	0.86 ± 0.12

RMSE: Root Mean Squared Error.
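
The mean ± SD entries in Table 4 are the kind of summary that repeated k-fold cross-validation produces. The sketch below illustrates the mechanics using only the Python standard library. It is not the study's pipeline: the data are synthetic, the number of repeats is arbitrary, and an ordinary least-squares line stands in for the ANN and RF models. The model is refitted on the training folds only, so no information from the held-out fold leaks into training.

```python
import random
import statistics


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return (sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)) ** 0.5


def repeated_kfold(n_samples, k=5, repeats=3, seed=0):
    """Yield (train, test) index lists; every repeat reshuffles the samples."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]
        for i, test in enumerate(folds):
            train = [j for m, fold in enumerate(folds) if m != i for j in fold]
            yield train, test


def fit_line(x, y):
    """Stand-in model (assumption): least-squares line, returns (slope, intercept)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx


def cross_validate(x, y, k=5, repeats=3, seed=0):
    """Return (mean, SD) of R2 and RMSE over all held-out folds."""
    r2s, rmses = [], []
    for train, test in repeated_kfold(len(x), k, repeats, seed):
        # Fit on the training folds only to avoid data leakage.
        slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
        truth = [y[i] for i in test]
        pred = [slope * x[i] + intercept for i in test]
        r2s.append(r2_score(truth, pred))
        rmses.append(rmse(truth, pred))
    return {
        "r2": (statistics.fmean(r2s), statistics.stdev(r2s)),
        "rmse": (statistics.fmean(rmses), statistics.stdev(rmses)),
    }


# Synthetic data (N = 80) for illustration only; not the experimental dataset.
rng = random.Random(42)
x = [rng.uniform(20.0, 40.0) for _ in range(80)]
y = [0.5 * xi + rng.gauss(0.0, 2.0) for xi in x]
scores = cross_validate(x, y)
print("R2   = %.2f ± %.2f" % scores["r2"])
print("RMSE = %.2f ± %.2f" % scores["rmse"])
```

Because R² is defined as 1 − SS_res/SS_tot, it becomes negative whenever a model predicts the held-out fold worse than that fold's own mean would, which is how a value such as the ANN's −0.85 for log10 viable LAB count can arise. For example, `r2_score([1, 2, 3], [3, 2, 1])` evaluates to −3.0.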

Table 5. Predicted optimal fermentation conditions, corresponding results, and in vitro validation results.

Parameter	Optimal Value	Unit
Input Conditions
Temperature (Stage 1)	27.5	°C
Temperature (Stage 2)	37.0	°C
Initial Moisture Content	62.0	%
Aspergillus oryzae Inoculum	8.60 × 10⁴	spores/g
Lactobacillus plantarum Inoculum	2.80 × 10⁶	CFU/g
Nitrogen Supplement	0.9	%
Duration Stage 1	41.0	h
Duration Stage 2	58.0	h
Outcome Validation	Predicted Value	Actual Experimental Result (n = 6)
Crude Protein (%)	25.0	24.6 ± 0.4
Crude Fiber (%)	9.6	10.2 ± 0.3
TIA (Remaining %)	14.8	15.5 ± 0.8
Lactic Acid (g/kg DM)	36.5	35.8 ± 1.2
Viable Count (10⁹ CFU/g)	1.85	1.78 ± 0.24
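
The agreement between predicted and measured outcomes in Table 5 can be expressed as a relative error per response. The sketch below assumes the common definition |predicted − measured| / measured × 100, with the measured mean as the denominator; the formula used in the study is not stated in this section, so this choice is an assumption.

```python
# (predicted value, measured mean) pairs transcribed from Table 5.
VALIDATION = {
    "Crude Protein (%)": (25.0, 24.6),
    "Crude Fiber (%)": (9.6, 10.2),
    "TIA (Remaining %)": (14.8, 15.5),
    "Lactic Acid (g/kg DM)": (36.5, 35.8),
    "Viable Count (10^9 CFU/g)": (1.85, 1.78),
}


def relative_error_pct(predicted, measured):
    """Absolute relative error in percent, relative to the measured mean."""
    return abs(predicted - measured) / measured * 100.0


errors = {
    name: relative_error_pct(predicted, measured)
    for name, (predicted, measured) in VALIDATION.items()
}

for name, err in errors.items():
    print(f"{name}: {err:.1f}%")
```

Using the predicted value as the denominator instead would change each figure slightly, so the convention should be stated alongside any reported error threshold.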

Table 6. The amount of amino acids in sweet potato waste (SPW) and fermented sweet potato waste (FSPW) (mg/g dry matter).

Amino Acid	Category	SPW	FSPW	Change (%)
Histidine (His)	EAA	2.50 ± 0.08	5.20 ± 0.18 *	+108.0%
Threonine (Thr)	EAA	4.10 ± 0.12	9.50 ± 0.33 *	+131.7%
Valine (Val)	EAA	5.00 ± 0.14	11.80 ± 0.41 *	+136.0%
Methionine (Met)	EAA	1.20 ± 0.05	2.80 ± 0.12 *	+133.3%
Cysteine (Cys)	EAA	1.80 ± 0.06	3.50 ± 0.13 *	+94.4%
Isoleucine (Ile)	EAA	3.90 ± 0.11	9.00 ± 0.31 *	+130.8%
Leucine (Leu)	EAA	7.00 ± 0.20	16.50 ± 0.58 *	+135.7%
Phenylalanine (Phe)	EAA	4.50 ± 0.13	8.80 ± 0.31 *	+95.6%
Lysine (Lys)	EAA	4.20 ± 0.12	11.00 ± 0.39 *	+161.9%
Arginine (Arg)	EAA	5.10 ± 0.14	9.90 ± 0.35 *	+94.1%
Aspartic acid (Asp)	NEAA	10.50 ± 0.29	21.20 ± 0.74 *	+101.9%
Glutamic acid (Glu)	NEAA	12.80 ± 0.35	28.50 ± 0.95 *	+122.7%
Serine (Ser)	NEAA	4.80 ± 0.13	9.80 ± 0.34 *	+104.2%
Glycine (Gly)	NEAA	4.30 ± 0.12	8.50 ± 0.30 *	+97.7%
Alanine (Ala)	NEAA	4.50 ± 0.13	9.30 ± 0.33 *	+106.7%
Proline (Pro)	NEAA	5.50 ± 0.15	10.10 ± 0.35 *	+83.6%
Tyrosine (Tyr)	NEAA	3.20 ± 0.09	6.40 ± 0.22 *	+100.0%
Total Amino Acids (TAA)	-	84.90 ± 2.10	181.80 ± 5.45 *	+114.1%
Total Essential Amino Acids (TEAA)	-	39.30 ± 1.05	88.00 ± 2.80 *	+123.9%
EAA/TAA Ratio	-	0.46 ± 0.008	0.48 ± 0.009 *	+4.3%

*: Indicates a significant difference compared to the SPW group (p < 0.05, independent samples t-test). SPW: sweet potato waste; FSPW: fermented sweet potato waste; EAA (Essential Amino Acid) and NEAA (Non-Essential Amino Acid) are classified based on the nutritional requirements of most monogastric animals. Methionine and cysteine (sulfur-containing amino acids), phenylalanine and tyrosine (aromatic amino acids) are listed separately for clarity, as methionine can be converted to cysteine and phenylalanine can be converted to tyrosine in vivo. All data are presented as mean ± standard deviation (SD) of 3 independent biological replicates (n = 6).
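
The totals and percentage changes in Table 6 follow directly from the individual means. The sketch below recomputes them as a consistency check, assuming the change is defined as (FSPW − SPW) / SPW × 100. Only the means are used, so the ± SD terms are not propagated; the variable and function names are illustrative.

```python
# Mean contents (mg/g dry matter) from Table 6: name -> (category, SPW, FSPW).
AMINO_ACIDS = {
    "His": ("EAA", 2.50, 5.20),
    "Thr": ("EAA", 4.10, 9.50),
    "Val": ("EAA", 5.00, 11.80),
    "Met": ("EAA", 1.20, 2.80),
    "Cys": ("EAA", 1.80, 3.50),
    "Ile": ("EAA", 3.90, 9.00),
    "Leu": ("EAA", 7.00, 16.50),
    "Phe": ("EAA", 4.50, 8.80),
    "Lys": ("EAA", 4.20, 11.00),
    "Arg": ("EAA", 5.10, 9.90),
    "Asp": ("NEAA", 10.50, 21.20),
    "Glu": ("NEAA", 12.80, 28.50),
    "Ser": ("NEAA", 4.80, 9.80),
    "Gly": ("NEAA", 4.30, 8.50),
    "Ala": ("NEAA", 4.50, 9.30),
    "Pro": ("NEAA", 5.50, 10.10),
    "Tyr": ("NEAA", 3.20, 6.40),
}


def pct_change(before, after):
    """Relative change in percent, with the unfermented value as baseline."""
    return (after - before) / before * 100.0


def total(column, category=None):
    """Sum SPW (column=1) or FSPW (column=2) means, optionally for one category."""
    return sum(
        row[column]
        for row in AMINO_ACIDS.values()
        if category is None or row[0] == category
    )


taa_spw, taa_fspw = total(1), total(2)
teaa_spw, teaa_fspw = total(1, "EAA"), total(2, "EAA")

print(f"TAA:  {taa_spw:.2f} -> {taa_fspw:.2f} ({pct_change(taa_spw, taa_fspw):+.1f}%)")
print(f"TEAA: {teaa_spw:.2f} -> {teaa_fspw:.2f} ({pct_change(teaa_spw, teaa_fspw):+.1f}%)")
print(f"EAA/TAA: {teaa_spw / taa_spw:.2f} -> {teaa_fspw / taa_fspw:.2f}")
```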

Table 7. Biochemical and antioxidant enhancements in fermented SPW.

Analytical Parameter	Unfermented SPW	Optimized FSPW	Biological Significance
pH Value	6.2 ± 0.1	3.9 ± 0.1	Natural biopreservation via lactic acid.
Acetic Acid (g/kg DM)	Not Detected	4.2 ± 0.4	Contributes to antimicrobial profile.
Degree of Hydrolysis (%)	3.5 ± 0.5	28.4 ± 1.5	Massive breakdown of complex proteins into bioavailable peptides.
Phytic Acid (mg/100 g)	755.0 ± 15.0	210.5 ± 12.0	Degradation of mineral-chelating anti-nutrients.
Tannins (mg/100 g)	533.0 ± 10.0	185.0 ± 8.0	Improved palatability and protein digestibility.
Total Phenolics (mg GAE/100 g)	18.5 ± 1.5	68.4 ± 3.2	Release of bound phenolics via enzymatic cell wall degradation.
DPPH Scavenging (µmol TE/g)	39.3 ± 2.1	115.6 ± 5.4	Significant enhancement of antioxidant capacity.
Crude Fat (% DM)	2.1 ± 0.2	3.8 ± 0.3	Fungal lipid biosynthesis during aerobic stage.
Ash (% DM)	5.2 ± 0.3	6.5 ± 0.4	Relative concentration effect due to carbon loss via respiration.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

