Optimization of Soybean Protein Extraction with Ammonium Hydroxide (NH4OH) Using Response Surface Methodology

Plants have been recognized as renewable and sustainable sources of proteins. However, plant protein extraction is challenged by the plant’s recalcitrant cell wall. Conventional extraction methods use non-reusable strong alkalis under protein-denaturing conditions. In this study, soy protein was extracted using NH4OH, a weak, recoverable, and reusable alkali. The extraction conditions were optimized using response surface methodology (RSM). A central composite design (CCD) with four independent variables, namely temperature (25, 40, 55, 70, and 85 °C), NH4OH concentration (0.5, 1, and 1.5%), extraction time (6, 12, 18, and 24 h), and solvent ratio (1:5, 1:10, 1:15, and 1:20 w/v), was used to study the response variables (protein yield and amine concentration). Amine concentration indicates the extent of protein hydrolysis. The RSM model equations relating the independent and response variables were computed and used to create the contour plots. The models predicted a protein yield of 64.89% and an amine concentration of 0.19 mM, with multiple R-squared values of 0.83 and 0.78, respectively. The optimum conditions, giving the maximum protein yield (65.66%) with the least amine concentration (0.14 mM), were a 0.5% NH4OH concentration, a 12 h extraction time, and a 1:10 (w/v) solvent ratio at 52.5 °C. The findings suggest that NH4OH is suitable for extracting soybean protein with little or no protein denaturation.


Introduction
With the recent global population growth, hunger and malnutrition have become a major concern, affecting over two billion people, especially in the underdeveloped world [1]. Dietary protein is a major nutrient that aids the growth, development, and overall wellbeing of the body. The global protein demand is, thus, projected to quadruple by 2050 [2-4]. Animals and plants are the most common sources of dietary protein [1,5-7], although animal-based protein sources are reported to be more popular [8]. However, plant-based protein sources are gaining ground because, compared to animal protein sources, they are more renewable, sustainable, cheaper, environmentally friendly, and healthier [8-11]. With the continuous rise in consumer awareness, the motivation for a healthier diet, and concerns about global greenhouse gas emissions [12], the transition to plant-based proteins is on the increase [1,13]. Plant-based alternative food proteins, especially legumes, have, thus, been identified as potential solutions to the protein security challenge [1].
Soybean is one of the oldest leguminous plant-based protein sources for human and animal nutrition [14]. It is a good nutrient source in animal diets, containing approximately 50% protein, 21% oil, 30-35% carbohydrate and 4% minerals [15]. Soybean protein is a highly nutritious and beneficial protein source that contains essential amino acids [16] and has the potential to enhance the immune system [17,18]. It is a high-quality protein with a 100% amino acid digestibility [19], which is comparable to proteins from animal sources.
To derive extraction conditions that cause the least degradation of the extracted protein, amine quantification was carried out alongside the protein extraction optimization. To the best of our knowledge, this is the first time NH4OH soy protein extraction has been performed while also monitoring the degree of hydrolysis (amine concentration) in order to minimize protein denaturation.

Plant Material
Dried soybean seeds were obtained from the North Dakota State University (NDSU) pilot plant. Prior to their use, the soybean seeds were cleaned and manually sorted. After sorting, the seeds were ground and sieved. The ground fraction with a sieve mesh size of 0.425-1 mm was used in this study.

Proximate Composition
Proximate analysis gives an overview of the quantity of macromolecules present in a sample [46-49]. It is, therefore, an important step towards protein extraction. The quantities of protein, moisture, fat, ash, and carbohydrates were determined (Table 1). Moisture content was determined using the oven drying method [21]. The powdered soybean sample (1 g) was weighed in an empty pan and dried using a Binder ED-56 oven dryer at 105 °C overnight. After allowing it to cool in a desiccator, the weight of the dried sample was measured. The percentage moisture content of the sample was determined using Equation (1).
\[ \text{Moisture content (\%)} = \frac{M_m}{W_s} \times 100 \tag{1} \]
where M_m is the mass of the moisture (weight loss) and W_s is the weight of the sample.

Total Protein Content
Crude protein content was determined using a nitrogen combustion protein analyzer (LECO FP5820) according to the AACC method 46-30.01 (1999). About 5 g of sample was subjected to pyrolysis (to make the nitrogen available in its free state) followed by combustion at 850 °C in the presence of pure oxygen. Nitrogen given off (% nitrogen) was detected using a thermal detector. The crude protein was then determined from the % nitrogen using a standard conversion factor of 6.25, as shown in Equation (2) (AACC method 1999).
\[ \text{Crude protein (\%)} = \%\,\text{N} \times 6.25 \tag{2} \]

Total Ash Content
Ashing was performed using the Thermo Fisher Scientific Thermolyne Benchtop 1100 °C Muffle Furnace (Vernon Hills, IL 60061, USA). The sample was weighed into empty crucibles and placed in the muffle furnace at 550 °C for 24 h [50]. After allowing it to cool in a desiccator, the weight of the ash was measured. The percentage ash content of the sample was determined using Equation (3) [50].
\[ \text{Ash content (\%)} = \frac{M_a}{W_s} \times 100 \tag{3} \]
where M_a is the mass of the ash and W_s is the weight of the sample.

Total Fat
The proximate analysis of fat was performed using a Dionex ASE 200 Accelerated Solvent Extractor (Poway, CA, USA). About 6 g of sample was weighed into empty iron vials with cellulose filter papers inserted at their bases. The vials were tightly closed and transferred into the accelerated solvent extractor (using hexane as the extraction solvent) at 1000 psi [51]. The resulting oil-hexane mixture was aerated at 60 °C in a water bath to evaporate the remaining hexane. The oil was then transferred into the vacuum oven overnight to dry. The weight of the oil was recorded.

Total Carbohydrate
Carbohydrate content was determined by the subtraction (difference) method: the sum of the percentages of moisture, ash, fat, and protein is subtracted from 100, as shown in Equation (4) [52].
\[ \text{Carbohydrate (\%)} = 100 - \left(\%\,\text{moisture} + \%\,\text{ash} + \%\,\text{fat} + \%\,\text{protein}\right) \tag{4} \]
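The proximate calculations described above reduce to a few lines of arithmetic. The following is a minimal sketch, not the authors' workflow: the function names are hypothetical, and the nitrogen value of 5.04% is back-calculated here from the reported 31.50% protein purely for illustration. The remaining inputs are the percentages reported in Table 1.

```python
def percent_of_sample(component_mass_g, sample_mass_g):
    """Mass of a component (moisture lost, ash, oil) as a percentage of the sample."""
    return component_mass_g / sample_mass_g * 100.0


def crude_protein_percent(nitrogen_percent, factor=6.25):
    """Crude protein from % nitrogen using the standard 6.25 conversion factor."""
    return nitrogen_percent * factor


def carbohydrate_by_difference(moisture, ash, fat, protein):
    """Carbohydrate (%) as 100 minus the sum of the other proximate fractions."""
    return 100.0 - (moisture + ash + fat + protein)


# Percentages reported for the soybean sample (Table 1).
carbohydrate = carbohydrate_by_difference(
    moisture=10.78, ash=4.53, fat=19.34, protein=31.50
)
# Illustrative nitrogen value (assumption): 31.50 / 6.25 = 5.04 % N.
protein = crude_protein_percent(5.04)

print(f"carbohydrate by difference: {carbohydrate:.2f} %")
print(f"crude protein: {protein:.2f} %")
```

The carbohydrate value obtained this way agrees with the reported 33.84% to within rounding of the individual fractions.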

Experimental Design
The effect of the four independent variables on protein yield and amine concentration was investigated using the central composite design (CCD) and response surface methodology (RSM). A total of thirty experimental runs for the optimization of the extraction parameters were carried out. Five levels (−2, −1, 0, +1, +2) were used for each independent variable. The four independent variables are extraction time (X1), temperature (X2), NH4OH concentration (X3), and solvent ratio (X4), while the protein yield (Y1) and amine concentration (Y2) are the dependent variables.
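A four-factor central composite design with five coded levels and thirty runs can be enumerated as follows. This is a sketch under stated assumptions rather than the design file used in the study: the split into 16 factorial, 8 axial (at ±2), and 6 center runs is inferred from the run count and the coded levels, and the temperature decoding assumes the five levels 25 to 85 °C are evenly spaced around 55 °C. All function names are hypothetical.

```python
from itertools import product


def central_composite_design(n_factors=4, alpha=2.0, n_center=6):
    """Return coded CCD runs: factorial corners, axial points, center replicates."""
    factorial = [list(p) for p in product((-1.0, 1.0), repeat=n_factors)]
    axial = []
    for i in range(n_factors):
        for a in (-alpha, alpha):
            point = [0.0] * n_factors
            point[i] = a  # only one factor leaves the center at a time
            axial.append(point)
    center = [[0.0] * n_factors for _ in range(n_center)]  # assumed 6 replicates
    return factorial + axial + center


def decode_temperature(coded, center_c=55.0, step_c=15.0):
    """Map a coded level to degrees Celsius (assumes 25, 40, 55, 70, 85 °C)."""
    return center_c + step_c * coded


runs = central_composite_design()
levels = sorted({value for run in runs for value in run})

print(len(runs))   # total number of runs
print(levels)      # distinct coded levels used by the design
```

With these assumptions the design has 2^4 + 2·4 + 6 = 30 runs, matching the thirty runs reported.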
The independent variables with their levels and codes are shown in Table 2. The central composite design experiment was set up with 30 experimental runs of the independent variables: NH4OH solvent system (1, 2.5, 5, 10 and 15%), sample-to-solvent ratio, extraction time, and temperature. A total of 5 g of powdered soybean sample was weighed into an Erlenmeyer flask (100 mL). Then, 50 mL of each solvent concentration was poured into each flask and transferred into a shaker at 55 °C and 130 rpm. Extractions were conducted for 6, 12, 18, and 24 h. Collected extracts were centrifuged at 1000 rpm for 5 min, and the particulate and soluble fractions were separated via decantation.
The pH of the soluble fraction (supernatant) was adjusted to the protein isoelectric point (pH 4-4.5) using dilute HCl, and the solution was left overnight to precipitate [21]. The resulting suspension was centrifuged at 1000 × g for 10 min. The supernatant was discarded, and the precipitate was washed with distilled water and oven dried at 60 °C for 12 h [53] using the Binder ED-56 oven dryer (Horsham, PA, USA). The amount of protein extracted was determined using the Bradford assay protocol for protein estimation [54]. The protein standard curve was established using bovine serum albumin (BSA) standards [54]. The absorbance was read at 595 nm with a Tecan Infinite M Nano single-mode microplate reader (Seestrasse, Männedorf, Switzerland). The stages of protein extraction are shown in Figure 1.
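Reading a protein concentration off a BSA standard curve amounts to fitting a straight line to the standards and inverting it. The sketch below illustrates that step only; the standard concentrations and absorbance values are hypothetical placeholders, not measurements from this study, and the function names are assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def concentration_from_absorbance(absorbance, slope, intercept):
    """Invert the standard curve to estimate concentration from absorbance."""
    return (absorbance - intercept) / slope


# Hypothetical BSA standards (mg/mL) and their absorbance at 595 nm.
bsa_mg_per_ml = [0.0, 0.25, 0.5, 0.75, 1.0]
absorbance_595 = [0.00, 0.15, 0.30, 0.45, 0.60]

slope, intercept = fit_line(bsa_mg_per_ml, absorbance_595)
unknown = concentration_from_absorbance(0.36, slope, intercept)

print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
print(f"estimated protein = {unknown:.2f} mg/mL")
```

In practice the fit should be restricted to the linear range of the assay, and samples outside that range diluted before reading.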

Total Amine Estimation
Amines react with TNBS (2,4,6-trinitrobenzene sulfonic acid) to form chromogenic derivatives that can be quantified spectrophotometrically at a wavelength of 335 nm. The total amount of amine in the sample was quantified using the TNBS assay method [55,56]. The glycine standard (20 µg/mL) was prepared by dissolving 0.2 g of glycine in 10 mL of distilled water followed by appropriate dilutions. The soybean sample was dissolved in a 0.1 M sodium bicarbonate reaction buffer (pH 8.5). In total, 0.25 mL of a 0.01% (w/v) TNBS solution was added to 0.5 mL of the sample solution and vortexed. The solution was incubated at 37 °C for 2 h. Then, 0.25 mL of 10% SDS and 0.125 mL of 1 N HCl were added to each sample. The absorbance of the resulting solution was read at 335 nm.
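The dilution implied by the glycine standard preparation can be worked out from the stated quantities. This is an illustrative calculation only; it assumes the 20 µg/mL figure refers to the working standard after dilution, and the dilution scheme itself is not stated in the text.

```latex
\[
\frac{0.2\ \text{g}}{10\ \text{mL}} = 20\ \text{mg/mL} = 20{,}000\ \mu\text{g/mL},
\qquad
\text{dilution factor} = \frac{20{,}000\ \mu\text{g/mL}}{20\ \mu\text{g/mL}} = 1000
\]
```

Under that assumption, the stock solution would need a 1000-fold overall dilution, for example as three successive 10-fold steps.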

Statistical Analysis
The experimental data generated were subjected to multiple regression analysis using the open-source statistical software R. An empirical linear and second-order polynomial (pure quadratic) model was used to fit the data generated. Experimental design, data analysis, optimization, and contour plotting were also performed with R version 4.2.3. The following model, Equation (5), was proposed for each response:
\[ Y = b_0 + \sum_{i=1}^{4} b_i X_i + \sum_{i=1}^{4} b_{ii} X_i^2 + \sum_{i=1}^{3} \sum_{j=i+1}^{4} b_{ij} X_i X_j \tag{5} \]
where Y is the response (protein yield or amine concentration); b0 is the value of the fitted response at the central point; b1, b2, b3 and b4 are the coefficients of the linear terms; b11, b22, b33 and b44 are the coefficients of the quadratic terms; and b12, b13, b14, b23, b24 and b34 are the coefficients of the cross products (interactive terms).
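The structure of the second-order model can be made concrete by listing its terms for a given coded point. The sketch below is an illustration under assumptions: the function names are hypothetical, and the coefficient values are placeholders chosen only to exercise the code, not the fitted coefficients of this study.

```python
from itertools import combinations


def model_terms(x):
    """Return [1, linear terms, squared terms, cross products] for coded point x."""
    linear = list(x)
    quadratic = [xi * xi for xi in x]
    # Pairs are generated in the order (1,2), (1,3), (1,4), (2,3), (2,4), (3,4).
    cross = [x[i] * x[j] for i, j in combinations(range(len(x)), 2)]
    return [1.0] + linear + quadratic + cross


def predict(coefficients, x):
    """Evaluate the polynomial: sum of coefficient * term."""
    terms = model_terms(x)
    if len(coefficients) != len(terms):
        raise ValueError("expected one coefficient per model term")
    return sum(b * t for b, t in zip(coefficients, terms))


# Placeholder coefficients (hypothetical): b0 = 64, all other terms = 1.
placeholder = [64.0] + [1.0] * 14

n_terms = len(model_terms([0.0, 0.0, 0.0, 0.0]))
at_center = predict(placeholder, [0.0, 0.0, 0.0, 0.0])
at_corner = predict(placeholder, [1.0, 1.0, -1.0, -1.0])

print(n_terms, at_center, at_corner)
```

For four factors the full model has 1 + 4 + 4 + 6 = 15 coefficients, and at the center point (all coded values zero) the prediction reduces to b0, consistent with the definition of b0 in the text.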

Proximate Analysis
The results of the proximate analysis (Table 1) show that the soybean sample contains 10.78% moisture, 31.50% protein, 19.34% fat, 33.84% carbohydrate, and 4.53% ash. The sample was found to have high protein, oil, and carbohydrate contents, with the latter being largely due to the presence of pigmented pericarp [15,57], which can be difficult to grind. The amount of oil (19.34%) found in our sample can be attributed to the high amount of protein, which has the capacity to capture and retain oil [58,59]. These findings are broadly consistent with earlier studies, which reported soybean to contain 10.74% moisture, 17.5% protein, 19.98% fat, 44.26% carbohydrate and 4.29% ash [60-63]. These results indicate that the proximate composition of the soybean sample used in this study is comparable to that of samples used in previous studies.

Experimental Results of the Response Surface Methodology
The experimental protein extraction yield (Y1) and amine concentration (Y2) obtained from 30 experimental runs of the four independent variables are shown in Table 3. The results showed that a maximum experimental protein yield of 99.88% was reached at (X1, X2, X3, X4) = (1, 1, −1, −1), which corresponds to an amine concentration (Y2) of 0.23 mM. The least amine concentration (0.15 mM) was reached at (X1, X2, X3, X4) = (−1, −1, −1, 1), corresponding to a protein extraction yield (Y1) of 56.14%.
Table 3. Experimental design codes, actual values of the central composite design, and responses of the response surface methodology for soybean protein extraction yield (Y1) and amine concentration (Y2). Columns: Runs, Coded Variables, Uncoded Variables, Responses.
The independent and dependent variables were then analyzed using the developed model to create a regression equation that could predict the response within the specified range.

Regression Models for Response Variables

Regression Model for Protein Extraction Yield (Y1)
The data shown in Table 3 were subjected to multiple regression analysis using the quadratic interaction coefficients for protein extraction yield (Y1). Table 4 shows the analysis of variance (ANOVA) of the independent variables for the extraction optimization of soybean protein. The goodness-of-fit and lack-of-fit test results for the regression model are also presented. The regression model equation for protein extraction yield (Y1) is given as Equation (6). The statistical analysis revealed that only the linear and quadratic coefficients were significant (p < 0.05), while the two-way interaction coefficients, with a high p-value of 0.512, were not significant (p > 0.05) and were removed from the model.
The goodness of fit, expressed by the coefficient of determination (R²), was determined. R² measures the proportion of the variation in the response variable that can be attributed to the independent variables and their interactions, and so evaluates how well a statistical model fits a given set of data. The closer the R² value is to 1, the better the model matches the data. The multiple and adjusted R² values for the regression model for protein extraction (Y1) were 0.83 and 0.77, respectively, showing that the model was adequate. To further confirm the model's adequacy, the lack-of-fit test was carried out. The lack-of-fit test quantifies inaccuracies due to any flaw(s) in a model [64]. The lack of fit is not significant if the p-value of the lack-of-fit F-statistic is larger than the significance level (α = 0.05). In contrast, if the p-value is smaller than the significance level, the lack of fit is significant, and the regression model is inadequate to explain the data [65]. In the present study, as shown in Table 4, the lack of fit was found to be non-significant (F = 5.24; p = 0.4965 > 0.05), indicating that the regression model for protein extraction, Y1, was sufficient to explain the experimental data.
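The relationship between the multiple and adjusted R² values quoted above can be checked with the standard adjustment formula, adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1). The sketch below is illustrative only: the function name is hypothetical, and the term count p = 8 (four linear plus four quadratic terms once the two-way interactions are dropped) is an assumption, not a value stated in the text.

```python
def adjusted_r_squared(r_squared, n_runs, n_terms):
    """Adjust R^2 for the number of model terms (intercept excluded from n_terms)."""
    return 1.0 - (1.0 - r_squared) * (n_runs - 1) / (n_runs - n_terms - 1)


# Protein yield model: R^2 = 0.83 from 30 runs, assuming 8 retained terms.
adj_protein = adjusted_r_squared(0.83, n_runs=30, n_terms=8)

print(f"adjusted R^2 (protein yield model): {adj_protein:.2f}")
```

Under that assumption the formula reproduces the reported adjusted value of 0.77. The adjustment penalizes every additional term, so a model that keeps non-significant terms shows a wider gap between its multiple and adjusted R².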

Regression Model for Amine Concentration (Y2)
The data shown in Table 3 were subjected to multiple regression analysis using the quadratic interaction coefficients for amine concentration (Y2). The regression model equation for amine concentration (Y2) is given as Equation (7). Table 5 shows the analysis of variance (ANOVA) of the independent variables for the degree of hydrolysis of protein (amine concentration). The statistical analysis showed that the linear and quadratic coefficients were significant (p < 0.05), while the two-way interactions were not significant (p > 0.05). For a satisfactory model fit, R² values should be >0.80 [66]. The multiple and adjusted R² values for the regression model for amine concentration (Y2) were 0.78 and 0.58, respectively, which fall below this threshold. However, since the p-value of the lack-of-fit F-statistic is larger than the significance level (F = 1.00; p = 0.531 > 0.05), the lack of fit was not significant [65], and the model was considered adequate to explain the variations in the data.

Surface Plots for Protein Extraction Yield (Y1) and Amine Concentration (Y2)
In this study, the optimum conditions were selected using surface plots. To determine the optimum yield at any point, two of the four independent variables were fixed while the remaining two were varied and the response variables were predicted.
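The procedure behind each surface plot, holding two factors fixed and predicting the response over a grid of the other two, can be sketched as below. The response function uses hypothetical coefficients chosen only so the example has a clear interior maximum; it is not the fitted model from this study, and all names are assumptions.

```python
def toy_response(x1, x4, x2=0.0, x3=0.0):
    """Hypothetical quadratic response in coded units (placeholder coefficients)."""
    return (60.0
            + 1.0 * x1 - 2.0 * x1 ** 2    # coded extraction time
            - 1.0 * x4 - 2.0 * x4 ** 2    # coded solvent ratio
            + 1.5 * x2 + 0.5 * x3)        # factors held fixed for this surface


def grid_maximum(response, step=0.25, low=-2.0, high=2.0):
    """Evaluate the response on a square grid and return (x1, x4, value) at the maximum."""
    n_steps = int(round((high - low) / step))
    axis = [low + i * step for i in range(n_steps + 1)]
    best = max((response(a, b), a, b) for a in axis for b in axis)
    return best[1], best[2], best[0]


# Temperature (x2) and NH4OH concentration (x3) stay at their center level of 0.
best_x1, best_x4, best_value = grid_maximum(toy_response)

print(best_x1, best_x4, best_value)
```

Repeating the search with different pairs of varied factors gives one surface per pair, which is how a set of plots covering all four variables is assembled.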
As shown in Figure 2, interactions between the solvent ratio and extraction time caused an increase in protein extraction yield, resulting in a maximum protein yield of about 65% at an extraction time of 12 h and a solvent ratio of 1:10 w/v. However, a further increase in the time and solvent ratio beyond the optimum caused a corresponding decrease in protein yield. An increase in temperature and NH4OH concentration caused a corresponding increase in protein yield, as shown in Figures 3 and 4, respectively. The interaction between temperature and time showed that the protein yield was highest when the temperature was around 80 °C and the extraction time was 12 h (Figure 3). In addition, variations in the NH4OH concentration and solvent ratio showed that the protein yield was highest when the NH4OH concentration was 1.0%. However, at such high temperatures and concentrations, the nutritional quality of the proteins would have been compromised [35,67,68]. A further increase in extraction time does not have much effect on the protein extraction yield in the selected range of study.

The U-shaped amine concentration surface plots show that the amine concentration, which is a measure of the degree of protein hydrolysis, passes through a minimum. The results revealed that the minimum amine concentration was attained at an extraction time of 12 h and a solid-to-solvent ratio of 1:10 (Figure 5). This finding indicates that a longer extraction time and higher solvent ratio did not cause any further decrease in amine concentration. Moreover, the influence of extraction time and temperature on amine concentration showed that the minimum amine concentration was obtained at a temperature of 52.5 °C and an extraction time of 12 h (Figure 6). The minimum amine concentration was also obtained at a solid-to-solvent ratio of 1:10 and an NH4OH concentration of 0.5% (Figure 7). This suggests that, at these points, the extracted protein was least denatured. Although a higher amount of soybean protein was extracted with an increase in NH4OH concentration and temperature [68], such conditions are detrimental to the nutritional quality and organoleptic properties of the protein. Therefore, considering the degree of hydrolysis, which must generally be low [41], the maximum protein extraction was taken to be reached at an extraction time of 12 h, a solid-to-solvent ratio of 1:10 w/v, an NH4OH concentration of 0.5%, and a temperature of 52.5 °C.
Similar optimization results for the extraction of proteins from plants have been reported. Ref. [69] extracted proteins from soybean flour using the response surface methodology. In the study, the effects of extraction time, temperature, pH and NaOH concentration were found to be significant, with an optimum protein yield of 48.30% at 70 °C, pH 12.68, and a 44.7 min extraction time. In a similar study on protein extraction optimization from lentil using response surface methodology, ref. [70] reported an optimum protein yield of 14.5 g/100 g flour at a temperature of 22 °C, a time of 1 h, and a solid-to-solvent ratio of 1:10 (g/mL).
In a related study, ref. [71] investigated protein extraction from Chlorella vulgaris sp. with an alkaline solubilization and acid precipitation technique using the response surface methodology. In the study, the effects of independent variables, such as the precipitation time and pH, were found to be significant, with optima at 39.86 min and 3.2, respectively, and an overall protein yield of 81.0%. Protein extraction from watermelon seeds with sodium hydroxide using the response surface methodology has also been reported. In that study, ref. [40] found significant effects of temperature, liquid/solid ratio, time and solvent concentration on protein yield. They concluded that the best extraction conditions were a NaOH concentration of 1.2%, a mixing time of 15 min, a temperature of 40 °C, and a solvent/meal ratio of 70:1.
Likewise, ref. [41] investigated the effects of pH, time, temperature, and the liquid/solid ratio on protein extraction from peanuts (Arachis hypogaea L.). They found substantial effects of these variables on protein yield and concluded that the maximum protein yield was obtained at a pH of 8.0, a solvent ratio of 8:1, a time of 30 min, and a temperature of 50 °C. In this research, the optimum conditions for protein extraction varied slightly from the reported values because, unlike in other protein extraction processes, the optimum protein extraction yield was determined at the point where the degree of protein hydrolysis was at its minimum, signifying little or no protein denaturation.

Validation Studies
The experiment was repeated using the optimum conditions (extraction time of 12 h, solid-to-solvent ratio of 1:10 w/v, NH4OH concentration of 0.5%, and a temperature of 52.5 °C) derived from the above study. The experimental protein yield at the optimum level was 65.66%, while the protein yield computed using the RSM regression equation was 64.89%, a difference of 0.77 percentage points. This close agreement validates the regression model.
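The agreement between predicted and experimental values can be expressed as a relative difference. The sketch below uses the protein yields stated above and the amine concentrations reported for the optimum (0.19 mM predicted, 0.14 mM experimental); the function name is hypothetical, and expressing the gap relative to the experimental value is a choice made here, not a metric used in the text.

```python
def percent_difference(predicted, experimental):
    """Absolute difference as a percentage of the experimental value."""
    return abs(predicted - experimental) / experimental * 100.0


protein_gap = percent_difference(predicted=64.89, experimental=65.66)
amine_gap = percent_difference(predicted=0.19, experimental=0.14)

print(f"protein yield: {protein_gap:.2f} % relative difference")
print(f"amine concentration: {amine_gap:.1f} % relative difference")
```

On this measure the protein yield prediction is within about 1.2% of the experimental value, while the amine prediction differs by roughly a third in relative terms, although the absolute gap is only 0.05 mM.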

Conclusions
The regression models for protein yield and amine concentration exhibited a non-significant lack of fit and goodness-of-fit (R²) values of 0.83 and 0.78, respectively. This result was validated by comparing the experimental and predicted protein extraction yield and amine responses. Following 30 experimental runs varying temperature, NH4OH concentration, extraction time, and solvent ratio, the experimental protein yield with the least degree of hydrolysis was 65.66%. The surface plots showed that the maximum protein yield with the least degree of denaturation was obtained by extracting soybean with a 0.5% NH4OH concentration, a 12 h extraction time, and a 1:10 (w/v) solvent ratio at 52.5 °C. At the optimum conditions, the experimental protein yield and amine concentration differed slightly from the predicted values. The experimental protein yield (65.66%) was slightly higher than the predicted yield (64.89%), while the experimental amine concentration (0.14 mM) was slightly lower than the predicted concentration (0.19 mM). The closeness of the experimental and predicted data shows that the model describes the experimental data well. The results also imply that approximately 40% of the protein remains captured in the soybean fiber as crystals. This could be a great resource in various industrial applications. The optimization results suggest that NH4OH is suitable for extracting soybean protein with little or no degradation.