Unraveling Biotic and Abiotic Factors Shaping Sugarcane Straw Polyphenolic Richness: A Gateway to Artificial Intelligence-Driven Crop Management

Sugarcane straw (Saccharum officinarum) is a valuable coproduct renowned for its abundant polyphenolic content. However, extracting these polyphenols for natural ingredients faces challenges due to their inherent variability, influenced by biotic stress factors and plant characteristics. We explored the impact of five crucial factors on sugarcane straw polyphenolic diversity: (i) production area (Guariba, Valparaíso), (ii) borer insect (Diatraea saccharalis) infestation, (iii) plant age (first to seventh harvest), (iv) harvest season, and (v) plant variety. Response surface methodology (RSM) and artificial neural networks (ANN) were used to optimize polyphenol extraction conditions. A second-order polynomial model guided us to predict ideal sugarcane straw harvesting conditions for polyphenol-rich extracts. The analysis identified CU0618-variety straw, harvested in Guariba during the dry season (October 2020), at the seventh harvest stage, with 13.81% borer insect infection, as the prime source for high hydroxybenzoic acid (1010 µg/g), hydroxycinnamic acid (3119 µg/g), and flavone (573 µg/g) content and consequently high antioxidant capacity. The ANN model surpasses the RSM model, demonstrating superior predictive capabilities with higher coefficients of determination and reduced mean absolute deviations for each polyphenol class. This underscores the potential of artificial neural networks in forecasting and enhancing polyphenol extraction conditions, setting the stage for AI-driven advancements in crop management.


Introduction
Brazil is a major food producer and the world's top producer of sugarcane (Saccharum spp.). The National Supply Corporation (CONAB, 2019) reported that the 2018-2019 harvest covered about 8.59 million hectares of cultivated land and yielded 620.44 million tons. The state of São Paulo was the country's most significant producer, representing 53.65% of the total processed sugarcane production [1]. Straw residues, often dumped after harvest, are one of the several by-products of sugar production that are generated in large amounts. Those quantities have increased since harvesting transitioned from burning to mechanized collection [2]. The straw represents a sustainable source of polyphenols, due to the presence of 2,5-dihydroxybenzoic acid, caffeoylquinic acid derivatives, flavones (derivatives of luteolin, apigenin, and tricin), and phenolic acids (sinapic, caffeic, and ferulic acids) [3].
In recent years, the development of novel ingredients that can function as reaction chain breakers and scavengers of damaging free radicals and reactive oxygen species, together with their use in many applications, has led to an expansion of the global polyphenol economy. They have been used as food additives (e.g., in baked products, noodles, and pasta) and in brewed goods like liquors and wines [4,5], as well as in cosmetics to delay the onset of skin aging and enhance moisture and smoothness while reducing roughness and wrinkles [6].
Antioxidants 2024, 13, 47
Polyphenols protect against oxidative stress from ultraviolet (UV) irradiation, which is responsible for cutaneous damage and skin cancer [7]. Favorable food safety regulations and increasing public awareness of the health benefits drove the global polyphenol market to 1.6 billion USD in 2020. Projections indicate it will reach 2.7 billion USD by 2030, boasting a 5.2% compound annual growth rate, propelled by expanded usage in the food, beverage, pharmaceutical, and cosmetic industries in the following years [8].
Previously reported findings indicated that phenolic compounds in sugarcane straw extracts are mainly hydroxycinnamic acids, with concentrations reaching approximately 1460.39 µg/g. Chlorogenic acid, neochlorogenic acid, and p-coumaric acid dominate this group. Hydroxybenzoic acids form the second-most prevalent class, with concentrations around 727.36 µg/g, and 1-O-vanilloyl-β-D-glucose, 2,4-dihydroxybenzoic acid, and 3,4-dihydroxybenzaldehyde are the most abundant compounds in this category. Among phenolic compounds, flavones are the least abundant, at 77.56 µg/g, with vitexin and isoorientin being the most abundant compounds in this class. These phenolic compounds have been shown to possess potent antioxidant properties, which may contribute to their potential use in various food and pharmaceutical applications [9].
Turning attention to the production of polyphenolic extracts from plant sources, a critical consideration is predicting optimal harvest conditions, especially when dealing with by-products. Plants produce phenolic compounds naturally during growth or as responses to various stimuli, including injuries, infections, or environmental stressors such as heavy metal salts, UV irradiation, and temperature fluctuations [10]. These variables, called abiotic and biotic factors, alter the amount and kind of phenolic compounds [11,12]. Abiotic factors encompass the nonliving components of the environment, including chemical and physical elements, which influence the behavior of living organisms and the functioning of ecosystems. These factors include soil, water, air, temperature, moisture, and light. In contrast, biotic factors refer to any living organism that affects another organism. Alongside the effects of animals and humans, biotic elements frequently encompass plants, fungi, and microorganisms [10].
In the context of advancing technology, machine learning (ML) and deep learning (DL) within artificial intelligence (AI) have emerged as promising fields with significant potential to enhance various aspects of agricultural practices, including sugarcane crop production and the extraction of valuable by-products such as polyphenols. AI technologies, including artificial neural networks (ANNs), offer valuable tools for collecting and analyzing diverse data. Recent studies have demonstrated the potential of ML and DL in various aspects of sugarcane production, including crop yield prediction, determination of soil agricultural aptitude, weed identification, and classification of sugarcane varieties. ANNs, in particular, show promise as prognostic tools in modeling studies related to plant cultures, indicating their potential to contribute to the optimization of sugarcane production and the extraction of valuable by-products such as polyphenols [13][14][15][16].
The present study determines how biotic (borer infection, harvestings) and abiotic (geographic zone and season) factors affect polyphenol content (hydroxybenzoic acids, hydroxycinnamic acids, and flavones) in straw from different sugarcane varieties as a potential by-product to produce natural extracts. The modeling efficiencies of response surface methodology (RSM) and artificial neural networks (ANNs) were estimated and statistically compared using parameters such as the coefficient of determination (R²) and the root mean square error (RMSE).

Sampling Plan
To study the impact of two abiotic factors (geographic zone and season) on sugarcane straw polyphenol richness, the samples were collected during the year 2020 between June and October and from two different geographic areas separated by 274 km (Guariba and Valparaíso, both in São Paulo, Brazil).
Samples of 5 kg of straw were systematically collected for each specific combination of variables, including variety, geographic area, season, borer infection level, and plant age (Figure 1). During the sugar extraction harvest, a field worker carefully separated and air-dried the straw before packing it for transport from Brazil to Portugal. The samples were air-shipped at room temperature to the CBQF-UCP laboratory in Porto, Portugal. Subsequently, the material underwent a drying process at 40 °C for 12 h using a ventilated oven (Memmert GmbH + Co. KG, Schwabach, Germany), followed by milling with a grinder (SM100, Retsch, Vila Nova de Gaia, Portugal) to achieve a particle size less than 4 mm. Straw was stored at room temperature and protected from light until beginning assays.

Polyphenolic Extract Production from Sugarcane Straw
In brief, dried sugarcane straw powder was extracted with 50% (v:v) ethanol at a biomass:solvent ratio of 1:10 (w:v) for 24 h at 30 °C under agitation at 120 rpm (Innova 40 New Brunswick, Eppendorf, Hamburg, Germany) and protected from light. The solid and liquid fractions were separated by filtration with gauze, and the liquid fraction was centrifuged at 18,671 × g for 10 min (Sorvall Lynx 4000 centrifuge, Thermo Scientific, Waltham, MA, USA). The ethanol was removed from the liquid fraction by evaporation under vacuum with a rotary evaporator at 50 °C, 150 mbar (Heidolph, Walpersdorfer, Germany).
The obtained aqueous fraction was further applied to an Amberlite XAD-2 (Sigma-Aldrich, St. Louis, MO, USA) resin for subsequent purification. The Amberlite XAD-2 was washed with methanol and three times with deionized water. The resin was preconditioned for 12 h in ultrapure water before being used. The resin was used in a ratio of 1:2 (v:w) and left under agitation at 100 rpm overnight at room temperature. After that, the resin was isolated and washed twice with deionized water at pH 2 to remove any adsorbed sugar. The desorption of the phenolic compounds was performed in two steps: first with a 50% ethanolic solution acidified to pH 2 (HCl, 10 M) under agitation at 100 rpm at 37 °C overnight, and then a second desorption under the same conditions for 1 h. The ethanolic extracts were combined and recovered by decantation and filtration (type I filter, V Reis, Lisbon, Portugal). The ethanol was evaporated with a rotary evaporator (50 °C; 150 mbar), and the dried extracts were obtained by freeze-drying (Martin Christ, Osterode am Harz, Germany) for further characterization [3]. Extraction was conducted in triplicate for each sampling condition.

Phenolic Compounds and Organic Acid Analysis by LC-ESI-UHR-QqTOF-MS
The identification and quantification of all phenolic compounds were conducted using liquid chromatography-electrospray ionization-ultrahigh-resolution-quadrupole time of flight-mass spectrometry (LC-ESI-UHR-QqTOF-MS) [17]. The dried extracts were first dissolved in a 50% ethanol solution to reach a final concentration of 50 mg/mL and subsequently filtered through a 0.45 µm filter before injection. Separation was carried out on a Bruker Elute series instrument equipped with a UHR-QqTOF mass spectrometer (Impact II, Bruker Daltonics, Bremen, Germany) and a BRHSC18022100 Intensity Solo 2 C18 column (100 × 2.1 mm, 2.2 µm, Bruker, Bremen, Germany).
For the mass spectrometry acquisition, negative ionization mode was employed with the following parameters: end-plate offset voltage, 500 V; capillary voltage, 3.0 kV; drying gas temperature, 200 °C; drying gas flow, 8.0 L/min; nebulizing gas pressure, 2 bar; collision radio frequency (RF), ranging from 250 to 1000 Vpp; transfer time, from 25 to 70 µs; collision cell energy, 5 eV. Sodium formate clusters were used for internal mass calibration. Elemental composition was confirmed based on accurate mass and isotope ratio calculations designated as mSigma (Bruker Daltonics), and phenolic compounds were identified from their accurate mass [M−H]⁻ using the Bruker Compass DataAnalysis software (version 5.1, Bruker Daltonic GmbH, Bremen, Germany). Quantification results are expressed in micrograms per gram of dry extract.

Antioxidant Activity
The 2,2′-azino-bis(3-ethylbenzothiazoline-6-sulphonic acid) diammonium salt radical cation (ABTS) decolorization experiment was carried out [18] with an ABTS solution prepared by mixing it with K₂S₂O₈ solution at a 1:1 ratio and kept in the dark for 16 h. The solution was diluted with deionized water to achieve an initial OD of 0.700 ± 0.020 at 734 nm. Five sample concentrations were prepared through a 1:1 dilution, starting with a 6.25 mg/mL concentration. Each sample (15 µL, in duplicate) was mixed with 200 µL of ABTS and incubated for 5 min at 30 °C in a microplate reader (Synergy H1, Biotek, Winooski, VT, USA), and the OD was measured at 734 nm after incubation.
The DPPH radical decolorization assay [18] involved the production of a solution with an OD of 0.600 ± 0.100 at 515 nm by mixing 600 µM of DPPH solution (Sigma-Aldrich, St. Louis, MO, USA) with methanol. Five sample concentrations were prepared through a 1:1 dilution, starting at 6.25 mg/mL. In a microplate, 25 µL of each sample (in duplicate) was mixed with 175 µL of DPPH solution, followed by 30 min of incubation at room temperature. The OD was then measured at 515 nm using a microplate reader.
For both methodologies, Trolox standard solutions (0.075-0.008 mg/mL) were used for the calibration curve, and the results are expressed as IC50 (mg/mL).
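As an illustration of how an IC50 can be derived from the dilution series described above, the following minimal Python sketch generates the five 1:1 dilutions starting at 6.25 mg/mL, converts optical densities to percent inhibition, and interpolates linearly to 50% inhibition. The inhibition formula, the linear interpolation, the function names, and the example inhibition values are assumptions made for illustration; the source does not state how IC50 was computed.

```python
def serial_dilution(start, steps, factor=2.0):
    """Concentrations obtained by repeated 1:1 (two-fold) dilution."""
    return [start / factor ** k for k in range(steps)]


def percent_inhibition(od_control, od_sample):
    """Radical scavenging expressed as the percent drop in optical density."""
    return 100.0 * (od_control - od_sample) / od_control


def ic50_linear(concs, inhibitions):
    """Linearly interpolate the concentration giving 50% inhibition.

    Returns None when the measured points do not bracket 50%.
    """
    pairs = sorted(zip(concs, inhibitions))
    for (c0, i0), (c1, i1) in zip(pairs, pairs[1:]):
        if (i0 - 50.0) * (i1 - 50.0) <= 0 and i0 != i1:
            return c0 + (50.0 - i0) * (c1 - c0) / (i1 - i0)
    return None


# Five concentrations starting at 6.25 mg/mL, as in the assays above.
concs = serial_dilution(6.25, 5)
# Hypothetical inhibition values (%), listed in the same order as concs.
example_inhibitions = [90.0, 70.0, 40.0, 20.0, 10.0]
ic50 = ic50_linear(concs, example_inhibitions)
```

With these made-up readings the 50% point falls between the 1.5625 and 3.125 mg/mL dilutions; a lower IC50 corresponds to a stronger antioxidant response.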

Response Surface Methodology (RSM)
Collection time (date) (X₁), variety (X₂), geographical area (X₃), borer infection level (%) (X₄), and harvest number (X₅) were modulated according to a central composite design (CCD) using STATISTICA version 14.0.0.5 (TIBCO Software Inc., Palo Alto, CA, USA). Although the variety variable was not controlled, its significance in influencing polyphenol variability in plants was duly considered for the study. Response variables were estimated using the response surface model described by a second-order polynomial equation (Equation (1)), where X₁, X₂, X₃, X₄, and X₅ represent the levels of the factors and β₀-β₉ represent the coefficient estimates. The variables present in quadratic terms represent the surface curvature, the variables present in linear terms represent the coordinates of the maximum value predicted, and the variables present in bi-factorial cross products represent the axes of the geometric figure formed by partitioning the surface area. The impact of the combinations of the five independent variables on the total phenolic content and phenolic classes' concentration was examined using the response surface approach.
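To make the structure of a second-order response surface concrete, the sketch below evaluates the generic form y = b0 + Σ b_i·x_i + Σ b_ii·x_i² + Σ_{i<j} b_ij·x_i·x_j for five factors. This is the textbook full quadratic model, given as an assumption; the exact terms retained in the authors' Equation (1) are not recoverable from the text, and all coefficients and factor levels shown are invented for illustration.

```python
from itertools import combinations


def predict_second_order(x, b0, linear, quadratic, interaction):
    """Evaluate a full quadratic response surface at the point x.

    x           : list of factor levels (X1..Xk)
    b0          : intercept
    linear      : list of linear coefficients b_i
    quadratic   : list of quadratic coefficients b_ii
    interaction : dict mapping (i, j) with i < j to b_ij; missing pairs count as 0
    """
    y = b0
    y += sum(b * xi for b, xi in zip(linear, x))
    y += sum(b * xi ** 2 for b, xi in zip(quadratic, x))
    for i, j in combinations(range(len(x)), 2):
        y += interaction.get((i, j), 0.0) * x[i] * x[j]
    return y


# Hypothetical coded levels for X1..X5 and made-up coefficients.
x = [1.0, 0.0, -1.0, 0.5, 1.0]
linear = [2.0, 0.0, 1.0, -1.0, 3.0]
quadratic = [0.0, 0.0, 0.0, 0.0, -1.0]
interaction = {(0, 4): 0.5}  # only the X1*X5 cross product is non-zero here
y_hat = predict_second_order(x, 10.0, linear, quadratic, interaction)
```

In practice the coefficients would come from a least-squares fit to the designed experiments, and terms that are not significant (or redundant, as for categorical factors) would be dropped.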
The optimization of the multi-criteria response surface is based on Derringer's desirability function. The function converts each response into a desirability score (d) that ranges from 0 (completely undesirable) to 1 (fully desirable). Depending on the optimization criterion employed, the response can be maximized, minimized, or driven to a specific target. The desirability function for response variables takes the form of Equation (2):

$$d_i = \begin{cases} 0, & y_i < y_{i,\min} \\ \left(\dfrac{y_i - y_{i,\min}}{y_{i,\max} - y_{i,\min}}\right)^{w_i}, & y_{i,\min} \le y_i \le y_{i,\max} \\ 1, & y_i > y_{i,\max} \end{cases} \qquad (2)$$

where $y_{i,\min}$ and $y_{i,\max}$ are the minimum and maximum desired levels of each response variable $i$, taken here as the lowest and the highest values of the corresponding quality attribute.
Responses below $y_{i,\min}$ were assigned a desirability of 0, while responses above $y_{i,\max}$ were assigned a desirability of 1. Between $y_{i,\min}$ and $y_{i,\max}$, the desirability increased linearly because a weight ($w_i$) of one was assigned. A predictive model was used to find the best conditions to obtain the maximum polyphenols in the extracts.
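The "larger is better" desirability described above can be sketched in a few lines of Python. The individual function follows the description in the text (0 below the minimum, 1 above the maximum, linear in between for a weight of one). Combining the individual scores through a geometric mean is the usual Derringer and Suich convention and is an assumption here, as are the function names and example numbers.

```python
import math


def desirability(y, y_min, y_max, weight=1.0):
    """One-sided desirability for a response that should be maximized."""
    if y <= y_min:
        return 0.0
    if y >= y_max:
        return 1.0
    return ((y - y_min) / (y_max - y_min)) ** weight


def overall_desirability(scores):
    """Geometric mean of individual desirabilities (assumed combination rule)."""
    if any(d == 0.0 for d in scores):
        return 0.0  # one unacceptable response makes the whole solution unacceptable
    return math.exp(sum(math.log(d) for d in scores) / len(scores))


# Hypothetical responses scored against hypothetical ranges.
d_values = [
    desirability(5.0, 0.0, 10.0),   # halfway between the limits
    desirability(12.0, 0.0, 10.0),  # above the maximum
]
D = overall_desirability(d_values)
```

An overall score of 1 is reached only when every response sits at or above its maximum desired level.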

Artificial Neural Networks (ANNs)
STATISTICA version 14.0.0.5 (TIBCO Software Inc.) was used to build and analyze different neural networks to investigate the influence of the input parameters (collection date, production area, borer infection, harvest number, and variety) on the three outputs. From now on, the outputs will be referred to as hydroxybenzoic acids, hydroxycinnamic acids, and flavones.
Data were analyzed using two different types of neural networks: the Kohonen network (KN) for categorization and the regression network (multilayer perceptron, MLP). The same experimental dataset was used to generate the RSM model and the ANN models. Of the experimental dataset, 70% (19 points) were used for network training, 15% for validation, and the remaining 15% (4 points) for network testing.
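A random 70/15/15 partition of this kind can be sketched as follows. The function name, the fixed seed, and the use of rounding are illustrative assumptions; the value n = 27 in the example is also only illustrative, chosen because it reproduces the 19 and 4 point counts quoted above, and is not taken from the source.

```python
import random


def split_indices(n, train_frac=0.70, val_frac=0.15, seed=0):
    """Shuffle the indices 0..n-1 and cut them into train/validation/test parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # seeded so the split is reproducible
    n_train = round(train_frac * n)
    n_val = round(val_frac * n)
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]      # whatever remains goes to testing
    return train, val, test


# Illustrative dataset size only (see the note above).
train_idx, val_idx, test_idx = split_indices(27)
```

With so few points in the validation and test subsets, performance estimates depend strongly on which samples happen to fall into each part.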
Automated neural network cluster analysis was performed using the Kohonen training procedure, with the training, testing, and validation data used to build the network. After that, an automated search generated 20 regression neural networks (MLP), all of which were trained, and five were retained based on their performance throughout training, testing, and validation. The identity, logistic, Tanh, and exponential activation functions were examined for hidden and output neurons. An effective second-order training method was utilized, and the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm was selected. The training algorithm made use of the radial basis function. The activation functions for hidden and output neurons were Gaussian and identity functions. The sum of squares (SOS) was employed as the error function for MLP networks. The Pearson correlation coefficient between experimentally determined values and values predicted by neural networks was measured to evaluate the effectiveness of the proposed neural network models. A global sensitivity analysis was conducted to assess the input variables' relative significance for the created neural network models. Upon determining the optimal configuration for the ANN method, we conducted a sensitivity analysis to unveil the significance of each operational variable and pinpoint the components crucial for predicting polyphenol content. We applied Equation (3) to achieve this, leveraging the partitioning of connection weights outlined in the Garson equation.
In Equation (3), RI denotes the relative importance of the input variable (x) regarding the output variable. Here, k_i and k_h refer to the count of input and hidden neurons, while W_ab signifies the connection weights between the input layer and the hidden layer, and V_b represents the connection weight between the hidden layer and the output layer. The numerator in Equation (3) is the summation of absolute weight products for each input, whereas the denominator corresponds to the total of all weights contributing to the hidden unit, considering absolute values.
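The weight-partitioning idea can be sketched for a single-output network as follows. This follows the commonly cited form of Garson's algorithm, which may differ in detail from the authors' Equation (3); the function name and the toy weights are hypothetical.

```python
def garson_importance(W, V):
    """Relative importance of each input from connection weights.

    W[a][b] : weight from input neuron a to hidden neuron b
    V[b]    : weight from hidden neuron b to the single output neuron
    Returns a list of importances that sums to 1.
    """
    k_i, k_h = len(W), len(V)
    shares = [0.0] * k_i
    for b in range(k_h):
        # Absolute contribution of every input routed through hidden neuron b.
        contrib = [abs(W[a][b] * V[b]) for a in range(k_i)]
        total = sum(contrib)
        if total == 0.0:
            continue  # this hidden neuron carries no signal to the output
        for a in range(k_i):
            shares[a] += contrib[a] / total
    grand_total = sum(shares)
    return [s / grand_total for s in shares]


# Toy network: 2 inputs, 2 hidden neurons, 1 output (made-up weights).
W_example = [[1.0, 2.0],
             [3.0, 2.0]]
V_example = [1.0, -1.0]
importance = garson_importance(W_example, V_example)
```

Because only absolute values are used, the result ranks how much each input matters but says nothing about the direction of its effect.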

Comparison of the Prediction Ability of RSM and ANN
Several statistical parameters, including the coefficient of determination (R²) and the root mean square error (RMSE), were calculated to compare the estimation skills of response surface methodology (RSM) and artificial neural networks (ANN). RSM is a set of statistical methods used for optimizing process variables, and it involves the calculation of summary statistics such as RMSE, adjusted R², and predicted R². The RMSE, which is the square root of the mean square error, is the standard deviation associated with the experimental error. The ANN software, in turn, conducts a sensitivity analysis on each model and reports it in a spreadsheet that rates the importance of the model's input variables. The prediction capacity of RSM and ANN was then assessed on the basis of the calculated statistical parameters.
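Both comparison metrics are simple to compute from paired observed and predicted values, as the following minimal sketch shows. The function names and the example numbers are illustrative and do not come from the study's data.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Made-up observations and model predictions.
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.1, 1.9, 3.2, 3.8]
example_rmse = rmse(obs, pred)
example_r2 = r_squared(obs, pred)
```

A model with a higher R² and a lower RMSE on the same data is considered the better predictor, which is the criterion applied when the two approaches are compared.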

Individual Polyphenols Content
The list of individual polyphenols identified among all extracts is given in Table 1, and the quantification of each compound is presented in Tables 2 and 3. The analysis of various sugarcane straw extracts revealed the presence of quinic acid esterified with coumaroyl, caffeic, and ferulic acid units, as well as glycosylated protocatechuic acid, among the detected metabolites. Hydroxybenzoic acid and dihydroxybenzoic acid linked to sugar moieties were identified, including gentisic acid 5-O-β-glucoside. Notably, a metabolite akin to salicylic acid, known as gentisic acid (2,5-dihydroxybenzoic acid), plays a crucial role as a signaling molecule in plants' defense responses against infections [19].
Two types of glycosylation were detected among the flavones: C-glucosylated apigenin, luteolin, and diosmetin, and O-glucosylated tricin. These flavone profiles have been extensively described in different sugarcane by-products, such as juice [20] and leaves [21]. Flavonoid C-glycosides have been shown to have a variety of properties, including antioxidant, insect antifeedant, antibacterial, mycorrhizal symbiosis promoter, and UV-absorbing pigment. These activities need high local concentrations, and many of these chemicals are toxic to plants [22].
In the natural environment, plants are continuously pressured by biotic and abiotic factors. These adverse circumstances increase reactive oxygen species (ROS) generation, which inhibits plant growth and development and results in significant agricultural output losses [28]. As a defense mechanism against various abiotic stimuli, phenolic accumulation is a characteristic of stressed plants that is often consistent [29]. When plants are exposed to biotic or abiotic stressors, the activation of the phenylpropanoid pathway leads to the production of chlorogenic acid as the primary phenolic compound [30]. In tomato plants subjected to a nematode and water stress simultaneously, a rise in flavonoid and chlorogenic acid levels was also observed [31].
Other plants, such as tea, showed a response to abiotic stressors such as drought, salt, methyl jasmonate, and cold. In this case, gene expression increased in the phenylpropanoid and lignin pathways and decreased in the flavonoid route. The lignin pathway is crucial for the development of plant cell walls, serving as a primary line of defense against environmental stressors. The lignin route is upregulated because of the metabolic flux, where the flavonoid and lignin pathways compete for the same carbon supply. In the presence of abiotic stress and throughout the process of leaf maturation, polyphenols function as the repository for carbohydrates resulting from photosynthesis in tea plants [32]. In addition, 2,5-dihydroxybenzoic acid was found to be substantially raised in tomato plants infected with the citrus exocortis viroid [33].
With the potential to be used in the food and cosmetic industries, sugarcane straw is a by-product that is a rich source of polyphenols, including 5-O-feruloylquinic acid, 2,5-dihydroxybenzoic acid, and luteolin-6-C-glucoside.This makes it a good target for researching the ideal harvesting conditions for high-quality extracts.
The antioxidant potential of sugarcane straw extract powder was assessed using two chemical methods, namely, ABTS and DPPH, with results expressed in Trolox equivalents. As indicated by the ABTS assay, the extract demonstrated the ability to neutralize free radicals at a rate of 0.9-3.6 mg TE/mL, whereas the DPPH assay yielded an antioxidant capacity in a range of 1.0-6.8 mg TE/mL (see Table 4). Phenolic compounds found in extracts from sugarcane rods have been identified as exhibiting potent antioxidant properties, as indicated by DPPH and FRAP assays. These assays have shown a significant correlation with the levels of phenolic compounds and flavonoids [34]. A comprehensive review summarized the antioxidant potential of various sugarcane products and byproducts, highlighting that the leaves and bagasse contain the highest capacity for neutralizing free radicals, a trait associated with their rich polyphenolic content [35]. Furthermore, another study suggested that sugarcane straw extracts possess antioxidative attributes, which could prove beneficial in mitigating oxidative stress-related diseases or their progression [36]. The current body of evidence points to the utilization of sugarcane juice as a natural source of dietary antioxidants in functional foods, underscoring its exceptional phenolic content, particularly in terms of flavonoids [37].

RSM Modeling
To study how independent variables such as geographic area, borer infection level, harvest number, variety, and harvesting date influence the recovery of polyphenols from sugarcane straw, a response surface methodology (RSM) analysis was performed. For that, the combination of the selected parameters was determined using central composite design (CCD), and a quadratic model proposed by STATISTICA software was chosen after the analysis of the R² values and p-values. This process ensures that every factor and its interactions with the others are investigated. The statistical significance of the fitted second-order model equations and the importance of each term were evaluated using ANOVA. The adjusted R² in the current investigation was close to the threshold deemed acceptable (R² ≥ 0.80), indicating that the experimental data fit the second-order polynomial equations well [38]. Based on the multiple linear regression (MLR) equations, 3D surfaces and contour plots were created to better understand the interactions between the independent variables. These 3D visualizations make the main and cross-product effects of the independent variables on the target responses easier to interpret.
The relationships among the response variables (hydroxybenzoic acid, hydroxycinnamic acid, and flavone content) and the independent variables were evaluated. Based on the analysis of the regression coefficients together with the results of the analyses of variance (ANOVAs) of the second-order polynomial models, the hydroxybenzoic acid class (R² = 0.94) was significantly affected (p < 0.05) by the linear terms of the variables "variety", "geographic area", and "infection level" (X₂, X₃, and X₄), the linear and quadratic terms of the variable "harvest number" (X₅ and X₅²), the quadratic term of "variety" (X₂²), and the interactive effects between the variables harvesting, variety, geographic area, and collection date (X₁X₂, X₁X₃, X₁X₄, X₂X₃, X₂X₅, and X₃X₅) (Table 5). The linear variables with the strongest effects on hydroxybenzoic acid content are represented in 3D surface plots in Figure 2. Varieties SP803280, SP813250, and CTC9001 presented the highest content of hydroxybenzoic acids, and the variable "borer infection" had a negative effect, meaning that when the borer infection level increased, the hydroxybenzoic acid content tended to decrease. The estimated effect for the linear term of "harvest number" (β = −288.93) suggests that younger plants (1st harvest) tend to have more hydroxybenzoic acids.
Table 5. Summary of the effect of collection time (date) (X₁), variety (X₂), geographical area (X₃), borer infection level (%) (X₄), and harvest number (X₅) on polyphenolic class content and antioxidant activity (ABTS and DPPH) obtained in extracts evaluated for sugarcane straw according to the factorial experimental design.
The chosen models for hydroxycinnamic acids and flavones moderately explain the effects of independent variables, since they presented adjusted R² values of 0.71 and 0.78, respectively (Table 5). The poorer fit of the polynomial model for these two classes could be due to the high variability observed for each compound, since not all presented the same behavior. Due to the redundant effect, the quadratic term for the geographic area was not considered in the polynomial model.

The predictive model for the hydroxycinnamic acids was significantly affected by the linear and quadratic terms of "harvest number" (X₅ and X₅²) and by interactions between linear terms for the different variables (X₁X₃, X₂X₃, X₂X₄, X₂X₅, and X₃X₅). Figure 2 represents the combination of variety and harvesting, which had a more substantial effect on hydroxycinnamic acid content. Older plants (6th-7th harvest) tend to have more of those compounds in combination with varieties like SP813240, SP81340, and CTC9001.
The flavones were significantly affected by the quadratic term of "harvest number" (X₅²) and by the interactions between linear terms of "harvesting date", "harvest number", "variety", and "geographic area" (X₁X₅, X₂X₅, and X₃X₅). In this class, the combination of "harvesting" and "collection date" showed that straw from younger plants (1st harvest) harvested between May and August tends to have more flavones (Figure 2).
Antioxidant activity, as assessed using the ABTS method, was significantly sensitive to the linear and quadratic components of "collection time" (X₁ and X₁²), to the quadratic term of "variety" (X₂²), and to the interactions among the linear factors of "harvesting date", "harvest number", "variety", and "geographic area" (X₁X₂, X₁X₅, and X₂X₃). In contrast, the DPPH method did not reveal any notable effects associated with the variables examined.
Within the constraints of the extraction conditions used, this study sought to maximize the phenolic compound content extracted from sugarcane straw. Each predicted response was converted, using the desirability approach, into a dimensionless partial desirability function, dᵢ, whose values range from 0 for a completely undesired response to 1 for a fully desired response.
For all responses in the current investigation, only one ideal condition was attained: straw from variety CU0618 collected at Guariba near the end of the harvest season (October 2020), using older plants after seven harvestings and with a high level of borer infection (13.81%) (Table 6).
Table 6. Prediction values (±confidence intervals at a 95% confidence level) and desirability for the optimal content of polyphenol classes (hydroxybenzoic acids, hydroxycinnamic acids, and flavones; µg/g dry extract) for the best harvesting conditions considering collection date, variety, geographic area, borer infection level, and harvest number according to the central composite design (CCD).
The desirability function D = 1.0 was used to determine the ideal conditions and anticipated values. A positive value for D (>0) denotes that all responses are simultaneously in a suitable range. Values of D near 1 indicate that the responses are close to their target values and that the combination of criteria is at a global maximum. The predicted values are presented in Table 6, with 977.59 µg/g for hydroxybenzoic acids, 1336.16 µg/g for hydroxycinnamic acids, and 1660.49 µg/g for flavones. For the antioxidant activity, the IC50 was 4.84 and 9.96 mg/mL.

Artificial Neural Network (ANN) Modeling
ANNs are a complex optimization and simulation tool with high potential for prediction.According to several published findings, ANN outperforms RSM in terms of its prediction powers [39,40].As a result, a nonlinear connection between the five input (independent) variables and responses (target outputs) was defined by creating an ANN-based model using a feed-forward back-propagation technique and a topology optimization procedure.
The data from the fifty-six experimental points utilized to create the RSM model were used to train and validate neural networks.Three layers of neurons coupled the feed-forward technique with the multilayer perceptron networks.The first (input) layer of neurons comprised five components, each of which stood for an independent variable.The intermediate (hidden) layer was built to create a model with the least variation between anticipated and experimentally obtained values, and the intermediate (hidden) layer was built.Twenty neurons were present in the intermediate layer of the created ANN model.Three dependent variables were represented by five neurons in the third (output) layer.The model with the highest coefficient of determination (R 2 ) and the lowest root mean square error (RMSE) as indicators of the best validation statistics were chosen.
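For readers less familiar with the architecture, the forward pass of a three-layer feed-forward MLP can be sketched as below. The layer sizes in the example (5 inputs, 20 hidden neurons, 3 outputs, one per polyphenol class), the Tanh hidden activation, the identity output activation, and all weight values are assumptions chosen for illustration; they are not the fitted network from this study.

```python
import math


def mlp_forward(x, W1, b1, W2, b2):
    """Forward pass: input -> Tanh hidden layer -> identity output layer.

    W1 : one weight row per hidden neuron (each row has len(x) entries)
    b1 : one bias per hidden neuron
    W2 : one weight row per output neuron (each row has len(b1) entries)
    b2 : one bias per output neuron
    """
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(W1, b1)
    ]
    return [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(W2, b2)]


# Illustrative 5-20-3 topology with made-up constant weights.
n_in, n_hidden, n_out = 5, 20, 3
W1 = [[0.1] * n_in for _ in range(n_hidden)]
b1 = [0.0] * n_hidden
W2 = [[0.05] * n_hidden for _ in range(n_out)]
b2 = [1.0, 2.0, 3.0]

outputs = mlp_forward([1.0] * n_in, W1, b1, W2, b2)
```

Training consists of adjusting the weights and biases so that the outputs match the measured responses, for example by minimizing a sum-of-squares error.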
Networks were constructed, and the results for the best MLP networks (training, test, and validation data), chosen on the basis of their performance during network construction, are shown in Table 7. The names of the neural networks indicate how many neurons are present in the input, hidden, and output layers. The results show that, in general, the MLP networks could recognize and simulate the effect of the input variables on the target outputs. Sensitivity analysis was used to identify the input variables that are particularly important in the constructed models for accurate prediction of the output variables, and it was carried out for the MLP models for all target outputs. The results are presented in Figure 3 and show that geographic area, variety, and collection date were the three variables with the greatest impact on sugarcane straw polyphenol variation for all MLP models.
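The sensitivity procedure is not detailed in this section. One common approach, sketched below as an assumption rather than as the method used here, replaces each input in turn by its mean and reports the ratio of the resulting error to the baseline error; ratios well above 1 mark influential inputs. The toy model and data are hypothetical.

```python
def sse(model, X, y):
    """Sum of squared errors of model predictions over the dataset."""
    return sum((model(row) - target) ** 2 for row, target in zip(X, y))

def sensitivity_ratios(model, X, y):
    """Error ratio per input when that input is replaced by its mean."""
    base = sse(model, X, y)
    if base == 0.0:
        raise ValueError("baseline error is zero; ratios are undefined")
    ratios = []
    for j in range(len(X[0])):
        mean_j = sum(row[j] for row in X) / len(X)
        X_flat = [row[:j] + [mean_j] + row[j + 1:] for row in X]
        ratios.append(sse(model, X_flat, y) / base)
    return ratios

# Hypothetical toy model: strong effect of input 0, weak effect of input 1,
# no effect of input 2.
def toy_model(row):
    return 3.0 * row[0] + 0.5 * row[1]

X = [[0.0, 0.0, 5.0], [1.0, 1.0, 5.0], [2.0, 0.0, 5.0], [3.0, 1.0, 5.0]]
noise = [0.1, -0.1, 0.1, -0.1]
y = [toy_model(row) + e for row, e in zip(X, noise)]

ratios = sensitivity_ratios(toy_model, X, y)
print(ratios)
```

In this toy case the ratio is largest for the first input and equals 1 for the unused third input, which is the kind of ranking a sensitivity plot such as Figure 3 conveys.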
The validation process used 15% of the dataset, a choice made because of practical constraints. However, this approach may not fully capture the inherent complexity of the entire dataset. Given these limitations, leave-one-out cross-validation (LOOCV) would have been a preferable alternative for this specific dataset. By systematically leaving out individual data points during validation, LOOCV offers a more exhaustive and robust evaluation of model performance. While the chosen validation approach was pragmatic, this limitation underscores the need for future investigations to consider more comprehensive validation methodologies, such as LOOCV, to further enhance the rigor and generalizability of the findings.
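The difference between the two validation schemes can be made concrete with a short sketch for a dataset of 56 points. The function names and the random seed are hypothetical, and the sketch lumps training and test data together because this section does not state how the remaining 85% was divided.

```python
import random

def holdout_split(n, val_fraction, seed=0):
    """Single random split into (fit, validation) index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_val = round(n * val_fraction)
    return idx[n_val:], idx[:n_val]

def loocv_splits(n):
    """Yield (fit, validation) index lists, leaving out one point at a time."""
    for i in range(n):
        yield [j for j in range(n) if j != i], [i]

n_points = 56  # experimental points reported in the text
fit_idx, val_idx = holdout_split(n_points, 0.15)
folds = list(loocv_splits(n_points))

print(len(fit_idx), len(val_idx))   # sizes of the single hold-out split
print(len(folds), len(folds[0][0]))  # number of LOOCV folds and fit size per fold
```

With a 15% hold-out only a handful of points ever test the model, whereas LOOCV refits the model once per data point so that every observation serves as validation data exactly once.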
Root mean square error (RMSE) and coefficient of determination (R²) were also used to compare the RSM and ANN models. According to the results, ANN has a substantially higher predictive capability than RSM, owing to its general ability to approximate nonlinear systems. In contrast, RSM is only valid for systems with a second-order polynomial regression structure. RSM requires numerous runs under a standard experimental design for multi-response optimization, whereas the ANN can calculate multiple responses in a single run and is independent of the experimental design [41]. For optimizing the harvesting conditions of sugarcane to produce a straw extract with a high phenolic content, the ANN architecture is therefore superior to the RSM model in terms of predictability, and it fits the measured responses (hydroxybenzoic acid, hydroxycinnamic acid, and flavone contents) (Table 8).
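The two statistics used for this comparison follow their standard definitions and can be computed as in the sketch below. The observed and predicted values are hypothetical placeholders, not data from Table 8, and "model_a" and "model_b" are illustrative names only.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical observed contents and predictions from two competing models.
observed = [10.0, 20.0, 30.0, 40.0]
model_a = [12.0, 18.0, 33.0, 37.0]   # looser fit
model_b = [10.5, 19.5, 30.5, 39.5]   # tighter fit

for name, pred in (("model_a", model_a), ("model_b", model_b)):
    print(name, rmse(observed, pred), r2(observed, pred))
```

The model with the lower RMSE and the higher R² is the better predictor on the data at hand, which is the criterion applied to the RSM and ANN models in the text.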
The MLP 20-5-5 network gave the best fit for all the polyphenolic classes and for antioxidant capacity (ABTS and DPPH).
Biotic stress causes significant sugarcane losses, and it has been estimated that around 10% of these crop losses are attributable to insect pests, the most significant of which is the sugarcane stem borer (Diatraea saccharalis Fabr., Lepidoptera, Crambidae). The plant's response to this pest is not yet fully understood. Some authors suggest that the mechanism behind plant protection against insect damage involves the activation of defense proteins. It has been reported that sugarcane leaf phenolic extracts show increased protein levels after stimulation with oral secretions from Diatraea saccharalis [42,43].
Sugarcane plants may respond to injury by producing secondary compounds, such as phenols. A recent study demonstrated a rise in chlorogenic acid and other caffeic acid conjugates in the leaves of sugarcane variety SP791011 in response to herbivory by Diatraea saccharalis [44]. Chlorogenic acid is an intermediate in the formation of insoluble phenolic compounds (e.g., lignin) associated with plant resistance to stress. Higher accumulation of chlorogenic acid during herbivory has been described, and it is considered an essential defense metabolite in plants [43]. Higher intensities of sugarcane borer infestation can also contribute to the accumulation of phenolic compounds through the action of pathogens that cause red rot. SP80-3280 plants infected with sugarcane borer (19-25%) showed increased phenolic content [45].
Through a process known as ratooning, sugarcane can regrow after being harvested, resulting in repeated harvests of the same crop, usually every 11 to 16 months. With each harvest, sugarcane plants produce less sugar, a phenomenon related to management approaches that increase the pressure from pests, diseases, and weeds, reduce soil fertility, compact the soil, and physically damage the crop during harvest [46]. We are not aware of any study investigating the influence that the number of sugarcane harvests might have on sugarcane polyphenol content. According to the polynomial model, the "harvest number" variable had a strong influence on the variability of all phenolic classes (Table 6), and the best harvest conditions indicated that after the seventh harvest (Table 6), the sugarcane straw will yield an extract richer in phenolic compounds. Given the factors associated with ratooning mentioned above, extracts richer in polyphenols should be expected as the number of harvests increases, since susceptibility to disease and physical damage to the crop tend to induce the production of these secondary metabolites. Mechanically damaged plants create a physical barrier to prevent tissue destruction, which includes synthesizing polyphenols such as lignin and suberin [43,47].
According to the ANN modeling, harvest month, variety, and geographic area are the three main variables affecting the variability of phenolic compound content (Figure 3).
For meteorological analyses, winter is considered to be the quarter of June, July, and August. The winter of 2020 in the capital of São Paulo had rain and above-average temperatures according to measurements carried out by INMET at the meteorological station of Mirante de Santana, in the north of the city of São Paulo. The total rainfall was 198.2 mm, 30% above the historical average. The average maximum temperature was 25.2 °C, and the minimum was 14.7 °C, values that were 1.6 and 1.7 °C, respectively, above the historical average. However, between the months of this study (June-November), when looking at the historical data for precipitation and air temperature, the maximum registered was between October and November, with 35 mm and 36.2 °C, respectively. Also, the global radiation was higher in September-December, with a maximum of 3972 kJ/m² in November [48].
In this experiment, the harvesting season influenced flavone content, with higher values when plants were harvested between May and September, a period of low precipitation (Figure 4A). According to the polynomial model (Table 6), October was the best month to gather straw for polyphenol extraction, the preceding months having been dry. According to a recent study on sugarcane juice, the crop season (year and season) mainly influenced flavones through lower rainfall in the months before harvest [20]. Activation of the phenylpropanoid biosynthesis pathway in response to drought stress has been observed in several plants, which supports the current study's findings [29]. Once accumulated in the cytoplasm, flavonoids can detoxify H2O2 molecules produced by drought stress [49].
According to the meteorological data collected during the study (June-November 2020), the maximum global radiation observed in August-September and the higher average temperatures in September (Figure 4B,C) may also have contributed to the predicted higher polyphenolic content in plants harvested in October. Plants produce polyphenols to defend themselves under stress, which can be brought on by radiation, heat, dehydration, etc. Acting as UV-B screens, polyphenols shield the plant from ionizing radiation [5].
In Brazil, there are three main breeding programs: the Sugarcane Technology Center (CTC varieties), a private company; the Instituto Agronômico de Campinas (IAC varieties), supported by the government of São Paulo state; and the Inter-University Network for the Sugar and Ethanol Development (RIDESA-RB varieties), supported by the federal government. In 2015, the variety census for São Paulo state indicated that the four most planted varieties were RB966928 (18%), RB867515 (16%), RB92579 (10%), and RB855156 (7.8%). Two other varieties, RB855453 and SP81-3250, were among the five most cultivated, but not among the five most planted. More resistant varieties are replacing SP81-3250 because of its vulnerability to orange rust. RB92579 and CTC4 (tenth most cultivated and fifth most planted) might rise in the coming years, given that they were among the five most planted varieties [50].
New sugarcane varieties with greater yields are continually being developed and tested to increase the productivity of the Brazilian industry. An appropriate sugarcane variety should be well suited to local changes in temperature, soil type, and cutting technique (manual or mechanical) or ratooning. It should be resistant to pests, infections, and water stress and accumulate high concentrations of sucrose [51].
Plant polyphenol composition is highly dependent on the growing environment, and plants from different geographical regions have shown large variations in phenolic content because of differences in climate and soil composition [52]. In olive leaves, phenolic compound content was positively associated with calcium, nitrogen, and sodium contents. Plants accumulate these elements to speed up the photosynthetic rate, promote plant development, and boost resistance to drought stress [53]. For example, the total phenolic level of olive leaves decreases as geographic altitude decreases. This behavior was attributed primarily to climate variations. Phenolic compounds tend to be less abundant in the leaves of trees grown in windy, humid environments (close to sea level) and more abundant in high-altitude environments with terrestrial and Mediterranean climates, where there are sizable annual temperature variations [11]. The influence of soil nutrients on plant growth depends greatly on the relationship between water and air in the soil pores. A recent study of different soil media found no significant influence on Hibiscus sabdariffa var. growth or phenolic composition [54].
In this study, the distance between the two cities was 274 km (Guariba-Valparaíso, São Paulo). The climate at Valparaíso-SP is considered tropical, and soils were classified as sandy loam soil [55]. In Guariba-SP, the soil was classified as clayey according to the Brazilian System of Soil Classification-SIBCS [4]. Valparaíso is characterized as having a higher temperature frequency than Guariba [55].
In wines, for example, it was reported that soil influences vine water status: sandy soils, which have lower water-holding capacity, result in wines richer in anthocyanins [56]. Sandy soils are more prone to soil health degradation than clayey soils, and healthier soils were associated with higher sugarcane stalk yields [3]. On this basis, plants from Valparaíso would be expected to have more polyphenols than plants from Guariba, although other factors associated with the region, such as the climate, may be more important than soil characteristics in explaining the polyphenolic difference between them.
This study has some limitations, including limited generalizability due to the specific conditions in Guariba and Valparaíso and the complexity of the models. The findings may not extend to regions or settings with different environmental factors or sugarcane varieties. Although useful for optimizing polyphenol extraction, advanced models such as RSM and ANN can be resource-intensive and challenging to implement in resource-constrained agricultural settings, potentially limiting their widespread applicability and interpretation.

Conclusions
In this study, the impact of five biotic and abiotic stresses (geographic production area (Guariba, Valparaíso), level of borer insect (Diatraea saccharalis) infection, harvest number (first to seventh), harvest season, and plant variety) was evaluated in sugarcane straw as a potential by-product for the production of natural extracts richer in polyphenols. The ideal extraction settings were found using a response surface methodology (RSM) based on a central composite design to optimize the yield of all target compounds concurrently. The optimal conditions were plants of the CU0618 variety collected at Guariba in the dry season (October 2020), using older plants (seventh harvest), with a level of borer infection of 13.81%. According to the artificial neural network (ANN) model, the three most significant factors for the richness of sugarcane straw polyphenols are season, region, and variety.
Compared to the RSM models, the ANN model had better coefficients of determination, indicating a superior potential for prediction. The findings presented advance our understanding of the extraction of essential compounds needed for further development, separation, purification, and scale-up processes at the industrial level.
In practice, the developed model aids in monitoring year-round biotic and abiotic conditions, offering predictive insights for strategic straw usage in extract production. These findings highlight the potential for utilizing sugarcane straw as a source of valuable polyphenols, contributing to the development of sustainable practices in the sugar production industry.

Figure 1. Sampling plan for the sugarcane straw biotic and abiotic effects evaluated in polyphenol content.


Figure 2. Response surface plot for polyphenol classes (hydroxybenzoic acids, hydroxycinnamic acids, and flavones) quantified in extracts from sugarcane straw harvested under different biotic and abiotic conditions according to the fractional experimental design.


Figure 3. Sensitivity analysis for neural network models that successfully predict the phenolic compound classes (hydroxybenzoic acids, hydroxycinnamic acids, and flavones) and antioxidant activity (ABTS and DPPH) in extracts produced from sugarcane straw, considering collection date, variety, geographic area, borer infection level (%), and harvest number.

3.4. Comparison between RSM and ANN
Here, the prediction performance and estimation ability of the RSM and ANN models were examined. The predicted values of the three target responses (Y1, Y2, and Y3) from the ANN model were statistically evaluated by creating comparative similarity plots. The results show that the ANN model outperformed the RSM model in terms of accuracy, precision, and estimation ability when fitting the experimental data for all target responses. The RSM model showed more variance in the residuals, which are the differences between predicted and actual values, than the ANN model, which showed stable residuals with less variation.

Figure 4. Monthly precipitation (A), air temperature (B), and global radiation (C) from 2020 at the station of Mirante de Santana, São Paulo, Brazil.


Table 1. Phenolic compounds identified in sugarcane straw extracts.

Table 2. (a) Polyphenols identified with concentrations > 20 µg/g extract in sugarcane straw extracts of plants collected in the Guariba area with high borer infection (BHI) and low borer infection (BLI) between June and August 2020 from different harvests (1st-7th). Values represent the average ± standard deviation. (b) Polyphenols identified with concentrations > 20 µg/g extract in sugarcane straw extracts of plants collected in the Guariba area with high borer infection (BHI) and low borer infection (BLI) between September and November 2020 from different harvests (1st-7th). Values represent the average ± standard deviation.

Table 3. (a) Polyphenols identified with concentrations > 20 µg/g extract in sugarcane straw extracts from plants collected in the Valparaíso area with borer high-infection (UHI) and low-infection (ULI) levels between June and August 2020 from different harvests (1st-7th). Values represent the average ± standard deviation. (b) Polyphenols identified with concentrations > 20 µg/g extract in sugarcane straw extracts from plants collected in the Valparaíso area with borer high-infection (UHI) and low-infection (ULI) levels between September and November 2020 from different harvests (1st-7th). Values represent the average ± standard deviation.

Table 4. Antioxidant activity (ABTS and DPPH) measured in sugarcane straw extracts from plants collected in the Guariba and Valparaíso areas with borer high- and low-infection levels, harvested between June and November 2020 from different harvests (1st-7th).

Table 7. The optimal multilayer perceptron (MLP) networks developed for single-variable outputs.

Table 8. Comparison of optimization and prediction capabilities of response surface methodology (RSM) and artificial neural network (ANN) for the extraction of phenolic compounds, organized by class, and for antioxidant activity (ABTS and DPPH) from sugarcane straw harvested under different biotic and abiotic conditions.