Article

Artificial Intelligence and Extraction of Bioactive Compounds: The Case of Rosemary and Pressurized Liquid Extraction

by Martha Mantiniotou 1, Vassilis Athanasiadis 1, Konstantinos G. Liakos 2, Eleni Bozinou 1 and Stavros I. Lalas 1,*
1 Department of Food Science and Nutrition, University of Thessaly, Terma N. Temponera Street, 43100 Karditsa, Greece
2 Department of Electrical and Computer Engineering, University of Thessaly, Sekeri Street, 38334 Volos, Greece
* Author to whom correspondence should be addressed.
Processes 2025, 13(6), 1879; https://doi.org/10.3390/pr13061879
Submission received: 19 May 2025 / Revised: 10 June 2025 / Accepted: 12 June 2025 / Published: 13 June 2025

Abstract

Rosemary (Rosmarinus officinalis or Salvia rosmarinus) is an aromatic herb that possesses numerous health-promoting and antioxidant properties. Pressurized Liquid Extraction (PLE) is an efficient, environmentally friendly technique for obtaining valuable compounds from natural sources. The optimal PLE conditions were established as 25% v/v aqueous ethanol at 160 °C for 25 min, with a liquid-to-solid ratio of 10 mL/g. The optimal extract exhibited high polyphenol content and strong antioxidant activity across several assays. The recovered bioactive compounds have potential applications in the food, pharmaceutical, and cosmetics sectors, in addition to serving as feed additives. This research compares two distinct optimization models: one statistical, derived from experimental data, and the other based on artificial intelligence (AI). The objective was to evaluate whether AI could replicate the experimental models and ultimately supplant the laborious experimental process, yielding the same results more rapidly and adaptably. To further enhance data interpretation and predictive capability, six machine learning models were applied to the original dataset. Owing to the limited sample size, synthetic data were generated using Random Forest (RF)-based resampling and Gaussian noise addition. The augmented dataset significantly improved model performance, and among the models tested, the RF algorithm achieved the highest accuracy.

1. Introduction

Rosemary (Rosmarinus officinalis L.), a perennial species of the Lamiaceae family, is distinguished by its distinctive scent, culinary use, and therapeutic properties [1]. Phylogenetic studies have established that rosemary belongs to the genus Salvia and is accordingly referred to as Salvia rosmarinus [2]. Rosemary originates from the Mediterranean region, but this aromatic plant, characterized by its needle-like foliage, is now cultivated successfully in numerous locations worldwide [3]. Rosemary’s therapeutic properties have been exploited in traditional folk medicine to address a range of ailments, including headaches, stomach discomfort, and respiratory disorders, as well as for pain relief [1,4,5,6].
Several methodologies have been investigated for the recovery of bioactive compounds, mainly rosmarinic acid and carnosic acid, from rosemary leaves at a laboratory scale. Certain extraction processes, especially conventional methods, are often associated with various disadvantages, including the utilization of hazardous solvents, the degradation of target compounds resulting from elevated temperatures, prolonged extraction durations, challenges in implementation, and significant economic and energy expenditures. In recent years, the concepts of “Green chemistry” and “eco-extraction” have emerged [7]. Recent studies indicate that extraction processes have become more energy-efficient, safer for users, and environmentally friendly compared to previous methods, all while maintaining extraction efficiency. The intensification of extraction processes, considering these various aspects, should emerge as a new challenge for the design of such processes.
Despite extensive research on rosemary leaf extracts, gaps remain in exploring less-studied aspects and emerging opportunities driven by technological advancements. One such critical area is the use of green, non-toxic solvents for the sustainable recovery of bioactive compounds from plant materials. This study aims to bridge this gap by investigating the combined effects of green solvent mixtures, optimizing extraction conditions, and evaluating their performance. Rosemary leaves were subjected to Pressurized Liquid Extraction (PLE), which combines elevated pressure to enhance mass transfer and elevated temperatures to help facilitate the diffusion of the solvent into the sample by diminishing its viscosity [8]. The study examined the influence of eco-friendly solvent mixtures, specifically water and ethanol, alongside key process parameters such as temperature and extraction duration. Additionally, a partial least squares (PLS) model was utilized to identify the optimal extraction conditions.
In parallel, the growing integration of artificial intelligence (AI), particularly machine learning (ML) and Deep Learning (DL), has enabled more accurate modeling, prediction, and optimization across the food, beverage, pharmaceutical, and cosmetic industries. ML techniques are increasingly applied in bioactive compound prediction, formulation optimization [9], sensory analysis, and green extraction process modeling [10]. Recent studies demonstrate the effectiveness of algorithms such as Random Forest (RF), Support Vector Machines (SVMs), and Artificial Neural Networks (ANNs) in predicting antioxidant capacity, total polyphenol content (TPC), and other physicochemical properties from experimental variables [11]. In this context, the present study incorporates multiple machine learning approaches to predict the antioxidant potential of rosemary extracts under varying PLE conditions, thereby enhancing process efficiency and supporting sustainable product development in food, nutraceutical, and cosmetic applications [12].
While ML methods have been increasingly used for modeling extraction processes, most prior studies rely on relatively large experimental datasets or focus on specific extraction methods with extensive data availability. In contrast, PLE of rosemary leaves is a process with high experimental cost and limited data availability, which hinders the effective application of conventional ML techniques. This study addresses this gap by integrating ML models with data augmentation strategies to enhance model robustness under small-sample conditions. To our knowledge, this is one of the first studies applying such an approach to optimize green extraction of bioactive compounds from rosemary, offering insights that can support broader adoption of AI-assisted extraction workflows.

2. Materials and Methods

2.1. Chemicals and Reagents

A deionizing column (mixed-bed ion exchange resin, ensuring conductivity below 1 µS/cm at standard flow rate and operating pressure) was used to produce the deionized water for all experiments. All polyphenolic standards for the HPLC determination, along with L-ascorbic acid (99%), 2,4,6-tris(2-pyridyl)-s-triazine (TPTZ) (≥98%), 2,2-diphenyl-1-picrylhydrazyl (DPPH), and hydrochloric acid (37%), were purchased from Sigma-Aldrich (Darmstadt, Germany) and were of at least 97% purity. Acetonitrile was acquired from Labkem (Barcelona, Spain). Sodium carbonate (anhydrous, 99.5%), rutin (≥94%), and formic acid (99.8%) were purchased from Penta (Prague, Czech Republic). Iron (III) chloride hexahydrate (97%) was obtained from Merck (Darmstadt, Germany). Folin–Ciocalteu reagent, gallic acid (97%), and ethanol (99.8%) were acquired from Panreac Co. (Barcelona, Spain).

2.2. Rosemary Leaves’ Raw Material and Sample Preparation

Rosemary leaves were obtained from a local plant shop in the Karditsa region (Central Greece). The leaves were washed carefully and manually dried with paper towels. They were then lyophilized in a Biobase BK-FD10 freeze-dryer (Jinan, China). The moisture content was determined as 53.2 ± 3.8%. The dried material was then sieved in an Analysette 3 PRO sieving machine (Fritsch GmbH, Oberstein, Germany), and a powder with an average particle diameter of 497 μm was obtained. The powder was stored in a freezer at –40 °C until further analysis.

2.3. Experimental Design

A custom-designed Response Surface Methodology (RSM) with four factors at five levels was employed to optimize the extraction conditions for TPC, antioxidant activity (FRAP and DPPH assays), and ascorbic acid content (AAC) obtained from rosemary powder by the Pressurized Liquid Extraction (PLE) technique. A PLE system (Fluid Management Systems, Inc., Watertown, MA, USA) was used for all extractions. The independent variables examined were the ethanol concentration (C, % v/v) as X1, the liquid-to-solid ratio (R, mL/g) as X2, the extraction temperature (T, °C) as X3, and the extraction time (t, min) as X4. To assess the method’s repeatability, 17 experimental runs, including one central point, were conducted; each run was replicated three times, and the average response values were used for subsequent analysis.
Stepwise regression was utilized to refine the model’s predictive precision by reducing the variance arising from the estimation of superfluous terms, leading to a second-order polynomial equation that relates each response to the four independent variables:
$Y_k = \beta_0 + \sum_{i=1}^{4}\beta_i X_i + \sum_{i=1}^{4}\beta_{ii} X_i^2 + \sum_{i=1}^{3}\sum_{j=i+1}^{4}\beta_{ij} X_i X_j$ (1)
where Xi and Xj denote the independent variables and Yk the predicted response variable. In the model, β0 is the intercept, while βi, βii, and βij are the regression coefficients of the linear, quadratic, and interaction terms, respectively.
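The RSM fitting and optimization in this study were performed in JMP Pro 16 (see Section 2.9). Purely as an illustration, a second-order model of the form of Equation (1) could be fit in Python with scikit-learn as sketched below; the design points, ranges, and response values are placeholders rather than the published data, and the stepwise term selection used by the authors is not reproduced.

```python
# Illustrative sketch only: the RSM model in this study was fitted in JMP Pro 16.
# Placeholder design points and a toy response are used, not the published data.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = np.column_stack([
    rng.uniform(0, 100, 17),   # X1: ethanol concentration, % v/v (placeholder range)
    rng.uniform(10, 30, 17),   # X2: liquid-to-solid ratio, mL/g
    rng.uniform(40, 160, 17),  # X3: temperature, degC
    rng.uniform(5, 25, 17),    # X4: time, min
])
y = 10 + 0.05 * X[:, 2] + 0.1 * X[:, 3] - 0.02 * X[:, 0] + rng.normal(0, 1, 17)  # toy TPC

# degree=2 generates the linear (X_i), quadratic (X_i^2), and interaction (X_i X_j)
# terms that make up Equation (1).
model = make_pipeline(PolynomialFeatures(degree=2, include_bias=False),
                      LinearRegression())
model.fit(X, y)
print(model.predict(np.array([[25.0, 10.0, 160.0, 25.0]])))
```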

2.4. Total Polyphenolic Content (TPC) Determination Through Spectrophotometric Evaluation

The Folin–Ciocalteu methodology [13] was used to evaluate TPC and express the results in milligrams of gallic acid equivalents (GAEs) per gram of dry weight (dw). A calibration curve (10–100 mg/L of gallic acid, R2 = 0.9996) in water was used to assess the results. Briefly, after mixing 100 μL of the properly diluted extract with 100 μL of the Folin–Ciocalteu reagent for 2 min, 800 μL of a 5% w/v sodium carbonate solution was subsequently added. Following a 20 min incubation at 40 °C, in the absence of light exposure, the absorbance of the solution was measured at 740 nm in a Shimadzu UV-1900i UV/Vis spectrophotometer (Kyoto, Japan). Sample incubation at 40 °C was conducted utilizing an Elmasonic P70H ultrasonic bath from Elma Schmidbauer GmbH (Singen, Germany). Each analysis was performed in triplicate and the average was used to assess the results.

2.5. Ferric-Reducing Antioxidant Power (FRAP) Evaluation of Antioxidant Activity

A previously established study provides a thorough description of the method used to test the antioxidant capacity of the extracts utilizing the common electron-transfer method [13]. This method entailed identifying the decrease in the iron oxidation state from +3 to +2. Briefly, 50 μL of the properly diluted sample was combined with 50 μL of FeCl3 solution (4 mM in 0.05 M HCl). Subsequently, the samples were incubated at 37 °C for 30 min. After a 5 min interval, 900 μL of TPTZ solution (1 mM in 0.05 M HCl) was added, and the absorbance was measured at 620 nm. A calibration curve of ascorbic acid (50–500 μM in 0.05 M HCl, R2 = 0.9997) was utilized, and the results were expressed as μmol of ascorbic acid equivalents (AAEs) per gram of dw. Each analysis was performed in triplicate and the average was used to evaluate the results.

2.6. Evaluation of Radical Scavenging Activity

A previously described assay [14] for DPPH scavenging was employed. The absorbance at 515 nm was initially measured immediately and 30 min later by combining 25 μL of properly diluted sample extract with 975 μL of DPPH solution (100 μmol/L in methanol). A calibration curve of the antiradical activity of ascorbic acid (100–1000 μmol/L in methanol, R2 = 0.9926) was used, and the results were expressed as μmol of ascorbic acid equivalents (AAEs) per gram of dw. Each analysis was performed in triplicate and the average was used to evaluate the results.

2.7. HPLC Quantification of Polyphenolic Compounds

High-Performance Liquid Chromatography coupled with a Diode Array Detector (HPLC-DAD) was used to identify the individual polyphenols in the rosemary leaf extracts, based on our prior research [15]. The liquid chromatograph (model CBM-20A) and diode array detector (model SPD-M20A) were supplied by Shimadzu Europa GmbH (Duisburg, Germany). The detection wavelength ranged from 200 to 800 nm. The compounds were injected at a volume of 20 μL and separated at 40 °C on a Phenomenex Luna C18(2) column (100 Å, 5 μm, 4.6 mm × 250 mm) from Phenomenex Inc. (Torrance, CA, USA). The mobile phase consisted of 0.5% formic acid in acetonitrile (B) and 0.5% aqueous formic acid (A). The gradient program started at 0% B and increased to 40% B, then to 50% B over 10 min and to 70% B over a further 10 min, where it was held constant for 10 min. The mobile phase flow rate was kept constant at 1 mL/min. Compounds were identified by comparing their absorbance spectra and retention times to those of purified standards and were quantified using calibration curves (0–50 μg/mL).

2.8. Ascorbic Acid Content (AAC)

The ascorbic acid content of the samples was quantified as mg/g of dry weight, as previously described by Athanasiadis et al. [15]. A total of 500 μL of 10% (v/v) Folin–Ciocalteu reagent and 100 μL of sample extract were combined with 900 μL of 10% (w/v) trichloroacetic acid in an Eppendorf tube. The absorbance was promptly assessed at 760 nm following 10 min of storage in darkness.

2.9. Statistical Analysis

The RSM and distribution analysis were statistically evaluated utilizing JMP® Pro 16 software (SAS, Cary, NC, USA). The Kolmogorov–Smirnov test assessed the normality of the data. ANOVA and the Tukey HSD multiple comparison test were employed to ascertain any significant differences. The results were reported as means accompanied by measures of variability.

2.10. Initial Data Set Exploration and Visualization

The initial dataset comprised the 17 experimental samples of rosemary extract that, as described above, were evaluated under varying PLE conditions. For the development of the ML models, four features were used as inputs: ethanol concentration (% v/v), liquid-to-solid ratio (mL/g), extraction temperature (°C), and extraction time (min); four features were used as outputs: TPC, FRAP, DPPH, and AAC. Of the data, 80% was allocated for training and 20% for testing the ML models. Figure 1 presents the distribution of experimental variables and antioxidant responses, illustrating balanced sampling across extraction parameters and greater variability in antioxidant outcomes.
To assess variability and central tendencies, a combined boxplot was created (Figure 2). The extraction parameters (C, RL/S, T, and t) showed narrow interquartile ranges and symmetry, indicating controlled conditions. In contrast, the antioxidant responses—especially FRAP and DPPH—exhibited wide variability and outliers, reflecting greater sensitivity to extraction settings. TPC showed moderate spread, while AAC remained tightly clustered. These patterns suggest that antioxidant outcomes are more affected by experimental variation than the input parameters.
To assess the variation across extraction conditions and antioxidant responses, two complementary heatmaps were produced and are presented together in Figure 3. Plot (A) displays the raw value matrix, highlighting absolute differences among samples. The high-intensity region in the FRAP column (design point 13) corresponds to elevated antioxidant activity, also reflected in TPC. AAC values remained consistently low across all samples. Plot (B) shows the standardized (z-score) version of the same matrix, enabling scale-independent comparison. The same sample exhibited z-scores above +2 in FRAP and TPC, confirming its outlier status. Other samples with moderate DPPH or AAC responses became more distinct through normalization. Together, the heatmaps reveal both high-performing conditions and hidden patterns across the dataset.
To explore the variable relationships, a Pearson correlation matrix was computed (Figure 4). Strong positive correlations between TPC, FRAP, and DPPH (r = 0.81–0.91) indicate phenolics’ central role in antioxidant capacity. The ethanol concentration (C, %) was negatively correlated with both TPC and FRAP, suggesting diminishing returns at higher concentrations. The temperature and solvent ratio showed moderate positive correlations with AAC, while the extraction time had minimal influence on any response. These findings align with the heatmap results and underscore the compound-specific effects of extraction parameters.
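The text does not specify the software used for these exploratory plots. The sketch below shows, under the assumption of a pandas-based workflow, how the z-score standardization behind Figure 3B and the Pearson correlation matrix of Figure 4 could be computed; the values are placeholders rather than the experimental data.

```python
# Minimal exploratory sketch (assumed pandas workflow; placeholder values, not the
# real 17 design points).
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "C (%)": rng.uniform(0, 100, 17),
    "RL/S (mL/g)": rng.uniform(10, 30, 17),
    "T (degC)": rng.uniform(40, 160, 17),
    "t (min)": rng.uniform(5, 25, 17),
    "TPC": rng.uniform(20, 80, 17),
    "FRAP": rng.uniform(200, 900, 17),
    "DPPH": rng.uniform(100, 550, 17),
    "AAC": rng.uniform(1, 10, 17),
})

# Column-wise z-scores enable the scale-independent comparison of Figure 3B
z_scores = (df - df.mean()) / df.std(ddof=0)

# Pearson correlation matrix between extraction parameters and responses (Figure 4)
corr = df.corr(method="pearson")
print(corr.round(2))
# A heatmap could then be rendered, e.g. with seaborn:
# import seaborn as sns; sns.heatmap(corr, annot=True, cmap="vlag")
```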

2.11. ML Regressor Development

To develop the ML-based regressors for the initial dataset, six regression algorithms were trained to model the relationships between the extraction parameters and the antioxidant responses. The models included Linear Regression [16], Ridge Regression [17], Lasso Regression [18], RF Regression, Gradient Boosting (GB) Regression [19], and Adaptive Boosting (AdaBoost) Regression [20].
Each model was implemented as a multi-output regressor. Hyperparameter tuning was conducted using grid search with 5-fold cross-validation, owing to the limited amount of data. The Ridge and Lasso regressors were tuned for regularization strength (α), while the tree-based models were optimized for the number of estimators, maximum depth, learning rate (for the boosting models), and minimum samples per split. The full list of model parameters and their tested values is shown in Table 1.
The selected hyperparameters were chosen for their direct impact on model complexity, generalization, and performance. For the regularized linear models (Ridge and Lasso), the regularization strength α controls the degree of penalty applied to large coefficients, helping to reduce overfitting, which is especially important in small datasets. Smaller values allow more flexibility, while larger values enforce stronger shrinkage of less informative predictors. To further mitigate the risk of overfitting, all models were trained and evaluated using 5-fold cross-validation; in addition, regularization (the Ridge and Lasso penalties) and parameter tuning were applied to control model complexity and improve generalization, particularly given the small size of the original dataset.
In the tree-based models RF, GB, and AdaBoost, the “number of estimators” determines how many trees are used to build the ensemble; more trees generally improve performance but increase computation. The “maximum tree depth” controls how complex each tree can be, balancing fit versus overfitting. The “minimum samples split” parameter sets the minimum number of samples required to split a node, helping to regularize the model by preventing overly deep trees. The “learning rate”, used in boosting algorithms, scales how much each tree contributes to the final prediction—lower rates typically yield better generalization at the cost of longer training.
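The article does not name the ML library used. A minimal sketch of the tuning setup described above, assuming a scikit-learn implementation, is given below; the parameter grids are illustrative stand-ins for the values listed in Table 1, and the data arrays are placeholders.

```python
# Sketch of the multi-output grid search with 5-fold cross-validation (assumed
# scikit-learn). Grids are illustrative; the tested values appear in Table 1.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(2)
X = rng.uniform(size=(17, 4))   # placeholder inputs: C, R, T, t
Y = rng.uniform(size=(17, 4))   # placeholder outputs: TPC, FRAP, DPPH, AAC

# Regularized linear model tuned for alpha
ridge = GridSearchCV(MultiOutputRegressor(Ridge()),
                     param_grid={"estimator__alpha": [0.01, 0.1, 1.0, 10.0]},
                     cv=5, scoring="neg_root_mean_squared_error")
ridge.fit(X, Y)

# Tree ensemble tuned for ensemble size, depth, and minimum samples per split
rf = GridSearchCV(MultiOutputRegressor(RandomForestRegressor(random_state=0)),
                  param_grid={"estimator__n_estimators": [100, 300],
                              "estimator__max_depth": [3, 5, None],
                              "estimator__min_samples_split": [2, 4]},
                  cv=5, scoring="neg_root_mean_squared_error")
rf.fit(X, Y)
print(ridge.best_params_, rf.best_params_)
```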

2.12. Machine Learning Regressor Evaluation

In this study, the performance of the regression models was evaluated using four standard metrics: Mean Absolute Error (MAE) [21], Mean Squared Error (MSE) [21], Root Mean Squared Error (RMSE) [22], and the Coefficient of Determination (R2) [23]. These metrics quantify the difference between the predicted values $\hat{y}_i$ and the actual experimental values $y_i$, based on a total of n observations.
MAE measures the average magnitude of errors in a set of predictions, without considering their direction. It is calculated as the mean of the absolute differences between actual and predicted values (2).
$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left| y_i - \hat{y}_i \right|$ (2)
MSE penalizes larger errors more strongly by squaring them. It is the average of the squared differences between actual and predicted values (3).
$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left( y_i - \hat{y}_i \right)^2$ (3)
RMSE is the square root of the MSE and provides an error measure in the same units as the original response variable, making it more interpretable (4):
$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left( y_i - \hat{y}_i \right)^2}$ (4)
R2 represents the proportion of variance in the actual values that is predictable from the independent variables. A value of 1 indicates perfect prediction, while 0 means the model explains none of the variance (5):
$R^2 = 1 - \frac{\sum_{i=1}^{n}\left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{n}\left( y_i - \bar{y} \right)^2}$ (5)
These metrics together offer a robust framework for comparing the prediction accuracy, error dispersion, and explanatory power of each machine learning regressor tested in this study.
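For completeness, a minimal sketch of Equations (2)–(5), assuming scikit-learn's metric functions and placeholder arrays, is shown below.

```python
# Minimal sketch of the evaluation metrics in Equations (2)-(5), assuming
# scikit-learn; the arrays are placeholder values, not study results.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([3.1, 2.4, 5.6, 4.8, 3.9])   # placeholder actual values
y_pred = np.array([2.9, 2.7, 5.1, 4.9, 4.2])   # placeholder predictions

mae = mean_absolute_error(y_true, y_pred)        # Equation (2)
mse = mean_squared_error(y_true, y_pred)         # Equation (3)
rmse = np.sqrt(mse)                              # Equation (4)
r2 = r2_score(y_true, y_pred)                    # Equation (5)
print(f"MAE={mae:.3f}, MSE={mse:.3f}, RMSE={rmse:.3f}, R2={r2:.3f}")
```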

2.13. Generative Model Development

To address the limitations imposed by the small sample size (n = 17), a synthetic dataset was generated to enhance the modeling capacity and generalization of the machine learning algorithms. New input combinations were uniformly sampled within the observed range of the original extraction parameters: ethanol concentration (C, % v/v), liquid-to-solid ratio (RL/S, mL/g), extraction temperature (T, °C), and extraction time (t, min). A total of 100 synthetic input samples were created to expand the dataset in a balanced and controlled manner.
To estimate the corresponding antioxidant responses, namely total polyphenol content (TPC), ferric reducing antioxidant power (FRAP), DPPH radical scavenging activity, and ascorbic acid content (AAC), a pre-trained RF model, previously identified as the best-performing regressor, was employed. RF is an ensemble learning method based on decision trees that captures nonlinear relationships by aggregating the predictions of multiple base learners. The predicted output $\hat{y}$ for a given input vector $x$ is computed as the average of the predictions from all trees in the forest (6):
$\hat{y} = \frac{1}{T}\sum_{t=1}^{T} f_t(x)$ (6)
where T is the total number of decision trees in the ensemble, and $f_t(x)$ is the prediction of the t-th tree for input $x$. In this study, RF was applied both as a predictive model and, in the generative phase, to estimate antioxidant outcomes for synthetically generated feature combinations.
To simulate natural variability and reduce overfitting to deterministic predictions, Gaussian noise was added to the RF-generated outputs. This augmentation mimics experimental uncertainty and improves the realism of the synthetic samples. The final synthetic response $\tilde{y}$ was computed as follows (7):
$\tilde{y} = \hat{y} + \epsilon, \quad \epsilon \sim \mathcal{N}(0, \sigma^2)$ (7)
where $\epsilon$ is a noise term drawn from a normal distribution with zero mean and variance $\sigma^2$. In this study, σ was set to 5% of the standard deviation of the respective real target variable, providing a balance between stability and variability.
The resulting synthetic data points were merged with the original experimental dataset to create a mixed dataset. The rationale for this approach is that the RF model captures the complex nonlinear interactions present in the original data, while the controlled addition of Gaussian noise (set at 5% of the standard deviation of each real target variable) introduces realistic variability and avoids overfitting to deterministic predictions, balancing model fidelity with enhanced generalization potential. Owing to the limitations of the available computational resources, the size of the synthetic dataset was intentionally kept small (100 samples) to allow an initial proof-of-concept evaluation. Future work will explore more extensive data augmentation using more advanced generative techniques and more powerful hardware.
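A minimal sketch of this augmentation procedure, assuming a NumPy/scikit-learn implementation with placeholder data, is given below: inputs are sampled uniformly within the observed ranges, the four responses are predicted with the pre-trained RF model (Equation (6)), Gaussian noise at 5% of each real target's standard deviation is added (Equation (7)), and the synthetic samples are merged with the real ones.

```python
# Sketch of the augmentation described above (assumed NumPy/scikit-learn);
# the real dataset here is a placeholder with the same shape as in the study.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)
X_real = rng.uniform(size=(17, 4))          # placeholder C, R, T, t
Y_real = rng.uniform(size=(17, 4))          # placeholder TPC, FRAP, DPPH, AAC

rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_real, Y_real)

# 1) 100 synthetic input combinations, uniformly sampled within observed ranges
lo, hi = X_real.min(axis=0), X_real.max(axis=0)
X_syn = rng.uniform(lo, hi, size=(100, 4))

# 2) RF-predicted responses, Equation (6)
Y_hat = rf.predict(X_syn)

# 3) Gaussian noise with sigma = 5% of each real target's std, Equation (7)
sigma = 0.05 * Y_real.std(axis=0, ddof=1)
Y_syn = Y_hat + rng.normal(0.0, sigma, size=Y_hat.shape)

# 4) Mixed dataset = real + synthetic samples
X_mix = np.vstack([X_real, X_syn])
Y_mix = np.vstack([Y_real, Y_syn])
print(X_mix.shape, Y_mix.shape)   # (117, 4) (117, 4)
```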

3. Results and Discussion

3.1. Optimization of PLE Parameters

The extraction procedure may be challenging due to the presence of several distinct bioactive compounds which lead to variations in solubility and polarity [24]. Moreover, various processing parameters along with the extraction technique might significantly affect both the extract yield and antioxidant capacity. Table 2 presents how the variables under investigation affect the examined responses, while in Table 3 the ANOVA applied to the RSM quadratic polynomial model is presented.

3.1.1. Model Analysis

The following Equations (8)–(11) represent regression models related to the extraction process, predicting key response variables: total phenolic content (TPC), ferric reducing antioxidant power (FRAP), DPPH radical scavenging capacity, and ascorbic acid content (AAC). Each equation includes linear, quadratic, and interaction terms, highlighting the complex relationships between experimental factors. The models contain only significant terms. The regression models highlight the impact of solvent composition, temperature, and duration on extraction efficiency. Notably, the linear and quadratic terms suggest nonlinear relationships between variables, indicating optimal conditions for maximizing antioxidant yield. The FRAP and DPPH equations show a strong dependency on the extraction conditions, particularly the extraction time (X4). The presence of interaction terms suggests that the combined effects of multiple variables influence antioxidant potential, emphasizing the need for precise parameter optimization. Longer extraction times allow more bioactive compounds, including antioxidants, to dissolve into the solvent. The presence of the quadratic term (X4²) and the interactions (X2X4 and X1X4) suggests that the extraction time has an optimal range: too short a time may limit compound release, while excessive duration could lead to degradation or reduced efficiency. The interaction terms imply that the extraction time does not act alone. For example, X2X4 in DPPH suggests that time interacts with another variable, the liquid-to-solid ratio, to influence antioxidant capacity.
TPC = 11.74 + 0.49X1 − 1.22X2 + 0.69X3 + 1.51X4 − 0.006X1² + 0.017X2² − 0.002X3² + 0.002X1X2 − 0.016X1X4 − 0.002X2X3 − 0.015X2X4 (8)
FRAP = −58.84 + 6.67X1 − 1.73X2 + 3.51X3 + 63.39X4 − 0.107X1² − 1.31X4² + 0.11X1X2 − 0.038X1X3 − 0.44X2X4 (9)
DPPH = −330.79 + 4.24X1 − 6.60X2 + 10.86X3 + 23.97X4 − 0.041X1² + 0.089X2² − 0.036X3² + 0.061X1X2 − 0.018X1X3 − 0.182X1X4 − 0.029X2X3 − 0.274X2X4 (10)
AAC = 0.14 + 0.13X1 + 0.08X2 + 0.05X3 − 0.0007X1² − 0.0003X1X3 (11)
Figure 5 shows how each parameter and their combinations affect the responses of the parameters under study. The predicted optimal values of PLE parameters along with the predicted TPC and FRAP, DPPH, and AAC values, along with the desirability of the model, are presented in Table 4.

3.1.2. Impact of Extraction Parameters on Assays Through Pareto Plot Analysis

In a Pareto plot (Figure 6), the orthogonal estimate typically refers to a statistical method used to estimate the effects of different factors while minimizing the correlation between them. This approach helps in identifying the most significant contributors to a given outcome by ensuring that the estimates are independent of each other.
It seems that temperature (parameter X3) has a significantly positive effect on all responses. Another factor that seems to be very important is the solvent composition (X1), where increasing the percentage of ethanol has a negative effect on all responses except ascorbic acid, where it has a positive effect. It is worth noting that the extraction duration (X4) does not significantly affect any of the responses, but there is a trend where increased extraction times positively affect all responses.

3.2. Principal Component Analysis (PCA) and Multivariate Component Analysis (MCA)

The interactions between assays and extraction conditions were investigated through correlation analyses, which included PCA and MCA, as illustrated in Figure 7 and described in Table 5, respectively. The correlation analyses were conducted to ascertain the relationships between the variables and TPC, FRAP, DPPH, and AAC within the context of PCA. The chart demonstrates that PC1 and PC2 contributed 67.6% and 24.9% of the variance, respectively, together accounting for 92.5% of the total variance. The independent variables were therefore considered to exert a significant influence on the analysis. The graph demonstrated that TPC, FRAP, DPPH, and the extraction temperature and duration (X3 and X4) were positively correlated within both components and were represented in close proximity. AAC was considerably improved by the increased concentration of ethanol (X1) and liquid-to-solid ratio (X2), which explains their strong correlation; their combined impact on the extraction parameters was comparable. Conversely, the placement of AAC in PC2, at a significant distance from the other variables, may indicate a diminished relationship between them. Previous research has suggested a positive correlation between an increase in ethanol concentration and AAC recovery [25].
In addition, the MCA provides further insights into the interrelationships between variables. The primary benefit of this approach is its ability to determine the degree of positive or negative correlation between the variables under investigation. Table 5 delineates the results of this investigation. The pattern of robust positive correlations (>0.77) between the antioxidant assays and total phenolic content (TPC) was previously substantiated [26]. Finally, the negative correlation between ascorbic acid content (AAC) and all other responses (TPC, FRAP, and DPPH) should be emphasized; it is particularly noteworthy that a molecule with considerable antioxidant activity of its own correlates negatively with the antioxidant assays.

3.3. Partial Least Squares (PLS) Analysis

The PLS model was employed to assess the influence of the extraction condition parameters (X1, X2, X3, and X4). Figure 8 illustrates the prediction profiler alongside a desirability function that features extrapolation control and includes a variable importance plot (VIP). The extraction of bioactive compounds is significantly influenced by various factors, with temperature, solvent composition, and extraction duration being the most critical [27]. Initially, it is important to note that the extraction process can be complicated by the differing solubility and polarity of polyphenols [28]. Concerning the PLE technique, it is evident that the X1 parameter exhibited the most statistically significant impact (p < 0.05) compared to the other parameters in the extraction process, as demonstrated by the variable importance plot (VIP) presented in Figure 8B. The observations previously noted from the 3D models of the response surface were corroborated in Figure 8A, indicating that the optimal conditions were 25% v/v aqueous ethanol, a liquid-to-solid ratio of 10 mL/g, and a temperature of 160 °C. The high efficiency observed at 160 °C is likely due to the enhanced solubility of polyphenols and increased solvent penetration into the plant material. Elevated temperatures reduce surface tension, improving mass transfer and extraction yield. Concerning the duration of extraction, it appeared to exert the least significant influence on the process; consequently, the longest duration was selected, as it favored AAC recovery. The extraction process was not significantly influenced by the temperature or extraction duration; nevertheless, elevated temperatures coupled with long extraction times were favored. The solute–matrix interaction can be significantly reduced by the PLE technique, which is primarily due to the influence of van der Waals forces or hydrogen bonds, particularly in the presence of elevated temperature and pressure. This reduces energy demands, improves the efficacy of solute molecular extraction, and decreases the viscosity of the solvent. This reduces the solvent’s resistance to the matrix, thereby facilitating its diffusion into the sample [29]. The model favored a prolonged extraction duration, as prior research has substantiated the effectiveness of both brief [30] and prolonged [28] intervals. While elevated temperatures also facilitate the extraction of bioactive compounds in other techniques, such as stirring [31], by enhancing their solubility, it is important to note that many thermolabile compounds may experience degradation under these conditions [32].
Table 6 shows the TPC, FRAP, DPPH, and AAC values of the optimal extract. The results of the present study are worth comparing with those of our previous work, in which four different extraction techniques were applied to rosemary leaves, namely stirring, pulsed electric field (PEF)-assisted extraction, and ultrasound probe- and ultrasound bath-assisted extraction [31]. It is noteworthy that in that work the highest TPC was obtained by stirring, yet in the present work PLE gave a ~320% higher yield. A similar pattern was observed for FRAP, where PLE resulted in a ~455% higher yield. The highest DPPH value was previously observed for ultrasound bath-assisted extraction; however, PLE gave a ~516% greater result. Regarding AAC, ultrasound probe-assisted extraction gave the highest value, while PLE showed only ~10% better performance than PEF-assisted extraction. Unlike conventional methods, PLE offers improved recovery of bioactive compounds in a shorter time frame, reducing energy consumption and solvent waste. Traditional extraction methods, such as ultrasound-assisted extraction (UAE), exhibit lower polyphenol recovery than PLE. Other researchers have also studied the TPC and antioxidant capacity of rosemary leaves. More specifically, Hashem Hashempur et al. [33] utilized a deep eutectic solvent consisting of ammonium acetate and lactic acid, along with ultrasound, and the TPC we obtained was ~334% higher than their result. Kabubii et al. [34] also determined the TPC of crude rosemary extracts, which was ~52% lower than ours. In general, PLE treatment of rosemary leaves appears to lead to higher yields than other extraction techniques.
The experimental results and PLS model predictions exhibit outstanding concordance, as evidenced by the high correlation coefficient of 0.981 and the substantial R2 value of 0.962. Furthermore, the p-value of less than 0.0001 indicates that the relationship between the actual and predicted values is statistically significant.
Table 7 lists the individual polyphenols identified in the optimal extract by HPLC-DAD, while Table 8 provides the calibration equations of the standard compounds. The compound with the highest concentration was hesperidin, followed by rosmarinic acid and quercetin 3-D-galactoside. In our previous work, the compound with the highest concentration was rosmarinic acid in all cases; it is therefore worth noting how the parameters applied in each extraction technique greatly affect the profile of the final extracts obtained. Nevertheless, the same compounds were also identified in this work. Other researchers, such as Xie et al. [35], Sammer and Samarrai [36], Baptista et al. [37], and Miljanović et al. [38], have reported compounds such as hesperidin, apigenin and its derivatives, rosmarinic acid, and carnosic acid in rosemary leaves.

3.4. Performance of Machine Learning Regressors on the Original Data

In the following sections, we focus on reporting the key findings regarding the performance of the ML regressors, with an emphasis on practical insights relevant to extraction optimization. The detailed technical analysis is intentionally limited, in line with the overall scope of this experimental study.
From Figure 9, it can be observed that all regression models achieved relatively good performance on the training dataset. In particular, RF, GB, and AdaBoost demonstrated very high training accuracy, with GB achieving an R2 of 1.00 and RF and AdaBoost closely following with R2 values of 0.87 and 0.99, respectively. These results indicate that ensemble-based models fit the training data extremely well, though they may risk overfitting.
In contrast, Figure 10 reveals substantial performance degradation across all models when evaluated on the testing dataset. The Linear and Ridge regressors showed poor predictive ability, with test R2 scores of approximately −3.29 and −1.31, respectively. Among all models, RF achieved the best overall test performance in terms of error, with an MAE of 0.81, an MSE of 0.91, an RMSE of 0.91, and a test R2 of −1.66, outperforming the other regressors under the given constraints.
Despite the overall lower performance on test data, RF was selected as the most efficient regressor due to its balance between training fit and test error, as well as its suitability for data generation in the augmentation phase that followed. Based on this model, synthetic data were produced and evaluated in combination with the original dataset.

3.5. Performance of Machine Learning Regressors on the Synthetic Dataset

Our next step was to compare our regressors based on our synthetic data. A synthetic dataset, consisting of 100 samples, was generated using RF-based predictions with Gaussian noise added to introduce controlled variability. This synthetic dataset maintained the same feature distribution as the original dataset to ensure comparability.
Ensemble-based models demonstrated high performance compared to linear models. Specifically, the GB regressor achieved the highest training accuracy with an R2 of 1.00, followed by the RF regressor with an R2 of 0.98 and AdaBoost with an R2 of 0.95. These models also recorded notably low error metrics, indicating an almost perfect fit to the training data. In contrast, Linear Regression, Ridge Regression, and Lasso Regression yielded identical training performance, each reaching an R2 of 0.71, which indicates a moderate capacity to model the underlying relationships within the synthetic data. Their MAE and RMSE values were also consistently higher than those of the ensemble models (Figure 11).
For the test set, the GB-based and RF-based regressors again outperformed the other models, with R2 values of 0.93 and 0.91 respectively, along with the lowest RMSE scores, suggesting strong predictive ability on unseen data. The AdaBoost regressor also performed well with an R2 of 0.88, although with slightly higher error values. Linear models showed consistent but comparatively limited predictive performance on the test data, with all three achieving an R2 of 0.70 and similar error magnitudes (Figure 12).

3.6. Performance of Machine Learning Regressors on the Mixed Dataset

To further evaluate the model performance in a data-rich scenario, we trained and tested our regressors using a mixed dataset consisting of both real experimental samples from our initial dataset and synthetically generated data.
The model training results showed clear performance differentiation among algorithm families. Ensemble models again demonstrated high accuracy. The GB-based regressor achieved the highest training performance, with an R2 of 1.00 and near-zero error values across all metrics: MAE = 0.03, MSE = 0.001, and RMSE = 0.04. The RF-based regressor followed with an R2 of 0.95 and comparatively low error values: MAE = 0.13, MSE = 0.05, and RMSE = 0.22. The AdaBoost regressor also performed well during training, with an R2 of 0.93. In contrast, the linear models—Linear Regression, Ridge Regression, and Lasso Regression—yielded nearly identical training outcomes, each with an R2 of 0.61 and RMSE of approximately 0.39 to 0.40. These results indicate that the linear models were only moderately successful in capturing the increased variance introduced through the mixed data (Figure 13).
The testing performance followed a similar trend but revealed more nuanced differences in generalization capability. The GB-based regressor achieved a test R2 of 0.76, while the RF-based regressor reached 0.84, indicating that both models retained strong generalization on unseen data. The AdaBoost regressor also maintained respectable performance with a test R2 of 0.74. Notably, the GB and RF regressors both achieved low RMSE values on the test set of 0.16 and 0.21, respectively, underscoring their effectiveness in handling diverse and noise-augmented data distributions. The linear models again exhibited limited predictive strength on the test set, with all three achieving an R2 from 0.59 to 0.60 and RMSE values ranging from 0.28 to 0.29 (Figure 14).

3.7. Cross-Evaluation of RF Models on Real, Synthetic, and Mixed Data

Based on the previous results, the best regressor was the RF model trained on the mixed dataset, i.e., on the combined original and synthetic data. To further examine the robustness and generalization capabilities of this model, we conducted a cross-dataset evaluation in which models trained on one dataset were tested on the other datasets. The training datasets included the original experimental data, the synthetically generated dataset, and the mixed dataset combining both sources. Each model’s predictive accuracy was then assessed across all three datasets using the standard performance metrics.
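A sketch of this cross-evaluation loop, assuming scikit-learn and using placeholder datasets with the same shapes as in the study, is shown below.

```python
# Sketch of the cross-dataset evaluation (assumed scikit-learn): an RF model is
# trained on each dataset (original, synthetic, mixed) and scored on the others.
# The datasets here are placeholders, not the study data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

rng = np.random.default_rng(4)
datasets = {
    "original":  (rng.uniform(size=(17, 4)),  rng.uniform(size=(17, 4))),
    "synthetic": (rng.uniform(size=(100, 4)), rng.uniform(size=(100, 4))),
}
X_mix = np.vstack([datasets["original"][0], datasets["synthetic"][0]])
Y_mix = np.vstack([datasets["original"][1], datasets["synthetic"][1]])
datasets["mixed"] = (X_mix, Y_mix)

for train_name, (X_tr, Y_tr) in datasets.items():
    model = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_tr, Y_tr)
    for test_name, (X_te, Y_te) in datasets.items():
        if test_name == train_name:
            continue
        Y_pred = model.predict(X_te)
        print(f"train={train_name:9s} test={test_name:9s} "
              f"MAE={mean_absolute_error(Y_te, Y_pred):.2f} "
              f"RMSE={np.sqrt(mean_squared_error(Y_te, Y_pred)):.2f} "
              f"R2={r2_score(Y_te, Y_pred):.2f}")
```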
The variability introduced by the Gaussian noise in synthetic data was controlled and systematically evaluated, as described in Section 2.13, to ensure that the augmented dataset preserved meaningful variance while remaining consistent with the statistical characteristics of the original data.
When the model trained on the original dataset was tested on the synthetic dataset, it produced a test MAE of 0.60 and an RMSE of 0.80, with an R2 of 0.48, indicating a moderate capacity to generalize to the artificial data. Somewhat better error performance was observed when the same model was evaluated on the mixed dataset, yielding a lower test RMSE of 0.62, although with a slightly lower R2 of 0.43.
In contrast, the model trained solely on the synthetic data demonstrated poor performance on the original data, with an R2 of only 0.04 and an RMSE of 0.44, suggesting a substantial gap in representational fidelity between the synthetic and real data distributions. However, when the same synthetically trained model was tested on the mixed dataset, performance improved drastically, achieving an R2 of 0.86 and RMSE of 0.31. This highlights that the synthetic model generalized well within synthetic-heavy contexts but struggled with real experimental variability.
The model trained on the mixed dataset exhibited the strongest overall generalization. It achieved a low RMSE of 0.27 and a relatively high R2 of 0.63 when tested on the original data. Most notably, it yielded the best cross-dataset performance when tested on the synthetic dataset, with an RMSE of 0.38 and an R2 of 0.88. Overall, the RF regressor trained on the mixed dataset provided an approximately 20% increase in performance, which is satisfactory given the limited number of samples (Figure 15).
These results validate our objective of generating new samples, demonstrating that synthetic data can enhance the development of robust machine learning regressors for accurately predicting total phenolic content.

3.8. Feature Importance Analysis Across RF-Based Models

To investigate how the model training data influences the learned relationships between extraction parameters and antioxidant responses, feature importance scores were extracted from each RF model trained on the original, synthetic, and mixed datasets. These scores were derived from the individual estimators trained for each target (TPC, FRAP, DPPH, and AAC), and visualized side by side (Figure 16).
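The sketch below illustrates, under the assumption of a scikit-learn MultiOutputRegressor of RF estimators and placeholder data, how such per-target importance scores can be extracted and tabulated.

```python
# Sketch (assumed scikit-learn): per-target feature importances are read from the
# individual RF estimators inside a MultiOutputRegressor, one estimator per response.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(5)
X = rng.uniform(size=(117, 4))   # placeholder mixed inputs: C, RL/S, T, t
Y = rng.uniform(size=(117, 4))   # placeholder outputs: TPC, FRAP, DPPH, AAC

features = ["C (%)", "RL/S (mL/g)", "T (degC)", "t (min)"]
targets = ["TPC", "FRAP", "DPPH", "AAC"]

model = MultiOutputRegressor(
    RandomForestRegressor(n_estimators=300, random_state=0)).fit(X, Y)

importances = pd.DataFrame(
    {t: est.feature_importances_ for t, est in zip(targets, model.estimators_)},
    index=features)
print(importances.round(2))      # one column of importance scores per response
```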
In the model trained on the original dataset, the most influential feature overall was temperature (T, °C), particularly for FRAP and TPC, where it accounted for roughly 35–40% of the total importance. This aligns with experimental expectations, as thermal energy often enhances compound release. RL/S and C (%) also showed moderate contributions, while extraction time had the lowest influence across all targets.
In contrast, the model trained exclusively on synthetic data placed much greater emphasis on C (%), especially for FRAP (0.78) and TPC (0.72). This shift likely reflects the statistical bias introduced during synthetic generation, where concentration appeared as a dominant predictor due to its nonlinear interactions captured by RF. Meanwhile, T (°C) and t (min) showed very low influence (<0.10) across all targets.
In the mixed model, a balanced importance distribution emerged. C (%) again held strong predictive power for TPC and FRAP (~0.66), but now RL/S and T (°C) also gained relevance, particularly for AAC (0.57) and DPPH (0.39), indicating a more nuanced learning of underlying relationships. Time remained the least influential, consistent across all models.
This comparison reveals how the training dataset affects not only model accuracy but also which experimental parameters are deemed most critical. Mixed training produces models that are both accurate and biologically plausible, while synthetic-only training can exaggerate the significance of specific variables.
However, it should be noted that the feature importance results presented here are influenced by the synthetic component of the mixed dataset. As such, there is a potential risk of bias in the interpretation of variable importance, particularly for features that may exhibit amplified or diminished effects in synthetic samples. This limitation highlights the need for cautious interpretation and suggests that future studies should validate these findings using larger experimental datasets.

3.9. Actual vs. Predicted Performance Across RF-Based Models

To visually assess prediction accuracy and generalization, scatter plots comparing actual vs. predicted values were generated for all four antioxidant targets, TPC, FRAP, DPPH, and AAC, using the RF-based models trained on the original, synthetic, and mixed datasets (Figure 17). All values were plotted in standardized, z-score space, and each subplot includes the R2 as a quantitative measure of fit.
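A sketch of how such standardized actual-versus-predicted panels can be produced, assuming matplotlib and scikit-learn with placeholder arrays, is given below.

```python
# Sketch (assumed matplotlib/scikit-learn): actual-vs-predicted scatter per target
# in z-score space, annotated with R2, in the style of Figure 17. Placeholder data.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import r2_score

rng = np.random.default_rng(6)
targets = ["TPC", "FRAP", "DPPH", "AAC"]
Y_true = rng.normal(size=(20, 4))                     # placeholder, already z-scored
Y_pred = Y_true + rng.normal(scale=0.4, size=(20, 4)) # placeholder predictions

fig, axes = plt.subplots(1, 4, figsize=(16, 4))
for k, ax in enumerate(axes):
    ax.scatter(Y_true[:, k], Y_pred[:, k], s=20)
    lims = [Y_true[:, k].min(), Y_true[:, k].max()]
    ax.plot(lims, lims, "k--", lw=1)                  # identity (perfect-fit) line
    ax.set_title(f"{targets[k]} (R2={r2_score(Y_true[:, k], Y_pred[:, k]):.2f})")
    ax.set_xlabel("Actual (z-score)")
    ax.set_ylabel("Predicted (z-score)")
plt.tight_layout()
plt.show()
```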
The model trained exclusively on the original dataset exhibited poor predictive performance across all targets. R2 values were consistently negative, indicating substantial overfitting and a lack of generalization to unseen data. DPPH with R2 = −3.11 and AAC with R2 = −1.40 showed a complete breakdown of predictive capacity, while even the best-performing targets, TPC with R2 = −1.16 and FRAP with R2 = −0.97, failed to demonstrate any meaningful alignment between predicted and actual values.
In contrast, the model trained on synthetic data produced improved results, with three of the four targets yielding positive R2 values. DPPH was best predicted with R2 = 0.86, followed by TPC with R2 = 0.51 and FRAP with R2 = 0.17. Nevertheless, AAC continued to exhibit poor predictability, with an R2 of −1.38. These results suggest that while synthetic data can partially capture the data structure of some antioxidant properties, it remains insufficient for accurately modeling targets like AAC without real data inputs.
The RF-based model trained on the mixed dataset, which combined both the original and synthetic samples, delivered the most balanced and reliable performance. TPC and DPPH both achieved strong fits with R2 values of 0.88, while FRAP also performed well with R2 = 0.70. AAC, however, remained a challenging target, with only a marginally positive R2 of 0.07. The mixed data approach thus demonstrated the strongest generalization overall, effectively integrating the empirical variability of real measurements with the expanded coverage of synthetically generated patterns.
These results highlight that while the RF-based mixed model improved the predictive alignment for TPC and DPPH, challenges remain in accurately modeling AAC, which may reflect inherent biological variability or limited representation in the training data.

3.10. Partial Dependence Analysis of RF-Based Models

To better interpret how individual input features influenced the model predictions, partial dependence plots (PDPs) were generated for each antioxidant response (TPC, FRAP, DPPH, and AAC) across the four predictors: concentration of solvent (C, %), solvent ratio (RL/S, mL/g), temperature (T, °C), and extraction time (t, min). The PDPs visualize the marginal effect of each predictor after averaging out the influence of other variables, offering insight into the modeled relationships learned by RF models trained on different datasets (original, synthetic, and mixed).
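A sketch of how such one-way PDPs can be generated, assuming scikit-learn's PartialDependenceDisplay and placeholder data (with one single-output RF fitted per response purely for illustration), is shown below.

```python
# Sketch (assumed scikit-learn >= 1.0): one-way partial dependence curves per
# response, in the style of Figures 18-20. Data are placeholders; one single-output
# RF is fitted per target here for simplicity.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import PartialDependenceDisplay

rng = np.random.default_rng(7)
X = rng.uniform(size=(117, 4))            # placeholder mixed inputs: C, RL/S, T, t
Y = rng.uniform(size=(117, 4))            # placeholder outputs: TPC, FRAP, DPPH, AAC
feature_names = ["C (%)", "RL/S (mL/g)", "T (degC)", "t (min)"]
targets = ["TPC", "FRAP", "DPPH", "AAC"]

for k, name in enumerate(targets):
    rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, Y[:, k])
    disp = PartialDependenceDisplay.from_estimator(
        rf, X, features=[0, 1, 2, 3], feature_names=feature_names, kind="average")
    disp.figure_.suptitle(f"Partial dependence for {name}")
plt.show()
```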
For the model trained on the original dataset, PDPs (Figure 18) revealed generally smooth but shallow response curves, indicating low model sensitivity to input variation. The results of C (%) and RL/S showed modest negative slopes for most targets, particularly TPC and FRAP, suggesting that higher solvent concentrations and solvent ratios tended to reduce predicted antioxidant values. Temperature exhibited a more pronounced positive relationship with TPC and FRAP, while time (t) influenced predictions positively in nearly all cases, though effects on AAC appeared erratic. These modest trends are consistent with the limited representational power of the model trained on the small original dataset, which likely restricted the learned functional relationships to low-variance approximations.
In contrast, the synthetic-trained model (Figure 19) exhibited steeper and more structured response patterns across nearly all features. For example, C (%) and RL/S displayed strong nonlinear declines for TPC, FRAP, and DPPH, whereas temperature exhibited a clear sigmoidal increase, particularly evident for DPPH. The time variable contributed positively to predictions, with increasing slopes across most plots. These sharper transitions suggest that the synthetic model was able to capture more defined patterns between variables, although some over-smoothing and artifacts were apparent in less reliable targets such as AAC, where PDP curves were more erratic.
The model trained on the mixed dataset (Figure 20) presented the most coherent and biologically plausible trends. The PDPs across targets demonstrated well-defined, monotonic relationships. C (%) consistently showed negative associations with antioxidant capacity, while RL/S exhibited declining effects, particularly for FRAP and TPC. Temperature maintained a strong positive relationship, especially for DPPH and FRAP. Extraction time (t) showed clear and mostly monotonic increases in partial dependence, indicating its significant influence on yield-related outcomes. Compared to the other models, the mixed-trained model exhibited more stable and interpretable PDPs, which is consistent with its higher predictive accuracy and generalization capacity.
Overall, the partial dependence analysis reinforces the conclusion that models trained on a mixture of real and synthetic data achieve superior learning of underlying relationships between process variables and antioxidant outcomes. The synthetic model was able to capture sharp feature effects, but only the mixed model exhibited smooth, consistent trends aligned with expected extraction behavior. These findings support the inclusion of controlled synthetic data to augment and stabilize learning in low-sample experimental contexts.
Despite these encouraging results, several limitations remain. The relatively small size of the original dataset constrains the ability of the models to fully capture the underlying variability of the extraction process. The current synthetic data generation approach, while effective, is based on RF predictions with Gaussian noise, which may not fully reflect the true complexity of the system. Future work should explore more advanced generative modeling techniques, such as Variational Autoencoders or Generative Adversarial Networks, to enhance the diversity and realism of synthetic data. Additionally, expanding the experimental dataset and incorporating additional physicochemical or spectral variables could further improve model accuracy and generalization, supporting more robust and transferable AI-assisted extraction models.

3.11. Model Prediction Accuracy at Optimal Conditions

To assess how accurately RF models predicted antioxidant outcomes under optimal extraction conditions, we compared model predictions against experimentally reported values for four antioxidant metrics: TPC, FRAP, DPPH, and AAC. Figure 21 presents a comparative bar plot showing the predicted values from each model along with their absolute errors, visualized as value ± error for each target.
The model trained solely on the original data exhibited the highest prediction error across all targets. For TPC, the model predicted 55.9 compared to the reported 78.2, yielding an absolute error of 22.4. Similarly, FRAP and DPPH were underestimated at 768.3 and 480.7, corresponding to absolute errors of 146.5 and 398.0, respectively. AAC was also substantially underestimated, at 9.9, with an error of 7.9. These results highlight the limitations of training exclusively on small, original datasets, especially when attempting to extrapolate to optimal regions.
The model trained on the synthetic data demonstrated improved performance for most targets. TPC was predicted at 60.4 with an error of 17.8, while FRAP reached 832.8 with an error of 82.0, and DPPH was predicted at 463.3 with an error of 415.4. The DPPH prediction thus remained notably poor, and the AAC prediction (9.1, with an error of 8.8) remained similarly distant from the reported value.
The model trained on the mixed dataset yielded the most accurate and consistent predictions across all targets. TPC was predicted at 64.0 with error = 14.2, and FRAP at 865.2 with error = 49.6, showing strong alignment with the experimental values. DPPH prediction also improved, with a value of 541.8 and an error of 336.9. AAC was predicted at 9.2 with an error of 8.7. While some targets, particularly DPPH and AAC, remained challenging to predict accurately, the mixed model consistently outperformed the other models in terms of proximity to the experimental data.
In summary, these results confirm that training on a mixed dataset comprising both real and synthetic samples enhances the model’s ability to generalize and make reliable predictions under optimal conditions. The mixed model showed the lowest aggregate absolute error and the closest alignment to the reported values across all antioxidant targets. Models trained solely on synthetic data exhibited poor generalization to real samples, underscoring the need for empirical grounding. In addition, predictions for specific targets such as AAC and DPPH remained less accurate, likely due to high intrinsic variability or limited representation within the training data.

4. Conclusions

This study successfully optimized the extraction conditions for rosemary leaves using PLE, demonstrating its potential as an efficient and environmentally friendly technique for recovering bioactive compounds. The optimized PLE parameters yielded extracts rich in antioxidants, polyphenols, and ascorbic acid, highlighting the suitability of PLE for such applications.
In parallel, ML approaches were applied to model and predict antioxidant responses based on extraction parameters. While the RF-based mixed model showed improved generalization compared to models trained solely on experimental or synthetic data, the small sample size and reliance on data augmentation introduce limitations to the robustness of the conclusions. In particular, feature importance results may be influenced by synthetic data, and test set performance indicates that further model refinement is needed to ensure reliable predictions in real-world scenarios.
Future research should focus on expanding experimental datasets to improve model training and validation, applying advanced generative methods for more realistic data augmentation, and conducting real-world testing of model predictions. Additionally, evaluating model transferability across different plant matrices and extraction systems, as well as validating the process at industrial scale, will be important steps toward broader practical implementation of AI-assisted extraction optimization.

Author Contributions

Conceptualization, V.A. and S.I.L.; methodology, V.A.; software, V.A.; validation, V.A.; formal analysis, M.M. and V.A.; investigation, M.M. and E.B.; resources, S.I.L.; data curation, M.M. and K.G.L.; writing—original draft preparation, M.M. and K.G.L.; writing—review and editing, V.A., M.M., K.G.L., E.B. and S.I.L.; visualization, M.M. and K.G.L.; supervision, V.A. and S.I.L.; project administration, S.I.L.; funding acquisition, S.I.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Ghasemzadeh Rahbardar, M.; Hosseinzadeh, H. Toxicity and Safety of Rosemary (Rosmarinus officinalis): A Comprehensive Review. Naunyn. Schmiedebergs Arch. Pharmacol. 2025, 398, 9–23. [Google Scholar] [CrossRef] [PubMed]
  2. Aamer, H.A.; Al-Askar, A.A.; Gaber, M.A.; El-Tanbouly, R.; Abdelkhalek, A.; Behiry, S.; Elsharkawy, M.M.; Kowalczewski, P.Ł.; El-Messeiry, S. Extraction, Phytochemical Characterization, and Antifungal Activity of Salvia Rosmarinus Extract. Open Chem. 2023, 21, 20230124. [Google Scholar] [CrossRef]
  3. de Macedo, L.M.; Santos, É.M.d.; Militão, L.; Tundisi, L.L.; Ataide, J.A.; Souto, E.B.; Mazzola, P.G. Rosemary (Rosmarinus officinalis L., Syn Salvia rosmarinus Spenn.) and Its Topical Applications: A Review. Plants 2020, 9, 651. [Google Scholar] [CrossRef]
  4. Ahmed, H.M.; Babakir-Mina, M. Investigation of Rosemary Herbal Extracts (Rosmarinus officinalis) and Their Potential Effects on Immunity. Phytother. Res. 2020, 34, 1829–1837. [Google Scholar] [CrossRef]
  5. González-Minero, F.J.; Bravo-Díaz, L.; Ayala-Gómez, A. Rosmarinus officinalis L. (Rosemary): An Ancient Plant with Uses in Personal Healthcare and Cosmetics. Cosmetics 2020, 7, 77. [Google Scholar] [CrossRef]
  6. Aziz, E.; Batool, R.; Akhtar, W.; Shahzad, T.; Malik, A.; Shah, M.A.; Iqbal, S.; Rauf, A.; Zengin, G.; Bouyahya, A.; et al. Rosemary Species: A Review of Phytochemicals, Bioactivities and Industrial Applications. S. Afr. J. Bot. 2022, 151, 3–18. [Google Scholar] [CrossRef]
  7. Dhenge, R.; Rinaldi, M.; Ganino, T.; Lacey, K. Recent and Novel Technology Used for the Extraction and Recovery of Bioactive Compounds from Fruit and Vegetable Waste. In Wealth out of Food Processing Waste; CRC Press: Boca Raton, FL, USA, 2024; ISBN 978-1-00-326919-9. [Google Scholar]
  8. Christoforidis, A.; Mantiniotou, M.; Athanasiadis, V.; Lalas, S.I. Caffeine and Polyphenolic Compound Recovery Optimization from Spent Coffee Grounds Utilizing Pressurized Liquid Extraction. Beverages 2025, 11, 74. [Google Scholar] [CrossRef]
  9. Galanakis, C.M.; Aldawoud, T.M.S.; Rizou, M.; Rowan, N.J.; Ibrahim, S.A. Food Ingredients and Active Compounds against the Coronavirus Disease (COVID-19) Pandemic: A Comprehensive Review. Foods 2020, 9, 1701. [Google Scholar] [CrossRef]
  10. Martins, R.; Barbosa, A.; Advinha, B.; Sales, H.; Pontes, R.; Nunes, J. Green Extraction Techniques of Bioactive Compounds: A State-of-the-Art Review. Processes 2023, 11, 2255. [Google Scholar] [CrossRef]
  11. Kim, H.C.; Ha, S.Y.; Yang, J.-K. Antioxidant Activity of Ultrasonic Assisted Ethanol Extract of Ainsliaea acerifolia and Prediction of Antioxidant Activity with Machine Learning. BioResources 2024, 19, 7637–7652. [Google Scholar] [CrossRef]
  12. Kunjiappan, S.; Ramasamy, L.K.; Kannan, S.; Pavadai, P.; Theivendren, P.; Palanisamy, P. Optimization of Ultrasound-Aided Extraction of Bioactive Ingredients from Vitis Vinifera Seeds Using RSM and ANFIS Modeling with Machine Learning Algorithm. Sci. Rep. 2024, 14, 1219. [Google Scholar] [CrossRef] [PubMed]
  13. Kalompatsios, D.; Athanasiadis, V.; Mantiniotou, M.; Lalas, S.I. Optimization of Ultrasonication Probe-Assisted Extraction Parameters for Bioactive Compounds from Opuntia macrorhiza Using Taguchi Design and Assessment of Antioxidant Properties. Appl. Sci. 2024, 14, 10460. [Google Scholar] [CrossRef]
  14. Shehata, E.; Grigorakis, S.; Loupassaki, S.; Makris, D.P. Extraction Optimisation Using Water/Glycerol for the Efficient Recovery of Polyphenolic Antioxidants from Two Artemisia Species. Sep. Purif. Technol. 2015, 149, 462–469. [Google Scholar] [CrossRef]
  15. Athanasiadis, V.; Chatzimitakos, T.; Mantiniotou, M.; Kalompatsios, D.; Bozinou, E.; Lalas, S.I. Investigation of the Polyphenol Recovery of Overripe Banana Peel Extract Utilizing Cloud Point Extraction. Eng 2023, 4, 3026–3038. [Google Scholar] [CrossRef]
  16. Freedman, D.A. Statistical Models: Theory and Practice, 2nd ed.; Cambridge University Press: Cambridge, UK, 2009; ISBN 978-0-511-81586-7. [Google Scholar]
  17. Hoerl, A.E.; Kennard, R.W. Ridge Regression: Biased Estimation for Nonorthogonal Problems. Technometrics 1970, 12, 55–67. [Google Scholar] [CrossRef]
  18. Tibshirani, R. Regression Shrinkage and Selection via the Lasso. J. R. Stat. Soc. Ser. B Methodol. 1996, 58, 267–288. [Google Scholar] [CrossRef]
  19. Friedman, J.H. Greedy Function Approximation: A Gradient Boosting Machine. Ann. Stat. 2001, 29, 1189–1232. [Google Scholar] [CrossRef]
  20. Freund, Y.; Schapire, R.E. A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting. J. Comput. Syst. Sci. 1997, 55, 119–139. [Google Scholar] [CrossRef]
  21. Willmott, C.J.; Matsuura, K. Advantages of the Mean Absolute Error (MAE) over the Root Mean Square Error (RMSE) in Assessing Average Model Performance. Clim. Res. 2005, 30, 79–82. [Google Scholar] [CrossRef]
  22. Chai, T.; Draxler, R.R. Root Mean Square Error (RMSE) or Mean Absolute Error (MAE)?—Arguments against Avoiding RMSE in the Literature. Geosci. Model Dev. 2014, 7, 1247–1250. [Google Scholar] [CrossRef]
  23. Kvålseth, T.O. Cautionary Note about R2. Am. Stat. 1985, 39, 279–285. [Google Scholar] [CrossRef]
  24. Thoo, Y.; Ng, S.Y.; Khoo, M.; Mustapha, W.; Ho, C. A Binary Solvent Extraction System for Phenolic Antioxidants and Its Application to the Estimation of Antioxidant Capacity in Andrographis paniculata Extracts. Int. Food Res. J. 2013, 20, 1103–1111. [Google Scholar]
  25. Mantiniotou, M.; Athanasiadis, V.; Kalompatsios, D.; Lalas, S.I. Optimization of Carotenoids and Other Antioxidant Compounds Extraction from Carrot Peels Using Response Surface Methodology. Biomass 2025, 5, 3. [Google Scholar] [CrossRef]
  26. Segovia, F.J.; Luengo, E.; Corral-Pérez, J.J.; Raso, J.; Almajano, M.P. Improvements in the Aqueous Extraction of Polyphenols from Borage (Borago officinalis L.) Leaves by Pulsed Electric Fields: Pulsed Electric Fields (PEF) Applications. Ind. Crops Prod. 2015, 65, 390–396. [Google Scholar] [CrossRef]
  27. More, P.R.; Jambrak, A.R.; Arya, S.S. Green, Environment-Friendly and Sustainable Techniques for Extraction of Food Bioactive Compounds and Waste Valorization. Trends Food Sci. Technol. 2022, 128, 296–315. [Google Scholar] [CrossRef]
  28. Athanasiadis, V.; Mantiniotou, M.; Kalompatsios, D.; Makrygiannis, I.; Alibade, A.; Lalas, S.I. Evaluation of Antioxidant Properties of Residual Hemp Leaves Following Optimized Pressurized Liquid Extraction. AgriEngineering 2025, 7, 1. [Google Scholar] [CrossRef]
  29. Zhou, J.; Wang, M.; Carrillo, C.; Zhu, Z.; Brncic, M.; Berrada, H.; Barba, F.J. Impact of Pressurized Liquid Extraction and pH on Protein Yield, Changes in Molecular Size Distribution and Antioxidant Compounds Recovery from Spirulina. Foods 2021, 10, 2153. [Google Scholar] [CrossRef] [PubMed]
  30. Anticona, M.; Blesa, J.; Lopez-Malo, D.; Frigola, A.; Esteve, M.J. Effects of Ultrasound-Assisted Extraction on Physicochemical Properties, Bioactive Compounds, and Antioxidant Capacity for the Valorization of Hybrid Mandarin Peels. Food Biosci. 2021, 42, 101185. [Google Scholar] [CrossRef]
  31. Athanasiadis, V.; Chatzimitakos, T.; Mantiniotou, M.; Kalompatsios, D.; Kotsou, K.; Makrygiannis, I.; Bozinou, E.; Lalas, S.I. Optimization of Four Different Rosemary Extraction Techniques Using Plackett–Burman Design and Comparison of Their Antioxidant Compounds. Int. J. Mol. Sci. 2024, 25, 7708. [Google Scholar] [CrossRef]
  32. Antony, A.; Farid, M. Effect of Temperatures on Polyphenols during Extraction. Appl. Sci. 2022, 12, 2107. [Google Scholar] [CrossRef]
  33. Hashem Hashempur, M.; Zareshahrabadi, Z.; Shenavari, S.; Zomorodian, K.; Rastegari, B.; Karami, F. Deep Eutectic Solvent-Based Extraction of Rosemary Leaves: Optimization Using Central Composite Design and Evaluation of Antioxidant and Antimicrobial Activities. New J. Chem. 2025, 49, 4495–4505. [Google Scholar] [CrossRef]
  34. Kabubii, Z.N.; Mbaria, J.M.; Mathiu, P.M.; Wanjohi, J.M.; Nyaboga, E.N. Diet Supplementation with Rosemary (Rosmarinus officinalis L.) Leaf Powder Exhibits an Antidiabetic Property in Streptozotocin-Induced Diabetic Male Wistar Rats. Diabetology 2024, 5, 12–25. [Google Scholar] [CrossRef]
  35. Xie, L.; Li, Z.; Li, H.; Sun, J.; Liu, X.; Tang, J.; Lin, X.; Xu, L.; Zhu, Y.; Liu, Z.; et al. Fast Quantitative Determination of Principal Phenolic Anti-Oxidants in Rosemary Using Ultrasound-Assisted Extraction and Chemometrics-Enhanced HPLC–DAD Method. Food Anal. Methods 2023, 16, 386–400. [Google Scholar] [CrossRef]
  36. Samer, A.J.A.; Samarrai, O.R.A. Phytochemical Screening, Antioxidant Power and Quantitative Analysis by HPLC of Isolated Flavonoids from Rosemary. Samarra J. Pure Appl. Sci. 2025, 6, 15–29. Available online: https://sjpas.com/index.php/sjpas/article/view/861 (accessed on 10 June 2025).
  37. Baptista, A.; Menicucci, F.; Brunetti, C.; dos Santos Nascimento, L.B.; Pasquini, D.; Alderotti, F.; Detti, C.; Ferrini, F.; Gori, A. Unlocking the Hidden Potential of Rosemary (Salvia rosmarinus Spenn.): New Insights into Phenolics, Terpenes, and Antioxidants of Mediterranean Cultivars. Plants 2024, 13, 3395. [Google Scholar] [CrossRef]
  38. Miljanović, A.; Dent, M.; Grbin, D.; Pedisić, S.; Zorić, Z.; Marijanović, Z.; Jerković, I.; Bielen, A. Sage, Rosemary, and Bay Laurel Hydrodistillation By-Products as a Source of Bioactive Compounds. Plants 2023, 12, 2394. [Google Scholar] [CrossRef]
Figure 1. Histograms with kernel density estimates showing the distribution of extraction parameters and antioxidant response variables for rosemary samples. While extraction settings are uniformly distributed due to the design structure, antioxidant responses such as FRAP and DPPH exhibit skewed distributions, indicating variability in sample performance.
Figure 2. Combined boxplot of all extraction parameters and antioxidant response variables. Antioxidant metrics (FRAP and DPPH) exhibit larger spread and more outliers than extraction parameters, indicating greater variability and sensitivity to experimental conditions.
Figure 2. Combined boxplot of all extraction parameters and antioxidant response variables. Antioxidant metrics (FRAP and DPPH) exhibit larger spread and more outliers than extraction parameters, indicating greater variability and sensitivity to experimental conditions.
Processes 13 01879 g002
Figure 3. Plot (A) is a heatmap of raw data values for extraction parameters and antioxidant responses across 17 experimental conditions. Brighter colors indicate higher absolute values. The most intense FRAP activity was observed in sample #13. Plot (B) is a z-score normalized heatmap of the same dataset. Standardization enables direct comparison across all features. Red shades represent values above the mean, while blue shades indicate values below the mean. Strong positive deviations in FRAP and TPC are evident in sample #13.
Figure 4. Pearson correlation matrix among extraction parameters and antioxidant response variables. Values range from −1 (perfect negative correlation) to +1 (perfect positive correlation). Strong internal consistency was observed among antioxidant metrics (TPC, FRAP, and DPPH), while concentration (C, %) negatively correlates with TPC and FRAP.
Figure 5. TPC, showing the (A) covariation of X1 (ethanol concentration, C, % v/v) and X2 (liquid-to-solid ratio, R, mL/g); (B) covariation of X1 and X4 (extraction time, t, min); (C) covariation of X2 and X3 (extraction temperature, T, °C); and (D) covariation of X2 and X4. FRAP, showing the (E) covariation of X1 and X2; (F) covariation of X1 and X3; and (G) covariation of X2 and X4. DPPH, showing the (H) covariation of X1 and X2; (I) covariation of X1 and X3; (J) covariation of X1 and X4; (K) covariation of X2 and X3; and (L) covariation of X2 and X4. AAC, showing the (M) covariation of X1 and X3.
Figure 6. Pareto plots illustrating the significance of parameter estimates for the PLE technique across TPC (A), FRAP (B), DPPH (C), and AAC (D), with a pink asterisk marking significant values (p < 0.05). Positive estimates are shown in blue, while negative ones are represented in red.
Figure 7. PCA for the measured variables. Each X variable is shown in blue.
Figure 8. Plot (A) shows the optimization of the PLE technique for rosemary extracts through the partial least squares (PLS) prediction profiler and a desirability function with extrapolation control. Plot (B) shows the Variable Importance Plot (VIP) graph, showing the VIP values for each predictor variable in the PLE technique, with a red dashed line marking the 0.8 significance level.
Figure 9. Histograms of the performance of our six ML models on our original training set.
Figure 10. Histograms of the performance of our six ML models on our original test set.
Figure 11. Histograms of the performance of our six ML models on our synthetic training set.
Figure 12. Histograms of the performance of our six ML models on our synthetic test set.
Figure 13. Histograms of the performance of our six ML models on our mixed training set.
Figure 14. Histograms of the performance of our six ML models on our mixed test set.
Figure 15. Cross-dataset evaluation of the RF regression model trained on the original, synthetic, and mixed datasets. Each group of bars represents the performance metrics MAE, MSE, RMSE, and R2 obtained when the model trained on one dataset was tested on another. Results highlight the generalization ability of each training regime across data domains. Models trained on the mixed dataset showed superior cross-domain performance, particularly when evaluated on both the original and synthetic test sets.
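For reference, the four metrics reported in Figure 15 can be computed as in the brief sketch below (a generic scikit-learn example with placeholder arrays, not the authors’ evaluation script).

```python
# Generic sketch of the metrics used in the cross-dataset comparison (Figure 15);
# y_true and y_pred are placeholders for a test set's targets and a model's predictions.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def evaluate(y_true, y_pred):
    mae = mean_absolute_error(y_true, y_pred)
    mse = mean_squared_error(y_true, y_pred)
    rmse = np.sqrt(mse)            # RMSE penalizes large deviations more strongly than MAE
    r2 = r2_score(y_true, y_pred)  # fraction of variance explained
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}

print(evaluate(np.array([3.0, 5.0, 7.0]), np.array([2.8, 5.4, 6.5])))
```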
Figure 16. Feature importance scores from RF models trained on the (left) original, (middle) synthetic, and (right) mixed datasets. Importance scores reflect each feature’s contribution to predicting antioxidant targets (TPC, FRAP, DPPH, and AAC). Feature influence varies significantly depending on the dataset used for training.
Figure 17. Predicted vs. actual standardized values for antioxidant targets using RF-based models trained on the original (top row), synthetic (middle row), and mixed (bottom row) datasets. Diagonal red dashed lines represent the ideal 1:1 relationship. R2 values quantify model fit for each case.
Figure 18. Partial dependence plots showing the marginal effect of each feature on antioxidant predictions from the RF original model.
Figure 19. Partial dependence plots showing the marginal effect of each feature on antioxidant predictions from the RF synthetic model.
Figure 20. Partial dependence plots showing the marginal effect of each feature on antioxidant predictions from the RF mixed model.
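Partial dependence plots of the kind shown in Figures 18–20 can be produced with scikit-learn’s inspection module; the sketch below is illustrative only, fitting an RF regressor to randomly generated stand-in data with the four extraction parameters as features (in practice, the standardized extraction dataset and the fitted models from the study would be used instead).

```python
# Illustrative sketch: partial dependence of a fitted RF model on the four extraction
# parameters. The data below are random stand-ins, not the study's dataset.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import PartialDependenceDisplay

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.uniform(size=(40, 4)) * [100, 60, 120, 20] + [0, 10, 40, 5],
                 columns=["C", "R", "T", "t"])                    # ethanol %, L/S ratio, temperature, time
y = 40 + 0.2 * X["T"] - 0.15 * X["C"] + rng.normal(0, 3, len(X))  # toy antioxidant response

rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
PartialDependenceDisplay.from_estimator(rf, X, features=["C", "R", "T", "t"])
plt.tight_layout()
plt.show()
```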
Figure 21. Comparison of predicted versus reported antioxidant values under optimal extraction conditions using RF-based models trained on the original, synthetic, and mixed datasets. Bars represent predicted values for TPC, FRAP, DPPH, and AAC, with numeric labels showing predicted value ± absolute error relative to the experimentally reported values. The mixed model produced the most accurate predictions overall, with reduced errors across most targets.
Table 1. Summary of machine learning regression models and the corresponding hyperparameters tuned during grid search. Default parameters were used for Linear Regression, while regularization strengths (α) and core structural parameters (number of estimators, max depth, min samples split, tree depth, and learning rate) were varied for the other models.
Model | Tuned Parameters | Values Tested
Linear Regression | None | -
Ridge | α | [0.1, 1.0, 10.0]
Lasso | α | [0.001, 0.01, 0.1, 1.0]
RF | n_estimators, max_depth, min_samples_split | [100, 200], [None, 10], [2, 5]
GB | n_estimators, learning_rate, max_depth | [100, 200], [0.05, 0.1], [3, 5]
AdaBoost | n_estimators, learning_rate | [50, 100], [0.5, 1.0]
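A minimal sketch of how the Table 1 grids could be searched with scikit-learn’s GridSearchCV is given below; the demonstration data are random placeholders, and the cross-validation settings are illustrative rather than the authors’ exact configuration.

```python
# Sketch of tuning the Table 1 hyperparameter grids with GridSearchCV.
# The demonstration data are random placeholders; substitute the standardized extraction dataset.
import numpy as np
from sklearn.ensemble import AdaBoostRegressor, GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import GridSearchCV

models = {
    "Linear Regression": (LinearRegression(), {}),  # no tuned parameters (defaults)
    "Ridge": (Ridge(), {"alpha": [0.1, 1.0, 10.0]}),
    "Lasso": (Lasso(), {"alpha": [0.001, 0.01, 0.1, 1.0]}),
    "RF": (RandomForestRegressor(random_state=0),
           {"n_estimators": [100, 200], "max_depth": [None, 10], "min_samples_split": [2, 5]}),
    "GB": (GradientBoostingRegressor(random_state=0),
           {"n_estimators": [100, 200], "learning_rate": [0.05, 0.1], "max_depth": [3, 5]}),
    "AdaBoost": (AdaBoostRegressor(random_state=0),
                 {"n_estimators": [50, 100], "learning_rate": [0.5, 1.0]}),
}

rng = np.random.default_rng(0)
X_demo, y_demo = rng.normal(size=(40, 4)), rng.normal(size=40)

for name, (estimator, grid) in models.items():
    search = GridSearchCV(estimator, grid, cv=3, scoring="neg_mean_absolute_error")
    search.fit(X_demo, y_demo)
    print(f"{name}: best parameters {search.best_params_}")
```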
Table 2. Experimental results for the four examined independent variables and the dependent variables’ responses to the PLE technique.
Design Point | C (%) (X1) | RL/S (mL/g) (X2) | T (°C) (X3) | t (min) (X4) | TPC * | FRAP * | DPPH * | AAC *
1 | 75 | 10 | 40 | 20 | 37.20 | 612.74 | 243.81 | 7.47
2 | 75 | 55 | 130 | 20 | 26.84 | 366.39 | 152.88 | 9.46
3 | 25 | 25 | 160 | 10 | 44.43 | 740.02 | 389.29 | 9.95
4 | 0 | 55 | 130 | 5 | 37.03 | 451.66 | 230.22 | 12.02
5 | 0 | 10 | 40 | 5 | 28.84 | 331.68 | 67.14 | 3.14
6 | 100 | 25 | 160 | 25 | 18.96 | 214.42 | 108.73 | 9.90
7 | 100 | 70 | 160 | 10 | 34.14 | 413.30 | 174.82 | 14.65
8 | 100 | 25 | 70 | 10 | 11.26 | 119.68 | 66.67 | 8.35
9 | 25 | 70 | 70 | 10 | 41.10 | 467.27 | 185.26 | 11.05
10 | 25 | 70 | 160 | 25 | 47.41 | 615.91 | 251.75 | 15.56
11 | 75 | 10 | 130 | 5 | 60.26 | 286.87 | 388.69 | 11.89
12 | 0 | 55 | 40 | 20 | 31.88 | 233.39 | 100.86 | 5.45
13 | 0 | 10 | 130 | 20 | 75.85 | 1089.81 | 816.36 | 6.38
14 | 75 | 55 | 40 | 5 | 30.09 | 393.47 | 150.16 | 10.24
15 | 100 | 70 | 70 | 25 | 19.17 | 169.93 | 85.11 | 13.34
16 | 25 | 25 | 70 | 25 | 49.64 | 696.79 | 427.46 | 8.44
17 | 50 | 40 | 100 | 15 | 52.69 | 893.17 | 497.11 | 12.45
* Values represent the mean of triplicate determinations; TPC, total polyphenol content (in mg GAE/g dw); FRAP, ferric reducing antioxidant power (in μmol AAE/g dw); DPPH, antiradical activity (in μmol AAE/g dw); AAC, ascorbic acid content (in mg/g dw).
Table 3. Analysis of variance (ANOVA) for the response surface quadratic polynomial model of the PLE technique.
Factor | TPC | FRAP | DPPH | AAC
Stepwise regression coefficients
Intercept | 42.88 * | 715.8 * | 356.2 * | 11.04 *
X1—ethanol concentration | −10.7 * | −170 * | −96.8 * | 1.204 *
X2—liquid-to-solid ratio | −5.69 | −81.1 | −103 * | 2.283 *
X3—temperature | 7.183 | 96.73 * | 93.54 * | 1.956 *
X4—extraction time | 0.854 | 66.9 | 39.24 | -
X1X2 | 3.746 | 166.6 * | 92.01 | -
X1X3 | - | −114 | −54.1 | −1.02
X1X4 | −8.19 | - | −91 | -
X2X3 | −4.4 | - | −52.1 | -
X2X4 | −4.51 | −131 * | −82.1 | -
X3X4 | - | - | - | -
X1² | −13.8 | −268 * | −103 | −1.69
X2² | 15.5 | - | 79.73 | -
X3² | −8.56 | - | −130 | -
X4² | - | −131 | - | -
ANOVA
F-value | 4.507 | 7.716 | 4.362 | 10.65
p-Value | 0.0545 ns | 0.0067 * | 0.0833 ns | 0.0006 *
R² | 0.908 | 0.908 | 0.929 | 0.829
Adjusted R² | 0.707 | 0.791 | 0.716 | 0.751
RMSE | 8.747 | 121.9 | 104.7 | 1.633
PRESS | 3913 | 363,375 | 580,653 | 57.61
CV | 42.46 | 55.93 | 77.03 | 32.76
DF (total) | 16 | 16 | 16 | 16
* The values significantly affected responses at a probability level of 95% (p < 0.05). TPC, total polyphenol content; FRAP, ferric reducing antioxidant power; DPPH, antiradical activity; AAC, ascorbic acid content; ns, non-significant; F-value, test for comparing model variance with residual (error) variance; p-Value, probability of seeing the observed F-value if the null hypothesis is true; RMSE, root mean square error; PRESS, predicted residual error sum of squares; CV, coefficient of variation; DF, degree of freedom.
Table 4. Maximum predicted responses and optimum extraction conditions for the dependent variables.
Parameters | C (%) (X1) | RL/S (mL/g) (X2) | T (°C) (X3) | t (min) (X4) | Desirability | Stepwise Regression
TPC (mg GAE/g dw) | 25 | 10 | 130 | 20 | 0.9292 | 76.19 ± 17.92
FRAP (μmol AAE/g dw) | 25 | 10 | 160 | 20 | 0.9907 | 1117.68 ± 212.88
DPPH (μmol AAE/g dw) | 0 | 10 | 130 | 20 | 0.8728 | 799.03 ± 271.68
AAC (mg/g dw) | 50 | 70 | 160 | - | 0.9318 | 15.28 ± 2.23
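As a worked consistency check between Tables 3 and 4 (assuming the stepwise models operate on coded factor levels running from −1 at the lowest setting to +1 at the highest, so that the AAC optimum of C = 50%, RL/S = 70 mL/g, and T = 160 °C corresponds to coded values of 0, +1, and +1, respectively), the AAC model reproduces the predicted maximum:

$$\mathrm{AAC}_{\max} = 11.04 + 1.204(0) + 2.283(+1) + 1.956(+1) - 1.02(0)(+1) - 1.69(0)^2 \approx 15.28~\mathrm{mg/g~dw},$$

which matches the 15.28 ± 2.23 mg/g dw entry in Table 4.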
Table 5. Multivariate correlation analysis of measured variables.
Responses | TPC | FRAP | DPPH | AAC
TPC |  | 0.7796 | 0.9129 | −0.0738
FRAP |  |  | 0.8549 | −0.0139
DPPH |  |  |  | −0.0722
AAC |  |  |  | 
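The coefficients in Table 5 (and the matrix in Figure 4) are pairwise Pearson correlations; a minimal pandas sketch using the response columns of Table 2 is shown below (illustrative only, not the authors’ exact script).

```python
# Sketch: Pearson correlations among the antioxidant responses reported in Table 2.
import pandas as pd

responses = pd.DataFrame({
    "TPC":  [37.20, 26.84, 44.43, 37.03, 28.84, 18.96, 34.14, 11.26, 41.10,
             47.41, 60.26, 31.88, 75.85, 30.09, 19.17, 49.64, 52.69],
    "FRAP": [612.74, 366.39, 740.02, 451.66, 331.68, 214.42, 413.30, 119.68, 467.27,
             615.91, 286.87, 233.39, 1089.81, 393.47, 169.93, 696.79, 893.17],
    "DPPH": [243.81, 152.88, 389.29, 230.22, 67.14, 108.73, 174.82, 66.67, 185.26,
             251.75, 388.69, 100.86, 816.36, 150.16, 85.11, 427.46, 497.11],
    "AAC":  [7.47, 9.46, 9.95, 12.02, 3.14, 9.90, 14.65, 8.35, 11.05,
             15.56, 11.89, 5.45, 6.38, 10.24, 13.34, 8.44, 12.45],
})
print(responses.corr(method="pearson").round(4))  # should approximate the coefficients in Table 5
```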
Table 6. The partial least squares (PLS) prediction profiler determined the maximum desirability for all variables under the optimal extraction conditions for the PLE technique.
Parameters | C (%) (X1) | RL/S (mL/g) (X2) | T (°C) (X3) | t (min) (X4) | Desirability | Partial Least Squares (PLS) Regression | Experimental Values
TPC (mg GAE/g dw) | 25 | 10 | 160 | 25 | 0.8429 | 80.29 | 78.23 ± 0.63
FRAP (μmol AAE/g dw) |  |  |  |  |  | 1118.22 | 914.82 ± 1.53
DPPH (μmol AAE/g dw) |  |  |  |  |  | 817.20 | 878.7 ± 6.34
AAC (mg/g dw) |  |  |  |  |  | 10.20 | 17.83 ± 0.25
Table 7. Polyphenolic compound concentrations in the rosemary extract obtained under the optimal PLE conditions.
Polyphenolic Compound | Concentration (μg/g dw)
Catechin | 239 ± 11
Quercetin 3-D-galactoside | 1114 ± 45
Luteolin-7-glucoside | 236 ± 13
Kaempferol-3-glucoside | 442 ± 19
Hesperidin | 3711 ± 96
Rosmarinic acid | 1570 ± 58
Apigenin | 245 ± 7
Kaempferol | 72 ± 2
Rosmanol | 731 ± 27
Carnosic acid | 889 ± 32
Total identified | 9250 ± 311
Values represent the mean of triplicate determinations ± standard deviation.
Table 8. Calibration curve equations for each compound identified through HPLC-DAD.
Polyphenolic Compounds (Standards) | Equation (Linear) | R² | Retention Time (min) | UVmax (nm)
Catechin | y = 11,920.78x − 128.19 | 0.997 | 20.933 | 278
Quercetin 3-D-galactoside | y = 41,489.69x − 35,577.55 | 0.993 | 34.598 | 257
Luteolin-7-glucoside | y = 34,875.94x − 16,827.36 | 0.999 | 35.949 | 347
Kaempferol-3-glucoside | y = 50,916.85x − 42,398.83 | 0.996 | 38.724 | 265
Hesperidin | y = −30,502.75x − 30,502.75 | 0.995 | 40.249 | 283
Rosmarinic acid | y = 50,281.27x − 113,633.31 | 0.995 | 41.644 | 329
Apigenin | y = 95,483.52x − 5214.26 | 0.998 | 55.860 | 227
Kaempferol | y = 93,385.02x − 18,613.03 | 0.999 | 56.883 | 265
Rosmanol | y = 5509.45x − 10,899.23 | 0.994 | 65.924 | 288
Carnosic acid | y = 8883.45x + 101,483.30 | 0.992 | 77.870 | 284
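Assuming, as is standard for HPLC-DAD quantification, that y in Table 8 denotes the detector response (peak area) and x the analyte concentration in the calibration’s working units, each curve can be inverted to convert a measured area into a concentration; for the catechin curve, for example:

$$x_{\mathrm{catechin}} = \frac{y + 128.19}{11{,}920.78}.$$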
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
